Hacking inside BigDFT
The BigDFT code uses wavelets for the internal storage of the wavefunctions and for several of the classical operations of standard DFT codes. The purpose of this tutorial is to present the numerical objects that describe the wavefunctions and to show how to use them in different cases.
BigDFT provides two independent libraries: libbigdft.a and libpoissonsolver.a. The public functions provided by these libraries are described in the Fortran modules BigDFT_API and Poisson_Solver respectively. Every program that uses BigDFT should thus look like this (let's call it toy_model.f90):
program wvl
  use Poisson_Solver
  use BigDFT_API

  implicit none

  call print_logo()
end program wvl
To compile our little program, we must provide the path to the previously mentioned module files. To link it, we must also provide the two libraries. In addition, BigDFT itself depends on several external libraries (some of which are optional):
- libXC, for the calculation of the exchange-correlation energy using your preferred approximation;
- ABINIT, for several DFT low level routines (symmetry, k-points, mixing);
- Lapack and Blas, for linear algebra (or optimised versions like MKL);
- libarchive, for on the fly compression and decompression of atomic position files (optional);
- ETSF_IO, for NetCDF input/output support following the ETSF file format specification.
We provide a Makefile to compile the exercises of this tutorial. This Makefile should be adapted to your own installation of BigDFT.
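For reference, the compile and link line in such a Makefile might look like the following sketch. The installation prefix, the compiler variable and the exact set of external libraries are assumptions to be adapted to your own installation (in particular, the optional libraries listed above):

```makefile
# Hypothetical paths; adapt to your own BigDFT installation.
BIGDFT_PREFIX = /opt/bigdft
FC = mpif90

toy_model: toy_model.f90
	$(FC) -I$(BIGDFT_PREFIX)/include -o toy_model toy_model.f90 \
	    -L$(BIGDFT_PREFIX)/lib -lbigdft -lpoissonsolver \
	    -lxc -llapack -lblas
```

The -I flag points the compiler to the BigDFT_API and Poisson_Solver module files, while -lbigdft and -lpoissonsolver correspond to the two libraries mentioned above.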
Initialising input variables and atomic information
Input variables, input atomic coordinates and pseudo-potential information are stored inside user-specified input files, as explained in other tutorials. BigDFT_API provides two simple routines to load all of these in memory:
type(input_variables) :: inputs
type(atoms_data) :: atoms
real(gp), dimension(:,:), pointer :: rxyz

! Setup names for input and output files.
call standard_inputfile_names(inputs, "toy")
! Read all input stuff, variables and atomic coordinates and pseudo.
call read_input_variables(iproc, "posinp", inputs, atoms, rxyz)

[...]

deallocate(rxyz)
call deallocate_atoms(atoms, "main")
call free_input_variables(inputs)
The first routine is used to setup all filenames based on a command line prefix (here "toy"). For more information on this prefix, see the Run a calculation#Running a job with a naming scheme manual page. The second reads all provided input files and stores the input variables into the inputs data structure, the atomic information into the atoms data structure and the positions into the array rxyz. The corresponding deallocating routines are also shown.
One can notice the iproc argument. Indeed, BigDFT is designed for parallel runs and most routines take the process ID as an argument. In BigDFT, the input files are read by the master proc only and then broadcast to all procs. We must thus initialise MPI before anything else:
integer :: ierr, iproc, nproc

! Start MPI in parallel version.
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, iproc, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

[...]

! Wait for all processes before finalisation.
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FINALIZE(ierr)
Exercise: Compile our toy_model program to initialise input variables and make the master proc print inputs%crmult, inputs%frmult, inputs%hx, inputs%hy and inputs%hz values.
We can see that BigDFT can run without input files, but it requires a position file. We provide such a position file for the N2 molecule. By default, hgrid is 0.45, and crmult and frmult are 5 and 8 respectively.
The wavefunction storage in memory
The storage of the wavefunction is explained in the wavelet formalism and in the main JCP paper. To summarise, the wavefunctions are expanded on a basis set made of Daubechies wavelets centered at different positions in space, i.e. only the coefficients of these wavelets are stored in memory. The number of grid points around the atoms is governed by three input parameters: hgrid, crmult and frmult. The hgrid parameter specifies the spacing between consecutive grid points, while crmult (frmult) specifies the extent of the coarse (fine) regular grid in real space. The grid thus created for CH4 is represented on the figure to the left.
The description of this grid is stored in a structure called wavefunction_descriptors (wfd) in the code.
A wavefunction descriptor is valid only if we provide information on the full grid, i.e. the one extending outside crmult (frmult) to fill the whole rectangular space. This information is stored inside a localisation region structure in the code (Glr). The wavefunction descriptor is a member of the localisation region (Glr%wfd). The localisation region is additionally defined by its boundary conditions (Glr%geocode) and its number of grid points in the x, y and z directions (Glr%n1, Glr%n2 and Glr%n3 respectively for the coarse discretisation and Glr%n1i, Glr%n2i and Glr%n3i respectively for the fine discretisation).
Let's create the global localisation region Glr for our toy model:
type(locreg_descriptors) :: Glr
type(orbitals_data) :: orbs
integer :: nelec
real(gp), dimension(3) :: shift
real(gp), allocatable :: radii_cf(:,:)

! Setting up the size of the calculation (description of the box and
! calculation area).
allocate(radii_cf(atoms%ntypes, 3))
call system_properties(iproc, nproc, inputs, atoms, orbs, radii_cf, nelec)
call system_size(iproc, atoms, rxyz, radii_cf, inputs%crmult, inputs%frmult, &
     & inputs%hx, inputs%hy, inputs%hz, Glr, shift)

[...]

call deallocate_bounds(Glr%geocode, Glr%hybrid_on, Glr%bounds, "main")
call deallocate_orbs(orbs, "main")
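Once system_size() has returned, the fields of the localisation region described above can be inspected. A minimal sketch, printing only on the master proc (the grid sizes are accessed here through the Glr%d sub-structure, as in the code shown later in this tutorial):

```fortran
! Sketch: inspect the global localisation region (master proc only).
if (iproc == 0) then
   write(*,*) "Boundary conditions:", Glr%geocode
   write(*,*) "Coarse grid points: ", Glr%d%n1, Glr%d%n2, Glr%d%n3
   write(*,*) "Fine grid points:   ", Glr%d%n1i, Glr%d%n2i, Glr%d%n3i
end if
```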
The radii_cf variable stores the physical distances multiplied by crmult and frmult to define the extent of the wavelet grid. It is read from the pseudo-potential, or given default values based on atomic properties. Up to now, Glr contains the description of the whole regular mesh. Let's create the wavefunction descriptors containing information only on the selected points:
type(communications_arrays) :: comms

! Setting up the wavefunction representations (descriptors for the
! compressed form...).
call createWavefunctionsDescriptors(iproc, inputs%hx, inputs%hy, inputs%hz, &
     & atoms, rxyz, radii_cf, inputs%crmult, inputs%frmult, Glr)
call orbitals_communicators(iproc, nproc, Glr, orbs, comms)

[...]

call deallocate_wfd(Glr%wfd, "main")
The additional object orbs introduced here stores the repartition of bands among k-points, processors, spin... The comms object is used in parallel for band communications (we will use it later).
Exercise: Generate Glr%wfd and orbs for our toy model and print the number of coefficients that each processor should store for its wavefunctions (namely orbs%npsidim). Change hgrid, crmult, frmult and the number of processors to see how it evolves.
Even with more processors than orbitals (namely 6 processors in our example), every processor will allocate some memory space to store coefficients. This is due to the two different representations, by orbitals or by components, as we will see later.
Exercise: Run BigDFT on our N2 example and generate the converged wavefunctions by putting output_wf to 1 in input.dft.
Let's load the wavefunctions generated in the above exercise into memory. Before doing this, we need to allocate the array that will store the coefficients. We call it psi. It stores the coefficients of all wavefunctions distributed on the current processor. As seen in the exercise, the size is defined by orbs%npsidim. The allocated space is enough to store orbs%norbp orbitals.
real(wp), dimension(:), pointer :: psi

! Read wavefunctions from disk and store them in psi.
allocate(orbs%eval(orbs%norb * orbs%nkpts))
call to_zero(orbs%norb * orbs%nkpts, orbs%eval(1))
allocate(psi(orbs%npsidim))
allocate(rxyz_old(3, atoms%nat))
call readmywaves(iproc, "data/wavefunction", orbs, Glr%d%n1, Glr%d%n2, Glr%d%n3, &
     & inputs%hx, inputs%hy, inputs%hz, atoms, rxyz_old, rxyz, Glr%wfd, psi)
call mpiallred(orbs%eval(1), orbs%norb * orbs%nkpts, MPI_SUM, MPI_COMM_WORLD, ierr)

[...]

deallocate(rxyz_old)
deallocate(psi)
The code is able to reformat the wavefunctions automatically when the system described by the input variables differs from the one the wavefunctions were stored for (that is why the stored atomic positions are also read back).
Exercise: Run our toy model to read the stored wavefunctions. Change hgrid or crmult to see that the wavefunctions are automatically adapted.
The compressed form distributed over orbitals
The basic distribution of all orbitals over processors is by orbitals. One can see with our toy model that our system of 5 orbitals is divided into (3, 2) in the case of two processors. In that distribution, the description of the coefficients is stored in the wavefunction descriptor Glr%wfd. The number of coefficients is the sum of the number of coefficients on the coarse grid (Glr%wfd%nvctr_c) and the number of coefficients on the fine grid (7 * Glr%wfd%nvctr_f). This storage is called a compressed form since only non-null coefficients are stored. Note that these coefficients do not represent the values of the function at given grid points.
The coefficients are stored contiguously in psi, orbital after orbital (in the case of spin or k-points, the ordering is the same: down after up, k-point after k-point). So the coefficients of the ith orbital stored on a given processor are in:
psi((i - 1) * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f) + 1: i * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f))
The number of orbitals on a given processor is stored in orbs%norbp and these orbitals correspond to the orbitals ranging from orbs%isorb + 1 to orbs%isorb + orbs%norbp.
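The index arithmetic above can be wrapped in a small helper. This is not a BigDFT_API routine, just a sketch of the offset computation used throughout this tutorial:

```fortran
! Hypothetical helper (not part of BigDFT_API): first index in psi of
! the i-th orbital stored on this processor, with 1 <= i <= orbs%norbp.
! Each orbital occupies nvctr_c + 7 * nvctr_f coefficients.
integer function orbital_start(i, nvctr_c, nvctr_f)
  implicit none
  integer, intent(in) :: i, nvctr_c, nvctr_f
  orbital_start = (i - 1) * (nvctr_c + 7 * nvctr_f) + 1
end function orbital_start
```

The global index of that orbital is then orbs%isorb + i.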
Exercise: Check that the norm of all loaded wavefunctions in our toy model is 1. One can use the BigDFT routine wnrm_wrap(spinor, nvctr_c, nvctr_f, psi, nrm).
The norm is equal to 1 for all wavefunctions: this is guaranteed by the readmywaves() routine called previously, which normalises the input wavefunctions.
! We can do some arithmetic on wavefunctions in compressed form.
do i = 1, orbs%norbp, 1
   ! Norm calculation.
   call wnrm_wrap(1, Glr%wfd%nvctr_c, Glr%wfd%nvctr_f, &
        & psi((i - 1) * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f) + 1), nrm)
   write(*,*) "Proc", iproc, " orbital", orbs%isorb + i, " is of norm ", nrm
end do
This distribution is convenient when one wants to apply operators that work on orbitals one by one, like the local potential operator.
The compressed form distributed over components
When some operators must be applied to several orbitals at once, it is more convenient to use the component distribution. In this distribution, each processor stores a given subset of coefficients for ALL orbitals. The scalar product is an obvious instance in which this distribution is more convenient. Due to the orthogonality of the basis set, scalar products of functions expressed in a Daubechies basis reduce to scalar products of the coefficients only. We may thus fragment the whole calculation of the scalar product of two orbitals into scalar products of small chunks of these orbitals.
To switch from an orbital distribution to a component distribution, one should use a transposition, which requires global communication among the processors:
real(wp), dimension(:), pointer :: w

allocate(w(orbs%npsidim))
! Transpose the psi wavefunction.
call transpose_v(iproc, nproc, orbs, Glr%wfd, comms, psi, work = w)
! Retranspose the psi wavefunction.
call untranspose_v(iproc, nproc, orbs, Glr%wfd, comms, psi, work = w)
deallocate(w)
In the component distribution, psi is still the array containing the coefficients. It stores comms%nvctr_par(iproc, 0) * orbs%norb coefficients in total. So, for each orbital, it stores comms%nvctr_par(iproc, 0) coefficients.
Exercise: After adding the transposition routines to our toy model, compute the overlap matrix ⟨ψi|ψj⟩.
Thanks to the orthogonality of Daubechies wavelets, one can simply use dot products on the subset of coefficients on each processor and then use the all-reduce routine from MPI to sum the results of all processors.
real(wp), dimension(:,:), pointer :: ovrlp

allocate(ovrlp(orbs%norb, orbs%norb))
do j = 1, orbs%norb, 1
   do i = 1, orbs%norb, 1
      ovrlp(i, j) = dot_double(comms%nvctr_par(iproc, 0), &
           & psi((i - 1) * comms%nvctr_par(iproc, 0) + 1), 1, &
           & psi((j - 1) * comms%nvctr_par(iproc, 0) + 1), 1)
   end do
end do
! This double loop can be expressed with the BLAS DSYRK function.
! call syrk('L', 'T', orbs%norb, comms%nvctr_par(iproc, 0), 1.0_wp, psi(1), &
!      & max(1, comms%nvctr_par(iproc, 0)), 0.0_wp, ovrlp(1,1), orbs%norb)
call mpiallred(ovrlp(1,1), orbs%norb * orbs%norb, MPI_SUM, MPI_COMM_WORLD, ierr)

if (iproc == 0) then
   write(*,*) "The overlap matrix is:"
   do j = 1, orbs%norb, 1
      write(*, "(A)", advance = "NO") "("
      do i = 1, orbs%norb, 1
         write(*, "(G18.8)", advance = "NO") ovrlp(i, j)
      end do
      write(*, "(A)") ")"
   end do
end if

deallocate(ovrlp)
In fact, BigDFT provides an all-in-one routine to do just this (also taking care of spin-polarised calculations and the presence of k-points): getOverlap().
The direct-value uncompressed form
Up to now, we have dealt with the compressed form of the wavefunctions. We have seen that it is possible to perform some operations on them in this form (scalar product, linear combination, norm...). But sometimes it is also required to access the value of the wavefunction at a given point of the simulation box (to apply local operators, for instance). This is done by a fast wavelet transform followed by a so-called magic filter to change from the Daubechies family to the interpolating scaling function family. It is done in an all-in-one routine called daub_to_isf:
type(workarr_sumrho) :: wisf
real(wp), dimension(:), pointer :: psir

allocate(psir(Glr%d%n1i * Glr%d%n2i * Glr%d%n3i))
call razero(Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, psir)
call initialize_work_arrays_sumrho(Glr, wisf)
call daub_to_isf(Glr, wisf, &
     & psi((i - 1) * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f) + 1), psir)

[...]

call deallocate_work_arrays_sumrho(wisf)
In the above code, the ith orbital of the current processor is represented by its values at mesh points and is stored in psir. Notice that this routine expresses the wavefunction on a mesh doubled in each direction, as denoted by Glr%d%n1i, Glr%d%n2i and Glr%d%n3i. Indeed, in BigDFT, all scalar fields (potential, rho, ...) are represented on a mesh that coincides with the fine grid representation of the wavefunctions.
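Since psir is a flat array over the fine mesh, a linear index j can be mapped back to grid coordinates when needed. The following sketch assumes the usual Fortran column-major ordering, with the x index varying fastest (this layout is an assumption, not something stated above):

```fortran
! Sketch: recover (i1, i2, i3) on the fine mesh from the linear index j,
! assuming column-major storage with i1 varying fastest (all 1-based).
i1 = mod(j - 1, Glr%d%n1i) + 1
i2 = mod((j - 1) / Glr%d%n1i, Glr%d%n2i) + 1
i3 = (j - 1) / (Glr%d%n1i * Glr%d%n2i) + 1
```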
Exercise: Allocate and compute the density rho by summing the square of the values of each wavefunction on a given processor and then all reducing the result.
Here we allocate rho as an array on the whole fine mesh for simplicity. In BigDFT, this array is also distributed among processors.
real(dp), dimension(:), pointer :: rhor

allocate(rhor(Glr%d%n1i * Glr%d%n2i * Glr%d%n3i))
call razero(Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, rhor)
call initialize_work_arrays_sumrho(Glr, wisf)
do i = 1, orbs%norbp, 1
   ! Calculate the values of psi_i on the grid points.
   call daub_to_isf(Glr, wisf, &
        & psi((i - 1) * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f) + 1), psir)
   ! Accumulate the partial density occup * psi_i * psi_i.
   do j = 1, Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, 1
      rhor(j) = rhor(j) + orbs%occup(orbs%isorb + i) * psir(j) * psir(j)
   end do
end do
call mpiallred(rhor(1), Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, MPI_SUM, MPI_COMM_WORLD, ierr)

if (iproc == 0) write(*,*) "System has", sum(rhor), "electrons."
deallocate(rhor)
In the provided solution file <a href="dev/toy_model.f90" target="_blank">toy_model.f90</a>, we give details on how it is done in BigDFT thanks to the routine sumrho(). It is a bit more involved than the simple exercise above because of the distribution of rho (the arrays nscatterarr and ngatherarr are used to describe this distribution), the possibility of spin-polarised calculations and the possibility of symmetrisation (see the irrzon and phnons arrays, which should sound familiar to ABINIT developers).
As stated before, the direct-value representation can be used to express local operators. Let's take the example of the local part arising from pseudo-potentials. One can use the BigDFT routine createIonicPotential() to calculate this potential (remember that this potential is distributed among processors):
real(gp) :: psoffset
real(dp), dimension(:), pointer :: pkernel
real(dp), dimension(:), pointer :: pot_ion, potential
type(rho_descriptors) :: rhodsc
integer, dimension(:,:), allocatable :: nscatterarr, ngatherarr

call createKernel(iproc, nproc, atoms%geocode, Glr%d%n1i, Glr%d%n2i, Glr%d%n3i, &
     & inputs%hx / 2._gp, inputs%hy / 2._gp, inputs%hz / 2._gp, 16, pkernel, .false.)

allocate(nscatterarr(0:nproc-1, 4))
allocate(ngatherarr(0:nproc-1, 2))
call createDensPotDescriptors(iproc, nproc, atoms, Glr%d, &
     & inputs%hx / 2._gp, inputs%hy / 2._gp, inputs%hz / 2._gp, &
     & rxyz, inputs%crmult, inputs%frmult, radii_cf, inputs%nspin, 'D', inputs%ixc, &
     & inputs%rho_commun, n3d, n3p, n3pi, i3xcsh, i3s, nscatterarr, ngatherarr, rhodsc)

allocate(pot_ion(Glr%d%n1i * Glr%d%n2i * n3p))
call createIonicPotential(atoms%geocode, iproc, nproc, atoms, rxyz, &
     & inputs%hx / 2._gp, inputs%hy / 2._gp, inputs%hz / 2._gp, &
     & inputs%elecfield, Glr%d%n1, Glr%d%n2, Glr%d%n3, &
     & n3pi, i3s + i3xcsh, Glr%d%n1i, Glr%d%n2i, Glr%d%n3i, &
     & pkernel, pot_ion, psoffset, 0, .false.)

! Allocate the potential in the full box.
call full_local_potential(iproc, nproc, Glr%d%n1i * Glr%d%n2i * n3p, &
     & Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, inputs%nspin, &
     & Glr%d%n1i * Glr%d%n2i * n3d, 0, &
     & orbs%norb, orbs%norbp, ngatherarr, pot_ion, potential)

[...]

call free_full_potential(nproc, potential, "main")
call deallocate_rho_descriptors(rhodsc, "main")
The local potential from the pseudo is calculated by applying the Poisson solver to fictitious charges that would create the radial potential Vloc(r) as described in the pseudo. So one has to create the kernel array pkernel used by the Poisson solver (see the createKernel() call). Since the potential is distributed among processors, one also needs to calculate the descriptors for this distribution (namely the size of each slab along z for each processor, as stored in n3p). This is done by createDensPotDescriptors(). Finally, for the convenience of this tutorial, we would prefer to have a global array for the potential, so we call full_local_potential() to obtain the desired potential on the whole fine mesh, stored in potential.
Exercise: Calculate the energy coming from the local part of the pseudo, namely the volume integral of the products ⟨ψ|V|ψ⟩. Check that the obtained value does not depend on the number of processors!
We use the direct-value representation for each orbital, perform the product point by point and all-reduce the result among processors.
real(dp) :: epot_sum

epot_sum = 0._dp
do i = 1, orbs%norbp, 1
   call daub_to_isf(Glr, wisf, &
        & psi((i - 1) * (Glr%wfd%nvctr_c + 7 * Glr%wfd%nvctr_f) + 1), psir)
   do j = 1, Glr%d%n1i * Glr%d%n2i * Glr%d%n3i, 1
      epot_sum = epot_sum + psir(j) * potential(j) * psir(j)
   end do
end do
epot_sum = epot_sum * inputs%hx / 2._gp * inputs%hy / 2._gp * inputs%hz / 2._gp
call mpiallred(epot_sum, 1, MPI_SUM, MPI_COMM_WORLD, ierr)

if (iproc == 0) write(*,*) "System pseudo energy is", epot_sum, "Ht."
The value for our toy model should be -0.4042004 Ht.