
PWscf User's Guide (v.5.2.1)



- a failure of the algorithm performing subspace diagonalization. The LAPACK algorithms used by cdiaghg (for generic k-points) or rdiaghg (for the Gamma-only case) are very robust and extensively tested. Still, it may seldom happen that such algorithms fail. Try to use conjugate-gradient diagonalization (diagonalization='cg'), a slower but very robust algorithm, and see what happens;
- buggy libraries. Machine-optimized mathematical libraries are very fast but sometimes not so robust from a numerical point of view. Suspicious behavior: you get an error that is not reproducible on other architectures, or that disappears if the calculation is repeated with even minimal changes in parameters. Known cases: HP-Compaq alphas with cxml libraries, Mac OS X with system BLAS/LAPACK. Try to use compiled BLAS and LAPACK (or, better, ATLAS) instead of machine-optimized libraries.

pw.x crashes with no error message at all. This happens quite often in parallel execution, or under a batch queue, or if you are writing the output to a file. When the program crashes, part of the output, including the error message, may be lost, or hidden into error files where nobody looks into. It is the fault of the operating system, not of the code. Try to run interactively and to write to the screen. If this doesn't help, move to the next point.

pw.x crashes with "segmentation fault" or similarly obscure messages. Possible reasons:

- too much RAM memory or stack requested (see
Contents

1  Introduction
   1.1  What can PWscf do
   1.2  People
   1.3  Terms of use
2  Compilation
3  Using PWscf
   3.1  Input data
   3.2  Data files
   3.3  Electronic structure calculations
   3.4  Optimization and dynamics
   3.5  Direct interface with CASINO
4  Performances
   4.1  Execution time
   4.2  Memory requirements
   4.3  File space requirements
   4.4  Parallelization issues
   4.5  Understanding the time report
        4.5.1  Serial execution
        4.5.2  Parallel execution
   4.6  Restarting
        4.6.1  Signal trapping (experimental)
5  Troubleshooting
   5.1  Compilation problems with PLUMED

1 Introduction

This guide covers the usage of the PWscf (Plane-Wave Self-Consistent Field) package, a core component of the QUANTUM ESPRESSO distribution. Further documentation, beyond what is provided in this guide, can be found in the directory PW/Doc/, containing a copy of this guide.

This guide assumes that you know the physics that PWscf describes and the methods it implements. It also assumes that you have already installed, or know how to install, QUANTUM ESPRESSO. If not, please read the general User's Guide for QUANTUM ESPRESSO,
found in directory Doc/, two levels above the one containing this guide, or consult the web site http://www.quantum-espresso.org. People who want to modify or contribute to PWscf should read the Developer Manual: Doc/developer_man.pdf.

1.1 What can PWscf do

PWscf performs many different kinds of self-consistent calculations of electronic-structure properties within Density-Functional Theory (DFT), using a Plane-Wave (PW) basis set and pseudopotentials (PP). In particular:

- ground-state energy and one-electron (Kohn-Sham) orbitals, atomic forces, stresses;
- structural optimization, also with variable cell;
- molecular dynamics on the Born-Oppenheimer surface, also with variable cell;
- macroscopic polarization and finite electric fields via the modern theory of polarization (Berry Phases);
- modern theory of orbital magnetization;
- free-energy surface calculation at fixed cell through meta-dynamics, if patched with PLUMED.

All of the above works for both insulators and metals, in any crystal structure, for many exchange-correlation (XC) functionals (including spin polarization, DFT+U, meta-GGA, nonlocal and hybrid functionals), for norm-conserving (Hamann-Schluter-Chiang) PPs (NCPPs) in separable form, or Ultrasoft (Vanderbilt) PPs (USPPs), or the Projector Augmented Waves (PAW) method. Noncollinear magnetism and spin-orbit interactions are also implemented. An implementation of finite electric fields with a sawtooth potential in a
then do a non-SCF calculation with the desired k-point grid and number nbnd of bands. Use calculation='bands' if you are interested in calculating only the Kohn-Sham states for the given set of k-points (e.g. along symmetry lines; see for instance http://www.cryst.ehu.es/cryst/get_kvec.html). Specify instead calculation='nscf' if you are interested in further processing of the results of non-SCF calculations (for instance, in DOS calculations). In the latter case, you should specify a uniform grid of points. For DOS calculations you should choose occupations='tetrahedra', together with an automatically generated uniform k-point grid (card K_POINTS with option automatic). Specify nosym=.true. to avoid generation of additional k-points in low-symmetry cases. Variables prefix and outdir, which determine the names of input or output files, should be the same in the two runs. See Examples 01, 06, 07.

NOTA BENE: since v.4.1, both atomic positions and the scf potential are read from the data file, so that consistency is guaranteed.

Noncollinear magnetization, spin-orbit interactions. The following input variables are relevant for noncollinear and spin-orbit calculations: noncolin, lspinorb, starting_magnetization (one for each type of atoms). To make a spin-orbit calculation, noncolin must be true. If starting_magnetization is set to zero (or not given), the code makes a spin-orbit calculation without spin magnetization (it assumes that t
Gironcoli, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, R. M. Wentzcovitch, J. Phys.: Condens. Matter 21, 395502 (2009), http://arxiv.org/abs/0906.2569

References for all exchange-correlation functionals can be found in the header of file Modules/funct.f90. Note the form QUANTUM ESPRESSO for textual citations of the code. Pseudopotentials should be cited as (for instance): "We used the pseudopotentials C.pbe-rrjkus.UPF and O.pbe-vbc.UPF from http://www.quantum-espresso.org."

2 Compilation

PWscf is included in the core QUANTUM ESPRESSO distribution. Instructions on how to install it can be found in the general documentation (User's Guide) for QUANTUM ESPRESSO. Typing make pw from the main QUANTUM ESPRESSO directory, or make from the PW/ subdirectory, produces the pw.x executable in PW/src and a link to the bin/ directory. In addition, several utility programs, and related links in bin/, are produced in PW/tools:

- PW/tools/dist.x reads input data for PWscf, calculates distances and angles between atoms in a cell, taking into account periodicity;
- PW/tools/ev.x fits energy-vs-volume data to an equation of state;
- PW/tools/kpoints.x produces lists of k-points;
- PW/tools/pwi2xsf.sh, pwo2xsf.sh process respectively input and output files (not data f
b1, depending on user requests (see below). If the files are produced from an MD run, the files have a suffix .0001, .0002, .0003 etc., corresponding to the sequence of timesteps.

CASINO support is implemented by three routines in the PW directory of the espresso distribution:

- pw2casino.f90: the main routine;
- pw2casino_write.f90: writes the CASINO xwfn.data file in various formats;
- pw2blip.f90: does the plane-wave to blip conversion, if requested.

Relevant behavior of PWscf may be modified through an optional auxiliary input file, named pw2casino.dat (see below). Note that in versions prior to 4.3 this functionality was provided through separate post-processing utilities available in the PP directory: these are no longer supported. For QMC-MD runs, PWSCF etc. previously needed to be patched using the patch script PP/pw2casino_MDloop.sh: this is no longer necessary.

How to generate xwfn.data files with PWscf. Use the "-pw2casino" option when invoking pw.x, e.g.:

   pw.x -pw2casino < input_file > output_file

The xwfn.data file will then be generated automatically. PWscf is capable of doing the plane-wave to blip conversion directly (the blip utility provided in the CASINO distribution is not required), and so by default PWscf produces the binary "blip" wave function file bwfn.data.b1. Various options may be modified by providing a file pw2casino.dat in outdir with the following format:

   &inputpp
      blip_convert = .true.
- maybe your calculation will take more time than you expect.

pw.x yields weird results. If results are really weird (as opposed to misinterpreted):

- if this happens after a change in the code, or in compilation or preprocessing options, try "make clean", recompile. The make command should take care of all dependencies, but do not rely too heavily on it. If the problem persists, recompile with reduced optimization level;
- maybe your input data are weird.

FFT grid is machine-dependent. Yes, they are. The code automatically chooses the smallest grid that is compatible with the specified cutoff in the specified cell, and is an allowed value for the FFT library used. Most FFT libraries are implemented, or perform well, only with dimensions that factor into products of small numbers (2, 3, 5 typically, sometimes 7 and 11). Different FFT libraries follow different rules, and thus different dimensions can result for the same system on different machines (or even on the same machine, with a different FFT). See function allowed in Modules/fft_scalar.f90.

As a consequence, the energy may be slightly different on different machines. The only piece that explicitly depends on the grid parameters is the XC part of the energy, that is computed numerically on the grid. The differences should be small, though, especially for LDA calculations.

Manually setting the FFT grids to a desired value is possible, but slightly tricky, using input variables nr1, nr2, nr3 and
next item;
- if you are using highly optimized mathematical libraries, verify that they are designed for your hardware;
- if you are using aggressive optimization in compilation, verify that you are using the appropriate options for your machine;
- the executable was not properly compiled, or was compiled in a different and incompatible environment;
- buggy compiler or libraries: this is the default explanation if you have problems with the provided tests and examples.

pw.x works for simple systems, but not for large systems, or whenever more RAM is needed. Possible solutions:

- increase the amount of RAM you are authorized to use (which may be much smaller than the available RAM). Ask your system administrator if you don't know what to do. In some cases the stack size can be a source of problems: if so, increase it with command limit or ulimit (a shell sketch is given after this list);
- reduce nbnd to the strict minimum (for insulators, the default is already the minimum, though);
- reduce the work space for Davidson diagonalization to the minimum by setting diago_david_ndim; also consider using conjugate-gradient diagonalization (diagonalization='cg'), slow but very robust, which requires almost no work space;
- if the charge density takes a significant amount of RAM, reduce mixing_ndim from its default value (8) to 4 or so;
- in parallel execution, use more processors, or use the same number of processors with fewer pools. Remember that parallelization with re
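For instance, on typical Linux systems the stack limit mentioned in the first item above can be raised from the shell before launching pw.x. This is only a sketch; the exact command depends on your shell and on what your system administrator allows:

   ulimit -s unlimited         # bash / sh syntax
   limit stacksize unlimited   # csh / tcsh syntax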
supercell is also available. Please note that NEB calculations are no longer performed by pw.x, but are instead carried out by neb.x (see main user guide), a dedicated code for path optimization which can use PWscf as computational engine.

1.2 People

The PWscf package (which in earlier releases included PHonon and PostProc) was originally developed by Stefano Baroni, Stefano de Gironcoli, Andrea Dal Corso (SISSA), Paolo Giannozzi (Univ. Udine) and many others. We quote in particular:

- David Vanderbilt's group at Rutgers, for Berry's phase calculations;
- Paolo Umari (Univ. Padua), for finite electric fields;
- Ralph Gebauer (ICTP, Trieste) and Adriano Mosca Conte (SISSA, Trieste), for noncollinear magnetism;
- Andrea Dal Corso, for spin-orbit interactions;
- Carlo Sbraccia (Princeton), for improvements to structural optimization and to many other parts;
- Dario Alfè (University College London), for the implementation of Born-Oppenheimer molecular dynamics;
- Renata Wentzcovitch and collaborators (Univ. Minnesota), for variable-cell molecular dynamics;
- Lorenzo Paulatto (Univ. Paris VI), for the PAW implementation, built upon previous work by Guido Fratesi (Univ. Milano Bicocca) and Riccardo Mazzarello (ETHZ-USI Lugano);
- Matteo Cococcioni (Univ. Minnesota), for the DFT+U implementation;
- Timo Thonhauser (WFU), for vdW-DF, svdW-DF and variants;

and the more recent contributors:

- Jong-Won Song (RIKEN), for the Gau-PBE functional;
- Alberto Otero de la Roz
      blip_binary = .true.
      blip_single_prec = .false.
      blip_multiplicity = 1.d0
      n_points_for_test = 0
   /

Some or all of the 5 keywords may be provided, in any order. The default values are as given above, and these are used if the pw2casino.dat file is not present.

The meanings of the keywords are as follows.

blip_convert: re-expand the converged plane-wave orbitals in localized blip functions prior to writing the CASINO wave function file. This is almost always done, since wave functions expanded in blips are considerably more efficient in quantum Monte Carlo calculations. If blip_convert=.false. a pwfn.data file is produced (orbitals expanded in plane waves); if blip_convert=.true., either a bwfn.data file or a bwfn.data.b1 file is produced, depending on the value of blip_binary (see below).

blip_binary: if true, and if blip_convert is also true, write the blip wave function as an unformatted binary bwfn.data.b1 file. This is much smaller than the formatted bwfn.data file, but is not generally portable across all machines.

blip_single_prec: if .false., the orbital coefficients in bwfn.data(.b1) are written out in double precision; if the user runs into hardware limits, blip_single_prec can be set to .true., in which case the coefficients are written in single precision, reducing the memory and disk requirements at the cost of a small amount of accuracy.

blip_multiplicity: the quality of the blip expansion (i.e. the fineness of the blip
a (Merced Univ.), for XDM (exchange-hole dipole moment) model of dispersions, PW86 (unrevised) and B86B functionals;
- Hannu-Pekka Komsa (CSEA/Lausanne), for the HSE functional;
- Gabriele Sclauzero (IRRMA Lausanne), for DFT+U with on-site occupations obtained from pseudopotential projectors;
- Alexander Smogunov (CEA), for DFT+U with noncollinear magnetization;
- Burak Himmetoglu (UCSB), for DFT+U+J;
- Xiaochuan Ge (SISSA), for Smart MonteCarlo Langevin dynamics;
- Andrei Malashevich (Univ. Berkeley), for the calculation of orbital magnetization;
- Minoru Otani (AIST), Yoshio Miura (Tohoku U.), Nicephore Bonet (MIT), Nicola Marzari (Univ. Oxford), Brandon Wood (LLNL), Tadashi Ogitsu (LLNL), for ESM (Effective Screening Method, PRB 73, 115407 (2006));
- Dario Alfè, Mike Towler (University College London), Norbert Nemec (U. Cambridge), for the interface with CASINO.

This guide was mostly written by Paolo Giannozzi. Mike Towler wrote the PWscf-to-CASINO subsection.

1.3 Terms of use

QUANTUM ESPRESSO is free software, released under the GNU General Public License. See http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt, or the file License in the distribution.

We shall greatly appreciate if scientific work done using this code will contain an explicit acknowledgment and the following reference:

P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. Fabris, G. Fratesi, S. de
bands is used by routines cegterg (k-points) or regterg (Gamma-point only), performing iterative diagonalization of the Kohn-Sham Hamiltonian in the PW basis set;
- most of the time spent in *egterg is used by routine h_psi, calculating the Hψ products; cdiaghg (k-points) or rdiaghg (Gamma-only), performing subspace diagonalization, should take only a small fraction;
- among the general routines, most of the time is spent in FFTs on Kohn-Sham states (fftw), and to a smaller extent in other FFTs (fft and ffts) and in calbec, calculating the ⟨β|ψ⟩ products;
- forces and stresses typically take a fraction of the order of 10 to 20% of the total time.

For PAW and Ultrasoft PPs, you will see a larger contribution by sum_band and a nonnegligible newd contribution to the time spent in electrons, but the overall picture is unchanged. You may drastically reduce the overhead of Ultrasoft PPs by using input option tqr=.true.

4.5.2 Parallel execution

The various parallelization levels should be used wisely in order to achieve good results. Let us summarize the effects of them on CPU:

- parallelization on FFT speeds up (with varying efficiency) almost all routines, with the notable exception of cdiaghg and rdiaghg;
- parallelization on k-points speeds up (almost linearly) c_bands and called routines; speeds up partially sum_band; does not spe
ble to cell degrees of freedom in non-damped Variable-Cell MD. Test calculations are advisable before extensive calculation. "I have tested the damping algorithm that I have developed and it has worked well so far. It allows for a much longer time step (dt = 100-150) than the RMW one and is much more stable with very small cell masses, which is useful when the cell shape, not the internal degrees of freedom, is far out of equilibrium. It also converges in a smaller number of steps than RMW." (Info from Cesar Da Silva; the new damping algorithm is the default since v.3.1.)

3.5 Direct interface with CASINO

PWscf now supports the Cambridge quantum Monte Carlo program CASINO directly. For more information on the CASINO code see http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html. CASINO may take the output of PWSCF and "improve it", giving considerably more accurate total energies and other quantities than DFT is capable of.

PWscf users wishing to learn how to use CASINO may like to attend one of the annual CASINO summer schools in Mike Towler's "Apuan Alps Centre for Physics" in Tuscany, Italy. More information can be found at http://www.vallico.net/tti/tti.html

Practicalities. The interface between PWscf and CASINO is provided through a file with a standard format containing geometry, basis set, and orbital coefficients, which PWscf will produce on demand. For SCF calculations, the name of this file may be pwfn.data, bwfn.data or bwfn.data.
communication hardware (at least Gigabit ethernet) in order to have acceptable performances with PW parallelization. Do not expect good scaling with cheap hardware: PW calculations are by no means an "embarrassingly parallel" problem.

Also note that multiprocessor motherboards for Intel Pentium CPUs typically have just one memory bus for all processors. This dramatically slows down any code doing massive access to memory (as most codes in the QUANTUM ESPRESSO distribution do) that runs on processors of the same motherboard.

4.5 Understanding the time report

The time report printed at the end of a pw.x run contains a lot of useful information that can be used to understand bottlenecks and improve performances.

4.5.1 Serial execution

The following applies to calculations taking a sizable amount of time (at least minutes): for short calculations (seconds), the time spent in the various initializations dominates. Any discrepancy with the following picture signals some anomaly.

- For a typical job with norm-conserving PPs, the total (wall) time is mostly spent in routine electrons, calculating the self-consistent solution.
- Most of the time spent in electrons is used by routine c_bands, calculating the Kohn-Sham states; sum_band (calculating the charge density), v_of_rho (calculating the potential) and mix_rho (charge-density mixing) should take a small fraction of the time.
- Most of the time spent in c_
e smearing type (smearing) and the smearing width (degauss). Spin-polarized systems are as a rule treated as metallic systems, unless the total magnetization, tot_magnetization, is set to a fixed value, or if occupation numbers are fixed (occupations='from_input' and card OCCUPATIONS).

Explanations for the meaning of variables ibrav and celldm, as well as on alternative ways to input structural data, are in files PW/Doc/INPUT_PW.txt and PW/Doc/INPUT_PW.html. These files are the reference for input data and describe a large number of other variables as well. Almost all variables have default values, which may or may not fit your needs.

Comment lines in namelists can be introduced by a "!", exactly as in fortran code.

After the namelists, you have several fields ("cards") introduced by keywords with self-explanatory names:

   ATOMIC_SPECIES
   ATOMIC_POSITIONS
   K_POINTS
   CELL_PARAMETERS (optional)
   OCCUPATIONS (optional)

The keywords may be followed on the same line by an option. Unknown fields are ignored. See the files mentioned above for details on the available cards. Comment lines in cards can be introduced by either a "!" or a "#" character in the first position of a line.

Note about k-points: the k-point grid can be either automatically generated or manually provided as a list of k-points and a weight, in the Irreducible Brillouin Zone only of the Bravais lattice of the crystal (examples of both forms are sketched below). The code will generate (unless instructed no
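As an illustration of the two ways of specifying k-points mentioned above, here are two sketches of the K_POINTS card; the grids and weights are placeholders, not recommendations for any real system:

   K_POINTS automatic
     4 4 4 1 1 1

   K_POINTS crystal
     2
     0.000000 0.000000 0.000000  1.0
     0.500000 0.500000 0.500000  1.0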
ed. Some examples.

With PBS:

- send the default signal (SIGTERM) 120 seconds before the end:
     #PBS -l signal=@120
- send signal SIGUSR1 10 minutes before the end:
     #PBS -l signal=SIGUSR1@600
- you can also send a signal manually with qsig;
- or send a signal and then stop:
     qdel -W 120 jobid
  will send SIGTERM, wait 2 minutes, then force stop.

With LoadLeveler (untested): the SIGXCPU signal will be sent when wall_softlimit is reached; it will then stop the job when hardlimit is reached. You can specify both limits as wall_clock_limit=hardlimit,softlimit; e.g. you can give pw.x thirty minutes to stop using wall_clock_limit=5:00,4:30.

5 Troubleshooting

pw.x says "error while loading shared libraries" or "cannot open shared object file" and does not start. Possible reasons:

- If you are running on the same machines on which the code was compiled, this is a library configuration problem. The solution is machine-dependent. On Linux, find the path to the missing libraries; then either add it to file /etc/ld.so.conf and run ldconfig (must be done as root), or add it to variable LD_LIBRARY_PATH and export it. Another possibility is to load non-shared versions of libraries (ending with .a) instead of shared ones (ending with .so).
- If you are not running on the same machines on which the code was compiled, you need either to have the same shared libraries installed on both machines, or to load statically all libraries, using ap
ed up at all v_of_rho, newd, mix_rho;
- "linear-algebra" parallelization speeds up (not always) cdiaghg and rdiaghg;
- "task group" parallelization speeds up fftw;
- OpenMP parallelization speeds up fftw, plus selected parts of the calculation, plus (depending on the availability of OpenMP-aware libraries) some linear algebra operations;

and on RAM:

- parallelization on FFT distributes most arrays across processors (i.e. all G-space and R-space arrays), but not all of them (in particular, not the subspace Hamiltonian and overlap matrices);
- linear-algebra parallelization also distributes the subspace Hamiltonian and overlap matrices;
- all other parallelization levels do not distribute any memory.

In an ideally parallelized run, you should observe the following:

- CPU and wall time do not differ by much;
- time usage is still dominated by the same routines as for the serial run;
- routine fft_scatter (called by parallel FFT) takes a sizable part of the time spent in FFTs but does not dominate it.

Quick estimate of parallelization parameters. You need to know:

- the number of k-points, Nk;
- the third dimension of the (smooth) FFT grid, N3;
- the number of Kohn-Sham states, M.

These data allow to set bounds on parallelization:

- k-point parallelization is limited to Nk processor pools: -nk Nk;
- FFT parallelization shouldn't exceed N3 processors, i.e. if you run with -nk Nk, use N = Nk x N3 MPI processes at
erative diagonalization, T_diag, is

   T_diag = N_h T_h + T_orth + T_sub

where N_h = number of Hψ products needed by iterative diagonalization, T_h = time per Hψ product, T_orth = CPU time for orthonormalization, T_sub = CPU time for subspace diagonalization.

The time T_h required for a Hψ product is

   T_h = a1 M N + a2 M N1 N2 N3 log(N1 N2 N3) + a3 M P N

The first term comes from the kinetic term and is usually much smaller than the others. The second and third terms come respectively from the local and nonlocal potential. a1, a2, a3 are prefactors (i.e. small numbers of order 1); M = number of valence bands (nbnd); N = number of PWs (basis set dimension: npw); N1, N2, N3 = dimensions of the FFT grid for wavefunctions (nr1s, nr2s, nr3s; N1 N2 N3 ~ 8N); P = number of pseudopotential projectors, summed on all atoms, on all values of the angular momentum l, and m = 1, ..., 2l+1.

The time T_orth required by orthonormalization is

   T_orth = b1 N Mx^2

and the time T_sub required by subspace diagonalization is

   T_sub = b2 Mx^3

where b1 and b2 are prefactors, Mx = number of trial wavefunctions (this will vary between M and 2-4 M, depending on the algorithm).

The time T_rho for the calculation of the charge density from wavefunctions is

   T_rho = c1 M Nr1 Nr2 Nr3 log(Nr1 Nr2 Nr3) + c2 M Nr1 Nr2 Nr3 + T_us

where c1, c2 are prefactors; Nr1, Nr2, Nr3 = dimensions of the FFT grid for the charge density (nr1, nr2, nr3; Nr1 Nr2 Nr3 ~ 8Ng, where Ng = number of G-vectors for the charge density, ngm); and T_u
files coming from Windows or produced with "smart" editors. Both may cause the code to crash with rather mysterious error messages. If none of the above applies, and the code stops at the first namelist (&CONTROL), and you are running in parallel, see the previous item.

pw.x mumbles something like "cannot recover" or "error reading recover file". You are trying to restart from a previous job that either produced corrupted files or did not do what you think it did. No luck: you have to restart from scratch.

pw.x stops with "inconsistent DFT" error. As a rule, the flavor of DFT used in the calculation should be the same as the one used in the generation of the pseudopotentials, which should all be generated using the same flavor of DFT. This is actually enforced: the type of DFT is read from the pseudopotential files, and it is checked that the same DFT is read from all PPs. If this does not hold, the code stops with the above error message. Use, at your own risk, input variable input_dft to force the usage of the DFT you like (see the sketch below).

pw.x stops with error in cdiaghg or rdiaghg. Possible reasons for such behavior are not always clear, but they typically fall into one of the following cases:

- serious error in data, such as bad atomic positions or bad crystal structure/supercell;
- a bad pseudopotential, typically with a ghost, or a USPP giving non-positive charge density, leading to a violation of positiveness of the S matrix appearing in the USPP formalism;
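Referring to the input_dft override mentioned a few paragraphs above: the variable belongs to the &SYSTEM namelist. A minimal sketch follows; the functional name is just an example and the rest of the namelist is left out:

   &SYSTEM
      ...
      ! force this XC functional, overriding the one recorded in the pseudopotential files
      ! (use at your own risk, as warned above)
      input_dft = 'PBE'
   /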
for insulators with a gap. In all other cases, use 'smearing' ('tetrahedra' for DOS calculations). See the input reference documentation for more details.

pw.x stops with "internal error: cannot bracket Ef". Possible reasons:

- serious error in data, such as bad number of electrons, insufficient number of bands, absurd value of broadening;
- the Fermi energy is found by bisection, assuming that the integrated DOS N(E) is an increasing function of the energy. This is not guaranteed for Methfessel-Paxton smearing of order 1, and can give problems when very few k-points are used. Use some other smearing function: simple Gaussian broadening or, better, Marzari-Vanderbilt "cold" smearing (see the example below).

pw.x yields "internal error: cannot bracket Ef" message but does not stop. This may happen under special circumstances when you are calculating the band structure for selected high-symmetry lines. The message signals that occupations and Fermi energy are not correct (but eigenvalues and eigenvectors are). Remove occupations='tetrahedra' in the input data to get rid of the message.

pw.x runs but nothing happens. Possible reasons:

- in parallel execution, the code died on just one processor; unpredictable behavior may follow;
- in serial execution, the code encountered a floating-point error and goes on producing NaNs (Not a Number) forever, unless exception handling is on (and usually it isn't).

In both cases, look for one of the reasons given above.
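For instance, switching to cold smearing only requires changing the relevant &SYSTEM variables. This is a sketch: the broadening value is illustrative and system-dependent, and the rest of the namelist is omitted:

   &SYSTEM
      ...
      occupations = 'smearing'
      smearing    = 'marzari-vanderbilt'   ! cold smearing; 'mv' and 'cold' are synonyms
      degauss     = 0.02                   ! smearing width in Ry
   /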
formats. This can be done through the casino2upf and upf2casino tools, included in the upftools directory (see the upftools/README file for instructions). An alternative converter, "casinogon", is included in the CASINO distribution; it produces the deprecated GON format, but can be useful when using non-standard grids.

4 Performances

4.1 Execution time

The following is a rough estimate of the complexity of a plain scf calculation with pw.x, for NCPPs. USPPs and PAW give rise to additional terms to be calculated, which may add from a few percent up to 30-40% to the execution time. For phonon calculations, each of the 3N modes requires a time of the same order of magnitude as the self-consistent calculation in the same system (possibly times a small multiple). For cp.x, each time step takes something in the order of T_h + T_orth + T_sub, defined below.

The time required for the self-consistent solution at fixed ionic positions, T_self, is

   T_self = N_iter T_iter + T_init

where N_iter = number of self-consistency iterations (niter), T_iter = time for a single iteration, T_init = initialization time (usually much smaller than the first term).

The time required for a single self-consistency iteration, T_iter, is

   T_iter = N_k T_diag + T_rho + T_V

where N_k = number of k-points, T_diag = time per Hamiltonian iterative diagonalization, T_rho = time for the charge density calculation, T_V = time for the Hartree and XC potential calculation.

The time for a Hamiltonian it
grid) can be improved by increasing the grid multiplicity parameter given by this keyword. Increasing the grid multiplicity results in a greater number of blip coefficients and therefore larger memory requirements and file size, but the CPU time should be unchanged. For very accurate work, one may want to experiment with grid multiplicity larger than 1.0. Note, however, that it might be more efficient to keep the grid multiplicity to 1.0 and increase the plane-wave cutoff instead.

n_points_for_test: if this is set to a positive integer greater than zero, PWscf will sample the wave function, the Laplacian and the gradient at a large number of random points in the simulation cell and compute the overlap of the blip orbitals with the original plane-wave orbitals:

   alpha = <BW|PW> / sqrt( <BW|BW> <PW|PW> )

The closer alpha is to 1, the better the blip representation. By increasing blip_multiplicity, or by increasing the plane-wave cutoff, one ought to be able to make alpha as close to 1 as desired. The number of random points used is given by n_points_for_test.

Finally, note that DFT trial wave functions produced by PWSCF must be generated using the same pseudopotential as in the subsequent QMC calculation. This requires the use of tools to switch between the different file formats used by the two codes. CASINO uses the "CASINO tabulated format"; PWSCF officially supports the UPFv2 format (though it will read other deprecated
he tests and examples distributed with QUANTUM ESPRESSO as templates for writing your own input files. In the following, whenever we mention Example N, we refer to those. Input files are those in the results/ subdirectories, with names ending with .in (they will appear after you have run the examples).

3.1 Input data

Input data is organized as several namelists, followed by other fields ("cards") introduced by keywords. The namelists are:

   &CONTROL:   general variables controlling the run
   &SYSTEM:    structural information on the system under investigation
   &ELECTRONS: electronic variables: self-consistency, smearing
   &IONS (optional):  ionic variables: relaxation, dynamics
   &CELL (optional):  variable-cell optimization or dynamics

Optional namelists may be omitted if the calculation to be performed does not require them. This depends on the value of variable calculation in namelist &CONTROL. Most variables in namelists have default values. Only the following variables in &SYSTEM must always be specified:

   ibrav   (integer):            Bravais-lattice index
   celldm  (real, dimension 6):  crystallographic constants
   nat     (integer):            number of atoms in the unit cell
   ntyp    (integer):            number of types of atoms in the unit cell
   ecutwfc (real):               kinetic energy cutoff (Ry) for wavefunctions

For metallic systems, you have to specify how metallicity is treated in variable occupations. If you choose occupations='smearing', you have to specify th
iles) for pw.x and produce an XSF-formatted file suitable for plotting with XCrySDen (http://www.xcrysden.org), a powerful crystalline and molecular structure visualization program. BEWARE: the pwi2xsf.sh shell script requires the pwi2xsf.x executable to be located somewhere in your PATH;
- PW/tools/band_plot.x: undocumented and possibly obsolete;
- PW/tools/bs.awk, PW/tools/mv.awk are scripts that process the output of pw.x (not data files). Usage:

   awk -f bs.awk < my-pw-file > myfile.bs
   awk -f mv.awk < my-pw-file > myfile.mv

  The files so produced are suitable for use with xbs, a very simple X-windows utility to display molecules, available at http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml;
- PW/tools/kvecs_FS.x, PW/tools/bands_FS.x: utilities for Fermi Surface plotting using XCrySDen, contributed by the late Prof. Eyvaz Isaev;
- PW/tools/cif2qe.sh: script converting from CIF (Crystallographic Information File) to a format suitable for QUANTUM ESPRESSO. Courtesy of Carlo Nervi (Univ. Torino, Italy).

Documentation for the auxiliary codes can be found in the codes themselves, e.g. in the header of the files.

3 Using PWscf

Input files for pw.x may be either written by hand or produced via the PWgui graphical interface by Anton Kokalj, included in the QUANTUM ESPRESSO distribution. See PWgui-x.y.z/INSTALL (where x.y.z is the version number) for more info on PWgui, or GUI/README if you are using SVN sources.

You may take t
ime-reversal symmetry holds and it does not calculate the magnetization. The states are still two-component spinors, but the total magnetization is zero.

If starting_magnetization is different from zero, it makes a noncollinear spin-polarized calculation with spin-orbit interaction. The final spin magnetization might be zero or different from zero, depending on the system. Note that the code will look only for symmetries that leave the starting magnetization unchanged.

Furthermore, to make a spin-orbit calculation you must use fully relativistic pseudopotentials, at least for the atoms in which you think that spin-orbit interaction is large. If all the pseudopotentials are scalar-relativistic, the calculation becomes equivalent to a noncollinear calculation without spin-orbit. (Andrea Dal Corso, 2007-07-27)

See Example 06 for noncollinear magnetism, Example 07 and references quoted therein for spin-orbit interactions.

DFT+U. DFT+U (formerly known as LDA+U) calculations can be performed within a simplified rotationally invariant form of the U Hubbard correction. Note that for all atoms having a U value there should be an item in function flib/set_hubbard_l.f90 and one in subroutine PW/src/tabd.f90, defining respectively the angular momentum and the occupancy of the orbitals with the Hubbard correction. If your Hubbard-corrected atoms are not there, you need to edit these files and to recompile. See Example 08 and its README.

Dispersion interaction
ined symbol: init_metadyn", "ERROR: Undefined symbol: meta_force_calculation", eliminate the "_" from the definition of init_metadyn and meta_force_calculation, i.e. change, at line 529,

   void meta_force_calculation_(real *cell, int *istep, real *xxx, real *yyy, real *zzz,

with

   void meta_force_calculation(real *cell, int *istep, real *xxx, real *yyy, real *zzz,

and, at line 961,

   void init_metadyn_(int *atoms, real *ddt, real *mass,

with

   void init_metadyn(int *atoms, real *ddt, real *mass,
ions (both optimization and dynamics) are performed with plane waves and G-vectors calculated for the starting cell. This means that if you re-run a self-consistent calculation for the final cell and atomic positions using the same cutoff ecutwfc (and/or ecutrho, if applicable), you may not find exactly the same results, unless your final and initial cells are very similar, or unless your cutoff(s) are very high. In order to provide a further check, a last step is performed in which a scf calculation is performed for the converged structure, with plane waves and G-vectors calculated for the final cell. Small differences between the two last steps are thus to be expected and give an estimate of the reliability of the variable-cell optimization. If you get a large difference, you are likely quite far from convergence in the plane-wave basis set and you need to increase the cutoff(s).

Variable-cell molecular dynamics. "A common mistake many new users make is to set the time step dt improperly to the same order of magnitude as for the CP algorithm, or not setting dt at all. This will produce a 'not evolving dynamics'. Good values for the original RMW (RM Wentzcovitch) dynamics are dt = 50-70. The choice of the cell mass is a delicate matter. An off-optimal mass will make convergence slower. Too small masses, as well as too long time steps, can make the algorithm unstable. A good cell mass will make the oscillation times for internal degrees of freedom compara
irection is not divisible respectively by 2 or by 3, the symmetry operation will not transform the FFT grid into itself.

Solution: you can either force your FFT grid to be commensurate with the fractional translations (set variables nr1, nr2, nr3 to suitable values), or set variable use_all_frac to .true. in namelist &SYSTEM. Note, however, that the latter is incompatible with hybrid functionals and with phonon calculations.

Self-consistency is slow or does not converge at all. Bad input data will often result in bad scf convergence. Please carefully check your structure first, e.g. using XCrySDen. Assuming that your input data are sensible:

1. Verify if your system is metallic or is close to a metallic state, especially if you have few k-points. If the highest occupied and lowest unoccupied state(s) keep exchanging place during self-consistency, forget about reaching convergence. A typical sign of such behavior is that the self-consistency error goes down, down, down, then all of a sudden up again, and so on. Usually one can solve the problem by adding a few empty bands and a small broadening.

2. Reduce mixing_beta to ~0.3-0.1 or smaller. Try the mixing_mode value that is more appropriate for your problem. For slab geometries used in surface problems, or for elongated cells, mixing_mode='local-TF' should be the better choice, dampening "charge sloshing". You may also try to increase mixing_ndim to more than 8 (default value). Beware:
job scales with the number of processors, and depends upon:

- the size and type of the system under study;
- the judicious choice of the various levels of parallelization (detailed below);
- the availability of fast interprocess communications (or lack of it).

Ideally one would like to have linear scaling, i.e. T ~ T0/Np for Np processors, where T0 is the estimated time for serial execution. In addition, one would like to have linear scaling of the RAM per processor, O_N ~ O_0/Np, so that large-memory systems fit into the RAM of each processor.

Parallelization on k-points:

- guarantees (almost) linear scaling if the number of k-points is a multiple of the number of pools;
- requires little communications (suitable for ethernet communications);
- does not reduce the required memory per processor (unsuitable for large-memory jobs).

Parallelization on PWs:

- yields good to very good scaling, especially if the number of processors in a pool is a divisor of N3 and Nr3 (the dimensions along the z-axis of the FFT grids, nr3 and nr3s, which coincide for NCPPs);
- requires heavy communications (suitable for Gigabit ethernet up to 4, 8 CPUs at most; specialized communication hardware needed for 8 or more processors);
- yields almost linear reduction of memory per processor with the number of processors in the pool.

A note on scaling: optimal serial performances are achieved when the data are as much as possible kept into the cache. As a
most (mpirun -np N);
- unless M is a few hundreds or more, don't bother using linear-algebra parallelization.

You will need to experiment a bit to find the best compromise. In order to have good load balancing among MPI processes, the number of k-point pools should be an integer divisor of Nk; the number of processors for FFT parallelization should be an integer divisor of N3.

Typical symptoms of bad/inadequate parallelization:

- a large fraction of time is spent in v_of_rho, newd, mix_rho, or the time doesn't scale well or doesn't scale at all by increasing the number of processors for k-point parallelization. Solution: use also FFT parallelization (if possible);
- a disproportionate time is spent in cdiaghg/rdiaghg. Solutions: use also k-point parallelization (if possible); use linear-algebra parallelization, with scalapack if possible;
- a disproportionate time is spent in fft_scatter, or in fft_scatter the difference between CPU and wall time is large. Solutions: if you do not have fast (better than Gigabit ethernet) communication hardware, do not try FFT parallelization on more than 4 or 8 procs; use also k-point parallelization (if possible);
- the time doesn't scale well or doesn't scale at all by increasing the number of processors for FFT parallelization. Solutions: use "task groups": try command-line option -ntg 4 or -ntg 8 (an example command line is given below). This may improve your
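For example, a run combining these parallelization levels might be launched as follows. This is only a sketch: the process counts and file names are placeholders, to be tuned to your own Nk, N3 and M as discussed above:

   mpirun -np 64 pw.x -nk 4 -ntg 2 -ndiag 16 -i pw.scf.in > pw.scf.out

Here the 64 MPI processes are split into 4 k-point pools of 16 processes each, each pool uses 2 task groups for the FFTs, and a 4x4 grid of processes (-ndiag 16, which must be a square) handles the subspace linear algebra.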
nput variable lorbm=.true. in an nscf run. If a finite electric field is present (lelfield=.true.), only the Kubo terms are computed (see New J. Phys. 12, 053032 (2010) for details).

3.4 Optimization and dynamics

Structural optimization. For fixed-cell optimization, specify calculation='relax' and add namelist &IONS (a minimal sketch is given below). All options for a single SCF calculation apply, plus a few others. You may follow a structural optimization with a non-SCF band-structure calculation; since v.4.1, you do not need any longer to update the atomic positions in the input file for the non-scf calculation. See Example 02.

Molecular Dynamics. Specify calculation='md', the time step dt, and possibly the number of MD steps nstep. Use variable ion_dynamics in namelist &IONS for fine-grained control of the kind of dynamics. Other options for setting the initial temperature and for thermalization using velocity rescaling are available. Remember: this is MD on the electronic ground state, not Car-Parrinello MD. See Example 03.

Free-energy surface calculations. Once PWscf is patched with the PLUMED plug-in, it is possible to use most PLUMED functionalities by running PWscf as: pw.x -plumed, plus the other usual PWscf arguments. The input file for PLUMED must be found in the specified outdir with fixed name plumed.dat.

Variable-cell optimization. Since v.4.2 the newer BFGS algorithm covers the case of variable-cell optimization as well. Note, however, that variable-cell calculat
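As a minimal sketch of the namelist changes for the structural optimization described above, relative to a plain SCF input (only the calculation-related lines are shown; everything else stays as in a single SCF run, and the MD values are illustrative):

   &CONTROL
      calculation = 'relax'   ! or 'md', in which case also set e.g. dt = 20, nstep = 100
   /
   &IONS
      ! ion_dynamics can usually be left at its default
      ! ('bfgs' for calculation='relax', 'verlet' for calculation='md')
   /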
nr1s, nr2s, nr3s. The code will still increase them if not acceptable. Automatic FFT grid dimensions are slightly overestimated, so one may try, very carefully, to reduce them a little bit. The code will stop if too small values are required; it will waste CPU time and memory for too large values.

Note that in parallel execution it is very convenient to have FFT grid dimensions along z that are a multiple of the number of processors.

pw.x does not find all the symmetries you expected. pw.x determines first the symmetry operations (rotations) of the Bravais lattice; then it checks which of these are symmetry operations of the system (including, if needed, fractional translations). This is done by rotating (and translating, if needed) the atoms in the unit cell and verifying if the rotated unit cell coincides with the original one.

Assuming that your coordinates are correct (please carefully check!), you may not find all the symmetries you expect because:

- the number of significant figures in the atomic positions is not large enough. In file PW/eqvect.f90, the variable accep is used to decide whether a rotation is a symmetry operation. Its current value (10^-5) is quite strict: a rotated atom must coincide with another atom to 5 significant digits. You may change the value of accep and recompile;
- they are not acceptable symmetry operations of the Bravais lattice. This is the case for C60, for instance: the I (icosahedral) group of C60 contains 5-fold r
nslates into a slight but detectable loss of translational invariance (the energy changes if all atoms are displaced by the same quantity, not commensurate with the FFT grid). This sets a limit to the accuracy of forces. The situation improves somewhat by increasing the ecutrho cutoff.

pw.x stops during variable-cell optimization in checkallsym with "non-orthogonal operation" error. Variable-cell optimization may occasionally break the starting symmetry of the cell. When this happens, the run is stopped because the number of k-points calculated for the starting configuration may no longer be suitable. Possible solutions:

- start with a nonsymmetric cell;
- use a symmetry-conserving algorithm: the Wentzcovitch algorithm (cell_dynamics='damp-w') should not break the symmetry.

5.1 Compilation problems with PLUMED

xlc compiler. If you get an error message like:

   Operation between types "char" and "int" is not allowed

change, in file clib/metadyn.h,

   #define snew(ptr,nelem) (ptr)= ((nelem)==0 ? NULL : (typeof(ptr)) calloc(nelem, sizeof(*(ptr))))
   #define srenew(ptr,nelem) (ptr)= (typeof(ptr)) realloc(ptr, (nelem)*sizeof(*(ptr)))

with

   #define snew(ptr,nelem) (ptr)= ((nelem)==0 ? NULL : (void*) calloc(nelem, sizeof(*(ptr))))
   #define srenew(ptr,nelem) (ptr)= (void*) realloc(ptr, (nelem)*sizeof(*(ptr)))

Calling C from fortran. PLUMED assumes that fortran compilers add a single "_" at the end of C routines. You may get an error message such as "ERROR: Undef
ors (including maybe the phase of the moon), reported execution times may vary quite a lot for the same job.

Warning: N eigenvectors not converged. This is a warning message that can be safely ignored if it is not present in the last steps of self-consistency. If it is still present in the last steps of self-consistency, and if the number of unconverged eigenvectors is a significant part of the total, it may signal serious trouble in self-consistency (see next point) or something badly wrong in the input data.

Warning: negative or imaginary charge, or core charge, or npt with rhoup < 0, or rho dw < 0. These are warning messages that can be safely ignored unless the negative or imaginary charge is sizable, let us say of the order of 0.1. If it is, something seriously wrong is going on. Otherwise, the origin of the negative charge is the following. When one transforms a positive function in real space to Fourier space and truncates at some finite cutoff, the positive function is no longer guaranteed to be positive when transformed back to real space. This happens only with core corrections and with USPPs. In some cases it may be a source of trouble (see next point), but it is usually solved by increasing the cutoff for the charge density.

Structural optimization is slow or does not converge or ends with a mysterious bfgs error. Typical structural optimizations, based on the BFGS algorithm, converge to the default thresholds (et
ot_conv_thr and forc_conv_thr) in 15-25 BFGS steps (depending on the starting configuration). This may not happen when your system is characterized by "floppy" low-energy modes, that make it very difficult (and of little use anyway) to reach a well-converged structure, no matter what. Other possible reasons for a problematic convergence are listed below.

Close to convergence, the self-consistency error in forces may become large with respect to the value of the forces. The resulting mismatch between forces and energies may confuse the line minimization algorithm, which assumes consistency between the two. The code reduces the starting self-consistency threshold conv_thr when approaching the minimum energy configuration, up to a factor defined by upscale. Reducing conv_thr (or increasing upscale) yields a smoother structural optimization, but if conv_thr becomes too small, electronic self-consistency may not converge. You may also increase variables etot_conv_thr and forc_conv_thr, which determine the threshold for convergence (the default values are quite strict).

A limitation to the accuracy of forces comes from the absence of perfect translational invariance. If we had only the Hartree potential, our PW calculation would be translationally invariant to machine precision. The presence of an XC potential introduces Fourier components in the potential that are not in our basis set. This loss of precision (more serious for gradient-corrected functionals) tra
otations that are incompatible with translation symmetry;
- the system is rotated with respect to the symmetry axes. For instance: a C60 molecule in the fcc lattice will have 24 symmetry operations (T_h group) only if the double bond is aligned along one of the crystal axes; if C60 is rotated in some arbitrary way, pw.x may not find any symmetry, apart from inversion;
- they contain a fractional translation that is incompatible with the FFT grid (see next paragraph). Note that if you change the cutoff or the unit cell volume, the automatically computed FFT grid changes, and this may explain changes in symmetry (and in the number of k-points as a consequence) for no apparent good reason (only if you have fractional translations in the system, though);
- a fractional translation, without rotation, is a symmetry operation of the system. This means that the cell is actually a supercell. In this case, all symmetry operations containing fractional translations are disabled. The reason is that in this rather exotic case there is no simple way to select those symmetry operations forming a true group, in the mathematical sense of the term.

Warning: symmetry operation # N not allowed. This is not an error. If a symmetry operation contains a fractional translation that is incompatible with the FFT grid, it is discarded in order to prevent problems with symmetrization. Typical fractional translations are 1/2 or 1/3 of a lattice vector. If the FFT grid dimension along that d
propriate configure or loader options. The same applies to Beowulf-style parallel machines: the needed shared libraries must be present on all PCs.

errors in examples with parallel execution. If you get error messages in the example scripts (i.e. not errors in the codes) on a parallel machine, such as e.g. "run example: -n: command not found", you may have forgotten the quotes in the definitions of PARA_PREFIX and PARA_POSTFIX.

pw.x prints the first few lines and then nothing happens (parallel execution). If the code looks like it is not reading from input, maybe it isn't: the MPI libraries need to be properly configured to accept input redirection. Use the pw.x -i option with the input file name, or inquire with your local computer wizard (if any). Since v.4.2, this is for sure the reason if the code stops at "Waiting for input...".

pw.x stops with "error while reading data". There is an error in the input data, typically a misspelled namelist variable, or an empty input file. Unfortunately, with most compilers, the code just reports "Error while reading XXX namelist" and no further useful information. Here are some more subtle sources of trouble:

- out-of-bound indices in dimensioned variables read in the namelists;
- input data files containing ^M (Control-M) characters at the end of lines, or non-ASCII characters (e.g. non-ASCII quotation marks, that at a first glance may look the same as the ASCII character). Typically this happens with
s = time required by the PAW/USPP contributions (if any). Note that for NCPPs the FFT grids for charge and wavefunctions are the same.

The time T_V for the calculation of the potential from the charge density is

   T_V = d1 Nr1 Nr2 Nr3 + d2 Nr1 Nr2 Nr3 log(Nr1 Nr2 Nr3)

where d1, d2 are prefactors.

The above estimates are for serial execution. In parallel execution, each contribution may scale in a different manner with the number of processors (see below).

4.2 Memory requirements

A typical self-consistency or molecular-dynamics run requires a maximum memory in the order of O double precision complex numbers, where

   O = m M N + P N + p N1 N2 N3 + q Nr1 Nr2 Nr3

with m, p, q = small factors; all other variables have the same meaning as above. Note that if the Gamma point only (k = 0) is used to sample the Brillouin Zone, the value of N will be cut into half.

The memory required by the phonon code follows the same patterns, with somewhat larger factors m, p, q.

4.3 File space requirements

A typical pw.x run will require an amount of temporary disk space in the order of O double precision complex numbers:

   O = N_k M N + q Nr1 Nr2 Nr3

where q = 2 x mixing_ndim (number of iterations used in self-consistency, default value = 8) if disk_io is set to 'high'; q = 0 otherwise.

4.4 Parallelization issues

pw.x can run in principle on any number of processors. The effectiveness of parallelization is ultimately judged by the "scaling", i.e. how the time needed to perform a
s (DFT-D). For DFT-D (DFT + semiempirical dispersion interactions), see the description of the london input variables, the sample files PW/tests/vdw*, and the comments in source file Modules/mm_dispersion.f90.

Hartree-Fock and hybrid functionals. Since v.5.0, calculations in the Hartree-Fock approximation, or using hybrid XC functionals that include some Hartree-Fock exchange, no longer require a special preprocessing before compilation. See EXX_example and its README file.

Dispersion interaction with non-local functional (vdW-DF). See example vdwDF_example and the references quoted in the file README therein.

Polarization via Berry Phase. See Example 04, its file README, and the documentation in the header of PW/src/bp_c_phase.f90.

Finite electric fields. There are two different implementations of macroscopic electric fields in pw.x: via an external sawtooth potential (input variable tefield=.true.) and via the modern theory of polarizability (lelfield=.true.). The former is useful for surfaces, especially in conjunction with dipolar corrections (dipfield=.true.): see examples/dipole_example for an example of application. Electric fields via the modern theory of polarization are documented in Example 10. The exact meaning of the related variables, for both cases, is explained in the general input documentation.

Orbital magnetization. Modern theory of orbital magnetization (Phys. Rev. Lett. 95, 137205 (2005)) for insulators. The calculation is performed by setting i
s to a batch queue, do not use the same outdir and the same prefix, unless you are sure that one job doesn't start before a preceding one has finished.

pw.x crashes in parallel execution with an obscure message related to MPI errors. Random crashes due to MPI errors have often been reported, typically in Linux PC clusters. We cannot rule out the possibility that bugs in QUANTUM ESPRESSO cause such behavior, but we are quite confident that the most likely explanation is a hardware problem (defective RAM, for instance) or a software bug (in MPI libraries, compiler, operating system). Debugging a parallel code may be difficult, but you should at least verify if your problem is reproducible on different architectures/software configurations/input data sets, and if there is some particular condition that activates the bug. If this doesn't seem to happen, the odds are that the problem is not in QUANTUM ESPRESSO. You may still report your problem, but consider that reports like "it crashes with an obscure MPI error" contain 0 bits of information and are likely to get 0 bits of answers.

pw.x stops with error message "the system is metallic, specify occupations". You did not specify state occupations, but you need to, since your system appears to have an odd number of electrons. The variable controlling how metallicity is treated is occupations in namelist &SYSTEM. The default, occupations='fixed', occupies the lowest (N electrons)/2 states and works only
scaling.

4.6 Restarting

Since QE 5.1, restarting from an arbitrary point of the code is no more supported: the code must terminate properly in order for restart to be possible. A clean stop can be triggered by one of the following three conditions:

1. The amount of time specified by the input variable max_seconds is reached.
2. A file named $prefix.EXIT is found in the execution directory (prefix is the prefix specified in the &CONTROL namelist).
3. (experimental) The code is compiled with signal-trapping support and one of the trapped signals is received (see the next section for details).

After the condition is met, the code will try to stop cleanly as soon as possible, which can take a while for a large calculation. Writing the files to disk can also be a long process. In order to be safe, you need to reserve sufficient time for the stop process to complete.

If the previous execution of the code has stopped properly, restarting is possible setting restart_mode='restart' in the &CONTROL namelist (an example is sketched below).

4.6.1 Signal trapping (experimental)

In order to compile signal trapping, add -D__TERMINATE_GRACEFULLY to MANUAL_DFLAGS in the make.sys file. Currently the code intercepts SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGXCPU; signals can be added or removed editing the file clib/custom_signals.c.

Common queue systems will send a signal some time before killing a job. The exact behaviour depends on the queue system and could be configur
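As a small illustration of the first two stop conditions listed above (the prefix and time limit are arbitrary examples, not recommendations):

   &CONTROL
      calculation  = 'scf'
      prefix       = 'mysystem'
      ! stop cleanly after about 58 minutes, e.g. under a 1-hour queue limit
      max_seconds  = 3500
   /

Alternatively, to trigger a clean stop by hand while the job is running (second condition above), create the exit file from the shell:

   touch mysystem.EXIT

in the execution directory or in outdir; the follow-up run can then set restart_mode='restart'.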
side effect, PW parallelization may yield superlinear (better than linear) scaling, thanks to the increase in serial speed coming from the reduction of data size (making it easier for the machine to keep data in the cache).

VERY IMPORTANT: for each system there is an optimal range of number of processors on which to run the job. A too large number of processors will yield performance degradation. The size of pools is especially delicate: it should not exceed N3 and Nr3, and should ideally be no larger than 1/2-1/4 of N3 and/or Nr3. In order to increase scalability, it is often convenient to further subdivide a pool of processors into "task groups". When the number of processors exceeds the number of FFT planes, data can be redistributed to "task groups" so that each group can process several wavefunctions at the same time.

The optimal number of processors for "linear-algebra" parallelization, taking care of multiplication and diagonalization of M x M matrices, should be determined by observing the performances of cdiaghg/rdiaghg (pw.x) or ortho (cp.x) for different numbers of processors in the linear-algebra group (must be a square integer).

Actual parallel performances will also depend on the available software (MPI libraries) and on the available communication hardware. For PC clusters, OpenMPI (http://www.openmpi.org) seems to yield better performances than other implementations (info by Kostantin Kudin). Note, however, that you need a decent
spect to k-points ("pools") does not distribute memory: only parallelization with respect to R- and G-space does.

- If none of the above is sufficient or feasible, you have to either reduce the cutoffs and/or the cell size, or to use a machine with more RAM.

pw.x crashes with "error in davcio". davcio is the routine that performs most of the I/O operations (read from disk and write to disk) in pw.x; "error in davcio" means a failure of an I/O operation.

- If the error is reproducible and happens at the beginning of a calculation: check if you have read/write permission to the scratch directory specified in variable outdir. Also check if there is enough free space available on the disk you are writing to, and check your disk quota (if any).
- If the error is irreproducible: you might have flaky disks; if you are writing via the network using NFS (which you shouldn't do anyway), your network connection might be not so stable, or your NFS implementation is unable to work under heavy load.
- If it happens while restarting from a previous calculation: you might be restarting from the wrong place, or from wrong data, or the files might be corrupted. Note that, since QE 5.1, restarting from arbitrary places is no more supported: the code must terminate cleanly.
- If you are running two or more instances of pw.x at the same time, check if you are using the same file names in the same temporary directory. For instance, if you submit a series of job
t to do so: see variable nosym) all the required k-points and weights if the symmetry of the system is lower than the symmetry of the Bravais lattice. The automatic generation of k-points follows the convention of Monkhorst and Pack.

3.2 Data files

The output data files are written in the directory outdir/prefix.save, as specified by variables outdir and prefix (a string that is prepended to all file names, whose default value is prefix='pwscf'). outdir can be specified as well in the environment variable ESPRESSO_TMPDIR. The iotk toolkit is used to write the file in an XML format, whose definition can be found in the Developer Manual. In order to use the data directory on a different machine, you need to convert the binary files to formatted and back, using the bin/iotk script.

The execution stops if you create a file prefix.EXIT either in the working directory (i.e. where the program is executed) or in the outdir directory. Note that with some versions of MPI, the working directory is the directory where the executable is! The advantage of this procedure is that all files are properly closed, whereas just killing the process may leave data and output files in an unusable state.

3.3 Electronic structure calculations

Single-point (fixed-ion) SCF calculation. Set calculation='scf' (this is actually the default). Namelists &IONS and &CELL will be ignored. See Example 01 and the minimal sketch below.

Band structure calculation. First perform an SCF calculation as above;
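For orientation, a minimal complete SCF input of the kind just described might look as follows. This is only a sketch modelled on the silicon example; the lattice parameter, cutoff, k-point grid and pseudopotential file name are illustrative and must be adapted to your system and to the pseudopotential files you actually have:

   &CONTROL
      calculation = 'scf'
      prefix      = 'silicon'
      pseudo_dir  = './'
      outdir      = './tmp/'
   /
   &SYSTEM
      ibrav = 2, celldm(1) = 10.2,
      nat = 2, ntyp = 1,
      ecutwfc = 18.0
   /
   &ELECTRONS
      conv_thr = 1.0d-8
   /
   ATOMIC_SPECIES
    Si  28.086  Si.pz-vbc.UPF
   ATOMIC_POSITIONS (alat)
    Si 0.00 0.00 0.00
    Si 0.25 0.25 0.25
   K_POINTS (automatic)
    4 4 4 1 1 1

A subsequent band-structure run would then change calculation to 'bands' or 'nscf', as described in the band-structure subsection.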
this will increase the amount of memory you need.

3. Specific to USPPs: the presence of negative charge density regions, due either to the pseudization procedure of the augmentation part or to truncation at finite cutoff, may give convergence problems. Raising the ecutrho cutoff for the charge density will usually help.

I do not get the same results in different machines! If the difference is small, do not panic. It is quite normal for iterative methods to reach convergence through different paths as soon as anything changes. In particular, between serial and parallel execution there are operations that are not performed in the same order. As the numerical accuracy of computer numbers is finite, this can yield slightly different results.

It is also normal that the total energy converges to a better accuracy than its terms, since only the sum is variational, i.e. has a minimum in correspondence to the ground-state charge density. Thus, if the convergence threshold is, for instance, 10^-8, you get 8-digit accuracy on the total energy, but one or two digits less on other terms (e.g. XC and Hartree energy). If this is a problem for you, reduce the convergence threshold, for instance to 10^-10 or 10^-12. The differences should go away (but it will probably take a few more iterations to converge).

Execution time is time-dependent! Yes, it is. On most machines and on most operating systems, depending on machine load, on communication load (for parallel machines), on various other fact
