
User's Guide to AGMG



!     set inverse of the mesh size (feel free to change)
      nhinv=500
!     maximal number of iterations
      iter=50
!     tolerance on relative residual norm
      tol=1.e-6
!     unit number for output messages: 6 => standard output
      iprint=6
!     generate the matrix in required format (CSR)
!       first allocate the vectors with correct size
        N=(nhinv-1)**2
        allocate (a(5*N),ja(5*N),ia(N+1),f(N),x(N))
!       next call subroutine to set entries
        call uni2d(nhinv-1,f,a,ja,ia)
!     call agmg
!       argument 5 (ijob)  is 0 because we want a complete solve
!       argument 7 (nrest) is 1 because we want to use flexible CG
!                          (the matrix is symmetric positive definite)
      call dagmg(N,a,ja,ia,f,x,0,iprint,1,iter,tol)
      END program example_seq

2.4 Printed output

The above example produces the following output.

 ****  ENTERING AGMG **********************************************
 ****  Number of unknowns:   249001
 ****  Nonzeros :  1243009 (per row:  4.99)
 ****  SETUP: Coarsening by multiple pairwise aggregation
 ****  Rmk: Setup performed assuming the matrix symmetric
 ****       Quality threshold (BlockD): 8.00 ;  Strong diag. dom. trs: 1.29
 ****       Maximal number of passes: 2 ;  Target coarsening factor: 4.00
 ****       Threshold for rows with large pos. offdiag.: 0.45
 ****  Level:  2
 ****  Number of variables:    62000 (reduction ratio:  4.02)
 ****  Nonzeros :   309006 (per row:  5.0 ; red. ratio:  4.02)
 ****  Level:  3
 ****  Number of variables:    15375 (reduction ratio:  4.03)
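The subroutine uni2d called in Listing 1 is provided with the package but is not reproduced in this guide. For readers who want a self-contained picture, here is a hypothetical sketch of such a routine, modeled on the parallel variant uni2dstrip of Listing 2; the packaged file may differ in its details.

      subroutine uni2d(m,f,a,ja,ia)
!     Sketch (illustrative only, not the packaged routine): fills, in CSR
!     format, the matrix of the five-point discrete Laplacian on an m x m
!     grid of the unit square; the right-hand side is chosen so that the
!     exact solution is the vector of all 1.
      implicit none
      integer :: m,ia(*),ja(*),k,l,i,j
      real (kind(0d0)) :: f(*),a(*)
      real (kind(0d0)), parameter :: zero=0.0d0,cx=1.0d0,cy=1.0d0,cd=4.0d0
      k=0
      l=0
      ia(1)=1
      do i=1,m
         do j=1,m
            k=k+1
            l=l+1
            a(l)=cd                 ! diagonal entry (positive, as required)
            ja(l)=k
            f(k)=zero
            if (j < m) then         ! east neighbor
               l=l+1
               a(l)=-cx
               ja(l)=k+1
            else
               f(k)=f(k)+cx         ! boundary contribution to the rhs
            end if
            if (i < m) then         ! north neighbor
               l=l+1
               a(l)=-cy
               ja(l)=k+m
            else
               f(k)=f(k)+cy
            end if
            if (j > 1) then         ! west neighbor
               l=l+1
               a(l)=-cx
               ja(l)=k-1
            else
               f(k)=f(k)+cx
            end if
            if (i > 1) then         ! south neighbor
               l=l+1
               a(l)=-cy
               ja(l)=k-m
            else
               f(k)=f(k)+cy
            end if
            ia(k+1)=l+1
         end do
      end do
      end subroutine uni2d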
If you use AGMG for research, please observe that your work benefits from our past research efforts that allowed the development of AGMG. Hence, even if you do not see it directly, the results obtained thanks to the use of AGMG depend on the results in publications [1-3] below, where the main algorithms used in AGMG are presented and justified. It is then a normal duty to cite these publications (besides citing AGMG itself) in any scientific work depending on the usage of AGMG, as you would do with any former research result you are using.

[1] Y. Notay, An aggregation-based algebraic multigrid method, Electronic Transactions on Numerical Analysis, vol. 37, pp. 123-146, 2010.
[2] A. Napov and Y. Notay, An algebraic multigrid method with guaranteed convergence rate, SIAM J. Sci. Comput., vol. 34, pp. A1079-A1109, 2012.
[3] Y. Notay, Aggregation-based algebraic multigrid for convection-diffusion equations, SIAM J. Sci. Comput., vol. 34, pp. A2288-A2316, 2012.

Contents

1 Introduction
  1.1 How to use this guide
  1.2 Release notes
  1.3 Installation and external libraries
  1.4 Error handling
  1.5 Common errors
2 Sequential version
  2.1 Basic usage and matrix format
  2.2 Additional parameters
    2.2.1 ijob
    2.2.2 iprint
    2.2.3 nrest
    2.2.4 iter
    2.2.5 tol
  2.3 Example
  2.4 Printed output
for instructions to obtain a copy of the software and possible upgrades.

Key words: FORTRAN, Matlab, multigrid, linear systems, iterative methods, AMG, preconditioning, parallel computing, software

AMS subject classification: 65F10

Supported by the Belgian FNRS ("Directeur de recherches").

COPYRIGHT (c) 2012 Universite Libre de Bruxelles (ULB). ALL USAGE OF AGMG IS SUBJECT TO LICENSE; PLEASE REFER TO THE FILE "LICENSE". IF YOU OBTAINED A COPY OF THIS SOFTWARE WITHOUT THIS FILE, PLEASE CONTACT ynotay@ulb.ac.be.

In particular, if you have a free academic license:

(1) You must be a member of an educational, academic or research institution. The license agreement automatically terminates once you no longer fulfill this requirement.

(2) You are obliged to cite AGMG in any publication or report as: "Yvan Notay, AGMG software and documentation; see http://homepages.ulb.ac.be/~ynotay/AGMG".

(3) You may not make available to others the software in any form, either as source or as a precompiled object.

(4) You may not use AGMG for the benefit of any third party or for any commercial purposes. Note that this excludes the use within the framework of a contract with an industrial partner.

DISCLAIMER: AGMG is provided on an "AS IS" basis, without any explicit or implied WARRANTY; see the file "LICENSE" for more details.
an alternative implementation of GMRES [12], which is the default main iteration routine. A nonpositive value is converted to 10 (the suggested default). If nrest = 1, Flexible CG is used instead of GCR when ijob = 0, 10, 2, 12, 100, 110, 102, 112; in addition, when ijob = 0, 1, 100, 101, some simplifications are performed during the setup, based on the assumption that the input matrix is symmetric (there is then no more difference between ijob = 1 and ijob = 101). Setting nrest = 1 is recommended if and only if the matrix is symmetric and positive definite.

2.2.4 iter

iter is the maximal number of iterations on input, and the effective number of iterations on output. Since it is both an input and an output parameter, it is important not to forget to reinitialize it between successive calls.

2.2.5 tol

tol is the tolerance on the relative residual norm; that is, iterations are pursued (within the allowed limit) until the residual norm is below tol times the norm of the input right-hand side, i.e., until ||f - A*x|| <= tol * ||f||.

2.3 Example

The source file of the following example is provided with the package.

Listing 1: source code of sequential example

      program example_seq
!
!     Solves the discrete Laplacian on the unit square by simple call
!     to agmg. The right-hand side is such that the exact solution is
!     the vector of all 1.
!
      implicit none
      real (kind(0d0)), allocatable :: a(:),f(:),x(:)
      integer, allocatable :: ja(:),ia(:)
      integer :: n,iter,iprint,nhinv
      real (kind(0d0)) :: tol
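Because iter is overwritten on output, a typical setup-once/solve-many usage resets it before each solve. The following minimal sketch (the names rhs and nrhs are illustrative, not part of the package) combines this with the ijob values described in Section 2.2.1:

!     setup once, then solve several systems with the same matrix,
!     reinitializing iter before each call (dagmg overwrites it on output)
      call dagmg(N,a,ja,ia,f,x,1,iprint,1,iter,tol)      ! ijob=1:  setup only
      do k=1,nrhs
         f(1:N) = rhs(1:N,k)        ! f is overwritten on output by dagmg
         iter = 50
         call dagmg(N,a,ja,ia,f,x,2,iprint,1,iter,tol)   ! ijob=2:  solve only
      end do
      call dagmg(N,a,ja,ia,f,x,-1,iprint,1,iter,tol)     ! ijob=-1: release
!                                                          internal memory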
 *  0* Nonzeros in local rows:   155996 (per row:  5.0 ; red. ratio:  8.00)
 ****  Level:  3
 ****  Global number of variables:    15904 (reduction ratio:  7.83)
 *  0* Number of local rows:     3984 (reduction ratio:  7.84)
 ****  Global number of nonzeros:    80716 (per row:  5.1 ; red. ratio:  7.71)
 *  0* Nonzeros in local rows:    20135 (per row:  5.1 ; red. ratio:  7.75)
 ****  Level:  4
 ****  Global number of variables:     2034 (reduction ratio:  7.82)
 *  0* Number of local rows:      497 (reduction ratio:  8.02)
 ****  Global number of nonzeros:    10516 (per row:  5.2 ; red. ratio:  7.68)
 *  0* Nonzeros in local rows:     2513 (per row:  5.1 ; red. ratio:  8.01)
 *  0* Exact factorization: 0.123 work units
 ****  Global grid complexity: 1.14
 ****  Global Operator complexity: 1.14
 *  0* Local grid complexity: 1.14
 *  0* Local Operator complexity: 1.14
 ****  Theoretical Weighted complexity: 1.33  (K-cycle at each level)
 ****  Effective  Weighted complexity: 1.33  (V-cycle enforced where needed)
 *  0* memory used (peak): 2.82 real(8) words per nnz ( 26.86 Mb)
 *  0* Setup time (Elapsed): 1.27E+00 seconds
 ****  SOLUTION: flexible conjugate gradient iterations (FCG(1))
 ****  AMG preconditioner with 1 Gauss-Seidel pre- and post-smoothing sweeps
 ****       at each level
 ****  Iter=  0  Resid= 0.63E+02  Relat. res= 0.10E+01
 ****  Iter=  1  Resid= 0.20E+02  Relat. res= 0.32E+00
 ****  Iter=  2  Resid= 0.54E+01  Relat. res= 0.85E-01
 ****  Iter=  3  Resid= 0.20E+01  Relat. res= 0.31E-01
 ****  Iter=  4  Resid= 0.53E+00  Relat. res= 0.84E-02
 ****  Nonzeros :    76381 (per row:  5.0 ; red. ratio:  4.05)
 ****  Level:  4
 ****  Number of variables:     3721 (reduction ratio:  4.13)
 ****  Nonzeros :    18361 (per row:  4.9 ; red. ratio:  4.16)
 ****  Level:  5
 ****  Number of variables:      899 (reduction ratio:  4.14)
 ****  Nonzeros :     4377 (per row:  4.9 ; red. ratio:  4.19)
 ****  Exact factorization: 0.158 work units
 ****  Grid complexity: 1.33
 ****  Operator complexity: 1.33
 ****  Theoretical Weighted complexity: 1.92  (K-cycle at each level)
 ****  Effective  Weighted complexity: 1.92  (V-cycle enforced where needed)
 ****  memory used (peak): 2.54 real(8) words per nnz ( 24.11 Mb)
 ****  Setup time (Elapsed): 1.13E-01 seconds
 ****  SOLUTION: flexible conjugate gradient iterations (FCG(1))
 ****  AMG preconditioner with 1 Gauss-Seidel pre- and post-smoothing sweeps
 ****       at each level
 ****  Iter=  0  Resid= 0.45E+02  Relat. res= 0.10E+01
 ****  Iter=  1  Resid= 0.14E+02  Relat. res= 0.32E+00
 ****  Iter=  2  Resid= 0.23E+01  Relat. res= 0.51E-01
 ****  Iter=  3  Resid= 0.87E+00  Relat. res= 0.19E-01
 ****  Iter=  4  Resid= 0.14E+00  Relat. res= 0.31E-02
 ****  Iter=  5  Resid= 0.35E-01  Relat. res= 0.78E-03
 ****  Iter=  6  Resid= 0.66E-02  Relat. res= 0.15E-03
 ****  Iter=  7  Resid= 0.23E-02  Relat. res= 0.52E-04
 ****  Iter=  8  Resid= 0.54E-03  Relat. res= 0.12E-04
 ****  Iter=  9  Resid= 0.13E-03  Relat. res= 0.28E-05
 ****  Iter= 10  Resid= 0.33E-04  Relat. res= 0.74E-06
 ****  Convergence reached in 10 iterations
 ****  level  2 (call:  10 ; cycle:  20 ; mean:  2.00 ; max:  2)
      integer, allocatable :: ja(:),ia(:),listrank(:)
      integer :: n,iter,iprint,nhinv,NPROC,IRANK,mx,my,ifirstlistrank,ierr
      real (kind(0d0)) :: tol
      character*10 filename
!
!     set inverse of the mesh size (feel free to change)
      nhinv=1000
!     maximal number of iterations
      iter=50
!     tolerance on relative residual norm
      tol=1.e-6
!     Initialize MPI
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,NPROC,ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,IRANK,ierr)
!     unit number for output messages: alternative: iprint=10+IRANK
      iprint=10
      filename(1:8)='res.out_'
      write(filename(9:10),'(i2.2)') IRANK    ! processor dependent
      open(iprint,file=filename,form='formatted')
!     calculate local grid size
      mx=nhinv-1
      my=(nhinv-1)/NPROC
      if (IRANK < mod(nhinv-1,NPROC)) my=my+1
!     generate the matrix in required format (CSR)
!       first allocate the vectors with correct size
        N=mx*my
        allocate (a(5*N),ja(5*N),ia(N+1),f(N),x(N),listrank(2*mx))
!       external nodes connected with local ones (on top and bottom
!       internal boundaries) will receive numbers N+1, N+2, ...
        ifirstlistrank=N+1
!       next call subroutine to set entries; before, initialize listrank
!       to zero, so that entries that do not correspond to a nonlocal
!       variable present in ja are anyway properly defined
        listrank(1:2*mx)=0
        call uni2dstrip(mx,my,f,a,ja,ia,IRANK,NPROC,listrank,ifirstlistrank)
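!     (illustrative addition, not in the packaged example: at this point,
!      each task could report the two quantities that matter for the load
!      balancing discussed at the beginning of Section 3 -- the number of
!      local rows and the number of nonzeros in the local rows --
!      with, e.g., the two lines below; nnzloc is a hypothetical name)
!        nnzloc = ia(N+1)-1
!        write(iprint,*) 'task',IRANK,': rows =',N,', nnz =',nnzloc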
coarsening by increasing also kaptg_blocdia and/or kaptg_dampJac, after having checked their effect on the convergence speed (it is application dependent).

Eventually, in the parallel case, it is worth checking how the solver works on the coarsest grid. A large setup time may indicate that this latter is too large. This can be reduced by decreasing maxcoarsesize and maxcoarsesizeslow. It is also possible to tune their effect according to the number of processors, in subroutine dagmgpar_init, just below agmgpar_mem. Conversely, increasing the size of the coarsest system may help to reduce communications, as long as the coarsest solver remains efficient, which depends not only on this size, but also on the number of processors and on the communication speed.

(Professional version only) With the variant "NM" that does not use the parallel MUMPS library, the coarsest grid solver is a special iterative solver; the parameters that define the target accuracy and the maximal number of iterations at this level can also be tuned.

4 Running several instances of AGMG

In some contexts, it is interesting to have simultaneously in memory several instances of the preconditioner; for instance, when a preconditioner for a matrix in block form is defined considering separately the different blocks, and when one would like to use AGMG for several of these blocks. Such usage of AGMG is not foreseen with the present version. However, there is a simple workaround.
differ from the input matrix that was supplied upon the previous call with ijob = 1 or ijob = 101; then AGMG attempts to solve a linear system with the new matrix, using the preconditioner set up for the old one. The same remark applies whether ijob >= 100 or not: the value used at setup (ijob = 1 or 101) determines whether the preconditioner set up and stored in internal memory is based on the matrix or on its transpose; the value used at solve time (ijob = 2, 12 or 102, 112) determines whether the linear system to be solved is with the matrix or with its transpose, independently of the setup. Hence one may set up a preconditioner for a matrix and use it for its transpose. These functionalities (set up a preconditioner and use it for another matrix) are provided for the sake of generality, but should be used with care: in general, the setup is fast with AGMG, and hence it is recommended to rerun it even if the matrix changes only slightly.

Matlab: the usage is the same, except that one should not specify whether an initial approximation is present in x; this is told via the presence or the absence of the corresponding argument. Hence, for instance, ijob = 0 gathers the functionality of both ijob = 0 and ijob = 10.

2.2.2 iprint

iprint is the unit number where information and possible error messages are to be printed. A nonpositive number suppresses all messages, except the error messages, which will then be printed on standard output.

2.2.3 nrest

nrest is the restart parameter for GCR [2, 13],
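As an illustration of these ijob combinations, the following sketch sets up the preconditioner once and then solves systems with both the matrix and its transpose (sequential version; g and y are illustrative names for a second right-hand side and solution vector):

!     setup based on A (ijob=1), using GCR with default restart (nrest<=0)
      call dagmg(N,a,ja,ia,f,x,1,iprint,0,iter,tol)
      iter=100
      call dagmg(N,a,ja,ia,f,x,2,iprint,0,iter,tol)    ! solve A*x = f
      iter=100
      call dagmg(N,a,ja,ia,g,y,102,iprint,0,iter,tol)  ! solve transpose(A)*y
!                                                        = g, reusing the
!                                                        same setup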
ifirstlistrank:*); hence setting ifirstlistrank to jmin (or to N+1) allows one to save memory, since listrank(i) is never referenced for i < jmin, hence in particular for i <= N.

ijob: the comments in Section 2.2.1 apply, except that the parallel version cannot work directly with the transpose of the input matrix; hence ijob = 100, 101, 102, 110, 112 are not permitted.

By way of illustration, consider the same matrix as in Section 2.1, partitioned into 3 tasks: task 0 receiving rows 1 & 2, task 1 rows 3 & 4, and task 2 row 5. Firstly, one has to add a few entries into the structure, to enforce the structural symmetry with respect to non-local connections:

        ( 10        1        2 )
        (     11        3    0 )
    A = (  4       12        5 )
        (      0       13      )
        (  0   8    9       14 )

Then A is given (non-uniquely) by the following variables and vectors.

    TASK 0:  n  = 2
             a  = (10.0, 1.0, 2.0, 11.0, 3.0, 0.0)
             ja = (1, 3, 5, 2, 4, 5)
             ia = (1, 4, 7)
             listrank(3:5) = (1, 1, 2)

    TASK 1:  n  = 2
             a  = (4.0, 12.0, 5.0, 0.0, 13.0)
             ja = (5, 1, 6, 7, 2)
             ia = (1, 4, 6)
             listrank(5:7) = (0, 2, 0)

    TASK 2:  n  = 1
             a  = (14.0, 9.0, 8.0, 0.0)
             ja = (1, 5, 3, 2)
             ia = (1, 5)
             listrank(2:5) = (0, 0, -1, 1)

Note that these vectors (in particular, the numbering of nonlocal variables) were not constructed in a logical way, but rather to illustrate the flexibility, and also the requirements, of the format. One sees, for instance, with
input parameters, the CSR matrix format (Section 2.1) and the meaning of output lines (Section 2.4). This information will not be repeated here, where the focus is on the specific features of the parallel version.

The parallel implementation has been developed to work with MPI, and we assume that the reader is somewhat familiar with this standard.

The parallel implementation is based on a partitioning of the rows of the system matrix, and assumes that this partitioning has been applied before calling AGMGPAR. Thus, if the matrix has been generated sequentially, some partitioning tool like METIS [3] should be called before going to AGMGPAR. The partitioning of the rows induces, inside AGMG, a partitioning of the variables. Part of the local work is proportional to the number of nonzero entries in the local rows, and part of it is proportional to the number of local variables, that is, the number of local rows. The provided partitioning should therefore aim at a good load balancing of both these quantities. Besides, the algorithm scalability with respect to the number of processors will be best when minimizing the sum of absolute values of offdiagonal entries connecting rows assigned to different processors (or tasks).

3.1 Basic usage and matrix format

To work, the parallel version needs that several instances of the application program have been launched in parallel, and have initialized an MPI framework with a valid communicator. Let MPI_COMM be
is slightly changed, as it now also influences the aggregation process (see above). The rule, however, remains that nrest should be set to 1 only if the system matrix is symmetric positive definite; see Section 2.2.3 for details.

- It is now possible to solve a system with the transpose of the input matrix. This corresponds to further allowed values for parameter ijob; see Section 2.2.1 below for details. The meaning of the input matrix when using ijob not equal to its default is also changed; see Section 2.2.1 below for details.

- The naming schemes and the calling sequence of each driver routine remain unchanged from releases 2.x, but there are slight differences with respect to releases 1.x. Whereas with releases 2.x routines were provided for backward compatibility, these are no longer supported; i.e., application programs based on the naming scheme and calling sequence of releases 1.x need to be adapted to the new scheme.

1.3 Installation and external libraries

AGMG does not need to be installed as a library. For the sake of simplicity, source files are provided, and it is intended that they are compiled together with the application program in which AGMG routines are referenced. See the README file provided with the package for additional details and examples of use.

AGMG requires the LAPACK [1] and BLAS libraries. Most compilers come with an optimized implementation of these libraries, and it is strongly recommended to use it. If not available, we provide the needed
when IJOB==0, 10, 2, 12, 100, 110, 102, 112, and, also (when IJOB==0, 1, 100, 101), performs some simplification during the setup, based on the assumption that the matrix supplied in (A,JA,IA) is symmetric (there is then no more difference between IJOB==1 and IJOB==101).
!!!  NREST==1 should be used if and only if the matrix is really SYMMETRIC
!!!          (and positive definite).
!
!  ITER    (input/output) INTEGER. On input, the maximum number of
!          iterations; should be positive. On output, actual number of
!          iterations. If this number of iterations was insufficient to
!          meet the convergence criterion, ITER will be returned negative
!          and equal to the opposite of the number of iterations performed.
!          Significant only if IJOB==0, 2, 10, 12, 100, 102, 110, 112.
!
!  TOL     (input) REAL (kind(0.0d0)). The tolerance on residual norm.
!          Iterations are stopped whenever || A*x - f || <= TOL*|| f ||.
!          Should be positive and less than 1.0.
!          Significant only if IJOB==0, 2, 10, 12, 100, 102, 110, 112.
!
!!!  Remark: except an insufficient number of iterations to achieve
!!!          convergence, all detected errors are fatal and lead to a
!!!          STOP statement.

B Listing of AGMGPAR

      SUBROUTINE dagmgpar( n,a,ja,ia,f,x,ijob,iprint,nrest,iter,tol &
                         , MPI_COMM_AGMG,listrank,ifirstlistrank )
      INTEGER    :: n, ia(n+1), ja(*), ijob, iprint, nrest, iter
      REAL (kind(0.0d0)) :: a(*), f(n), x(n)
      REAL (kind(0.0d0)) :: tol
      INTEGER    :: MPI_COMM_AGMG, ifirstlistrank, listrank(*)
!        Actually: listrank(ifirstlistrank:*) ; listrank
(A,JA,IA) that was supplied upon the previous call with IJOB==1; then AGMG attempts to solve a linear system with the new matrix (supplied when IJOB==2 or 12), using the preconditioner set up for the old one (supplied when IJOB==1).
!!!  These functionalities (set up a preconditioner and use it for another
!!!  matrix) are provided for the sake of generality, but should be used
!!!  with care; in general, setup is fast with AGMG, and hence it is
!!!  recommended to rerun it even if the matrix changes only slightly.
!
!  IPRINT  (input) INTEGER. Indicates the unit number where information is
!          to be printed (N.B.: 5 is converted to 6). If nonpositive, only
!          error messages are printed, on standard output.
!
!  NREST   (input) INTEGER. Restart parameter for GCR (an implementation of
!          GMRES). Nonpositive values are converted to NREST=10 (default).
!!!  If NREST==1, Flexible CG is used instead of GCR (when IJOB==0, 10, 2,
!!!          12) and, also (when IJOB==0, 1), performs some simplification
!!!          during the setup based on the assumption that the matrix
!!!          supplied in (A,JA,IA) is symmetric.
!!!  NREST==1 should be used if and only if the matrix is really SYMMETRIC
!!!          (and positive definite).
!
!  ITER    (input/output) INTEGER. On input, the maximum number of
!          iterations; should be positive. On output, actual number of
!          iterations. If this number of iterations was insufficient to
!          meet the convergence criterion, ITER
TASK 2 that the numbering of nonlocal variables may be quite arbitrary, as long as holes in listrank are filled with negative numbers.

3.2 Example

The source file of the following example is provided with the package. The matrix corresponds to the five-point approximation of the Laplacian on the unit square, partitioned according to a strip partitioning of the domain, with internal boundaries parallel to the x direction. Note that this strip partitioning is not optimal; it has been chosen for the sake of simplicity. If IRANK > 0, nodes at the bottom line have a non-local connection with task IRANK-1, and, if IRANK < NPROC-1, nodes at the top line have a non-local connection with task IRANK+1. Observe that these non-local variables are given indexes > N in subroutine uni2dstrip, and that listrank is set up accordingly. Note also that, with this simple example, the consistency of local orderings arises in a natural way. Note finally that each task will print its output in a different file; this is recommended.

Listing 2: source code of parallel example

      program example_par
!
!     Solves the discrete Laplacian on the unit square by simple call
!     to agmg. The right-hand side is such that the exact solution is
!     the vector of all 1. Uses a strip partitioning of the domain, with
!     internal boundaries parallel to the x direction.
!
      implicit none
      include 'mpif.h'
      real (kind(0d0)), allocatable :: a(:),f(:),x(:)
IJOB==1 or IJOB==101; then AGMG attempts to solve a linear system with the new matrix (supplied when IJOB==2, 12, 102 or 112), using the preconditioner set up for the old one (supplied when IJOB==1 or 101).
!!!  The same remarks apply whether IJOB >= 100 or not: the value (IJOB==1
!!!  or 101) determines whether the preconditioner set up and stored in
!!!  internal memory is based on the matrix or on its transpose; the value
!!!  (IJOB==2, 12 or 102, 112) is used to determine whether the linear
!!!  system to be solved is with the matrix or its transpose, independently
!!!  of the setup.
!!!  Hence one may set up a preconditioner for a matrix and use it for its
!!!  transpose.
!!!  These functionalities (set up a preconditioner and use it for another
!!!  matrix) are provided for the sake of generality, but should be used
!!!  with care; in general, setup is fast with AGMG, and hence it is
!!!  recommended to rerun it even if the matrix changes only slightly.
!
!  IPRINT  (input) INTEGER. Indicates the unit number where information is
!          to be printed (N.B.: 5 is converted to 6). If nonpositive, only
!          error messages are printed, on standard output.
!
!  NREST   (input) INTEGER. Restart parameter for GCR (an implementation of
!          GMRES). Nonpositive values are converted to NREST=10 (default).
!!!  If NREST==1, Flexible CG is used instead of GCR
User's Guide to AGMG

Yvan Notay

Service de Metrologie Nucleaire
Universite Libre de Bruxelles (C.P. 165/84)
50 Av. F.D. Roosevelt, B-1050 Brussels, Belgium
email: ynotay@ulb.ac.be

February 2008; revised March 2009, March 2010, July 2011, January 2012, October 2012, March 2014.

Abstract

This manual gives an introduction to the use of AGMG, a program which implements the algebraic multigrid method described in

Y. Notay, An aggregation-based algebraic multigrid method, Electronic Trans. Numer. Anal., vol. 37, pp. 123-146, 2010,

with further improvements from

A. Napov and Y. Notay, An algebraic multigrid method with guaranteed convergence rate, SIAM J. Sci. Comput., vol. 34, pp. A1079-A1109, 2012,

and from

Y. Notay, Aggregation-based algebraic multigrid for convection-diffusion equations, SIAM J. Sci. Comput., vol. 34, pp. A2288-A2316, 2012.

This method solves algebraic systems of linear equations, and is expected to be efficient for large systems arising from the discretization of scalar second order elliptic PDEs. The method may, however, be tested on any problem, as long as all diagonal entries of the system matrix are positive. It is indeed purely algebraic; that is, no information has to be supplied besides the system matrix and the right-hand side.

Both a sequential and a parallel FORTRAN 90 implementation are provided, as well as an interface allowing to use the software as a Matlab function. See the web site http://homepages.ulb.ac.be/~ynotay/AGMG
 ****  level  3 (call:  20 ; cycle:  40 ; mean:  2.00 ; max:  2)
 ****  level  4 (call:  40 ; cycle:  80 ; mean:  2.00 ; max:  2)
 ****  Number of work units: 11.89 per digit of accuracy
 ****  memory used for solution: 3.33 real(8) words per nnz ( 31.54 Mb)
 ****  Solution time (Elapsed): 2.55E-01 seconds
 ****  (1 work unit represents the cost of 1 fine grid residual evaluation)
 ****  LEAVING AGMG * (MEMORY RELEASED) ***************************

Note that each output line issued by the package starts with ****. AGMG first indicates the size of the matrix and the number of nonzero entries. One then enters the setup phase, and the name of the coarsening algorithm is recalled, together with the basic parameters used. Note that these parameters need not be defined by the user: AGMG always uses default values. These, however, can be changed by expert users; see Section 2.5 below. The quality threshold is the threshold used to accept or not a tentative aggregate when applying the coarsening algorithms from [5, 8]: "BlockD" indicates that the algorithm from [5] is used (quality for block diagonal smoother), whereas "Jacobi" is printed instead when the algorithm from [8] is used (quality for Jacobi smoother). The strong diagonal dominance threshold is the threshold used to keep outside aggregation the rows and columns that are strongly diagonally dominant; by default, it is set automatically according to the quality threshold, as indicated in [5, 8]. The maximal
is only used in lower-level subroutines.
!-----------------------------------------------------------------------
!  Arguments
!  =========
!
!  N       (input) INTEGER. The dimension of the matrix.
!
!  A       (input/output) REAL (kind(0.0d0)). Numerical values of the
!          matrix.
!
!  IA      (input/output) INTEGER. Pointers for every row.
!
!  JA      (input/output) INTEGER. Column indices.
!
!!!  AGMG ASSUMES THAT ALL DIAGONAL ENTRIES ARE POSITIVE
!
!          Detailed description of the matrix format:
!          On input, IA(I), I=1,...,N, refers to the physical start of
!          row I. That is, the entries of row I are located in A(K), where
!          K=IA(I),...,IA(I+1)-1; JA(K) carries the associated column
!          indices. IA(N+1) must be defined in such a way that the above
!          rule also works for I=N (that is, the last valid entry in
!          arrays A, JA should correspond to index K=IA(N+1)-1).
!          According to what is written above, AGMG assumes that some of
!          these JA(K), IA(I) <= K < IA(I+1), is equal to I, with the
!          corresponding A(K) carrying the value of the diagonal element,
!          which must be positive.
!
!          A, IA, JA are "output" parameters because, on exit, the entries
!          of each row may occur in a different order. The matrix is
!          mathematically the same, but stored in a different way.
!
!  F       (input/output) REAL (kind(0.0d0)). On input, the right-hand
!          side vector f. Overwritten on output.
that should allow one to figure out what happened. Note that AGMG does not check the validity of all input parameters; for instance, all diagonal entries of the supplied matrix must be positive, but no explicit test is implemented. Feel free to contact the author for further explanations if errors persist after checking that the input parameters have been correctly defined.

1.5 Common errors

Be careful that iter and (in the parallel case) ifirstlistrank are also output parameters. Hence, declaring them as a constant in the calling program entails a fatal error. Also, don't forget to reinitialize iter between successive calls to AGMG or AGMGPAR.

2 Sequential version

2.1 Basic usage and matrix format

The application program has to call the main driver AGMG as follows:

      call dagmg(N,a,ja,ia,f,x,ijob,iprint,nrest,iter,tol)

N is the order of the linear system, whose matrix is given in arrays a, ja, ia. The right-hand side must be supplied in array f on input, and the computed solution is returned in x on output; optionally, x may contain an initial guess on input (see Section 2.2.1 below). The required matrix format is the compressed sparse row (CSR) format, described, e.g., in [12]. With this format, nonzero matrix entries (numerical values) are stored row-wise in a, whereas ja carries the corresponding column indices; the entries in ia indicate then where every row starts. That is, the nonzero entries (numerical values) and column indices of row i are
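A minimal illustration of the constant-argument pitfall just mentioned (a sketch, not from the package):

!     iter must be a variable: dagmg overwrites its 10th argument on output
      iter=50
      call dagmg(N,a,ja,ia,f,x,0,iprint,1,iter,tol)   ! correct
!     call dagmg(N,a,ja,ia,f,x,0,iprint,1,50,tol)     ! WRONG: the literal
!                                                     ! constant 50 would be
!                                                     ! overwritten on output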
Basic information can also be obtained by entering "help agmg" in the Matlab environment.

For a sample of performances in sequential, and a comparison with other solvers, see the paper [6] and the report [9] (http://homepages.ulb.ac.be/~ynotay/AGMG/numcompsolv.pdf). For performances of the parallel version, see [10].

1.1 How to use this guide

This guide is self-contained, but does not describe the methods and algorithms used in AGMG, for which we refer to [7, 5, 8]. We strongly recommend their reading for an enlightened use of the package. On the other hand, the main driver routines are reproduced in the Appendix, with the comment lines describing precisely each parameter. It is a good idea to have a look at the listing of a driver routine while reading the section of this guide describing its usage.

1.2 Release notes

This guide describes AGMG 3.2.1-aca (academic version) and AGMG 3.2.1-pro (professional version).

New features from release 3.2.1:

- Improved treatment of singular systems; see Section 5.

- (Professional version only) Improved parallel performance; verified weak scalability up to 370,000 cores [10].

- (Professional version only) Compilation with the provided sequential MUMPS module can be replaced by a link with the Intel Math Kernel Library (MKL); see Section 1.3.

New features from release 3.2.0:

- (Professional version only) A variant of the parallel version is provided that does not require the installation of the
In the sequential case, copy the source file, say dagmg.f90, to dagmg1.f90. Then edit dagmg1.f90 and make a global replacement of all instances of "dagmg" (lowercase) by "dagmg1", but not of "DAGMG_MUMPS" (uppercase), which should remain unchanged. Repeat this as many times as needed to generate dagmg2.f90, dagmg3.f90, etc. Then compile all these source files together with the application program and dagmg_mumps.f90. The application program can call dagmg1, dagmg2, dagmg3, ..., defining several instances of the preconditioner that will not interact with each other. You can proceed similarly with any arithmetic, and also with the parallel version; in the latter case, replace all instances of "dagmgpar" by "dagmgpar1" (no exception).

Matlab: the same result is obtained in an even simpler way. Copy the three files agmg.m, dmtlagmg.mex* and zmtlagmg.mex* (where * depends upon your OS and architecture) to, say, agmg1.m, dmtlagmg1.mex* and zmtlagmg1.mex*. Edit the m-file agmg1.m, and replace the 3 calls to dmtlagmg by calls to dmtlagmg1, and the 3 calls to zmtlagmg by calls to zmtlagmg1. Repeat this as many times as needed. Then, functions agmg1, agmg2, ... are defined, which can be called independently of each other.

5 Solving singular systems

Singular systems can be solved directly. Further, this is done automatically, without needing to tell the software that the provided system is singular. However, this usage should be considered with care.
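Once the renamed copies have been generated as just described, usage is straightforward; for instance (a sketch, with two hypothetical diagonal blocks):

!     two independent AGMG instances, e.g., for two diagonal blocks
      call dagmg1(N1,a1,ja1,ia1,f1,x1,1,iprint,1,iter1,tol)  ! setup block 1
      call dagmg2(N2,a2,ja2,ia2,f2,x2,1,iprint,1,iter2,tol)  ! setup block 2
!     ... later, apply each preconditioner (ijob=3) or solve (ijob=2),
!     without any interaction between the two setups
      call dagmg1(N1,a1,ja1,ia1,r1,z1,3,iprint,1,iter1,tol)  ! z1 = prec1 * r1
      call dagmg2(N2,a2,ja2,ia2,r2,z2,3,iprint,1,iter2,tol)  ! z2 = prec2 * r2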
require MUMPS to be installed as a library. Actually, in the sequential case, AGMG requires only a subset of MUMPS, and, for the user's convenience, files (one for each arithmetic) gathering the needed routines are provided together with AGMG. This is possible because MUMPS is public domain (see the files' headers for the detailed copyright notice).

Note that all MUMPS routines we provide with AGMG have been renamed: the prefixes d, s, ... have been exchanged for dagmg_, sagmg_, .... Hence you may use them in an application program that is also linked with the MUMPS library: there will be no mismatch; in particular, no mismatch between the sequential version we provide and the parallel version of MUMPS installed by default. On the other hand, it means that, if you have the sequential MUMPS library installed and would like to use it instead of the version we provide, you need to edit the source file of AGMG and restore the classical calling sequence to MUMPS.

(Professional version only) When using the Intel compiler and linking with the Intel Math Kernel Library (MKL), the provided subset of MUMPS can be skipped, because the needed functionalities are retrieved from the MKL, using an appropriate flag when compiling AGMG; see the README file for details.

1.4 Error handling

Except the case of a number of allowed iterations insufficient to achieve the convergence criterion, any detected error is fatal and leads to a STOP statement, with printed information
 ****  Iter=  5  Resid= 0.30E+00  Relat. res= 0.47E-02
 ****  Iter=  6  Resid= 0.81E-01  Relat. res= 0.13E-02
 ****  Iter=  7  Resid= 0.37E-01  Relat. res= 0.58E-03
 ****  Iter=  8  Resid= 0.10E-01  Relat. res= 0.16E-03
 ****  Iter=  9  Resid= 0.36E-02  Relat. res= 0.56E-04
 ****  Iter= 10  Resid= 0.15E-02  Relat. res= 0.24E-04
 ****  Iter= 11  Resid= 0.53E-03  Relat. res= 0.84E-05
 ****  Iter= 12  Resid= 0.21E-03  Relat. res= 0.33E-05
 ****  Iter= 13  Resid= 0.84E-04  Relat. res= 0.13E-05
 ****  Iter= 14  Resid= 0.29E-04  Relat. res= 0.46E-06
 ****  Convergence reached in 14 iterations
 ****  level  2 (call:  14 ; cycle:  28 ; mean:  2.00 ; max:  2)
 ****  level  3 (call:  28 ; cycle:  56 ; mean:  2.00 ; max:  2)
 *  0* Number of work units: 11.84 per digit of accuracy
 *  0* memory used for solution: 2.87 real(8) words per nnz ( 27.35 Mb)
 *  0* Solution time (Elapsed): 6.54E-01 seconds
 *  0* (1 work unit represents the cost of 1 fine grid residual evaluation)
 ****  LEAVING AGMG * (MEMORY RELEASED) ***************************

For some items, both a local and a global quantity are now given. All output lines starting with **** correspond to global information and are printed only by the task with rank 0. The other lines start with *xxx*, where xxx is the task rank; they are printed by each task, based on local computation. For instance, the work for the exact factorization at the coarsest level is the local work, as reported by MUMPS, divided by the number of local floating point operations needed for one residual
of the matrix.
!
!  IA      (input/output) INTEGER. Pointers for every row.
!
!  JA      (input/output) INTEGER. Column indices.
!
!!!  AGMG ASSUMES THAT ALL DIAGONAL ENTRIES ARE POSITIVE
!
!          Detailed description of the matrix format:
!          On input, IA(I), I=1,...,N, refers to the physical start of
!          row I. That is, the entries of row I are located in A(K), where
!          K=IA(I),...,IA(I+1)-1; JA(K) carries the associated column
!          indices. IA(N+1) must be defined in such a way that the above
!          rule also works for I=N (that is, the last valid entry in
!          arrays A, JA should correspond to index K=IA(N+1)-1).
!          According to what is written above, AGMG assumes that some of
!          these JA(K), IA(I) <= K < IA(I+1), is equal to I, with the
!          corresponding A(K) carrying the value of the diagonal element,
!          which must be positive.
!
!          A, IA, JA are "output" parameters because, on exit, the entries
!          of each row may occur in a different order. The matrix is
!          mathematically the same, but stored in a different way.
!
!  F       (input/output) REAL (kind(0.0d0)). On input, the right-hand
!          side vector f. Overwritten on output.
!          Significant only if IJOB==0, 2, 3, 10, 12, 100, 102, 110, 112.
!
!  X       (input/output) REAL (kind(0.0d0)). On input and if IJOB==10,
!          12, 110, 112: initial guess (for other values of IJOB, the
!          default is used: the zero vector). On output,
A few limitations apply. Basically, it works smoothly when the system is compatible and when the left null space is spanned by the vector of all ones; i.e., the range of the input matrix is the set of vectors orthogonal to the constant vector, and the right-hand side belongs to this set.

Please note that failures in other cases are generally not fatal, and hence not detected if the solution is not checked appropriately; for instance, AGMG may seemingly work well, but return a solution vector which is highly dominated by null space components.

To handle such cases, note that AGMG can often be efficiently applied to solve the near-singular system obtained by deleting one row and one column in a linear system originally singular and compatible. This approach is not recommended in general, because the truncated system obtained in this way tends to be ill-conditioned, and hence not very easy to solve with iterative methods in general. However, it is sensible to use this approach with AGMG, because one of the features of the method in AGMG is precisely its capability to solve efficiently ill-conditioned linear systems.

A Listing of DAGMG

      SUBROUTINE dagmg( n,a,ja,ia,f,x,ijob,iprint,nrest,iter,tol )
      INTEGER    :: n, ia(n+1), ja(*), ijob, iprint, nrest, iter
      REAL (kind(0.0d0)) :: a(*), f(n), x(n)
      REAL (kind(0.0d0)) :: tol
!-----------------------------------------------------------------------
!  Arguments
!  =========
!
!  N       (input) INTEGER. The dimension of the matrix.
!
!  A       (input/output) REAL (kind(0.0d0)). Numerical values
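Returning to the singular systems discussed in Section 5 above: when the null space is spanned by the vector of all ones on both sides (as for a pure Neumann problem), a simple safeguard against a returned solution dominated by null space components is to remove the mean value afterwards, and to check the residual explicitly. A minimal sketch (illustrative only, not part of AGMG; fsave is a hypothetical saved copy of the right-hand side, since dagmg overwrites f on output):

!     normalize the computed solution of a compatible singular system
!     whose null space is spanned by the vector of all ones
      x(1:n) = x(1:n) - sum(x(1:n))/n
!     explicit residual check, r = fsave - A*x, with A in CSR format
      do i=1,n
         r(i)=fsave(i)
         do k=ia(i),ia(i+1)-1
            r(i)=r(i)-a(k)*x(ja(k))
         end do
      end do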
influence on the software usage):

- The aggregation algorithm is no longer the one described in [7]; it has been exchanged for the one in [5] in the symmetric case, and for the one in [8] for general nonsymmetric matrices. Note that it is the user who tells whether the problem is to be treated as a symmetric one or not, via the parameter nrest; see Section 2.2.3 below. This is an internal change, i.e., it has no influence on the usage of the software. It makes the solver more robust and faster on average. We cannot, however, exclude that in some cases AGMG 3.0 is significantly slower than previous versions; this may in particular happen for problems that are outside the standard scope of application. Feel free to contact the author if you have such an example: AGMG is an ongoing project, and pointing out weaknesses of the new aggregation scheme can stimulate further progress.

- Another internal change is that the systems on the coarsest grid are now solved with the sequential MUMPS sparse direct solver [4]. Formerly, MUMPS was only used with the parallel version; using it also in the sequential case allows us to improve robustness (i.e., improved performance on difficult problems). The MUMPS library needs, however, not be installed when using the sequential version: we indeed provide an additional source file that contains the needed subroutines from the MUMPS library; see Section 1.3 and the README file for details.

- The meaning of parameter nrest
will be returned negative and equal to the opposite of the number of iterations performed.
!          Significant only if IJOB==0, 2, 10 or 12.
!
!  TOL     (input) REAL (kind(0.0d0)). The tolerance on residual norm.
!          Iterations are stopped whenever || A*x - f || <= TOL*|| f ||.
!          Should be positive and less than 1.0.
!          Significant only if IJOB==0, 2, 10 or 12.
!
!  MPI_COMM_AGMG (input) INTEGER. MPI communicator.
!
!  listrank, ifirstlistrank (input/output) INTEGER.
!          listrank contains the rank of the tasks to which the rows
!          corresponding to the nonlocal variables referenced in (A,JA,IA)
!          are assigned. Let Jmin and Jmax be, respectively, the smallest
!          and the largest index of a nonlocal variable referenced in
!          JA(1:IA(N+1)-1) (note that N < Jmin <= Jmax). listrank(i) will
!          be referenced if and only if Jmin <= i <= Jmax, and listrank(i)
!          should then be equal to the rank of the home task of i if i is
!          effectively present in JA(1:IA(N+1)-1), and equal to some
!          arbitrary NEGATIVE integer otherwise.
!          listrank and ifirstlistrank may be modified on output, according
!          to the possible modification of the indexes of the nonlocal
!          variables; that is, on output, listrank and ifirstlistrank still
!          carry the correct information about the nonlocal variables, but
!          for the matrix as defined in (A,JA,IA) on output.
!
!!!  Remark: except an insufficient number of iterations to achieve
!!!          convergence (characterized by a negative value returned
!!!          in ITER),
all other detected errors are fatal and lead to a STOP statement.

References

[1] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed., SIAM, 1999.

[2] S. C. Eisenstat, H. C. Elman, and M. H. Schultz, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983), pp. 345-357.

[3] G. Karypis, METIS software and documentation. Available online at http://glaros.dtc.umn.edu/gkhome/views/metis.

[4] MUMPS software and documentation. Available online at http://graal.ens-lyon.fr/MUMPS.

[5] A. Napov and Y. Notay, An algebraic multigrid method with guaranteed convergence rate, SIAM J. Sci. Comput., 34 (2012), pp. A1079-A1109.

[6] A. Napov and Y. Notay, Algebraic multigrid for moderate order finite elements, Tech. Rep. GANMN 13-02, Universite Libre de Bruxelles, Brussels, Belgium, 2013. Available online at http://homepages.ulb.ac.be/~ynotay.

[7] Y. Notay, An aggregation-based algebraic multigrid method, Electron. Trans. Numer. Anal., 37 (2010), pp. 123-146.

[8] Y. Notay, Aggregation-based algebraic multigrid for convection-diffusion equations, SIAM J. Sci. Comput., 34 (2012), pp. A2288-A2316.

[9] Y. Notay, Numerical comparison of
level; "cycle" is the cumulated number of inner iterations; and "mean" and "max" are, respectively, the average and the maximal number of inner iterations performed on each call. If the V-cycle formulation is enforced at some level (see [7, Section 3]), one will have cycle = call and mean = max = 1.

Finally, the cost of this solution phase is reported, again in terms of work units, but now per digit of accuracy. That is, how many digits of accuracy have been gained is computed as d = log10( ||r_0|| / ||r_f|| ), where r_0 and r_f are, respectively, the initial and the final residual vectors, and the total number of work units for the solution phase is divided by d, to get the mean work needed per digit of accuracy. The code also reports the memory used during this phase and the elapsed real time. Note that the report on memory usage does not take into account the memory internally allocated within the sparse direct solver used at the coarsest level, i.e., MUMPS.

2.5 Fine tuning

Some parameters referenced in [7, 5, 8] are assigned default values within the code for the sake of simplicity. However, expert users may tune these values, to improve performances and/or if they experiment convergence problems. This can be done by editing the module dagmg_mem at the top of the source file.

3 Parallel version

We assume that the previous section, about the sequential version, has been carefully read. Many features are common between both versions, such as most
parallel MUMPS library; see Section 1.3 and the README file for details.

- The smoother is now SOR, with automatic estimation of the relaxation parameter omega, instead of Gauss-Seidel (which corresponds to SOR with omega = 1). In many cases AGMG still uses omega = 1 as before, but this new feature provides additional robustness, especially in nonsymmetric cases.

- Default parameters for the parallel version have been modified, to improve scalability when using many processors. This may induce slight changes in the convergence, which should however not affect overall performances.

- Some compilation issues raised by the most recent versions of gfortran and ifort have been solved, as well as some compatibility issues with the latest releases of the MUMPS library.

New features from release 3.1.2:

- The printed output has been slightly changed: the work (number of floating point operations) is now reported in terms of work units per digit of accuracy; see Section 2.4 for details.

New features from release 3.1.1:

- A few bugs that sometimes prevented the compilation have been fixed; the code (especially the MUMPS-related routines) has also been cleaned, to avoid superfluous warning messages with some compilers.

- The parameters that control the aggregation are now automatically adapted in case of too slow coarsening.

- Some limited coverage of singular systems is now provided; see Section 5.

New features from release 3.0 (the first three have no
evaluation, to get the local number of work units. On the other hand, the work for the solution phase is the number of local floating point operations, again relative to the local cost of one residual evaluation. If one has about the same number on each task, it means that the initial load balancing (whether good or bad) has been well preserved by AGMG.

3.4 Tuning the parallel version

As already written in Section 2.5, the default parameters, defined in the module dagmgpar_mem at the very top of the source file, are easy to change. Perhaps such tuning is more useful in the parallel case than in the sequential one: here, performances not only depend on the application, but also on the computer; e.g., the parameters providing the smallest average solution time may depend on the communication speed.

Compared with the sequential version, by default the number of pairwise aggregation passes (npass) has been increased from 2 to 3. This favors more aggressive coarsening in general, at the price of some increase of the setup time; hence, on average, this slightly increases the sequential time, by roughly 10%. But this turns out to be often beneficial in parallel, because it reduces the time spent on small grids, where communications are relatively more costly and more difficult to overlap with computation. If, however, you experience convergence difficulties, it may be wise to restore the default npass = 2. On the other hand, you may try to obtain even more aggressive
solvers for linear systems from the discretization of scalar PDEs, 2012. http://homepages.ulb.ac.be/~ynotay/AGMG/numcompsolv.pdf.

[10] Y. Notay and A. Napov, A massively parallel solver for discrete Poisson-like problems, Tech. Rep. GANMN 14-01, Universite Libre de Bruxelles, Brussels, Belgium, 2014. Available online at http://homepages.ulb.ac.be/~ynotay.

[11] Y. Saad, SPARSKIT: a basic tool kit for sparse matrix computations, Tech. Rep., University of Minnesota, Minneapolis, 1994. http://www.users.cs.umn.edu/~saad/software/SPARSKIT/sparskit.html.

[12] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, PA, 2003.

[13] H. A. van der Vorst and C. Vuik, GMRESR: a family of nested GMRES methods, Numer. Linear Algebra Appl., 1 (1994), pp. 369-386.
number of passes and the target coarsening factor are the two remaining parameters described in these papers. In addition, nodes having large positive offdiagonal elements in their row or column are transferred unaggregated to the coarse grid, and AGMG prints the related threshold.

How the coarsening proceeds is then reported level by level. The matrix on the last level is factorized exactly, and the associated cost is reported in terms of work units; that is, AGMG reports the number of floating point operations needed for this factorization, divided by the number of floating point operations needed for one residual evaluation at the top level (computation of b - A*x for some b, x), which represents 1 work unit.

To summarize the setup, AGMG then reports on complexities. The grid complexity is the sum, over all levels, of the number of variables, divided by the matrix size; the operator complexity is the corresponding complexity relative to the number of nonzero entries, that is, the sum over all levels of the number of nonzero entries, divided by the number of nonzero entries in the input matrix; see [7, eq. (4.1)]. The theoretical weighted complexity reflects the cost of the preconditioner when two inner iterations are performed at each level (see [7], with gamma = 2, for a precise definition). The effective weighted complexity corrects the theoretical weighted complexity by taking into account that the V-cycle is enforced at some levels, according to the
!          Significant only if IJOB==0, 2, 3, 10 or 12.
!
!  X       (input/output) REAL (kind(0.0d0)). On input and if IJOB==10 or
!          IJOB==12: initial guess (for other values of IJOB, the default
!          is used: the zero vector). On output, the computed solution.
!
!  IJOB    (input) INTEGER. Tells AGMG what has to be done:
!            0: performs setup + solve + memory release, no initial guess
!           10: performs setup + solve + memory release, initial guess
!               in x(1:n)
!            1: performs setup only (preprocessing: prepares all
!               parameters for subsequent solves)
!            2: solves only, based on previous setup, no initial guess
!           12: solves only, based on previous setup, initial guess
!               in x(1:n)
!            3: the vector returned in x(1:n) is not the solution of the
!               linear system, but the result of the action of the
!               multigrid preconditioner on the right-hand side in f(1:n)
!           -1: erases the setup and releases internal memory
!
!!!  IJOB==2, 3, 12 require that one has previously called AGMG with
!!!  IJOB==1.
!
!!!  (change with respect to versions 2.x:)
!!!  The preconditioner defined when calling AGMG with IJOB==1 is entirely
!!!  kept in internal memory. Hence the arrays A, JA and IA are not
!!!  accessed upon subsequent calls with IJOB==3. Upon subsequent calls
!!!  with IJOB==2, 12, a matrix needs to be supplied in arrays A, JA, IA,
!!!  but it will be used only to perform the matrix-vector products within
!!!  the main iterative solution process. Hence the system is solved with
!!!  this matrix, which may differ from the matrix in
!     call agmg
!       argument 5 (ijob)  is 0 because we want a complete solve
!       argument 7 (nrest) is 1 because we want to use flexible CG
!                          (the matrix is symmetric positive definite)
      call dagmgpar(N,a,ja,ia,f,x,0,iprint,1,iter,tol, &
                    MPI_COMM_WORLD,listrank,ifirstlistrank)
!     uncomment the following to write the solution on disk for checking
!     filename(1:8)='sol.out_'
!     write(filename(9:10),'(i2.2)') IRANK    ! processor dependent
!     open(11,file=filename,form='formatted')
!     write(11,*) x(1:N)
!     close(11)
      END program example_par

      subroutine uni2dstrip(mx,my,f,a,ja,ia,IRANK,NPROC,listrank, &
                            ifirstlistrank)
!
!     Fill a matrix in CSR format corresponding to a constant coefficient
!     five-point stencil on a rectangular grid;
!     the bottom boundary is an internal boundary if IRANK > 0, and
!     the top boundary is an internal boundary if IRANK < NPROC-1
!
      implicit none
      real (kind(0d0)) :: a(*),f(*)
      integer :: mx,my,ia(*),ja(*),ifirstlistrank
      integer :: listrank(ifirstlistrank:*)
      integer :: IRANK,NPROC,k,l,i,j
      real (kind(0d0)), parameter :: zero=0.0d0,cx=1.0d0,cy=1.0d0,cd=4.0d0
      k=0
      l=0
      ia(1)=1
      do i=1,my
         do j=1,mx
            k=k+1
            l=l+1
            a(l)=cd
            ja(l)=k
            f(k)=zero
            if (j < mx) then
               l=l+1
               a(l)=-cx
               ja(l)=k+1
            else
               f(k)=f(k)+cx
            end if
            if (i < my) then
               l=l+1
               a(l)=-cy
               ja(l)=k+mx
            else if (IRANK == NPROC-1) then
               f(k)=f(k)+cy               ! real boundary
            else                          ! internal boundary (top)
               l=l+1
               a(l)=-cy
  2.5 Fine tuning
3 Parallel version
  3.1 Basic usage and matrix format
  3.2 Example
  3.3 Printed output
  3.4 Tuning the parallel version
4 Running several instances of AGMG
5 Solving singular systems
A Listing of DAGMG
B Listing of AGMGPAR
References

1 Introduction

The software package provides subroutines, written in FORTRAN 90, which implement the method described in [7], with further improvements from [5, 8]. The main driver subroutine is AGMG for the sequential version, and AGMGPAR for the parallel version. Each driver is available in double precision (prefix D), double complex (prefix Z), single precision (prefix S) and single complex (prefix C) arithmetic. These subroutines are to be called from an application program, in which are defined the system matrix and the right-hand side of the linear system to be solved.

A Matlab interface is also provided; that is, an m-file agmg.m implementing a Matlab function agmg that allows one to call the sequential AGMG from the Matlab environment.

This guide is directed towards the use of the FORTRAN version; however, Section 2.2 on additional parameters, Section 2.4 on printed output, and Sections 4 & 5 about some special usages apply to the Matlab version as well.
solve only, or for just one application of the multigrid preconditioner (that is, a call to Algorithm 3.2 in [7] at top level), to be exploited in a more complex fashion by the calling program. ijob can also be used to tell AGMG to work with the transpose of the input matrix. Details are as follows (see also the comments in the subroutine listing in Appendix A).

  ijob   usage (or remark)
  ----   -----------------
   0     performs setup + solve + memory release, no initial guess
  10     performs setup + solve + memory release, initial guess in x
   1     performs setup only (preprocessing: prepares all parameters
         for subsequent solves)
   2     solves only, based on previous setup, no initial guess
  12     solves only, based on previous setup, initial guess in x
   3     the vector returned in x is not the solution of the linear
         system, but the result of the action of the multigrid
         preconditioner on the right-hand side in f
  -1     erases the setup and releases internal memory
 100, 110, 101, 102, 112
         same as, respectively, 0, 10, 1, 2, 12, but use the TRANSPOSE
         of the input matrix
 2, 3, 12, 102, 112
         require that one has previously called AGMG with ijob = 1 or
         ijob = 101
 1, 101  the preconditioner defined when calling AGMG is entirely kept
         in internal memory, hence
   3     the input matrix is not accessed
 2, 12, 102, 112
         the input matrix is used only to perform the matrix-vector
         products within the main iterative solution process; it means
         that, upon calls with ijob = 2, 12, 102, 112, the input matrix may
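For instance, ijob = 3 makes it possible to use AGMG as a preconditioner inside a user-written Krylov loop; a minimal sketch (r and z are illustrative names for the current residual and the preconditioned residual):

      call dagmg(N,a,ja,ia,f,x,1,iprint,1,iter,tol)   ! ijob=1: setup only
!     ... then, inside the user's own iterative method:
      call dagmg(N,a,ja,ia,r,z,3,iprint,1,iter,tol)   ! z = action of the AMG
!                                                       preconditioner on r
!     (with ijob=3 the matrix arrays are not accessed, and iter and tol
!      are not significant)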
routines from LAPACK and BLAS together with the package. Alternatively, LAPACK may be downloaded from http://www.netlib.org/lapack, and BLAS may be downloaded from http://www.netlib.org/blas. Note that these software are free of use, but copyrighted; it is required the following: "If you modify the source for a routine, we ask that you change the name of the routine and comment the changes made to the original." This, of course, applies to the routines we redistribute together with AGMG.

In addition, the parallel version requires the installation of the parallel MUMPS library [4], which is also public domain, and may be downloaded from http://graal.ens-lyon.fr/MUMPS.

(Professional version only) The parallel MUMPS library needs not be installed when using the variant provided in the source files whose name contains "NM" (for "No MUMPS"); see the README file for details. Indeed, this variant uses only a sequential version of MUMPS (actually, the same that is used by the sequential version of AGMG). Hence, what is written below for the sequential AGMG applies verbatim to this variant of the parallel version. Note that the performances of both variants may differ; as a general rule, the MUMPS-free variant is expected to be better, and even much better above 1000 cores [10]. Feel free to contact us for advice if you seemingly face scalability issues with that version.

Despite the sequential version of AGMG also uses MUMPS, it does not
zero, referencing an external variable corresponding to i.

CONSISTENCY OF LOCAL ORDERINGS. Besides the condition that they are larger than N, the indexes of nonlocal variables may be chosen arbitrarily, providing that their ordering is consistent with the local ordering on their "home" task (the task to which the corresponding row is assigned). That is, if A_ij and A_il are both present and such that j, l > N, and if, further, j and l have the same home task (as specified in listrank, see below), then one should have j < l if and only if, on their home task, the variable corresponding to j has a lower index than the variable corresponding to l.

These constraints on the input matrix should not be difficult to meet in practice. Thanks to them, AGMG may set up the parallel solution process with minimal additional information. In fact, AGMG has only to know what is the rank of the home task of each referenced nonlocal variable. This information is supplied in the input vector listrank. Let jmin and jmax be, respectively, the smallest and the largest index of a nonlocal variable referenced in ja (N < jmin <= jmax). Only the entries listrank(j) for jmin <= j <= jmax will be referenced and need to be defined. If j is effectively present in ja, listrank(j) should be equal to the rank of the home task of the row corresponding to j; otherwise, listrank(j) should be equal to an arbitrary nonpositive integer. listrank is declared in AGMGPAR as listrank(
located in a(k) and ja(k) for k = ia(i), ..., ia(i+1)-1. ia must have length at least N+1, and ia(N+1) must be defined in such a way that the above rule also works for i = N; that is, the last valid entry in arrays a, ja must correspond to index k = ia(N+1)-1.

By way of illustration, consider the matrix

        ( 10        1        2 )
        (     11        3      )
    A = (  4       12        5 )
        (              13      )
        (      8    9       14 )

The compressed sparse row format of A is given (non-uniquely) by the following three vectors:

    a  = (10., 1., 2., 11., 3., 4., 12., 5., 13., 14., 9., 8.)
    ja = (1, 3, 5, 2, 4, 1, 3, 5, 4, 5, 3, 2)
    ia = (1, 4, 6, 9, 10, 13)

As the example illustrates (see how the last row of A is stored), the entries in a same row may occur in any order. AGMG performs a partial reordering inside each row, so that, on output, a and ja are possibly different than on input; they nevertheless still represent the same matrix. Note that the SPARSKIT package (http://www.users.cs.umn.edu/~saad) contains subroutines to convert various sparse matrix formats to the CSR format.

2.2 Additional parameters

The other parameters to be supplied are ijob, iprint, nrest, iter and tol.

2.2.1 ijob

ijob has default value 0. With this value, the linear system will be solved in the usual way, with the zero vector as initial approximation; this implies a setup phase, followed by an iterative solution phase [7]. Non-default values allow one to tell AGMG that an initial approximation is given in x, and/or to call AGMG for setup only, for
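To make the format concrete, here is how a matrix-vector product y = A*x reads with these three vectors (an illustrative loop, not part of AGMG):

!     y = A*x for a matrix stored in CSR format (a, ja, ia as above)
      do i=1,N
         y(i)=0.0d0
         do k=ia(i),ia(i+1)-1
            y(i)=y(i)+a(k)*x(ja(k))
         end do
      end do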
strategy described in [7, Section 3]. This allows one to better control the complexity in cases where the coarsening is slow. In most cases, the coarsening is fast enough and both weighted complexities are equal; but, when they differ, only the effective weighted complexity reflects the cost of the preconditioner as defined in AGMG.

Memory usage is reported in terms of real(8) (double precision) words per nonzero entry in the input matrix; the absolute value is also given in Mbytes. Note that AGMG reports the peak memory, that is, the maximal amount used at any stage of the setup; this memory is dynamically allocated within AGMG. Eventually, AGMG reports on the real time elapsed during this setup phase.

Next, one enters the solution phase. AGMG informs about the used iterative method (as defined via the input parameter nrest), the used smoother, and the number of smoothing steps (which may also be tuned by expert users). How the convergence proceeds is then reported iteration by iteration, with an indication of both the residual norm and the relative residual norm (i.e., the residual norm divided by the norm of the right-hand side). Note that the values reported for Iter = 0 correspond to initial values (nothing done yet). When the iterative method is GCR, AGMG also reports on how many restarts have been performed.

Upon completion, AGMG reports, for each intermediate level, statistics about inner iterations: "call" is the number of times one entered this
the computed solution.
!
!  IJOB    (input) INTEGER. Tells AGMG what has to be done:
!            0: performs setup + solve + memory release, no initial guess
!           10: performs setup + solve + memory release, initial guess
!               in x(1:n)
!            1: performs setup only (preprocessing: prepares all
!               parameters for subsequent solves)
!            2: solves only, based on previous setup, no initial guess
!           12: solves only, based on previous setup, initial guess
!               in x(1:n)
!            3: the vector returned in x(1:n) is not the solution of the
!               linear system, but the result of the action of the
!               multigrid preconditioner on the right-hand side in f(1:n)
!           -1: erases the setup and releases internal memory
!          IJOB == 100, 110, 101, 102, 112: same as, respectively,
!               IJOB == 0, 10, 1, 2, 12, but use the TRANSPOSE of the
!               input matrix in (A,JA,IA).
!
!!!  IJOB==2, 3, 12, 102, 112 require that one has previously called AGMG
!!!  with IJOB==1 or IJOB==101.
!
!!!  (change with respect to versions 2.x:)
!!!  The preconditioner defined when calling AGMG with IJOB==1 or
!!!  IJOB==101 is entirely kept in internal memory. Hence the arrays A,
!!!  JA and IA are not accessed upon subsequent calls with IJOB==3. Upon
!!!  subsequent calls with IJOB==2, 12, 102, 112, a matrix needs to be
!!!  supplied in arrays A, JA, IA, but it will be used only to perform
!!!  the matrix-vector products within the main iterative solution
!!!  process. Hence the system is solved with this matrix, which may
!!!  differ from the matrix in (A,JA,IA) that was supplied upon the
!!!  previous call with
this communicator; by default, it is MPI_COMM_WORLD. All instances (or "tasks") sharing this MPI communicator must call simultaneously the main driver AGMGPAR, as follows:

      call dagmgpar(N,a,ja,ia,f,x,ijob,iprint,nrest,iter,tol, &
                    MPI_COMM,listrank,ifirstlistrank)

Each task should supply only a portion of the matrix rows in arrays a, ja, ia; EACH ROW SHOULD BE REFERENCED IN ONE AND ONLY ONE TASK. N is the number of local rows.

LOCAL ORDERING. Local rows (and thus local variables) should have numbers 1, ..., N. If a global ordering is in use, a conversion routine should be called before AGMGPAR (see the SPARSKIT package (http://www.users.cs.umn.edu/~saad) for the handling of matrices in CSR format, for instance for extracting a block of rows).

NONLOCAL CONNECTIONS. Offdiagonal entries present in the local rows but connecting with nonlocal variables are to be referenced in the usual way; however, the corresponding column indices must be larger than N. The matrix supplied to AGMGPAR is thus formally a rectangular matrix with N rows, and an entry A_ij with 1 <= i <= N corresponds to a local connection if j <= N, and to an external connection if j > N.

IMPORTANT RESTRICTION. The global matrix must be structurally symmetric with respect to nonlocal connections. That is, if A_ij corresponds to an external connection, the local row corresponding to j (whatever the task to which it is assigned) should also contain an offdiagonal entry, possibly
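A generic way to fill listrank, whatever the numbering chosen for the nonlocal variables, is sketched below; home_task_of stands for whatever application-dependent information identifies the owner of a referenced row (it is a hypothetical function, not an AGMG routine):

!     initialize the referenced range with an arbitrary nonpositive value
!     (holes), then record the home task of each nonlocal index actually
!     present in ja
      listrank(jmin:jmax) = -1
      do k=1,ia(N+1)-1
         j=ja(k)
         if (j > N) listrank(j) = home_task_of(j)   ! hypothetical function
      end do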
!              these external nodes are given the numbers
!              mx*my+1, ..., mx*(my+1)
               ja(l)=k+mx
               listrank(k+mx)=IRANK+1
!              thus listrank(mx*my+1:mx*(my+1))=IRANK+1
            end if
            if (j > 1) then
               l=l+1
               a(l)=-cx
               ja(l)=k-1
            else
               f(k)=f(k)+cx
            end if
            if (i > 1) then
               l=l+1
               a(l)=-cy
               ja(l)=k-mx
            else if (IRANK == 0) then
               f(k)=f(k)+cy               ! real boundary
            else                          ! internal boundary (bottom)
               l=l+1
               a(l)=-cy
!              these external nodes are given the numbers
!              mx*(my+1)+1, ..., mx*(my+2)
               ja(l)=k+mx*(my+1)
               listrank(k+mx*(my+1))=IRANK-1
!              thus listrank(mx*(my+1)+1:mx*(my+2))=IRANK-1
            end if
            ia(k+1)=l+1
         end do
      end do
      return
      end subroutine uni2dstrip

3.3 Printed output

Running the above example with 4 tasks, task 0 produces the following output.

 ****  ENTERING AGMG **********************************************
 ****  Global number of unknowns:   998001
 *  0* Number of local rows:   249750
 ****  Global number of nonzeros:  4986009 (per row:  5.00)
 *  0* Nonzeros in local rows:  1247251 (per row:  4.99)
 ****  SETUP: Coarsening by multiple pairwise aggregation
 ****  Rmk: Setup performed assuming the matrix symmetric
 ****       Quality threshold (BlockD): 8.00 ;  Strong diag. dom. trs: 1.29
 ****       Maximal number of passes: 3 ;  Target coarsening factor: 8.00
 ****       Threshold for rows with large pos. offdiag.: 0.45
 ****  Level:  2
 ****  Global number of variables:   124563 (reduction ratio:  8.01)
 *  0* Number of local rows:    31249 (reduction ratio:  7.99)
 ****  Global number of nonzeros:   622693 (per row:  5.0 ; red. ratio:  8.01)
