FSAIPACK: User's guide
[15] Kolotilina, L. Yu. and Yeremin, A. Yu. (1993). Factorized sparse approximate inverse preconditionings I: Theory. SIAM J. Matrix Anal. Appl., 14, 45-58.
[16] Kolotilina, L. Yu. and Yeremin, A. Yu. (1999). Factorized sparse approximate inverse preconditioning IV: Simple approaches to rising efficiency. Numerical Linear Algebra with Applications, 6, 515-531.
[17] OpenMP website, http://www.openmp.org.
[18] Saad, Y. (1994). ILUT: a dual threshold incomplete LU factorization. Numerical Linear Algebra with Applications, 1, 387-402.
[19] Saad, Y. (2003). Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia, PA.
[20] Saad, Y. and Suchomel, B. (2002). ARMS: an algebraic recursive multilevel solver for general sparse linear systems. Numerical Linear Algebra with Applications, 9, 359-378.
[21] van der Sluis, A. (1969). Condition numbers and equilibration of matrices. Numerische Mathematik, 14, 14-23.
[22] Wang, K. and Zhang, J. (2003). MSP: a class of parallel multistep successive sparse approximate inverse preconditioning strategies. SIAM J. Sci. Comput., 24, 1141-1156.
with τ another user-specified parameter. When either the ratio (25) has been reduced below the prescribed tolerance ε or a maximum number of iterations k_iter has been performed, the adaptive procedure is stopped and each row G[i] is properly scaled back to ensure that

    [G A G^T]_{ii} = 1.                                                 (27)

A summary of the above adaptive procedure, as implemented in FSAIPACK, is provided in Algorithm 3. As to the computational cost, each iteration requires the same operations as Algorithm 1 plus the computation of ∇ψ_{k,i}. If we denote by m_{G,i} the number of non-zeroes of the i-th row of Ĝ_k, the cost for computing ∇ψ_{k,i} is on average proportional to m_{G,i} m_A, while the costs for gathering and solving the dense linear subsystem are proportional to m_{G,i}^2 and m_{G,i}^3, respectively, as already pointed out previously. After k_iter iterations of the above procedure, starting from Ĝ_0 = I with τ = 0, the cost c_i for computing the i-th row is

    c_i ∝ Σ_{k=1}^{k_iter} [ m_A s k + (s k)^2 + (s k)^3 ]
        = α_0 + α_1 k_iter + α_2 k_iter^2 + α_3 k_iter^3 + α_4 k_iter^4,    (28)

where α_i, i = 0, ..., 4, are five scalar coefficients depending on s. For k_iter large enough, i.e. approximately k_iter > sqrt(m_A), the overall cost of the adaptive procedure turns out to be proportional to the fourth power of the preconditioner density:

    c ∝ Σ_{i=1}^{n} k_iter^4 ∝ n m_A^4 μ_G^4.                           (29)

The large computational cost has to be compensated for by a faster PCG convergence.

ALGORITHM 3 - ADAPT_FSAI: Dynamic FSAI with adaptive pattern
APPEND_LEFT and APPEND_RIGHT, that return a FSAIP_Prec variable.

4 FSAIPACK Command Language

FSAIPACK allows for the different algorithms to be combined in order to optimize the preconditioner performance. For instance, an initial pattern created by the MK_patt function can be improved by a few steps performed with either the adaptive or the projection FSAI procedure, potentially obtaining a better preconditioner than that computed using a static or a dynamic technique only.

Table 2. Summary of the functions directly accessible at the user level, with the input and output parameters.
Function | Input parameters | Output parameters
MK_patt | CSRMAT A; integer k; double τ, μ_min, μ_max; Pattern patt (optional) | Pattern patt
CPT_static_FSAI | CSRMAT A; Pattern patt | CSRMAT G
CPT_adaptive_FSAI | CSRMAT A; integer k_iter, s; double τ, ε; CSRMAT G_0 (optional) | CSRMAT G
CPT_projection_FSAI | CSRMAT A; integer k_iter, m_max; double τ, ε; CSRMAT G_0, G_P, G_P^T (optional) | CSRMAT G
CPT_preconditioned_Matrix | CSRMAT A, G, G^T; integer m_max; double τ | CSRMAT B
FILTER_FSAI | CSRMAT A, G; integer m_max; double τ | CSRMAT G
TRANSPOSE_FSAI | CSRMAT G | CSRMAT G^T
APPEND_LEFT | CSRMAT G | FSAIP_Prec FSAI
APPEND_RIGHT | CSRMAT G^T | FSAIP_Prec FSAI

Table 3. First non-blank character of each line type.
Type | Character
Comment |
Command | >
Data | any number
July 2013

Contents
1 Introduction
2 Theoretical background
  2.1 Static FSAI
  2.2 Static pattern generation
  2.3 Dynamic FSAI computation with adaptive pattern generation
  2.4 Fully iterative FSAI construction
  2.5 FSAI post-filtration
  2.6 Recurrent FSAI computation
3 Package description
4 FSAIPACK Command Language
  4.1 Syntax rules
  4.2 Examples

1 Introduction

The parallel solution to large sparse linear systems of equations

    A x = b,                                                            (1)

where A ∈ R^{n×n} and x, b ∈ R^n, is a central task in computational science and engineering. While iterative methods based on Krylov subspaces are well suited for an effective implementation on High Performance Computing (HPC) platforms, most preconditioning techniques are not; for a recent thorough review see e.g. [6]. A well-known preconditioner for Symmetric Positive Definite (SPD) linear systems is the Factorized Sparse Approximate Inverse (FSAI). The native FSAI algorithm was originally introduced by [15] about 20 years ago, with the aim of computing an explicit sparse approximation of the inverse of the exact lower Cholesky factor of A by means of a Frobenius norm minimization procedure. FSAI is a robust tool endowed with attractive theoretical properties, such as its optimality for a given sparsity
way, thus thoroughly exploiting the full potential of the preconditioner. The combination of different strategies can be managed by direct library calls or by the FSAIPACK Command Language, which provides the user with great flexibility and ease of use. The strategy can be defined by a sequence of commands in ASCII format that can be interpreted by the driver. The commands follow a number of syntax rules that must be complied with for a successful run. FSAIPACK is programmed in Fortran90, using OpenMP directives for shared memory parallel computations.

2 Theoretical background

The FSAI preconditioner M^{-1} for SPD matrices is defined as the product

    M^{-1} = G^T G,                                                     (2)

where G is an approximation of the inverse of the exact lower Cholesky factor L of A. Theoretically, FSAI is computed by minimizing the Frobenius norm

    ‖ I - G L ‖_F                                                       (3)

over all G ∈ W_S, where W_S is the set of matrices with the lower triangular non-zero pattern S. In contrast to other inverse ILU techniques [2], the computation of G can be performed without knowing L explicitly. Differentiating (3) with respect to the entries g_ij of G and setting to zero yields

    [G A]_{ij} = [L^T]_{ij},   ∀ (i,j) ∈ S.                             (4)

Since L^T is upper triangular while S is a lower triangular pattern, the matrix equation (4) can be rewritten as

    [G A]_{ij} = l_{ii} δ_{ij},   ∀ (i,j) ∈ S.                          (5)

Since L is unknown, G is not available from (5). Instead, we compute G̃ such that

    [G̃ A]_{ij} = δ_{ij},   ∀ (i,j) ∈ S,                                (6)

where δ_{ij} is the Kronecker delta. The FSAI factor G is finally
be obtained with k steps of the recurrence

    B_k = Low(B_{k-1} Ã),                                               (16)

starting from B_0 = I. In equation (16), Low(·) denotes the lower triangular part of a matrix. The non-zero pattern computed according to equation (16) can become quite dense even for relatively small values of k. Hence, a safety parameter μ_max is introduced that provides an upper limit to the preconditioner density. Whenever μ_max is reached, the procedure (16) is stopped. On the other hand, in order to include in S also entries belonging to high powers of A, a pre-filtration of A may be recommended [5]. In the FSAIPACK implementation, a sparsified matrix Ã is computed by dropping all the entries a_ij of A such that

    |a_ij| < τ sqrt(a_ii a_jj),                                         (17)

with τ a user-specified real parameter. Setting the optimal value for τ is strongly problem-dependent and may be a difficult task if the coefficients of A lie in a narrow interval. In particular, a large τ value may cause the loss of too many entries. For this reason, the parameter μ_min, which enforces a minimal density for Ã, is also introduced. The pre-filtration process described above has a linear complexity, hence repeating the procedure a few times does not impact significantly on the overall setup cost. Algorithm 2 describes the procedure used to generate S in FSAIPACK, where the function Drop summarizes the pre-filtration process of equation (17).

2.3 Dynamic FSAI computation with adaptive pattern generation

Adaptive procedures have
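The dropping rule (17) fits in a few lines. The following Python fragment is a sketch only (FSAIPACK itself is Fortran90 with OpenMP, and works on sparse CSR storage rather than the dense array used here for clarity):

```python
from math import sqrt

def prefilter(A, tau):
    """Sketch of rule (17): drop a_ij whenever |a_ij| < tau*sqrt(a_ii*a_jj).
    Diagonal entries are always kept. A is a dense list-of-lists matrix."""
    n = len(A)
    At = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or abs(A[i][j]) >= tau * sqrt(A[i][i] * A[j][j]):
                At[i][j] = A[i][j]
    return At
```

Note that the threshold is relative to the diagonal scaling of the matrix, so the rule is invariant under symmetric diagonal rescaling of A.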
is

    κ = det(A)^{-1/n} ∏_{i=1}^{n} ψ_{0,i}^{1/n}.                        (23)

The objective is to find an augmented pattern S_a allowing for a reduction of κ in equation (23). The idea is to compute the gradient of κ with respect to Ĝ_0 and add to S̃ the positions corresponding to its largest components. Notice that each ψ_{0,i} in (23) depends on the i-th row of Ĝ_0 only, so the adaptive pattern generation can be efficiently carried out in parallel. Denote by ∇ψ_{0,i} the gradient of ψ_{0,i} with respect to the components of Ĝ_0[i]. Its components read

    [∇ψ_{0,i}]_j = ∂ψ_{0,i}/∂ĝ_{0,ij} = 2 Σ_{r=1}^{n} a_jr ĝ_{0,ir},   j = 1, ..., i-1,    (24)

with ĝ_{0,ir} the coefficients of Ĝ_0[i], i.e. ∇ψ_{0,i} can be computed at the cost of a sparse matrix-by-vector product. After computing ∇ψ_{0,i} using equation (24), the non-zero pattern P̃_i is enlarged by adding the s positions corresponding to the largest entries of ∇ψ_{0,i}, thus obtaining P̃_i^{(1)}, and then ĝ_{1,i} from equation (19). The above procedure can be repeated k times to find all the rows of Ĝ_1, Ĝ_2, ..., Ĝ_k. The current preconditioner quality can be monitored quite inexpensively by computing the value of ψ_{k,i}. Therefore, it is possible to stop the adaptive procedure for every row whenever a relative exit tolerance is met:

    ψ_{k,i} / ψ_{0,i} = ( Ĝ_k[i] A Ĝ_k[i]^T ) / ( Ĝ_0[i] A Ĝ_0[i]^T ) < ε,    (25)

with ε a user-specified parameter. To reduce the computational cost of each iteration, it is also possible to enforce some dropping by neglecting those entries ĝ_{k+1,ij} such that

    |ĝ_{k+1,ij}| < τ ‖ĝ_{k+1,i}‖,                                       (26)
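The gradient (24) and the subsequent enlargement of the row pattern can be sketched as follows. This is an illustration only (dense storage, pure Python; FSAIPACK performs the same operations on sparse data structures), and the helper names are hypothetical:

```python
def grad_psi(A, g_row, i):
    """Sketch of equation (24): the gradient of psi_i = G0[i] A G0[i]^T
    with respect to the off-diagonal entries of row i is 2*(A g)_j, j < i."""
    n = len(A)
    return [2.0 * sum(A[j][r] * g_row[r] for r in range(n))
            for j in range(i)]

def enlarge_pattern(A, g_row, i, P, s):
    """Add to the pattern P of row i the s positions (among j < i, not yet
    in P) carrying the largest gradient components in absolute value."""
    v = grad_psi(A, g_row, i)
    cand = sorted((j for j in range(i) if j not in P),
                  key=lambda j: -abs(v[j]))
    return sorted(P | set(cand[:s]))
```

For instance, starting from the diagonal pattern of row i, the candidate column with the largest |2 (A g)_j| is appended first, which is exactly the greedy choice that reduces the Kaporin number (23) the most at first order.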
been developed to generate S dynamically during the FSAI computation, with the aim of selecting the best positions for each row [8,10,12]. In FSAIPACK we use an approach similar to the one proposed in [12] for Block FSAI preconditioning; the use of this algorithm in the context of FSAI is new. Suppose that an initial guess G_0 for G has already been computed and call S̃ its non-zero pattern. The default initial guess is G_0 = [diag(A)]^{-1/2}, with S̃ the diagonal pattern; however, any other choice is acceptable. Then write G_0 as

    G_0 = D_0 Ĝ_0,                                                      (18)

where D_0 = diag(G_0), i.e. the diagonal entries of Ĝ_0 are unitary. It can be proved [12] that the entries of Ĝ_0 can be directly computed by solving the dense linear systems

    A[P̃_i, P̃_i] ĝ_{0,i} = -A[P̃_i, i],   i = 1, 2, ..., n,             (19)

where P̃_i = { j : (i,j) ∈ S̃, j ≠ i } and ĝ_{0,i} is the vector collecting the non-zero off-diagonal terms in the i-th row of Ĝ_0. Consider the Kaporin conditioning number of the preconditioned matrix G_0 A G_0^T:

    κ(G_0 A G_0^T) = [ tr(G_0 A G_0^T)/n ] / det(G_0 A G_0^T)^{1/n}
                   = [ tr(D_0 Ĝ_0 A Ĝ_0^T D_0)/n ] / det(D_0 Ĝ_0 A Ĝ_0^T D_0)^{1/n}.    (20)

Recalling that G_0 A G_0^T has unitary diagonal entries and that det(Ĝ_0) = det(Ĝ_0^T) = 1, it follows that

    κ = 1 / [ det(A) det(D_0)^2 ]^{1/n} = [ det(diag(Ĝ_0 A Ĝ_0^T)) / det(A) ]^{1/n}.    (21)

Denoting by Ĝ_0[i] the i-th row of Ĝ_0, the numerator of equation (21) reads

    det(diag(Ĝ_0 A Ĝ_0^T)) = ∏_{i=1}^{n} Ĝ_0[i] A Ĝ_0[i]^T = ∏_{i=1}^{n} ψ_{0,i},    (22)

with ψ_{0,i} = Ĝ_0[i] A Ĝ_0[i]^T. Hence, the Kaporin conditioning number of the preconditioned matrix
at the end of a command or a data type line. A command line is a string with at most 100 characters, containing, in order, the command marker ">", a keyword, the input object identifier(s), the output object identifier and the parameter identifier(s). The keyword is the identifier of a function implemented in the FSAIPACK library. The input and output fields contain the identifiers of the CSRMAT objects used as input and output of the function identified by the keyword. A function can have more than one input object, the input identifiers being separated by a dedicated separator symbol, while all functions have only one output object; another symbol acts as the separator between the input and the output objects. All functions depend on a set of user-specified parameters; if the user does not specify a parameter value, a default is automatically assumed. The parameters set by the user are identified by a dedicated symbol followed by the character associated with the parameter. Each command type line must be followed by a number of data type lines equal to the number of parameters set by the user. Each data type line contains a numerical value that is assigned to a parameter according to the order of the parameters declared in the command type line. Blank characters can be inserted anywhere in a command or data type line and are discarded by the interpreter. The available keywords, the required input and output objects, the user-specified parameters and the default values are summarized in Table 4. The symbols A and B denote CSRMAT type variables, patt a non-zero pattern included in a CSRMAT type
effects on the PCG convergence rate, an a posteriori sparsification of G can be performed. This technique, introduced in [16], is called post-filtration and proceeds as follows. Assume that G has already been computed. The off-diagonal non-zero entries of the i-th row, stored in the dense vector g_i, are processed using a dual threshold strategy similar to the one suggested in [18] in the context of ILU factorizations. Two user-specified parameters, m_max and τ, are needed, such that only the largest m_max off-diagonal entries of g_i with an absolute value larger than τ ‖g_i‖_2 are retained. Each vector g_i is therefore decomposed as

    g_i = ḡ_i + ε_i,                                                    (35)

where ḡ_i and ε_i collect the sparsified row and the discarded entries, respectively. All vectors ḡ_i, i = 1, ..., n, form the matrix Ḡ which, however, no longer satisfies the condition that the preconditioned matrix Ḡ A Ḡ^T has unitary diagonal entries. This condition helps accelerate the PCG convergence [21]; hence, to restore it, the actual post-filtered FSAI factor G̃ is defined by scaling Ḡ with the aid of the diagonal matrix D:

    G̃ = D Ḡ.                                                           (36)

Let us define the set E_i collecting the column indices of the discarded entries,

    E_i = { j : ε_ij ≠ 0 }.                                             (37)

Then the coefficients of D can be computed as

    d_i = [ 1 - ε_i^T A[E_i, P_i] (g_i + ḡ_i) ]^{-1/2}.                 (38)

The post-filtration procedure is summarized in Algorithm 5. As to the computational cost, filtering a row with m_i non-zeroes and selecting the largest entries have a linear and a superlinear complexity,
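The post-filtration (35)-(38) of a single row can be sketched as follows. This is a dense, pure-Python illustration only; the rescaling coefficient d_i is computed here directly from the quadratic form ḡ_i A ḡ_i^T, which is algebraically equivalent to the gathered form (38):

```python
from math import sqrt

def post_filter_row(A, g, i, m_max, tau):
    """Dual-threshold filtering of row i of G: keep at most m_max
    off-diagonal entries with |g_ij| >= tau*||g_i||_2, then rescale so
    that the filtered row satisfies (d*gbar) A (d*gbar)^T = 1."""
    n = len(A)
    norm = sqrt(sum(x * x for x in g))
    off = [(abs(g[j]), j) for j in range(n) if j != i and g[j] != 0.0]
    keep = {j for v, j in sorted(off, reverse=True)[:m_max]
            if v >= tau * norm}
    gbar = [g[j] if (j == i or j in keep) else 0.0 for j in range(n)]
    q = sum(gbar[r] * A[r][c] * gbar[c]        # gbar A gbar^T
            for r in range(n) for c in range(n))
    d = 1.0 / sqrt(q)                          # restores unit diagonal
    return [d * x for x in gbar]
```

Whatever entries are discarded, the returned row always yields a unit diagonal entry in G̃ A G̃^T, which is the property that equation (38) is designed to restore.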
S are described in the sequel.

ALGORITHM 1 - STATIC_FSAI: FSAI computation from a given static non-zero pattern.
Input: Matrix A, the non-zero pattern S
Output: Matrix G
for i = 1, n do
    Gather A[P_i, P_i] from A
    Solve A[P_i, P_i] g̃_i = e_{m_i}
    g_i = g̃_i / sqrt([g̃_i]_{m_i})
    Scatter the coefficients of g_i into G[i, P_i]
end

2.1 Static FSAI

Assume that the non-zero pattern S is given. Denote by P_i the set of column indices belonging to the i-th row of S,

    P_i = { j : (i,j) ∈ S },                                            (10)

and by A[P_i, P_i] the submatrix of A collecting the entries a_kl such that k, l ∈ P_i. Let us indicate with g_i and g̃_i the dense vectors containing the non-zero coefficients of the i-th row of G and G̃, respectively. Using this notation, equation (6) can be rewritten as

    A[P_i, P_i] g̃_i = e_{m_i},   i = 1, 2, ..., n,                      (11)

where e_{m_i} is the m_i-th vector of the canonical basis of R^{m_i}. The row g_i of G is obtained by scaling g̃_i with the square root of its last entry, g_i = g̃_i / sqrt([g̃_i]_{m_i}). The procedure for computing a static FSAI for a given non-zero pattern S is summarized in Algorithm 1. From a computational viewpoint, for a generic row i the most expensive tasks in Algorithm 1 are gathering A[P_i, P_i] from A and solving the related linear system, with numerical complexities proportional to m_i^2 and m_i^3, respectively. In contrast, the tasks of scaling g̃_i and scattering the coefficients of g_i into G have a linear complexity in m_i, hence their cost is negligible. Let us define the preconditioner
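Algorithm 1 can be sketched compactly. The following Python fragment is an illustration only (FSAIPACK itself is Fortran90 with OpenMP and uses sparse CSR storage and optimized dense kernels; here a naive Gaussian elimination and dense lists stand in for both):

```python
from math import sqrt

def solve_dense(M, b):
    """Tiny Gaussian elimination with partial pivoting, standing in for
    the dense solver applied to the gathered system A[P_i,P_i] g = e_m."""
    n = len(b)
    M = [row[:] for row in M]; b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]; b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n):
                M[r][c] -= f * M[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / M[k][k]
    return x

def static_fsai(A, P):
    """Algorithm 1 sketch: P[i] is the sorted list of column indices of
    row i of the lower triangular pattern S (it must contain i itself)."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        idx = P[i]                               # gather A[P_i, P_i]
        sub = [[A[r][c] for c in idx] for r in idx]
        e = [0.0] * len(idx); e[-1] = 1.0        # right-hand side e_m
        gt = solve_dense(sub, e)                 # row of G-tilde
        d = 1.0 / sqrt(gt[-1])                   # scale by sqrt(last entry)
        for j, c in enumerate(idx):
            G[i][c] = d * gt[j]
    return G
```

With the full lower triangular pattern the result is the exact (scaled) inverse Cholesky factor, so G A G^T = I; with a sparser pattern only the diagonal of G A G^T is guaranteed to be unitary.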
requiring a sparse matrix-by-sparse-vector multiplication. At most, the computational complexity of these tasks is proportional to m_A m_max and m_{M^{-1}} m_A m_max, where m_{M^{-1}} is the average number of non-zeroes per row of M^{-1}. The overall cost for the fully iterative construction of the i-th row is therefore

    c_i ∝ (k_iter + 1) m_{M^{-1}} m_A m_max,                            (33)

with the total cost estimated as

    c ∝ n (k_iter + 1) m_{M^{-1}} m_A m_max ∝ n (k_iter + 1) m_{M^{-1}} m_A^2 μ_G.    (34)

Hence, the cost for computing G with Algorithm 4 is approximately linear with the density μ_G, with the proportionality constant depending linearly on k_iter and m_{M^{-1}}. For such a reason, the use of a preconditioned iteration may have a strong impact on the overall cost and should be adopted only when a quite sparse and effective M^{-1} is available.

ALGORITHM 5 - POST_FILT: FSAI post-filtration.
Input: Matrix A, Matrix G, parameters τ, m_max
Output: Matrix G̃
for i = 1, n do
    Select the m_max largest entries of g_i such that |g_ij| > τ ‖g_i‖_2
    Store the selected entries in ḡ_i and the discarded entries in ε_i
    Create the set E_i of discarded column indices
    Gather A[E_i, P_i] from A
    d_i = [ 1 - ε_i^T A[E_i, P_i] (g_i + ḡ_i) ]^{-1/2}
    g̃_i = d_i ḡ_i
    Scatter the coefficients of g̃_i into G̃[i, P_i \ E_i]
end

2.5 FSAI post-filtration

Experience suggests that FSAI may have several small entries, independently of the strategy used to generate S. In order to make the FSAI application cheaper, with no important detrimental
exhibit a high degree of parallelism, an MPI implementation for distributed memory computers is not theoretically difficult and is currently under development. A number of derived data types and classes are defined and can be accessed at the user level, with the aim of making FSAIPACK easier to use (Table 1):

- Pattern: a data structure used to store the pattern of a matrix in the compressed sparse row storage mode [19]. Pattern includes two member integer variables, NROWS and NTERM, with the size and the number of non-zeroes of the matrix, respectively, and two member integer arrays, IAT and JA, storing the pointers to the beginning of each row and the column indices of the non-zeroes of the matrix, respectively.

- CSRMAT: a derived data type that extends the Pattern type so as to include also the coefficients of the matrix. CSRMAT includes as members a Pattern variable, PATT, and a double precision array, COEF, collecting the matrix coefficients in row-major order.

- FSAIP_Prec: a derived data type containing two lists of pointers to CSRMAT type variables, LEFT and RIGHT. The definition of this data type has been selected so as to store a FSAI preconditioner in the most general form (40): LEFT and RIGHT are the lists, in ascending and descending order, of the G_k and G_k^T factors, respectively. A FSAIP_Prec type variable is accessed by a function that includes new factors in the LEFT and RIGHT lists and by a routine that applies the preconditioner to a
found by scaling G̃:

    G = D G̃,                                                           (7)

where D is a diagonal matrix. The choice of D is not unique. In principle, the best choice would be using the inverse of the diagonal entries of L. However, it can be shown that, if the diagonal entries of G A G^T are unitary, the Kaporin number of the preconditioned matrix,

    κ = [ tr(G A G^T)/n ] / det(G A G^T)^{1/n},                         (8)

is minimum over all matrices belonging to W_S [13,14]. Such a condition can be enforced by setting

    D = [diag(G̃)]^{-1/2}.                                              (9)

The FSAI preconditioner is very robust, as it exists for any non-zero pattern S that includes the main diagonal, and the preconditioned matrix is SPD by construction. Moreover, the computation of each row of G can be performed independently of the others, thus the resulting setup process exhibits a high inherent parallelism. However, the actual FSAI performance is intimately connected to the selection of S. Quite obviously, the denser the pattern, the more accurate, and the more expensive, the preconditioner. In particular, equation (6) shows that the i-th row of G is computed by solving a dense m_i × m_i system, with m_i denoting the number of non-zeroes retained in the i-th row. As a consequence, the setup cost turns out to be proportional to m_i^3, thus growing very quickly as m_i increases. The optimal a priori choice of an appropriate non-zero pattern is a difficult task that can be addressed either statically or dynamically, e.g. [9,5,10,12]. The most effective techniques currently available for defining
To improve the ease of use of FSAIPACK, a command language is defined that allows for the implementation of a user-specified strategy to compute FSAI. The commands are stored as an array of character variables of maximal length 100. Each element of such an array represents a single line of the strategy and must comply with a set of syntax rules. The strategy array is used as input for the FSAIPACK driver, with the Command Language interpreter provided as an open source code.

4.1 Syntax rules

Strategy lines belong to three types:
1. comment type;
2. command type;
3. data type.

The type of each line is identified by the first non-blank character, according to Table 3. The objects handled by the strategy are CSRMAT type variables, identified by a user-specified string of characters, except the final preconditioner, which is a FSAIP_Prec type variable. The CSRMAT objects are managed in a fully dynamic way by the command language, allocating and deallocating automatically the user-specified data structures whenever needed. Mandatory object identifiers are:
1. A: system matrix identifier;
2. PREC: final preconditioner identifier.
A and PREC must be used at least once in each strategy. The maximum length of the string identifying each object is 11 characters. A comment line can be introduced at any point of a strategy; the FSAIPACK Command Language discards any character located after the comment type identifier. A comment can also be introduced
generation.
Input: Matrix A, Matrix G_0 (optional), parameters k_iter, s, τ, ε
Output: Matrix G
if G_0 is present then
    Ĝ_0 = [diag(G_0)]^{-1} G_0
else
    Ĝ_0 = I
end
for i = 1, n do
    for k = 0, k_iter - 1 do
        Compute ∇ψ_{k,i}
        Compute P̃_i^{(k+1)} by adding to P̃_i^{(k)} the indices of the s largest components of ∇ψ_{k,i}
        Gather A[P̃_i^{(k+1)}, P̃_i^{(k+1)}] and A[P̃_i^{(k+1)}, i] from A
        Solve A[P̃_i^{(k+1)}, P̃_i^{(k+1)}] ĝ_{k+1,i} = -A[P̃_i^{(k+1)}, i]
        if ψ_{k+1,i} < ε ψ_{0,i} then exit the loop over k
        Drop the entries of ĝ_{k+1,i} such that |ĝ_{k+1,ij}| < τ ‖ĝ_{k+1,i}‖
    end
    d_i = ψ_{k,i}^{-1/2}
    g_i = d_i ĝ_{k,i}
    Scatter the coefficients of g_i into G[i, P̃_i^{(k)}]
end

2.4 Fully iterative FSAI construction

The adaptive pattern generation can be computationally expensive, the most expensive task being the repeated solution of dense linear subsystems of size m_i. The objective is to minimize ψ_{k,i} for each row i, i.e. the quadratic form in (22). To do this, a steepest descent algorithm can be implemented. Let us define α_{k,i} such that

    ψ( Ĝ_k[i] - α_{k,i} ∇ψ_{k,i} ) → min,                               (30)

with α_{k,i} reading

    α_{k,i} = ( ∇ψ_{k,i}^T ∇ψ_{k,i} ) / ( 2 ∇ψ_{k,i}^T A ∇ψ_{k,i} ).   (31)

Then the current i-th row of Ĝ_k is updated as

    Ĝ_{k+1}[i] = Ĝ_k[i] - α_{k,i} ∇ψ_{k,i}.                            (32)

Actually, after only a few iterations Ĝ_k[i] may become full, so that some dropping is mandatory. In the FSAIPACK implementation, a dual dropping strategy is enforced by means of the user-specified parameters τ and m_max, representing, as usual, the relative threshold tolerance and the maximum number of entries to be
the sum of the costs for computing each factor G_k is generally smaller than the cost for computing the final G in just one step. The overall cost of the recurrent FSAI computation depends on both the strategy used for G_k at each level and the sparse matrix-matrix product A_{k+1} = G_{k+1} A_k G_{k+1}^T. The full computation of A_{k+1} is not feasible, hence a dropping is implemented, using again the dual strategy defined by m_max and τ and preserving the symmetry of A_{k+1}. A sketch of the procedure used to compute B = G A G^T is provided in Algorithm 7. The cost c_i for computing the i-th row of B with Algorithm 7 is proportional to

    c_i ∝ m_{G,i} (m_A + m_max),                                        (42)

ALGORITHM 7 - PREC_MAT: Computation of a FSAI preconditioned matrix.
Input: Matrix A, Matrix G, parameters m_max, τ
Output: Matrix B = G A G^T
for i = 1, n do
    v = A G[i]^T
    Select the m_max largest entries of v such that |v_j| > τ ‖v‖_2, j = 1, 2, ..., n
    w = G v
    Select the m_max largest entries of w such that |w_j| > τ ‖w‖_2, j = i, i+1, ..., n
    Scatter w into B[i]
end
Transpose the strict upper part of B into its strict lower part

with the overall average cost

    c ∝ n m_G (m_A + m_max) ∝ n m_A (m_A + m_max) μ_G.                  (43)

3 Package description

The FSAIPACK software package is a collection of functions and subroutines that implement Algorithms 1 through 7 for shared memory parallel computers, with the Fortran90 programming language and OpenMP directives for multi-thread execution [17]. As all the algorithms
density μ_G as the ratio between the number of non-zeroes of G and A,

    μ_G = nnz(G) / nnz(A),                                              (12)

or, equivalently, as the ratio between the average number of non-zeroes per row of G and A,

    m_G = nnz(G)/n,   m_A = nnz(A)/n,                                   (13)
    μ_G = m_G / m_A.                                                    (14)

For a regularly distributed problem, i.e. m_i ≈ m_G for any i, the asymptotic cost c for the preconditioner setup turns out to be proportional to the third power of μ_G:

    c ∝ n m_G^3 = n m_A^3 μ_G^3.                                        (15)

Therefore, the cost for computing FSAI grows very rapidly with increasing preconditioner density.

ALGORITHM 2 - MK_PATTERN: Generation of a static non-zero pattern.
Input: Matrix A, parameters τ, μ_min, k, μ_max
Output: Non-zero pattern S
repeat
    Ã = Drop(A, τ)
    if μ_Ã < μ_min then reduce τ
until μ_Ã ≥ μ_min
B_0 = I
for i = 1, k do
    B_i = Low(B_{i-1} Ã)
    if μ_{B_i} > μ_max then exit the loop
end
Store the pattern of B_k as S

2.2 Static pattern generation

The problem of generating static non-zero patterns for the FSAI preconditioner has already been discussed by several authors. Traditionally, for approximate inverses based on a Frobenius norm minimization [1,7], the non-zero pattern of small powers of A is suggested. However, the number of entries in S grows so quickly with the power of A that only the second or third power can be used in real applications. With FSAI, only a lower triangular pattern is needed, the upper part of A^k being actually useless. In [9,10] it is shown that better patterns can
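The structural part of Algorithm 2, the power recurrence B_k = Low(B_{k-1} Ã) of equation (16), can be sketched on boolean patterns. The following Python fragment is illustrative only (patterns stored as one set of column indices per row, rather than the CSR arrays used by FSAIPACK):

```python
def make_pattern(A_pattern, k, mu_max):
    """Power recurrence on boolean patterns: B_0 = I and
    B_k = Low(B_{k-1} * A); stops early if the density ratio
    nnz(B)/nnz(A) exceeds the safety parameter mu_max."""
    n = len(A_pattern)
    nnz_A = sum(len(r) for r in A_pattern)
    B = [{i} for i in range(n)]                 # B_0 = I
    for _ in range(k):
        nB = []
        for i in range(n):
            row = set()
            for j in B[i]:                      # structural product B * A
                row |= A_pattern[j]
            nB.append({j for j in row if j <= i})   # Low(.)
        B = nB
        if sum(len(r) for r in B) > mu_max * nnz_A:
            break
    return B
```

For a tridiagonal matrix, two steps of the recurrence give each row the lower bandwidth 2 of A^2, which is the expected behaviour of the power pattern.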
pattern with respect to the Kaporin conditioning number of the preconditioned matrix [14]. The aspect that mainly impacts the FSAI quality and efficiency is the selection of its sparsity pattern. Originally, the algorithm was conceived assuming that a pattern S for the non-zero entries of the preconditioner is statically set by the user. A typical choice, based on the Neumann series expansion of A^{-1}, is to use for S the pattern of a small power of A, i.e. A^2 or A^3. However, even small powers of A can be very dense, thus slowing down the overall algorithm. Possible remedies to such a drawback can be either using the power of a sparsified A [5], or post-filtering the FSAI preconditioner by dropping its smallest entries [16], or both. More recently, a dynamic pattern computation has also been advanced [12], using an algorithm that adaptively chooses the most significant terms to be retained for each row of the preconditioner. Another possibility is to build the preconditioner recursively, so as to compute a dense and high-quality FSAI as the product of a few sparse factors [3]. The choice of the most efficient technique for generating the FSAI non-zero pattern is strongly problem-dependent. In most cases, a combination of existing strategies would probably provide the best outcome. The software package FSAIPACK collects all existing ways of computing a FSAI preconditioner. FSAIPACK allows for combining different strategies to build S, in both a static and dynamic
variable, G a CSRMAT type variable containing a FSAI preconditioner factor, and FSAI the FSAIP_Prec type variable with the final preconditioner, which is always denoted by PREC in our syntax.

4.2 Examples

This is an example strategy:

> MK_PATTERN A patt k t      two out of four parameters are set
2                            number of steps of the power recurrence
0.05                         pre-filtration tolerance
> STATIC_FSAI A patt G
> ADAPT_FSAI A G n e         the static FSAI is improved dynamically; G is an input/output object
10                           number of adaptive steps
1.e-3                        exit tolerance
> POST_FILT A G              post-filtering a FSAI factor is generally recommended; default parameters are used
> TRANSP_FSAI G Gt           it is necessary to transpose G and append the left and right factors
> APPEND_FSAI G Gt PREC      PREC is a mandatory identifier

The strategy above performs the following operations:
- the system matrix A is used to build the static pattern patt by k = 2 steps of the power recurrence, with t = 0.05 as the pre-filtration tolerance;
- the system matrix and the pattern patt are used to compute statically the FSAI factor G;
- the factor G is improved dynamically by n = 10 steps of the adaptive procedure, with exit criterion e = 10^{-3};
- the factor G is post-filtered according to the entries of the system matrix;
- the factor G is transposed into Gt;
- the factors G and Gt are appended to the final preconditioner.
respectively, while the cost for the computation of d_i is proportional to m_i, as it requires the gathering of a submatrix from A. Hence, the average post-filtration cost c can be estimated as

    c ∝ n m_G ∝ n m_A μ_G,                                              (39)

and is therefore usually negligible relative to the cost of computing G. The post-filtration procedure generally produces sparse FSAI factors with no significant loss in the convergence rate.

ALGORITHM 6 - REC_FSAI: Recurrent FSAI computation.
Input: Matrix A, n_l
Output: Matrices G_1, G_2, ..., G_{n_l}
A_0 = A; G_0 = I
for k = 0, n_l - 1 do
    Compute G_{k+1} as the FSAI factor of A_k
    A_{k+1} = G_{k+1} A_k G_{k+1}^T
end

2.6 Recurrent FSAI computation

Another way to reduce the FSAI cost, though retaining a large number of non-zero entries, is to compute G as the product

    G = ∏_{k=1}^{n_l} G_k = G_{n_l} ⋯ G_2 G_1,                          (40)

where G_k, k = 1, ..., n_l, is the k-th level FSAI. The final G implicitly inherits a large number of non-zeroes, even though each G_k is rather sparse. This idea, first introduced in [16] and later developed by [22,3], looks similar to multilevel preconditioners, e.g. [20,11]. The k-th term G_k is computed as the FSAI factor of A_{k-1},

    A_k = G_k A_{k-1} G_k^T,                                            (41)

by any of the previous techniques. The procedure starts with G_0 = I and A_0 = A and proceeds through n_l levels. A summary is provided in Algorithm 6. Quite obviously, the matrix G in (40) is never formed explicitly, and its density is generally much larger than the sum of the G_k densities. Hence,
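The recurrence of Algorithm 6 can be sketched as follows. This is a dense, pure-Python illustration only: to keep it short, the cheapest conceivable level factor, the diagonal FSAI G_k = [diag(A_k)]^{-1/2}, is used at every level, whereas a real run would compute each G_k by any of the previous algorithms and drop entries of A_{k+1}:

```python
from math import sqrt

def diagonal_fsai(A):
    """Cheapest possible level factor: G = diag(A)^(-1/2) (the default
    diagonal pattern); a placeholder for a proper FSAI computation."""
    n = len(A)
    return [[1.0 / sqrt(A[i][i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def recurrent_fsai(A, n_levels):
    """Algorithm 6 sketch: A_{k+1} = G_{k+1} A_k G_{k+1}^T starting from
    A_0 = A; returns the level factors G_1..G_{n_levels} and the last A_k."""
    levels, Ak = [], A
    for _ in range(n_levels):
        Gk = diagonal_fsai(Ak)
        Ak = matmul(matmul(Gk, Ak), transpose(Gk))
        levels.append(Gk)
    return levels, Ak
```

Even with this trivial level factor, each A_k has a unit diagonal by construction, which is the invariant the recurrence maintains while the off-diagonal entries are progressively damped.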
Table 4. Keywords, input/output objects, parameter identifiers and default values.
Function | Keyword | Input/Output objects | Parameters (identifier, default)
MK_patt | MK_PATTERN | In: A; In/Out: patt | τ (t, 0.05); k (k, 3); μ_min (m, 0.20); μ_max (M, 5.00)
CPT_static_FSAI | STATIC_FSAI | In: A, patt; Out: G | none
CPT_adaptive_FSAI | ADAPT_FSAI | In: A; In/Out: G | k_iter (n, 30); s (s, 1); τ (t, 0.00); ε (e, 1e-3)
CPT_projection_FSAI | PROJ_FSAI | In: A; Opt. in: G_P, G_P^T; In/Out: G | k_iter (n, 10); m_max (s, 10); τ (t, 0.00); ε (e, 1e-8)
CPT_preconditioned_Matrix | PREC_MAT | In: A, G, G^T; Out: B | m_max (n, n); τ (t, 0.00)
FILTER_FSAI | POST_FILT | In: A; In/Out: G | m_max (n, n); τ (t, 0.05)
TRANSPOSE_FSAI | TRANSP_FSAI | In: G; Out: G^T | none
APPEND_LEFT and APPEND_RIGHT | APPEND_FSAI | In: G, G^T; In/Out: FSAI | none

Recall that the APPEND_FSAI keyword is mandatory at the end of any strategy.

References

[1] Benson, M. W. (1973). Iterative solution of large scale linear systems. M.Sc. Thesis, Lakehead University, Thunder Bay, ON.
[2] Benzi, M. and Tůma, M. (1999). A comparative study of sparse approximate inverse preconditioners. Applied Numerical Mathematics, 30, 305-340.
[3] Bergamaschi, L. and Martínez, A. (2012). Banded target matrices and recursive FSAI for parallel preconditioning. Numerical Algorithms, 61, 223-241.
[4] Chow, E. and Saad, Y. (1998). Approximate inverse preconditioners via sparse-sparse iterations. SIAM J. Sci. Comput., 19, 995-1023.
[5] Chow, E. (2000). A priori sparsity patterns for parallel sparse approximate inverse preconditioners. SIAM J. Sci. Comput., 21, 1804-1822.
retained, respectively [18]. As in Algorithm 3, the iterative procedure for building FSAI is controlled by two user-specified parameters, k_iter and ε, defining the maximum number of iterations and the exit tolerance of the steepest descent algorithm, respectively. Because of the dropping, this procedure is actually an incomplete steepest descent algorithm.

ALGORITHM 4 - PROJ_FSAI: Fully iterative FSAI computation.
Input: Matrix A, parameters k_iter, m_max, τ, ε, Matrix G_0 (optional), Matrix M^{-1} (optional)
Output: Matrix G
if G_0 is present then
    Ĝ_0 = [diag(G_0)]^{-1} G_0
else
    Ĝ_0 = I
end
if M^{-1} is not present then M^{-1} = I
for i = 1, n do
    for k = 0, k_iter - 1 do
        Compute ∇ψ_{k,i}
        p = M^{-1} ∇ψ_{k,i}
        α_{k,i} = ( ∇ψ_{k,i}^T p ) / ( 2 p^T A p )
        Ĝ_{k+1}[i] = Ĝ_k[i] - α_{k,i} p
        Select the m_max largest entries of Ĝ_{k+1}[i] larger than τ ‖Ĝ_{k+1}[i]‖_2
        if ψ_{k+1,i} < ε ψ_{0,i} then exit the loop over k
    end
end

The steepest descent convergence can be accelerated by using an inner preconditioner M^{-1}, e.g. a previously computed FSAI, at the expense of an additional computational cost. The fully iterative FSAI computation procedure is provided in Algorithm 4, which is reminiscent of the idea proposed in [4], where a self-preconditioned descent-type method is used to compute a non-factored approximate inverse. All iterations in Algorithm 4 have approximately the same computational cost, because a fixed number of entries is retained for each row. The most expensive operations are the computation of ∇ψ_{k,i} and α_{k,i}, both
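One step of the steepest descent update (30)-(32) can be sketched as follows. This is an illustration only, with dense storage, no inner preconditioner (M^{-1} = I) and no dropping:

```python
def sd_row_update(A, g, i):
    """One unpreconditioned steepest-descent step for row i:
    g <- g - alpha * grad, with the exact line search
    alpha = (grad^T grad) / (2 grad^T A grad) of equation (31).
    Only the entries j < i are free; g[i] stays equal to 1."""
    n = len(A)
    grad = [2.0 * sum(A[j][r] * g[r] for r in range(n)) if j < i else 0.0
            for j in range(n)]
    gg = sum(x * x for x in grad)
    if gg == 0.0:
        return g                       # already stationary
    Agrad = [sum(A[j][r] * grad[r] for r in range(n)) for j in range(n)]
    alpha = gg / (2.0 * sum(grad[j] * Agrad[j] for j in range(n)))
    return [g[j] - alpha * grad[j] for j in range(n)]
```

Since the step performs an exact line search along the gradient of the quadratic form ψ_i = Ĝ[i] A Ĝ[i]^T, ψ_i decreases monotonically at every iteration, which is the property the exit test of Algorithm 4 monitors.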
[6] Ferronato, M. (2012). Preconditioning for sparse linear systems at the dawn of the 21st century: history, current developments and future perspectives. ISRN Applied Mathematics, 2012, Article ID 127647, doi:10.5402/2012/127647.
[7] Frederickson, P. O. (1975). Fast approximate inversion of large sparse linear systems. Mathematical Report 7, Lakehead University, Thunder Bay, ON.
[8] Grote, M. and Huckle, T. (1997). Parallel preconditioning with sparse approximate inverses. SIAM J. Sci. Comput., 18, 838-853.
[9] Huckle, T. (1999). Approximate sparsity patterns for the inverse of a matrix and preconditioning. Applied Numerical Mathematics, 30, 291-303.
[10] Huckle, T. (2003). Factorized sparse approximate inverses for preconditioning. J. Supercomput., 25, 109-117.
[11] Janna, C., Ferronato, M. and Gambolati, G. (2009). Multilevel incomplete factorizations for the iterative solution of non-linear FE problems. Int. J. Numer. Meth. Eng., 80, 651-670.
[12] Janna, C. and Ferronato, M. (2011). Adaptive pattern research for Block FSAI preconditioning. SIAM J. Sci. Comput., 33, 3357-3380.
[13] Kaporin, I. E. (1990). A preconditioned conjugate gradient method for solving discrete analogs of differential problems. Differential Equations, 26, 897-906.
[14] Kaporin, I. E. (1994). New convergence results and preconditioning strategies for the conjugate gradient method. Numerical Linear Algebra with Applications, 1, 179-210.
vector. The derived data types are supplemented with the respective constructors and destructors, along with other specific functions.

Table 1. Summary of the user-accessible derived data types.
Derived Data Type | Member Variables | Member Functions
Pattern | NROWS, NTERM, IAT, JA | new_Pattern, dlt_Pattern
CSRMAT | PATT, COEF | new_CSRMAT, dlt_CSRMAT, new_CSRCOEF, dlt_CSRCOEF, copy_CSRMAT
FSAIP_Prec | LEFT, RIGHT | append_LEFT, append_RIGHT, delete_LEFT, delete_RIGHT, apply_FSAIP_Prec, dlt_FSAIP_Prec

The other functions accessible to the user are (Table 2):

- MK_patt: creates a pattern for the static FSAI computation by Algorithm 2.
- CPT_static_FSAI: computes the coefficients of a static FSAI by Algorithm 1.
- CPT_adaptive_FSAI: computes a FSAI factor with the adaptive pattern generation of Algorithm 3.
- CPT_projection_FSAI: computes a fully iterative FSAI by Algorithm 4.
- CPT_preconditioned_Matrix: computes the sparse-sparse product of three matrices by Algorithm 7.
- FILTER_FSAI: post-filters a FSAI factor by Algorithm 5.
- TRANSPOSE_FSAI: transposes a FSAI factor.

The member functions APPEND_LEFT and APPEND_RIGHT, which append a FSAI factor and its transpose to the lists of a FSAIP_Prec variable, are indirectly accessible by using a specific interpreted command, as will be shown later. All functions return a CSRMAT type variable, except MK_patt, that returns a Pattern type variable, and