pdf file - ABC
Contents
1. 3 reserved R 4 100 0 threshold of MM ratio 5 Value of MM ratio 0 6 0 01 Exponent for the Exponential Moving Average 7 13 reserved R 14 Residual norm EIGENSOLVE LINEARSOLVE 15 Set up time EIGENSOLVE LINEARSOLVE 0 16 Preconditioner time EIGENSOLVE LINEARSOLVE 17 Solver time EIGENSOLVE LINEARSOLVE 18 Total time EIGENSOLVE LINEARSOLVE 0 19 Last Performed preconditioner parameter 20 reserved R 21 50 Xabclib s Information 21 reserved R 22 1 Max elapsed time limit time 23 1 0 8 Convergence criterion 24 reserved R L preconditioner parameter OES SOR type 3 relaxation omega 1 lt omega lt 2 ILU 0 type 4 Break down threshold default 1 0E 8 ILUT type 6 Dropping criterion 26 27 reserved R 28 L 2 norm of RHS 29 2 norm of max residual 0 30 Floating operations x 1079 operations 31 L preconditioner time 32 Total solve time elapsed 33 reserved R 11 OpenATLib Xabclib User s Manual for Version 1 0 Minimum running time When IATPARAM 32 21 reserved 12 OpenATLib Xabclib User s Manual for Version 1 0 3 How to use the OpenATLib If you want to develop own library using OpenATLib you should take the following processes 1 Put the static library of libOpenAT a to current directory 2 Call OpenATI INIT in program on own library so
2. Fig. 3.6 An example of Original Segmented Scan and Branchless Segmented Scan

If you want to specify the SpMxV implementation of OpenATI_DSRMV or OpenATI_DURMV, you need to run the setup function before calling OpenATI_DSRMV or OpenATI_DURMV.

OpenATI_DSRMV_Setup
S1: No need to run the setup function.
S2: Fix the groups of rows processed by each thread for normalized non-zero elements.
S3: Fix the groups of rows processed by each thread for normalized non-zero elements, and the start and end points of the reduction part of each thread.

OpenATI_DURMV_Setup
U1: No need to run the setup function.
U2: Fix the groups of rows processed by each thread for normalized non-zero elements.
U3: Set the arrays MFLAG and JFSTART for Branchless Segmented Scan.
U4: Set the array FALG for Original Segmented Scan.

3.4.3 Argument Details and Error Code of OpenATI_DSRMV_Setup
(1) Argument Details
Argument | Description
N | The number of dimension for the matrix (N > 1)
NNZ | The number of
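The S2 and U2 setup steps above fix groups of rows per thread for normalized non-zero elements. The following is a minimal sketch in Python (not the library's Fortran code) of one way such a grouping could work, assuming the goal is contiguous row groups with roughly equal non-zero counts. The function name `partition_rows_by_nnz` and its return format are hypothetical.

```python
def partition_rows_by_nnz(irp, num_threads):
    """Split rows into contiguous groups with roughly equal non-zero counts.

    irp holds 1-based CRS row pointers: irp[0] == 1 and irp[n] == nnz + 1.
    Returns a list of (first_row, end_row) pairs, 0-based, end exclusive.
    This is an illustrative assumption, not the OpenATLib implementation.
    """
    n = len(irp) - 1
    nnz = irp[n] - irp[0]
    bounds = [0]
    row = 0
    for t in range(1, num_threads):
        # Advance while the non-zeros up to the end of this row
        # do not exceed the t-th share of the total.
        while row < n and (irp[row + 1] - irp[0]) * num_threads <= nnz * t:
            row += 1
        bounds.append(row)
    bounds.append(n)
    return [(bounds[t], bounds[t + 1]) for t in range(num_threads)]


# Row 0 holds 4 of the 8 non-zeros, so it forms a group of its own.
groups = partition_rows_by_nnz([1, 5, 6, 7, 8, 9], 2)
print(groups)
```

Splitting by non-zero count rather than by row count keeps the per-thread work similar when a few rows are much denser than the others.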
3. Number | Type | Initial Value | IO | Description
9 | Double | - | OUTPUT | 2-norm of max residual
RATPARAM(30) | Double | - | OUTPUT | Floating operations (x10^9 operations)
RATPARAM(31) | Double | - | OUTPUT | Preconditioner time
RATPARAM(32) | Double | - | OUTPUT | Total solve time
RATPARAM(33) | Double | 0.0 | INPUT | Minimum running time

(4) Error Code
Value | Description
0 | Normal return
Less than 0 | If -i is returned, the value of the i-th argument is illegal.
100 | Computation was stopped by failing to make the preconditioner.
200 | Computation was stopped by breakdown.
400 | Computation was stopped by exceeding the execution time tolerance.
500 | Computation was stopped by exceeding the maximum number of restarts.
600 | Computation was stopped by failing to allocate memory (in case of IATPARAM(10) = 12, 13, 21).
700 | Computation was stopped because the value of LUINF exceeds the integer maximum (in case of IATPARAM(10) = 21).
1000 | Computation was stopped by stagnation of relative residual. This error code is output only when IATPARAM(32) = 1.

5 References
[1] T. Sakurai, K. Naono, M. Egi, M. Igai, and H. Kidachi, Proposal on Runtime Parameter Auto Tuning Approach for Restarted Lanczos Method, IPSJ SIG Notes 2007(111), pp. 173-178, 2007 (in Japanese).
[2] M. Kudo, H. Kuroda, T. Katagiri, and Y. Kanada, The Effect of Optimal Algorithm Selection of Parallel Sparse Matrix V
4. 2.2.1 ...
2.2.2 Compiling
2.2.3 Running sample programs
3 OpenATLib: A Common Auto-tuning Interface
3.1 OpenATI_INIT
3.1.1 Overview of the function
3.1.2 Argument Details and Error Code
3.2 OpenATI_DAFSTG
3.2.1 Overview of the function
3.2.2 Overview of the auto-tuning method
3.2.3 Argument Details and Error Code
3.2.4 Usage Examples
3.3 OpenATI_DAFRT
3.3.1 Overview of the function
3.3.2 Overview of the auto-tuning method
3.3.3 Argument Details and Error Code
3.3.4 Usage Example
3.4 OpenATI_DSRMV and OpenATI_DURMV
3.4.1 Overview of the function
3.4.2 Overview of auto-tuning method
3.4.3 Argument Details and Error Code of
5. Array of integer parameters for OpenATLib and 76 OpenATLib Xabclib User s Manual for Version 1 0 50 OUTPUT Xabclib RATPARA Double INPUT Array of double precision parameters for OpenATLib M 50 OUTPUT Xabclib WORK Double WORK Workspace LWORK LWORK Integer The size of the workspace for double precision WORK Satisfy LWORK gt 9 N N 1 2 1 INFO Error code B 2 Using parameters on IATPARAM Number Type Initial IO Description LIE AEN IATPARAM 3 Integer OMP GET INPUT Number of THREADS _ READSO IATPARAM 4 Integer 1 INPUT Flag of Krylov subspace expand by MM ratio IATPARAM 9 Integer 0 INPUT OpenATI DURMV auto tuned On Off 0 Perform SpMxV specified by IATPARAM 0 2 and 3 Perform SpMxV to judge the best method among three implementations IATPARAM 10 Integer 12 INPUT If IATPARAM 9 0 then set the number of implementations If IATPARAM 9 2 or 3 the best number of implementations returns 11 Row Decomposition Method TT OpenATLib Xabclib User s Manual for Version 1 0 12 Normalized NZ Method 13 Branchless Segmented Scan 21 Original Segmented Scan IATPARAM 11 IATPARAM 12 IATPARAM 13 IATPARAM 22 IATPARAM 23 IATPARAM 24 IATPARAM 25 IATPARAM 26 IATPARAM 31 IATPARAM 32 IATPARAM 33 Integer 128 INPUT Columns of Segmented Scan s algorithms Integer 2 INPU
6. DSRMV_Setup
3.4.4 Argument Details and Error Code of OpenATI_DURMV_Setup
3.4.5 Argument Details and Error Code for OpenATI_DSRMV
3.4.6 Argument Details and Error Code for OpenATI_DURMV
3.4.7 Usage Example
3.5 OpenATI_DAFGS
3.5.1 Overview of the function
3.5.2 Overview of Reorthonormalization
3.5.3 Argument Details and Error Code
3.6 OpenATI_DAFMC_CCS2CRS
3.6.1 Overview of the function
3.6.2 Argument Details and Error Code
3.7 OpenATI_LINEARSOLVE and OpenATI_EIGENSOLVE
3.7.1 Overview of the function
3.7.2 Overview of numerical policy
3.7.3 Automatic selection of preconditioner and solver
3.7.4 Argument Details and Error Code of OpenATI_LINEARSOLVE
3.7.5 Argument Details and Error Code of OpenATI_EIGENSOLVE
3.7.6 Usage Example
7. IATPARAM 8 Integer 12 INPUT If IATPARAM 7 0 then set the OUTPU number of implementations T If IATPARAM 7 2 or 3 the best number of implementations returns 11 Row Decomposition Method 12 Normalized NZ Method 13 Normalized NZ Method with vector reduction parallelization 3 Using parameters on RATPARAM OpenATI DSRMV doesn t use RATPARAM 35 OpenATLib Xabclib User s Manual for Version 1 0 4 Error Code Value Description 0 Successful exit 100 The value of IATPARAM 9 is illegal If ATPARAM 7 0 200 The value of IATPARAM 7 is illegal 36 OpenATLib Xabclib User s Manual for Version 1 0 3 4 6 Argument Details and Error Code for DURMV 1 Argument Details Argument IRP N 1 ICOL NNZ VAL NNZ xw Ye IATPARAM Integer 50 ndi RATPARA Double M 50 UINF Double LUINF LUINF INPUT INPUT INPUT INPUT INPUT INPUT OUTPUT INPUT OUTPUT INPUT INPUT OUTPUT INPUT Description The number of dimension for the matrix N gt 1 The number of non zero elements for the matrix Pointers to first elements on each row for the matrix The non zero row indexes for the matrix The non zero elements for the matrix Right hand side vector elements Results vector elements for SpMxV Array of integer parameters for OpenATLib and Xabclib Array of double precision parameters for OpenATLib and Xabclib If IATPARAM 9 0 or 1 INP
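The IRP, ICOL and VAL arguments listed above describe a matrix in CRS form, and the SpMxV routines multiply it by the right-hand side vector. As a point of reference, here is a minimal sketch in Python of the plain row-by-row product, assuming 1-based pointers and indexes as in the manual's Fortran interface and reading ICOL as the column index of each non-zero. It illustrates the data layout only; it is not any of the tuned implementations (11, 12, 13 or 21), and the name `spmv_crs` is hypothetical.

```python
def spmv_crs(n, irp, icol, val, x):
    """Compute y = A * x for a CRS matrix with 1-based IRP and ICOL.

    irp[i] points to the first non-zero of row i (1-based position in
    icol/val); irp[n] equals nnz + 1. Illustrative sketch only.
    """
    y = [0.0] * n
    for i in range(n):
        # Non-zeros of row i occupy 1-based positions irp[i] .. irp[i+1]-1.
        for k in range(irp[i] - 1, irp[i + 1] - 1):
            y[i] += val[k] * x[icol[k] - 1]
    return y


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
y = spmv_crs(3, [1, 3, 4, 6], [1, 3, 2, 1, 3], [2.0, 1.0, 3.0, 4.0, 5.0],
             [1.0, 2.0, 3.0])
print(y)
```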
8. OpenATLib and Xabclib
INFO | OUTPUT | Error code

(2) Using parameters on IATPARAM
Number | Type | Initial Value | IO | Description
IATPARAM(14) | Integer | - | INPUT | Access to meminfo for Linux system
IATPARAM(15) | Integer | - | OUTPUT | Number of retried solver
IATPARAM(16) | Integer | - | OUTPUT | Total restart of solver
IATPARAM(17) | Integer | - | OUTPUT | Total Matrix-Vector times
IATPARAM(18) | Integer | - | OUTPUT | Last performed preconditioner type. 1: None, 2: Jacobi, 3: SOR, 4: ILU(0)_Diagonal, 5: ILU(0), 6: ILUT
IATPARAM(19) | Integer | - | OUTPUT | Maximum number of fill-ins in each row for ILUT preconditioner
IATPARAM(20) | Integer | - | OUTPUT | Last performed solver type. 1: Xabclib_GMRES, 2: Xabclib_BICGSTAB

(3) Using parameters on RATPARAM
Number | Type | Initial Value | IO | Description
RATPARAM(14) | Double | - | OUTPUT | Residual norm
RATPARAM(15) | Double | - | OUTPUT | Set up time
RATPARAM(16) | Double | - | OUTPUT | Preconditioner time
RATPARAM(17) | Double | - | OUTPUT | Solver time
RATPARAM(18) | Double | - | OUTPUT | Total time
RATPARAM(19) | Double | - | OUTPUT | Last performed preconditioner parameter

(4) Error Code
Value | Description
0 | Normal return
100 | ... in POLICY FILE is illegal
200 | The value of IATPARAM(9) is illegal
300 | POLICY in POLICY FILE is illegal
810 | PRECONDITIONER in POLICY FILE is illegal
820 | SOLVER in POLICY FILE
9. 3 Using parameters on RATPARAM Number RATPARAM 6 Type Double Initial Value 0 01 IO Description Exponent for the Exponential Moving Average In Fig 3 2 INPUT 19 OpenATLib Xabclib User s Manual for Version 1 0 4 Error Code Value Description Normal return 20 OpenATLib Xabclib User s Manual for Version 1 0 3 2 4 Usage Example You can write the code like Fig 3 3 Parameter Definition Pth 10 Threshold for judging stagnation PERR 1 0D0 ISTGCNT 0 EMA 0 0D0 ETIME1 OMP GET WTIMEO ETIME2 ETIMEI STOP TOL RATPARAM 23 ITER IATPARAM 22 ETIME RATPARAM 22 omission IF RERR STOP TOL RETURN Convergence Test ETIME3 ETIME2 ETIME2 OMP GET WTIMEO ETIME ETIME2 ETIME1 EITRTIME ETIME2 ETIM E3 CALL OpenATI DAFSTG ISTGCNT EMA RERR PERR STOP TOL ITER MAX ITER ETIME EITRTIME MAX ETIME IATPARAM RATPARAM INFO IF ISTGCNT Pth RETURN Stagnation PERR RERR omission Fig 8 3 An Example of OpenATI DAFSTG description 21 OpenATLib Xabclib User s Manual for Version 1 0 3 3 OpenATI DAFRT 3 3 1 Overview of the function To perform Krylov subspace method for example Lanczos method for eigensolvers computation and GMRES method for linear equation solvers they need to specify the dimension of the inner Krylov subspace to fix available memory space If the iteration number is over for the fixed dimension new
10. end if Fig 44 Arnoldi Method 62 OpenATLib Xabclib User s Manual for Version 1 0 4 2 4 Argument Details and Error Code 1 Argument Details Argument Type Description N Integer INPUT The number of dimension for the matrix NZ 1 NNZ Integer INPUT The number of non zero elements for the matrix IRP N 1 Integer Pointes to first position on each row for the matrix Note Satisfy IRP 1 1 IRP N 1 NNZ 1 ICOL NNZ Integer The row indexes for non zero elements for the matrix VAL NNZ Double The non zero elements for the matrix 2088 Integer The number of eigenvalues you need The execution time increases according to the If gt 100 the execution time will be enormous hence it may not solve in practical time EV NEV COMP OUTPUT The eigenvalues The k th eigenvalue is set to EV k LEX 1 EVEC COMP OUTPUT The eigenvectors The k th eigenvector LDE NEV LEX 1 corresponding to the eigenvalue EV k is set to the k th column LDE Integer The leading dimension of EVEC array LDE gt N IATPARAM Integer INPUT Array of integer parameters for OpenATLib and 50 OUTPUT Xabclib RATPARA Double INPUT Array of double precision parameters for OpenATLib M 50 OUTPUT Xabclib WORK Double WORK Workspace LWORK LWORK Integer INPUT The size of the double precision workspace WORK Satisfy LWORK gt 5 MSIZE N 5 MSIZE MSIZE
11. 1 not generated yet 2 already generated lt L gt preconditioner type 25 4 1 None 2 Jacobi 3 SOR 4 ILU 0 _Diagonal 5 ILU 0 6 ILUT 26 5 Maximum number of fill in s in each row for ILUT 50 Input size of Krylov subspace in GMRES Arnoldi caution in Xabclib ARNOLDI must to be IATPARAM 27 gt NEV Start size of Krylov subspace at subspace expand AT on 3 5 GMRES Arnoldi See IATPARAM 4 uto in Xabclib ARNOLDI if IATPARAM 28 less than NEV then start subspace size NEV overwritten 29 Final size of Krylov subspace in GMRES Arnoldi OpenATLib Xabclib User s Manual for Version 1 0 E eigenvalue order option in Xabclib LANCZOS 1 largest eigenvalue 2 largest magnitude in Xabclib ARNOLDI 1 largest real part eigenvalue 2 largest magnitude 3 largest imaginary part 3l Total Matrix Vector times 32 Krylov iteration times When stagnation of relative residual occurs solver is stopped 5 0 Off 1 0n 34 Minimum running iteration When IATPARAM 32 1 35 49 reserved 50 debug info 0 Off 1 On 10 OpenATLib Xabclib User s Manual for Version 1 0 Table 2 4 OpenATLib amp Xabclib double precision parameter list lt L gt for Linear solver E for Eigen value solver index default mandatory RATPARAM 50 description 1 2 mandatory 3 20 OpenATLib s Information
12. 21 Original Segmented Scan IATPARAM 1 IATPARAM 12 IATPARAM 13 IATPARAM 22 IATPARAM 23 IATPARAM 24 IATPARAM 25 IATPARAM 26 Integer 128 INPUT didi fover uii Columns of Segmented Scan s algorithms 0 Classical Gram Schmidt 1 DGKS 2 Modified Gram Schmidt 3 Blocked Gram Schmidt Iterative refinement of DGKS 0 no Iterative refinement 1 Iterative refinement Maximum number of restart iterations Final number of restart iterations Preconditioner operations flag 1 not generated yet 2 already generated Set preconditioner kinds 1 None 2 Jacobi 3 SSOR 4 ILU 0 _Diagonal 5 ILU 0 6 ILUT Maximum number of fill in s in each row for ILUT IATPARAM 27 INPUT Max dive IATPARAM 28 B 71 Start size of Krylov subspace at subspace expand See IATPARAM 4 OpenATLib Xabclib User s Manual for Version 1 0 nteger 7 OUTPUT Final size of Krylov subspace Integer OUTPUT Total Matrix Vector times Integer INPUT When stagnation of relative residual occurs solver is stopped 0 Off 1 On IATPARAM 33 Integer O INPUT Minimum running iteration 3 Using parameters on RATPARAM Number Type Initial IO Description Value RATPARAM 4 INPUT Threshold value for MM ratio RATPARAM 22 INPUT Max elapsed time RATPARAM 23 1 0 08 INPUT Convergence criterion RAT
13. WRITE 6 WRITE 6 ORTHOGONALITY ERR WRITE 6 RETURN END subroutine matgen itest n nz irp icol a implicit real 8 a h o z integer 4 irp icol real 8 a character fi IF itest EQ 207 THEN fi lename ex19 dat else if itest eq 301 then fi lename vibrobox rb end if if itest gt 300 itest le 321 then call matread itest filename irp icol nz a else if itest gt 200 and itest le 222 then OPEN 5 FILE f i ename return end subroutine matread itest filename ncol colptr rowind nnzero values implicit real 8 a h o z SAMPLE CODE FOR READING A SPARSE MATRIX IN STANDARD FORMAT CHARACTER TITLE 72 KEY 8 MXTYPE 3 1 PTRFMT 16 INDFMT 16 VALFMT 20 RHSFMT 20 INTEGER TOTCRD PTRCRD INDCRD VALCRD RHSCRD 1 NROW NCOL NNZERO NELTVL INTEGER COLPTR ROWIND REAL 8 VALUES character fi lename 60 lun i t 23 open lunit file filename if itest eq 308 then READ LUNIT 1100 TITLE KEY 84 OpenATLib Xabclib User s Manual for Version 1 0 1 TOTCRD PTRCRD INDCRD VALCRD RHSCRD 2 MXTYPE NROW NCOL NNZERO 3 PTRFMT INDFMT VALFMT RHSFMT 1100 FORMAT A72 A8 5114 A3 11X 3114 2A16 2A20 iin C LUNIT READ CLUNIT 1000 TITLE KEY 1 TOTCRD
14. 1 0 4 1 3 The Lanczos Method The Lanczos method using this library is shown in Fig 4 2 The algorithm is based on the algorithm referred by 3 1 Start with v r lir 5 lock 0 2 For IR 1 2 maxrestart Do 3 For j lock 1 m Do 4 Compute v rl 5 r Av 6 r v 7 if l r r ay 8 if 7 1 rzr ayv 9 r LV by modified Gram Schmidt 10 6 11 EndDo om Born Nocks2 12 Eigen solve T SOS T 13 k th residual estimate with B S for k 1 NEV 14 creat Ritz vectors V S 15 count up new locked Ritz pair 16 if lock new lock gt goto exit 17 create new starting Shur vector r V S ubera 18 deflation Voa Qoa for L 1 new lock then lock new lock 19 EndDo Fig 4 2 The Lanczos Method 56 OpenATLib Xabclib User s Manual for Version 1 0 4 1 4 Argument Details and Error Code 1 Argument Details Argument Type Description N Integer INPUT The number of dimension for the matrix N gt 1 NNZ Integer INPUT The number of non zero elements for the upper ud triangle part IRP N 1 Integer INPUT Pointes to diagonal elements on each row und Note Satisfy IRP 1 1 IRP N 1 NNZ 1 ICOL NNZ Integer INPUT The row indexes for non zero elements on the upper triangle part VAL NNZ Double INPUT The values for non zero elements on the upper
15. Fig. 3.2 The formulas of detection

3.2.3 Argument Details and Error Code
(1) Argument Details
Argument | Type | IO | Description
ISTGCNT | Integer | INPUT/OUTPUT | The counter for detecting stagnation of relative residual
EMA | Double | INPUT/OUTPUT | The exponential moving average of relative residual
RERR | Double | INPUT | The error of the approximate solution vector
PERR | Double | INPUT | The last error of the approximate solution vector
STOP_TOL | Double | INPUT | Convergence criterion
ITER | Integer | INPUT | The number of iterations
MAX_ITER | Integer | INPUT | Max iterations
ETIME | Double | INPUT | The elapsed time
EITRTIME | Double | INPUT | The elapsed time per iteration
MAX_ETIME | Double | INPUT | Max elapsed time
IATPARAM(50) | Integer | INPUT | Array of integer parameters for OpenATLib and Xabclib
RATPARAM(50) | Double | INPUT | Array of double precision parameters for OpenATLib and Xabclib
INFO | Integer | OUTPUT | Error code

(2) Using parameters on IATPARAM
Number | Type | Initial Value | IO | Description
IATPARAM(6) | Integer | 10 | INPUT | A certain threshold value for judging stagnation (pth in Fig. 3.2)
16. AUTO ILUO is selected by default This keyword is used by only OpenATI LINEARSOLVE D PRECONDITIONER NO No preconditioner 46 OpenATLib Xabclib User s Manual for Version 1 0 PRECONDITIONER JACOBI JACOBI PRECONDITIONER SSOR SSOR PRECONDITIONER ILUOD ILU 0 Diagonal PRECONDITIONER ILUO ILU 0 PRECONDITIONER ILUT ILUT Q 0 6060 PRECONDITIONER AUTO Automatic select 1 SOLVER values value XABCLIB GMRES XABCLIB BICGSTAB AUTO 1 OpenATI LINEARSOLVE XABCLIB LANCZOS XABCLIB ARNOLDI OpenATI EIGENSOLVE The default value is XABCLIB GMRES OpenATI LINEARSOLVE 1 Detail of this policy is explained in 3 7 3 3 7 3 Automatic selection of preconditioner and solver OPENATI LINEARSOLVE has the function of performing preconditioned iterative solvers under the given order This function can call two or more iterative solvers and preconditioners and performs these solvers and preconditioners in order for satisfying time tolerant and required accuracy Algorithm of automatic selection of preconditioner and solver policy as follow 1 5 1 0D0 S 0 retry Set strategy 5 5 8 involves type of solver and preconditioner 2 For iz1 m Call solver according to S with a function of detecting stagnation 4 If stagnation occured then go to 5 Else go to 8 If rerative residual r lt r then hu 749 5 6 End For 7 8 0 t
17. INIT
Restart Frequency Auto-tuning Function: OpenATI_DAFRT
Detecting Stagnation Auto-tuning Function: OpenATI_DAFSTG
Sparse Matrix-Vector Multiply Auto-tuning Function: OpenATI_DURMV
Setup Function for OpenATI_DURMV: OpenATI_DURMV_Setup
Gram-Schmidt orthonormalization function: OpenATI_DAFGS
Fig. 1.2 Components of Function on Linearsolver

2 Specification
2.1 Functions and Arguments of OpenATLib and Xabclib
This section explains the functions and specification of OpenATLib, a common auto-tuning interface library. OpenATLib is an Application Programming Interface (API) that supplies auto-tuning facilities to arbitrary matrix computation libraries, for example functions that estimate the best values of algorithmic parameters and select the best implementation of sparse matrix-vector multiplication (SpMxV).

(1) The function
Table 2.1 shows the auto-tuning functions provided by OpenATLib.

Table 2.1 Auto-tuning Functions Provided by OpenATLib
Function Name: OpenATI_INIT, OpenATI_DAFRT, OpenATI_DAFSTG, OpenATI_DSRMV, OpenATI_DURMV, OpenATI_DSRMV_Setup, OpenATI_DURMV_Setup, OpenATI_DAFGS, OpenATI_DAFMC_CCS2CRS, OpenATI_LINEARSOLVE, OpenATI_EIGENSOLVE
Description (in the same order): Set default parameters for OpenATLib and Xabclib. Judge increment for restart frequency on Krylov subspace. Detect stagnation of relative residu
18. Number | Type | Initial Value | IO | Description
IATPARAM(12) | Integer | 2 | INPUT | 0: Classical Gram-Schmidt, 1: DGKS, 2: Modified Gram-Schmidt, 3: Blocked Gram-Schmidt
IATPARAM(13) | Integer | - | OUTPUT | Iterative refinement of DGKS. 0: no iterative refinement, 1: iterative refinement

(3) Using parameters on RATPARAM
OpenATI_DAFGS doesn't use RATPARAM.

3.6 OpenATI_DAFMC_CCS2CRS
3.6.1 Overview of the function
OpenATI_DAFMC_CCS2CRS converts the sparse matrix storage format from CCS (Compressed Column Storage) into CRS (Compressed Row Storage).

3.6.2 Argument Details and Error Code
(1) Argument Details
Argument | Type | IO | Description
IRP(N+1) | Integer | OUTPUT | Pointers of first element on each row of the matrix in CRS format
ICOL(NNZ) | Integer | OUTPUT | The non-zero column indexes for the matrix in CRS format
IATPARAM(50) | Integer | INPUT | Array of integer parameters for OpenATLib and Xabclib
N | - | INPUT | The order of the matrix (N > 1)
NNZ | - | - | Non-zero elements of the matrix (NNZ > N)
IPTR(N+1) | Integer | INPUT | Pointers of first element on each column of the matrix in CCS format
INDEX(NNZ) | Integer | INPUT | Row indexes of elements in CCS format
VALUE(NNZ) | - | INPUT | Value of elements in CCS format
VAL(NNZ) | - | OUTPUT | Value of elements in CRS format

3.7 OpenATI_LINEARSOLVE and OpenATI_EIGENSOLVE
Sparse iterative
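Section 3.6.1 above states that OpenATI_DAFMC_CCS2CRS converts CCS storage into CRS storage. The following Python sketch shows the usual counting-based way to perform such a conversion, using the argument names from the table (IPTR, INDEX, VALUE in, IRP, ICOL, VAL out) and assuming 1-based pointers and indexes. It is an illustration of the format change, not the library routine, and the function name `ccs_to_crs` is hypothetical.

```python
def ccs_to_crs(n, iptr, index, value):
    """Convert an n-by-n matrix from CCS to CRS (all indexes 1-based).

    iptr[j] points to the first entry of column j; index holds row indexes.
    Returns (irp, icol, val). Illustrative sketch only.
    """
    nnz = iptr[n] - 1
    # Count the non-zeros in every row.
    counts = [0] * n
    for r in index[:nnz]:
        counts[r - 1] += 1
    # Prefix sums give the 1-based row pointers.
    irp = [1] * (n + 1)
    for i in range(n):
        irp[i + 1] = irp[i] + counts[i]
    # Scatter each column entry into the next free slot of its row.
    next_slot = irp[:n]
    icol = [0] * nnz
    val = [0.0] * nnz
    for j in range(n):
        for k in range(iptr[j] - 1, iptr[j + 1] - 1):
            r = index[k] - 1
            p = next_slot[r] - 1
            icol[p] = j + 1
            val[p] = value[k]
            next_slot[r] += 1
    return irp, icol, val


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]] stored column by column.
irp, icol, val = ccs_to_crs(3, [1, 3, 4, 6], [1, 3, 2, 1, 3],
                            [2.0, 4.0, 3.0, 1.0, 5.0])
print(irp, icol, val)
```

Because columns are visited in increasing order, the column indexes within each output row come out sorted.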
19. Number | Type | Initial Value | IO | Description
IATPARAM(14) | Integer | - | INPUT | Access to meminfo for Linux system
IATPARAM(15) | - | - | OUTPUT | Number of retried solver
IATPARAM(16) | - | - | OUTPUT | Total restart of solver
IATPARAM(17) | Integer | - | OUTPUT | Total Matrix-Vector times

(3) Using parameters on RATPARAM
Number | Type | Initial Value | IO | Description
RATPARAM(14) | Double | - | OUTPUT | Residual norm
RATPARAM(15) | Double | - | OUTPUT | Set up time
RATPARAM(17) | Double | - | OUTPUT | Solver time
RATPARAM(18) | Double | - | OUTPUT | Total time

(4) Error Code
Value | Description
0 | Normal return
100 | ... in POLICY FILE is illegal
200 | The value of IATPARAM(7) or IATPARAM(9) is illegal
300 | POLICY in POLICY FILE is illegal
810 | PRECONDITIONER in POLICY FILE is illegal
820 | SOLVER in POLICY FILE is illegal
400 | The value of MAXMEMORY in POLICY FILE is greater than the free size of memory
500 | Failing to allocate work area
20 | Error code from Xabclib_LANCZOS or Xabclib_ARNOLDI. For more detail, refer to 4.1.4 and 4.2.4.

3.7.6 Usage Example
(1) OpenATI_LINEARSOLVE
An example of policy file:

POLICY ACCURACY
RESIDUAL 1.0D-10
CPU 16
PRECONDITIONER ILUO
SOLVER XABCLIB_GMRES
MAXMEMORY 1.0
MAXTIME 500.0

Before running, put policy input file named OPENA
20. Interface Library
3.1 OpenATI_INIT
3.1.1 Overview of the function
OpenATI_INIT sets default parameters for OpenATLib and Xabclib. This function must be called before using any function of OpenATLib or Xabclib.

3.1.2 Argument Details and Error Code
(1) Argument Details
Argument | Type | IO | Description
IATPARAM(50) | Integer | OUTPUT | Array of integer parameters for OpenATLib and Xabclib
RATPARAM(50) | Double | OUTPUT | Array of double precision parameters for OpenATLib and Xabclib
INFO | - | OUTPUT | Error code

(2) Error Code
Value | Description
0 | Normal return

3.2 OpenATI_DAFSTG
3.2.1 Overview of the function
Many iterative solvers and preconditioning methods have been proposed recently. However, the history of the relative residual behaves differently depending on the solver, the preconditioner, and the matrix. Hence we need to predict, from the history of the relative residual so far, whether the solver will satisfy the user's request. OpenATI_DAFSTG detects stagnation of the relative residual from this history.

3.2.2 Overview of the auto-tuning method
OpenATI_DAFSTG uses the gradient of the history up to the current iteration for detection. For example, at the fiftieth iteration there are three histories like Fig. 3.1. For each of them, OpenATI_DAFSTG calculates the gradient. Next, from the latest point of the history, OpenATI_DAFSTG draws a prediction line with the calculated gradient to the line of hu
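The idea in 3.2.2, extending the residual history along its recent gradient to see where it will end up, can be illustrated with a small Python sketch. This is not the formula OpenATI_DAFSTG uses (the manual defines that in Fig. 3.2, with an exponential moving average). It is a simplified stand-in that assumes a straight-line extrapolation of log10 of the relative residual up to the iteration limit. The function name `predicts_convergence` and the `window` parameter are hypothetical.

```python
import math


def predicts_convergence(history, tol, max_iter, window=10):
    """Extrapolate the residual history linearly in log10 scale.

    history  : relative residuals, one per iteration so far
    tol      : convergence criterion
    max_iter : iteration limit
    window   : number of recent steps used for the gradient (assumption)
    Returns True if the prediction line reaches tol by max_iter.
    """
    w = min(window, len(history) - 1)
    if w < 1:
        return True  # not enough history to judge yet
    latest = math.log10(history[-1])
    slope = (latest - math.log10(history[-1 - w])) / w
    remaining = max_iter - len(history)
    predicted = latest + slope * remaining
    return predicted <= math.log10(tol)


steady = [10.0 ** (-0.1 * k) for k in range(51)]   # keeps decreasing
flat = [1.0e-3] * 51                               # stagnated
print(predicts_convergence(steady, 1.0e-8, 200))
print(predicts_convergence(flat, 1.0e-8, 200))
```

A caller would treat a False result as a sign of stagnation and stop or switch strategy, which is the role the stagnation counter plays in the library.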
21. NUM SMP IATRARAM RATPARAM WK SINF LSINF INFO omission Fig 3 7 Example of OpenATI DSRMV Description 40 OpenATLib Xabclib User s Manual for Version 1 0 If you want to specify SpMxV implementation in OpenATI DSRMV implement the code like Fig 3 8 Parameter definition IATPARAM 7 0 Initialize DSRMV parameter IATPARAM 8 13 Initialize DSRMV parameter omission Call SpMxV LSINF N NUM_SMP 3 Allocate memory for setup ALLOCATE SINF LSINF CALL OpenATI_LDSRMV_Setup N NNZ IRPICOL IATPARAM RATPARAM SINELSINEINFO CALL DSRMV N NNZ IRP ICOL VAL X Y IATRARAM RATPARAM WK SINF LSINF INFO omission Fig 39 8 An example of OpenATI_DSRMV Description with specified SpMxV implementation 41 OpenATLib Xabclib User s Manual for Version 1 0 3 5 DAFGS 3 5 1 Overview of the function Vector orthonormalization spends a lot of CPU time in many Krylov Subspace methods Gram Schmidt orthonormalization methodI7 is typical orthonormalization method There are many implementations to perform Gram Schmidt method and trade offs must be made between computational complexity and accracy Hence It 15 difficult to fix the best implementation OpenATI DAFGS is API that supplies selectable from 4 kinds Gram Schmidt orthonormalization implementation 3 5 2 Overview of Reorthonormalization method In this function the API has 4 kinds G
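Section 3.5 above notes that Gram-Schmidt implementations trade computational cost against accuracy. As an illustration of one of the listed variants, the following Python sketch applies Modified Gram-Schmidt (option 2 of IATPARAM(12)) to orthonormalize one vector against an existing orthonormal basis. It is a textbook version written for this example, not the OpenATI_DAFGS code, and the function name and return values are hypothetical.

```python
import math


def modified_gram_schmidt(basis, v):
    """Orthonormalize v against the orthonormal vectors in basis.

    Each projection is removed from the already-updated vector, which is
    what distinguishes Modified from Classical Gram-Schmidt.
    Returns (unit_vector, norm_before_normalization).
    """
    w = list(v)
    for q in basis:
        h = sum(qi * wi for qi, wi in zip(q, w))
        w = [wi - h * qi for qi, wi in zip(q, w)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("breakdown: v lies in the span of the basis")
    return [wi / norm for wi in w], norm


q, norm = modified_gram_schmidt([[1.0, 0.0, 0.0]], [3.0, 4.0, 0.0])
print(q, norm)
```

Classical Gram-Schmidt would compute all projections from the original v instead, which is cheaper to parallelize but loses orthogonality faster in floating point.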
22. Perform SpMxV to judge the best methods between three methods except for Original Segment Scan 3 Perform SpMxV to judge the best method among four implementations 4 Perform SpMxV to judge the best method among four implementations and auto configure IATPARAM 1 If IATPARAM 9 0 or 1 then set the number of implementations If IATPARAM 9 2 3 or 4 the best number of implementations returns 11 Row Decomposition Method 12 Normalized NZ Method 13 Branchless Segmented Scan 21 Original Segmented Scan Columns of Segmented Scan s algorithms If IATPARAM 9 is set as 1 or 4 IATPARAM 11 is set as IATPARAM 11 Mod IATPARAM 11 IATPARAM 3 OpenATI_DURMV_Setup doesn t use RATPARAM 32 OpenATLib Xabclib User s Manual for Version 1 0 4 Error Code Value Description 0 Successful exit 100 Invalid IATPARAM 10 value 200 LUINF value exceeds upper limit of Integer 300 Invalid LUINF value IATPARAM 10 12 13 21 38 OpenATLib Xabclib User s Manual for Version 1 0 3 4 5 Argument Details and Error Code for DSRMV 1 Argument Details Argument Description N The number of dimension for the matrix gt 1 NNZ The number of non zero elements for the matrix IRP N 1 Integer INPUT Pointers to diagonal elements on each row for the matrix ICOL NNZ The non zero row indexes for the matrix VAL NNZ The non zero elements for the matrix X N Right
23. by stagnation of relative residual. This error code is output only when IATPARAM(32) = 1.

4.4 Xabclib_BICGSTAB
4.4.1 Overview of the function
Xabclib_BICGSTAB solves linear equations problems with large-scale unsymmetric sparse matrices.

4.4.2 Target problem and data format
(1) Target problem
The problem to be solved by the library is the linear equations problem Ax = b, where A is a large-scale sparse matrix, x is a solution vector, and b is a right-hand side vector.
(2) Input data format
The unsymmetric sparse matrix format is Compressed Row Storage (CRS) for unsymmetric matrices, shown in Fig. 3.3.

4.4.3 Overview of the algorithm
The algorithm used in this solver is the BiCGStab method, which is shown in Fig. 4.6. The algorithm was presented in [10] (BiCGStab with right preconditioner, by Dr. Itoh).

Fig. 4.6 The BiCGStab M
24. computation is done with the current approximation as the initial vector to build a new Krylov subspace. This process is called restart, and the number of iterations is called the restart frequency. If the restart frequency is too small, the reduction of the residual vector, which is calculated from the real solution and the approximation vectors, stagnates, and the number of iterations increases. On the other hand, if the restart frequency is too big, building large Krylov subspaces requires heavy computation, and the execution time increases greatly. The best frequency depends on the numerical condition of the input sparse matrix, and it is very hard to estimate without execution. Hence, from the library's point of view, we need an on-the-fly, namely run-time, auto-tuning facility. OpenATI_DAFRT enables us to judge the increment of the frequency based on the current information of the Krylov subspace.

3.3.2 Overview of the auto-tuning method
Estimating the best restart frequency in advance is difficult, but stagnation can be detected from the run-time history of residuals. The method is proposed in [1]. The measure of stagnation is defined as the maximum value divided by the minimum value of the residual from the t-th time to the s-th time. This value is called the Ratio of Max-Min in residual; hereafter we call it the MM ratio for simplicity. The MM ratio to the past t-th time, namely Ri, can be described with the residual ri as follows
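The MM ratio described above, the maximum residual divided by the minimum residual over a span of recent iterations, can be sketched in Python as follows. The parameter tables in this manual state that the Krylov subspace is expanded by IATPARAM(5) when the MM ratio is less than the threshold RATPARAM(4); the sketch uses 5 and 100.0 as example values for those two parameters. The function names, the window argument `t`, and the `max_size` cap are hypothetical, and this is not the OpenATI_DAFRT code.

```python
def mm_ratio(residuals, t):
    """Ratio of the maximum to the minimum of the last t residuals."""
    window = residuals[-t:]
    return max(window) / min(window)


def next_subspace_size(size, max_size, residuals, t,
                       threshold=100.0, increment=5):
    """Expand the Krylov subspace when the residual has stagnated.

    A small MM ratio means the residual barely changed over the window,
    so the subspace grows by `increment`, capped at `max_size`.
    """
    if len(residuals) >= t and mm_ratio(residuals, t) < threshold:
        return min(size + increment, max_size)
    return size


print(mm_ratio([8.0, 4.0, 2.0, 1.0], 3))
print(next_subspace_size(10, 50, [1.0, 0.9, 0.8], 3))
print(next_subspace_size(10, 50, [1000.0, 1.0, 0.5], 3))
```

In the second call the residual fell only from 1.0 to 0.8, so the ratio is small and the subspace grows; in the third the residual dropped by three orders of magnitude, so the size is left alone.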
25. exact convergence test until true convergence 9 If POLICY MEMORY Meta Solvers set arguments with less memory usage 45 OpenATLib Xabclib User s Manual for Version 1 0 If POLICY STABLE Meta Solvers set arguments without AT In this case Meta Solvers set IATPARAM as following value 4 7 and 9 0 IATPARAM 27 28 30 LINEARSOLVE or NEV 5 EIGENSOLVE The others are set as default value CPU value value entry OMP NUM THREADS at run time OMP GET NUM THREADS is selected by default Note 1 lt value lt OMP GET MAX THREADS RESIDUAL value value entry require accuracy by real value The default value is 1 0D 8 In case of POLICY ACCURACY is set and false convergence occur solver continue to re excute with more exact convergence test until true convergence MAXMEMORY value value entry require memory usage in Gbyte The default value is memfree in proc meminfo Linux If fails to get property in proc meminfo search and allocate free memory dynamically Note The maximum limit of MAXMEMORY is 16Gbyte MAXTIME value lt value gt entry time tolerance in sec The default value is infinite When execution time exceeds time tolerance computation is stopped PRECONDITIONER lt value gt lt value gt NO JACOBI SSOR ILUOD ILUO ILUT
26. no Iterative refinement 1 refinement IATPARAM 22 Integer 1 INPUT Maximum number of restart iterations IATPARAM 3 Integer OUTPUT Final number of restart iterations IATPARAM 27 Max size of Krylov subspace IATPARAM 28 Integer 2 INPUT Start size of Krylov subspace at OUTPUT subspace expand See IATPARAM 4 If 28 less than start subspace size NEV overwritten IATPARAM 29 Integer OUTPUT Final size of Krylov subspace IATPARAM 30 Integer Eigenvalue order option 1 largest real part eigenvalue 2 largest magnitude 3 largest imaginary part IATPARAM 31 Integer Total Matrix Vector times IATPARAM 32 Integer When stagnation of relative residual B 3 Using parameters RATPARAM Number Type Initial IO hu RATPARAM 4 Threshold value for MM ratio RATPARAM 22 Max elapsed time RATPARAM 23 Convergence criterion RATPARAMQ9 Double OUTPUT 2 norm of max residual 65 occurs solver is stopped 0 Off 1 On Description OpenATLib Xabclib User s Manual for Version 1 0 RATPARAM 30 RATPARAM 32 Double OUTPUT floating operations x10 9 operations Double total solve time 4 Error Code Value Description 0 Normal return Less than 0 If i returns the value of i th argument is illegal 100 Computation was stopped by breakdown for zero vector division 200 Computat
27. 9 MSIZE 6 NEV MSIZE IATPARAM 27 IWORK Integer WORK Workspace LIWORK 63 OpenATLib Xabclib User s Manual for Version 1 0 LIWORK INPUT INFO OUTPUT Satisfy 2 Using parameters on IATPARAM Error code The size of the integer workspace IWORK LIWORK MSIZE MSIZE IATPARAM 27 Number Type Initial IO Description IATPARAM 3 Integer OMP_G INPUT Number of THREADS ET_MA X_THR EADSO IATPARAM 4 Integer 1 INPUT Flag of Krylov subspace expand by MM ratio IATPARAM 5 Integer 5 INPUT incremental value for Krylov subspace when MM ratio is less than threshold RATPARAM 4 IATPARAM 9 Integer 0 INPUT OpenATI_DURMV auto tuned On Off 0 Perform SpMxV specified by IATPARAM 10 2 and 3 Perform SpMxV to judge the best method among three implementations IATPARAM 10 Integer 12 INPUT If IATPARAM 9 0 then set the number of implementations If IATPARAM 9 2 or 3 the best number of implementations returns 11 Row Decomposition Method 12 Normalized NZ Method 13 Branchless Segmented Scan 21 Original Segmented Scan IATPARAM 1 Integer 128 INPUT Columns of Segmented Scan s algorithms 64 OpenATLib Xabclib User s Manual for Version 1 0 IATPARAM 12 Integer 2 INPUT 0 Classical Gram Schmidt 1 DGKS 2 Modified Gram Schmidt 3 Blocked Gram Schmidt IATPARAM 13 Integer OUTPUT Iterative refinement of DGKS 0
28. ATPARAM 11 9 3 2 AT on select fastest type in 11 or 12 3 AT on select fastest type in 11 12 or 13 4 AT on select fastest type in 11 12 or 13 And Auto configure IATPARAM 11 Fastest DURMV impl Method 10 12 11 block row decomp 12 nonzero decomp 13 BSS 21 0 original SS Columns of Segmented Scan s algorithms n ib If IATPARAM 9 is set as 1 or 4 IATPARAM 11 is set as IATPARAM 11 Mod IATPARAM 11 IATPARAM 3 OpenATI DURMV and OpenATI DURMV Setup OpenATLib Xabclib User s Manual for Version 1 0 Type of Gram Schmidt procedure 0 CGS 1 DGKS 2 MGS 3 Blocked CGS 1 DGKS refinement done or not 0 done 1 not 0 ia Access to meminfo EIGENSOLVE LINEARSOLVE done 1 0 15 Number of retried solver EIGENSOLVE LINEARSOLVE 16 Total restart of solver EIGENSOLVE LINEARSOLVE 17 Total Matrix Vector times EIGENSOLVE LINEARSOLVE Last performed preconditioner type 18 1 None 2 Jacobi 3 SOR 4 ILU O Diagonal 5 ILU 0 6 ILUT 19 Maximum number of fill in s in each row for ILUT preconditioner Last performed solver type 5 l Xabclib GMRES 2 Xabclib BICGSTAB D 0 atio 21 3t of OMP NUM THREADS 0 32 init Max Iterations O if Solver recognize 1 then set N 23 3t of Iterations 24 i lt preconditioner operations flag
29. M ratio is less than threshold RATPARAM 4 OpenATI_DSRMV auto tuned On Off 0 Perform SpMxV specified by IATPARAM 8 2 Perform SpMxV to judge the best methods between two methods except for reduction parallel implementation 3 Perform SpMxV to judge the best method among three methods Note that workspace according to the number of threads is needed If IATPARAM 7 0 then set the number of implementations If IATPARAM 7 2 or 3 the best number of implementations returns 11 Row Decomposition Method 12 Normalized NZ Method OpenATLib Xabclib User s Manual for Version 1 0 13 Normalized NZ Method with 11 vector reduction parallelization IATPARAM 12 Intege 2 INPUT 0 Classical Gram Schmidt r 1 DGKS 9 Modified Gram Schmidt 3 Blocked Gram Schmidt IATPARAM 13 Intege OUTPUT Iterative refinement of DGKS r 0 no Iterative refinement 1 Iterative refinement IATPARAM 22 Intege 1 INPUT Maximum number of restart iterations IATPARAM 23 Intege OUTPUT Final number of restart iterations sud BI IATPARAM 27 Max size of Krylov subspace IATPARAM 28 Intege 2 INPUT Start size of Krylov subspace at r subspace expand AT on See IATPARAM 4 If IATPARAM 28 less than NEV then start subspace size NEV overwritten IATPARAM 29 Intege OUTPUT Final size of Krylov subspace EL IATPARAM 30 Intege 1 INPUT Eigenvalue order option r 1 largest eigenvalue CT 2 largest
30. OpenATLib Xabclib User's Manual for Version 1.0. Information Technology Center, The University of Tokyo, and Central Research Laboratory, Hitachi Ltd. April 27, 2012.
DISCLAIMER: This software (OpenATLib and Xabclib) is provided by the copyright holders and contributors (Information Technology Center, The University of Tokyo, and Central Research Laboratory, Hitachi Ltd.) "AS IS", and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
Contents: 2.1 Functions and Arguments of OpenATLib and Xabclib; 2.2 Linking and Running OpenATLib and
31. PARAM 25 Double 1 0E 08 INPUT If IATPARAM 25 3 then Set parameter o for SSOR preconditioner 1 lt lt 9 If IATPARAM 25 4 or 5 then Set threathold value to judge breakdown when computing ILU 0 preconditioner If IATPARAM 25 6 then Set value of dropping criterion when computing ILU 0 preconditioner RATPARAM 28 Double OUTPUT 2 norm of RHS RATPARAM 29 Double OUTPUT 2 norm of max residual RATPARAM 30 Double OUTPUT Floating operations x10 9 operations RATPARAM 31 Double OUTPUT Preconditioner time RATPARAM 32 Double OUTPUT Total solve time RATPARAM 33 Double 00 INPUT Minimum running time 4 Error Code Value IATPARAM 29 IATPARAM 31 IATPARAM 32 Description Normal return 72 OpenATLib Xabclib User s Manual for Version 1 0 Less than 0 100 200 300 400 500 600 700 1000 If i returns the value of i th argument is illegal Computation was stopped by failing to make preconditioner Computation was stopped by breakdown Computation was stopped by that the value of OpenATI_DAFRT is illegal Computation was stopped by exceeding the execution time tolerance Computation was stopped by exceeding the maximum number of restart Computation was stopped by failing to allocate memory in case of IATPARAM 10 12 13 21 Computation was stopped by the value of LUINF exceeds Integer max in case of IATPARAM 10 21 Computation was stopped
32. PRE gt 3 NNZ 2 2 N 50 If IATPARAM 25 6 then NPRE gt 3 2 0 IFILL 1 N 2 3 N 50 IFILL IATPARAM 26 69 OpenATLib Xabclib User s Manual for Version 1 0 IATPARAM Integer INPUT Array of integer parameters for OpenATLib and 50 Xabclib RATPARA Double INPUT Array of double precision parameters for OpenATLib M 50 1 and Xabclib WK Double WORK Workspace re LWK Integer INPUT The size of the workspace for double precision WK Satisfy LWK gt MSIZE 2 N MSIZE 1 MSIZE 1 N 1 2 1 MSIZE IATPARAM 27 INFO Error code 2 Using parameters on IATPARAM Number Type Initial IO Description Value IATPARAM 3 Integer OMP_G INPUT Number of THREADS ET_MA X_THR EADSO IATPARAM 4 Integer 1 INPUT Flag of Krylov subspace expand by MM ratio IATPARAM 5 Integer 5 INPUT incremental value for Krylov subspace when MMr ratio is less than threshold RATPARAM 4 IATPARAM 9 Integer 0 INPUT OpenATI DURMV auto tuned On Off 0 Perform SpMxV specified by IATPARAM 10 2 and 3 Perform SpMxV to judge the best method among three implementations IATPARAM 10 Integer 12 INPUT If IATPARAM 9 0 then set the number of implementations 70 OpenATLib Xabclib User s Manual for Version 1 0 If IATPARAM 9 2 or 3 the best number of implementations returns 11 Row Decomposition Method 12 Normalized NZ Method 13 Branchless Segmented Scan
33. PTRCRD INDCRD VALCRD RHSCRD 2 MXTYPE NROW NCOL NNZERO NELTVL 3 PTRFMT INDFMT VALFMT RHSFMT 1000 n A72 A8 5114 11X 4114 2A16 2A20 endi write 6 gt INPUT FILE NAME IS filename write 6 TITLE 6 KEY C Te Z A EN s E t 0 READ MATRIX STRUCTURE READ LUNIT PTRFMT COLPTR D 1 NCOL 1 READ LUNIT INDFMT ROWIND D 1 NNZERO IF CVALCRD GT 0 THEN x READ MATRIX VALUES EM LUNIT VALFMT VALUES D 1 NNZERO return end subroutine residz n irp icol nz a nev e v r implicit real 8 a h o z integer 4 irp n 1 icol nz real 8 a nz complex 16 e nev v nv1 nev r n 16 s resmax 0 000 do 100 ic 1 nev do 210 1 1 s demplx 0 0d0 0 0d0 do 220 irp i irp i 1 1 jj icol jc s sta jc v jj ic 220 continue r 5 210 continue do 230 r i r i e ic v i 230 continue zansa 0 040 do 240 zansa zansa dreal con jg i r i 240 continue write 6 IC IC EZ e ic RES sqrt zansa abs e ic resmax max resmax sqrt zansa abs e ic 100 continue WRITE 6 WRITE 6 MAX RESID WRITE 6 return end H H tl E e 5 gt lt 85
34. T 0 Classical Gram Schmidt 1 DGKS 2 Modified Gram Schmidt 3 Blocked Gram Schmidt Integer OUTPU Iterative refinement of DGKS T 0 no Iterative refinement 1 Iterative refinement Integer 1 INPUT Maximum number of restart iterations Integer OUTPU Final number of restart iterations T Integer 1 INPUT Preconditioner operations flag 1 not generated yet 2 already generated Integer 4 INPUT Set preconditioner kinds 1 None 2 Jacobi 3 SSOR 4 ILU 0 _Diagonal 5 ILU 0 6 ILUT Integer 5 INPUT Maximum number of fill in s in each row for ILUT m NN OUTPU Total Matrix Vector times T Integer INPUT When stagnation of relative residual occurs solver is stopped 0 Off 1 0 Integer 0 INPUT Minimum running iteration 78 OpenATLib Xabclib User s Manual for Version 1 0 3 Using parameters on RATPARAM Number Type Initial IO Description Value RATPARAM 4 INPUT Threshold value for MM ratio RATPARAM 22 INPUT Max elapsed time RATPARAM 23 1 0 08 INPUT Convergence criterion RATPARAM 25 Double 1 0E 08 INPUT If IATPARAM 25 3 then Set parameter for SSOR preconditioner 1 lt 0 lt 9 If IATPARAM 25 4 or 5 then Set threathold value to judge breakdown when computing ILU 0 preconditioner If IATPARAM 25 6 then Set value of dropping criterion when computing ILU 0 preconditioner RATPARAM 28 Double OUTPUT 2 norm of RHS
35. TI POLICY INPUTAZ thread number When OpenATI LINEARSOLVE running is complete computation result and input parameters are reported OPENATI POLICY REPORT ZA thread number An example of DOPENATI POLICY REPORTAZ as follow LINEAR SOLVER POLICY REPORT ee 2010 0114 11 30 Environment variables OPENAT I DEBUG OPENATI POLICY input policy dat Pol icy Definitions POLICY ACCURACY SMPs 16 SOLVER XABCLIB GMRES PRECONDITIONER ILUO REQUIREMENT WORKING MEMORY lt lt lt Upper Bound 16GBYTE gt gt gt REQUIREMENT RESIDUAL REQUIREMENT MAX TIME 1 0000000000000 1 000000000000000E 008 500 000000000000 MAX SUBSPACE SIZE RUNTIME MEMORY USE 14214 3 24 GBYTE KRYLOV SUBSPACE EXPAND AT 1 MATVEC 1 Initial Gram Schmidt Strategy BCGS OPENAT I_LINEARSOLVE RESULT MATRIX DATA N 14214 NNZ FASTEST MATVEC NO 11 FINAL KRYLOV SUBSPACE SIZE FINAL Gram Schmidt Strategy DGKS 2 Norm of RHS 25 2388580282479 NUMBER OF RETRYED GMRES TOTAL RESTARTS of GMRES RESIDUAL NORM 259688 42 6 197 3 005885687924543E 010 SET UP TIME 1 126790046691895E 002 SEC SOLVER TIME 1 32032704353333 SEC TOTAL TIME 1 33159494400024 SEC lt report date time input parameters successfully exit lresult report fastest OpenATI
36. UT IATPARAM 0 11 IATPARAM 10 12 13 21 Set returned by DURMV Setup If IATPARAM 9 2 3 or 4 INPUT Not necessary to set Not necessary to set OUTPUT Returns setup information for best implementation The size of If IATPARAM 9 0 or 1 IATPARAM 10 11 LUINF gt 0 IATPARAM 10 12 LUINF gt int 0 5 NUM_SMP 1 IATPARAM 10 13 LUINF gt int 1 5 N int 4 25 JL 10 37 OpenATLib Xabclib User s Manual for Version 1 0 JL IATPARAM 1 IATPARAM 10 21 LUINF int 1 125 NNZ int 2 125 JL 10 If IATPARAM 9 2 LUINF gt int 0 5 NUM_SMP 1 If IATPARAM 9 3 or 4 LUINF gt int 1 5 N int 4 25 JL 10 NUM_SMP IATPARAM 3 JL IATPARAM 11 INFO OUTPUT Error Code 2 Using parameters on IATPARAM Number Type Initial IO Description Value IATPARAM 9 Integer 3 INPUT OpenATI DURMV auto tuned On Off 0 Perform SpMxV specified by IATPARAM 10 1 Perform SpMxV specified by IATPARAM 10 and auto configure IATPARAM 11 2 Perform SpMxV to judge the best methods between three methods except for Original Segment Scan 3 Perform SpMxV to judge the best method among four implementations 4 Perform SpMxV to judge the best method among four implementations and auto configure IATPARAM 1 IATPARAM 10 Integer 12 INPUT If IATPARAM 9 0 or 1 then set the OUTPU number of implementations T If IATPARAM 9 2 3 or 4 the bes
37. _DURMV case lt Msize for convergence initial norm of RHS retried iterations 58 OpenATLib Xabclib User s Manual for Version 1 0 2 EIGENSOLVE An example of policy file POLICY TIME RESIDUAL 1 0D 8 CPU 16 SOLVER XABCLIB LANCZOS MAXMEMORY 16 0 MAXTIME 600 0 Before running put policy input file named OPENATI POLICY INPUTJAZ thread number When OpenATI EIGENSOLVE running is complete computation result and input parameters are reported in POLICY REPORTZ thread number An example of OPENATI POLICY REPORTAZ as follow EIGEN SOLVER POLICY REPORT 2011 1129 14 53 Environment variables OPENATI DEBUG 0 OPENATI POLICY OPENATI POLICY INPUT 0 Pol icy Definitions POLICY TIME SMPs 16 SOLVER XABCLIB LANCZOS REQUIREMENT WORKING MEMORY 16 00000000000000 lt lt lt Upper Bound 16GBYTE gt gt gt REQUIREMENT RESIDUAL 1 000000000000000 008 REQUIREMENT MAX TIME 600 000000000000 MAX SUBSPACE SIZE RUNTIME MEMORY USE 12326 3 65 GBYTE KRYLOV SUBSPACE EXPAND AT 1 MATVEC AT 3 Initial Gram Schmidt Strategy BCGS OPENATI EIGENSOLVE RESULT MATRIX DATA N 12328 NNZ 177518 FASTEST MATVEC NO 13 FINAL KRYLOV SUBSPACE SIZE 30 FINAL Gram Schmidt Strate
38. al for iterative method
  Judge the best implementation for double-precision symmetric SpMxV on CRS format
  Judge the best implementation for double-precision non-symmetric SpMxV on CRS format
  Setup function for OpenATI_DSRMV
  Setup function for OpenATI_DURMV
  Gram-Schmidt orthonormalization function with 4 implementations
  Convert matrix storage format from CCS into CRS
  Over LinearSolver with numerical policy interface
  Over EigenSolver with numerical policy interface
The functions provided by OpenATLib are classified into the following four categories:
  (a) Computation Function (Ex. OpenATI DIS)
  (b) Auxiliary Function (Ex. DAFRT, OpenATI_DAFSTG)
  (c) Setup Function (Ex. OpenATI_INIT, OpenATI_D S Setup)
  (d) Meta-interface (Ex. LINEARSOLVE)
For (a) and (b) functions, the names following "OpenATI_" are formed in the manner shown in Table 2.1.
Table 2.2 Nomenclature of OpenATLib functions
  First character: shows the data type. S: Single Precision, D: Double Precision.
  Second and third characters: If the function is auxiliary, they are "AF". If the function is computation, the second character is the matrix kind and the third character is the matrix storage format.
    The second character: S: Symmetric, U: Non-symmetric, D: Diagonal, T: Tridiagonal.
    The third character: R: CRS Format, C: CCS Fo
39. atagiri T Sakurai M Igai S Ohshima H Kuroda K Naono and K Nakajima An improvement in preconditioned BiCGStab method High Performance Computing Symposium 2011 2011 81 OpenATLib Xabclib User s Manual for Version 1 0 Appendix A Sample code of OpenATI EIGENSOLVE for thread safe PROGRAM MAIN IMPLICIT NONE INTEGER NMAX NZMAX parameter NMAX 268100 NZMAX 9400000 INTEGER NTMP NZTMP NEVTMP INTEGER IRPTMP NMAX 1 ICOLTMP NZMAX DOUBLE PRECISION ATMP NZMAX INTEGER N NZ NEV INFO INTEGER IRP 1001 IATPARAM ALLOCATABLE IRP ICOL IATPARAM DOUBLE PRECISION E V RATPARAM ALLOCATABLE AC EC VC RATPARAM DOUBLE PRECISION WK 0 ALLOCATABLE WK 0C INTEGER 5 OMP GET THREAD NUM EXTERNAL OMP GET NUM THREADS OMP GET MAX THREADS INTEGER GET NUM THREADS OMP GET MAX THREADS open 31 file Input param status OLD read 31 itest close 31 CALL MATGEN I TEST NTMP NZTMP IRPTMP I COLTMP ATMP GET MAX THREADS NEVTMP 10 WRITE 6 Heth Input Parameter List write 6 itest itest WRITE 6 Matrix Info NTMP NZ NZTMP WRITE 6 OpenMP Number of MAX Threads WRITE 6 C omp parallel default none 1 private N NZ IRP ICOL A E V INFO I o
40. ector Multiplication, IPSJ SIG Notes, 2002-ARC-147, pp. 151-156, 2002 (in Japanese).
[3] V. Hernandez, J. E. Roman, and A. Tomas, Evaluation of Several Variants of Explicitly Restarted Lanczos Eigensolvers and Their Parallel Implementations, High Performance Computing for Computational Science, VECPAR 2006, pp. 403-416, 2007.
[4] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 1996.
[5] Guy E. Blelloch, Michael A. Heroux, and Marco Zagha, Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors, Carnegie Mellon University, Pittsburgh, PA, 1993.
[6] K. Naono, M. Igai, and H. Kidachi, Performance Evaluation of the Gram-Schmidt Orthogonalization Library with Numerical Policy Interface on Heterogeneous Platforms, IPSJ Trans. on Advanced Computing Systems, 46 (SIG 12 (ACS 11)), pp. 279-288, 2005 (in Japanese).
[7] J. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart, Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization, Math. of Computation, Vol. 30, pp. 772-795, 1976.
[8] K. Naono, M. Igai, and H. Kidachi, Performance Evaluation of the Gram-Schmidt Orthogonalization Library with Numerical Policy Interface on Heterogeneous Platforms, Transaction on Advanced Computing Systems, Vol. 46, No. SIG12 (ACS11), pp. 279-288, 2005 (in Japanese).
[9] Richard Bruno Lehoucq, Analysis and implementation of an implicitly restarted Arnoldi iteration, TR95-13, Rice Univ., 1995.
[10] S. Itoh, K. K
41. es 25 Tea T A dre vmi aeneus 53 4 Xabclib A Numerical Library with Auto tuning Facility OpenATLib 55 Xabelib erc due iea bb 4 1 1 Overview of the function u uu 55 4 1 2 Target problem formularization and data 55 4 7 3 The Lanczos Method m ee c dde nee ns 56 4 1 4 Argument Details and Error Code a 57 4 2 Xabclib ARNOEDIE leet lea a obs eet 61 4 2 1 Overview 1 enne enne enn 61 4 2 2 Target problem formularization and data 61 4 2 8 The Arnoldi Method 5 nete rettet nie eene direi ert Pea edge 62 4 2 4 Argument Details and Error Code a 63 2 3 Xabelib GMRES aunt 67 4 3 1 Overview of the function enne nnne 67 4 3 2 Target problem and data 67 4 3 3 Overview of the algorithm 1 68 4 3 4 Argument Details and Error Code a 69 Xabelib BIGGS TAB x tlie uka ash Meetic adeat eiu 74 4 4 1 Overview of the function enne nennen 74 4 4 2 Targe
42. ethod 75 OpenATLib Xabclib User s Manual for Version 1 0 4 4 4 Argument Details and Error Code 1 Argument Details Argument N NNZ IRP N 1 ICOL NNZ VAL NNZ B N X N PRECOND NPRE NPRE IATPARAM Type Integer Integer Integer Integer Double Double Double Double Integer Integer INP INP INP INP 4 4 INPU INPU UT UT S S UT U INPUT OUTPUT INPUT OUTPUT INPUT INPUT Description The number of dimension for the matrix N gt 1 The number of non zero elements for the matrix Pointes to first position on each row for the matrix Note Satisfy IRP 1 1 IRP N 1 NNZ 1 The row indexes for non zero elements for the matrix The non zero elements for the matrix The elements for right hand size vector INPUT Set the elements of initial guess for solution vector x 0 OUTPUT Return the elements of solution vector x INPUT e If IATPARAM 24 1 then none to be set If IATPARAM 24 2 then set preconditioner kind of M already specified OUTPUT If IATPARAM 24 1 then the preconditioner kind of M returns If IATPARAM 24 2 then no modification The size of PRECOND array If IATPARAM 25 1 then NPRE gt 0 If IATPARAM 25 2 3 or 4 then NPRE gt N If IATPARAM 25 5 then NPRE gt 3 NNZ 2 2 N 50 If IATPARAM 25 6 then NPRE gt 3 2 0 IFILL 1 N 2 3 N 50 IFILL IATPARAM 26
43. gy BCGS
  NUMBER OF RETRYED LANCZOS: 1
  TOTAL RESTARTS of LANCZOS: 21
  SET UP TIME: 5.362033843994141E-004 SEC
  SOLVER TIME: 0.654937982559204 SEC
  TOTAL TIME: 0.655474185943604 SEC
If you want to use these meta-solvers in a thread-safe manner, refer to the sample code in Appendix A.
4 Xabclib: A Numerical Library with Auto-tuning Facility on OpenATLib
4.1 Xabclib_LANCZOS
4.1.1 Overview of the function
Xabclib_LANCZOS computes several eigenvalues, starting from the one largest in absolute value, of large-scale symmetric matrices in the standard eigenproblem.
4.1.2 Target problem formularization and data format
(1) Target problem: The target problem is the standard eigenproblem $Av = \lambda v$ for computing eigenvalues and eigenvectors of large-scale sparse matrices, where $A$ is a large-scale sparse matrix, $\lambda$ is an eigenvalue, and $v$ is an eigenvector.
(2) Input data format: The data format for the input symmetric sparse matrix is Compressed Row Storage (CRS), shown in Fig. 4.1. Note that the format is dedicated to symmetric matrices, so the lower-triangular elements are not stored.
[Fig. 4.1: Compressed Row Storage (CRS) for Symmetric Matrices, showing the example matrix, the row indexes for non-zero elements, and the values for non-zero elements]
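As an illustration of the storage scheme described above, the following Python sketch multiplies a symmetric matrix, stored as upper-triangle CRS, by a vector. It is not part of Xabclib: the function name `sym_crs_spmv` and the 0-based indexing are assumptions made for the example (the manual's Fortran arrays are 1-based), and the small matrix is invented for the demonstration.

```python
def sym_crs_spmv(irp, icol, val, x):
    """y = A x for a symmetric A given by its upper triangle in CRS.

    irp  : row pointers, length n + 1 (0-based in this sketch)
    icol : column index of each stored non-zero
    val  : value of each stored non-zero
    """
    n = len(irp) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(irp[i], irp[i + 1]):
            j = icol[k]
            y[i] += val[k] * x[j]          # stored element a(i, j)
            if j != i:
                y[j] += val[k] * x[i]      # mirrored element a(j, i)
    return y


# Hypothetical 3 x 3 symmetric matrix:
#   [2 1 0]
#   [1 3 4]
#   [0 4 5]
irp = [0, 2, 4, 5]
icol = [0, 1, 1, 2, 2]
val = [2.0, 1.0, 3.0, 4.0, 5.0]
y = sym_crs_spmv(irp, icol, val, [1.0, 2.0, 3.0])
```

Because each off-diagonal element updates two entries of `y`, a threaded version needs per-thread buffers or a reduction step, which is presumably why the manual lists a variant with vector reduction parallelization.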
44. h o z integer 4 irp n 1 icol nz real 8 a nz e nev v nv1 nev r n C gt gt gt gt gt gt gt gt gt gt resmax 0 000 do 100 ic 1 nev SSS mat vec s a irp i v i ic do 220 jc irp i 1 irp i 1 1 jj icol jc s s a v jj ic jj r jj a je v i ic E 220 Bon Ine r i r i s 210 continue do 230 r i 2r i eCic v i ic 230 continue zansa 0 040 do 240 1 1 zansa zansatr i i 240 continue write 6 IC IC EZ e ic RES sqrt zansa abs e ic resmax max resmax sqrt abs e ic 100 continue C WRITE 6 WRITE 6 MAX RESID resmax C WRITE 6 return end Qookolokolokolokolokolokolokolokolokolok SUBROUTINE ORTHO NV V NV1 0 IMPLICIT REAL 8 A H 0 2 REAL 8 V NV1 NV NV1 NV ICHK 0 DO 400 J 1 NV DO 500 1 1 5 0 000 600 K 1 S S V K DV K J J 88 OpenATLib Xabclib User s Manual for Version 1 0 600 CONTINUE IF I EQ J T IF 1 0D0 GT 1 0D 12 THEN WRITE 6 IT ING T EIGENVECTOR J 15 NOT NORMALIZED SQRT S O I J S 500 CONTINUE 400 CONTINUE ERR 0 000 DO 700 J 1 NV DO 800 1 1 J 1 IF I NE J ERR MAX ERR O I J 800 CONTINUE 700 CONTINUE IF ICHK EQ 0 THEN WRITE 6 OK EIGENVECTOR NORMALIZED END IF
45. hand side vector elements Y N Solution vector elements for SpMxV IATPARAM Integer INPUT Array of integer parameters for OpenATLib and 50 ndi OUTPUT Xabclib RATPARA Double INPUT Array of double precision parameters for M 50 OpenATLib Xabclib WK N Double WORK If IATPARAM 7 0 and IATPARAM 8 13 or IATPARAM IATPARAM 7 3 then set workspace to the 3 argument SINF Double INPUT If IATPARAM 7 0 LSINF OUTPUT INPUT IATPARAM 8 11 Not necessary to set IATPARAM 8 12 13 Set SINF retuned by OpenATI_DSRMV_Setup If IATPARAM 7 2 3 INPUT Not necessary to set OUTPUT Returns setup information for best implementation LSINF Integer INPUT The size of SINF If IATPARAM 7 0 IATPARAM 8 11 LSINF gt 0 IATPARAM 8 12 LSINF gt int 0 5 NUM_SMP 1 34 OpenATLib Xabclib User s Manual for Version 1 0 IATPARAM 8 13 LSINF gt N NUM_SMP 3 If IATPARAM 7 2 LSINF gt int 0 6 NUM_SMP 1 If IATPARAM 7 3 LSINF gt N NUM_SMP 3 INFO OUTPUT Emor code 2 Using parameters on IATPARAM Number Type Initial IO Description Value IATPARAM 7 Integer 3 INPUT OpenATI DSRMV auto tuned On Off Perform SpMxV specified by IATPARAM 8 2 Perform SpMxV to judge the best methods between two methods except for reduction parallel implementation 3 Perform SpMxV to judge the best method among three methods Note that workspace according to the number of threads is needed
46. [Flowchart residue: retry; call solver without the stagnation-detection function; retry; output solution and report]
The order of the strategies is listed below.
  STRATEGY  PRECONDITIONER        SOLVER
  1         SSOR                  BiCGStab
  2         SSOR                  GMRES(m)
  3         ILU0_Diagonal         BiCGStab
  4         ILU0_Diagonal         GMRES(m)
  5         ILU0                  BiCGStab
  6         ILU0                  GMRES(m)
  7         ILUT(10, 1.0E-08)     BiCGStab
  8         ILUT(10, 1.0E-08)     GMRES(m)
3.7.4 Argument Details and Error Code of OpenATI_LINEARSOLVE
CALL OpenATI_LINEARSOLVE(N, NNZ, IRP, ICOL, VAL, B, X, IATPARAM, RATPARAM, INFO)
(1) Argument Details
  N: The number of dimension for the matrix (N ≥ 1).
  NNZ: The number of non-zero elements for the matrix.
  IRP(N+1) (Integer, INPUT): Pointers to the first position on each row for the matrix.
  ICOL(NNZ) (Integer, INPUT): The row indexes for non-zero elements for the matrix.
  VAL(NNZ) (INPUT): The non-zero elements for the matrix.
  B(N): The elements of the right-hand side vector.
  X(N) (Double, INPUT/OUTPUT): INPUT: set the elements of the initial guess for the solution vector $x_0$. OUTPUT: returns the elements of the solution vector $x$.
  IATPARAM(50) (Integer, INPUT): Array of integer parameters for OpenATLib and Xabclib.
  RATPARAM(50) (Double, INPUT): Array of double precision parameters for
47. hreads.
[Fig. 3.5: An example of Row Decomposition Method and Normalized NZ Method (rows and non-zero elements assigned to Threads 1 to 4)]
Original Segmented Scan method and Branchless Segmented Scan method
Original Segmented Scan [5] is designed for sparse matrix-vector multiplication on vector multiprocessors. In this method, the input matrix is divided into groups of non-zero elements of fixed length. Each such group is called a segment vector. In the code of Original Segmented Scan, the innermost loop has a fixed length and applies a mask using FLAG, which marks the beginning of each row. Fig. 3.6 shows an example with segment vectors of length 6 processed by 5 threads.
Branchless Segmented Scan is a modification of this method for scalar multi-core systems that removes the IF operation for the mask from the innermost loop. In this method, the row pointer array of the CRS format is extended for the segment vectors. In Fig. 3.6, IRP is expanded MFLAG
[Fig. 3.6 residue: example input matrix, row pointer IRP, segment vector]
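The flag-masked idea behind the Original Segmented Scan can be sketched in a few lines of Python. This is a simplified, sequential illustration and not the library's implementation: the names `build_flags`, `spmv_segmented`, and `seg_len`, the helper array `row_of`, and the 0-based indexing are all assumptions of the sketch. Each fixed-length segment could be handed to a different thread; partial sums of rows that cross a segment boundary are accumulated into the same output entry.

```python
def build_flags(irp):
    """flag[k] is True when non-zero k is the first element of its row."""
    flag = [False] * irp[-1]
    for r in range(len(irp) - 1):
        if irp[r] < irp[r + 1]:
            flag[irp[r]] = True
    return flag


def spmv_segmented(irp, icol, val, x, seg_len):
    """y = A x over fixed-length groups of non-zeros (segment vectors)."""
    n, nnz = len(irp) - 1, irp[-1]
    flag = build_flags(irp)
    row_of = [r for r in range(n) for _ in range(irp[r], irp[r + 1])]
    y = [0.0] * n
    for start in range(0, nnz, seg_len):       # one segment vector
        end = min(start + seg_len, nnz)
        acc = 0.0
        for k in range(start, end):            # fixed-length inner loop
            if flag[k]:                        # mask: a new row begins
                if k > start:
                    y[row_of[k - 1]] += acc    # flush the finished row
                acc = 0.0
            acc += val[k] * x[icol[k]]
        y[row_of[end - 1]] += acc              # flush the partial sum
    return y


def spmv_reference(irp, icol, val, x):
    """Plain row-by-row CRS product, used only to check the sketch."""
    return [sum(val[k] * x[icol[k]] for k in range(irp[i], irp[i + 1]))
            for i in range(len(irp) - 1)]


# Hypothetical 3 x 3 matrix with rows of 2, 1 and 3 non-zeros.
irp = [0, 2, 3, 6]
icol = [0, 2, 1, 0, 1, 2]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]
y = spmv_segmented(irp, icol, val, x, seg_len=4)
```

The `if flag[k]` test in the inner loop is the mask that the Branchless Segmented Scan is described as removing; the sketch does not attempt the branchless variant.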
48. io. (Double, OUTPUT): Value of MM ratio.
(4) Error Code
  0: Normal return
3.3.4 Usage Example
Judge whether to increase the restart frequency every 5 iterations. If an increase is needed, the restart frequency is increased by a stride of 1. In this case, you can write the code as in Fig. 3.4.
      ! Parameter definition
      MSIZE = 1                    ! Initial restart frequency
      I = 5                        ! Judgment frequency
      ! (omission)
      IF (RSDID .LT. TOL) RETURN   ! Convergence test
      SAMP(K) = RSDID              ! Set residual to SAMP(K)
      IF (MOD(K, I) .EQ. 0) THEN   ! Call DAFRT once per I iterations
        IRT = 0
        CALL OpenATI_DAFRT(I, SAMP, IRT, IATPARAM, RATPARAM, INFO)
        IF (IRT .EQ. 1) MSIZE = MSIZE + 1   ! Increase restart frequency
        K = 0
      END IF
      K = K + 1
      ! (omission)
Fig. 3.4 An example of OpenATI_DAFRT usage
3.4 OpenATI_DSRMV and OpenATI_DURMV (OpenATI_DSRMV_Setup, OpenATI_DURMV_Setup)
3.4.1 Overview of the function
Sparse matrix-vector multiplication (SpMxV) is a crucial function that is widely used in many iterative methods. In many cases its execution time directly affects the total execution time. There are many ways to implement SpMxV, and the best implementation depends on the computer environment and on the numerical characteristics of the input sparse matrix. It is therefore difficult to fix the best method in advance. We need an auto-tuning method at run time to adapt to the user's computer environme
49. ion was stopped by abnormal computation of eigenvalues in part of tridiagonal matrix computation
  300: Computation was stopped by exceeding the maximum number of restarts
  400: Computation was stopped by exceeding the execution time tolerance
  500: Eigenvalue and eigenvector are illegal
  600: Computation was stopped by failing to allocate memory (in case of IATPARAM(10) = 12, 13, 21)
4.3 Xabclib_GMRES
4.3.1 Overview of the function
Xabclib_GMRES solves linear equations problems with large-scale unsymmetric sparse matrices.
4.3.2 Target problem and data format
(1) Target problem: The problem to be solved in the library is the linear equations problem $Ax = b$, where $A$ is a large-scale sparse matrix, $x$ is a solution vector, and $b$ is a right-hand side vector.
(2) Input data format: The unsymmetric sparse matrix format is Compressed Row Storage (CRS) for unsymmetric matrices, shown in Fig. 3.3.
4.3.3 Overview of the algorithm
The algorithm used in this solver is the GMRES method, which is shown in Fig. 4.5. The algorithm was presented in [4].
  1. Compute $r_0 = b - Ax_0$, $\beta := \|r_0\|_2$, and $v_1 := r_0/\beta$
  2. Define the $(m+1) \times m$ matrix $\bar{H}_m = \{h_{ij}\}$. Set $\bar{H}_m = 0$
  3. For $j = 1, 2, \dots, m$ Do:
  4.   Compute $w_j := A v_j$
  5.   For $i = 1, \dots, j$ Do:
  6.     $h_{ij} := (w_j, v_i)$
  7.     $w_j := w_j - h_{ij} v_i$
  8.   EndDo
  9.   $h_{j+1,j} = \|w_j\|_2$. If $h_{j+1,j} = 0$, set $m := j$ and g
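The orthogonalization loop of the GMRES listing (steps 1 to 9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Xabclib implementation: it uses a small dense matrix held as nested lists instead of CRS, the names `arnoldi_mgs`, `matvec`, and `dot` are hypothetical, and the normalization $v_{j+1} = w_j / h_{j+1,j}$ is the standard continuation from [4], which lies beyond the truncated listing. The least-squares solve that completes GMRES is omitted.

```python
import math


def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as nested lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def arnoldi_mgs(a, b, x0, m):
    """Build an orthonormal Krylov basis v and the (m+1) x m matrix h."""
    r0 = [bi - axi for bi, axi in zip(b, matvec(a, x0))]
    beta = math.sqrt(dot(r0, r0))
    if beta == 0.0:
        raise ValueError("x0 already solves the system")
    v = [[ri / beta for ri in r0]]
    h = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(a, v[j])
        for i in range(j + 1):                 # modified Gram-Schmidt
            h[i][j] = dot(w, v[i])
            w = [wk - h[i][j] * vk for wk, vk in zip(w, v[i])]
        h[j + 1][j] = math.sqrt(dot(w, w))
        if h[j + 1][j] == 0.0:                 # breakdown: subspace is invariant
            return beta, v, h, j + 1
        v.append([wk / h[j + 1][j] for wk in w])
    return beta, v, h, m


# Hypothetical 3 x 3 test system.
a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
beta, v, h, m_used = arnoldi_mgs(a, b, [0.0, 0.0, 0.0], 2)
```

The returned basis satisfies $A v_1 = h_{11} v_1 + h_{21} v_2$, which is a convenient sanity check on any implementation of the loop.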
50. is illegal 400 The value of MAXMEMORY in POLICY FILE is greater than free size of memory 500 Failing to allocate work area 20 Error code from Xabclib GMRES Xabclib BICGSTAB For more detail refer 3 3 4 and 3 4 4 50 OpenATLib Xabclib User s Manual for Version 1 0 3 7 5 Argument Details and Error Code of OpenATI EIGENSOLVE CALL OpenATI_EIGENSOLVE N NNZ IRP ICOL VAL IORDER NEV EV EVEC IATPARAM RATPARAM INFO 1 Argument Details Argument Type Description N Integer INPUT The number of dimension for the matrix N21 NNZ Integer INPUT The number of non zero elements for the upper triangle part IRP N 1 Integer INPUT Pointes to diagonal elements on each row Note Satisfy IRP 1 1 IRP N 1 NNZ 1 ICOL NNZ Integer INPUT The row indexes for non zero elements on the upper triangle part VAL NNZ Double INPUT The values for non zero elements on the upper triangle part NEV Integer EV NEV Double INPUT The number of eigenvalues you need OUTPUT The eigenvalues The k th eigenvalue is set to EV k EVEC Double OUTPUT The eigenvectors k the eigenvector N NEV corresponding to the eigenvalue EV k is set to the m k th column IATPARAM Integer INPUT Array of integer parameters for OpenATLib and 50 ed Xabclib RATPARA Double INPUT Array of double precision parameters for M 50 522 OpenATLib Xabclib INFO Error Code 2 Using parameters on IATPARAM Number Type
51. magnitude
  IATPARAM(31) (Integer, OUTPUT): Total Matrix-Vector times.
  IATPARAM(32) (Integer, INPUT): When stagnation of the relative residual occurs, the solver is stopped. 0: Off, 1: On.
(3) Using parameters on RATPARAM
  RATPARAM(4): Threshold value for MM ratio.
  RATPARAM(22): Max elapsed time.
  RATPARAM(23): Convergence criterion.
  RATPARAM(29) (Double, OUTPUT): 2-norm of max residual.
  RATPARAM(30) (Double, OUTPUT): Floating operations (x10^9 operations).
  RATPARAM(32) (Double, OUTPUT): Total solve time.
(4) Error Code
  0: Normal return.
  Less than 0: If -i is returned, the value of the i-th argument is illegal.
  100: Computation was stopped by breakdown for zero-vector division.
  200: Computation was stopped by abnormal computation of eigenvalues in part of tridiagonal matrix computation.
  300: Computation was stopped by exceeding the maximum number of restarts.
  400: Computation was stopped by exceeding the execution time tolerance.
  500: Computation was stopped by failing to allocate memory (in case of IATPARAM(8) = 12, 13).
4.2 Xabclib_ARNOLDI
4.2.1 Overview of the function
Xabclib_ARNOLDI computes several eigenvalues of large-scale unsymmetric matrices in the standard eigenproblem.
4.2.2 Target problem formularization and data format
(1) Targe
52. $$R(s,t) = \frac{\max_z \{\, r(z);\ z = s-t+1, \dots, s \,\}}{\min_z \{\, r(z);\ z = s-t+1, \dots, s \,\}}$$
If the restart frequency is large enough, the residual tends to decrease substantially, so the MM ratio becomes large. If the restart frequency is small, the iteration tends to stagnate, so the MM ratio becomes small. Hence the restart frequency can be controlled at run time by monitoring the MM ratio: if the MM ratio falls below a fixed value at run time, the frequency should be increased.
3.3.3 Argument Details and Error Code
(1) Argument Details
  NSAMP (INPUT): The number of sampling points.
  SAMP (Double, INPUT): The values of the sampling points.
  IRT (Integer, OUTPUT): 0: no need to increase the restart frequency. 1: need to increase the restart frequency.
  IATPARAM(50) (Integer, INPUT): Array of integer parameters for OpenATLib and Xabclib.
  RATPARAM(50) (Double, INPUT): Array of double precision parameters for OpenATLib and Xabclib.
  INFO (OUTPUT): Error code.
(2) Using parameters on IATPARAM
  IATPARAM(4) (Integer, initial value 1, INPUT): 1: judge the increment of the restart frequency based on the MM ratio.
  IATPARAM(5) (Integer, initial value 5, INPUT): Incremental value for the Krylov subspace when the MM ratio is less than the threshold RATPARAM(4).
(3) Using parameters on RATPARAM
  (INPUT): Threshold value for MM rat
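The decision that OpenATI_DAFRT is described as making can be illustrated with a short Python sketch. It assumes that the MM ratio is the largest sampled residual divided by the smallest one over the sampling window, and that a ratio below the threshold signals stagnation. The function names `mm_ratio` and `needs_longer_restart` are hypothetical, and the default threshold of 100.0 is taken from the parameter listing for RATPARAM(4) earlier in the manual rather than from the library source.

```python
def mm_ratio(samples):
    """Max-to-min ratio of the sampled residual norms."""
    smallest = min(samples)
    if smallest == 0.0:
        return float("inf")        # an exactly zero residual: no stagnation
    return max(samples) / smallest


def needs_longer_restart(samples, threshold=100.0):
    """Return 1 when the restart frequency should grow, else 0 (like IRT)."""
    return 1 if mm_ratio(samples) < threshold else 0


stagnating = [1.0, 0.9, 0.95, 0.9, 0.92]     # residual barely moves
converging = [1.0, 1e-1, 1e-2, 1e-3, 1e-4]   # residual drops quickly
irt_stagnating = needs_longer_restart(stagnating)
irt_converging = needs_longer_restart(converging)
```

A caller would then enlarge the Krylov subspace by the increment in IATPARAM(5) whenever the flag is 1, which is the pattern the usage example for OpenATI_DAFRT follows.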
53. mp private IP ATPARAM RATPARAM I omp private WK 0 1 shared NTMP NZTMP IRPTMP ICOLTMP ATMP NEVTMP ITEST GET THREAD NUM N NTMP NZ NZTMP NEV NEVTMP ALLOCATE 1 bie A NZ ALLOCATE 2 NEV V 2 N NEV ALLOCATE IATPARAM 50 RATPARAM 50 ce O NEVAN DO 1 1 1 IRP D IRPTMP 1 NDDO DO 1 1 100 h I COLTMP 1 ACD 1 ND DO CALL OpenATI INIT IATPARAM RATPARAM INFO wr ite 6 OpenATI EIGENSOLVE THREAD SAFE TEST IP C ATPARAM 50 71 ATPARAM 30 22 CALL OpenATI NZ IRP ICOL A E V ATPARAM RATPARAM INFO I omp barrier write 6 OpenATI EIGENSOLVE INFO INFO if info 11 0 THEN write 6 1111 Parameter Error Info INFO 82 OpenATLib Xabclib User s Manual for Version 1 0 GOTO 9000 else if info ne 0 then write 6 1111 Breakdown Error Info INFO GOTO 9000 end if IF ITEST GT 300 AND ITEST LE 321 THEN call resid n irp icol nz a nev e v n wk call ORTHO N nev V N 0 ELSE IF ITEST GT 200 AND ITEST LE 222 then a residz n irp icol nz nev e v n wk E 9000 CONTINUE DEALLOCATE IRP ICOL A DEALLOCATE E V DEALLOCATE RATPARAM DEALLOCATE WK 0 I omp barrier omp end parallel STOP END subroutine resid n irp icol nz a nev e v r implicit real 8 a
54. ndredth iterations as the time limit. If the point at the intersection of the prediction with the line of the time limit is less than the convergence criterion, OpenATI_DAFSTG estimates that the iterative solver will converge. On the other hand, when the intersection point is greater than the criterion, OpenATI_DAFSTG estimates that the solver will not converge.
[Fig. 3.1: The idea of this auto-tuning method (relative residual versus iterations, with the gradient, the tolerance, and the requested accuracy marked)]
Next, the formulas of detection are explained. Using a time-series analysis method, OpenATI_DAFSTG calculates the gradient of the relative residual and predicts its value at the time limit. OpenATI_DAFSTG uses the Exponential Moving Average as the time-series analysis method for calculating the gradient, because it is easy to calculate and does not require recording the previous relative residuals. The formulas of prediction are as follows: 1 p 0 e 0 G 0 2 Run 1 iteration 3 If r lt then output convergence Else goto 4 4 e log 7 5 e e 1 1 G G 6 R min T 7 t L k Time tolerant t Computation time for
55. non zero elements for the matrix
    IRP(N+1)      Integer  INPUT   Pointers to the first elements on each row of the matrix
    ICOL(NNZ)                      The non-zero row indexes for the matrix
    IATPARAM(50)  Integer  INPUT   Array of integer parameters for OpenATLib and Xabclib
    RATPARAM(50)  Double   INPUT   Array of double precision parameters for OpenATLib and Xabclib
    SINF(LSINF)   Double   OUTPUT  If IATPARAM(8) = 11: nothing is returned. If IATPARAM(8) = 12 or 13: returns the groups of rows processed by each thread for OpenATI_DSRMV
    LSINF         Integer  INPUT   The size of SINF (NUM_SMP = IATPARAM(3)):
                                     IATPARAM(8) = 11: LSINF > 0
                                     IATPARAM(8) = 12: LSINF > int 0 6 NUM_SMP 1
                                     IATPARAM(8) = 13: LSINF > N NUM_SMP 8
    INFO                           Error code

    (2) Using parameters on IATPARAM
    Number       Type     Initial Value  IO     Description
    IATPARAM(8)  Integer  12             INPUT  Set the number corresponding to the implementation of SpMxV in OpenATI_DSRMV:
                                                  11: not necessary to run this function
                                                  12: create information for Normalized NZ Method
                                                  13: create information for Normalized NZ Method with vector reduction parallelization

    (3) Using parameters on RATPARAM
    OpenATI_DSRMV_Setup doesn't use RATPARAM.

    (4) Error Code
    Value  Description
    0      Successful exit
    100    Invalid IATPARAM(8) value was input
    200    Invalid LSINF value was input (IATPARAM(8) = 12 or 13)

    3.4.4 Arg
56. nt and matrices. OpenATI_DSRMV is designed for double-precision symmetric SpMxV, and OpenATI_DURMV is designed for double-precision non-symmetric SpMxV. Both are auto-tuning APIs that choose among their implementations at run time.

    3.4.2 Overview of auto-tuning method
    In this function, the API surveys all candidate SpMxV implementations during the first iteration and then selects the best implementation for the following iterations. This method was proposed in [2]. The following implementations are supplied for OpenATI_DSRMV (3 kinds) and OpenATI_DURMV (4 kinds) in version beta.

    OpenATI_DSRMV
      S1: Row Decomposition Method
      S2: Normalized NZ Method
      S3: Normalized NZ Method with vector reduction parallelization
    OpenATI_DURMV
      U1: Row Decomposition Method
      U2: Normalized NZ Method (for scalar multi-core processors)
      U3: Branchless Segmented Scan (for scalar multi-core processors)
      U4: Original Segmented Scan (for vector processors)

    Row Decomposition Method and Normalized NZ Method
      Row Decomposition Method: the input matrix is divided into as many blocks as there are threads, so as to balance the number of rows processed by each thread.
      Normalized NZ Method: the input matrix is divided into as many blocks as there are threads, so as to normalize the number of non-zero elements processed by each thread.
    Figure 3.5 shows an example of the Row Decomposition Method and the Normalized NZ Method for a 6-dimensional matrix processed by 4 t
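The difference between the two partitioning strategies described above can be sketched as follows. This is a minimal Python illustration, not the library's implementation: the function names `row_decomposition` and `normalized_nz` are hypothetical, and the row pointer array `irp` is 0-based here (so `irp[-1]` equals the number of non-zero elements), whereas the manual's IRP is 1-based.

```python
from bisect import bisect_left


def row_decomposition(n_rows, n_threads):
    """Block boundaries that give each thread (almost) the same number of rows."""
    return [(t * n_rows) // n_threads for t in range(n_threads + 1)]


def normalized_nz(irp, n_threads):
    """Block boundaries that give each thread (almost) the same number of non-zeros.

    irp is a 0-based CRS row pointer array of length n_rows + 1.
    """
    n_rows = len(irp) - 1
    nnz = irp[-1]
    bounds = [0]
    for t in range(1, n_threads):
        target = (t * nnz) // n_threads
        # First row boundary at which the running non-zero count reaches the target.
        row = bisect_left(irp, target)
        bounds.append(min(max(row, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return bounds


# Six rows with 5, 1, 1, 1, 1, 1 non-zeros, split between two threads.
irp = [0, 5, 6, 7, 8, 9, 10]
print(row_decomposition(6, 2))   # equal rows: 3 and 3 (7 and 3 non-zeros)
print(normalized_nz(irp, 2))     # equal non-zeros: rows 0..0 and rows 1..5
```

With a dense first row, splitting by rows leaves one thread with more than twice the work of the other, while splitting by non-zero count balances the arithmetic at the cost of unequal row counts.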
57. [Algorithm listing, final steps: 11 EndDo; 12 Compute the minimizer and update x]
    Fig. 4.5 GMRES Method

    4.3.4 Argument Details and Error Code
    (1) Argument Details
    Argument       Type     IO            Description
    N              Integer  INPUT         The number of dimensions of the matrix (N > 1)
    NNZ            Integer  INPUT         The number of non-zero elements of the matrix
    IRP(N+1)       Integer  INPUT         Pointers to the first position on each row of the matrix. Note: satisfy IRP(1) = 1 and IRP(N+1) = NNZ+1
    ICOL(NNZ)      Integer  INPUT         The row indexes for the non-zero elements of the matrix
    VAL(NNZ)       Double   INPUT         The non-zero elements of the matrix
    B(N)           Double   INPUT         The elements of the right-hand side vector
    X(N)           Double   INPUT/OUTPUT  INPUT: set the elements of the initial guess for the solution vector x(0). OUTPUT: returns the elements of the solution vector x
    PRECOND(NPRE)  Double   INPUT/OUTPUT  INPUT: if IATPARAM(24) = 1, nothing needs to be set; if IATPARAM(24) = 2, set the preconditioner kind of M already specified. OUTPUT: if IATPARAM(24) = 1, the preconditioner kind of M is returned; if IATPARAM(24) = 2, no modification
    NPRE           Integer  INPUT         The size of the PRECOND array. If IATPARAM(25) = 1 then NPRE > 0. If IATPARAM(25) = 2, 3 or 4 then NPRE > N. If IATPARAM(25) = 5 then N
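The note on IRP above (IRP(1) = 1 and IRP(N+1) = NNZ+1) lends itself to a quick consistency check before calling a solver. The following Python sketch is an illustration only; `check_crs` is a hypothetical helper, not part of OpenATLib or Xabclib. The first two checks come from the note in the argument table. The non-decreasing check and the 1..N range check on ICOL are additional assumptions based on the usual definition of CRS.

```python
def check_crs(n, nnz, irp, icol):
    """Return a list of problems found in 1-based CRS index arrays."""
    problems = []
    if len(irp) != n + 1:
        problems.append("IRP must have N+1 entries")
        return problems
    if irp[0] != 1:
        problems.append("IRP(1) must be 1")            # from the manual's note
    if irp[n] != nnz + 1:
        problems.append("IRP(N+1) must be NNZ+1")      # from the manual's note
    if any(irp[i] > irp[i + 1] for i in range(n)):
        problems.append("IRP must be non-decreasing")  # assumption: usual CRS property
    if len(icol) != nnz:
        problems.append("ICOL must have NNZ entries")
    elif any(not 1 <= c <= n for c in icol):
        problems.append("ICOL entries must lie in 1..N")  # assumption: usual CRS property
    return problems


# A 3x3 matrix with 4 non-zeros, stored with Fortran-style 1-based indices.
print(check_crs(3, 4, [1, 3, 4, 5], [1, 2, 2, 3]))   # no problems
print(check_crs(3, 4, [0, 2, 3, 4], [1, 2, 2, 3]))   # 0-based pointers are rejected
```

Passing 0-based pointers from a C or Python caller is a frequent source of errors with Fortran-style sparse interfaces, which is why the sketch reports both pointer conditions separately.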
58. ram Schmidt orthonormalization method. The selected method is indicated by the value of IATPARAM(12). By default, the Modified Gram-Schmidt method is selected.
      1: Classical Gram-Schmidt method (CGS). When the Krylov subspace size is large, the accuracy of orthonormalization decreases. The speedup from parallelization is excellent.
      2: DGKS method. This method improves accuracy by running CGS 2 times. Its computational cost is twice that of CGS.
      3: Modified Gram-Schmidt method (MGS). MGS is the most popular Gram-Schmidt method. It is the most effective method in terms of performance and accuracy.
      4: Blocked Classical Gram-Schmidt method (BCGS). BCGS orthonormalizes within a block with CGS and between blocks with MGS. The block length is 4.

    3.5.3 Argument Details and Error Code
    (1) Argument Details
    Argument      Type     IO            Description
    NORMALFLG     Integer  INPUT         Normalization of the output vector. 0: not normalized; 1: normalized
    N                                    Vector length (N > 1)
    X(N)                                 Vector for normalization
    Q(LQ, MM)                            Orthonormalized vectors Q(1:N, MM)
    LQ                                   Leading dimension of Q
    MM                                   Number of vectors of Q
    HR(MM)                               Inner product of X by Q(1:N, MM)
    IATPARAM(50)  Integer  INPUT/OUTPUT  Array of integer parameters for OpenATLib and Xabclib
    RATPARAM(50)  Double   INPUT         Array of double precision parameters for OpenATLib and Xabclib

    (2) Using parameters on IATPARAM
    Number Type Initial
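The practical difference between CGS, DGKS, and MGS described above is the order in which inner products and updates are performed. The Python sketch below illustrates that difference on plain lists. It is not the library routine: `orthogonalize` and its arguments are hypothetical names, the DGKS branch simply runs the CGS pass twice as the text describes, and BCGS is omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(x, q_cols, method="mgs", normalize=False):
    """Orthogonalize x against the orthonormal vectors in q_cols.

    Returns (new_x, h), where h[j] is the accumulated inner product with q_cols[j].
    """
    x = list(x)
    h = [0.0] * len(q_cols)
    if method == "mgs":
        # MGS: each inner product uses the already-updated x.
        for j, q in enumerate(q_cols):
            h[j] = dot(q, x)
            x = [xi - h[j] * qi for xi, qi in zip(x, q)]
    elif method in ("cgs", "dgks"):
        # CGS: all inner products use the same x, so they are independent
        # and easy to parallelize. DGKS here means two CGS passes.
        passes = 2 if method == "dgks" else 1
        for _ in range(passes):
            c = [dot(q, x) for q in q_cols]
            for j, q in enumerate(q_cols):
                x = [xi - c[j] * qi for xi, qi in zip(x, q)]
                h[j] += c[j]
    else:
        raise ValueError("unknown method: " + method)
    if normalize:
        norm = math.sqrt(dot(x, x))
        if norm > 0.0:
            x = [xi / norm for xi in x]
    return x, h


q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
for name in ("cgs", "dgks", "mgs"):
    print(name, orthogonalize([1.0, 2.0, 3.0], q, method=name, normalize=True))
```

In exact arithmetic the three variants return the same vector. They differ in rounding behaviour and in how many independent inner products are available at once, which is the trade-off between accuracy and parallel speedup listed above.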
59. rmat
    Fourth and fifth characters: process kind. MV: matrix-vector multiplication; RT: restart frequency.
    Sixth and above characters: property of process kinds.

    (2) Common Parameter List for OpenATLib and Xabclib
    OpenATLib and Xabclib use common parameter lists named IATPARAM and RATPARAM. IATPARAM is the integer parameter list and RATPARAM is the double precision parameter list. Calling OpenATI_INIT sets these lists to their default values. Tables 2.3 and 2.4 show the descriptions and default values of IATPARAM and RATPARAM.

    Table 2.3 OpenATLib & Xabclib integer parameter list (<L>: for linear solver, <E>: for eigenvalue solver)
    IATPARAM(50)
    index  default  description  type
    1      mandatory (M)
           mandatory (M)
    3-20   OpenATLib's information
    3      1        Number of THREADS (SMPs) 1 OMP_NUM_THREADS
                    Flag of Krylov subspace expand by MM ratio. 0: AT off
           1        Incremental value for Krylov subspace when MM ratio is less than threshold (RATPARAM(4))
    6      10       A certain threshold value for judging stagnation
                    OpenATI_DSRMV auto-tuned On/Off. 0: AT off; 2: AT on, select fastest type in 11 or 12; 3: AT on, select fastest type in 11, 12 or 13
    8      12       Fastest DSRMV implementation method. 11: block row decomposition; 12: non-zero decomposition; 13: parallel vector reduction
                    OpenATI_DURMV auto-tuned On/Off. 0: AT off; 1: AT off and auto-configure I
60. solvers with Numerical policy

    3.7.1 Overview of the function
    A numerical policy is the set of requirements and priorities for memory, CPU time, accuracy, and other properties specified by the library user. As sparse iterative solvers with a numerical policy, OpenATLib supplies OpenATI_LINEARSOLVE, which is designed for unsymmetric linear problems, and OpenATI_EIGENSOLVE, which is designed for symmetric and unsymmetric eigenvalue problems. OpenATI_LINEARSOLVE and OpenATI_EIGENSOLVE are Meta Solvers that call Xabclib and set optimized arguments automatically according to the user's numerical policy.

    3.7.2 Overview of numerical policy
    To use the Meta Solvers, create a numerical policy file in the following format. The numerical policy file's name is POLICY Thread number. The policy file's format is as follows:
      keyword value
    The configurable keywords are POLICY, CPU, RESIDUAL, MAXMEMORY, MAXTIME, PRECONDITIONER, and SOLVER. A keyword that is not registered in the policy file takes its default value. The keywords are explained below.

    POLICY value
      value: TIME, ACCURACY, MEMORY, STABLE (TIME is selected by default)
      If POLICY TIME: Meta Solvers prefer execution time over accuracy and memory savings. Therefore, high-performance algorithms are actively selected.
      If POLICY ACCURACY: Meta Solvers recalculate the solution of the solvers. If false convergence occurs, Meta Solvers continue to re-execute with more
61. t number of implementations returns
      11: Row Decomposition Method
      12: Normalized NZ Method
      13: Branchless Segmented Scan
      21: Original Segmented Scan
    IATPARAM(11)  Integer  128  INPUT  Columns of Segmented Scan's algorithms. If IATPARAM(9) is set to 1 or 4, IATPARAM(11) is adjusted using Mod(IATPARAM(11), IATPARAM(3)) in OpenATI_DURMV and OpenATI_DURMV_Setup.

    (3) Using parameters on RATPARAM
    OpenATI_DURMV doesn't use RATPARAM.

    (4) Error Code
    Value  Description
    0      Successful exit
    100    The value of IATPARAM(10) is illegal (if IATPARAM(9) = 0)
    200    The value of IATPARAM(9) is illegal

    3.4.7 Usage Example
    Search for the best SpMxV implementation during the first iteration; the best implementation found by this run-time search is then used in the following iterations. To implement this, see the code of Fig. 3.7.

      ! Parameter definition
      IATPARAM(7) = 3
      ! Initialize DSRMV parameter
      LSINF N NUM_SMP 3
      ALLOCATE(SINF(LSINF))
      ! (omission)
      ! The first SpMxV
      CALL DSRMV(N, NNZ, IRP, ICOL, VAL, X, Y, IATPARAM, RATPARAM, WK, SINF, INFO)
      ! Hereafter we select the best one
      IATPARAM(7) = 0
      ! (omission)
      ! SpMxV after run-time searching.
      ! We can use the best implementation based on previous information.
      CALL DSRMV N NNZ IRP ICOL VAL X Y
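The pattern in the usage example (time every candidate on the first call, then reuse the fastest) is independent of SpMxV itself. The Python sketch below shows that pattern only; it does not call OpenATLib, and `TunedKernel`, `candidates`, and `best` are hypothetical names chosen for this illustration. The `best` attribute plays the role that the stored implementation number plays once IATPARAM(7) is set back to 0.

```python
import time


class TunedKernel:
    """Run every candidate once on the first call, then reuse the fastest one."""

    def __init__(self, candidates):
        self.candidates = dict(candidates)
        self.best = None   # name of the fastest candidate, set by the first call

    def __call__(self, *args):
        if self.best is None:
            timings = {}
            result = None
            for name, fn in self.candidates.items():
                start = time.perf_counter()
                result = fn(*args)
                timings[name] = time.perf_counter() - start
            self.best = min(timings, key=timings.get)
            return result
        return self.candidates[self.best](*args)


def dot_loop(x, y):
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_builtin(x, y):
    return sum(a * b for a, b in zip(x, y))


kernel = TunedKernel({"loop": dot_loop, "builtin": dot_builtin})
x = [float(i) for i in range(1000)]
print(kernel(x, x))      # first call: both candidates are timed
print(kernel.best)       # whichever was faster on this machine
print(kernel(x, x))      # later calls: only the selected candidate runs
```

A single timed run can be noisy, so a real tuner might repeat each candidate a few times. The first call also costs as much as all candidates together, which is why the search is done once and its result reused.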
62. t problem
    The target problem is the standard eigenproblem A v = λ v, for computing eigenvalues and eigenvectors of large-scale sparse matrices, where A is a large-scale sparse matrix, λ is an eigenvalue, and v is an eigenvector.

    (2) Input data format
    The data format for the input symmetric sparse matrix A is Compressed Row Storage (CRS), shown in Fig. 4.3. Please note that the format is dedicated to symmetric matrices; hence the lower elements are not needed.

    [Figure: an example matrix and its three CRS arrays: pointers to the first row elements, row indexes for the non-zero elements, and the non-zero elements]
    Fig. 4.3 Compressed Row Storage (CRS) for Unsymmetric Matrices

    4.2.3 Arnoldi Method
    The Arnoldi method used in this library is shown in Fig. 4.4. The algorithm is based on the algorithm described in [9].

    Explicitly restarted Arnoldi method with deflated Schur vectors
      step 1  random vector u
      step 2  1 0
      step 3  Q 20 v zu
      step 4  Arnoldi decompose AQ Q H 1
      step 5  solve Hessenberg system H S 05
      step 6  check convergence 2 5 9 < eps
      step 7  deflation: if 6 S is converged then Y 0 8 v C y v Av fork 1 2 1 1 1 1 v end if
      step 8  if one more eigenpair desired then 7 Q S a sampling goto step 4
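To make the CRS layout above concrete, the following Python sketch multiplies a matrix stored in the three CRS arrays by a vector. It is an illustration under assumptions, not library code: `crs_matvec` is a hypothetical helper, the index arrays are 1-based as in the manual's Fortran interface, ICOL is taken to hold the column position of each stored entry, and all non-zero elements are stored (the general, unsymmetric layout), not only one triangle.

```python
def crs_matvec(n, irp, icol, val, x):
    """y = A x for a matrix in CRS with 1-based IRP and ICOL arrays."""
    y = [0.0] * n
    for i in range(n):
        # Row i+1 (1-based) owns entries IRP(i+1) .. IRP(i+2)-1.
        for k in range(irp[i] - 1, irp[i + 1] - 1):
            y[i] += val[k] * x[icol[k] - 1]
    return y


# The 3x3 matrix
#   [4 1 0]
#   [0 3 0]
#   [2 0 5]
irp = [1, 3, 4, 6]          # row 1 starts at entry 1, row 2 at 3, row 3 at 4
icol = [1, 2, 2, 1, 3]
val = [4.0, 1.0, 3.0, 2.0, 5.0]
print(crs_matvec(3, irp, icol, val, [1.0, 2.0, 3.0]))
```

Each row's entries are contiguous in `val`, so the inner loop streams through memory, while the reads from `x` are scattered according to `icol`. That access pattern is what the different SpMxV implementations in this manual try to balance across threads.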
63. t problem and data format
    4.4.3 Overview of the algorithm
    4.4.4 Argument Details and Error Code
    References
    Appendix A Sample code of OpenATI_EIGENSOLVE for thread safe

    1 Overview
    In this manual, the functions for numerical library developers in OpenATLib and Xabclib are explained. Fig. 1.1 and Fig. 1.2 show the components of the functions of OpenATLib and Xabclib.

      Eigensolver with Numerical Policy Interface: OpenATI_EIGENSOLVE
      Parameter Setup Function: OpenATI_INIT
      Eigensolver with Auto-tuning Facility: Xabclib_LANCZOS, Xabclib_ARNOLDI
      Restart Frequency Auto-tuning Function: OpenATI_DAFRT
      Sparse Matrix-Vector Multiply Auto-tuning Function: OpenATI_D[S|U]RMV
      Setup Function for OpenATI_D[S|U]RMV: OpenATI_D[S|U]RMV_Setup
      Gram-Schmidt orthonormalization function: OpenATI_DAFGS
    Fig. 1.1 Components of Function on Eigensolver

      Linear solver with Numerical Policy Interface: OpenATI_LINEARSOLVE
      Linear solver with Auto-tuning Facility: Xabclib_GMRES, Xabclib_BICGSTAB
      Parameter Setup Function: OpenATI
64. triangle part
    NEV             Integer          The number of eigenvalues you need. The execution time increases with NEV. If NEV > 100, the execution time will be enormous; hence the problem may not be solved in practical time.
    EV(NEV)                  OUTPUT  The eigenvalues. The k-th eigenvalue is set to EV(k).
    EVEC(LDE, NEV)  Double   OUTPUT  The eigenvectors. The k-th eigenvector, corresponding to the eigenvalue EV(k), is set to the k-th column.
    LDE                              The leading dimension of the EVEC array (LDE > N)
    IATPARAM(50)    Integer  INPUT   Array of integer parameters for OpenATLib and Xabclib
    RATPARAM(50)    Double   INPUT   Array of double precision parameters for OpenATLib and Xabclib
    WK              Double   WORK    Workspace
    LWK             Integer  INPUT   The size of the double precision workspace WK. Satisfy LWK > 1 MSIZE N 2 MSIZE MSIZE 7 MSIZE 5 NEV 2 (MSIZE = IATPARAM(27))
    IWK             Integer  WORK    Workspace
    LIWK                     INPUT   The size of the integer workspace IWK. Satisfy LIWK > 5 MSIZE 3 (MSIZE = IATPARAM(27))
    INFO                     OUTPUT  Error code

    (2) Using parameters on IATPARAM
    Number       Type     Initial Value  IO            Description
    IATPARAM(3)           OMP G          INPUT         Number of THREADS
    IATPARAM(4)                                        Flag of Krylov subspace expand by MM ratio
    IATPARAM(7)
    IATPARAM(8)  Integer  12             INPUT/OUTPUT
    IATPARAM(5)                                        Incremental value for Krylov subspace when M
65. ument Details and Error Code of OpenATI_DURMV_Setup
    (1) Argument Details
    Argument      Type     IO            Description
    N                                    The number of dimensions of the matrix (N > 1)
    NNZ                                  The number of non-zero elements of the matrix
    IRP(N+1)      Integer  INPUT         Pointers to the first elements on each row of the matrix
    IATPARAM(50)  Integer  INPUT/OUTPUT  Array of integer parameters for OpenATLib and Xabclib
    RATPARAM(50)  Double   INPUT         Array of double precision parameters for OpenATLib and Xabclib
    UINF(LUINF)   Double   OUTPUT        IATPARAM(10) = 11: nothing is returned. IATPARAM(10) = 12, 13, 21: returns the groups of rows processed by each thread, or the information array for segmented scan
    LUINF         Integer  INPUT         The size of UINF (NUM_SMP = IATPARAM(3), JL = IATPARAM(11)):
                                           IATPARAM(10) = 11: LUINF > 0
                                           IATPARAM(10) = 12: LUINF > int 0 5 NUM_SMP 1
                                           IATPARAM(10) = 13: LUINF > int 1 5 N int 4 25 JL 10
                                           IATPARAM(10) = 21: LUINF > int 1 125 NNZ int 2 125 JL 10
    INFO                   OUTPUT        Error code

    (2) Using parameters on IATPARAM
    Number        Type     Initial Value  IO            Description
    IATPARAM(9)   Integer  3              INPUT         DURMV auto-tuned On/Off. 0: perform SpMxV specified by IATPARAM(10); 1: perform SpMxV specified by IATPARAM(10) and auto-configure IATPARAM(11); 2
    IATPARAM(10)  Integer  12             INPUT/OUTPUT
    IATPARAM(11)  Integer  128            INPUT

    (3) Using parameters on RATPARAM
66. urce code for setting default parameters like Fig. 2.1.
    3. Call the target functions of OpenATLib in your own library source code.
    4. Write a makefile that links libOpenAT.a.

      INTEGER IATPARAM(50)
      DOUBLE PRECISION RATPARAM(50)
      CALL OpenATI_INIT(IATPARAM, RATPARAM, INFO)
      CALL OpenATI_LINEARSOLVE(N, NZ, IRP, ICOL, VAL, B, X, IATPARAM, RATPARAM, INFO)
    Fig. 2.1 An Example of using the OpenATLib

    2.2 Linking and Running OpenATLib and Xabclib
    2.2.1 Directory structure
    The directory structure of this software is shown in Fig. 2.2.
    Fig. 2.2 Directory structure of OpenATLib and Xabclib

    2.2.2 Compiling
    Compiling the current version of OpenATLib and Xabclib requires the following to be installed on your system:
      (a) Intel Fortran Compiler version 11.0 or higher
      (b) HITACHI Optimized Fortran (the environment variable COMP must be set to HITACHI)
    To compile OpenATLib and Xabclib and make the archive file libOpenAT.a, run the shell script make all in the source directory.

    2.2.3 Running sample programs
    Sample programs are compiled by running the make command with the makefile in each sample directory. You can then run the executable file with the shell script test.sh.

    3 OpenATLib A Common Auto tuning