Sun Performance Library User's Guide

1.       END WHERE
         CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)
   Because the code to replace negative numbers with zero in V2 has no natural analog in Sun Performance Library, that code is pulled out of the outer loop. With that code removed to its own loop, the rest of the loop is a rank-1 update of the general matrix X that can be replaced with the DGER routine from BLAS.
   The amount of performance increase can also depend on the data the Sun Performance Library routine uses. For example, if V2 contains many negative or zero values, the majority of the time might not be spent in the rank-1 update. In this case, replacing the code with a call to DGER might not increase performance.
   Evaluating other loop indexes can affect the Sun Performance Library routine used. For example, if the reference to K is a loop index, the loops in the code sample shown above might be part of a larger code structure, where the loops over DGEMV or DGER could be converted to some form of matrix multiplication. If so, a single call to a matrix multiplication routine can increase performance more than using a loop with calls to DGER.
   Because all Sun Performance Library routines are MT-safe (multithread safe), using the auto-parallelizing compiler to parallelize loops that contain calls to Sun Performance Library routines can increase performance on MP platforms. An
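The remark about turning a loop of rank-1 updates into one matrix multiplication can be checked with a small sketch. It uses only Fortran intrinsics, so it runs without the library; the program name, sizes, and fill values are made up for illustration. With Sun Performance Library, the last step would presumably become a single DGEMM call with TRANSB = 'T', which is an assumption not exercised here.

```fortran
program rank1_vs_matmul
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 4, n = 3, nk = 5
  real(dp) :: v1(m, nk), v2(n, nk), x_loop(m, n), x_mm(m, n)
  integer :: i, j, k

  ! Fill the inputs with small integer values so the comparison is exact.
  do k = 1, nk
    do j = 1, m
      v1(j, k) = real(j + 2*k, dp)
    end do
    do i = 1, n
      v2(i, k) = real(i - k, dp)
    end do
  end do
  x_loop = 1.0_dp
  x_mm = 1.0_dp

  ! One rank-1 update per value of K: the work a loop of DGER calls would do.
  do k = 1, nk
    do i = 1, n
      do j = 1, m
        x_loop(j, i) = x_loop(j, i) + v1(j, k) * v2(i, k)
      end do
    end do
  end do

  ! The same result as a single matrix-matrix product, X = X + V1 * V2**T.
  x_mm = x_mm + matmul(v1, transpose(v2))

  if (maxval(abs(x_loop - x_mm)) > 1.0e-12_dp) stop 1
  print *, 'rank-1 loop and matrix product agree'
end program rank1_vs_matmul
```

The point of the restructuring is that the K loop disappears into the inner dimension of the product, which gives a tuned matrix multiplication routine much more work per call.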
2.       DO K = 1, N3
           DO I = 1, N1
             WRITE(*,'(5(F4.1,2X))') (A(I,J,K), J = 1, N2)
           END DO
           WRITE(*,*)
         END DO
         END
         SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)
         INTEGER N1, N2, N3, I, J, K
         COMPLEX A(LD1, LD2, *)
         DO K = 1, N3
           DO I = 1, N1
             WRITE(*,'(5(A1,F4.1,A1,F4.1,A1,2X))') ('(', REAL(A(I,J,K)), ',', AIMAG(A(I,J,K)), ')', J = 1, N2)
           END DO
           WRITE(*,*)
         END DO
         END
   Sun Performance Library User's Guide, May 2003
   CODE EXAMPLE 5-1 Linear Real-to-Complex FFT and Complex-to-Real FFT (Continued)
   my_system% f95 -dalign testscm.f -xlic_lib=sunperf
   my_system% a.out
   Linear complex-to-real and real-to-complex FFT of a sequence
   [Sample output follows as extracted; signs, decimal points, and parentheses were lost.]
   X = s 0 1 7 0 0 2 8 0 0 3 9 0
   out-of-place forward FFT of X: Z = 0 6 0 0 24 0 0 0 0 2 0 1 1 5 0 9
   in-place forward FFT of X: [values garbled in extraction]
   out-of-place inverse FFT of Z: [values garbled in extraction]
   in-place inverse FFT of Z: [values garbled in extraction]
   CODE EXAMPLE 5-1 Notes:
   The forward FFT of X is actually Z = 0 6 0 0 24 0 0 0 0 2 0 1 1 5 0 9 0 2 0 1 1 5 0 9 (digits as extracted).
   Because of symmetry, Z(2) is the complex conjugate of Z(1), and therefore only the first two (N1/2 + 1 = 2) complex values are stored. For the in-place forward transform, SFFTCM is called with real array X as the input and output. Because ... the output array to be of type COMPLEX
3.       END DO
   !$OMP END PARALLEL SECTIONS
   Only one level of parallelism exists: the two sections. Further parallelism within a DGEMM call is suppressed.
   Synchronization Mechanisms
   The underlying parallelization model determines the Sun Performance Library behavior. The two basic modes of multithreading, compiler parallelization and POSIX or Solaris threads, use two different types of synchronization mechanisms.
   Compiler-parallelized code uses spin waits, which produce the most responsive synchronization operations but aggressively consume CPU cycles. Compiler-parallelized code produces optimal performance when each thread has a dedicated CPU, but wastes resources when other jobs or threads are also competing for CPUs.
   However, codes that explicitly use POSIX or Solaris threads use synchronization functions from libthread. These synchronization functions are less responsive, but they relinquish the CPU when the thread is idle, providing good throughput and resource usage in a shared (oversubscribed) environment.
   With compiler parallelization, the environment variable SUNW_MP_THR_IDLE can be used at run time to alter the spin-wait characteristics of the threads. Legal settings of SUNW_MP_THR_IDLE are as follows:
      % setenv SUNW_MP_THR_IDLE spin
      % setenv SUNW_MP_THR_IDLE 2s
      % setenv SUNW_MP_THR_IDL
4. Complex data type
   ■ ZBDSQR - Double complex data type
   If a routine name is not available for S, D, C, and Z, the x prefix will not be used and each routine name will be listed.
   LAPACK Routines
   TABLE A-1 lists the Sun Performance Library LAPACK routines. (P) denotes routines that are parallelized.
   TABLE A-1 LAPACK (Linear Algebra Package) Routines
   Routine             Function
   Bidiagonal Matrix
   SBDSDC or DBDSDC    Computes the singular value decomposition (SVD) of a bidiagonal matrix, using a divide and conquer method.
   xBDSQR              Computes SVD of real upper or lower bidiagonal matrix, using the bidiagonal QR algorithm.
   Diagonal Matrix
   SDISNA or DDISNA    Computes the reciprocal condition numbers for eigenvectors of real symmetric or complex Hermitian matrix.
   General Band Matrix
   xGBBRD              Reduces real or complex general band matrix to upper bidiagonal form.
   xGBCON              Estimates the reciprocal of the condition number of general band matrix using LU factorization.
   xGBEQU              Computes row and column scalings to equilibrate a general band matrix and reduce its condition number.
   xGBRFS              Refines solution to general banded system of linear equations.
   xGBSV               Solves a general banded system of linear equations (simple driver).
   xGBSVX              Solves a general banded system of linear equations (expert driver).
   xGBTRF              LU factorization of a general band matrix using partial pivoting with row interchanges.
   xGBTRS (P)
5. (Hermitian band matrix entries; the routine names fall outside this excerpt.)
   ...                 Computes selected eigenvalues and eigenvectors of a Hermitian band matrix.
   ...                 Reduces Hermitian-definite banded generalized eigenproblem to standard form.
   ...                 (Replacement with newer version CHBGVD or ZHBGVD suggested.) Computes all eigenvalues and eigenvectors of a generalized Hermitian-definite banded eigenproblem.
   ...                 Computes all eigenvalues and eigenvectors of generalized Hermitian-definite banded eigenproblem and uses a divide and conquer method to calculate eigenvectors.
   ...                 Computes selected eigenvalues and eigenvectors of a generalized Hermitian-definite banded eigenproblem.
   ...                 Reduces Hermitian band matrix to real symmetric tridiagonal form by using a unitary similarity transform.
   Hermitian Matrix
   CHECON or ZHECON    Estimates the reciprocal of the condition number of a Hermitian matrix using the factorization computed by CHETRF or ZHETRF.
   CHEEV or ZHEEV      (Replacement with newer version CHEEVR or ZHEEVR suggested.) Computes all eigenvalues and eigenvectors of a Hermitian matrix (simple driver).
   CHEEVD or ZHEEVD    (Replacement with newer version CHEEVR or ZHEEVR suggested.) Computes all eigenvalues and eigenvectors of a Hermitian matrix and uses a divide and conquer method to calculate eigenvectors.
   CHEEVR or ZHEEVR    Computes selected eigenvalues and the eigenvectors of a complex Hermitian matrix.
6. ...                 Solves a general banded system of linear equations, using the factorization computed by xGBTRF.
   General Matrix (Unsymmetric or Rectangular)
   xGEBAK              Forms the right or left eigenvectors of a general matrix by backward transformation on the computed eigenvectors of the balanced matrix output by xGEBAL.
   xGEBAL              Balances a general matrix.
   xGEBRD              Reduces a general matrix to upper or lower bidiagonal form by an orthogonal transformation.
   xGECON              Estimates the reciprocal of the condition number of a general matrix using the factorization computed by xGETRF.
   xGEEQU              Computes row and column scalings intended to equilibrate a general rectangular matrix and reduce its condition number.
   xGEES               Computes the eigenvalues and Schur factorization of a general matrix (simple driver).
   xGEESX              Computes the eigenvalues and Schur factorization of a general matrix (expert driver).
   xGEEV               Computes the eigenvalues and left and right eigenvectors of a general matrix (simple driver).
   xGEEVX              Computes the eigenvalues and left and right eigenvectors of a general matrix (expert driver).
   xGEGS               Deprecated routine replaced by xGGES.
   xGEGV               Deprecated routine replaced by xGGEV.
   xGEHRD              Reduces a general matrix to upper Hessenberg form by an orthogonal similarity transformation.
   xGELQF (P)          Computes LQ f
7. (Sparse solver entries; most routine names fall outside this excerpt.)
   ...                 One call interface to sparse solver.
   ...                 Sparse solver initialization.
   ...                 Fill reducing ordering and symbolic factorization.
   ...                 Matrix value input and numeric factorization.
   ...                 Triangular solve.
   ...                 Sets user-specified ordering permutation.
   ...                 Returns permutation used by solver.
   ...                 Returns condition number estimate of coefficient matrix.
   ...                 De-allocates sparse solver.
   CGSSPS or ZGSSPS    Prints solver statistics.
   Signal Processing Library Routines
   Sun Performance Library contains routines for computing the fast Fourier transform, sine and cosine transforms, and convolution and correlation.
   FFT Routines
   Sun Performance Library provides a set of FFT interfaces that supersedes a subset of the FFTPACK and VFFTPACK routines provided in earlier Sun Performance Library releases. The legacy FFT routines and man pages for the routines are still included to maintain compatibility with existing codes, but the routines are no longer supported. For information on using the legacy FFT routines, see the section 3P man pages.
   TABLE A-7 shows the mapping between the Sun Performance Library FFT routines and the corresponding FFTPACK and VFFTPACK routines. (P) denotes routines that are parallelized.
   TABLE A-7 FFT Routines
   Routine      Replaces          Function
   CFFTC (P)    CFFTI, CFFTF (P)  Initialize the trigonometric weight and factor tables or compute the one-dimensional forward or inverse FFT of a co
8. ... \sum_{n=0}^{N-2} ...
   V[D]SINT Notes:
   ■ M × (N - 1) values are needed to compute the VFST of M N-point sequences.
   ■ The input and output sequences are stored row-wise.
   ■ V[D]SINT is normalized and is its own inverse. Calling V[D]SINT twice yields the original data.
   [D]SINQF: Forward FST of a Quarter-Wave Odd Sequence
   The forward FST of a quarter-wave odd sequence is computed as
      X(k) = 2 \sum_{n=0}^{N-2} x(n) \sin\left(\frac{\pi (n+1)(2k+1)}{2N}\right) + x(N-1) \cos(\pi k), \quad k = 0, \ldots, N-1
   N values are needed to compute the forward FST of an N-point quarter-wave odd sequence.
   [D]SINQB: Inverse FST of a Quarter-Wave Odd Sequence
   The inverse FST of a quarter-wave odd sequence is computed as
      x(n) = 4 \sum_{k=0}^{N-1} X(k) \sin\left(\frac{\pi (n+1)(2k+1)}{2N}\right), \quad n = 0, \ldots, N-1
   Calling the forward and inverse routines will result in the original input scaled by 4N.
   V[D]SINQF: Forward FST of One or More Quarter-Wave Odd Sequences
   The forward FST of one or more quarter-wave odd sequences is computed as
      For i = 0, \ldots, M-1:
      X(i,k) = \frac{1}{\sqrt{4N}} \left[ 2 \sum_{n=0}^{N-2} x(i,n) \sin\left(\frac{\pi (n+1)(2k+1)}{2N}\right) + x(i,N-1) \cos(\pi k) \right], \quad k = 0, \ldots, N-1
   V[D]SINQF Notes:
   ■ The input and output sequences are stored row-wise.
   ■ The transform is normalized so that if the inverse routine V[D]SINQB is called immediately after calling V[D]SINQF, the original data is obtained.
   V[D]SINQB: Inverse FST of One or More Quarter-Wave Odd Sequences
   The inverse FST of one or more
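The scaling claim for the quarter-wave sine pair can be checked numerically. The sketch below evaluates the sums directly rather than calling the library. It assumes the FFTPACK-style definitions (factor 2 on the forward sum, factor 4 on the inverse sum); under that assumption a forward transform followed by an inverse transform returns the input multiplied by 4N. The program name and test data are illustrative only.

```fortran
program sinq_roundtrip
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 6
  real(dp), parameter :: pi = 3.14159265358979323846_dp
  real(dp) :: x(0:n-1), xf(0:n-1), xb(0:n-1), s
  integer :: j, k

  ! Arbitrary test sequence.
  do j = 0, n - 1
    x(j) = real(j*j - 3, dp)
  end do

  ! Assumed forward definition:
  ! X(k) = 2*sum_{j=0}^{N-2} x(j)*sin(pi*(j+1)*(2k+1)/(2N)) + x(N-1)*cos(pi*k)
  do k = 0, n - 1
    s = 0.0_dp
    do j = 0, n - 2
      s = s + x(j) * sin(pi * real((j + 1) * (2*k + 1), dp) / real(2*n, dp))
    end do
    xf(k) = 2.0_dp * s + x(n - 1) * cos(pi * real(k, dp))
  end do

  ! Assumed inverse definition:
  ! x(j) = 4*sum_{k=0}^{N-1} X(k)*sin(pi*(j+1)*(2k+1)/(2N))
  do j = 0, n - 1
    s = 0.0_dp
    do k = 0, n - 1
      s = s + xf(k) * sin(pi * real((j + 1) * (2*k + 1), dp) / real(2*n, dp))
    end do
    xb(j) = 4.0_dp * s
  end do

  ! The round trip should reproduce the input scaled by 4*N.
  if (maxval(abs(xb - real(4*n, dp) * x)) > 1.0e-9_dp) stop 1
  print *, 'round trip equals 4*N times the input'
end program sinq_roundtrip
```

Dividing each of the two transforms by the square root of 4N would make the pair self-inverting, which is consistent with the normalization note for the vectorized routines.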
9.      ... ordmthd, outunt, msglvl, handle, ier )
         if ( ier .ne. 0 ) goto 110
   c
   c  deallocate sparse solver storage
   c
         call dgssda ( handle, ier )
         if ( ier .ne. 0 ) goto 110
   c
   c  print values of sol
   c
         write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
         do i = 1, neqns
           write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
         enddo
         stop
     110 continue
   c
   c  call to sparse solver returns an error
   c
         write ( 6 , 400 )
        &      ' example: FAILED sparse solver error number = ', ier
         stop
     200 format(a5,3a20)
     300 format(i5,3d20.12)  ! i/sol/xexpct values
     400 format(a60,i20)     ! fail message, sparse solver error number
         end
   CODE EXAMPLE 4-1 Solving a Symmetric System - One Call Interface (Continued)
   my_system% f95 -dalign example_1call.f -xlic_lib=sunperf
   my_system% a.out
       i   rhs(i)   expected rhs(i)   error
   [Rows as extracted; signs, decimal points, and most columns were lost.]
       1   0 200000000000D 01   0
       2   0 200000000000D 01   0
       3   0 100000000000D 01   0
       4   0 800000000000D 01   0
       5   0 500000000000D 00   0
   [Error column fragments: 528466159722D 13, 105249142]
   CODE EXAMPLE 4-2
         program example_ss
   c
   c  This program is an example driver that calls th
   c
         implicit none
         integer ...
         character ...
         double precision ...
   c
   c  Sparse matrix structure and value arrays. Ax = b, solve for ...
   c  [matrix display not recoverable from the extraction]
10. Input vector X:
       1.  2.  3.
    Input vector Y:
       4.  5.  6.
    Output vector Z:
       31.  31.  28.
    CODE EXAMPLE 5-12 Convolution Used to Compute the Product of a Vector and Circulant Matrix (Continued)
    The difference between this example and the previous example is that the length of the output vector is the same as the length of the input vectors, so there are no implied zeros on the end of the input vectors. With no implied zeros to shift into, the effect of an end-off shift from the previous example does not occur, and the end-around shift results in a circulant matrix product.
    CODE EXAMPLE 5-13 Two-Dimensional Convolution Using Direct Method
    my_system% cat con_ex23.f
          PROGRAM TEST
          INTEGER M, N
          PARAMETER (M = 2)
          PARAMETER (N = 3)
          INTEGER I, J
          COMPLEX P1(M,N), P2(M,N), P3(M,N)
          DATA P1 / 1, 2, 3, 4, 5, 6 /, P2 / 1, 2, 3, 4, 5, 6 /
          EXTERNAL CCNVCOR2
          PRINT *, 'P1:'
          PRINT 1000, ((P1(I,J), J = 1, N), I = 1, M)
          PRINT *, 'P2:'
          PRINT 1000, ((P2(I,J), J = 1, N), I = 1, M)
          CALL CCNVCOR2 ('V', 'Direct', 'No Transpose X',
         $     'No Overwrite X', 'No Transpose Y', 'No Overwrite Y',
         $     M, N, P1, M, M, N, 0, 0, P2, M, M, N, P3, M, 0, 0)
          PRINT *, 'P3:'
          PRINT 1000, ((P3(I,J), J = 1, N), I = 1, M)
     1000 FORMAT (3(F5.1,' +',F5.1,'i  '))
          END
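The circulant product in the vector example above can be reproduced without the library by writing out the wrap-around sum. The following is a minimal sketch: the program name and zero-based indexing are choices made here for illustration, and the loop stands in for, rather than reproduces, what the convolution routine does internally.

```fortran
program circulant_product
  implicit none
  integer, parameter :: n = 3
  real :: x(0:n-1), y(0:n-1), z(0:n-1)
  integer :: i, j

  x = (/ 1.0, 2.0, 3.0 /)
  y = (/ 4.0, 5.0, 6.0 /)

  ! z(i) = sum over j of x(j) * y(modulo(i - j, n)).
  ! MODULO wraps negative indices around, which is the end-around shift.
  do i = 0, n - 1
    z(i) = 0.0
    do j = 0, n - 1
      z(i) = z(i) + x(j) * y(modulo(i - j, n))
    end do
  end do

  print '(3F6.1)', z
  if (any(z /= (/ 31.0, 31.0, 28.0 /))) stop 1
end program circulant_product
```

Each output element combines all three inputs because no zeros are shifted in; with an end-off shift the first and last outputs would contain fewer terms.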
11.      ... N3, V, LDR1/2, LDR2)
    C
    C Compute 3-dimensional out-of-place inverse FFT.
    C First leading dimension of Z (LDR1) must be even
    C
          CALL CFFTS3(1, N1, N2, N3, SCALE, Y, LDY1, LDY2, Z, LDR1, LDR2, TRIGS, IFAC, SW, 0, IERR)
          WRITE(*,*) 'out-of-place inverse FFT of Y:'
          DO K = 1, N3
            DO I = 1, N1
              WRITE(*,'(5(F5.1,2X))') (Z(I,J,K), J = 1, N2)
            END DO
            WRITE(*,*)
          END DO
    C
    C Compute 3-dimensional in-place inverse FFT.
    C Y, which is complex array containing input data, is also
    C used to store real results; as a real array, its first
    C leading dimension is 2*LDY1.
    C
          CALL CFFTS3(1, N1, N2, N3, SCALE, Y, LDY1, LDY2, Y, 2*LDY1, LDY2, TRIGS, IFAC, SW, LW, IERR)
          WRITE(*,*) 'in-place inverse FFT of Y:'
          CALL PRINT_COMPLEX_AS_REAL(N1, N2, N3, Y, 2*LDY1, LDY2)
          DEALLOCATE(SW)
          END PROGRAM TESTSC3
          SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)
          INTEGER N1, N2, N3, I, J, K
          REAL A(LD1, LD2, *)
          DO K = 1, N3
            DO I = 1, N1
              WRITE(*,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2)
            END DO
            WRITE(*,*)
          END DO
          END
          SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)
          INTEGER N1, N2, N3, I, J, K
          COMPLEX A(LD1, LD2, *)
    CODE EXAMPLE 5-4 Three-Dimensional Real-to-Complex FFT and Complex-to-Real FFT of a Three-Dimensional Array (Continued)
12. NZ        Length of output vectors, where NZ ≥ 0.
    Z         Result vectors.
    LDZ       Leading dimension of the array containing the result matrix Z, where LDZ ≥ MAX(1,MZ).
    WORKIN    Work array.
    LWORK     Length of work array.
    1. When the sizes of the two matrices to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large data set with a small filter, the direct method performs faster than the FFT-based method.
    Work Array WORK for Convolution and Correlation Routines
    The minimum dimensions for the WORK work arrays used with the one-dimensional and two-dimensional convolution and correlation routines are shown in TABLE 5-11. The minimum dimensions for one-dimensional convolution and correlation routines depend upon the values of the arguments NPRE, NX, NY, and NZ.
    The minimum dimensions for two-dimensional convolution and correlation routines depend upon the values of the arguments shown in TABLE 5-9.
    TABLE 5-9 Arguments Affecting Minimum Work Array Size for Two-Dimensional Routines: SCNVCOR2, DCNVCOR2, CCNVCOR2, and ZCNVCOR2
    Argument  Definition
    MX        Number of rows in the filter matrix
    MY        Number of rows in the input matrix
    MZ        Number of output vectors
    NX        Number of columns in the filter matrix
    NY        Number of columns in the input matrix
    NZ        Length of output vectors
    ...PRE    Nu
13. (Matrix-vector operations; the names for the first three rows fall outside this excerpt.)
    Name                                   Operation
    ...                                    General Matrix Vector Product and Update
    ...                                    Symmetric Matrix Vector Product and Update
    ...                                    Triangular Matrix Vector Product and Update
    {TR,TB,TP}SV_I                         Triangular Matrix Solve and Update
    GER_I                                  General Matrix Rank-One Update
    {SY,SP}R_I                             Symmetric Matrix Rank-One Update
    TABLE 6-6 O(n²) Matrix Operations
    Name                                   Operation
    {GE,GB,SY,SB,SP,TR,TB,TP}_NORM_I       Matrix Norms
    {GE,GB}_DIAG_SCALE_I                   Scale General Matrix Rows or Columns and Update
    {GE,GB}_LRSCALE_I                      Scale General Matrix Rows and Columns and Update
    {SY,SB,SP}_LRSCALE_I                   Scale Symmetric Matrix Rows and Columns and Update
    {GE,GB,SY,SB,SP,TR,TB,TP}_ACC_I        Add Scaled Matrices and Update
    {GE,GB,SY,SB,SP,TR,TB,TP}_ADD_I        Add Scaled Matrices
    TABLE 6-7 O(n³) Matrix Operations
    Name                                   Operation
    GEMM_I                                 General Matrix Matrix Product and Update
    SYMM_I                                 Symmetric-General Matrix Matrix Product and Update
    TRMM_I                                 Triangular-General Matrix Matrix Product and Update
    TRSM_I                                 Triangular Matrix Solve
    TABLE 6-8 Matrix Movements
    Name                                   Operation
    {GE,GB,SY,SB,SP,TR,TB,TP}_COPY_I       Copy Matrix
    GE_TRANS_I                             Transpose Matrix
    GE_PERMUTE_I                           Permute Matrix
    TABLE 6-9 Vector Set Operations
    Name: ENCLOSEV_I, INTERIORV_I, DISJOINTV_I, INTERSECTV_I, WINTERSECTV_I, HULLV_I, WHULLV_I
    Operation: Enclose Vector Test Vecto
14. documentation that is installed with your software
    ■ Information on support levels
    ■ User forums
    ■ Downloadable code samples
    ■ New technology previews
    You can find additional resources for developers at http://www.sun.com/developers
    Contacting Sun Technical Support
    If you have technical questions about this product that are not answered in this document, go to http://www.sun.com/service/contacting
    Sun Welcomes Your Comments
    Sun is interested in improving its documentation and welcomes your comments and suggestions. Email your comments to Sun at this address: docfeedback@sun.com
    Please include the part number (817-0935-10) of your document in the subject line of your email.
    CHAPTER 1 Introduction
    Sun Performance Library is a set of optimized, high-speed mathematical subroutines for solving linear algebra and other numerically intensive problems. Sun Performance Library is based on a collection of public domain applications available from Netlib at http://www.netlib.org. Sun has enhanced these public domain applications and bundled them as the Sun Performance Library.
    The Sun Performance Library User's Guide explains the Sun-specific enhancements to the base applications available from Netlib. Reference material describing the base routines is available from Netlib and the Society for Industrial and Applied Mathematics
15. http://www.netlib.org/fftpack. Routines with a V prefix are vectorized routines that are based on the routines contained in VFFTPACK (http://www.netlib.org/vfftpack).
    Fast Cosine and Sine Transform Routines
    TABLE 5-5 lists the Sun Performance Library fast cosine and sine transforms. Names of double precision routines are in square brackets. Routines whose name begins with 'V' can compute the transform of one or more sequences simultaneously. Those whose name ends with 'I' are initialization routines.
    TABLE 5-5 Fast Cosine and Sine Transform Routines and Their Arguments
    Name                  Arguments
    Fast Cosine Transforms for Even Sequences
    COST [DCOST]          (LEN+1, X, WORK)
    COSTI [DCOSTI]        (LEN+1, WORK)
    VCOST [VDCOST]        (M, LEN+1, X, WORK, LD, TABLE)
    VCOSTI [VDCOSTI]      (LEN+1, TABLE)
    Fast Cosine Transforms for Quarter-Wave Even Sequences
    COSQF [DCOSQF]        (LEN, X, WORK)
    COSQB [DCOSQB]        (LEN, X, WORK)
    COSQI [DCOSQI]        (LEN, WORK)
    VCOSQF [VDCOSQF]      (M, LEN, X, WORK, LD, TABLE)
    VCOSQB [VDCOSQB]      (M, LEN, X, WORK, LD, TABLE)
    VCOSQI [VDCOSQI]      (LEN, TABLE)
    Fast Sine Transforms for Odd Sequences
    SINT [DSINT]          (LEN-1, X, WORK)
    SINTI [DSINTI]        (LEN-1, WORK)
    VSINT [VDSINT]        (M, LEN-1, X, WORK, LD, TABLE)
    VSINTI [VDSINTI]      (LEN-1, TABLE)
    Fast Sine Transforms for Quarter-Wave Odd Sequences
    SINQF [DSINQF]        (LEN, X, WORK)
16. its first leading dimension as a real output array would be 2 × N. Conversely, if the input is of type real stored in a real array with first leading dimension 2 × N, then to use the same array to store the complex results, its first leading dimension as a complex output array would be N. Leading dimension requirements for in-place and out-of-place transforms can be found in TABLE 5-2, TABLE 5-3, and TABLE 5-4.
    In the linear and multi-dimensional FFT, the transform between real and complex data through a real-to-complex or complex-to-real transform can be confusing because N1 real data points correspond to N1/2 + 1 complex data points. N1 real data points do map to N1 complex data points, but because there is conjugate symmetry in the complex data, only N1/2 + 1 data points need to be stored as input in the complex-to-real transform and as output in the real-to-complex transform. In the multi-dimensional FFT, symmetry exists along all the dimensions, not just in the first. However, the two-dimensional and three-dimensional FFT routines store the complex data of the second and third dimensions in their entirety.
    While the FFT routines accept any size of N1, N2, and N3, FFTs can be computed most efficiently when values of N1, N2, and N3 can be decomposed into relatively small primes. A real-to-complex or a complex-to-real transform can be computed most efficiently when
       N1, N2, N3 = 2^p \times 3^q \times 4^r \times 5^s
    and a complex-to-complex transform can be computed most effi
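The storage count of N1/2 + 1 complex values follows from conjugate symmetry, which a direct (slow) DFT can demonstrate. The sketch below is illustrative only: the program name, the test sequence, and the sign convention of the exponent are assumptions made here, and no library FFT routine is called.

```fortran
program real_dft_symmetry
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n1 = 8
  real(dp), parameter :: pi = 3.14159265358979323846_dp
  real(dp) :: x(0:n1-1), ang
  complex(dp) :: z(0:n1-1)
  integer :: j, k

  ! Arbitrary real input sequence.
  do j = 0, n1 - 1
    x(j) = real(j*j, dp) + 0.5_dp
  end do

  ! Naive forward DFT: Z(k) = sum over j of x(j) * exp(-2*pi*i*j*k/N1).
  do k = 0, n1 - 1
    z(k) = (0.0_dp, 0.0_dp)
    do j = 0, n1 - 1
      ang = -2.0_dp * pi * real(j * k, dp) / real(n1, dp)
      z(k) = z(k) + x(j) * cmplx(cos(ang), sin(ang), dp)
    end do
  end do

  ! Every value past index N1/2 is the conjugate of an earlier one,
  ! so only indices 0 .. N1/2 carry independent information.
  do k = n1/2 + 1, n1 - 1
    if (abs(z(k) - conjg(z(n1 - k))) > 1.0e-9_dp) stop 1
  end do
  print *, 'independent complex values:', n1/2 + 1, 'of', n1
end program real_dft_symmetry
```

This is why a real array of first leading dimension 2 × N can double as a complex array of leading dimension N for an in-place transform.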
17. the SAXPY routine due to type, shape, or number mismatches.
    ■ Incorrect type of the arguments. If SAXPY is called as follows:
         CALL AXPY(100, ALPHA, X, INCX, Y, INCY)
      A compiler error occurs because mixing parameter types, such as COMPLEX ALPHA and REAL X, is not supported.
    ■ Incorrect shape of the arguments. If SAXPY is called as follows:
         CALL AXPY(N, RALPHA, XA, INCX, Y, INCY)
      A compiler error occurs because the XA argument is two dimensional, but the interface is expecting a one-dimensional argument.
    ■ Incorrect number of arguments. If SAXPY is called as follows:
         CALL AXPY(RALPHA, X, INCX, Y)
      A compiler error occurs because the compiler cannot find a routine in the AXPY interface group that takes four arguments of the following form:
         AXPY(REAL, REAL 1-D ARRAY, INTEGER, REAL 1-D ARRAY)
    In the following example, the f95 keyword parameter passing capability allows a user to make essentially the same call:
         CALL AXPY(ALPHA=RALPHA, X=X, INCX=INCX, Y=Y)
    This is a valid call to the AXPY interface. It is necessary to use keyword parameter passing on any parameter that appears in the list after the first OPTIONAL parameter is omitted.
    The following calls to the AXPY interface are valid:
         CALL AXPY(N, RALPHA, X, Y=Y, INCY=INCY)
         CALL AXPY(N, RALPHA, X, INCX, Y)
         CALL AXPY(N, RALPHA, X, Y=Y)
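The keyword rule for omitted OPTIONAL arguments can be tried out with a small stand-in interface. Everything in this sketch is hypothetical: demo_axpy, demo_saxpy, and demo_daxpy are names invented here, the argument list only mimics the shape described above, and only positive increments are handled. It is not the SUNPERF module.

```fortran
module demo_axpy_mod
  implicit none
  private
  public :: demo_axpy

  ! Hypothetical generic name; the specific procedures differ by argument type.
  interface demo_axpy
    module procedure demo_saxpy, demo_daxpy
  end interface

contains

  ! y = y + alpha*x for single precision; N, INCX, and INCY are optional.
  subroutine demo_saxpy(n, alpha, x, incx, y, incy)
    integer, intent(in), optional :: n, incx, incy
    real, intent(in) :: alpha, x(:)
    real, intent(inout) :: y(:)
    integer :: i, nn, ix, iy
    ix = 1
    if (present(incx)) ix = incx
    iy = 1
    if (present(incy)) iy = incy
    nn = (size(x) - 1) / ix + 1
    if (present(n)) nn = n
    do i = 0, nn - 1
      y(1 + i*iy) = y(1 + i*iy) + alpha * x(1 + i*ix)
    end do
  end subroutine demo_saxpy

  ! Same operation for double precision.
  subroutine demo_daxpy(n, alpha, x, incx, y, incy)
    integer, intent(in), optional :: n, incx, incy
    double precision, intent(in) :: alpha, x(:)
    double precision, intent(inout) :: y(:)
    integer :: i, nn, ix, iy
    ix = 1
    if (present(incx)) ix = incx
    iy = 1
    if (present(incy)) iy = incy
    nn = (size(x) - 1) / ix + 1
    if (present(n)) nn = n
    do i = 0, nn - 1
      y(1 + i*iy) = y(1 + i*iy) + alpha * x(1 + i*ix)
    end do
  end subroutine demo_daxpy

end module demo_axpy_mod

program demo_optional_args
  use demo_axpy_mod
  implicit none
  real :: x(3), y(3)
  double precision :: xd(2), yd(2)

  x = (/ 1.0, 2.0, 3.0 /)
  y = 0.0
  call demo_axpy(3, 2.0, x, 1, y, 1)      ! every argument given positionally
  call demo_axpy(alpha=2.0, x=x, y=y)     ! N omitted, so the rest use keywords
  if (any(y /= (/ 4.0, 8.0, 12.0 /))) stop 1

  xd = (/ 1.0d0, 2.0d0 /)
  yd = 1.0d0
  call demo_axpy(2, 0.5d0, xd, y=yd)      ! INCX omitted, so Y needs a keyword
  if (any(yd /= (/ 1.5d0, 2.0d0 /))) stop 1
  print *, 'all three call forms resolved and ran'
end program demo_optional_args
```

A positional call such as demo_axpy(2.0, x, 1, y) does not compile, for the same reason given in the third bullet: no specific procedure accepts a REAL in the first position.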
18. Routine             Function
    SORMTR or DORMTR    Multiplies a general matrix by the orthogonal transformation matrix reduced to tridiagonal form by SSYTRD or DSYTRD.
    Symmetric or Hermitian Positive Definite Band Matrix
    xPBCON              Estimates the reciprocal of the condition number of a symmetric or Hermitian positive definite band matrix, using the Cholesky factorization returned by xPBTRF.
    xPBEQU              Computes equilibration scale factors for a symmetric or Hermitian positive definite band matrix.
    xPBRFS              Refines solution to a symmetric or Hermitian positive definite banded system of linear equations.
    xPBSTF              Computes a split Cholesky factorization of a real symmetric positive definite band matrix.
    xPBSV               Solves a symmetric or Hermitian positive definite banded system of linear equations (simple driver).
    xPBSVX              Solves a symmetric or Hermitian positive definite banded system of linear equations (expert driver).
    xPBTRF              Computes Cholesky factorization of a symmetric or Hermitian positive definite band matrix.
    xPBTRS (P)          Solves symmetric positive definite banded matrix, using the Cholesky factorization computed by xPBTRF.
    Symmetric or Hermitian Positive Definite Matrix
    (Routines listed: xPOCON, xPOEQU, xPORFS, xPOSV, xPOSVX, xPOTRF, xPOTRI (P))
    xPOCON              Estimates the reciprocal of the condition number of a symmetric
19. ■ A. George and J. W-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1981.
    ■ E. Ng and B. W. Peyton. Block Sparse Cholesky Algorithms on Advanced Uniprocessor Computers. SIAM J. Sci. Comput., 14:1034-1056, 1993.
    ■ Ian S. Duff, Roger G. Grimes, and John G. Lewis. User's Guide for the Harwell-Boeing Sparse Matrix Collection (Release I). Technical Report TR/PA/92/86, CERFACS, Lyon, France, October 1992.
    CHAPTER 5 Using Sun Performance Library Signal Processing Routines
    The discrete Fourier transform (DFT) has always been an important analytical tool in many areas in science and engineering. However, it was not until the development of the fast Fourier transform (FFT) that the DFT became widely used. This is because the DFT requires O(N²) computations, while the FFT only requires O(N log N) operations.
    Sun Performance Library contains a set of routines that computes the FFT, related FFT operations, such as convolution and correlation, and trigonometric transforms.
    This chapter is divided into the following three sections:
    ■ Forward and Inverse FFT Routines
    ■ Sine and Cosine Transforms
    ■ Convolution and Correlation
    Each section includes examples that show how the routines might be used.
    For information on the Fortran 95 and C interfaces and types of arguments used in each routine, see the
20.      ... 2X ... AIMAG(Y(I,J)) ... J = 1, ...
          END DO
          WRITE(*,*)
    C
    C Compute 2-dimensional in-place forward FFT.
    C Use workspace already allocated.
    C V, which is real array containing input data, is also
    C used to store complex results; as a complex array, its first
    C leading dimension is LDR1/2.
    C
          CALL SFFTC2(1, N1, N2, ONE, V, LDR1, V, LDR1/2, TRIGS, IFAC, SW, LW, IERR)
          WRITE(*,*) 'in-place forward FFT of X:'
          CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, V, LDR1/2, N2)
    C
    C Compute 2-dimensional out-of-place inverse FFT.
    C Leading dimension of Z must be even
    C
          CALL CFFTS2(1, N1, N2, SCALE, Y, LDY1, Z, LDR1, TRIGS, IFAC, SW, 0, IERR)
          WRITE(*,*) 'out-of-place inverse FFT of Y:'
          DO I = 1, N1
            WRITE(*,'(5(F5.1,2X))') (Z(I,J), J = 1, N2)
          END DO
          WRITE(*,*)
    C
    C Compute 2-dimensional in-place inverse FFT.
    C Y, which is complex array containing input data, is also
    C used to store real results; as a real array, its first
    C leading dimension is 2*LDY1.
    C
          CALL CFFTS2(1, N1, N2, SCALE, Y, LDY1, Y, 2*LDY1, TRIGS, IFAC, SW, 0, IERR)
          WRITE(*,*) 'in-place inverse FFT of Y:'
          CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Y, 2*LDY1,
    CODE EXAMPLE 5-3 Two-Dimensional Real-to-Complex FFT and Complex-to-Real FFT of a Two-Dimensional Array (Continued)
21. 4 Working With Matrices
    Most matrices can be stored in ways that save both storage space and computation time. Sun Performance Library uses the following storage schemes:
    ■ Banded storage
    ■ Packed storage
    The Sun Performance Library processes matrices that are in one of four forms:
    ■ General
    ■ Triangular
    ■ Symmetric
    ■ Tridiagonal
    Storage schemes and matrix types are described in the following sections.
    Matrix Storage Schemes
    Some Sun Performance Library routines that work with arrays stored normally have corresponding routines that take advantage of these special storage forms. For example, DGBMV will form the product of a general matrix in banded storage and a vector, and DTPMV will form the product of a triangular matrix in packed storage and a vector.
    Banded Storage
    A banded matrix is stored so the jth column of the matrix corresponds to the jth column of the Fortran array.
    The following code copies a banded general matrix in a general array into banded storage mode.
    C     Copy the matrix A from the array AG to the array AB. The
    C     matrix is stored in general storage mode in AG and it will
    C     be stored in banded storage mode in AB. The code to copy
    C     from general to banded storage mode is taken from the
    C     comment block in the original DGBFA by Cleve Moler.
    C
          NSUB = 1
          NSUPER = 2
          NDIAG = NSUB + 1 + NSUPER
          DO ICOL = 1, N
            I1 = MAX0 (1, ICOL - NSUPER)
            I2 = MIN0 (N, ICOL + NSUB)
            DO IROW = I1, I2
              IRO
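Because the copy loop above is cut off in this excerpt, the following self-contained sketch shows one way a general-to-banded copy can look. It assumes the layout used by the BLAS banded routines such as DGBMV, where the diagonal sits in row NSUPER + 1 of the band array and the array has NSUB + NSUPER + 1 rows; the factorization routines use a taller array, which this sketch does not model. The program name and the test matrix are made up.

```fortran
program band_copy
  implicit none
  integer, parameter :: n = 5, nsub = 1, nsuper = 2
  integer, parameter :: ndiag = nsub + 1 + nsuper
  real :: ag(n, n), ab(ndiag, n)
  integer :: icol, irow, i1, i2

  ! Build a general-storage matrix whose band entries encode 10*row + column.
  ag = 0.0
  do icol = 1, n
    do irow = max(1, icol - nsuper), min(n, icol + nsub)
      ag(irow, icol) = real(10*irow + icol)
    end do
  end do

  ! Column ICOL of AG maps to column ICOL of AB. Within the column,
  ! matrix row IROW lands on band row IROW - ICOL + NSUPER + 1.
  ab = 0.0
  do icol = 1, n
    i1 = max(1, icol - nsuper)
    i2 = min(n, icol + nsub)
    do irow = i1, i2
      ab(irow - icol + nsuper + 1, icol) = ag(irow, icol)
    end do
  end do

  ! The main diagonal must occupy band row NSUPER + 1.
  do icol = 1, n
    if (ab(nsuper + 1, icol) /= real(11*icol)) stop 1
  end do
  ! The first superdiagonal entry A(1,2) must sit one row higher.
  if (ab(nsuper, 2) /= 12.0) stop 1
  print '(5F6.1)', (ab(irow, :), irow = 1, ndiag)
end program band_copy
```

For this 5 × 5 example the band array needs 4 × 5 elements instead of 25; the saving grows quickly as N increases with a fixed bandwidth.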
22. What Is Not in This Book
    Related Documents and Web Sites
    Typographic Conventions
    Shell Prompts
    Accessing Compiler Collection Tools and Man Pages
    Accessing Compiler Collection Documentation
    Accessing Related Solaris Documentation
    Resources for Developers
    Contacting Sun Technical Support
    Sun Welcomes Your Comments
    Introduction
    Libraries Included With Sun Performance Library
    Netlib
    Sun Performance Library Features
    Mathematical Routines
    Compatibility With Previous LAPACK Versions
    Getting Started With Sun Performance Library
    Enabling Trap 6
    Using Sun Performance Library
    Improving Application Performance
    Replacing Routines With Sun Performance Library Routines
    Improving Performance of Other Libraries
    Using Tools to Restructure Code
    Fortran Interfaces
    Fortran SUNPERF Module for Use With Fortran 95
    Optional Arguments
    Fortran Examples
    C Interfaces
    C Examples
    SPARC Optimization and Parallel Processing
    Using Sun Performance Library on SPARC Platforms
    Compiling for SPARC Platforms
    Compiling Code for a 64-Bit Enabled Solaris Operating Environment
    64-Bit Integer Arguments
    Parallel Processing
    Run-Time Issues
    Degree of Parallelism
    Synchronization Mechanisms
    Parallel Processing Examples
    Working With Matrices
    Matrix Storage Schemes
    Banded Storage
    Packed Storage
    Sun Performance Li
23.      CALL AXPY(ALPHA=RALPHA, X=X, Y=Y)
    Fortran Examples
    To increase the performance of single-processor applications, identify code constructs in an application that can be replaced by calls to Sun Performance Library routines. Performance of multiprocessor applications can be increased by identifying opportunities for parallelization.
    To increase application performance by modifying code to use Sun Performance Library routines, identify blocks of code that exactly duplicate the capability of a Sun Performance Library routine. The following code example is the matrix-vector product y ← Ax + y, which can be replaced with the DGEMV subroutine.
          ...
          END DO
    In other cases, a block of code can be equivalent to several Sun Performance Library calls or contain portions of code that can be replaced with calls to Sun Performance Library routines. Consider the following code example.
          DO I = 1, N
            IF (V2(I,K) .LT. 0.0) THEN
              V2(I,K) = 0.0
            ELSE
              DO J = 1, M
                X(J,I) = X(J,I) + V1(J,K) * V2(I,K)
              END DO
            END IF
          END DO
    The code example can be rewritten to use the Sun Performance Library routine DGER, as shown here.
          DO I = 1, N
            IF (V2(I,K) .LT. 0.0) THEN
              V2(I,K) = 0.0
            END IF
          END DO
          CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)
    The same code example can also be rewritten using Fortran 95 specific statements, as shown here.
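The claim that the test-and-update loop can be split into a clamp followed by one rank-1 update can be verified with intrinsics alone. In this sketch the K index is dropped so that V1 and V2 are plain vectors, and the outer product built with SPREAD stands in for the DGER call; the program name and data are illustrative assumptions.

```fortran
program clamp_then_rank1
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 3, n = 4
  real(dp) :: v1(m), v2a(n), v2b(n), xa(m, n), xb(m, n)
  integer :: i, j

  v1 = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  v2a = (/ 2.0_dp, -1.0_dp, 0.5_dp, -4.0_dp /)
  v2b = v2a
  xa = 1.0_dp
  xb = 1.0_dp

  ! Original form: the sign test and the update share one loop.
  do i = 1, n
    if (v2a(i) < 0.0_dp) then
      v2a(i) = 0.0_dp
    else
      do j = 1, m
        xa(j, i) = xa(j, i) + v1(j) * v2a(i)
      end do
    end if
  end do

  ! Restructured form: clamp first, then one rank-1 update of the whole matrix.
  where (v2b < 0.0_dp) v2b = 0.0_dp
  xb = xb + spread(v1, 2, n) * spread(v2b, 1, m)

  ! Columns whose V2 entry was clamped receive an update of exactly zero.
  if (maxval(abs(xa - xb)) > 0.0_dp) stop 1
  if (any(v2a /= v2b)) stop 1
  print *, 'both forms give the same X and V2'
end program clamp_then_rank1
```

The two forms agree because a clamped entry contributes a zero column to the outer product, so skipping the update and adding zero have the same effect.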
24. Library fast cosine and sine transform routines are based on the routines contained in FFTPACK (http://www.netlib.org/fftpack). Routines with a V prefix are vectorized routines that are based on the routines contained in VFFTPACK (http://www.netlib.org/vfftpack). TABLE A-8 lists the Sun Performance Library sine and cosine transform routines.
    TABLE A-8 Sine and Cosine Transform Routines
    Routine                               Function
    COSQB, DCOSQB, VCOSQB, VDCOSQB        Cosine quarter-wave synthesis
    COSQF, DCOSQF, VCOSQF, VDCOSQF        Cosine quarter-wave transform
    COSQI, DCOSQI, VCOSQI, VDCOSQI        Initialize cosine quarter-wave transform and synthesis
    COST, DCOST, VCOST, VDCOST            Cosine even-wave transform
    COSTI, DCOSTI, VCOSTI, VDCOSTI        Initialize cosine even-wave transform
    SINQB, DSINQB, VSINQB, VDSINQB        Sine quarter-wave synthesis
    SINQF, DSINQF, VSINQF, VDSINQF        Sine quarter-wave transform
    SINQI, DSINQI, VSINQI, VDSINQI        Initialize sine quarter-wave transform and synthesis
    SINT, DSINT, VSINT, VDSINT            Sine odd-wave transform
    SINTI, DSINTI, VSINTI, VDSINTI        Initialize sine odd-wave transform
    Convolution and Correlation Routines
    TABLE A-9 lists the Sun Performance Library convolution and correlation routines.
    TABLE A-9 Convolution and Correlation Routines
    Routines    Function
    xCNVCOR     Co
25.      ... N2)
          END PROGRAM TESTSC2
          SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)
          INTEGER N1, N2, N3, I, J, K
          REAL A(LD1, LD2, *)
          DO K = 1, N3
            DO I = 1, N1
              WRITE(*,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2)
            END DO
            WRITE(*,*)
          END DO
          END
          SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)
          INTEGER N1, N2, N3, I, J, K
          COMPLEX A(LD1, LD2, *)
          DO K = 1, N3
            DO I = 1, N1
              WRITE(*,'(5(A1,F5.1,A1,F5.1,A1,2X))') ('(', REAL(A(I,J,K)), ',', AIMAG(A(I,J,K)), ')', J = 1, N2)
            END DO
            WRITE(*,*)
          END DO
          END
    my_system% f95 -dalign testsc2.f -xlic_lib=sunperf
    my_system% a.out
    Two-dimensional complex-to-real and real-to-complex FFT
    x =
       0.1  0.4  0.7  1.0
       0.2  0.5  0.8  1.1
       0.3  0.6  [garbled]  1.2
    out-of-place forward FFT of X: [values garbled in extraction]
    in-place forward FFT of X: [values garbled in extraction]
    out-of-place inverse FFT of Y:
       0.1  0.4  0.7  1.0
       [remaining rows garbled in extraction]
    in-place inverse FFT of Y:
       0.1  0.4  0.7  1.0
       [remaining rows garbled in extraction]
    CODE EXAMPLE 5-3 Two-Dimensional Real-to-Complex FFT and Complex-to-Real FFT of a Two-Dimensional Array (Continued)
    Three-Dimensional FFT Routines
    Sun Performance Library includes routines
26. Routines SCNVCOR, DCNVCOR, CCNVCOR, and ZCNVCOR (Continued)

Argument    Definition
INC2Z       Stride between output vectors in Z, where INC2Z > 0.
WORK        Work array.
LWORK       Length of work array.

1. When the lengths of the two sequences to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large time-series signal with a small filter, the direct method performs faster than the FFT-based method.

The two-dimensional convolution and correlation routines use the arguments shown in TABLE 5-8.

TABLE 5-8 Arguments for Two-Dimensional Convolution and Correlation Routines SCNVCOR2, DCNVCOR2, CCNVCOR2, and ZCNVCOR2

Arguments: CNVCOR, METHOD, TRANSX, SCRATCHX, TRANSY, SCRATCHY, MX, NX, LDX, MY

Argument    Definition
CNVCOR      'V' or 'v' specifies that convolution is computed.
            'R' or 'r' specifies that correlation is computed.
METHOD      'T' or 't' specifies that the Fourier transform method is used.
            'D' or 'd' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation.
TRANSX      'N' or 'n' specifies that X is the filter matrix.
            'T' or 't' specifies that the transpose of X is the filter matrix.
SCRATCHX    'N' or 'n' specifies that X must be preserved.
            'S' or 's' specifies that X can be used for scratch space. The contents of X a
27. SIAM

Libraries Included With Sun Performance Library

Sun Performance Library contains enhanced versions of the following standard libraries:

■ LAPACK version 3.0 - For solving linear algebra problems.
■ BLAS1 (Basic Linear Algebra Subprograms) - For performing vector-vector operations.
■ BLAS2 - For performing matrix-vector operations.
■ BLAS3 - For performing matrix-matrix operations.

The BLAS1, BLAS2, and BLAS3 libraries do not have version numbers. There has been only one version of the BLAS routines on Netlib.

Note - LINPACK has been removed from Sun Performance Library. LAPACK version 3.0 supersedes LINPACK and all previous versions of LAPACK. If the LINPACK routines are still needed, the LINPACK library and documentation can be obtained from www.netlib.org.

Sun Performance Library is available in both static and dynamic library versions optimized for the V8, V8+, and V9 architectures. Sun Performance Library supports static and shared libraries on Solaris 7, Solaris 8, and Solaris 9 and adds support for multiple processors.

Sun Performance Library LAPACK routines have been compiled with a Fortran 95 compiler and remain compatible with the Netlib LAPACK version 3.0 library. The Sun Performance Library versions of these routines perform the same operations as the Fortran callable routines and have the same interface as the standard Netlib versions.

LAPACK contains driver, computational, and auxiliary routine
28. SUNWspro docs index html and http docs sun com If your software is not installed in the opt directory ask your system administrator for the equivalent path on your system Document Title Description Numerical Computation Guide Describes issues regarding the numerical accuracy of floating point computations 16 Sun Performance Library User s Guide May 2003 Accessing Related Solaris Documentation The following table describes related documentation that is available through the docs sun com web site Document Collection Document Title Description Solaris Reference Manual Collection Solaris Software Developer Collection Solaris Software Developer Collection Resources for Developers See the titles of man page sections Linker and Libraries Guide Multithreaded Programming Guide Provides information about the Solaris operating environment Describes the operations of the Solaris link editor and runtime linker Covers the POSIX and Solaris threads APIs programming with synchronization objects compiling multithreaded programs and finding tools for multithreaded programs Visithttp www sun com developers studioandclick the Compiler Collection link to find these frequently updated resources m Articles on programming techniques and best practices m A knowledge base of short programming tips Documentation of compiler collection components as well as corrections to the
29. User s Guide May 2003 TABLE A 7 FFT Routines Continued Routine Replaces Function DFFTZ2 DFFT2I Initialize the trigonometric weight and factor tables or compute DFFT2F the two dimensional forward FFT of a two dimensional double precision array DFFTZ3 P DFFT3I Initialize the trigonometric weight and factor tables or compute DFFT3F the three dimensional forward FFT of three dimensional double precision array DFFTZM VDFFTI Initialize the trigonometric weight and factor tables or compute VDFFTF P the one dimensional forward FFT of a set of data sequences stored in a two dimensional double precision array SFFTC RFFTI RFFTF Initialize the trigonometric weight and factor tables or compute EZFFTI EZFFTF the one dimensional forward FFT of a real sequence SFFTC2 RFFT2I Initialize the trigonometric weight and factor tables or compute RFFT2F the two dimensional forward FFT of a two dimensional real array SFFTC3 P RFFT3I Initialize the trigonometric weight and factor tables or compute RFFT3F the three dimensional forward FFT of three dimensional real array SFFTCM VRFFTI Initialize the trigonometric weight and factor tables or compute VREETE P the one dimensional forward FFT of a set of data sequences stored in a two dimensional real array ZFFTD DFFTI DFFTB Initialize the trigonometric weight and factor tables or compute DEZFFTI DEZFFTB the one dimensional inverse FFT of a double complex sequence ZFFTD2 DFFT2I Initialize the trigonome
30. 'V' or 'v' specifies that convolution is computed. 'R' or 'r' specifies that correlation is computed.

Argument    Definition
FOUR        'T' or 't' specifies that the Fourier transform method is used.
            'D' or 'd' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. (See footnote 1.)
NX          Length of filter vector, where NX ≥ 0.
X           Filter vector.
IFX         Index of first element of X, where NX ≥ IFX ≥ 1.
INCX        Stride between elements of the vector in X, where INCX > 0.
NY          Length of input vectors, where NY ≥ 0.
NPRE        Number of implicit zeros prefixed to the Y vectors, where NPRE ≥ 0.
M           Number of input vectors, where M ≥ 0.
Y           Input vectors.
IFY         Index of the first element of Y, where NY ≥ IFY ≥ 1.
INC1Y       Stride between elements of the input vectors in Y, where INC1Y > 0.
INC2Y       Stride between input vectors in Y, where INC2Y > 0.
NZ          Length of the output vectors, where NZ ≥ 0.
K           Number of Z vectors, where K ≥ 0. If K < M, only the first K vectors will be processed. If K > M, all input vectors will be processed and the last K - M output vectors will be set to zero on exit.
Z           Result vectors.
IFZ         Index of the first element of Z, where NZ ≥ IFZ ≥ 1.
INC1Z       Stride between elements of the output vectors in Z, where INC1Z > 0.

TABLE 5-7 Arguments for One-Dimensional Convolution and Correlation
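The direct method named in the table computes the result straight from the definition of convolution. The following is a minimal sketch of that idea in plain Python, not a call into Sun Performance Library; the function name convolve_direct is hypothetical, and the strides, index offsets, and NPRE zero-prefixing handled by the library arguments are deliberately left out.

```python
def convolve_direct(x, y):
    """Full linear convolution from the definition:
    z[n] = sum over k of x[k] * y[n - k].

    x plays the role of the filter vector and y the input vector.
    (Hypothetical helper for illustration only.)
    """
    nz = len(x) + len(y) - 1          # length of the full result
    z = [0.0] * nz
    for i, xi in enumerate(x):        # every filter element ...
        for j, yj in enumerate(y):    # ... meets every input element
            z[i + j] += xi * yj
    return z


filter_vector = [1.0, 2.0, 3.0]
input_vector = [0.0, 1.0, 0.5]
result = convolve_direct(filter_vector, input_vector)
```

The double loop costs on the order of NX * NY multiplications, which is small when the filter is short. That is consistent with the guide's remark that the direct method can beat the FFT-based method when one sequence is much shorter than the other.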
31. c It factors and solves a structurally symmetric system e w unsymmetric values c implicit none integer neqns ier msglvl outunt ldrhs nrhs character mtxtyp 2 pivot l ordmthd 3 double precision handle 150 integer colstr 5 rowind 8 double precision values 8 rhs 4 xexpct 4 integer i c c Sparse matrix structure and value arrays Coefficient matrix c has a symmetric structure and unsymmetric values Ax b solve for x where c c 1 0 3 0 0 0 0 0 1 0 7 0 e 2 0 4 0 0 0 7 0 2 0 38 0 c A 0 0 0 0 6 0 0 0 x 3 0 b 18 0 c 0 0 5 0 0 0 8 0 4 0 42 0 c data colstr 1 3 6 7 9 data rowind 1 2 1 2 4 3 2 4 data values 1 0d0 2 0d0 3 0d0 4 0d0 5 0d0 6 0d0 7 0d0 amp 8 0d0 data rhs 7 000 38 0d0 18 0d0 42 0d0 data xexpct 1 0d0 2 0d0 3 0d0 4 0d0 c initialize solver mtxtyp su pivot n neqns 4 Chapter 4 Working With Matrices 67 68 CODE EXAMPLE 4 3 Regular Interface Continued Solving a Structurally Symmetric System With Unsymmetric Values outunt 6 msglvl 0 call regular interface call dgssin mtxtyp pivot neqns colstr rowind amp outunt msglvl handle ier if ier ne 0 goto 110 ordering and symbolic factorization ordmthd mmd call dgssor ordmthd handle ier if ier ne 0 goto 110 numeric factorization call dgssfa neqns colstr rowind values handle ier if ier ne 0 goto 110 solution nrhs 1
32. depends upon matrix type and applies to all routine names in this table.

f95 Equivalent: ISEMPTY(A); c = INF(A); c = SUP(A); c = MID(A); c = WID(A); A = INTERVAL(b, c)

References

The following white paper is available online. See the Interval Arithmetic readme for the location of this file.

Interval Version of the Basic Linear Algebra Subprograms Standard (IBLAS), derived by G. W. Walster from the draft INTERVAL BLAS (Chapter 5), prepared by Chenyi Hu et al., to be included in the Basic Linear Algebra Subprogram Technical (BLAST) Forum Standard.

APPENDIX A
Sun Performance Library Routines

This appendix lists the Sun Performance Library routines by library, routine name, and function. For a description of the function and a listing of the Fortran and C interfaces, refer to the section 3P man pages for the individual routines. For example, to display the man page for the SBDSQR routine, type man -s 3P sbdsqr. The man page routine names use lowercase letters.

For many routines, separate routines exist that operate on different data types. Rather than list each routine separately, a lowercase x is used in a routine name to denote single, double, complex, and double complex data types. For example, the routine xBDSQR is available as four routines that operate with the following data types:

■ SBDSQR - Single data type
■ DBDSQR - Double data type
■ CBDSQR
33. disjoint Empty entry and its location Appendix A Sun Performance Library Routines 159 TABLE A 11 Interval BLAS Routines Continued Routine Function tr_encm_i If an interval matrix is enclosed in another tr_hullm_i Convex hull of two interval matrices tr_infm_i Left endpoint of an interval matrix tr_interiorm_i If an interval matrix is in interior of another tr_interm_i Intersection of two interval matrices tr midm_i Midpoint matrix of an interval matrix tr _norm_i Triangular interval matrix norms tr_supm_i Right endpoint of an interval matrix tr_whullm_i Convex hull of two interval matrices tr_widthm_i Elementwise width of an interval matrix tr winterm_i Intersection of two interval matrices waxpby_i Scaled vector addition weancel_i Scaled cancellation whullv_i Convex hull of an interval vector with another widthv_i The elementwise width of an interval vector winterv_i Intersection of an interval vector with another See the section 3P man pages for information on using each routine Sort Routines TABLE A 12 lists the Sun Performance Library sort routines P denotes routines that are parallelized TABLE A 12 Sort Routines Routines Function BLAS_DSORT P Sorts a real double precision vector X in increasing or decreasing order using quick sort algorithm BLAS_DSORTV P Sorts a real double precision vector X in increasing or decreasing order using quick sort algorithm and overwri
34. input is one or more real sequences, each containing N1 real data points, the result will be one or more complex sequences that are conjugate symmetric. That is,

X(k) = X*(N1 - k),   k = N1/2 + 1, ..., N1 - 1

The imaginary part of X(0) is always zero. If N1 is even, the imaginary part of X(N1/2) is also zero. Both zeros are stored explicitly. Because the second half of each sequence can be derived from the first half, only N1/2 + 1 complex data points are computed and stored in the output array. Here and elsewhere in this chapter, integer division is rounded down.

With the inverse transform, if an N1-point complex-to-complex transform is being computed, then N1 unrelated data points are expected in each input sequence and N1 data points will be returned in the output array. However, if an N1-point complex-to-real transform is being computed, only the first N1/2 + 1 complex data points of each conjugate symmetric input sequence are expected in the input, and the routine will return N1 real data points in each output sequence.

For each value of N1, either the forward or the inverse routine must be called to compute the factors of N1 and the trigonometric weights associated with those factors before computing the actual FFT. The factors and trigonometric weights can be reused in subsequent transforms as long as N1 remains unchanged.

TABLE 5-2 lists the single precision linear FFT routines and their purposes. For r
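The storage rule above (keep N1/2 + 1 complex points, derive the rest by conjugation) can be checked with a small sketch. This is plain Python using a naive O(N1^2) DFT, not the library's FFT routines; the helper names dft, half_spectrum, and expand_half are hypothetical, and the negative exponent sign for the forward transform is an assumption made for the illustration (the symmetry holds for either sign).

```python
import cmath


def dft(x):
    """Naive forward DFT of a sequence (assumed sign convention: exp(-i...))."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def half_spectrum(x):
    """Keep only the first N1/2 + 1 points, as a real-to-complex transform does."""
    n1 = len(x)
    return dft(x)[: n1 // 2 + 1]      # integer division rounds down


def expand_half(half, n1):
    """Rebuild all N1 points using X(k) = conj(X(N1 - k)) for k > N1/2."""
    full = list(half)
    for k in range(n1 // 2 + 1, n1):
        full.append(half[n1 - k].conjugate())
    return full


x = [0.1, 0.2, 0.3, 0.4]              # N1 = 4 real data points
half = half_spectrum(x)               # 3 complex points are stored
rebuilt = expand_half(half, len(x))
reference = dft(x)
```

For this even N1, the imaginary parts of half[0] and half[2] (that is, X(0) and X(N1/2)) come out as zero up to rounding, matching the statement that both zeros are stored explicitly.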
35. matrix 53 141 symmetric matrix in packed storage 139 symmetric or Hermitian positive definite band matrix 137 symmetric or Hermitian positive definite matrix 137 symmetric or Hermitian positive definite matrix in packed storage 138 symmetric or Hermitian positive definite tridiagonal matrix 138 symmetric sparse matrix 55 T trap 6 enabling 24 trapezoidal matrix 143 triangular band matrix 142 triangular matrix 52 142 143 triangular matrix in packed storage 142 tridiagonal matrix 54 type Independence 27 typographic conventions 11 U unitary matrix 143 unitary matrix in packed storage 144 Index 167 unsymmetric sparse matrix 56 upper Hessenberg matrix 135 USE SUNPERF enabling Fortran 95 features 27 USE_THREADS routine 43 V VFFTPACK 97 150 152 X xarch 38 XFFTOPT 96 xlic_lib sunperf 23 38 xtypemap 40 168 Sun Performance Library User s Guide May 2003
36. my_system 95 dalign cost f xlic_lib sunperf my_system a out Input sequence of length 4 requires 5 data points 0455701603 0 210 0 352 0 867 Forward fast cosine transform 3 753 0 046 1 004 0 666 0 066 Inverse fast cosine transform results scaled by 1 2 N 0 557 0 603 0 210 0 352 0 867 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 5 7 calls VCOSQF and VCOSQB to compute the FCT and the inverse FCT respectively of two real quarter wave even sequences If the real sequences are of length 2N only N input data points need to be stored and the number of resulting data points is also N The results are stored in the input array CODE EXAMPLE 5 7 Compute the FCT and the Inverse FCT of Two Real Quarter wave Even Sequences my_system cat vcosq f program vcosq implicit none integer parameter len 4 m 2 ld mtl real x ld len xt ld len work 3 len 15 z ld len integer i j call RANDOM_NUMBER x Z xX write a27 i1 Input sequences of length len do j 1 m write a3 i1 a4 4 5 3 2x al seq j x j 1 i 1 len end do call vcosqi len work call vcosgqf m len z xt ld work write Forward fast cosine transform for quarter wav ven sequences do j 1 m write a3 i1 a4 4 5 3 2x al seq j z j i i 1 len end do call vcosqb m len z xt ld work write Inverse fast cosine transform for q
37. of a symmetric or Hermitian positive definite tridiagonal matrix Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations using the LDL factorization returned by xPTTRF 138 Sun Performance Library User s Guide May 2003 TABLE A 1 Routine LAPACK Linear Algebra Package Routines Continued Function Real Symmetric Band Matrix SSBEV or DSBEV SSBEVD or DSBEVD SSBEVX or DSBEVX SSBGST or DSBGST SSBGV or DSBGV SSBGVD or DSBGVD SSBGVX or DSBGVX SSBTRD or DSBTRD Replacement with newer version SSBEVD or DSBEVD suggested Computes all eigenvalues and eigenvectors of a symmetric band matrix Computes all eigenvalues and eigenvectors of a symmetric band matrix and uses a divide and conquer method to calculate eigenvectors Computes selected eigenvalues and eigenvectors of a symmetric band matrix Reduces symmetric definite banded generalized eigenproblem to standard form Replacement with newer version SSBGVD or DSBGVD suggested Computes all eigenvalues and eigenvectors of a generalized symmetric definite banded eigenproblem Computes all eigenvalues and eigenvectors of generalized symmetric definite banded eigenproblem and uses a divide and conquer method to calculate eigenvectors Computes selected eigenvalues and eigenvectors of a generalized symmetric definite banded eigenproblem Reduces symmetric band matrix to real symmetric tridiagonal for
38. or Hermitian positive definite matrix using the Cholesky factorization returned by xPOTRF Computes equilibration scale factors for a symmetric or Hermitian positive definite matrix Refines solution to a linear system in a Cholesky factored symmetric or Hermitian positive definite matrix Solves a symmetric or Hermitian positive definite system of linear equations simple driver Solves a symmetric or Hermitian positive definite system of linear equations expert driver Computes Cholesky factorization of a symmetric or Hermitian positive definite matrix Computes the inverse of a symmetric or Hermitian positive definite matrix using the Cholesky factorization returned by xPOTRF Appendix A Sun Performance Library Routines 137 TABLE A 1 Routine LAPACK Linear Algebra Package Routines Continued Function XPOTRS P Solves a symmetric or Hermitian positive definite system of linear equations using the Cholesky factorization returned by xPOTRF Symmetric or Hermitian Positive Definite Matrix in Packed Storage xXPPCON XPPEQU XPPRES XPPSV XPPSVX XPPTRE xXPPTRI XPPTRS P Reciprocal condition number of a Cholesky factored symmetric positive definite matrix in packed storage Computes equilibration scale factors for a symmetric or Hermitian positive definite matrix in packed storage Refines solution to a linear system in a Cholesky factored symmetric or Hermitian positive definite m
39. or liable for any content advertising products or other materials on or available from such sites or resources Sun will not be responsible or liable for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content goods or services available on or through any such sites or resources Before You Begin 15 Documentation in Accessible Formats The documentation is provided in accessible formats that are readable by assistive technologies for users with disabilities You can find accessible versions of documentation as described in the following table If your software is not installed in the opt directory ask your system administrator for the equivalent path on your system Type of Documentation Format and Location of Accessible Version Manuals except third party HTML at http docs sun com manuals Third party manuals HTML in the installed software through the documentation Standard C Library Class index at file opt SUNWspro docs index html Reference e Standard C Library User s Guide Tools h Class Library Reference Tools h User s Guide Readmes and man pages HTML in the installed software through the documentation index at file opt SUNWspro docs index html Release notes HTML at http docs sun com Related Compiler Collection Documentation The following table describes related documentation that is available at file opt
40. quarter-wave odd sequences is computed as

Fori 0 M 1 N 1 x n i X k isin ON k 0 T n 1 2k 2

V[D]SINQB Notes:

■ The input and output sequences are stored row-wise.
■ The transform is normalized so that if V[D]SINQB is called immediately after calling V[D]SINQF, the original data is obtained.

Fast Cosine Transform Examples

CODE EXAMPLE 5-6 calls COST to compute the FCT and the inverse transform of a real even sequence. If the real sequence is of length 2N, only N + 1 input data points need to be stored, and the number of resulting data points is also N + 1. The results are stored in the input array.

CODE EXAMPLE 5-6 Compute FCT and Inverse FCT of Single Real Even Sequence

my_system% cat cost.f
      program cost
      implicit none
      integer, parameter :: len = 4
      real x(0:len), work(3*(len+1)+15), z(0:len), scale
      integer i
      scale = 1.0 / (2.0 * len)
      call RANDOM_NUMBER(x(0:len))
      z(0:len) = x(0:len)
      write(*,'(a25,i1,a10,i1,a12)') 'Input sequence of length ',
     $   len, ' requires ', len+1, ' data points'
      write(*,'(5(f8.3,2x))') (x(i), i = 0, len)
      call costi(len+1, work)
      call cost(len+1, z, work)
      write(*,*) 'Forward fast cosine transform'
      write(*,'(5(f8.3,2x))') (z(i), i = 0, len)
      call cost(len+1, z, work)
      write(*,*)
     $   'Inverse fast cosine transform (results scaled by 1/(2*N))'
      write(*,'(5(f8.3,2x))') (z(i)*scale, i = 0, len)
      end
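The scaling step in this example can be reproduced without the library. The sketch below is plain Python and assumes the unnormalized even-wave cosine transform convention documented for FFTPACK's COST, where applying the transform twice multiplies the data by 2*(n-1) for n stored points. The function name cost_reference is hypothetical, and the five input values are the rounded ones printed in the guide's sample run.

```python
import math


def cost_reference(x):
    """Unnormalized even-wave cosine transform of n = len(x) stored points.

    Assumed (FFTPACK-style) definition:
    X(k) = x(0) + (-1)^k x(n-1) + 2 * sum_{j=1}^{n-2} x(j) cos(pi*j*k/(n-1))
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = x[0] + (-1) ** k * x[n - 1]
        for j in range(1, n - 1):
            acc += 2.0 * x[j] * math.cos(math.pi * j * k / (n - 1))
        out.append(acc)
    return out


x = [0.557, 0.603, 0.210, 0.352, 0.867]   # len = 4, so len + 1 = 5 points
forward = cost_reference(x)

# Calling the same transform again acts as the inverse up to a factor
# of 2 * len, which is why the example multiplies by scale = 1/(2*len).
scale = 1.0 / (2.0 * (len(x) - 1))
recovered = [v * scale for v in cost_reference(forward)]
```

Because the same routine serves as both forward and inverse transform, the only difference between the two calls in the Fortran example is the final multiplication by scale.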
41. symmetric matrix in packed storage
Product of a symmetric matrix and a vector
Rank-1 update to a real symmetric matrix
Rank-2 update to a real symmetric matrix
Product of a triangular matrix in banded storage and a vector
Solution to a triangular system in banded storage of linear equations
Product of a triangular matrix in packed storage and a vector
Solution to a triangular system of linear equations in packed storage
Product of a triangular matrix and a vector
Solution to a triangular system of linear equations

BLAS3 Routines

TABLE A-4 lists the Sun Performance Library BLAS3 routines. (P) denotes routines that are parallelized.

TABLE A-4 BLAS3 (Basic Linear Algebra Subprograms - Level 3) Routines

Routine                     Function
xGEMM (P)                   Product of two general matrices
CHEMM (P) or ZHEMM (P)      Product of a Hermitian matrix and a general matrix
CHERK (P) or ZHERK (P)      Rank-k update of a Hermitian matrix
CHER2K (P) or ZHER2K (P)    Rank-2k update of a Hermitian matrix
xSYMM (P)                   Product of a symmetric matrix and a general matrix
xSYRK (P)                   Rank-k update of a symmetric matrix
xSYR2K (P)                  Rank-2k update of a symmetric matrix
xTRMM (P)                   Product of a triangular matrix and a general matrix
xTRSM (P)                   Solution for a triangular system of equations

Sparse BLAS Routines

TABLE A-5 lists the Sun Performance Library sparse BLAS routines. (P) denotes routines that are parallelize
42. two packages:

■ Netlib Sparse BLAS package, by Dodson, Grimes, and Lewis, consists of sparse extensions to the Basic Linear Algebra Subprograms that operate on sparse vectors.
■ NIST (National Institute of Standards and Technology) Fortran Sparse BLAS Library consists of routines that perform matrix products and solution of triangular systems for sparse matrices in a variety of storage formats.

Refer to the following sources for additional sparse BLAS information:

■ For information on the Sun Performance Library Sparse BLAS routines, refer to the section 3P man pages for the individual routines.
■ For more information on the Netlib Sparse BLAS package, refer to http://www.netlib.org/sparse-blas/index.html.
■ For more information on the NIST Fortran Sparse BLAS routines, refer to http://math.nist.gov/spblas.

Naming Conventions

The Netlib Sparse BLAS and NIST Fortran Sparse BLAS Library routines each use their own naming conventions, as described in the following two sections.

Netlib Sparse BLAS

Each Netlib Sparse BLAS routine has a name of the form Prefix-Root-Suffix, where:

■ Prefix represents the data type.
■ Root represents the operation.
■ Suffix represents whether or not the routine is a direct extension of an existing dense BLAS routine.

TABLE 4-1 lists the naming conventions for the Netlib Sparse BLAS vector routines.

TABLE 4-1 Netlib Sparse BLAS Naming Conventions

Ope
43. 0 len write a25 il1 a10 1i1 al2 Input sequence of length len requires len 1 data points write 3 8 3 2x x i i 0 len 2 call sinti len 1 work call sint len 1 z work write Forward fast sine transform write 3 8 3 2x z i i 0 len 2 106 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 5 8 Compute FST and the Inverse FST of a Real Odd Sequence Continued call sint len 1 z work write 3 8 3 2x z i scale i 0 len 2 end my_system s 95 dalign sint f xlic_lib sunperf my_system a out Input sequence of length 4 requires 3 data points 0 557 0 603 0 210 Forward fast sine transform 2 297 0 694 0 122 Inverse fast sine transform results scaled by 1 2 N 0 557 0 603 0 210 write Inverse fast sine transform results scaled by 1 2 N In CODE EXAMPLE 5 9 VSINQF and VSINQB are called to compute the FST and inverse FST respectively of two real quarter wave odd sequences If the real sequence is of length 2N only N input data points need to be stored and the number of resulting data points is also N The results are stored in the input array CODE EXAMPLE 5 9 Compute FST and Inverse FST of Two Real Quarter Wave Odd Sequences my_system cat vsing f program vsing implicit none integer parameter len 4 m 2 ld m 1 real x ld len xt ld len work 3 len 15 z ld len integer i j call RANDOM _NUMBER x Z
44. 147 Sparse Solver Routines 149 Signal Processing Library Routines 149 Miscellaneous Signal Processing Routines 153 Interval BLAS IBLAS Routines 154 Sort Routines 160 Index 163 6 Sun Performance Library User s Guide May 2003 TABLE 3 1 TABLE 4 1 TABLE 4 2 TABLE 4 3 TABLE 4 4 TABLE 5 1 TABLE 5 2 TABLE 5 3 TABLE 5 4 TABLE 5 5 TABLE 5 6 TABLE 5 7 TABLE 5 8 TABLE 5 9 TABLE 5 10 TABLE 5 11 TABLE 6 1 TABLE 6 2 Tables Comparison of 32 bit and 64 bit Operating Environments 37 Netlib Sparse BLAS Naming Conventions 58 NIST Fortran Sparse BLAS Routine Naming Conventions 59 Sparse Solver Routines 60 Sparse Solver Routine Calling Order 61 FFT Routines and Their Arguments 74 Single Precision Linear FFT Routines 77 Single Precision Two Dimensional FFT Routines 85 Single Precision Three Dimensional FFT Routines 90 Fast Cosine and Sine Transform Routines and Their Arguments 97 Convolution and Correlation Routines 110 Arguments for One Dimensional Convolution and Correlation Routines SCNVCOR DCNVCOR CCNVCOR and ZCNVCOR 111 Arguments for Two Dimensional Convolution and Correlation Routines SCNVCOR2 DCNVCOR2 CCNVCOR2 and ZCNVCOR2 112 Arguments Affecting Minimum Work Array Size for Two Dimensional Routines SCNVCOR2 DCNVCOR2 CCNVCOR2 and ZCNVCOR2 114 MYC_INIT and NYC_INIT Dependencies 114 Minimum Dimensions and Data Types for WoRK Work Array Used With Convolution and Correl
45. 25  0.0
2.0  0.0  0.0  0.0  16.0

To represent A in CSC format:

■ colptr: 1, 6, 7, 8, 9, 10
■ rowind: 1, 2, 3, 4, 5, 2, 3, 4, 5
■ values: 4.0, 1.0, 2.0, 0.5, 2.0, 0.5, 3.0, 0.625, 16.0

Structurally Symmetric Sparse Matrices

A structurally symmetric sparse matrix has nonzero values with the property that if a(i, j) ≠ 0, then a(j, i) ≠ 0 for all i and j. When solving a structurally symmetric system, the entire matrix must be passed to the solver routines.

An example of a structurally symmetric matrix is shown below.

      1.0  3.0  0.0  0.0
A =   2.0  4.0  0.0  7.0
      0.0  0.0  6.0  0.0
      0.0  5.0  0.0  8.0

To represent A in CSC format:

■ colptr: 1, 3, 6, 7, 9
■ rowind: 1, 2, 1, 2, 4, 3, 2, 4
■ values: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0

Unsymmetric Sparse Matrices

An unsymmetric sparse matrix does not have a(i, j) = a(j, i) for all i and j. The structure of the matrix does not have an apparent pattern. When solving an unsymmetric system, the entire matrix must be passed to the solver routines.

An example of an unsymmetric matrix is shown below.

      1.0  0.0  0.0  0.0   0.0
      2.0  6.0  0.0  0.0   9.0
A =   3.0  0.0  7.0  0.0   0.0
      4.0  0.0  0.0  8.0   0.0
      5.0  0.0  0.0  0.0  10.0

To represent A in CSC format:

■ colptr: 1, 6, 7, 8, 9, 11
■ rowind: 1, 2, 3, 4, 5, 2, 3, 4, 2, 5
■ values: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0

Sun Performance Library Sparse BLAS

The Sun Performance Library sparse BLAS package is based on the following
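How the three CSC arrays encode a matrix may be easier to see by expanding them back into dense form. The sketch below is plain Python, not a library call; the function name csc_to_dense is hypothetical. It assumes the 1-based indexing used in the guide's examples, where column j occupies positions colptr(j) through colptr(j+1) - 1 of rowind and values.

```python
def csc_to_dense(n, colptr, rowind, values):
    """Expand 1-based compressed sparse column arrays into an n x n dense matrix."""
    dense = [[0.0] * n for _ in range(n)]
    for col in range(n):
        start = colptr[col] - 1        # first entry of this column (0-based)
        stop = colptr[col + 1] - 1     # one past the last entry
        for pos in range(start, stop):
            row = rowind[pos] - 1      # convert 1-based row index
            dense[row][col] = values[pos]
    return dense


# Arrays of the structurally symmetric example from the text above.
colptr = [1, 3, 6, 7, 9]
rowind = [1, 2, 1, 2, 4, 3, 2, 4]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

a = csc_to_dense(4, colptr, rowind, values)

# Structural symmetry: a(i, j) is nonzero exactly when a(j, i) is nonzero.
structurally_symmetric = all(
    (a[i][j] != 0.0) == (a[j][i] != 0.0) for i in range(4) for j in range(4)
)
```

The last element of colptr is one more than the number of stored entries, so the length of values can be read off as colptr(n+1) - 1.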
46. 3 Name SFFTC2 CFFTS2 conjugate symmetry only the first 1 complex data points need to be stored in the input or output array along the first dimension The complex subarray X 4 1 N1 1 can be obtained from X 0 as follows X k n X N1 k n k Mith NI I n 0 N2 1 To compute a two dimensional transform an FFT routine must be called twice One call initializes the routine and the second call actually computes the transform The initialization includes computing the factors of N1 and N2 and the trigonometric weights associated with those factors In subsequent forward or inverse transforms initialization is not necessary as long as N1 and N2 remain unchanged IMPORTANT Upon returning from a two dimensional FFT routine Y 0 N 1 contains the transform results and the original contents of Y N LDY 1 is overwritten Here N N1 in the complex to complex and complex to real transforms and N 1 in the real to complex transform TABLE 5 3 lists the single precision two dimensional FFT routines and their purposes The same information applies to the corresponding double precision routines except that their data types are double precision and double complex See TABLE 5 3 for the mapping Refer to the individual man pages for a complete description of the routines and their arguments Single Precision Two Dimensional FFT Routines Purpose Size Type of Size Type of Input Output Leading Dimensio
47. The number of OpenMP threads can be set by a variety of means, for example, by setting the OMP_NUM_THREADS environment variable or by calling the OMP_SET_NUM_THREADS run-time routine. If both environment variables are set, they must be set to the same value. If the run-time function is called, it overrides any environment variable setting.

The degree of parallelization within a pure OpenMP code can be set with the OMP_NUM_THREADS environment variable. The Sun Performance Library USE_THREADS routine can also be used to set the degree of parallelism for Sun Performance Library calls, which overrides the OMP_NUM_THREADS value.

In the following code example, each DGEMM call would be parallelized.

!$PAR DOSERIAL
      DO I = 1, N
         CALL DGEMM(...)
      END DO

Note that the DOSERIAL directive suppresses parallelization, but only for the loop nest within the same subroutine, and it is overridden by any other directive within that nest. The DOSERIAL directive does not impact parallelization within Sun Performance Library.

In the following code example, there will be at most two-way parallelism, regardless of the setting of the number of OpenMP threads.

!$OMP PARALLEL SECTIONS
!$OMP SECTION
      DO I = 1, N / 2
         CALL DGEMM(...)
      END DO
!$OMP SECTION
      DO I = N / 2 + 1, N
         CALL DGEMM(...)
48. 73 121 129 man pages accessing 13 MANPATH environment variable setting 14 matrix banded 49 bidiagonal 130 diagonal 130 general 51 130 general band 130 general tridiagonal 132 Hermitian 133 Hermitian band 133 Hermitian in packed storage 134 real orthogonal 136 real orthogonal in packed storage 136 real symmetric band 139 real symmetric tridiagonal 140 Index 165 structurally symmetric sparse 56 symmetric 53 141 symmetric banded 54 symmetric in packed storage 139 symmetric or Hermitian positive definite 137 symmetric or Hermitian positive definite band 137 symmetric or Hermitian positive definite in packed storage 138 symmetric or Hermitian positive definite tridiagonal 138 symmetric sparse 55 trapezoidal 143 triangular 52 142 143 triangular band 142 triangular in packed storage 142 tridiagonal 54 unitary 143 unitary in packed storage 144 unsymmetric sparse 56 upper Hessenberg 135 misalign 46 MT safe routines 32 multithreading compiler parallelization 45 POSIX Solaris threads 45 N naming conventions IBLAS 122 IBLAS prefixes 122 Netlib 20 Netlib Sparse BLAS 57 naming conventions 57 NIST Fortran Sparse BLAS 57 naming conventions 58 O odd sequences fast sine transform routines 97 OMP_NUM_THREADS 44 one call interface 60 optimizing 64 bit code 38 166 Sun Performance Library User s Guide May 2003 SPARC instruction set 38 optional 95 argum
49. 734D 12 350830475782D 13 426325641456D 13 660582699652D 14 200000000000D 01 200000000000D 01 100000000000D 01 800000000000D 01 500000000000D 00 Oo OO CO Oo Solving a Symmetric System Regular Interface my_system cat example_ss f c It factors and solves a symmetric system sparse solver neqns ier msglvl outunt ldrhs nrhs mtxtyp 2 pivot l1 ordmthd 3 handle 150 colstr 6 rowind 9 values 9 rhs 5 xexpct 5 i From George and Liu x where 0 5 2 0 2 0 7 0 0 0 0 0 2 0 3 0 0 0 0 0 x 1 0 b 7 0 0 625 0 0 8 0 4 0 0 0 16 0 0 5 4 0 CODE EXAMPLE 4 2 Solving a Symmetric System Regular Interface Continued data colstr 1 6 7 8 9 10 data rowind gt Ty 27 37 4y 5y 25 Bi Sp SL data values 4 0d0 1 0d0 2 0d0 0 5d0 2 0d0 0 5d0 amp 3 0d0 0 625d0 16 0d0 data rhs 7 0d0 3 0d0 7 0d0 4 0d0 4 0d0 data xexpct 2 0d0 2 0d0 1 0d0 8 0d0 0 5d0 initialize solver mtxtyp ss pivot n neqns 5 outunt 6 msglvl 0 c call regular interface call dgssin mtxtyp pivot neqns colstr rowind amp outunt msglvl handle ier if ier ne 0 goto 110 ordering and symbolic factorization c ordmthd mmd call dgssor ordmthd handle ier if ier ne 0 goto 110 c numeric factorization call dgssfa neqns colstr rowind values handle ier if ier ne 0 goto 110 solution nrhs 1 ldrhs 5 call dgs
50. Array Two Dimensional Real to Complex FFT and Complex to Real FFT of a Two my_system S cat testsc2 f PROGRAM TESTSC2 IMPLICIT NONE INTEGER PARAMETER N1 3 N2 4 LDX1 N1 S LDY1 N1 2 1 LDR1 2 N1 2 1 INTEGER LW IERR I J K IFAC 128 2 REAL PARAMETER ONE 1 0 SCALE ONE N1 N2 REAL V LDR1 N2 X LDX1 N2 Z LDR1 N2 SW 2 N2 TRIGS 2 N1 N2 COMP LEX Y LDY1 N2 WRITE S Two dimensional complex to real and real to complex FFT WRITE X RESHAPE SOURCE 1 2 3 4 5 6 7 8 2 0 1 0 1 1 1 2 SHAPE LDX1 N2 DO I 1 N2 V 1 N1 I X 1 N1 1 86 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 5 3 Dimensional Array Continued Two Dimensional Real to Complex FFT and Complex to Real FFT of a Two END DO WRITE DO I WRITE 5 F5 1 2X END DO WRITE Initialize trig table and get factors of N1 N2 CALL SFFTC2 0 N1 N2 ONE X LDX1 Y LDY1 TRIGS IFAC SW 0 TERR Compute 2 dimensional out of place forward FFT Let FFT routine allocate memory cannot do an in place transform in X because LDX1 lt 2 N1 2 1 CALL SFFTC2 1 N1 N2 ONE X LDX1 Y LDY1 TRIGS IFAC SW 0 IERR out of place forward FFT of X WRI WRI DO I W C RI N2 EAL Y I J F5 1 Al F5 1 Al1
51. Continued nrhs 1 ldrhs 5 call dgsssl nrhs rhs ldrhs handle ier if ier ne 0 goto 110 deallocate sparse solver storage call dgssda handle ier if ier ne 0 goto 110 print values of sol c write 6 200 i rhs i expected rhs i error do i 1 negns write 6 300 i rhs i xexpct i rhs i xexpct 1 enddo stop 110 continue c c call to sparse solver returns an error c write 6 400 amp example FAILED sparse solver error number ier stop 200 format a5 3a20 300 format 15 3d20 12 i sol xexpct values 400 format a60 1i20 fail message sparse solver error number end my_system s 95 dalign example_uu f xlic_lib sunperf my_system a out ae rhs i expected rhs i error 100000000000D 01 100000000000D 01 000000000000D 00 200000000000D 01 200000000000D 01 000000000000D 00 300000000000D 01 300000000000D 01 000000000000D 00 400000000000D 01 400000000000D 01 000000000000D 00 500000000000D 01 500000000000D 01 000000000000D 00 Owe w Ne O O O OLO O O O CO OO e E OO CO Oo Chapter 4 Working With Matrices 71 References The following books and papers provide additional information for the sparse BLAS and sparse solver routines a Dodson D S R G Grimes and J G Lewis Sparse Extensions to the Fortran Basic Linear Algebra Subprograms ACM Transactions on Mathematical Software June 1991 Vol 17 No
52. DO K 1 3 DO I 1 N1 END DO WRITI END DO END Gl my_system a out x 0 1 0 4 0 7 1 0 0 2 0 5 0 8 1 1 0 3 0 6 0 9 1 2 4 1 3 4 2 7 1 0 12 Ge Ac 3 61 263 LO T9252 out of place forward FFT of X 48 6 0 0 9 6 3 4 3 4 0 0 9 462 120 25 H22 7 Pio Bey 2627 33 0 0 0 6 0 7 0 7 0 0 0 3807 Ra OCS 25 Ze IY CaSO S847 in place forward FFT of X 48 6 0 0 9 6 3 4 3 4 0 0 CrP 4e27 SH beO C259 ose COLO Ge 33 0 02 0 E 620 FeO C 75 07 0 0 2 SO y DATs AZ 5p eh te SD 0G Bi out of place inverse FFT of Y 0 1 0 4 0 7 1 0 02 056 5 068 L 03 30 600 29 1 2 4 1 3 4 2 7 1 0 1 2 6 5 4 3 3 1 283 6 he Bd in place inverse FFT of Y 0 1 0 4 0 7 1 0 0 2 0 5 0 8 1 1 0 3 0 6 0 9 1 2 4 1 3 4 2 7 1 0 Le 2 62 9 448 35 253 the On he 9 2232 94 Sun Performance Library User s Guide May 2003 E 5 Al F5 1 A1 F5 1 A1 2X AIMAG A I J K my_system s 95 dalign testsc3 f xlic_lib sunperf Three dimensional complex to real and real to complex FFT EAL A I J K Comments When doing an in place real to complex or complex to real transform care must be taken to ensure the size of the input array is large enough to hold the results For example if the input is of type complex stored in a complex array with first leading dimension N then to use the same array to store the real results
53. DROTI P P P P Computes the dot product of a sparse vector and a full vector Computes the conjugate dot product of a sparse vector and a full vector Ellpack format matrix matrix multiply Ellpack format triangular solve Given a full vector creates a sparse vector and corresponding index vector Given a full vector creates a sparse vector and corresponding index vector and zeros the full vector Jagged diagonal matrix matrix multiply Right permutation of a jagged diagonal matrix Jagged diagonal triangular solve Applies a Givens rotation to a sparse vector and a full vector Given a sparse vector and corresponding index vector puts those elements into a full vector Skyline format matrix matrix multiply Skyline format triangular solve Variable block sparse row format matrix matrix multiply Variable block sparse row format triangular solve 148 Sun Performance Library User s Guide May 2003 Sparse Solver Routines TABLE A 6 lists the Sun Performance Library sparse solver routines TABLE A 6 Sparse Solver Routines Routines Function SGSSFS DGSSFS CGSSFS or ZGSSFS SGSSIN DGSSIN CGSSIN or ZGSSIN SGSSOR DGSSOR CGSSOR or ZGSSOR SGSSFA DGSSFA CGSSFA or ZGSSFA SGSSSL DGSSSL CGSSSL or ZGSSSL SGSSUO DGSSUO CGSSUO or ZGSSUO SGSSRP DGSSRP CGSSRP or ZGSSRP SGSSCO DGSSCO CGSSCO or ZGSSCO SGSSDA DGSSDA CGSSDA or ZGSSDA SGSSPS DGSSPS
54. E 100ms

These settings would cause threads to spin wait (default behavior), spin for 2 seconds before sleeping, or spin for 100 milliseconds before sleeping, respectively.

The link-time option -xlic_lib=sunperf links in Sun Performance Library functions that employ the same parallelization model as the user code, as indicated by the -xparallel, -xexplicitpar, or -xautopar compiler parallelization option. Using Sun Performance Library routines does not change the spin-wait behavior of the code.

Parallel Processing Examples

The following sections demonstrate using the PARALLEL environment variable and the compile and linking options for creating code that supports using:
- A single processor
- Multiple processors

Chapter 3 SPARC Optimization and Parallel Processing

Using a Single Processor

To use a single processor:
- Call one or more of the routines.
- Link with -xlic_lib=sunperf specified at the end of the command line.
- Do not compile or link with -xparallel, -xexplicitpar, or -xautopar.
- Make sure the PARALLEL environment variable is unset or set equal to 1.

The following example shows how to compile and link with libsunperf.so:

    cc -dalign -xarch=... any.c -xlic_lib=sunperf
or
    f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf

Using Multiple Processors

To compile for multiple processors:
- Use the same parallelization option for the compiling and linking commands.
- Specify t
55. FFT of X DO I 0 N1 1 WRITE 5 Al F5 1 A1 F5 1 A1 2X REAL Z I J AIMAG Z I J J 0 N2 1 END DO WRITE WRITE out of place forward FFT of X WRITE Y DO I 0 N1 1 WRITE 5 Al F5 1 A1 F5 1 A1 2X REAL Y I J AIMAG Y I J J 0 N2 1 END DO WRITE Compute in place inverse linear FFT CALL CFFTCM 1 N1 N2 SCALE Y LDY1 Y LDY1 TRIGS IFAC SW LW IERR IF IERR NE 0 THE PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF WRITE in place inverse FFT of Y WRITE Y DO I 0 N1 1 WRITE 5 Al F5 1 A1 F5 1 A1 2X REAL Y I J AIMAG Y I J J 0 N2 1 END DO DEALLOCATE SW END PROGRAM TESTCCM my_system 95 dalign testccm f xlic_lib sunperf my_system a out Linear complex to complex FFT of one or more sequences sty 1042 A OET 048 oO Tey rA E gy Zeg Chapter 5 Using Sun Performance Library Signal Processing Routines 83 CODE EXAMPLE 5 2 Linear Complex to Complex FFT Continued Cron Oel e Oer 1 0 Ted 6 Zed 252 O59 Ob aC dL ZR CMe dep aka Ge M2 226 0 in place forward FFT of X egy LEN ee Zedge BOY SC gap Aol Oy 2 62 0 57082 E O49 OeLy C HO0cd 2 0 2 7084 0 29 0 2 Ox5 O0 d5 Ocd e 26 HO Ty e020 1 Ody 0 32 out of place forward FFT of X Na 0D B2 E R a SOY G Ady Aea C2 65 2 055 m0 0S OA
56. GEVC XTGEXC xXTGSEN XTGSJA XTGSNA xTGSYL Computes right and or left generalized eigenvectors of two upper triangular matrices Reorders the generalized Schur decomposition of a real or complex matrix pair using an orthogonal or unitary equivalence transformation Reorders the generalized real Schur or Schur decomposition of two matrixes and computes the generalized eigenvalues Computes the generalized SVD from two upper triangular matrices obtained from xGGSVP Estimates reciprocal condition numbers for specified eigenvalues and eigenvectors of two matrices in real Schur or Schur canonical form Solves the generalized Sylvester equation Triangular Matrix in Packed Storage xTPCON XTPRES XTPTRI XTPTRS P Estimates the reciprocal or the condition number of a triangular matrix in packed storage Determines error bounds and estimates for solving a triangular system of linear equations where the coefficient matrix is in packed storage Computes the inverse of a triangular matrix in packed storage Solves a triangular system of linear equations where the coefficient matrix is in packed storage 142 Sun Performance Library User s Guide May 2003 TABLE A 1 Routine LAPACK Linear Algebra Package Routines Continued Function Triangular Matrix xTRCON XTREVC XTREXC XTRRE S XTRSEN XTRSNA XTRSYL XTRTRI XTRTRS P Estimates the reciprocal or the condition n
57. If a normalized FFT is followed by its inverse FFT, the result is the original input data. The Sun Performance Library FFT routines are not normalized. However, normalization can be done easily by calling the inverse FFT routine with the appropriate scaling factor stored in SCALE.

- ERR: A flag returning a nonzero value if an error is encountered in the routine and zero otherwise.

Linear FFT Routines

Linear FFT routines compute the FFT of real or complex data in one dimension only. The data can be one or more complex or real sequences. For a single sequence, the data is stored in a vector. If more than one sequence is being transformed, the sequences are stored column-wise in a two-dimensional array and a one-dimensional FFT is computed for each sequence along the column direction. The linear forward FFT routines compute

$$X(k) = \sum_{n=0}^{N1-1} x(n)\, e^{-2\pi i n k / N1}, \qquad k = 0, \ldots, N1-1,$$

where $i = \sqrt{-1}$, or, expressed in polar form,

$$X(k) = \sum_{n=0}^{N1-1} x(n) \left( \cos\frac{2\pi n k}{N1} - i \sin\frac{2\pi n k}{N1} \right), \qquad k = 0, \ldots, N1-1.$$

The inverse FFT routines compute

$$x(n) = \sum_{k=0}^{N1-1} X(k)\, e^{2\pi i n k / N1}, \qquad n = 0, \ldots, N1-1,$$

or, in polar form,

$$x(n) = \sum_{k=0}^{N1-1} X(k) \left( \cos\frac{2\pi n k}{N1} + i \sin\frac{2\pi n k}{N1} \right), \qquad n = 0, \ldots, N1-1.$$

With the forward transform, if the input is one or more complex sequences of size N1, the result will be one or more complex sequences, each consisting of N1 unrelated data points. However if the
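The remark about normalization can be made concrete with a short derivation. It is a sketch that assumes the unnormalized forward transform $X(k) = \sum_{m=0}^{N1-1} x(m)\, e^{-2\pi i m k / N1}$ and an inverse with the opposite sign in the exponent; the summation index $m$ is introduced here only for the derivation. Applying the unscaled inverse to $X$ gives

$$\sum_{k=0}^{N1-1} X(k)\, e^{2\pi i n k / N1} = \sum_{m=0}^{N1-1} x(m) \sum_{k=0}^{N1-1} e^{2\pi i (n-m) k / N1} = N1 \cdot x(n),$$

because the inner geometric sum equals $N1$ when $m = n$ and $0$ otherwise. Setting SCALE to $1/N1$ on the inverse call therefore returns the original data. For the sequence $x = (1, 2, 3)$ the forward transform is approximately $X = (6,\; -1.5 + 0.866i,\; -1.5 - 0.866i)$; the unscaled inverse at $n = 0$ is $6 - 1.5 - 1.5 = 3 = 3 \cdot x(0)$, and multiplying by $1/3$ recovers $x(0) = 1$.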
58.
                                                LP64
-xarch              v8, v8plusa, v8plusb        v9, v9a, v9b
Fortran Integers    INTEGER, INTEGER*4          INTEGER*8
C Integers          int                         long
Floating point      S/D/C/Z                     S/D/C/Z
API                 Names of routines           Names of routines with _64 suffix

Using Sun Performance Library on SPARC Platforms

The Sun Performance Library was compiled using the f95 compiler provided with this release. The Sun Performance Library routines were compiled using -dalign, -xparallel, and -xarch set to v8, v8plusa, or v9a.

When linking the program, use -dalign, -xlic_lib=sunperf, and the same command line options that were used when compiling. If -dalign cannot be used in the program, supply a trap 6 handler as described in Getting Started With Sun Performance Library on page 23. If compiling with a value of -xarch that is not one of v8, v8plusa, or v9a, the compiler driver will select the closest match.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown here:

    my_system% f95 -dalign my_file.f -xlic_lib=sunperf

Compiling for SPARC Platforms

Applications using Sun Performance Library can be optimized for specific SPARC instruction set architectures and for a 64-bit enabled Solaris operating environment. The optimization for each architecture is targeted at one implementation of that architecture and includes optimizations for other architectures when it does not degrade the performance o
59. in place inverse FFT of Y

Two-Dimensional FFT Routines

For the linear FFT routines, when the input is a two-dimensional array, the FFT is computed along one dimension only, namely along the columns of the array. The two-dimensional FFT routines take a two-dimensional array as input and compute the FFT along both the column and row dimensions. Specifically, the forward two-dimensional FFT routines compute

$$X(k,n) = \sum_{l=0}^{N2-1} \sum_{j=0}^{N1-1} x(j,l)\, e^{-2\pi i l n / N2}\, e^{-2\pi i j k / N1}, \qquad k = 0, \ldots, N1-1, \quad n = 0, \ldots, N2-1,$$

and the inverse two-dimensional FFT routines compute

$$x(j,l) = \sum_{n=0}^{N2-1} \sum_{k=0}^{N1-1} X(k,n)\, e^{2\pi i l n / N2}\, e^{2\pi i j k / N1}, \qquad j = 0, \ldots, N1-1, \quad l = 0, \ldots, N2-1.$$

For both the forward and inverse two-dimensional transforms, a complex-to-complex transform where the input problem is N1 x N2 will yield a complex array that is also N1 x N2. When computing a real-to-complex two-dimensional transform (forward FFT), if the real input array is of dimensions N1 x N2, the result will be a complex array of dimensions (N1/2 + 1) x N2. Conversely, when computing a complex-to-real transform (inverse FFT) of dimensions N1 x N2, an (N1/2 + 1) x N2 complex array is required as input. As with the real-to-complex and complex-to-real linear FFT, because of

TABLE 5
60. N1 3 N2 2 LDZ N1 LDC N1 LDX 2 LDC INTEGER DIMENSION IFAC 128 REAL SCALE REAL PARAMETER ONE 1 0 REAL DIMENSION SW N1 TRIGS 2 N1 REAL DIMENSION 0 LDX 1 0 N2 1 X V Y COMPLEX DIMENSION 0 LDZ 1 0 N2 1 Z workspace size LW N1 SCALE ONE N1 WRITE Linear complex to real and real to complex FFT of a sequence WRITE X RESHAPE SOURCE 1 2 3 0 0 0 0 0 0 7 8 9 0 0 0 0 0 0 SHAPE 6 2 V X WRITE X DO I 0 N1 1 WRITE 2 F4 1 2x X I Jd J 0 N2 1 END DO WRITE intialize trig table and compute factors of N1 CALL SFFTCM 0 N1 2 ONE X LDX Z LDZ TRIGS IFAC SW LW IERR IF IERR NE 0 THE PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF Compute out of place forward linear FFT Let FFT routine allocate memory CALL SFFTCM 1 N1 N2 ONE X LDX Z LDZ TRIGS IFAC 5 SW 0 IERR IF IERR NE 0 THEN PRINT ROUTINE RETURN WITH ERROR COD DJ I E i E y y STOP END IF WRITE out of place forward FFT of X WRITE Z DO I 0 N1 2 WRITE 2 A1 F4 1 A1 F4 1 A1 2x REAL Z I J AIMAG Z I J J 0 N2 1 END DO WRITE Compute in place forward linear FFT X must be large enough to store N1 2 1 complex value
61. O SCALE ERR X PRINT ROUTINE RET STOP 82 URN WITH Sun Performance Library User s Guide May 2003 I J IFAC 128 4 2 K LDX1 LDZ1 NCPUS N1 LDZ1 N1 SCALE ON SW LW 2 N1 NCPUS WRITE ALLOCATE SW LW X RESHAPE SOURCE 1 2 3 4 S 1 95 1 22 4 01 3 1 4 1257 ho 6 5 1 7 1 2 2 0 SHAPE LDX1 N2 DAN WRITE X DO I 0 N1 1 WRITE 5 A1 F5 1 A1 F5 1 A1 2X S AIMAG X I J J LDX1 ERROR CODE 0 Y E N1 X 0 LDX1 1 0 N2 1 WRITE Linear complex to complex FFT of one or more sequences REAL X I J oP N2 1 intialize trig table and compute factors of N1 LDY1 TRIGS IFAC CODE EXAMPLE 5 2 Linear Complex to Complex FFT Continued END IF Compute out of place forward linear FFT Let FFT routine allocate memory CALL CFFTICM 1 Nl N2 ONE X LDX1 Y LDY1 TRIGS IFAC SW 0 IERR IF IERR NE 0 THE PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF Compute in place forward linear FFT LDZ1 must equal LDX1 CALL CFFTCM 1 Nl N2 ONE Z LDX1 Z LDZ1 TRIGS S IFAC SW 0 IERR WRITE in place forward
62. Operating Environment on page 39.

Because the sunperf.mod file is compiled with -dalign, any code that contains the USE SUNPERF statement must be compiled with -dalign. The following error occurs if the code is not compiled with -dalign:

    use sunperf
        ^
    "test_code.f", Line = 2, Column = 11: ERROR: Procedure "SUNPERF" and this compilation must both be compiled with -a dalign, or without -a dalign.

Optional Arguments

Sun Performance Library routines support Fortran 95 optional arguments, where argument values that can be inferred from other arguments can be omitted. For example, the SAXPY routine is defined as follows in the man page:

    SUBROUTINE SAXPY([N], ALPHA, X, [INCX], Y, [INCY])
    REAL ALPHA
    INTEGER INCX, INCY, N
    REAL X(*), Y(*)

The N, INCX, and INCY arguments are optional. Note the square bracket notation in the man pages that denotes the optional arguments.

Suppose the user tries to call the SAXPY routine with the following arguments:

    USE SUNPERF
    COMPLEX ALPHA
    REAL X(100), Y(100), XA(100,100), RALPHA
    INTEGER INCX, INCY

If mismatches in the type, shape, or number of arguments occur, the compiler would issue the following error message:

    ERROR: No specific match can be found for the generic subprogram call "AXPY".

Using the arguments defined above, the following examples show incorrect calls to
63. PGVD suggested Computes all the eigenvalues and eigenvectors of a real generalized symmetric definite eigenproblem where the coefficient matrices are in packed storage simple driver Computes selected eigenvalues and eigenvectors of a real generalized symmetric definite eigenproblem where the coefficient matrices are in packed storage expert driver Improves the computed solution to a system of linear equations when the coefficient matrix is symmetric indefinite in packed storage Computes the solution to a system of linear equations where the coefficient matrix is a symmetric matrix in packed storage simple driver Uses the diagonal pivoting factorization to compute the solution to a system of linear equations where the coefficient matrix is a symmetric matrix in packed storage expert driver Reduces a real symmetric matrix stored in packed form to real symmetric tridiagonal form using an orthogonal similarity transform Computes the factorization of a symmetric packed matrix using the Bunch Kaufman diagonal pivoting method Computes the inverse of a symmetric indefinite matrix in packed storage using the factorization computed by xSPTRF Solves a system of linear equations by the symmetric matrix stored in packed format using the factorization computed by xSPTRF Real Symmetric Tridiagonal Matrix SSTEBZ or DSTEBZ XSTEDC XSTEGR XSTEIN XSTEQR SSTERF or DSTERF SSTEV or DSTEV Computes the eigenvalu
64. Pages

1. If you are using the C shell, edit your home .cshrc file. If you are using the Bourne shell or Korn shell, edit your home .profile file.
2. Add the following to your MANPATH environment variable:

    /opt/SUNWspro/man

Accessing Compiler Collection Documentation

You can access the documentation at the following locations:

- The documentation is available from the documentation index that is installed with the software on your local system or network at file:/opt/SUNWspro/docs/index.html. If your software is not installed in the /opt directory, ask your system administrator for the equivalent path on your system.
- Most manuals are available from the docs.sun.com web site. The following titles are available through your installed software only:
    Standard C++ Library Class Reference
    Standard C++ Library User's Guide
    Tools.h++ Class Library Reference
    Tools.h++ User's Guide
- The release notes are available from the docs.sun.com web site.

The docs.sun.com web site (http://docs.sun.com) enables you to read, print, and buy Sun Microsystems manuals through the Internet. If you cannot find a manual, see the documentation index that is installed with the software on your local system or network.

Note: Sun is not responsible for the availability of third party web sites mentioned in this document and does not endorse and is not responsible
65. RR N1 SCALE X Y TRIGS IFAC WORK LWORK ERR LDY1 TRIGS IFAC LDY1 TRIGS IFAC N1 SCALE X Y TRIGS IFAC WORK LWORK ERR LDY1 TRIGS IFAC LDY1 TRIGS IFAC LDY1 TRIGS IFAC LDY1 TRIGS IFAC LDX2 Y LDY1 LDY2 LDX2 Y LDY1 LDY2 LDX2 Y LDY1 LDY2

Sun Performance Library FFT routines use the following arguments:

- OPT: Flag indicating whether the routine is called to initialize or to compute the transform.
- N1, N2, N3: Problem dimensions for one-, two-, and three-dimensional transforms.
- X: Input array, where X is of type COMPLEX if the routine is a complex-to-complex transform or a complex-to-real transform. X is of type REAL for a real-to-complex transform.
- Y: Output array, where Y is of type COMPLEX if the routine is a complex-to-complex transform or a real-to-complex transform. Y is of type REAL for a complex-to-real transform.
- LDX1, LDX2 and LDY1, LDY2: LDX1 and LDX2 are the leading dimensions of the input array, and LDY1 and LDY2 are the leading dimensions of the output array. The FFT routines allow the output to overwrite the input, which is an in-place transform, or to be stored in a separate array apart from the input array, which is an out-of-place transform. In complex-to-complex transforms, the input data is of the same size as the output data. However real-to-complex and complex-to-real transfor
66. S AE E 65 ON 2 1 4087 197 OSG 1 0 3 1 2 2 0 SHAPE LDR1 LDR2 N3 WRITE X DO K 1 3 DO I gt NL WRITE 5 F5 1 2X X I d K J 1 N2 END DO WRITE END DO Initialize trig table and get factors of N1 N2 and N3 CALL SFFTC3 0 N1 N2 N3 ONE X LDX1 LDX2 Y LDY1 LDY2 TRIGS IFAC SW 0 IERR Compute 3 dimensional out of place forward FFT Let FFT routine allocate memory cannot do an in place transform because LDX1 lt 2 N1 2 1 CALL SFFTC3 1 N1 N2 N3 ONE X LDX1 LDX2 Y LDY1 LDY2 TRIGS IFAC SW 0 IERR WRIT out of place forward FFT of X WRIT DO i ae 3 1 N1 2 1 BAe CO AL F521 Al F5 1 A1 2x REAL Y I d K AIMAG Y I J K J 1 N2 92 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 5 4 Three Dimensional Real to Complex FFT and Complex to Real FFT of a Three Dimensional Array Continued FF F F F S Compute 3 dimensional in place forward FFT Use workspace already allocated V which is real array containing input data is also used to store complex results as a complex array its first leading dimension is LDR1 2 CAL L SFFTC3 1 N1 N2 N3 ONE V LDR1 LDR2 V LDR1 2 LDR2 TRIGS IFAC SW LW IERR WRITE CAL in place forward FFT of X L PRINT_REAL_AS COMPLEX N1 2 1 N2
67. SFFTCM expects EX the leading dimension of X as an output array must be as if X were complex Since the leading dimension of real array X is LDX 2 x LDC the leading dimension of X as a complex output array must be LDC Chapter 5 Using Su n Performance Library Signal Processing Routines 81 Similarly in the in place inverse transform CFFTSM is called with complex array Z as the input and output Because CFFTSM expects the output array to be of type RI EAL the leading dimension of Z as an output array must be as if Z were real Since the leading dimension of complex array Z is LDz the leading dimension of Z as a real output array must be LDZ x 2 CODE EXAMPLE 5 2 shows how to compute the linear complex to complex FFT of a set of sequences CODE EXAMPLE 5 2 my_system cat testccm f PROGRAM TESTCC IMPLICIT NONE INTEGER DXI LDY1 Linear Complex to Complex FFT LW IERR USI INTEGER PARA PARAMETER COMPLEX Zz 0 5 Y 0 1 REAL DIMENSION get number of threads NCPUS workspace size G_THRI ETER USING_THR O REAL TRIGS 2 N1 r EADS END DO WRITE CALL CFFTCM 0 N1 N2 sw LW II IF IERR NE 0 THEN EADS N2 N1 Ni 3 LDY1 Ta E 1 0 LDZ1 1 0 N2 1 LDY1 1 0 N2 1 ALLOCATABLI C4
68. SGEBRD or DGEBRD.
Generates the orthogonal transformation matrix reduced to Hessenberg form as determined by SGEHRD or DGEHRD.
Generates an orthogonal matrix Q from an LQ factorization as returned by SGELQF or DGELQF.
Generates an orthogonal matrix Q from a QL factorization as returned by SGEQLF or DGEQLF.
Generates an orthogonal matrix Q from a QR factorization as returned by SGEQRF or DGEQRF.
Generates orthogonal matrix Q from an RQ factorization as returned by SGERQF or DGERQF.
Generates an orthogonal matrix reduced to tridiagonal form by SSYTRD or DSYTRD.
Multiplies a general matrix with the orthogonal matrix reduced to bidiagonal form as determined by SGEBRD or DGEBRD.
Multiplies a general matrix by the orthogonal matrix reduced to Hessenberg form by SGEHRD or DGEHRD.
Multiplies a general matrix by the orthogonal matrix from an LQ factorization as returned by SGELQF or DGELQF.
Multiplies a general matrix by the orthogonal matrix from a QL factorization as returned by SGEQLF or DGEQLF.
Multiplies a general matrix by the orthogonal matrix from a QR factorization as returned by SGEQRF or DGEQRF.
Multiplies a general matrix by the orthogonal matrix returned by STZRZF or DTZRZF.
Multiplies a general matrix by the orthogonal matrix from an RQ factorization returned by SGERQF or DGERQF.
Multiplies a general matrix by the orthogonal matrix from an RZ factorization as returned by STZRZF or DTZRZF
69. Sun Microsystems Inc 4150 Network Circle Santa Clara CA 95054 U S A 650 960 1300 Part No 817 0935 10 May 2003 Revision A Send comments about this document to docfeedback sun com sS amp Sun microsystems Sun Performance Library User s Guide Sun ONE Studio 8 Copyright 2003 Sun Microsystems Inc 4150 Network Circle Santa Clara California 95054 U S A All rights reserved U S Government Rights Commercial software Government users are subject to the Sun Microsystems Inc standard license agreement and applicable provisions of the FAR and its supplements This distribution may include materials developed by third parties Third party software including font technology is copyrighted and licensed from Sun suppliers Portions of this product are derived in part from Cray90 a product of Cray Research Inc libdwarf and libredblack are Copyright 2000 Silicon Graphics Inc and are available under the GNU Lesser General Public License from http www sgi com Parts of the piven may be derived from Berkeley BSD systems licensed from the University of California UNIX is a registered trademark in the U S and in other countries exclusively licensed through X Open Company Ltd Sun Sun Microsystems the Sun logo Java Sun ONE Studio the Solaris logo and the Sun ONE logo are trademarks or registered trademarks of Sun Microsystems Inc in the U S and other countries Netscape and Netscape Navigator are tr
70. VBR block compressed sparse row

ZZ:
MM    matrix-matrix product
SM    solution of triangular system (supported for all formats except COO)
RP    right permutation (for JAD format only)

Sparse Solver Routines

The Sun Performance Library sparse solver package contains the routines listed in TABLE 4-3.

TABLE 4-3 Sparse Solver Routines

Routine            Function
DGSSFS             One call interface to sparse solver
DGSSIN             Sparse solver initialization
DGSSOR             Fill reducing ordering and symbolic factorization
DGSSFA             Matrix value input and numeric factorization
DGSSSL             Triangular solve

Utility Routine    Function
DGSSUO             Sets user specified ordering permutation
DGSSRP             Returns permutation used by solver
DGSSCO             Returns condition number estimate of coefficient matrix
DGSSDA             De-allocates sparse solver
DGSSPS             Prints solver statistics

Use the regular interface to solve multiple matrices with the same structure but different numerical values, as shown below:

    call dgssin            ! initialization, input coefficient matrix structure
    call dgssor            ! fill-reducing ordering, symbolic factorization
    do m = 1, number_of_structurally_identical_matrices
       call dgssfa         ! input coefficient matrix values, numeric factorization
       do r = 1, number_of_right_hand_sides
          call dgsssl      ! triangular solve
       enddo
    enddo

The one call interface is not as flexible as the regular interface bu
71. VCOR to perform FFT convolution of two complex vectors CODE EXAMPLE 5 10 One Dimensional Convolution Using Fourier Transform Method and COMPLEX Data my_system cat con_ex20 f PROGRAM TEST G INTEGER LWORK INTEGER PARAMETER N 3 PARAMETER LWORK 4 N 15 COMPLEX P1 N P2 N P3 2 N 1 DATA Pl 1 2 3 P2 4 5 6 C EXTERNAL CCNVCOR C PRINT P1 PRINT 1000 P1 PRINT P2 PRINT 1000 P2 WORK LWORK Chapter 5 Using Sun Performance Library Signal Processing Routines 115 116 CODE EXAMPLE 5 10 One Dimensional Convolution Using Fourier Transform Method and COMPLEX Data Continued CALL CCNVCOR V T N Pl 1 1 N 0 1 P2 1 1 1 2 N 1 1 P3 1 1 1 WORK LWORK PRINT P3 PRINT 1000 P3 C 1000 FORMAT 1X 100 F4 1 F4 1 i END my_system 95 dalign con_ex20 f xlic_lib sunperf my_system a out PAs TsO EOST 2760 ee ORO 3 00 04 02 P2 4 0 0 02 5 0 0 01 6 0 0 01 P33 4 0 0 01 13 0 0 01 28 0 0 01 27 0 0 01 18 0 0 01 If any vector overlaps a writable vector either because of argument aliasing or ill chosen values of the various INC arguments the results are undefined and can vary from one run to the next The most common form of the computation and the case that executes fastest is applying a filter vec
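The P3 values printed by the example above can be cross-checked against a direct, non-FFT convolution. The following plain C sketch does that for real data; it is only an illustration, the function name convolve_direct is hypothetical, and it uses double values rather than the COMPLEX type of the Fortran example, which is sufficient here because the example's imaginary parts are all zero.

```c
#include <assert.h>

/* Full linear convolution of x (nx entries) with y (ny entries).
   z must have room for nx + ny - 1 entries. */
void convolve_direct(int nx, const double *x, int ny, const double *y,
                     double *z)
{
    assert(nx > 0 && ny > 0);
    for (int k = 0; k < nx + ny - 1; k++)
        z[k] = 0.0;
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++)
            z[i + j] += x[i] * y[j];   /* term x(i)*y(j) lands in z(i+j) */
}
```

With x = (1, 2, 3) and y = (4, 5, 6) the result is (4, 13, 28, 27, 18), the same real parts that the FFT-based call reports for P3. The direct form costs on the order of nx*ny multiplications, which is why the Fourier transform method is the usual choice for long vectors.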
72. WB = IROW - ICOL + NDIAG
          AB(IROWB,ICOL) = AG(IROW,ICOL)
       END DO
    END DO

Note that this method of storing banded matrices is compatible with the storage method used by LAPACK, BLAS, and LINPACK, but is inconsistent with the method used by EISPACK.

Packed Storage

A packed vector is an alternate representation for a triangular, symmetric, or Hermitian matrix. An array is packed into a vector by storing the elements sequentially, column by column, into the vector. Space for the diagonal elements is always reserved, even if the values of the diagonal elements are known, such as in a unit diagonal matrix.

An upper triangular matrix, or a symmetric matrix whose upper triangle is stored in general storage in the array A, can be transferred to packed storage in the array AP as shown below. This code comes from the comment block of the LAPACK routine DTPTRI.

    JC = 1
    DO J = 1, N
       DO I = 1, J
          AP(JC+I-1) = A(I,J)
       END DO
       JC = JC + J
    END DO

Similarly, a lower triangular matrix, or a symmetric matrix whose lower triangle is stored in general storage in the array A, can be transferred to packed storage in the array AP as shown below:

    JC = 1
    DO J = 1, N
       DO I = J, N
          AP(JC+I-J) = A(I,J)
       END DO
       JC = JC + N - J + 1
    END DO

Matrix Types

The general matrix form is the most common matrix, and most operations performed by the Sun Performance Libra
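For C callers, the same packing loops can be written with 0-based indices. The sketch below is an illustration only, with hypothetical function names pack_upper and pack_lower; it assumes the full matrix is held in column-major (Fortran) order with leading dimension lda, so that element (i,j) lives at a[i + j*lda].

```c
#include <assert.h>

/* Copy the upper triangle of the n-by-n column-major matrix a into ap,
   column by column. ap must hold n*(n+1)/2 entries. */
void pack_upper(int n, const double *a, int lda, double *ap)
{
    assert(n > 0 && lda >= n);
    int jc = 0;                          /* where column j starts in ap */
    for (int j = 0; j < n; j++) {
        for (int i = 0; i <= j; i++)
            ap[jc + i] = a[i + j * lda];
        jc += j + 1;                     /* column j holds j+1 entries */
    }
}

/* Copy the lower triangle of the n-by-n column-major matrix a into ap,
   column by column. ap must hold n*(n+1)/2 entries. */
void pack_lower(int n, const double *a, int lda, double *ap)
{
    assert(n > 0 && lda >= n);
    int jc = 0;
    for (int j = 0; j < n; j++) {
        for (int i = j; i < n; i++)
            ap[jc + i - j] = a[i + j * lda];
        jc += n - j;                     /* column j holds n-j entries */
    }
}
```

For a 3 x 3 matrix with entries a11 through a33, pack_upper produces a11, a12, a22, a13, a23, a33 and pack_lower produces a11, a21, a31, a22, a32, a33, which is the ordering the packed-storage routines expect.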
73. actorization of a general rectangular matrix.

xGELS         Computes the least squares solution to an over-determined system of linear equations using a QR or LQ factorization of A.
xGELSD        Computes the least squares solution to an over-determined system of linear equations using a divide and conquer method using a QR or LQ factorization of A.
xGELSS        Computes the minimum norm solution to a linear least squares problem by using the SVD of a general rectangular matrix (simple driver).
xGELSX        Deprecated routine replaced by xGELSY.
xGELSY        Computes the minimum norm solution to a linear least squares problem using a complete orthogonal factorization.
xGEQLF (P)    Computes QL factorization of a general rectangular matrix.
xGEQP3        Computes QR factorization of general rectangular matrix using Level 3 BLAS.
xGEQPF        Deprecated routine replaced by xGEQP3.
xGEQRF (P)    Computes QR factorization of a general rectangular matrix.
xGERFS        Refines solution to a system of linear equations.
xGERQF (P)    Computes RQ factorization of a general rectangular matrix.
xGESDD        Computes SVD of general rectangular matrix using a divide and conquer method.
xGESV         Solves a general system of linear equations (simple driver).

Appendix A Sun Performance Library Routines

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine       Function
xGESVX, xGESVD, xGETRF (P), xGETRI, xGETRS (P)
Solves a general system of linear equations (expert driver). Comp
74. ademarks or registered trademarks of Netscape Communications Corporation in the United States and other countries All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International Inc in the U S and other countries Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems Inc Products covered by and information contained in this service manual are controlled by U S Export Control laws and may be subject to the export or import laws in other countries Nuclear missile chemical biological weapons or nuclear maritime end uses or end users whether direct or indirect are strictly prohibited Export or reexport to countries subject to U S embargo or to entities identified on U S export exclusion lists including but not limited to the denied persons and specially designated nationals lists is strictly prohibited DOCUMENTATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS REPRESENTATIONS AND WARRANTIES INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY FITNESS FOR A PARTICULAR PURPOSE OR NON INFRINGEMENT ARE DISCLAIMED EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID Copyright 2003 Sun Microsystems Inc 4150 Network Circle Santa Clara California 95054 Etats Unis Tous droits reserves Droits du gouvernement americain utlisateurs gouvernmentaux logiciel commercial Les utilisateurs gouvernmentaux sont soumis au co
75. aining such calls, users of MP platforms can get additional performance by parallelizing these loops.

C Interfaces

The Sun Performance Library routines can be called from within a FORTRAN 77, Fortran 95, or C program. However, C programs must still use the FORTRAN 77 calling sequence.

Sun Performance Library contains native C interfaces for each of the routines contained in LAPACK, BLAS, FFTPACK, VFFTPACK, and LINPACK. The Sun Performance Library C interfaces have the following features:

- Function names have C names.
- Function interfaces follow C conventions.
- C functions do not contain redundant or unnecessary arguments for a C function.

The following example compares the standard LAPACK Fortran interface and the Sun Performance Library C interfaces for the DGBCON routine:

    CALL DGBCON (NORM, N, NSUB, NSUPER, DA, LDA, IPIVOT, DANORM, DRCOND, DWORK, IWORK2, INFO)

    void dgbcon(char norm, int n, int nsub, int nsuper, double *da, int lda, int *ipivot, double danorm, double *drcond, int *info)

Note that the names of the arguments are the same and that arguments with the same name have the same base type. Scalar arguments that are used only as input values, such as NORM and N, are passed by value in the C version. Arrays and scalars that will be used to return values are passed by reference.

The Sun Performance Library C interfaces improve on CLAPACK, available on Netlib, which is an f2c translation of the standard libraries. F
76. as idamax would also return a 1 to indicate the first element of a vector. This convention is observed in function return values, permutation vectors, and anywhere else that vector or array indices are used.

Note: Some Sun Performance Library routines use malloc internally, so user codes that make calls to Sun Performance Library and to sbrk might not work correctly.

Sun Performance Library uses global integer registers %g2, %g3, and %g4 in 32-bit mode and %g2 through %g5 in 64-bit mode as scratch registers. User code should not use these registers for temporary storage and then call a Sun Performance Library routine. The data will be overwritten when the Sun Performance Library routine uses these registers.

C Examples

Transforming user-written code sequences into calls to Sun Performance Library routines increases application performance. The following code example, adapted from LAPACK, shows one example:

    int i;
    float a[n], b[n], largest;

    largest = a[0];
    for (i = 0; i < n; i++)
    {
        if (a[i] > largest)
            largest = a[i];
        if (b[i] > largest)
            largest = b[i];
    }

No Sun Performance Library routine exactly replicates the functionality of this code example. However, the code can be accelerated by replacing it with several calls to the Sun Performance Library routine isamax, as shown in the following code example:

    int i, large_index;
    float a[n], b[n], largest;

    larg
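The index convention and the isamax replacement described in this excerpt can be sketched in portable C. This is only an illustration: ref_isamax is a hypothetical stand-in written for this example, not the library routine, and it assumes the 1-based return value that the text describes for both the Fortran and the C interfaces. Like the BLAS routine it imitates, it compares absolute values, so it reproduces the hand-written loop only when the data are non-negative.

```c
/* Hypothetical stand-in for illustration; NOT the Sun Performance Library
 * routine. Returns the 1-based index of the first element with the largest
 * absolute value, or 0 if n < 1 or incx < 1. */
int ref_isamax(int n, const float *x, int incx)
{
    if (n < 1 || incx < 1)
        return 0;
    int best = 1;
    float best_abs = x[0] < 0.0f ? -x[0] : x[0];
    for (int i = 1; i < n; i++) {
        float v = x[i * incx];
        if (v < 0.0f)
            v = -v;
        if (v > best_abs) {
            best_abs = v;
            best = i + 1;              /* convert to 1-based */
        }
    }
    return best;
}

/* Largest element of a and b, found with two index searches instead of an
 * element-by-element loop. Assumes n >= 1 and non-negative data. */
float largest_of_two(int n, const float *a, const float *b)
{
    int ia = ref_isamax(n, a, 1) - 1;  /* subtract 1 for C array indexing */
    float largest = a[ia];
    int ib = ref_isamax(n, b, 1) - 1;
    if (b[ib] > largest)
        largest = b[ib];
    return largest;
}
```

With a = {1, 5, 2} and b = {3, 4, 7}, the stand-in reports index 2 for a and largest_of_two returns 7.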
77. at process triangular arrays. A triangular matrix can be stored using packed storage.

    Triangular Matrix        Triangular Array in Packed Storage

    a11   0    0             a11  a21  a31  a22  a32  a33
    a21  a22   0
    a31  a32  a33

A triangular banded matrix can be stored using banded storage as shown below. Elements shown with the symbol x are never accessed by routines that process banded arrays.

    Triangular Banded Matrix    Triangular Banded Array in Banded Storage

    a11   0    0                a11  a22  a33
    a21  a22   0                a21  a32   x
     0   a32  a33

Symmetric Matrices

A symmetric matrix is similar to a triangular matrix in that the data in either the upper or lower triangle corresponds to the elements of the array. The contents of the other elements in the array are assumed, and those array elements are never accessed by routines that process symmetric or Hermitian arrays.

A symmetric matrix can be stored using packed storage.

    Symmetric Matrix         Symmetric Array in Packed Storage

    a11  a12  a13            a11  a21  a31  a22  a32  a33
    a21  a22  a23
    a31  a32  a33

A symmetric banded matrix can be stored using banded storage as shown below. Elements shown with the symbol x are never accessed by routines that process banded arrays.

    Symmetric Banded Matrix     Symmetric Banded Array in Banded Storage

    a11  a12   0    0            x   a12  a23  a34
    a21  a22  a23   0           a11  a22  a33  a44
     0   a32  a33  a34          a21  a32  a43   x
     0    0   a43  a44

Tridiagonal M
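The packed layout above can be expressed as an index calculation. The following is a minimal sketch, assuming the lower triangle is packed column by column as in the diagram and that C code uses 0-based positions; packed_lower_index and packed_lower_get are hypothetical helpers written for this example, not library routines.

```c
/* Hypothetical helper: 0-based position in the packed array of element
 * (i, j), with i >= j, of an n-by-n lower triangular or symmetric matrix
 * whose lower triangle is packed by columns. i and j are 0-based. */
int packed_lower_index(int n, int i, int j)
{
    /* Column j starts after columns 0..j-1, which hold
     * n + (n-1) + ... + (n-j+1) = j*(2n - j + 1)/2 entries;
     * element (i, j) is then (i - j) entries into column j. */
    return i + j * (2 * n - j - 1) / 2;
}

/* Read element (i, j) of a symmetric matrix from lower packed storage.
 * Elements above the diagonal are fetched from their mirror position. */
double packed_lower_get(int n, const double *ap, int i, int j)
{
    if (i < j) {
        int t = i;
        i = j;
        j = t;
    }
    return ap[packed_lower_index(n, i, j)];
}
```

For n = 3 the positions of a11, a21, a31, a22, a32, a33 come out as 0 through 5, matching the order shown in the diagram.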
78. ation Routines
IBLAS Prefixes and Matrix Types
Vector Reductions
TABLE 6-3    Add or Cancel Vectors
TABLE 6-4    Vector Movements
TABLE 6-5    Matrix-Vector Operations
TABLE 6-6    O(n2) Matrix Operations
TABLE 6-7    O(n3) Matrix Operations
TABLE 6-8    Matrix Movements
TABLE 6-9    Vector Set Operations
TABLE 6-10   Matrix Set Operations
TABLE 6-11   Vector Utilities
TABLE 6-12   Matrix Utilities
TABLE A-1    LAPACK (Linear Algebra Package) Routines
TABLE A-2    BLAS1 (Basic Linear Algebra Subprograms, Level 1) Routines
TABLE A-3    BLAS2 (Basic Linear Algebra Subprograms, Level 2) Routines
TABLE A-4    BLAS3 (Basic Linear Algebra Subprograms, Level 3) Routines
TABLE A-5    Sparse BLAS Routines
TABLE A-6    Sparse Solver Routines
TABLE A-7    FFT Routines
TABLE A-8    Sine and Cosine Transform Routines
TABLE A-9    Convolution and Correlation Routines
TABLE A-10   Convolution and Correlation Routines
TABLE A-11   Interval BLAS Routines
TABLE A-12   Sort Routines

Before You Begin

This book describes how to use the Sun-specific extensions and features included with the Sun Performance Library subroutines that are supported by the Sun Open Net Environment (Sun ONE) Studio Compiler Collection Fortran 95 and C compilers.

Who Should Use Thi
79. atrices

A tridiagonal matrix has elements only on the main diagonal, the first superdiagonal, and the first subdiagonal. It is stored using three 1-dimensional arrays.

    Tridiagonal Matrix          Tridiagonal Array in Tridiagonal Storage

    a11  a12   0    0           subdiagonal:    a21  a32  a43
    a21  a22  a23   0           diagonal:       a11  a22  a33  a44
     0   a32  a33  a34          superdiagonal:  a12  a23  a34
     0    0   a43  a44

Sparse Matrices

The Sun Performance Library sparse solver package is a collection of routines that efficiently factor and solve sparse linear systems of equations. Use the sparse solver package to:

- Solve symmetric, structurally symmetric, and unsymmetric coefficient matrices
- Specify a choice of ordering methods, including user-specified orderings

The sparse solver package contains interfaces for FORTRAN 77. Fortran 95 and C interfaces are not currently provided. To use the sparse solver routines from Fortran 95, use the FORTRAN 77 interfaces. To use the sparse solver routines with C, append an underscore to the routine name (dgssin_, dgssor_, and so on), pass arguments by reference, and use 1-based array indexing.

Sparse Solver Matrix Data Formats

Sparse matrices are usually represented in formats that minimize storage requirements. By taking advantage of the sparsity and not storing zeros, considerable storage space can be saved. The storage format used by the general sparse solver is the compressed sparse column (CSC) format (also called the Harw
80. atrix in packed storage.
Solves a linear system in a symmetric or Hermitian positive definite matrix in packed storage (simple driver).
Solves a linear system in a symmetric or Hermitian positive definite matrix in packed storage (expert driver).
Computes Cholesky factorization of a symmetric or Hermitian positive definite matrix in packed storage.
Computes the inverse of a symmetric or Hermitian positive definite matrix in packed storage using the Cholesky factorization returned by xPPTRF.
Solves a symmetric or Hermitian positive definite system of linear equations where the coefficient matrix is in packed storage, using the Cholesky factorization returned by xPPTRF.

Symmetric or Hermitian Positive Definite Tridiagonal Matrix

xPTCON       Estimates the reciprocal of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix using the Cholesky factorization returned by xPTTRF.
xPTEQR       Computes all eigenvectors and eigenvalues of a real symmetric or Hermitian positive definite system of linear equations.
xPTRFS       Refines solution to a symmetric or Hermitian positive definite tridiagonal system of linear equations.
xPTSV        Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations (simple driver).
xPTSVX       Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations (expert driver).
xPTTRS (P)
xPTTRF       Computes the LDL factorization
81. available from NetLib.

Intervals

Intervals have a dual identity as intervals of numbers and as sets of numbers. The empty interval contains no members and is the same as the empty set in the theory of sets. In computer input and output, the empty interval is denoted [empty].

For more information on intrinsic Fortran 95 compiler support for interval data types, see the Fortran 95 Interval Arithmetic Programming Reference and the interval white papers referenced therein.

IBLAS Routine Names

This section summarizes IBLAS naming conventions derived from the BLAS specification. "Language Bindings" on page 124 contains a list of IBLAS routine names organized into the following groups. For the corresponding detailed Fortran language bindings, see the IBLAS man pages or the IBLAS white paper.

As in the BLAS, mathematical operations and routines are grouped into:

- Vector Operations: tables listed in TABLE 6-2 through TABLE 6-4
- Matrix-Vector Operations: table listed in TABLE 6-5
- Matrix Operations: tables listed in TABLE 6-6 through TABLE 6-8

New interval-specific routines are grouped into:

- Set Operations on Vectors, listed in TABLE 6-9
- Set Operations on Matrices, listed in TABLE 6-10
- Utility Functions of Vectors, listed in TABLE 6-11
- Utility Functions of Matrices, listed in TABLE 6-12

Naming Conventions

Except that the suffix _I or _i is added, IBLAS routines are named the same as the corresponding BLAS rou
82. ble to access the compilers and tools.

To Determine Whether You Need to Set Your PATH Environment Variable

Display the current value of the PATH variable by typing the following at a command prompt:

    % echo $PATH

Review the output to find a string of paths that contain /opt/SUNWspro/bin/. If you find the path, your PATH variable is already set to access the compilers and tools. If you do not find the path, set your PATH environment variable by following the instructions in the next procedure.

To Set Your PATH Environment Variable to Enable Access to the Compilers and Tools

1. If you are using the C shell, edit your home .cshrc file. If you are using the Bourne shell or Korn shell, edit your home .profile file.

2. Add the following to your PATH environment variable:

    /opt/SUNWspro/bin

Accessing the Man Pages

Use the following steps to determine whether you need to change your MANPATH variable to access the man pages.

To Determine Whether You Need to Set Your MANPATH Environment Variable

1. Request the dbx man page by typing the following at a command prompt:

    % man dbx

2. Review the output, if any. If the dbx(1) man page cannot be found, or if the man page displayed is not for the current version of the software installed, follow the instructions in the next procedure for setting your MANPATH environment variable.

To Set Your MANPATH Environment Variable to Enable Access to the Man
83. Matrix Types
    General Matrices
    Triangular Matrices
    Symmetric Matrices
    Tridiagonal Matrices
    Sparse Matrices
    Sparse Solver Matrix Data Formats
    Sun Performance Library Sparse BLAS
    Naming Conventions
    Sparse Solver Routines
    Routine Calling Order
    Sparse Solver Examples
    References

5. Using Sun Performance Library Signal Processing Routines
    Forward and Inverse FFT Routines
    Linear FFT Routines
    Two-Dimensional FFT Routines
    Three-Dimensional FFT Routines
    Comments
    Cosine and Sine Transforms
    Fast Cosine and Sine Transform Routines
    Fast Cosine Transforms
    Fast Sine Transforms
    Discrete Fast Cosine and Sine Transforms and Their Inverse
    Fast Cosine Transform Examples
    Fast Sine Transform Examples
    Convolution and Correlation
    Convolution
    Correlation
    Sun Performance Library Convolution and Correlation Routines
    Arguments for Convolution and Correlation Routines
    Work Array WORK for Convolution and Correlation Routines
    Sample Program: Convolution
    References

6. Interval BLAS Routines
    Introduction
    Intervals
    IBLAS Routine Names
    Naming Conventions
    Fortran Interface
    Binding Format
    Language Bindings
    References

A. Sun Performance Library Routines
    LAPACK Routines
    BLAS Routines
    BLAS2 Routines
    BLAS3 Routines
    Sparse BLAS Routines
84. by CGELQF or ZGELQF.
Generates a unitary matrix Q from a QL factorization, as returned by CGEQLF or ZGEQLF.
Generates a unitary matrix Q from a QR factorization, as returned by CGEQRF or ZGEQRF.
Generates a unitary matrix Q from an RQ factorization, as returned by CGERQF or ZGERQF.
Generates a unitary matrix reduced to tridiagonal form by CHETRD or ZHETRD.

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine                     Function
CUNMBR or ZUNMBR            Multiplies a general matrix with the unitary transformation matrix reduced to bidiagonal form, as determined by CGEBRD or ZGEBRD.
CUNMHR or ZUNMHR            Multiplies a general matrix by the unitary matrix reduced to Hessenberg form by CGEHRD or ZGEHRD.
CUNMLQ (P) or ZUNMLQ (P)    Multiplies a general matrix by the unitary matrix from an LQ factorization, as returned by CGELQF or ZGELQF.
CUNMQL (P) or ZUNMQL (P)    Multiplies a general matrix by the unitary matrix from a QL factorization, as returned by CGEQLF or ZGEQLF.
CUNMQR (P) or ZUNMQR (P)    Multiplies a general matrix by the unitary matrix from a QR factorization, as returned by CGEQRF or ZGEQRF.
CUNMRQ (P) or ZUNMRQ (P)    Multiplies a general matrix by the unitary matrix from an RQ factorization, as returned by CGERQF or ZGERQF.
CUNMTR or ZUNMTR
CUNMRZ or ZUNMRZ            Multiplies a general matrix by the unitary matrix from an RZ facto
85. ces is computed as Fori 0 M 1 N 1 at ee Tlie nn 2k 1 X i k fons sa noH k 0 N 1 n 1 V D COSQF Notes a The input and output sequences are stored row wise m The transform is normalized so that if the inverse routine V D COSQB is called immediately after calling V D COSQF the original data is obtained V D COSQB Inverse FCT of One or More Quarter Wave Even Sequences The inverse FCT of one or more quarter wave even sequences is computed as Fori 0 M 1 N 1 gt tn 2k 1 _ x i n a Koos ZED n 0 N 1 V D COSQB Notes a The input and output sequences are stored row wise m The transform is normalized so that if V D COSQB is called immediately after calling V D COSOQF the original data is obtained D SINT Forward and Inverse Fast Sine Transform FST of a Sequence The forward and inverse FST of a sequence is computed as N 2 XW 2 rsin TRED 0 N 2 n 0 D SINT Notes a N 1 values are needed to compute the FST of an N point sequence Chapter 5 Using Sun Performance Library Signal Processing Routines 101 102 m D SINT also computes the inverse transform When D SINT is called twice the result will be the original sequence scaled by 5 V D SINT Forward and Inverse Fast Sine Transforms of Multiple Sequences VFST The forward and inverse fast sine transforms of multiple sequences are computed as Fori 0 M 1 N 2 X i k T xli nsin ZC HDED k 0
86. characteristic of convolution is that the product of two polynomials is actually a convolution. A product of an m-term polynomial

    a(x) = a_0 + a_1 x + \dots + a_{m-1} x^{m-1}

and an n-term polynomial

    b(x) = b_0 + b_1 x + \dots + b_{n-1} x^{n-1}

has m+n-1 coefficients that can be obtained by

    c_k = \sum_{j=\max(k-(n-1),\,0)}^{\min(k,\,m-1)} a_j b_{k-j}

where k = 0, ..., m+n-2. The limits keep both subscripts in range: j runs over the valid indices of a, and k-j over the valid indices of b.

Correlation

Closely related to convolution is the correlation operation. It computes the correlation of two sequences directly superposed or when one is shifted relative to the other. As with convolution, we can compute the correlation of two sequences efficiently as follows using the FFT:

- Compute the FFT of the two input sequences.
- Compute the pointwise product of the resulting transform of one sequence and the complex conjugate of the transform of the other sequence.
- Compute the inverse FFT of the product.

The routines in the Performance Library also allow correlation to be computed by the following definition:

    Corr(x, y)_j = \sum_{k=0}^{N-1} x_{j+k} y_k,    j = 0, ..., N-1

There are various ways to interpret the sampled input data of the convolution and correlation operations. The argument list of the convolution and correlation routines contains parameters to handle cases in which:

- The signal and/or response function can start at different sampling times.
- The user might want only part of the signal to contrib
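The polynomial-product formula in this excerpt can be checked with a direct (non-FFT) evaluation. The sketch below is illustrative only: poly_multiply is a hypothetical helper written for this example and is not one of the library's convolution routines, which the text says use more efficient methods.

```c
/* Hypothetical helper: multiply an m-term polynomial a by an n-term
 * polynomial b using the direct convolution sum. The caller supplies c
 * with room for m + n - 1 coefficients. */
void poly_multiply(int m, const double *a, int n, const double *b, double *c)
{
    for (int k = 0; k <= m + n - 2; k++) {
        /* j must index a (0..m-1) and k - j must index b (0..n-1). */
        int lo = (k - (n - 1) > 0) ? k - (n - 1) : 0;
        int hi = (k < m - 1) ? k : m - 1;
        double sum = 0.0;
        for (int j = lo; j <= hi; j++)
            sum += a[j] * b[k - j];
        c[k] = sum;
    }
}
```

For example, (1 + 2x + 3x^2)(4 + 5x + 6x^2) yields the five coefficients 4, 13, 28, 27, 18.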
87. ciently when N1, N2, N3 = 2^p x 3^q x 4^r x 5^s x 7^t x 11^u x 13^v, where p, q, r, s, t, u, and v are integers and p, q, r, s, t, u, v >= 0.

The function xFFTOPT can be used to determine the optimal sequence length, as shown in CODE EXAMPLE 5-5.

CODE EXAMPLE 5-5 RFFTOPT Example

    my_system% cat fft_ex01.f
          PROGRAM TEST
          INTEGER N, N1, N2, N3, RFFTOPT
    C
          N = 1024
          N1 = 1019
          N2 = 71
          N3 = 49
    C
          PRINT *, 'N Original N Suggested'
          PRINT '(I5, I12)', N, RFFTOPT(N)
          PRINT '(I5, I12)', N1, RFFTOPT(N1)
          PRINT '(I5, I12)', N2, RFFTOPT(N2)
          PRINT '(I5, I12)', N3, RFFTOPT(N3)
          END
    my_system% f95 -dalign fft_ex01.f -xlic_lib=sunperf
    my_system% a.out
     N Original N Suggested
     1024        1024
     1019        1024
       71          72
       49          49

Cosine and Sine Transforms

Inputs to the DFT that possess special symmetries occur in various applications. A transform that exploits symmetry usually saves in storage and computational count, such as with the real-to-complex and complex-to-real FFT transforms. The Sun Performance Library cosine and sine transforms are special cases of FFT routines that take advantage of the symmetry properties found in even and odd functions.

Note: Sun Performance Library sine and cosine transform routines are based on the routines contained in FFTPACK
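The sequence-length rule stated at the start of this excerpt (lengths whose only factors are 2, 3, 4, 5, 7, 11, and 13) can be illustrated with a short search. This is a sketch under that assumption only: next_efficient_length is a hypothetical helper and is not claimed to be how xFFTOPT chooses its result, although it happens to agree with the four values printed in CODE EXAMPLE 5-5.

```c
/* Hypothetical helpers for illustration; these are not the xFFTOPT routines.
 * The factor 4 in the text is covered by repeated division by 2. */
static int has_only_small_factors(int n)
{
    static const int primes[] = {2, 3, 5, 7, 11, 13};
    if (n < 1)
        return 0;
    for (int i = 0; i < 6; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;                 /* nothing left but the allowed factors */
}

/* Smallest length >= n that factors entirely into 2, 3, 5, 7, 11, 13. */
int next_efficient_length(int n)
{
    if (n < 1)
        n = 1;
    while (!has_only_small_factors(n))
        n++;
    return n;
}
```

A caller could pad a sequence of length 1019 with zeros up to next_efficient_length(1019), which is 1024, before transforming it.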
88. ctor Y
       4.   5.   6.
     Output vector Z
       4.  13.  28.  27.  18.

Making the output vector longer than the input vectors, as in the example above, implicitly adds zeros to the end of the input. No zeros are actually required in any of the vectors, and none are used in the example, but the padding provided by the implied zeros has the effect of an end-off shift rather than an end-around shift of the input vectors.

CODE EXAMPLE 5-12 will compute the product between the vector [1, 2, 3] and the circulant matrix defined by the initial column vector [4, 5, 6].

CODE EXAMPLE 5-12 Convolution Used to Compute the Product of a Vector and Circulant Matrix

    my_system% cat con_ex22.f
          PROGRAM TEST
    C
          INTEGER LWORK, NX, NY, NZ
          PARAMETER (NX = 3)
          PARAMETER (NY = NX)
          PARAMETER (NZ = NY)
          PARAMETER (LWORK = 4 * NZ + 32)
          REAL X(NX), Y(NY), Z(NZ), WORK(LWORK)
    C
          DATA X / 1, 2, 3 /, Y / 4, 5, 6 /, WORK / LWORK * 0 /
    C
          PRINT 1000, 'X'
          PRINT 1010, X
          PRINT 1000, 'Y'
          PRINT 1010, Y
          CALL SCNVCOR ('V', 'T', NX, X, 1, 1,
         $              NY, 0, 1, Y, 1, 1, 1,
         $              NZ, 1, Z, 1, 1, 1,
         $              WORK, LWORK)
          PRINT 1020, 'Z'
          PRINT 1010, Z
    C
     1000 FORMAT (1X, 'Input vector ', A1)
     1010 FORMAT (1X, 300F5.0)
     1020 FORMAT (1X, 'Output vector ', A1)
          END
    my_system% f95 -dalign con_ex22.f -xlic_lib=sunperf
    my_system% a.out
89. d

TABLE A-5 Sparse BLAS Routines

Routine       Function
xAXPYI        Adds a scalar multiple of a sparse vector X to a full vector Y
xBCOMM (P)    Block coordinate matrix-matrix multiply
xBDIMM (P)    Block diagonal format matrix-matrix multiply
xBDISM (P)    Block diagonal format triangular solve
xBELMM (P)    Block Ellpack format matrix-matrix multiply
xBELSM (P)    Block Ellpack format triangular solve
xBSCMM (P)    Block compressed sparse column format matrix-matrix multiply
xBSCSM (P)    Block compressed sparse column format triangular solve
xBSRMM (P)    Block compressed sparse row format matrix-matrix multiply
xBSRSM (P)    Block compressed sparse row format triangular solve
xCOOMM (P)    Coordinate format matrix-matrix multiply
xCSCMM (P)    Compressed sparse column format matrix-matrix multiply
xCSCSM (P)    Compressed sparse column format triangular solve
xCSRMM (P)    Compressed sparse row format matrix-matrix multiply
xCSRSM (P)    Compressed sparse row format triangular solve
xDIAMM (P)    Diagonal format matrix-matrix multiply
xDIASM (P)    Diagonal format triangular solve

SDOTI DDOTI CDOTUI or ZDOTUI CDOTCI or ZDOTCI XELL XELLS XCGTH XCGTH xJAD SJADR xJADS SROTI XCSCT XSKYM XSKYS XVBRM R P R XVBRS P P Z P or DJADRP P or
90. ded with those libraries.

For example, Netlib provides a CLAPACK library, but the CLAPACK interfaces differ from the C interfaces included with Sun Performance Library. A LAPACK 90 library package is also available on Netlib. The LAPACK 90 library contains interfaces that differ from the Sun Performance Library Fortran 95 interfaces and the Netlib LAPACK version 3.0 interfaces. If using LAPACK 90, refer to the documentation provided with that library.

For the base libraries supported by Sun Performance Library, Netlib provides detailed information that can supplement this user's guide. The LAPACK 3.0 Users' Guide describes LAPACK algorithms and how to use the routines, but it does not describe the Sun Performance Library extensions made to the base routines.

Sun Performance Library Features

Sun Performance Library routines can increase application performance on both serial and MP platforms, because the serial speed of many Sun Performance Library routines has been increased, and many routines have been parallelized that might be serial in other products. Sun Performance Library routines also have SPARC-specific optimizations that are not present in the base Netlib libraries.

Sun Performance Library provides the following optimizations and extensions to the base Netlib libraries:

- Extensions that support Fortran 95 and C language interfaces
- Fortran 95 language features, including
91. e Scale a vector Swap two vectors Compute scaled product of complex vectors Appendix A Sun Performance Library Routines 145 BLAS2 Routines TABLE A 3 lists the Sun Performance Library BLAS2 routines P denotes routines that are parallelized TABLE A 3 BLAS2 Basic Linear Algebra Subprograms Level 2 Routines Routine Function xGBMV Product of a matrix in banded storage and a vector xGEMV P Product of a general matrix and a vector SGER P DGER P Rank 1 update to a general matrix CGE CGE Q T El Q T al SSB SSP SSP SSP SSY SSY SSY RC P ZGERC P RU P ZGERU P V ZHBMV v P ZHEMV P R P ZHER P R2 ZHER2 v P ZHPMV P R ZHPR R2 ZHPR2 V DSBMV v P DSPMV P R DSPR R2 P DSPR2 P v P DSYMV P R P DSYR P R2 P DSYR2 P xTBMV xTBSV xTPMV XTPSV XTRMV P XTRSV P Product of a Hermitian matrix in banded storage and a vector Product of a Hermitian matrix and a vector Rank 1 update to a Hermitian matrix Rank 2 update to a Hermitian matrix Product of a Hermitian matrix in packed storage and a vector Rank 1 update to a Hermitian matrix in packed storage Rank 2 update to a Hermitian matrix in packed storage Product of a symmetric matrix in banded storage and a vector Product of a Symmetric matrix in packed storage and a vector Rank 1 update to a real symmetric matrix in packed storage Rank 2 update to a real
92. e dimensional array CODE EXAMPLE 5 4 Three Dimensional Real to Complex FFT and Complex to Real FFT of a Three Dimensional Array my_system cat testsc3 f PROGRAM TESTSC3 IMPLICIT NONE INTEGER LW NCPUS IERR I J K USING_THREADS IFAC 128 3 INTEGER PARAMETER N1 3 N2 4 N3 2 LDX1 N1 LDX2 N2 LDY1 N1 2 1 LDY2 N2 LDR1 2 N1 2 1 LDR2 N2 REAL PARAMETER ONE 1 0 SCALE ONE N1 N2 N3 Chapter 5 Using Sun Performance Library Signal Processing Routines 91 CODE EXAMPLE 5 4 Three Dimensional Real to Complex FFT and Complex to Real FFT of a Three Dimensional Array Continued NCP REAL V LDR1 LDR2 N3 X LDX1 LDX2 N3 Z LDR1 LDR2 N3 TRIGS 2 N1 N2 N3 REAL DIMENSION ALLOCATABLE SW COMPLEX Y LDY1 LDY2 N3 WRITE S Three dimensional complex to real and real to complex FFT WRITE get number of threads US USING_THREADS compute workspace size required LW MAX MAX N1 2 N2 2 N3 16 N3 NCPUS ALLOCATE SW LW X RESHAPE SOURCE Che eile aen AR aD Op la SOR aerd ay a Weel Vg By 2AS S E E oA E S E Op oO Sly RS S SHAPE LDX1 LDX2 N3 V RESHAPE SOURCE Velra2rasy lnr Sap pe rO ae pe Bs Oy Ee Oyla L2 Oey Be O E EE O EES ey 6 EE
93. e_index = isamax(n, a, 1) - 1;
    largest = a[large_index];
    large_index = isamax(n, b, 1) - 1;
    if (b[large_index] > largest)
        largest = b[large_index];

Compare the differences between calling the native C isamax routine in Sun Performance Library, shown in the previous code example, with calling the isamax routine in CLAPACK, shown in the following code example:

    /* 1. Declare scratch variable to allow 1 to be passed by value */
    int one = 1;
    /* 2. Append underscore to conform to FORTRAN naming system */
    /* 3. Pass all arguments, even scalar input-only, by reference */
    /* 4. Subtract one to convert from FORTRAN indexing conventions */
    large_index = isamax_(&n, a, &one) - 1;
    largest = a[large_index];
    large_index = isamax_(&n, b, &one) - 1;
    if (b[large_index] > largest)
        largest = b[large_index];

CHAPTER 3

SPARC Optimization and Parallel Processing

This chapter describes how to use compiler and linking options to optimize applications for:

- Specific SPARC instruction set architectures
- 64-bit enabled Solaris operating environment
- Parallel processing

TABLE 3-1 shows a comparison of the 32-bit and 64-bit operating environments. These items are described in greater detail in the following sections.

TABLE 3-1 Comparison of 32-bit and 64-bit Operating Environments

32-bit (ILP 32)    64-bit
94. ed. Orlando, FL: Harcourt Brace & Company, 1988.

Van Loan, Charles. Computational Frameworks for the Fast Fourier Transform. Philadelphia, PA: SIAM, 1992.

Walker, James S. Fast Fourier Transforms. Boca Raton, FL: CRC Press, 1991.

CHAPTER 6

Interval BLAS Routines

Introduction

This chapter provides a brief overview of an interval Fortran 95 version of the basic linear algebra subroutine (BLAS) library. The interval BLAS version is referred to as the IBLAS library.

For a more complete description of the IBLAS library routines, see the white paper "Interval Version of the Basic Linear Algebra Subprograms (IBLAS)."

For information on the Fortran 95 interfaces and types of arguments used in each IBLAS routine, see the section 3P man pages for the individual routines. For example, to display the man page for the SFFTC routine, type man -s 3P sfftc. Routine names must be lowercase.

For more information on the non-interval version of the BLAS library, see the document Basic Linear Algebra Subprogram Technical (BLAST) Forum Standard, available at http://www.netlib.org/blas/blast-forum/.

Note: For the compiler collection Fortran 95 IBLAS routines, information contained in the "Interval Version of the Basic Linear Algebra Subprograms (IBLAS)" white paper supersedes interval information contained in the Basic Linear Algebra Subprogram Technical (BLAST) Forum Standard document that is
95. ell-Boeing format). The CSC format represents a sparse matrix with two integer arrays and one floating point array. The integer arrays (colptr and rowind) specify the location of the nonzeros of the sparse matrix, and the floating point array (values) is used for the nonzero values.

The column pointer (colptr) array consists of n+1 elements, where colptr(i) points to the beginning of the ith column and colptr(i+1)-1 points to the end of the ith column. The row indices (rowind) array contains the row indices of the nonzero values. The values array contains the corresponding nonzero numerical values.

The following matrix data formats exist for a sparse matrix of neqns equations and nnz nonzeros:

- Symmetric
- Structurally symmetric
- Unsymmetric

The most efficient data representation often depends on the specific problem. The following sections show examples of sparse matrix data formats.

Symmetric Sparse Matrices

A symmetric sparse matrix is a matrix where a(i, j) = a(j, i) for all i and j. Because of this symmetry, only the lower triangular values need to be passed to the solver routines. The upper triangle can be determined from the lower triangle.

An example of a symmetric matrix is shown below. This example is derived from A. George and J. W-H. Liu, "Computer Solution of Large Sparse Positive Definite Systems."

    A =  4.0  1.0  2.0  0.5  2.0
         1.0  0.5  0.0  0.0  0.0
         2.0  0.0  3.0  0.0  0.0
         0.5  0.0  0.0  0.6
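The colptr, rowind, and values arrays described in this excerpt can be built from a small dense matrix as follows. This is a minimal sketch, not library code: dense_lower_to_csc is a hypothetical helper, the dense input is assumed to be row-major, and the indices are written 1-based because the text asks C callers of the sparse solver to use 1-based indexing.

```c
#include <stddef.h>

/* Hypothetical helper (not a Sun Performance Library routine): store the
 * lower triangle of a dense, row-major, n-by-n symmetric matrix in CSC form
 * with 1-based indices. colptr needs n + 1 entries; rowind and values need
 * room for every nonzero in the lower triangle. Returns nnz. */
int dense_lower_to_csc(int n, const double *dense,
                       int *colptr, int *rowind, double *values)
{
    int nnz = 0;
    for (int j = 0; j < n; j++) {
        colptr[j] = nnz + 1;               /* 1-based start of column j+1 */
        for (int i = j; i < n; i++) {      /* lower triangle only: i >= j */
            double v = dense[(size_t)i * n + j];
            if (v != 0.0) {
                rowind[nnz] = i + 1;       /* 1-based row index */
                values[nnz] = v;
                nnz++;
            }
        }
    }
    colptr[n] = nnz + 1;                   /* colptr(n+1)-1 ends column n */
    return nnz;
}
```

For the 3-by-3 matrix with rows (4 1 0), (1 5 2), (0 2 6), the helper produces colptr = 1 3 5 6, rowind = 1 2 2 3 3, and values = 4 1 5 2 6, so column i occupies positions colptr(i) through colptr(i+1)-1.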
98. er does. The per-thread stack size must be set to at least 4 Mbytes with the STACKSIZE environment variable, as follows:

    setenv STACKSIZE 4000

Setting the STACKSIZE environment variable is not required for programs running with POSIX or Solaris threads. In this case, user-created threads that call Sun Performance Library routines must have a stack size of at least 4 Mbytes. Failure to supply an adequate stack size for the Sun Performance Library routines might result in stack overflow problems. Symptoms of stack overflow problems include runtime failures that could be difficult to diagnose. For more information on setting the stack size of user-created threads, see the pthread_create(3THR), pthread_attr_init(3THR), and pthread_attr_setstacksize(3THR) man pages for POSIX threads, or the thr_create(3THR) man page for Solaris threads.

Degree of Parallelism

Sun Performance Library will attempt to parallelize each Sun Performance Library call according to the user's parallelization model by using either explicit threads or loop-based compiler multithreading.

The number of threads Sun Performance Library routines will attempt to use is set at run time by the user with the PARALLEL environment variable. The PARALLEL environment variable can be overridden by calls to the Sun Performance Library USE_THREADS routine. For example, if user programs with POSIX or Solaris thread code
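For user-created POSIX threads, the 4-Mbyte stack requirement stated in this excerpt can be met through the thread attribute calls named in the man page references. The sketch below uses only standard POSIX functions; start_thread_with_large_stack and example_worker are hypothetical names, and the worker is a placeholder for code that would call a Sun Performance Library routine.

```c
#include <pthread.h>
#include <stddef.h>

/* 4 Mbytes, the minimum stack size given in the text. */
#define MIN_STACK_BYTES ((size_t)4 * 1024 * 1024)

/* Placeholder worker: a real program would call library routines here.
 * It only sets the int that arg points to, so the caller can see it ran. */
static void *example_worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Hypothetical helper: start `worker` on a new thread whose stack is at
 * least 4 Mbytes. Returns 0 on success or the pthread error code. */
int start_thread_with_large_stack(pthread_t *tid,
                                  void *(*worker)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, MIN_STACK_BYTES);
    if (rc == 0)
        rc = pthread_create(tid, &attr, worker, arg);
    pthread_attr_destroy(&attr);   /* safe after pthread_create returns */
    return rc;
}
```

The caller joins the thread with pthread_join as usual. On most systems the program must be compiled and linked with the platform's threads option.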
99. er literal constants, append _8 to the constant. Consider the following code example:

          INTEGER*8 N
          REAL*8 ALPHA, X(N), Y(N)

    ! _64 SUFFIX: N AND 1_8 ARE 64-BIT INTEGERS
          CALL DAXPY_64(N, ALPHA, X, 1_8, Y, 1_8)

INTEGER*8 arguments cannot be used in a 32-bit environment. Routines in the 32-bit libraries (v8, v8plusa, v8plusb) cannot be called with 64-bit arguments. However, the 64-bit routines can be called with 32-bit arguments.

When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change CALL DSCAL(20, 5.26D0, X, 1) to CALL DSCAL(20_8, 5.26D0, X, 1_8). This example assumes USE SUNPERF is included in the code, because the _64 has not been appended to the routine name.

The following code example shows calling CAXPY from Fortran 95 using 32-bit arguments:

          PROGRAM TEST
          COMPLEX ALPHA
          INTEGER INCX, INCY, N
          COMPLEX X, Y

          CALL CAXPY(N, ALPHA, X, INCX, Y, INCY)

The following code example shows calling CAXPY from Fortran 95 (without the USE SUNPERF statement) using 64-bit arguments:

          PROGRAM TEST
          COMPLEX ALPHA
          INTEGER*8 INCX, INCY, N
          COMPLEX X, Y

          CALL CAXPY_64(N, ALPHA, X, INCX, Y, INCY)

When using 64-bit arguments, the _64 must be ap
100. erence material for the base routines upon which Sun Performance Library is based.

Related Documents and Web Sites

A number of books and web sites provide reference information on the routines in the base LAPACK and BLAS libraries upon which the Sun Performance Library is based. The LAPACK Users' Guide, 3rd ed. (Anderson, E., and others, SIAM, 1999) augments the material in this manual and provides essential information. The LAPACK Users' Guide, 3rd ed., is the official reference for the base LAPACK version 3.0 routines. An online version of the LAPACK 3.0 Users' Guide is available at http://www.netlib.org/lapack/lug/, and the printed version is available from the Society for Industrial and Applied Mathematics (SIAM), http://www.siam.org.

Sun Performance Library routines contain performance enhancements, extensions, and features not described in the LAPACK Users' Guide. However, because Sun Performance Library maintains compatibility with the base LAPACK routines, the LAPACK Users' Guide can be used as a reference for the LAPACK routines and the Fortran interfaces.

Online Resources

Online information describing the performance library routines that form the basis of the Sun Performance Library can be found at the following URLs:

LAPACK version 3.0: http://www.netlib.org/lapack/
BLAS, levels 1 through 3: http://www.netlib.org/blas/
FFTPACK version 4: http://www.netlib.o
101. ermitian indefinite matrix using the diagonal pivoting method. Computes the inverse of a complex Hermitian indefinite matrix using the factorization computed by CHETRF or ZHETRF. Solves a complex Hermitian indefinite matrix using the factorization computed by CHETRF or ZHETRF.

Hermitian Matrix in Packed Storage

Estimates the reciprocal of the condition number of a Hermitian indefinite matrix in packed storage using the factorization computed by CHPTRF or ZHPTRF. Replacement with newer version CHPEVD or ZHPEVD suggested. Computes all the eigenvalues and eigenvectors of a Hermitian matrix in packed storage (simple driver). Computes selected eigenvalues and eigenvectors of a Hermitian matrix in packed storage (expert driver). Computes all the eigenvalues and eigenvectors of a Hermitian matrix in packed storage and uses a divide and conquer method to calculate eigenvectors.

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine             Function
CHPGST or ZHPGST    Reduces a Hermitian definite generalized eigenproblem to standard form, where the coefficient matrices are in packed storage, and uses the factorization computed by CPPTRF or ZPPTRF.
CHPGV or ZHPGV      Replacement with newer version CHPGVD or ZHPGVD suggested. Computes all the eigenvalues and eigenvectors of a generalized Hermitian definite eigenproblem where the coefficient matrices are in packed
102. es. Computes generalized RQ factorization of two matrices. Computes the generalized singular value decomposition. Computes an orthogonal or unitary matrix as a preprocessing step for calculating the generalized singular value decomposition.

General Tridiagonal Matrix

xGTCON        Estimates the reciprocal of the condition number of a tridiagonal matrix using the LU factorization as computed by xGTTRF.
xGTRFS        Refines solution to a general tridiagonal system of linear equations.

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine       Function
xGTSV         Solves a general tridiagonal system of linear equations (simple driver).
xGTSVX        Solves a general tridiagonal system of linear equations (expert driver).
xGTTRF        Computes an LU factorization of a general tridiagonal matrix using partial pivoting and row exchanges.
xGTTRS (P)    Solves general tridiagonal system of linear equations using the factorization computed by x

Hermitian Band Matrix

CHBEV or ZHBEV      Replacement with newer version CHBEVD or ZHBEVD suggested. Computes all eigenvalues and eigenvectors of a Hermitian band matrix.
CHBEVD or ZHBEVD    Computes all eigenvalues and eigenvectors of a Hermitian band matrix and uses a divide and conquer method to calculate eigenvectors.
CHBEVX or ZHBEVX
CHBGST or ZHBGST
CHBGV or ZHBGV
CHBGVD or ZHBGVD
CHBGVX or ZHBGVX
CHBTRD or ZHBTRD
103. es of a real symmetric tridiagonal matrix. Computes all the eigenvalues and eigenvectors of a symmetric tridiagonal matrix using a divide and conquer method. Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using Relatively Robust Representations. Computes selected eigenvectors of a real symmetric tridiagonal matrix using inverse iteration. Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the implicit QL or QR algorithm. Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using a root-free QL or QR algorithm variant. Replacement with newer version SSTEVR or DSTEVR suggested. Computes all eigenvalues and eigenvectors of a real symmetric tridiagonal matrix (simple driver).

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine             Function
SSTEVX or DSTEVX    Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix (expert driver).
SSTEVD or DSTEVD    Replacement with newer version SSTEVR or DSTEVR suggested. Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using a divide and conquer method.
SSTEVR or DSTEVR    Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using Relatively Robust Representations.
xSTSV               Computes the solution to a system of linear equat
104. example of combining a Sun Performance Library routine with an auto-parallelizing compiler parallelization directive is shown in the following code example:

C$PAR DOALL
      DO I = 1, N
        CALL DGBMV('No transpose', N, N, KL, KU, ALPHA, A, LDA,
     $             B(1,I), 1, BETA, C(1,I), 1)
      END DO

Sun Performance Library contains a routine named DGBMV to multiply a banded matrix by a vector. By putting this routine into a properly constructed loop, Sun Performance Library routines can be used to multiply a banded matrix by a matrix. The compiler will not parallelize this loop by default, because the presence of subroutine calls in a loop inhibits parallelization. However, Sun Performance Library routines are MT-safe, so a user can use parallelization directives that instruct the compiler to parallelize this loop.

Compiler directives can also be used to parallelize a loop with a subroutine call that ordinarily would not be parallelizable. For example, it is ordinarily not possible to parallelize a loop containing a call to some of the linear system solvers, because some vendors have implemented those routines using code that is not MT-safe. Loops containing calls to the expert drivers of the linear system solvers (routines whose names end in SVX) are usually not parallelizable with other implementations of LAPACK. Because the implementation of LAPACK in Sun Performance Library allows parallelization of loops cont
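The loop described above, one banded matrix-vector product per column of B, can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the library's code: band_matvec is a hypothetical, simplified stand-in for DGBMV (no-transpose case, unit strides only) that uses the standard BLAS band storage layout, and band_times_matrix is a hypothetical name for the surrounding loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DGBMV with TRANS = 'N' and unit strides:
 * y = alpha*A*x + beta*y, where the m-by-n band matrix A (kl subdiagonals,
 * ku superdiagonals) is held column-major in BLAS band storage, so that
 * A(i,j) is at ab[(ku + i - j) + j*ldab] for 0-based i, j. */
static void band_matvec(int m, int n, int kl, int ku, double alpha,
                        const double *ab, int ldab,
                        const double *x, double beta, double *y)
{
    assert(ldab >= kl + ku + 1);
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];
    for (int j = 0; j < n; j++) {
        int lo = (j - ku > 0) ? j - ku : 0;
        int hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = lo; i <= hi; i++)
            y[i] += alpha * ab[(ku + i - j) + (size_t)j * ldab] * x[j];
    }
}

/* C = alpha*A*B + beta*C for an n-by-n band matrix A and column-major
 * n-by-ncols matrices B and C. Each column is an independent
 * matrix-vector product, which is why the loop can run in parallel. */
void band_times_matrix(int n, int ncols, int kl, int ku, double alpha,
                       const double *ab, int ldab,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    for (int col = 0; col < ncols; col++)
        band_matvec(n, n, kl, ku, alpha, ab, ldab,
                    b + (size_t)col * ldb, beta, c + (size_t)col * ldc);
}
```

Because no iteration writes to a column of C that another iteration reads, the outer loop carries no dependence between iterations.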
105. f the primary target. Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific SPARC instruction set architecture.

Note: Using SPARC-specific optimization options increases application performance on the selected instruction set architecture but limits code portability. When using these optimization options, the resulting code can be run only on systems using the specific SPARC chip from Sun Microsystems and, in some cases, a specific Solaris operating environment (32-bit or 64-bit Solaris 7, Solaris 8, or Solaris 9).

The SunOS command isalist(1) can be used to display a list of the native instruction sets executable on a particular platform. The names output by isalist are space-separated and are ordered in the sense of best performance. For a detailed description of the different -xarch options, refer to the Fortran User's Guide or the C User's Guide.

Use the following command-line options to compile for 32-bit addressing in a 32-bit enabled Solaris operating environment:

- UltraSPARC I or UltraSPARC II systems: use -xarch=v8plus or -xarch=v8plusa.
- UltraSPARC III systems: use -xarch=v8plus or -xarch=v8plusb.

Use the following command-line options to compile for 64-bit addressing in a 64-bit enabled S
106. are the subject of this service manual, and the information it contains, are governed by United States export control legislation and may be subject to the export and import laws of other countries. End uses or end users involving nuclear weapons, missiles, biological and chemical weapons, or maritime nuclear applications, whether direct or indirect, are strictly prohibited. Exports or reexports to countries under United States embargo, or to entities appearing on United States export exclusion lists, including, but not limited to, the list of persons subject to an order not to participate, directly or indirectly, in exports of products or services governed by United States export control legislation, and the list of specially designated nationals, are strictly prohibited.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL OTHER EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES ARE FORMALLY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW, INCLUDING IN PARTICULAR ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR USE, OR NON-INFRINGEMENT.

Contents

Before You Begin
Who Should Use This Book
How This Book Is Organized
107. form Routines and Their Arguments

Name                Arguments
SINQB, DSINQB       (LEN, X, WORK)
SINQI, DSINQI       (LEN, WORK)
VSINQF, VDSINQF     (M, LEN, X, WORK, LD, TABLE)
VSINQB, VDSINQB     (M, LEN, X, WORK, LD, TABLE)
VSINQI, VDSINQI     (LEN, TABLE)

TABLE 5-5 Notes:

- M: Number of sequences to be transformed.
- LEN LEN 1 LEN 1: Length of the input sequence or sequences.
- X: A real array which contains the sequence or sequences to be transformed. On output, the real transform results are stored in X.
- TABLE: Array of constants particular to a transform size that is required by the transform routine. The constants are computed by the initialization routine.
- WORK: Workspace required by the transform routine. In routines that operate on a single sequence, WORK also contains constants computed by the initialization routine.

Fast Cosine Transforms

A special form of the FFT that operates on real even sequences is the fast cosine transform (FCT). A real sequence x is said to have even symmetry if x(n) = x(-n), where n = -N+1, ..., 0, ..., N. An FCT of a sequence of length 2N requires N+1 input data points and produces a sequence of size N+1. Routine COST computes the FCT of a single real even sequence, while VCOST computes the FCT of one or more sequences. Before calling [V]COST, [V]COSTI must be called to compute trigonometric constants and factors associated with input length N+1. The FCT is its own inverse tran
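The relationship between the N+1 stored points and the full even sequence of length 2N can be illustrated with a small C sketch. It is an illustration only and not part of the library: even_extend is a hypothetical helper, and it assumes the sequence is treated as periodic with period 2N, so that index -n corresponds to index 2N-n.

```c
#include <assert.h>

/* Hypothetical helper: expands the n+1 stored points x[0..n] of a real
 * even sequence into one full period of length 2n, out[0..2n-1].
 * With period 2n, even symmetry x(-k) = x(k) means out[2n-k] = out[k]. */
void even_extend(const double *x, int n, double *out)
{
    assert(n >= 1);
    for (int k = 0; k <= n; k++)
        out[k] = x[k];              /* stored half, including both ends */
    for (int k = n + 1; k < 2 * n; k++)
        out[k] = x[2 * n - k];      /* mirrored half */
}
```

For N = 3, the four stored points 1, 2, 3, 4 expand to the period 1, 2, 3, 4, 3, 2, which shows why only N+1 values need to be supplied to the transform.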
108. he number of processors at runtime with the PARALLEL environment variable before running the executable. For example, to use 24 processors, type the following commands:

    my_system% f95 -dalign -xparallel my_app.f -xlic_lib=sunperf
    my_system% setenv PARALLEL 24
    my_system% a.out

The previous example allows Sun Performance Library routines to run in parallel, but no part of the user code my_app.f will run in parallel. For the compiler to attempt to parallelize my_app.f, either -xparallel or -explicitpar is required on the compile line.

Note: Parallel processing options require using either the -dalign command-line option or establishing a trap 6 handler, as described in "Enabling Trap 6" on page 24. When using C, do not use -misalign.

To use multiple processors:

- Call one or more of the routines.
- Link with -xlic_lib=sunperf specified at the end of the command line.
- Compile and link with -xparallel, -xexplicitpar, or -xautopar.
- Set PARALLEL to the number of available processors.

The following example shows how to compile and link with libsunperf to enable parallel operation on multiple processor systems:

    cc -dalign -xarch=... -xparallel any.c -xlic_lib=sunperf

or

    f95 -dalign -xarch=... -xparallel any.f95 -xlic_lib=sunperf

CHAPTER
109. he one exception of the CANCEL routines, which perform the same operation as the .DSUB. operator in f95, vector and set reductions and operations are the same as in the BLAS. The CANCEL routines and all the vector and matrix set operations and utilities are interval specific. For interval-specific routines, the f95 equivalent scalar routines are also shown in TABLE 6-3 and TABLE 6-9 through TABLE 6-12. For clarity, lowercase and uppercase Fortran variable names are used to distinguish point from interval types. See TABLE A-11 for an alphabetical list of all the IBLAS routines.

TABLE 6-2 Vector Reductions

Name          Function
DOT_I         Dot Product
NORM_I        Vector Norms
SUM_I         Sum
AMIN_VAL_I    Minimum Absolute Value and Location
AMAX_VAL_I    Maximum Absolute Value and Location
SUMSQ_I       Scaled Sum of Squares and Update

TABLE 6-3 Add or Cancel Vectors

Name          Operation                           f95 Equivalent
RSCALE_I      Reciprocally Scale Vector
AXPBY_I       Add Scaled Vectors and Update
WAXPBY_I      Add Scaled Vectors
CANCEL_I      Cancel Scaled Vectors and Update    Y = a*X .DSUB. b*Y
WCANCEL_I     Cancel Scaled Vectors               W = a*X .DSUB. b*y
SUMSQ_I       Scaled Sum of Squares and Update

TABLE 6-4 Vector Movements

Name          Operation
COPY_I        Vector Copy
SWAP_I        Vector Swap
PERMUTE_I     Permute Vector and Update

TABLE 6-5 Matrix Vector Operations

Name
{GE,GB}MV_I
{SY,SB,SP}MV_I
{TR,TB,TP}MV_I
110. ies base Netlib routines can be replaced with Sun Performance Library routines. Application performance is increased because Sun Performance Library routines can be faster than the corresponding Netlib routines or similar routines provided by other vendors.

Improving Performance of Other Libraries

Many commercial math libraries are built around a core of generic BLAS and LAPACK routines. When an application has a dependency on proprietary interfaces in another library that prevents the library from being completely replaced, the BLAS and LAPACK routines used in that library can be replaced with the Sun Performance Library BLAS and LAPACK routines. Because replacing the core routines does not require any code changes, the proprietary library features can still be used, and the other routines in the library can remain unchanged.

Using Tools to Restructure Code

Some libraries that do not directly use Sun Performance Library routines can be modified by using automatic code restructuring tools that replace existing code with Sun Performance Library code. For example, a source-to-source conversion tool can replace existing BLAS code structures with calls to the Sun Performance Library BLAS routines. These conversion tools can also recognize many user-written matrix multiplications and replace them with calls to the matrix multiplication subroutine in Sun Performance Library.

Fortran Interfaces

Sun Performance Library contains f95
111. ifically named routines, such as DGEMM, are maintained to support legacy code. To determine the optional arguments for a routine, refer to the section 3P man pages. In the section 3P man pages, optional arguments are enclosed in square brackets [ ].

- 64-bit Integer Support: When using the 64-bit interfaces provided with Sun Performance Library, integer arguments need to be promoted to 64 bits, and the routine name needs to be modified by appending _64 to the routine name. With the SUNPERF module, 64-bit integers will automatically be recognized, which eliminates the need for appending _64 to the routine name, as shown in the following code example:

      SUBROUTINE SUB(N, ALPHA, X, Y)
      USE SUNPERF
      INTEGER*8 N
      REAL*8 ALPHA, X(N), Y(N)
! EQUIVALENT TO DAXPY_64(N, ALPHA, X, 1_8, Y, 1_8)
      CALL DAXPY(N, ALPHA, X, 1_8, Y, 1_8)
      END

When using Sun Performance Library routines with optional arguments, the _64 suffix is required for 64-bit integers, as shown in the following code example:

      SUBROUTINE SUB(N, ALPHA, X, Y)
      USE SUNPERF
      INTEGER*8 N
      REAL*8 ALPHA, X(N), Y(N)
! EQUIVALENT TO DAXPY_64(N, ALPHA, X, 1_8, Y, 1_8)
      CALL AXPY_64(ALPHA=ALPHA, X=X, Y=Y)

For a detailed description of using the Sun Performance Library 64-bit interfaces, see Compiling Code for a 64-Bit Enabled Solaris
112. CODE EXAMPLE 5-13 Two-Dimensional Convolution Using Direct Method (Continued)

    my_system% f95 -dalign con_ex23.f -xlic_lib=sunperf
    my_system% a.out
    Pl 1 0 0 0 3 0 0 01 5 0 0 0 S20 00 4 0 0 01 6 0 0 0 P2 S10 tt 0 60 3 2 0 0 04 moO st 0 30 2 0 0 0 4 0 0 01 6 0 0 0 P3 8320 J0 02 83 0 0 08 59 20 4 0 0 80 0 0 01 80 0 0 01 56 0 0 01

References

For additional information on the DFT or FFT, see the following sources:

Briggs, William L., and Henson, Van Emden. The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia, PA: SIAM, 1995.
Brigham, E. Oran. The Fast Fourier Transform and Its Applications. Upper Saddle River, NJ: Prentice Hall, 1988.
Chu, Eleanor, and George, Alan. Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. Boca Raton, FL: CRC Press, 2000.
Press, William H., Teukolsky, Saul A., Vetterling, William T., and Flannery, Brian P. Numerical Recipes in C: The Art of Scientific Computing. 2 ed. Cambridge, United Kingdom: Cambridge University Press, 1992.
Ramirez, Robert W. The FFT: Fundamentals and Concepts. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.
Swarztrauber, Paul N. Vectorizing the FFTs. In Rodrigue, Garry, ed. Parallel Computations. New York: Academic Press, Inc., 1982.
Strang, Gilbert. Linear Algebra and Its Applications. 3
113. interfaces and legacy f77 interfaces for maintaining compatibility with the standard LAPACK and BLAS libraries and existing codes. Sun Performance Library f95 and legacy f77 interfaces use the following conventions:

- All arguments are passed by reference.
- Types of arguments must be consistent within a call. For example, do not mix REAL*8 and REAL*4 parameters in the same call.
- Arrays are stored columnwise.
- Indices are based at one, in keeping with standard Fortran practice.

When calling Sun Performance Library routines:

- Do not prototype the subroutines with the Fortran 95 INTERFACE statement. Use the USE SUNPERF statement instead.
- Do not use -ext_names=plain to compile routines that call routines from Sun Performance Library.

Fortran SUNPERF Module for Use With Fortran 95

Sun Performance Library provides a Fortran module for additional ease-of-use features with Fortran 95 programs. To use this module, include the following line in Fortran 95 codes:

      USE SUNPERF

USE statements must precede all other statements in the code, except for the PROGRAM or SUBROUTINE statement. The SUNPERF module contains interfaces that simplify the calling sequences and provides the following features:

- Type Independence: Sun Performance Library supports interfaces where the type of the data arguments will automatically be recogni
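The columnwise storage and one-based indexing conventions listed above determine where a Fortran element A(i,j) sits in memory. The following C sketch shows the arithmetic only; colmajor_index and get_elem are hypothetical helper names introduced for illustration and are not part of Sun Performance Library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 0-based memory offset of the Fortran element
 * A(i,j), with 1-based i and j, in a column-major array whose leading
 * dimension (number of allocated rows) is lda. */
size_t colmajor_index(int i, int j, int lda)
{
    assert(i >= 1 && j >= 1 && lda >= i);
    return (size_t)(i - 1) + (size_t)(j - 1) * (size_t)lda;
}

/* Reads A(i,j) from a column-major array using Fortran-style indices. */
double get_elem(const double *a, int lda, int i, int j)
{
    return a[colmajor_index(i, j, lda)];
}
```

For a 2-by-3 matrix stored as 1, 2, 3, 4, 5, 6 with lda = 2, the first column is (1, 2), so A(1,2) is 3 and A(2,3) is 6. A C caller that fills arrays row by row would hand the routines the transpose of the intended matrix.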
114. ions where the coefficient matrix is a symmetric tridiagonal matrix.
xSTTRF              Computes the factorization of a symmetric tridiagonal matrix.
xSTTRS (P)          Computes the solution to a system of linear equations where the coefficient matrix is a symmetric tridiagonal matrix.

Symmetric Matrix

xSYCON              Estimates the reciprocal of the condition number of a symmetric matrix using the factorization computed by SSYTRF or DSYTRF.
SSYEV or DSYEV      Replacement with newer version SSYEVR or DSYEVR suggested. Computes all eigenvalues and eigenvectors of a symmetric matrix.
SSYEVX or DSYEVX    Computes eigenvalues and eigenvectors of a symmetric matrix (expert driver).
SSYEVD or DSYEVD    Replacement with newer version SSYEVR or DSYEVR suggested. Computes all eigenvalues and eigenvectors of a symmetric matrix and uses a divide and conquer method to calculate eigenvectors.
SSYEVR or DSYEVR    Computes selected eigenvalues and eigenvectors of a symmetric tridiagonal matrix.
SSYGST or DSYGST    Reduces a symmetric definite generalized eigenproblem to standard form using the factorization computed by SPOTRF or DPOTRF.
SSYGV or DSYGV      Replacement with newer version SSYGVD or DSYGVD suggested. Computes all the eigenvalues and eigenvectors of a generalized symmetric definite eigenproblem.
SSYGVX or DSYGVX    Computes selected eigenvalues and eigenvectors of a generalized symme
SSYGVD or DSYGVD
xSYRFS
xSYSV
115. ix movements routines 126 IBLAS matrix set operations routines 127 IBLAS matrix utilities routines 128 IBLAS matrix vector operations routines 125 IBLAS O n2 matrix operations routines 126 IBLAS O n matrix operations routines 126 IBLAS vector movements routines 125 IBLAS vector reductions 124 IBLAS vector set operations routines 127 IBLAS vector utilities routines 127 inverse fast cosine transform routines 99 inverse fast cosine transform routines multiple quarter wave even sequences 101 inverse fast cosine transform routines quarter wave even sequence 100 inverse fast sine transform routines 101 inverse fast sine transform routines multiple quarter wave odd sequences 103 inverse fast sine transform routines multiple sequences 102 inverse fast sine transform routines quarter wave odd sequence 102 LAPACK 130 linear FFT routines 74 77 sparse BLAS 147 sparse solvers 149 VFFTPACK 150 152 S section 3P man pages 73 121 129 shell prompts 12 sine transforms 96 single processor 46 sparse BLAS 147 sparse matrices CSC storage format 55 structurally symmetric 56 symmetric 55 unsymmetric 56 sparse solver 149 sparse solver package 54 one call interface 60 regular interface 60 routine calling order 61 routines 60 using with C 55 STACKSIZE environment variable 42 structurally symmetric sparse matrix 56 SUNW_MP_THR_IDLE 45 symmetric banded matrix 54 symmetric
116.
      ldrhs = 4
      call dgsssl ( nrhs, rhs, ldrhs, handle, ier )
      if ( ier .ne. 0 ) goto 110
c
c  deallocate sparse solver storage
c
      call dgssda ( handle, ier )
      if ( ier .ne. 0 ) goto 110
c
c  print values of sol
c
      write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
      do i = 1, neqns
        write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
      enddo
      stop
  110 continue

CODE EXAMPLE 4-3 Solving a Structurally Symmetric System With Unsymmetric Values - Regular Interface (Continued)

c
c  call to sparse solver returns an error
c
      write ( 6 , 400 )
     &      ' example: FAILED sparse solver error number = ', ier
      stop
  200 format(a5,3a20)
  300 format(i5,3d20.12)     ! i/sol/xexpct values
  400 format(a60,i20)   ! fail message, sparse solver error number
      end

    my_system% f95 -dalign example_su.f -xlic_lib=sunperf
    my_system% a.out
      i        rhs(i)          expected rhs(i)       error
      1  0.100000000000D+01  0.100000000000D+01  0.000000000000D+00
      2  0.200000000000D+01  0.200000000000D+01  0.000000000000D+00
      3  0.300000000000D+01  0.300000000000D+01  0.000000000000D+00
      4  0.400000000000D+01  0.400000000000D+01  0.000000000000D+00

CODE EXAMPLE 4-4 Solving an Unsymmetric System - Regular Interface

    my_system% cat example_uu.f
      program example_uu
c
c  This program is an example driver that calls the sparse solver.
c  It factors and solves an unsymmetric system.
c
      implicit none
      in
117. m by using an orthogonal similarity transform.

Symmetric Matrix in Packed Storage

xSPCON              Estimates the reciprocal of the condition number of a symmetric packed matrix using the factorization computed by xSPTRF.
SSPEV or DSPEV      Replacement with newer version SSPEVD or DSPEVD suggested. Computes all the eigenvalues and eigenvectors of a symmetric matrix in packed storage (simple driver).
SSPEVX or DSPEVX    Computes selected eigenvalues and eigenvectors of a symmetric matrix in packed storage (expert driver).
SSPEVD or DSPEVD    Computes all the eigenvalues and eigenvectors of a symmetric matrix in packed storage and uses a divide and conquer method to calculate eigenvectors.
SSPGST or DSPGST    Reduces a real symmetric definite generalized eigenproblem to standard form, where the coefficient matrices are in packed storage, and uses the factorization computed by SPPTRF or DPPTRF.
SSPGVD or DSPGVD    Computes all the eigenvalues and eigenvectors of a real generalized symmetric definite eigenproblem where the coefficient matrices are in packed storage, and uses a divide and conquer method to calculate eigenvectors.

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

Routine             Function
SSPGV or DSPGV      Replacement with newer version SSPGVD or DS
SSPGVX or DSPGVX
xSPRFS
xSPSV
xSPSVX
SSPTRD or DSPTRD
xSPTRF
xSPTRI
xSPTRS (P)
118. matrices
sb_copy_i          Symmetric band interval matrix copy
sb_disjm_i         If two interval matrices are disjoint
sb_emptyelem_i     Empty entry and its location
sb_encm_i          If an interval matrix is enclosed in another
sb_hullm_i         Convex hull of two interval matrices
sb_infm_i          Left endpoint of an interval matrix
sb_interiorm_i     If an interval matrix is in interior of another
sb_interm_i        Intersection of two interval matrices
sb_lrscale_i       Two-sided diagonal scaling
sb_midm_i          Midpoint matrix of an interval matrix
sb_norm_i          Symmetric band interval matrix norms
sb_supm_i          Right endpoint of an interval matrix
sb_whullm_i        Convex hull of two interval matrices
sb_widthm_i        Elementwise width of an interval matrix
sb_winterm_i       Intersection of two interval matrices
spmv_i             Interval symmetric matrix vector product
spr_i              Symmetric rank-one update
sp_acc_i           Symmetric packed matrix accumulation and scale

TABLE A-11 Interval BLAS Routines (Continued)

Routine            Function
sp_add_i           Symmetric packed matrix add and scale
sp_constructm_i    Constructs an interval matrix from two floati
sp_copy_i
sp_disjm_i
sp_emptyelem_i
sp_encm_i
sp_hullm_i
sp_infm_i
sp_interiorm_i
sp_interm_i
sp_lrscale_i
sp_midm_i
sp_norm_i
sp_supm_i
sp_whullm_i
sp_widthm_i
sp_winterm_i
sumsq_i
sum_i
supv_i
swap_i
symm_i
symv_i
syr_i
sy_acc_i
sy_add_i
sy_constructm_i
sy_copy_i
sy_disjm_i
sy_emptyelem_i
119. matrix from two floating point matrices
gb_copy_i          General band interval matrix copy
gb_diag_scale_i    Diagonal scaling of an interval matrix
gb_disjm_i         If two interval matrices are disjoint
gb_emptyelem_i     Empty entry and its location
gb_encm_i          If an interval matrix is enclosed in another
gb_hullm_i         Convex hull of two interval matrices
gb_infm_i          Left endpoint of an interval matrix
gb_interiorm_i     If an interval matrix is in interior of another
gb_interm_i        Intersection of two interval matrices
gb_lrscale_i       Two-sided diagonal scaling

TABLE A-11 Interval BLAS Routines (Continued)

Routine            Function
gb_midm_i          Midpoint matrix of an interval matrix
gb_norm_i          General band interval matrix norms
gb_supm_i          Right endpoint of an interval matrix
gb_whullm_i        Convex hull of two interval matrices
gb_widthm_i        Elementwise width of an interval matrix
gb_winterm_i       Intersection of two interval matrices
gemm_i             General interval matrix product
gemv_i             General interval matrix and vector multiplication
ger_i              Rank-one update
ge_acc_i           General matrix accumulation and scale
ge_add_i           General interval matrix add and scale
ge_constructm_i    Construct
ge_copy_i
ge_diag_scale_i
ge_disjm_i
ge_emptyelem_i
ge_encm_i
ge_hullm_i
ge_infm_i
ge_interiorm_i
ge_interm_i
ge_lrscale_i
ge_midm_i
ge_norm_i
ge_permute_i
ge_supm_i
ge_trans_i
ge_whullm_i
ge_widthm_i
ge_winterm_i
120. mber of implicit zeros prefixed to each row of the input matrix NPRE Number of implicit zeros prefixed to each column of the input matrix MPOST AX 0 MZ MYC NPOST AX 0 NZ NYC MYC PRE MPOST MYC_INIT where MYC_INIT depends upon filter and input matrices as shown in TABLE 5 10 NYC NPRE NPOST NYC_INIT where NYC_INIT depends upon filter and input matrices as shown in TABLE 5 10 MYC_INIT and NYC_INIT depend upon the following where X is the filter matrix and Y is the input matrix TABLE 5 10 MYC_INIT and NYC_INIT Dependencies MYC_INIT NYC_INIT Transpose X Transpose X MAX NX MY MAX NX NY AX NX NY MAX MX NY AX NX MY MAX MX MY Sun Performance Library User s Guide May 2003 The values assigned to the minimum work array size is shown in TABLE 5 11 TABLE 5 11 Minimum Dimensions and Data Types for WORK Work Array Used With Convolution and Correlation Routines Routine Minimum Work Array Size WORK Type SCNVCOR DCNVCOR 4 MAX NX NPRE NY REAL REAL 8 MAX 0 NZ NY CCNVCOR ZCNVCOR 2 MAX NX NPRE NY COMPLEX MAX 0 NZ NY COMP LEX 16 SCNVCOR2 DCNVCOR2 mMy NY 30 COMPLEX COMP LEX 16 CCNVCOR2 ZCNVCOR2 IfMY NY MYC 8 COMPLEX If MY NY MYC NYC 16 COMP LEX 16 1 Memory will be allocated within the routine if the workspace size indicated by LWORK is not large enough Sample Program Convolution CODE EXAMPLE 5 10 uses CCN
121. me. For example, use daxpy_64() in place of daxpy(). However, if calling the 64-bit integer interfaces indirectly, do not append _64 to the name of the Sun Performance Library routine. Calls to the Sun Performance Library routine will access a 32-bit wrapper that promotes the 32-bit integers to 64-bit integers, calls the 64-bit routine, and then demotes the 64-bit integers to 32-bit integers. For best performance, call the routine directly by appending _64 to the routine name.

For C programs, use long instead of int arguments. The following code example shows calling the 64-bit integer interfaces directly:

    #include <sunperf.h>
    long n, incx, incy;
    double alpha, *x, *y;
    daxpy_64(n, alpha, x, incx, y, incy);

The following code example shows calling the 64-bit integer interfaces indirectly:

    #include <sunperf.h>
    int n, incx, incy;
    double alpha, *x, *y;
    daxpy(n, alpha, x, incx, y, incy);

For Fortran programs, use 64-bit integers for all integer arguments. The following methods can be used to convert integer arguments to 64 bits:

- To promote all default integers (integers declared without explicit byte sizes) and literal integer constants from 32 bits to 64 bits, compile with -xtypemap=integer:64.
- To promote specific integer declarations, change INTEGER or INTEGER*4 to INTEGER*8.
- To promote integ
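The difference between the direct and indirect C calls described above can be sketched without the library itself. The code below is a hypothetical illustration, not the contents of sunperf.h: my_daxpy_64 stands in for a routine with 64-bit integer arguments, my_daxpy stands in for the 32-bit wrapper that promotes its arguments and forwards the call, and long is assumed to be 64 bits wide, as it is in the LP64 data model.

```c
#include <assert.h>

/* Hypothetical stand-in for a 64-bit-integer routine such as daxpy_64:
 * y = alpha*x + y over n elements. This sketch handles positive
 * strides only. */
void my_daxpy_64(long n, double alpha, const double *x, long incx,
                 double *y, long incy)
{
    assert(incx > 0 && incy > 0);
    for (long i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}

/* Hypothetical stand-in for the 32-bit wrapper: promotes each int
 * argument to long and forwards to the 64-bit routine. The extra call
 * and conversions are the overhead that a direct call avoids. */
void my_daxpy(int n, double alpha, const double *x, int incx,
              double *y, int incy)
{
    my_daxpy_64((long)n, alpha, x, (long)incx, y, (long)incy);
}
```

Both entry points compute the same result. The wrapper exists so that code written with int arguments keeps working, at the cost of one extra level of call.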
122. mplex CFFTB (P) sequence

CFFTC2 (P)    CFFT2I, CFFT2F (P), CFFT2B (P)      Initialize the trigonometric weight and factor tables or compute the two-dimensional forward or inverse FFT of a two-dimensional complex array.
CFFTC3 (P)    CFFT3I, CFFT3F (P), CFFT3B (P)      Initialize the trigonometric weight and factor tables or compute the three-dimensional forward or inverse FFT of three-dimensional complex array.
CFFTCM (P)    VCFFTI, VCFFTF (P), VCFFTB (P)      Initialize the trigonometric weight and factor tables or compute the one-dimensional forward or inverse FFT of a set of data sequences stored in a two-dimensional complex array.
CFFTS         RFFTI, RFFTB, EZFFTI, EZFFTB        Initialize the trigonometric weight and factor tables or compute the one-dimensional inverse FFT of a complex sequence.
CFFTS2        RFFT2I, RFFT2B                      Initialize the trigonometric weight and factor tables or compute the two-dimensional inverse FFT of a two-dimensional complex array.
CFFTS3 (P)    RFFT3I, RFFT3B                      Initialize the trigonometric weight and factor tables or compute the three-dimensional inverse FFT of three-dimensional complex array.
CFFTSM        VRFFTI, VRFFTB (P)                  Initialize the trigonometric weight and factor tables or compute the one-dimensional inverse FFT of a set of data sequences stored in a two-dimensional complex array.
DFFTZ         DFFTI, DFFTF, DEZFFTI, DEZFFTF      Initialize the trigonometric weight and factor tables or compute the one-dimensional forward FFT of a double precision sequence.
123. mputed by CHPTRF or ZHPTRF Upper Hessenberg Matrix XHSEIN Computes right and or left eigenvectors of upper Hessenberg matrix using inverse iteration XHSEQR Computes eigenvectors and Shur factorization of upper Hessenberg matrix using multishift QR algorithm Upper Hessenberg Matrix Generalized Problem Hessenberg and Triangular Matrix XHGEQZ Implements single double shift version of QZ method for finding the generalized eigenvalues of the equation det A w i B 0 Appendix A Sun Performance Library Routines 135 TABLE A 1 Routine LAPACK Linear Algebra Package Routines Continued Function Real Orthogonal Matrix in Packed Storage SOPGTR or DOPGTR SOPMTR or DOPMTR Generates an orthogonal transformation matrix from a tridiagonal matrix determined by SSPTRD or DSPTRD Multiplies a general matrix by the orthogonal transformation matrix reduced to tridiagonal form by SSPTRD or DSPTRD Real Orthogonal Matrix SORGBR or DORGBR SORGHR or DORGHR SORGLO or DORGLO SORGQL or DORGOL SORGOR or DORGOR SORGRQ or DORGRO SORGTR or DORGTR SOR DORI SOR DORI SOR DORI SOR DORI SOR DORI SOR DORI SOR DORI SOR DORI BR or BR HR or HR LQ P or LO P QL P or QL P QR P or QR P R3 or R3 RQ P or RO P RZ or RZ Generates the orthogonal transformation matrices from reduction to bidiagonal form as determined by
124. mputes convolution or correlation
XCNVCOR2    Computes two-dimensional convolution or correlation

Miscellaneous Signal Processing Routines
TABLE A-10 lists the miscellaneous Sun Performance Library signal processing routines.

TABLE A-10 Convolution and Correlation Routines
Routines                              Function
RFFTOPT, DFFTOPT, CFFTOPT, ZFFTOPT    Compute the length of the closest FFT
SWIENER or DWIENER                    Performs Wiener deconvolution of two signals
XTRANS                                Transposes array

Interval BLAS (IBLAS) Routines
Sun Performance Library includes the interval BLAS routines listed in TABLE A-11, which operate on interval scalars, interval vectors, and interval matrices (dense, banded, symmetric, and triangular).

TABLE A-11 Interval BLAS Routines
Routine            Function
amax_val_i         Max absolute value and location
amin_val_i         Min absolute value and location
axpby_i            Scaled vector accumulation
cancel_i           Scaled cancellation
constructv_i       Constructs an interval vector
copy_i             Interval vector copy
disjv_i            Checks if two interval vectors are disjoint
dot_i              Scaled dot product of two interval vectors
emptyelev_i        Empty entry and its location
encv_i             Checks if an interval vector is enclosed in another
fpinfo_i           Environmental enquiry
gbmv_i             Interval matrix vector multiplication
gb_acc_i           General band matrix accumulation and scale
gb_add_i           General band matrix add and scale
gb_constructm_i    Constructs an interval
125. ms have different memory requirements for input and output data. Care must be taken to ensure that the input array is large enough to accommodate the transform results when computing an in-place transform.

TRIGS: Array containing the trigonometric weights.

IFAC: Array containing factors of the problem dimensions. The problem sizes are as follows:
- Linear FFT: Problem size of dimension N1
- Two-dimensional FFT: Problem size of dimensions N1 and N2
- Three-dimensional FFT: Problem size of dimensions N1, N2, and N3
While N1, N2, and N3 can be of any size, a real-to-complex or a complex-to-real transform can be computed most efficiently when
$N1, N2, N3 = 2^p \times 3^q \times 4^r \times 5^s$
and a complex-to-complex transform can be computed most efficiently when
$N1, N2, N3 = 2^p \times 3^q \times 4^r \times 5^s \times 7^t \times 11^u \times 13^v$
where p, q, r, s, t, u, and v are integers and $p, q, r, s, t, u, v \ge 0$.

WORK: Workspace whose size depends on the routine and the number of threads that are being used to compute the transform if the routine is parallelized.

LWORK: Size of workspace. If LWORK is zero, the routine will allocate a workspace with the required size.

SCALE: A scalar with which the output is scaled. Occasionally in literature, the inverse transform is defined with a scaling factor of 1/N1 for one-dimensional transforms, 1/(N1 x N2) for two-dimensional transforms, and 1/(N1 x N2 x N3) for three-dimensional transforms. In such case, the inverse transform is said to be normalized.
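The size guidance above can be checked before choosing a transform length. The following is a minimal sketch in C, not part of Sun Performance Library: the function names are hypothetical, and the code assumes that "most efficient" means the length has no prime factors other than those listed (the factor 4 is 2 squared, so it adds no new prime).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when n > 0 and n has no prime factor outside primes[0..count-1].
 * Hypothetical helper for illustration only. */
static bool factors_only_into(long n, const int *primes, size_t count)
{
    assert(primes != NULL && count > 0);
    if (n <= 0)
        return false;
    for (size_t i = 0; i < count; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];          /* strip every power of this prime */
    return n == 1;                   /* nothing else may remain */
}

/* Real-to-complex or complex-to-real: n = 2^p * 3^q * 4^r * 5^s. */
bool is_fast_real_fft_size(long n)
{
    static const int primes[] = {2, 3, 5};
    return factors_only_into(n, primes, sizeof primes / sizeof primes[0]);
}

/* Complex-to-complex: additionally allows factors 7, 11, and 13. */
bool is_fast_complex_fft_size(long n)
{
    static const int primes[] = {2, 3, 5, 7, 11, 13};
    return factors_only_into(n, primes, sizeof primes / sizeof primes[0]);
}
```

A caller could, for example, pad a sequence with zeros up to the next length for which the relevant check returns true. Whether padding is appropriate depends on the application.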
126. n Requirements In place Out of Place OPT 0 initialization OPT 1 real to complex N1 x N2 Real 41 1 xN2 LDX1 2xLDY1 LDX12 gt NI1 forward two dimensional Complex LDY1 gt oa 1 LDY1 gt 1 1 FFT OPT 0 initialization OPT 1 complex to real 4 1 xN2 N1xN2 Real LDX12 Xl 1 LDX1 2 y 1 inverse two dimensional Complex LDY1 2 x LDX1 LDY1 gt 2 x LDX1 FFT LDY1 is even Chapter 5 Using Sun Performance Library Signal Processing Routines 85 TABLE 5 3 Single Precision Two Dimensional FFT Routines Size Type of Size Type of Name Purpose Input Output CFFTC2 OPT O initialization OPT 1 complex to N1 x N2 N1 x N2 complex forward two Complex Complex dimensional FFT OPT 1 complex to N1 x N2 N1 x N2 complex inverse two Complex Complex dimensional FFT TABLE 5 3 Notes N1 is first dimension of the FFT problem Leading Dimension Requirements In place Out of Place LDX1 N1 LDX1 N1 LDY1 LDX1 LDY1 2 N1 LDX1 gt N1 LDX1 2 gt N1 LDY1 LDX1 LDY1 LDX1 LDX1 is leading dimension of input array LDY1 is leading dimension of output array N2 is second dimension of the FFT problem When calling routines with OPT 0 to initialize the routine the only error checking that is done is to determine if N1 N2 lt 0 The following example shows how to compute a two dimensional real to complex FFT and complex to real FFT of a two dimensional array CODE EXAMPLE 5 3 Dimensional
127. ndler_:
    retl
    ta 6

2. Assemble trap6_handler.s.

    my_system% fbe trap6_handler.s

The first parallelizable subroutine invoked from Sun Performance Library will call a routine named trap6_handler_. If a trap6_handler_ is not specified, Sun Performance Library will call a default handler that does nothing. Not supplying a handler for any misaligned data will cause a trap that will be fatal. (fbe(1) is the Solaris assembler for SPARC platforms.)

3. Include trap6_handler.o on the command line.

    my_system% f95 any.f trap6_handler.o -xlic_lib=sunperf

CHAPTER 2
Using Sun Performance Library

This chapter describes using the Sun Performance Library to improve the execution speed of applications written in Fortran 95 or C. The performance of many applications can be increased by using Sun Performance Library without making source code changes or recompiling. However, some modifications to applications might be required to gain peak performance with Sun Performance Library.

Improving Application Performance
The following sections describe ways of using Sun Performance Library routines without making source code changes or recompiling.

Replacing Routines With Sun Performance Library Routines
Many applications use one or more of the base Netlib libraries, such as LAPACK or BLAS. Because Sun Performance Library maintains the same interfaces and functionality of these librar
128. ng point matrices Symmetric packed interval matrix copy If two interval matrices are disjoint Empty entry and its location If an interval matrix is enclosed in another Convex hull of two interval matrices Left endpoint of an interval matrix If an interval matrix is in interior of another Intersection of two interval matrices Two sided diagonal scaling Midpoint matrix of an interval matrix Symmetric packed interval matrix norms Right endpoint of an interval matrix Convex hull of two interval matrices Elementwise width of an interval matrix Intersection of two interval matrices Sum of squares Sum the entries of an interval vector The right endpoint of an interval vector Interval vector swap Symmetric interval matrix product Interval symmetric matrix vector product Symmetric rank one update Symmetric interval matrix accumulation and scale Symmetric matrix add and scale Constructs an interval matrix from two floating point matrices Symmetric interval matrix copy If two interval matrices are disjoint Empty entry and its location Appendix A Sun Performance Library Routines 157 TABLE A 11 Interval BLAS Routines Continued Routine Function sy_encm_i If an interval matrix is enclosed in another sy_hullm_i Convex hull of two interval matrices sy_infm_i Left endpoint of an interval matrix sy_interiorm_i If an interval matrix is in interior of another sy_interm_i Intersection of two inte
129. standard license agreement of Sun Microsystems, Inc., as well as to the applicable provisions of the FAR (Federal Acquisition Regulations) and its supplements. Distributed under licenses that restrict its use. This distribution may include components developed by third parties. Portions of this product may be derived from Cray CF90, a product of Cray Inc. Portions of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. libdwarf and libredblack are copyright 2000 Silicon Graphics, Inc. and are available under the GNU Lesser General Public License from http://www.sgi.com. Sun, Sun Microsystems, the Sun logo, Java, Sun ONE Studio, the Solaris logo, and the Sun ONE logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Netscape and Netscape Navigator are trademarks or registered trademarks of Netscape Communications Corporation in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc. Products that
130. nverse linear FFT of M vectors Size and Type of Input N1 Complex N1 xM Real N1 1 xM T i Complex N1 xM Complex N1 xM Complex Size and Type of Output N1 Complex Fi 2 Complex N1 xM Real N1 xM Complex N1 xM Complex In place LDX1 2x LDY1 N1 LDX1 1 2 1 2 x LDX1 Leading Dimension Requirements Out of Place LDX1 gt N1 LDX1 gt 1 LDY1 2 LDX1 LDY1 2 LDX1 LDY1 2 CODE EXAMPLE 5 1 TABLE 5 2 Notes m LDX1 is the leading dimension of the input array m LDY1 is the leading dimension of the output array m N1 is the first dimension of the FFT problem m N2 is the second dimension of the FFT problem a When calling routines with OPT 0 to initialize the routine the only error checking that is done is to determine if N1 lt 0 CODE EXAMPLE 5 1 shows how to compute the linear real to complex and complex to real FFT of a set of sequences Linear Real to Complex FFT and Complex to Real FFT my_system cat testscm f PROGRAM TESTSCM IMPLICIT NONE 78 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 5 1 Linear Real to Complex FFT and Complex to Real FFT Continued INTEGER LW IERR I J K LDX LDC INTEGER PARAMETER
131. o compute the FCT of an N point sequence Chapter 5 Using Sun Performance Library Signal Processing Routines 99 100 m D COST also computes the inverse transform When D COST is called twice the result will be the original sequence scaled by 5 V D COST Forward and Inverse Fast Cosine Transforms of Multiple Sequences VFCT The forward and inverse FCTs of multiple sequences are computed as Fori 0 M 1 N 1 X i k HAN nE n eos CM cos nk k 0 N V D COST Notes a Mx N 1 values are needed to compute the VFCT of M N point sequences m The input and output sequences are stored row wise m V D COST is normalized and is its own inverse When V D COST is called twice the result will be the original data D COSQF Forward FCT of a Quarter Wave Even Sequence The forward FCT of a quarter wave even sequence is computed as N 1 X k x 0 2 be x n e0s mOr 1 en ee E 2N n 1 N values are needed to compute the forward FCT of an N point quarter wave even sequence D COSQB Inverse FCT of a Quarter Wave Even Sequence The inverse FCT of a quarter wave even sequence is computed as 2k 1 ma n 0 N 1 x n X keos TN k 0 Calling the forward and inverse routines will result in the original input scaled by 1 IN Sun Performance Library User s Guide May 2003 V D COSQF Forward FCT of One or More Quarter Wave Even Sequences The forward FCT of one or more quarter wave even sequen
132. olaris operating environment:
- UltraSPARC I or UltraSPARC II systems: Use -xarch=v9 or -xarch=v9a.
- UltraSPARC III systems: Use -xarch=v9 or -xarch=v9b.

Compiling Code for a 64-Bit Enabled Solaris Operating Environment
To compile code for a 64-bit enabled Solaris operating environment, use -xarch=v9[a|b] and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers. Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:
- Modify the Sun Performance Library routine name. For C and Fortran 95 code, append _64 to the names of Sun Performance Library routines (for example, rfftf_64 or CFFTB_64). For Fortran 95 code with the USE SUNPERF statement, the _64 suffix is not strictly required for specific interfaces, such as DGEMM. The _64 suffix is still required for the generic interfaces, such as GEMM.
- Promote integers to 64 bits. Double precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the integers are promoted to 64 bits.

64-Bit Integer Arguments
These additional 64-bit integer interfaces are available only in the v9, v9a, and v9b libraries. Codes compiled for 32-bit operating environments (-xarch set to v8plusa or v8plusb) cannot call the 64-bit integer interfaces. To call the 64-bit integer interfaces directly, append the suffix _64 to the standard library na
133. onstants and factors associated with input length N - 1. The FST is its own inverse transform. Calling VSINT twice will result in the original N - 1 data points. Calling SINT twice will result in the original N - 1 data points multiplied by 2N.

An odd sequence with symmetry such that $x(n) = -x(-n-1)$, where $n = -N+1, \ldots, 0, \ldots, N$, is said to have quarter-wave odd symmetry. SINQF and SINQB compute the FST and its inverse, respectively, of a single real quarter-wave odd sequence, while VSINQF and VSINQB operate on one or more sequences. SINQB is unnormalized, so using the results of SINQF as input in SINQB produces the original sequence scaled by a factor of 4N. However, VSINQB is normalized, so a call to VSINQF followed by a call to VSINQB will produce the original sequence. An FST of a real sequence of length 2N that has quarter-wave odd symmetry requires N input data points and produces an N-point resulting sequence. Initialization is required before calling the transform routines by calling [V]SINQI.

Discrete Fast Cosine and Sine Transforms and Their Inverse
Sun Performance Library routines use the equations in the following sections to compute the fast cosine and sine transforms and inverse transforms.

[D]COST: Forward and Inverse Fast Cosine Transform (FCT) of a Sequence
The forward and inverse FCT of a sequence is computed as
$$X(k) = x(0) + 2\sum_{n=1}^{N-1} x(n)\cos\left(\frac{\pi n k}{N}\right) + x(N)\cos(\pi k), \quad k = 0, \ldots, N$$
[D]COST Notes:
- N + 1 values are needed t
134. or example, all of the CLAPACK routines are followed by a trailing underscore to maintain compatibility with Fortran compilers, which often postfix routine names in the object (.o) file with an underscore. The Sun Performance Library C interfaces do not require a trailing underscore.

Sun Performance Library C interfaces use the following conventions:
- Input-only scalars are passed by value rather than by reference. Complex and double complex arguments are not considered scalars because they are not implemented as a scalar type by C. Complex scalars can be passed as either structures or arrays of length 2.
- Types of arguments must match even after C does type conversion. For example, be careful when passing a single precision real value, because a C compiler can automatically promote the argument to double precision.
- Arrays are stored columnwise. For Fortran programmers, this is the natural order in which arrays are stored. For C programmers, this is the transpose of the order in which they usually work. References in the documentation and man pages to rows refer to columns and vice versa.
- Array indices are based at one, in conformance with Fortran conventions, rather than being zero as in C. For example, the Fortran interface to IDAMAX, which C programs access as idamax_, would return a 1 to indicate the first element in a vector. The C interface to idamax, which C programs access
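The columnwise storage and one-based index conventions listed above can be illustrated with a short C sketch. Both functions are hypothetical helpers written for this illustration. They do not call Sun Performance Library, and the second one only mimics the index convention described for IDAMAX, assuming unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), both zero-based, in a column-major array whose
 * leading dimension (allocated number of rows) is lda. */
size_t colmajor_offset(size_t i, size_t j, size_t lda)
{
    assert(i < lda);
    return i + j * lda;   /* consecutive rows of one column are adjacent */
}

/* One-based position of the first element with the largest absolute value.
 * Returns 0 when n <= 0. Hypothetical stand-in, not the library routine. */
int first_max_abs_index_1based(int n, const double *x)
{
    if (n <= 0)
        return 0;
    int best = 0;
    double best_abs = x[0] < 0.0 ? -x[0] : x[0];
    for (int i = 1; i < n; ++i) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a > best_abs) {          /* strict comparison keeps the first maximum */
            best_abs = a;
            best = i;
        }
    }
    return best + 1;                 /* convert the C index to a one-based index */
}
```

A C program that receives a one-based index from such a routine subtracts 1 before using it to subscript a C array.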
135. ore to actually compute the transform The initialization includes computing the factors of N1 N2 and N3 and the trigonometric weights associated with those factors In subsequent forward or inverse transforms initialization is not necessary as long as N1 N2 and N3 remain unchanged IMPORTANT Upon returning from a three dimensional FFT routine Y 0 N 1 contains the transform results and the original contents of Y N LDY1 1 is overwritten Here N N1 in the complex to complex and complex to real transforms and N 1 in the real to complex transform TABLE 5 4 lists the single precision three dimensional FFT routines and their purposes The same information applies to the corresponding double precision routines except that their data types are double precision and double complex See TABLE 5 4 for the mapping See the individual man pages for a complete description of the routines and their arguments TABLE 5 4 Single Precision Three Dimensional FFT Routines Name Purpose Size Type of Input Size Type of Output Leading Dimension Requirements In place Out of Place SFFTC3 OPT 0 initialization OPT 1 real to N1xN2xN3 Real 1 xN2xN3 LDX1 2xXLDY1 LDX1 gt NI1 complex forward Complex LDX2 gt N2 LDX2 gt N2 three f LDY12 1 woyi gt 1 dimensional FFT LDY2 LDX2 LDY2 gt N2 CFFTS3 OPT 0 initialization OPT 1complex 41 1 xN2xN3 N1xN2xN3 Real Loxi gt 1 1LDX12 1 to real inverse Complex LDX2 g
136. outines that have two dimensional arrays as input and output TABLE 5 2 also lists the leading dimension requirements The same information applies to the corresponding double precision routines except that their data types are double precision and double complex See TABLE 5 2 for the mapping See the individual man pages for a complete description of the routines and their arguments TABLE 5 2 Single Precision Linear FFT Routines Size and Type Size and Type Name Purpose of Input of Output Leading Dimension Requirements In place Out of Place SFFTC OPT 0 initialization OPT 1 real to N1 m 1 complex forward linear Real FFT of a single vector Complex SFFTC OPT 0 initialization OPT 1 complex to 4 1 N1 real inverse linear FFT Real of single vector Complex CFFTC OPT 0 initialization OPT 1 complex to N1 N1 complex forward linear Complex Complex FFT of a single vector Chapter 5 Using Sun Performance Library Signal Processing Routines 77 TABLE 5 2 Name SFFTCM CFETSM CFEFTCM Single Precision Linear FFT Routines Continued Purpose OPT 1 complex to complex inverse linear FFT of a single vector OPT 0 initialization OPT 1 real to complex forward linear FFT of M vectors OPT 0 initialization OPT 1 complex to real inverse linear FFT of M vectors OPT 0 initialization OPT 1 complex to complex forward linear FFT of M vectors OPT 1 complex to complex i
137. parse matrices. Solve symmetric, structurally symmetric, and unsymmetric coefficient matrices using direct methods and a choice of fill-reducing ordering algorithms and user-specified orderings.
- Convolution and correlation in one and two dimensions.
- Fast Fourier transforms, Fourier synthesis, cosine and quarter-wave cosine transforms, cosine and quarter-wave sine transforms.
- Complex vector FFTs and FFTs in two and three dimensions.

Compatibility With Previous LAPACK Versions
The Sun Performance Library routines that are based on LAPACK support the expanded capabilities and improved algorithms in LAPACK 3.0 but are completely compatible with both LAPACK 1.x and LAPACK 2.0. Maintaining compatibility with previous LAPACK versions:
- Reduces linking errors due to changes in subroutine names or argument lists.
- Ensures results are consistent with results generated with previous LAPACK versions.
- Minimizes programs terminating due to differences between argument lists.

Getting Started With Sun Performance Library
This section shows the most basic compiler options used to compile an application that uses the Sun Performance Library routines. To use the Sun Performance Library, type one of the following commands:

    my_system% f95 -dalign my_file.f -xlic_lib=sunperf

or

    my_system% cc -dalign my_file.c -xlic_lib=sunperf

Because Sun Pe
138. pended to the routine name if the USE SUNPERF statement is not used. The following Fortran 95 code example shows calling CAXPY using 64-bit arguments:

    PROGRAM TEST
    USE SUNPERF
    COMPLEX ALPHA
    INTEGER(8) INCX, INCY, N
    COMPLEX X, Y
    CALL CAXPY(N, ALPHA, X, INCX, Y, INCY)

In C routines, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following code example shows calling the dgbcon routine using 32-bit arguments:

    void dgbcon(char norm, int n, int nsub, int nsuper, double *da,
                int lda, int *ipivot, double danorm, double *drcond,
                int *info)

The following code example shows calling the dgbcon routine using 64-bit arguments:

    void dgbcon_64(char norm, long n, long nsub, long nsuper, double *da,
                   long lda, long *ipivot, double danorm, double *drcond,
                   long *info)

Parallel Processing
To enable parallel processing for the Sun Performance Library routines, use one of the parallelization options (-xparallel, -xexplicitpar, or -xautopar) at link time, as shown in the following examples:

    cc -dalign -xarch=... -xparallel a.c -xlic_lib=sunperf

or

    f95 -dalign -xarch=... -xparallel a.f95 -xlic_lib=sunperf

Run-Time Issues
At run time, if running with compiler parallelization, Sun Performance Library uses the same pool of threads that the compil
139. r Interior Test Disjoint Vector Test Intersect Vectors and Update Intersect Vectors Hull of Vectors and Update Hull of Vectors 9 oa Equivalent SB DJ sX INTY X IX Y X IX Y X IH Y K CHAY TABLE 6 10 Matrix Set Operations Prefix GE GB SY SB SP TR TB TP _ Name Operation 95 Equivalent ENCLOSEM_I Enclose Matrix Test A SB B INTERIORM_I Matrix Interior Test A INT B DISJOINTM_I Disjoint Matrix Test A DJ B INTERSECTM_I Intersect Matrices and Update B X IX B WINTERSECTM_I Intersect Matrices W X IX B HULLM_I Hull of Matrices and Update B X IH B WHULLM_I Hull of Matrices W X IH B Note Prefix depends upon matrix type and applies to all routine names in this table TABLE 6 11 Vector Utilities Name Operation 95 Equivalent EMPTYV_I INFV_I SUPV_I MIDV_I WIDTHV_I INTERVALV_I Empty Vector Element Test and Location Vector Infimum Vector Supremum Vector Midpoint Vector Width Vector Type Conversion to Interval Chapter 6 ISEMPTY X v INF xX v SUP X v MID X v WID X X INTERVAL u v Interval BLAS Routines 127 TABLE 6 12 Matrix Utilities Prefix Name Operation EMPTYM_I Empty Matrix Element Test and Location INFM_I Matrix Infimum GE GB SY SB SP TR TB TP _ SUPM_I Matrix Supremum MIDM_I Matrix Midpoint WIDTHM_I Matrix Width INTERVALM_I Matrix Type Conversion to Interval Note Prefix
140. ration Root of Name Prefix and Suffix Dot product Scalar times a added to a ve DOT vector AXPY ctor Apply Givens ROT rotation Gather x into Scatter x into y GTHR y SCTR SSI D I CFUL 470D C7 CL A QI S L D I Cri 4 1 S I D I S D CE Z Se Do C Z Z Z S D C Z The prefix can be one of the following data types C COMPLI S SINGLE D DOUBLE any ty EX Z COMPLI EX 16 or DOUBLE COMPLE X The I CI and UI suffixes denote sparse BLAS routines that are direct extensions to dense BLAS routines NIST Fortran Sparse BLAS Each NIST Fortran Sparse BLAS routine has a six character name of the form XYYYZZ where m X represents the data type m YYY represents the sparse storage format m ZZ represents the operation Sun Performance Library User s Guide May 2003 TABLE 4 2 shows the values for X Y and Z TABLE 4 2 NIST Fortran Sparse BLAS Routine Naming Conventions X Data Type x S single precision D double precision C complex Z double complex YYY Sparse Storage Format YYY Single entry formats Block entry formats ZZ Operation COO coordinate CSC compressed sparse column CSR compressed sparse row DIA diagonal ELL ellpack JAD jagged diagonal SKY skyline BCO block coordinate BSC block compressed sparse column BSR block compressed sparse row BDI block diagonal BEL block ellpack
141. re undefined after returning from a call where X is used for scratch space N or n specifies that Y is the input matrix T or t specifies that the transpose of Y is the input matrix N or n specifies that Y must be preserved S or s specifies that Y can be used for scratch space The contents of X are undefined after returning from a call where Y is used for scratch space Number of rows in the filter matrix X where Mx gt 0 Number of columns in the filter matrix X where NX gt 0 Filter matrix X is unchanged on exit when SCRATCHX is N or n and undefined on exit when SCRATCHX is S or s Leading dimension of array containing the filter matrix X Number of rows in the input matrix Y where My 2 0 Sun Performance Library User s Guide May 2003 TABLE 5 8 Arguments for Two Dimensional Convolution and Correlation Routines SCNVCOR2 DCNVCOR2 CCNVCOR2 and ZCNVCOR2 Continued Argument Definition NY Number of columns in the input matrix Y where NY gt 0 MPRE Number of implicit zeros prefixed to each row of the input matrix Y vectors where MPRE 2 0 NPRE Number of implicit zeros prefixed to each column of the input matrix Y where NPRE gt 0 Y Input matrix Y is unchanged on exit when SCRATCHY is N or n and undefined on exit when SCRATCHY is S or s LDY Leading dimension of array containing the input matrix Y MZ Number of output vectors where MZ 2 0
142. rete correlation operations.

Convolution
Given two functions x(t) and y(t), the Fourier transform of the convolution of x(t) and y(t), denoted as $x * y$, is the product of their individual Fourier transforms: $\mathrm{DFT}(x * y) = X \odot Y$, where $*$ denotes the convolution operation and $\odot$ denotes pointwise multiplication.

Typically, x(t) is a continuous and periodic signal that is represented discretely by a set of N data points $x_j$, $j = 0, \ldots, N-1$, sampled over a finite duration, usually for one period of x(t), at equal intervals. y(t) is usually a response that starts out as zero, peaks to a maximum value, and then returns to zero. Discretizing y(t) at equal intervals produces a set of N data points $y_k$, $k = 0, \ldots, N-1$. If the actual number of samplings in y is less than N, the data can be padded with zeros. The discrete convolution can then be defined as
$$(x * y)_j \equiv \sum_{k=-N/2+1}^{N/2} x_{j-k}\, y_k, \quad j = 0, \ldots, N-1$$
The values of $y_k$, $k = -N/2+1, \ldots, N/2$, are the same as those of $k = 0, \ldots, N-1$ but in the wrap-around order.

The Sun Performance Library routines allow the user to compute the convolution by using the definition above with $k = 0, \ldots, N-1$, or by using the FFT. If the FFT is used to compute the convolution of two sequences, the following steps are performed:
- Compute X = forward FFT of x
- Compute Y = forward FFT of y
- Compute $Z = X \odot Y$, which is $\mathrm{DFT}(x * y)$
- Compute z = inverse FFT of Z; $z = (x * y)$
One interesting
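The definition-based form of the discrete convolution, with k running from 0 to N-1 and indices wrapping around, can be sketched directly in C. This is an illustrative O(N^2) reference written for this note, not the library's convolution routine, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Circular (wrap-around) convolution by direct summation:
 *   z[j] = sum over k = 0..n-1 of x[(j - k) mod n] * y[k]
 * x, y, and z each hold n elements; z must not alias x or y. */
void circular_convolve(size_t n, const double *x, const double *y, double *z)
{
    assert(n > 0);
    for (size_t j = 0; j < n; ++j) {
        double sum = 0.0;
        for (size_t k = 0; k < n; ++k) {
            /* j + n - k is always positive, so the unsigned modulo is safe */
            size_t idx = (j + n - k) % n;
            sum += x[idx] * y[k];
        }
        z[j] = sum;
    }
}
```

Convolving with a response that is 1 at k = 1 and 0 elsewhere rotates x by one position, which is a quick way to check the wrap-around indexing. A direct sum like this is also a convenient cross-check for results obtained through the FFT steps.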
143. rformance Library routines are compiled with -dalign, the -dalign option should be used for compilation of all files if any routine in the program makes a Sun Performance Library call. If -dalign cannot be used, enabling Trap 6, described in the section "Enabling Trap 6" on page 24, is a low-performance workaround that allows misaligned data.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries. The -xlic_lib switch gives the same effect as if -l was used to specify the Sun Performance Library and added -l switches for all of the supporting libraries that Sun Performance Library requires.

To summarize, use the following:
- -dalign on all files at compile time (or enable trap 6)
- The same command line options for compiling and linking
- -xlic_lib=sunperf

Additional compiler options exist that optimize application performance for the following:
- Specific SPARC instruction set architectures, as described in "Compiling for SPARC Platforms" on page 38
- Parallel processing, as described in "Parallel Processing" on page 42

Enabling Trap 6
If an application cannot be compiled using -dalign, enable trap 6 to provide a handler for misaligned data. To enable trap 6 on SPARC, do the following:

1. Place this assembly code in a file called trap6_handler.s.

    .global trap6_handler_
    .text
    .align 4
    trap6_ha
144. rg fftpack VFFTPACK version 2 1 http www netlib org vfftpack Sparse BLAS http www netlib org sparseblas index html NIST National Institute of http math nist gov spblas Standards and Technology Fortran Sparse BLAS Note LINPACK has been removed from the Sun Performance Library The LINPACK libraries and documentation are still available from www netlib org Typographic Conventions TABLE P 1 Typeface Conventions Typeface Meaning Examples AaBbCc123 The names of commands files and directories on screen computer output AaBbCc123 What you type when contrasted with on screen computer output AaBbCc123 Book titles new words or terms words to be emphasized AaBbCc123 Command line placeholder text replace with a real name or value Edit your login file Use 1s a to list all files You have mail o 5 su Password Read Chapter 6 in the User s Guide These are called class options You must be superuser to do this To delete a file type rm filename Before You Begin 11 TABLE P 2 Code Conventions Code Symbol Meaning Notation Code Example Brackets contain arguments ofn 04 O that are optional Braces contain a set of choices d y n dy for a required option The pipe or bar symbol B dynamic static Bstatic separates arguments only one of which may be chosen The colon like the comma is Rdir dir R local libs U a sometimes used to
145. rization as returned by CTZRZF or ZTZRZF Multiplies a general matrix by the unitary transformation matrix reduced to tridiagonal form by CHETRD or ZHETRD itary Matrix in Packed Storage CU ZU CU PGTR or PGTR PMTR or PMTR Generates the unitary transformation matrix from a tridiagonal matrix determined by CHPTRD or ZHPTRD Multiplies a general matrix by the unitary transformation matrix reduced to tridiagonal form by CHPTRD or ZHPTRD 144 Sun Performance Library User s Guide May 2003 BLAS1 Routines TABLE A 2 lists the Sun Performance Library BLAS1 routines No Sun Performance Library BLAS1 routines are currently parallelized TABLE A 2 BLAS1 Basic Linear Algebra Subprograms Level 1 Routines Routine Function SASUM DASUM SCASUM DZASUM XAXPY XCOPY SDOT DDOT DSDOT SDSDOT CDOTU ZDOTU DQDOTA DQDOTI CDOTC ZDOTC SNRM2 DNRM2 SCNRM2 DZNRM2 XROTG XROT CSROT ZDROT SROTMG DROTMG SROTM DROTM ISAMAX DAMAX ICAMAX IZAMAX XSCAL CSSCAL ZDSCAL XSWAP CVMUL ZVMUL Sum of the absolute values of a vector Product of a scalar and vector plus a vector Copy a vector Dot product inner product Dot product conjugating first vector Euclidean norm of a vector Set up Givens plane rotation Apply Given s plane rotation Set up modified Given s plane rotation Apply modified Given s rotation Index of element with maximum absolute valu
146. rval matrices sy_lrscale_i Two sided diagonal scaling sy_midm_i Midpoint matrix of an interval matrix sy_norm_i Symmetric interval matrix norms sy_supm_i Right endpoint of an interval matrix sy_whullm_i Convex hull of two interval matrices sy_widthm_i Elementwise width of an interval matrix sy_winterm_i Intersection of two interval matrices tbmv_i Interval triangular matrix vector product tbsv_i Interval triangular solve with a vector tb_acc_i Matrix accumulation and scale tb_add_i Triangular band matrix add and scale tb_constructm_i Constructs an interval matrix from two floating point matrices tb_copy_i Triangular band interval matrix copy tb_disjm_i If two interval matrices are disjoint tb_emptyelem_i Empty entry and its location tb_encm_i If an interval matrix is enclosed in another tb hullm_i Convex hull of two interval matrices tb_infm_i Left endpoint of an interval matrix tb_interiorm_i If an interval matrix is in interior of another tb_interm_i Intersection of two interval matrices tb_midm_i Midpoint matrix of an interval matrix tb_norm_i Triangular band interval matrix norms tb_supm_i Right endpoint of an interval matrix tb_whullm_i Convex hull of two interval matrices tb_widthm_i Elementwise width of an interval matrix 158 Sun Performance Library User s Guide May 2003 TABLE A 11 Interval BLAS Routines Continued Routine Function tb _winterm_i tpmv_i tpsv_i tp_acc_i tp_add_i tp_cons
147. ry can be done on general arrays. In many cases, there are routines that will work with the other forms of the arrays. For example, DGEMM will form the product of two general matrices, and DTRMM will form the product of a triangular and a general matrix.

General Matrices
A general matrix is stored so that there is a one-to-one correspondence between the elements of the matrix and the elements of the array. Element $A_{ij}$ of a matrix A is stored in element A(I,J) of the corresponding array A. The general form is the most common form. A general matrix, because it is dense, has no special storage scheme. In a general banded matrix, however, the diagonal of the matrix is stored in the row below the upper diagonals.

For example, as shown below, the general banded matrix can be represented with banded storage. Elements shown with the symbol x are never accessed by routines that process banded arrays.

General Banded Matrix:

    a11  a12  a13   0    0
    a21  a22  a23  a24   0
     0   a32  a33  a34  a35
     0    0   a43  a44  a45
     0    0    0   a54  a55

General Banded Array in Banded Storage:

     x    x   a13  a24  a35
     x   a12  a23  a34  a45
    a11  a22  a33  a44  a55
    a21  a32  a43  a54   x

Triangular Matrices
A triangular matrix is stored so that there is a one-to-one correspondence between the nonzero elements of the matrix and the elements of the array, but the elements of the array corresponding to the zero elements of the matrix are never accessed by routines th
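The banded storage layout for a general banded matrix can be expressed as an index mapping. The sketch below assumes the usual LAPACK band storage convention, in which element (i, j) of the matrix is stored in row ku + 1 + i - j of column j of the banded array (one-based), with kl subdiagonals and ku superdiagonals. The function name is hypothetical and the code uses zero-based C indices.

```c
#include <assert.h>

/* Zero-based offset of matrix element (i, j) inside a column-major banded
 * array with leading dimension ldab >= kl + ku + 1, or -1 when (i, j) lies
 * outside the band and therefore has no storage location.
 * Assumes LAPACK-style band storage; illustration only. */
long band_offset(long i, long j, long kl, long ku, long ldab)
{
    assert(kl >= 0 && ku >= 0 && ldab >= kl + ku + 1);
    if (j - i > ku || i - j > kl)
        return -1;                      /* structurally zero element */
    return (ku + i - j) + j * ldab;     /* main diagonal lands in row ku */
}
```

For a matrix with one subdiagonal and two superdiagonals (kl = 1, ku = 2, ldab = 4), the main diagonal maps to the third row of the banded array and the first superdiagonal to the row above it, which places the diagonal in the row below the upper diagonals as the text describes.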
148. s CALL SFFTCM 1 N1 N2 ONE X LDX X LDC TRIGS IFAC SW LW IERR Chapter 5 Using Sun Performance Library Signal Processing Routines 79 CODE EXAMPLE 5 1 Linear Real to Complex FFT and Complex to Real FFT Continued IF IERR NE 0 THEN PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF WRITE in place forward FFT of X CALL PRINT_REAL AS COMPLEX N1 2 1 N2 1 X LDC N2 WRITE Compute out of place inverse linear FFT CALL CFFTSM 1 N1 N2 SCALE Z LDZ X LDX TRIGS IFAC SW LW IERR IF IERR NE 0 THEN PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF WRITE out of place inverse FFT of Z DO I 0 N1 1 WRITE 2 F4 1 2X X I J J 0 N2 1 END DO WRITE Compute in place inverse linear FFT CALL CFFTSM 1 N1 N2 SCALE Z LDZ Z LDZ 2 TRIGS IFAC SW 0 IERR IF IERR NE 0 THEN PRINT ROUTINE RETURN WITH ERROR CODE IERR STOP END IF WRITE in place inverse FFT of Z CALL PRINT_COMPLEX_AS_REAL N1 N2 1 Z LDZ 2 N2 WRITE END PROGRAM TESTSCM SUBROUTINE PRINT_COMPLEX_AS_REAL N1 N2 N3 A LD1 LD2 INTEGER N1 N2 N3 I J K REAL A LD1 LD2
149. s Sun Performance Library does not support the auxiliary routines, because auxiliary routines can change or be removed from LAPACK without notice. Because the auxiliary routines are not supported, they are not documented in the Sun Performance Library User's Guide or the section 3P man pages. Many auxiliary routines contain LA as the second and third characters in the routine name; however, some do not. Appendix B of the LAPACK Users' Guide contains a list of auxiliary routines. Auxiliary routines are available in the shared dynamic libraries and the static libraries. However, there is no guarantee that auxiliary routines will continue to be available in any form in future versions of the Sun Performance Library.

Netlib

Netlib is an online repository of mathematical software, papers, and databases maintained by AT&T Bell Laboratories, the University of Tennessee, Oak Ridge National Laboratory, and professionals from around the world. Netlib provides many libraries in addition to the libraries used in Sun Performance Library. While some of these libraries can appear similar to libraries used with Sun Performance Library, they can be different from, and incompatible with, Sun Performance Library. Using routines from other libraries can produce compatibility problems, not only with Sun Performance Library routines, but also with the base Netlib LAPACK routines. When using routines from other libraries, refer to the documentation provi
150. s Book

This book is a user's guide intended for programmers who have a working knowledge of the Fortran or C language and some understanding of the base LAPACK and BLAS libraries available from Netlib (http://www.netlib.org).

How This Book Is Organized

This book is organized into the following chapters and appendixes:

Chapter 1 describes the benefits of using the Sun Performance Library and the features of the Sun Performance Library.
Chapter 2 describes how to use the f95 and C interfaces provided with the Sun Performance Library.
Chapter 3 shows how to use compiler and linking options to maximize library performance for specific SPARC instruction set architectures and different parallel processing modes.
Chapter 4 includes information on matrix storage schemes, matrix types, and sparse matrices.
Chapter 5 describes the one-dimensional, two-dimensional, and three-dimensional fast Fourier transform routines provided with the Sun Performance Library.
Chapter 6 provides an introduction to the Interval Basic Linear Algebra Subroutine (IBLAS) library provided with the Sun Performance Library.
Appendix A lists the Sun Performance Library routines organized according to name, routine, and library.

What Is Not in This Book

This book does not repeat information included in existing LAPACK books or sources on Netlib. Refer to the next section, "Related Documents and Web Sites" on page 10, for a list of sources that contain ref
151. s an interval matrix from two floating point matrices General interval matrix copy Diagonal scaling an interval matrix If two interval matrices are disjoint Empty entry and its location If an interval matrix is enclosed in another Convex hull of two interval matrices Left endpoint of an interval matrix If an interval matrix is in interior of another Intersection of two interval matrices Two sided diagonal scaling Midpoint matrix of an interval matrix General interval matrix norms Permute an general interval matrix Right endpoint of an interval matrix Matrix transposition Convex hull of two interval matrices Elementwise width of an interval matrix Intersection of two interval matrices Appendix A Sun Performance Library Routines 155 TABLE A 11 Interval BLAS Routines Continued Routine Function hullv_i Convex hull of an interval vector with another infv_i The left endpoint of an interval vector interiorv_i If an interval vector is in the interior of another interv_i Intersection of an interval vector with another midv_i The approximate midpoint of an interval vector norm_i Interval vector norms permute_i Permute interval vector rscale_i Reciprocal scale of an interval vector sbmv_i Interval symmetric matrix vector product sb_acc_i Symmetric band matrix accumulation and scale sb_add_i Symmetric band matrix add and scale sb_constructm_i Constructs an interval matrix from two floating point
152. s are linked with -xparallel, -xexplicitpar, or -xautopar, each Sun Performance Library call will produce PARALLEL threads. The code will oversubscribe the machine if:

- One bound thread per CPU is created
- Each thread makes a Sun Performance Library call
- PARALLEL is set to a value greater than one

For codes using compiler parallelization, Sun Performance Library routines are parallelized with loop-based compiler directives. Because nested parallelism is not supported, Sun Performance Library calls made from a parallel region will not be further parallelized.

In the following code example, none of the calls to DGEMM is parallelized, because the loop is parallelized and only one level of parallelization is supported.

    !$<some parallelization directive>
          DO I = 1, N
             CALL DGEMM(...)
          END DO

The loop consists of many DGEMM instances running in parallel with one another, but each DGEMM instance uses only one thread.

In the following code example, the loop is not parallelized.

          DO I = 1, N
             CALL DGEMM(...)
          END DO

If the code is linked for parallelization with -xparallel, -xexplicitpar, or -xautopar, the individual calls to DGEMM will be parallelized. The number of threads used by each DGEMM call will be taken from the run-time value of the environment variable PARALLEL. However, if a higher-level loop has already parallelized this region, no further parallelization would be performed.
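The first case above (a parallelized outer loop around serial DGEMM calls) can be sketched as follows. This is a minimal illustration, not code from the guide: the excerpt does not say which directive is meant, so the OpenMP `!$omp parallel do` directive is an assumption, and the program name, array sizes, and variable names are hypothetical. DGEMM is called with the standard BLAS argument list, so the program must be linked against a BLAS implementation such as Sun Performance Library.

```fortran
! Hypothetical sketch: one level of parallelism around independent DGEMM calls.
! The OpenMP directive is an assumption; the excerpt only says
! "some parallelization directive". Without OpenMP enabled, the
! directive lines are ordinary comments and the loop runs serially.
program dgemm_loop
  implicit none
  integer, parameter :: n = 4, nmat = 3      ! illustrative sizes
  double precision :: a(n,n,nmat), b(n,n,nmat), c(n,n,nmat)
  integer :: i
  external dgemm                             ! standard BLAS routine

  call random_number(a)
  call random_number(b)
  c = 0.0d0

  ! Each iteration multiplies one independent pair of n-by-n matrices:
  ! c(:,:,i) = 1.0 * a(:,:,i) * b(:,:,i) + 0.0 * c(:,:,i)
  !$omp parallel do
  do i = 1, nmat
     call dgemm('N', 'N', n, n, n, 1.0d0, a(1,1,i), n, &
                b(1,1,i), n, 0.0d0, c(1,1,i), n)
  end do
  !$omp end parallel do

  ! Compare the first product with the intrinsic MATMUL as a sanity check.
  print *, 'max abs difference: ', &
           maxval(abs(c(:,:,1) - matmul(a(:,:,1), b(:,:,1))))
end program dgemm_loop
```

Removing the two directive lines gives the second case: the loop itself runs serially, and any parallelism then comes from inside each DGEMM call when the code is linked for parallelization.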
153. section 3P man pages for the individual routines For example to display the man page for the SFFTC routine type man s 3P sfftc Routine names must be lowercase For an overview of the FFT routines type man s 3P fft 73 74 Forward and Inverse FFT Routines TABLE 5 1 lists the names of the FFT routines and their calling sequence Double precision routine names are in square brackets See the individual man pages for detailed information on the data type and size of the arguments TABLE 5 1 FFT Routines and Their Arguments Routine Name Arguments Linear Routines CFFTS ZFFTD OPT SFFTC DFFTZ OPT CFFTSM ZFFTDM OPT N1 N2 SCALE X LDX1 Y WORK LWORK ERR SFFTCM DFFTZM OPT N1 N2 SCALE X LDX1 Y WORK LWORK ERR CFFTIC ZFFTZ OPT CFFTCM ZFFTZM OPT N1 N2 SCALE X LDX1 Y WORK LWORK ERR Two Dimensional Routines CFFTS2 ZFFTD2 OPT WORK SFFTC2 DFFTZ2 OPT WORK CFFTC2 ZFFTZ2 OPT WORK Three Dimensional Routines CFFTS3 ZFFTD3 OPT TRIGS SFFTC3 DFFTZ3 OPT TRIGS CFFTC3 ZFFTZ3 OPT TRIGS N1 N2 SCALE X LDX1 Y LWORK ERR Nl N2 SCALE X LDX1 Y LWORK ERR N1 N2 SCALE X LDX1 Y LWORK ERR N1 N2 N3 SCALE X LDX1 IFAC WORK LWORK ERR N1 N2 N3 SCALE X LDX1 IFAC WORK LWORK ERR N1 N2 N3 SCALE X LDX1 IFAC WORK LWORK ERR N1 SCALE X Y TRIGS IFAC WORK LWORK E
154. separate arguments.
... : The ellipsis indicates omission in a series. Notation: xinline=f1[,...fn]. Example: -xinline=alpha,dos

Shell Prompts

    Shell                                        Prompt
    C shell                                      machine-name%
    C shell superuser                            machine-name#
    Bourne shell and Korn shell                  $
    Superuser for Bourne shell and Korn shell    #

Accessing Compiler Collection Tools and Man Pages

The compiler collection components and man pages are not installed into the standard /usr/bin/ and /usr/share/man directories. To access the compilers and tools, you must have the compiler collection component directory in your PATH environment variable. To access the man pages, you must have the compiler collection man page directory in your MANPATH environment variable.

For more information about the PATH variable, see the csh(1), sh(1), and ksh(1) man pages. For more information about the MANPATH variable, see the man(1) man page. For more information about setting your PATH variable and MANPATH variables to access this release, see the installation guide or your system administrator.

Note: The information in this section assumes that your Sun ONE Studio Compiler Collection components are installed in the /opt directory. If your software is not installed in the /opt directory, ask your system administrator for the equivalent path on your system.

Accessing the Compilers and Tools

Use the steps below to determine whether you need to change your PATH varia
155. sform Calling VCOST twice will result in the original N 1 data points Calling COST twice will result in the original N 1 data points multiplied by 2N An even sequence x with symmetry such that x n x n 1 where n N 1 0 N is said to have quarter wave even symmetry COSQF and COSQB compute the FCT and its inverse respectively of a single real quarter wave even sequence VCOSQF and VCOSQB operate on one or more sequences The results of V COSQB are unormalized and if scaled by m the original sequences are obtained An FCT of a real sequence of length 2N that has quarter wave even symmetry requires N input data points and produces an N point resulting sequence Initialization is required before calling the transform routines by calling V COSQI Sun Performance Library User s Guide May 2003 Fast Sine Transforms Another type of symmetry that is commonly encountered is the odd symmetry where x n x n for n N 1 0 N As in the case of the fast cosine transform the fast sine transform FST takes advantage of the odd symmetry to save memory and computation For a real odd sequence x symmetry implies that x 0 x 0 0 Therefore if x is of length 2N then only N 1 values of x are required to compute the FST Routine SINT computes the FST of a single real odd sequence while VSINT computes the FST of one or more sequences Before calling V SINT V SINTI must be called to compute trigonometric c
156. ssl nrhs rhs ldrhs handle ier if ier ne 0 goto 110 Chapter 4 Working With Matrices 65 CODE EXAMPLE 4 2 Solving a Symmetric System Regular Interface Continued deallocate sparse solver storage c call dgssda handle ier if ier ne 0 goto 110 print values of sol write 6 200 i rhs i expected rhs i error do i 1 neqns write 6 300 i rhs i xexpct i rhs i xexpct 1 enddo stop 110 continue c c call to sparse solver returns an error c write 6 400 amp example FAILED sparse solver error number ier stop 200 format a5 3a20 300 format i5 3d20 12 i sol xexpct values 400 format a60 i20 fail message sparse solver error number end my_system s 95 dalign example_ss f xlic_lib sunperf my_sytem a out rhs i expected rhs i error 0 200000000000D 01 0 200000000000D 01 0 528466159722D 13 0 200000000000D 01 0 200000000000D 01 105249142734D 12 100000000000D 01 0 100000000000D 01 350830475782D 13 0 800000000000D 01 0 800000000000D 01 426325641456D 13 0 500000000000D 00 0 500000000000D 00 660582699652D 14 OPW WN FH oO D OO CO Oo 66 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 4 3 Solving a Structurally Symmetric System With Unsymmetric Values Regular Interface my_system cat example_su f program example_su c c This program is an example driver that calls the sparse solver
157. storage simple driver CHPGVX or Computes selected eigenvalues and eigenvectors of a generalized ZHPGVX Hermitian definite eigenproblem where the coefficient matrices are in packed storage expert driver CHPGVD or Computes all the eigenvalues and eigenvectors of a generalized ZHPGVD Hermitian definite eigenproblem where the coefficient matrices are in packed storage and uses a divide and conquer method to calculate eigenvectors CHPRE S or Improves the computed solution to a system of linear equations when the ZHPRES coefficient matrix is Hermitian indefinite in packed storage CHPSV or Computes the solution to a complex system of linear equations where the ZHPSV coefficient matrix is Hermitian in packed storage simple driver CHP SVX or Uses the diagonal pivoting factorization to compute the solution to a ZHPSVX complex system of linear equations where the coefficient matrix is Hermitian in packed storage expert driver CHPTRD or Reduces a complex Hermitian matrix stored in packed form to real ZHPTRD symmetric tridiagonal form CHPTRF or Computes the factorization of a complex Hermitian indefinite matrix in ZHPTRE packed storage using the diagonal pivoting method CHPTRI or Computes the inverse of a complex Hermitian indefinite matrix in packed ZHPTRI storage using the factorization computed by CHPTRF or ZHPTRF CHPTRS P or Solves a complex Hermitian indefinite matrix in packed storage using ZHPTRS P the factorization co
158. t N2 LDX2 gt N2 three LDY1 2xLDX1 LDY12 dimensional FFT 2x LDX1 LDY2 LDX2 LDY1 is even LDY2 gt N2 90 Sun Performance Library User s Guide May 2003 TABLE 5 4 Single Precision Three Dimensional FFT Routines Continued Name Purpose Size Type of Input Size Type of Output Leading Dimension Requirements In place Out of Place CFEICS OPT 0 initialization OPT 1 N1 x N2 x N3 N1 x N2 x N3 LDX1 gt N1 LDX1 2 N1 complex to Complex Complex LDX2 gt N2 LDX2 gt N2 complex forward LDY1 LDX1 LDY1 gt N1 three 2 gt NO dimensional FFT TDYZSLDXZ DYA Z OPT 1 complex N1 x N2 x N3 N1 x N2 x N3 LDX1 gt N1 LDX1 2 gt N1 to complex Complex Complex LDX2 gt N2 LDX2 gt N2 inyerse three LDY1 LDX1 LDY1 gt N1 dimensional FFT i l CSA LDY2 LDX2 LDY2 gt N2 TABLE 5 4 Notes m LDX1 is first leading dimension of input array m LDX2 is the second leading dimension of the input array m LDY1 is the first leading dimension of the output array m LDY2 is the second leading dimension of the output array m N1 is the first dimension of the FFT problem m N2 is the second dimension of the FFT problem m N3 is the third dimension of the FFT problem a When calling routines with OPT 0 to initialize the routine the only error checking that is done is to determine if N1 N2 N3 lt 0 CODE EXAMPLE 5 4 shows how to compute the three dimensional real to complex FFT and complex to real FFT of a thre
159. t it covers the most common case of factoring a single matrix and solving some number of right-hand sides. Additional calls to dgsssl are allowed to solve for additional right-hand sides, as shown in the following example.

          call dgssfs     ! initialization, input coefficient matrix structure,
                          ! fill-reducing ordering, symbolic factorization,
                          ! input coefficient matrix values, numeric factorization,
                          ! triangular solve
          do r = 1, number_of_right_hand_sides
             call dgsssl  ! triangular solve
          enddo

Routine Calling Order

To solve problems with the sparse solver package, use the sparse solver routines in the order shown in TABLE 4-4.

TABLE 4-4 Sparse Solver Routine Calling Order

    One Call Interface (for solving a single matrix)
      Start
      DGSSFS    Initialize, order, factor, solve
      DGSSSL    Additional solves (optional); repeat dgsssl as needed
      DGSSDA    Deallocate working storage
      Finish    End of One Call Interface

    Regular Interface (for solving multiple matrices with the same structure)
      Start
      DGSSIN    Initialize
      DGSSOR    Order
      DGSSFA    Factor
      DGSSSL    Solve; repeat dgssfa or dgsssl as needed
      DGSSDA    Deallocate working storage
      Finish    End of Regular Interface

Sparse Solver Examples

CODE EXAMPLE 4-1 shows solving a symmetric system using the one call interface, and CODE EXAMPLE 4-2 shows solving a symmetric sys
160. te P with the permutation vector BLAS_DPERMUTE P Permutes a real double precision array in terms of the permutation vector P output by DSORTV 160 Sun Performance Library User s Guide May 2003 TABLE A 12 Sort Routines Continued Routines Function BLAS_ISORT P Sorts an integer vector X in increasing or decreasing order using quick sort algorithm BLAS_ISORTV P Sorts a real vector X in increasing or decreasing order using quick sort algorithm and overwrite P with the permutation vector BLAS_IPERMUTE P Permutes an integer array in terms of the permutation vector P output by DSORTV BLAS_SSORT P Sorts a real vector X in increasing or decreasing order using quick sort algorithm BLAS_SSORTV P Sorts a real vector X in increasing or decreasing order using quick sort algorithm and overwrite P with the permutation vector BLAS_SPERMUTE P Permutes a real array in terms of the permutation vector P output by DSORTV Appendix A Sun Performance Library Routines 161 162 Sun Performance Library User s Guide May 2003 Index SYMBOLS g2 g3 g4 and g5 global integer registers 34 _64 appending to routine name 28 39 NUMERICS 2D FFT routines complex sequences as input 84 conjugate symmetry 85 data storage format 84 forward 2D FFT 84 inverse 2D FFT 84 real sequences as input 84 routines 74 85 32 bit addressing 38 3D FFT routines complex sequences as input 89 conjugate s
161. teger neqns ier msglvl outunt ldrhs nrhs character mtxtyp 2 pivot l ordmthd 3 double precision handle 150 integer colstr 6 rowind 10 double precision values 10 rhs 5 xexpct 5 integer i Chapter 4 Working With Matrices 69 70 CODE EXAMPLE 4 4 Sun Performance Library User s Guide May 2003 Solving an Unsymmetric System Regular Interface Continued c c Sparse matrix structure and value arrays Unsummetric matrix A Cc Ax b solve for x where G c 1 0 0 0 0 0 0 0 0 0 1 0 1 0 c 2 0 6 0 0 0 0 0 9 0 220 5970 c A 3 0 0 0 Ted 0 0 0 0 x 3 0 b 24 0 c 4 0 0 0 0 0 8 0 0 0 4 0 36 0 G 5 0 0 0 0 0 0 0 10 0 5 0 55 0 e data colstr 1 6 7 8 9 11 data rowand J Ly 25 gt 3p ay Sy 2e Be 4 2p DA data values 1 0d0 2 0d0 3 0d0 4 0d0 5 0d0 6 0d0 7 0d0 amp 8 0d0 9 0d0 10 0d0 data rhs 1 0d0 59 0d0 24 0d0 36 0d0 55 0d0 data xexpct 1 0d0 2 0d0 3 0d0 4 0d0 5 0d0 initialize solver mtxtyp uu pivot n neqns 5 outunt 6 msglvl 3 call dgssin mtxtyp pivot neqns colstr rowind amp outunt msglvl handle ier if ier ne 0 goto 110 ordering and symbolic factorization ordmthd mmd call dgssor ordmthd handle ier if ier ne 0 goto 110 numeric factorization call dgssfa neqns colstr rowind values handle ier if ier ne 0 goto 110 c solution e CODE EXAMPLE 4 4 Solving an Unsymmetric System Regular Interface
162. tem using the regular interface CODE EXAMPLE 4 1 Solving a Symmetric System One Call Interface my_system S cat example_lcall f program example_licall Cc c This program is an example driver that calls the sparse solver G It factors and solves a symmetric system by calling the c one call interface Cc implicit none integer neqns ier msglvl outunt ldrhs nrhs character mtxtyp 2 pivot l ordmthd 3 double precision handle 150 integer colstr 6 rowind 9 double precision values 9 rhs 5 xexpct 5 integer an c c Sparse matrix structure and value arrays From George and Liu c page 3 G Ax b solve for x where c c 4 0 1 0 2 0 0 5 20 230 7 0 c 1 0 0 5 0 0 0 0 0 0 2 0 3 0 c A 2 0 0 0 3 0 0 0 0 0 x 1 0 b 7 0 c 0 5 0 0 0 0 0 625 0 0 8 0 4 0 c 20 0 0 0 0 0 0 16 0 50 25 4 0 c data colstr 1 6 7 8 9 10 data rowind 1 2 3 4 5 2 3 4 5 data values 4 0d0 1 0d0 2 0d0 0 5d0 2 0d0 0 5d0 3 0d0 amp 0 625d0 16 0d0 data rhs 7 000 3 0d0 7 0d0 4 0d0 4 0d0 data xexpct 2 0d0 2 0d0 1 0d0 8 0d0 0 5d0 set calling parameters 62 Sun Performance Library User s Guide May 2003 CODE EXAMPLE 4 1 Solving a Symmetric System One Call Interface Continued G mtxtyp s pivot n neqns 5 nrhs 1 ldrhs outunt toil a U msglvl ordmthd mmd call single call interface call dgssfs mtxtyp pivot neqns colstr rowind values nrhs rhs ldrhs
163. that compute three dimensional FFT In this case the FFT is computed along all three dimensions of a three dimensional array The forward FFT computes N3 1 N2 1 NI 1 2nihm 2niln 2nijk N3 N2 N1 Xikanm LY Lhe e eM h 0 l 0 j 0 k 0 N1 1 n 0 N2 1 m 0 N3 1 and the inverse FFT computes N3 1 N2 1 NI 1 2Qnihm 2niln 2nijk x j l h E Xk nm e T E m 0 n 0 k 0 j 0 N1 1 1 0 N2 1 h 0 N3 1 In the complex to complex transform if the input problem is N1 x N2 x N3 a three dimensional transform will yield a complex array that is also N1 x N2 x N3 When computing a real to complex three dimensional transform if the real input array is of dimensions N1 x N2 x N3 the result will be a complex array of dimensions 4 1 x N2 x N3 Conversely when computing a complex to real FFT of dimensions N1 x N2 x N3 an 414 1 x N2xN3 complex array is required as input As with the real to complex and complex to real linear FFT because of conjugate symmetry only the first 1 complex data points need to be stored along the first dimension The complex subarray x 1 N1 1 can be obtained from xo 4 as follows Chapter 5 Using Sun Performance Library Signal Processing Routines 89 X k n m X N1 k n m N1 k 4 1 N1 1 5 N n 0 N2 1 m 0 N3 1 To compute a three dimensional transform an FFT routine must be called twice Once to initialize and once m
164. the IBLAS man pages and the BLAST standard In general actual argument shape inconsistencies cause IBLAS routines to return the largest impossible value of 1 for integer indices a default NaN for REAL values and the interval R 00 for computed intervals The normal BLAS error handling mechanism is also used to communicate actual parameter errors Chapter 6 Interval BLAS Routines 123 124 Binding Format Each interface is summarized as a SUBROUTINE or FUNCTION statement in which all the required and optional arguments appear Optional arguments are grouped in square brackets after the required arguments Binding format is illustrated with the Scaled Vector Sum Update AXPBY_I routine SUBROUTINE axpby_i x y alpha beta YPE INTERVAL lt wp gt INTENT IN x YPE INTERVAL lt wp gt INTENT INOUT y YPE INTERVAL lt wp gt INTENT IN OPTIONAL alpha beta Because generic interfaces are used the working precision denoted lt wp gt is implicitly defined by the following actual arguments lt wp gt KIND 4 KIND 8 KIND 16 Variables in IBLAS routines are INTEGER REAL or TYPE INTERVAL See the IBLAS man pages or the IBLAS white pager for individual routine bindings Language Bindings This section is a brief overview of the IBLAS Fortran routine names and their function With t
165. tine Function CHEEVX or ZHEEVX CHEGST or ZHEGST CHEGV or ZHEGV CHEGVD or ZHEGVD CHEGVX or ZHEGVX CHERF S or ZHERF S CHESV or ZHESV CHESVX or ZHESVX CHETRD or ZHETRD CHETRF or ZHERTF CHETRI or ZHETRI CHETRS P or ZHETRS P CHPCON or ZHPCON CHPEV or ZHPEV Q T las EVX or ZHPEVX CHPEVD or ZHPEVD Computes selected eigenvalues and eigenvectors of a Hermitian matrix expert driver Reduces a Hermitian definite generalized eigenproblem to standard form using the factorization computed by CPOTRF or ZPOTRF Replacement with newer version CHEGVD or ZHEGVD suggested Computes all the eigenvalues and eigenvectors of a complex generalized Hermitian definite eigenproblem Computes all the eigenvalues and eigenvectors of a complex generalized Hermitian definite eigenproblem and uses a divide and conquer method to calculate eigenvectors Computes selected eigenvalues and eigenvectors of a complex generalized Hermitian definite eigenproblem Improves the computed solution to a system of linear equations when the coefficient matrix is Hermitian indefinite Solves a complex Hermitian indefinite system of linear equations simple driver Solves a complex Hermitian indefinite system of linear equations simple driver Reduces a Hermitian matrix to real symmetric tridiagonal form by using a unitary similarity transformation Computes the factorization of a complex H
166. tines 89 FFT routines 77 data types arguments 110 degree of parallelism 43 DFT efficiency of FFT versus DFT 73 diagonal matrix 130 discrete Fourier transform See DFT documentation index 15 documentation accessing 15 to 16 DOSERIAL directive 44 164 Sun Performance Library User s Guide May 2003 E empty interval 121 enable trap 6 24 environment variable OMP_NUM_THREADS 44 PARALLEL 43 46 STACKSIZE 42 SUNW_MP_THR_IDLE 45 even sequences fast cosine transform routines 97 F 95 interfaces calling conventions 26 fast cosine transform routines 98 even sequences 97 forward and inverse 99 forward transform multiple quarter wave even sequences 101 forward transform quarter wave even sequence 100 inverse transform multiple quarter wave even sequences 101 inverse transform quarter wave even sequence 100 multiple sequences 100 quarter wave even sequences 97 fast Fourier transform See FFT fast sine transform routines 99 forward and inverse 101 forward and inverse multiple sequences 102 forward transform multiple quarter wave odd sequences 103 forward transform quarter wave odd sequence 102 inverse transform multiple quarter wave odd sequences 103 inverse transform quarter wave odd sequence 102 odd sequences 97 quarter wave odd sequences 97 FFT efficiency of FFT versus DFT 73 FFT routines 2D FFT routines 74 3D FFT routines 74 argum
167. tines described in the BLAST Standard. IBLAS routine names have the same prefixes as the BLAS routines. The prefixes identify the matrix type. TABLE 6-1 lists the IBLAS prefixes and matrix types.

TABLE 6-1 IBLAS Prefixes and Matrix Types

    Prefix    Matrix Type
    GE        General
    GB        General Banded
    SY        Symmetric
    SB        Symmetric Banded
    SP        Symmetric Packed
    TR        Triangular
    TB        Triangular Banded
    TP        Triangular Packed

As in the BLAS, sparse or complex interval matrices are not treated. A number of interval-specific set and utility IBLAS routines are given new BLAS-style names. See TABLE 6-9 through TABLE 6-12.

Fortran Interface

The IBLAS Fortran bindings are implemented in a module. Its interface block defines the default interval data type to be TYPE(INTERVAL). Interval BLAS routines are consistent with regard to:

- Generic interfaces
- Precision
- Rank
- Assumed-shape arrays
- Derived types
- Operator arguments

Error handling is described in the Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard and in the IBLAS white paper. Numeric error handling is not required, because exceptions are not possible in the closed interval system implemented in the compiler collection f95 compiler. Argument inconsistency errors are handled as described in the IBLAS white paper
168. tor X to a series of vectors stored in the columns of Y with the result placed into the columns of Z In that case INCX 1 INC1Y 1 INC2Y 2 NY INC1Z 1 INC2Z 2 NZ Another common form is applying a filter vector X to a series of vectors stored in the rows of Y and store the result in the row of Z in which case INCX 1 INC1LY NY INC2Y 1 INC1Z gt NZ and INC2Z 1 Sun Performance Library User s Guide May 2003 Convolution can be used to compute the products of polynomials CODE EXAMPLE 5 11 uses SCNVCOR to compute the product of 1 2x 3x and 4 5x 6x2 CODE EXAMPLE 5 11 One Dimensional Convolution Using Fourier Transform Method and REAL Data my_system cat con_ex21 f PROGRAM TEST INTEGER WORK NX NY NZ PARAMETER NX 3 PARAMETER NY NX PARAMETER NZ 2 NY 1 PARAMETER LWORK 4 NZ 32 REAL X NX Y NY Z NZ WORK LWORK C DATA X 1 2 3 Y 4 5 6 WORK LWORK 0 C PRINT 1000 X PRINT 1010 X PRINT 1000 Y PRINT 1010 Y CALL SCNVCOR V T NX X 1 1 SNY Oro ip PX Ly ee Ty NZ 1 Z 1 1 1 WORK LWORK PRINT 1020 2 PRINT 1010 Z 1000 FORMAT 1X Input vector Al 1010 FORMAT 1X 300F5 0 1020 FORMAT 1X Output vector A1 END my_system s 95 dalign con_ex21 f xlic_lib sunperf my_system a out Input vector X Ls 2i 3 Input ve
169. tric definite eigenproblem Computes all the eigenvalues and eigenvectors of a generalized symmetric definite eigenproblem and uses a divide and conquer method to calculate eigenvectors Improves the computed solution to a system of linear equations when the coefficient matrix is symmetric indefinite Solves a real symmetric indefinite system of linear equations simple driver Appendix A Sun Performance Library Routines 141 TABLE A 1 Routine LAPACK Linear Algebra Package Routines Continued Function XSYSVX SSYTRD or DSYTRD XSYTRE XSYTRI XSYTRS P Solves a real symmetric indefinite system of linear equations expert driver Reduces a symmetric matrix to real symmetric tridiagonal form by using a orthogonal similarity transformation Computes the factorization of a real symmetric indefinite matrix using the diagonal pivoting method Computes the inverse of a symmetric indefinite matrix using the factorization computed by xSYTRF Solves a system of linear equations by the symmetric matrix using the factorization computed by xSYTRF Triangular Band Matrix xTBCON XTBRES XTBTRS P Estimates the reciprocal condition number of a triangular band matrix Determines error bounds and estimates for solving a triangular banded system of linear equations Solves a triangular banded system of linear equations Triangular Matrix Generalized Problem Pair of Triangular Matrices XT
170. tric weight and factor tables or compute DFFT2B the two dimensional inverse FFT of a two dimensional double complex array ZFFTD3 P DFFT31 Initialize the trigonometric weight and factor tables or compute DFFT3B the three dimensional inverse FFT of three dimensional double complex array ZFFTDM VDFFTI Initialize the trigonometric weight and factor tables or compute VDFFTB P the one dimensional inverse FFT of a set of data sequences stored in a two dimensional double complex array ZFFTZ P ZFFTI Initialize the trigonometric weight and factor tables or compute ZFFTF P the one dimensional forward or inverse FFT of a double ZFFTB P complex sequence Appendix A Sun Performance Library Routines 151 TABLE A 7 FFT Routines Continued Routine Replaces Function ZFFTZ2 P ZFFT2I Initialize the trigonometric weight and factor tables or compute ZFFT2F P the two dimensional forward or inverse FFT of a two ZFFT2B P dimensional double complex array ZFFTZ3 P ZFFT3I Initialize the trigonometric weight and factor tables or compute ZFFT3F P the three dimensional forward or inverse FFT of three ZFFT3B P dimensional double complex array ZFFTZM P VZFFTI Initialize the trigonometric weight and factor tables or compute VZFFTF P the one dimensional forward or inverse FFT of a set of data VZFFTB P sequences stored in a two dimensional double complex array Fast Cosine and Sine Transforms Sun Performance
171. tructm_i tp_copy_i tp_disjm_i tp_emptyelem_i tp_encm_i tp_hullm_i tp_infm_i tp_interiorm_i tp_interm_i tp_midm_i tp_norm_i tp_supm_i tp_whullm_i tp_widthm_i tp_winterm_i trmm_i trmv_i trsm_i trsv_i tr acei tr _add_i tr_constructm_i tr_copy_i tr_disjm_i tr_emptyelem_i Intersection of two interval matrices Interval triangular matrix vector product Interval triangular solve with a vector Matrix accumulation and scale Triangular packed matrix add and scale Constructs an interval matrix from two floating point matrices Triangular packed interval matrix copy If two interval matrices are disjoint Empty entry and its location If an interval matrix is enclosed in another Convex hull of two interval matrices Left endpoint of an interval matrix If an interval matrix is in interior of another Intersection of two interval matrices Midpoint matrix of an interval matrix Triangular packed interval matrix norms Right endpoint of an interval matrix Convex hull of two interval matrices Elementwise width of an interval matrix Intersection of two interval matrices Triangular interval matrix matrix product Interval triangular matrix vector product Interval triangular solve Interval triangular solve with a vector Matrix accumulation and scale Triangular matrix add and scale Constructs an interval matrix from two floating point matrices Triangular interval matrix copy If two interval matrices are
172. type independence, compile-time checking, and optional arguments
- Consistent API across the different libraries in Sun Performance Library
- Compatibility with LAPACK 1.x, LAPACK 2.0, and LAPACK 3.0 libraries
- Increased performance, and in some cases, greater accuracy
- Optimizations for specific SPARC instruction set architectures
- Support for 64-bit enabled Solaris operating environment
- Support for parallel processing compiler options
- Support for multiple processor hardware options

Mathematical Routines

The Sun Performance Library routines are used to solve the following types of linear algebra and numerical problems:

- Elementary vector and matrix operations: Vector and matrix products; plane rotations; 1-, 2-, and infinity-norms; rank-1, 2, k, and 2k updates
- Linear systems: Solve full-rank systems, compute error bounds, solve Sylvester equations, refine a computed solution, equilibrate a coefficient matrix
- Least squares: Full-rank, generalized linear regression, rank-deficient, linear equality constrained
- Eigenproblems: Eigenvalues, generalized eigenvalues, eigenvectors, generalized eigenvectors, Schur vectors, generalized Schur vectors
- Matrix factorizations or decompositions: SVD, generalized SVD, QL and LQ, QR and RQ, Cholesky, LU, Schur, LDL^T and UDU^T
- Support operations: Condition number, in-place or out-of-place transpose, inverse, determinant, inertia
- S
173. uarter-wave even sequences write results are normalized do j 1 m write a3 1i1 a4 4 5 3 2x al seq j z j i i 1 len end do end

CODE EXAMPLE 5-7 Compute the FCT and the Inverse FCT of Two Real Quarter-wave Even Sequences (Continued)

my_system% f95 -dalign vcosq.f -xlic_lib=sunperf
my_system% a.out
Input sequences of length 4
seq1 0.557 0.352 0.990 0.539
seq2 0.603 0.867 0.417 0.156
Forward fast cosine transform for quarter-wave even sequences
seq1 0 755 392 029 0 224
seq2 0 729 0 097 091 132
Inverse fast cosine transform for quarter-wave even sequences (results are normalized)
seq1 0.557 0.352 0.990 0.539
seq2 0.603 0.867 0.417 0.156

Fast Sine Transform Examples

In CODE EXAMPLE 5-8, SINT is called to compute the FST and the inverse transform of a real odd sequence. If the real sequence is of length 2N, only N - 1 input data points need to be stored, and the number of resulting data points is also N - 1. The results are stored in the input array.

CODE EXAMPLE 5-8 Compute FST and the Inverse FST of a Real Odd Sequence

my_system% cat sint.f
program sint implicit none integer parameter len 4 real x 0 len 2 work 3 len 1 15 z 0O len 2 scale integer 1 call RANDOM_NUMBER x 0 len 2 z O len 2 x 0 len 2 scale 1 0 2
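
The storage rule stated above (N - 1 stored points in, N - 1 points out) can be illustrated with a direct type-I sine transform. The following is a minimal sketch under stated assumptions, not the library's SINT routine: the program name demo_dst, the helper dst1, and the normalization are hypothetical choices made for this illustration, and the transform is evaluated by a plain double loop rather than a fast algorithm.

```fortran
! Sketch only: a direct O(m**2) type-I sine transform with this example's
! own normalization. dst1 is a hypothetical helper, not the SINT routine.
program demo_dst
  implicit none
  integer, parameter :: n = 4                 ! the odd sequence has length 2*n
  double precision, parameter :: pi = 3.141592653589793d0
  double precision :: x(n-1), y(n-1), xr(n-1)

  x = (/ 0.5d0, -1.0d0, 2.0d0 /)              ! the n-1 stored points
  call dst1(n - 1, x, y)                      ! forward transform
  call dst1(n - 1, y, xr)                     ! apply the same transform again
  xr = (2.0d0 / n) * xr                       ! undo the factor n/2

  print '(a, 3f9.4)', 'input    ', x
  print '(a, 3f9.4)', 'forward  ', y
  print '(a, 3f9.4)', 'recovered', xr

contains

  ! b(k) = sum over j of a(j) * sin(pi*j*k/(m+1)), for k = 1..m
  subroutine dst1(m, a, b)
    integer, intent(in) :: m
    double precision, intent(in) :: a(m)
    double precision, intent(out) :: b(m)
    integer :: j, k
    do k = 1, m
      b(k) = 0.0d0
      do j = 1, m
        b(k) = b(k) + a(j) * sin(pi * j * k / (m + 1))
      end do
    end do
  end subroutine dst1

end program demo_dst
```

With this particular definition, applying the transform twice returns the input multiplied by n/2, which is why the sketch rescales by 2/n. The library routine may use a different scale factor, so the normalization here should not be read as a description of SINT.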
174. umber of a triangular matrix. Computes right and/or left eigenvectors of an upper triangular matrix. Reorders Schur factorization of matrix using an orthogonal or unitary similarity transformation. Determines error bounds and estimates for a triangular system of linear equations. Reorders Schur factorization of matrix to group selected cluster of eigenvalues in the leading positions on the diagonal of the upper triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace. Estimates the reciprocal condition numbers of selected eigenvalues and eigenvectors of an upper quasi-triangular matrix. Solves Sylvester matrix equation. Computes the inverse of a triangular matrix. Solves a triangular system of linear equations.

Trapezoidal Matrix
xTZRQF              Deprecated routine replaced by routine xTZRZF
xTZRZF              Reduces a rectangular upper trapezoidal matrix to upper triangular form by means of orthogonal transformations

Unitary Matrix
CUNGQL or ZUNGQL
CUNGQR or ZUNGQR
CUNGRQ or ZUNGRQ
CUNGTR or ZUNGTR
CUNGBR or ZUNGBR    Generates the unitary transformation matrices from reduction to bidiagonal form, as determined by CGEBRD or ZGEBRD
CUNGHR or ZUNGHR    Generates the orthogonal transformation matrix reduced to Hessenberg form, as determined by CGEHRD or ZGEHRD
CUNGLQ or ZUNGLQ    Generates a unitary matrix Q from an LQ factorization as returned
175. ute to the output
- The signal and/or response function can begin with one or more zeros that are not explicitly stored

Sun Performance Library Convolution and Correlation Routines

Sun Performance Library contains the convolution routines shown in TABLE 5-6.

TABLE 5-6 Convolution and Correlation Routines

Routine:    SCNVCOR, DCNVCOR, CCNVCOR, ZCNVCOR
Arguments:  CNVCOR, FOUR, NX, X, IFX, INCX, NY, NPRE, M, Y, IFY, INC1Y, INC2Y, NZ, K, Z, IFZ, INC1Z, INC2Z, WORK, LWORK
Function:   Convolution or correlation of a filter with one or more vectors

Routine:    SCNVCOR2, DCNVCOR2, CCNVCOR2, ZCNVCOR2
Arguments:  CNVCOR, METHOD, TRANSX, SCRATCHX, TRANSY, SCRATCHY, MX, NX, X, LDX, MY, NY, MPRE, NPRE, Y, LDY, MZ, NZ, Z, LDZ, WORKIN, LWORK
Function:   Two-dimensional convolution or correlation of two matrices

Routine:    SWIENER, DWIENER
Arguments:  POINTS, ACOR, XCOR, FLTR, EROP, ISW, IERR
Function:   Wiener deconvolution of two signals

The [S,D,C,Z]CNVCOR routines are used to compute the convolution or correlation of a filter with one or more input vectors. The [S,D,C,Z]CNVCOR2 routines are used to compute the two-dimensional convolution or correlation of two matrices.

Arguments for Convolution and Correlation Routines

The one-dimensional convolution and correlation routines use the arguments shown in TABLE 5-7.

TABLE 5-7 Arguments for One-Dimensional Convolution and Correlation Routines SCNVCOR, DCNVCOR, CCNVCOR, and ZCNVCOR

Argument    Definition
CNVCOR
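
To make concrete what "convolution or correlation of a filter with a vector" computes, the following is a minimal sketch that evaluates both sums directly for one short filter and one short signal. It is an illustration under stated assumptions, not a call to the CNVCOR routines: the program name demo_cnvcor, the array names, the zero-based indexing, and the choice to compute correlation only at non-negative lags are all hypothetical, and the library routines may order, pad, or scale their results differently.

```fortran
! Sketch only: direct evaluation of the convolution and correlation sums.
! This does not call or reproduce the library's CNVCOR routines.
program demo_cnvcor
  implicit none
  integer, parameter :: nx = 3, ny = 4
  real :: x(0:nx-1), y(0:ny-1)
  real :: conv(0:nx+ny-2), corr(0:ny-1)
  integer :: j, k

  x = (/ 1.0, 2.0, 3.0 /)          ! filter
  y = (/ 1.0, 0.0, -1.0, 2.0 /)    ! signal

  ! Convolution: conv(k) = sum over j of x(j) * y(k-j)
  conv = 0.0
  do j = 0, nx - 1
    do k = 0, ny - 1
      conv(j + k) = conv(j + k) + x(j) * y(k)
    end do
  end do

  ! Correlation at lags k >= 0: corr(k) = sum over j of x(j) * y(j+k),
  ! with y treated as zero outside its stored range
  corr = 0.0
  do k = 0, ny - 1
    do j = 0, min(nx - 1, ny - 1 - k)
      corr(k) = corr(k) + x(j) * y(j + k)
    end do
  end do

  print '(a, 6f6.1)', 'conv: ', conv
  print '(a, 4f6.1)', 'corr: ', corr
end program demo_cnvcor
```

For these inputs the convolution is 1, 2, 2, 0, 1, 6 and the correlation at lags 0 through 3 is -2, 4, 3, 2. The only difference between the two loops is the sign with which the filter index enters the signal index.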
176. utes SVD of general rectangular matrix. Computes an LU factorization of a general rectangular matrix using partial pivoting with row interchanges. Computes inverse of a general matrix using the factorization computed by xGETRF. Solves a general system of linear equations using the factorization computed by xGETRF.

General Matrix-Generalized Problem (Pair of General Matrices)
xGGBAK    Forms the right or left eigenvectors of a generalized eigenvalue problem based on the output by xGGBAL
xGGBAL    Balances a pair of general matrices for the generalized eigenvalue problem
xGGES     Computes the generalized eigenvalues, Schur form, and left and/or right Schur vectors for two nonsymmetric matrices
xGGESX    Computes the generalized eigenvalues, Schur form, and left and/or right Schur vectors
xGGEV     Computes the generalized eigenvalues and the left and/or right generalized eigenvectors for two nonsymmetric matrices
xGGEVX    Computes the generalized eigenvalues and the left and/or right generalized eigenvectors
xGGGLM    Solves the GLM (Generalized Linear Regression Model) using the GQR (Generalized QR) factorization
xGGHRD    Reduces two matrices to generalized upper Hessenberg form using orthogonal transformations
xGGLSE    Solves the LSE (Constrained Linear Least Squares Problem) using the GRQ (Generalized RQ) factorization
xGGRQF
xGGSVD
xGGSVP
xGGQRF    Computes generalized QR factorization of two matric
177. xX write a27 i1 Input sequences of length len do j 1 m write a3 i1 a4 4 5 3 2x al seq j x j 1 i 1 len end do call vsinqi len work call vsinqf m len z xt ld work write Forward fast sine transform for quarter-wave odd sequences do j 1 m write a3 i1 a4 4 5 3 2x al segt Jri S 4i rayi enyi end do

CODE EXAMPLE 5 2 Compute FST and Inverse FST of Two Real Quarter-Wave Odd Sequences (Continued)

call vsinqb m len z xt ld work write Inverse fast sine transform for quarter-wave odd sequences write results are normalized do j 1 m write a3 i1 a4 4 f 5 3 2x al seq j 2 j 1 i 1 len end do end

my_system% f95 vsinq.f -xlic_lib=sunperf
my_system% a.out
Input sequences of length 4
seq1 0.557 0.352 0.990 0.539
seq2 0.603 0.867 0.417 0.156
Forward fast sine transform for quarter-wave odd sequences
seq1 seq2 0 823 0 057 0 078 0 305 0 654 0 466 069 037
Inverse fast sine transform for quarter-wave odd sequences (results are normalized)
seq1 0.557 0.352 0.990 0.539
seq2 0.603 0.867 0.417 0.156

Convolution and Correlation

Two applications of the FFT that are frequently encountered, especially in the signal processing area, are the discrete convolution and disc
178. ymmetry 89
    data storage format 89
    forward 3D FFT 89
    inverse 3D FFT 89
    real sequences as input 89
    routines 74, 90
64-bit addressing 39
64-bit code
    C 41
    Fortran 95 40
    See also 64-bit enabled Solaris operating environment
64-bit enabled Solaris operating environment
    appending _64 to routine names 39
    compiling code 39
    integer promotion 40
64-bit integer arguments 27
    promoting integers to 64 bits 39, 40
64-bit integer interfaces, calling 39

A
accessible documentation 16
argument data types summary 110
arguments
    convolution and correlation 111
    FFT routines 74
automatic code restructuring tools 26

B
banded matrix 49
bidiagonal matrix 130
BLAS1 19, 145
BLAS2 19, 146
BLAS3 19, 147

C
C
    64-bit code 41
    array storage 34, 163
    routine calling conventions 33
C interfaces
    advantages 33
    compared to Fortran interfaces 33
    routine calling conventions 33
calling 64-bit integer interfaces 39
calling conventions
    C 33
    f77/f95 26
CLAPACK 21
compatibility, LAPACK 20, 22
compiler parallelization 45
compilers, accessing 13
compile-time checking 27
compressed sparse column (CSC) format 55
conjugate symmetric 76
conjugate symmetry
    2D FFT routines 85
    3D FFT routines 89
    FFT routines 76
convolution 108
convolution and correlation
    arguments 111
    routines 110
correlation 109
cosine transforms 96

D
-dalign 23, 38
data storage format
    2D FFT routines 84
    3D FFT rou
179. zed eliminating the need for type-dependent prefixes (S, D, C, or Z). In the FORTRAN 77 routines, the type must be specified as part of the routine name. For example, DGEMM is a double precision matrix multiply and SGEMM is a single precision matrix multiply. When calling GEMM with the Fortran 95 interfaces, Fortran will infer the type from the arguments that are passed. Passing single precision arguments to GEMM gets results that are equivalent to specifying SGEMM, and passing double precision arguments gets results that are equivalent to DGEMM. For example, CALL DSCAL(20, 5.26D0, X, 1) could be changed to CALL SCAL(20, 5.26D0, X, 1).
- Compile-Time Checking: In FORTRAN 77, it is generally impossible for the compiler to determine what arguments should be passed to a particular routine. In Fortran 95, the USE SUNPERF statement allows the compiler to determine the number, type, size, and shape of each argument to each Sun Performance Library routine. It can check the calls against the expected value and display errors during compilation.
- Optional Arguments: Sun Performance Library supports interfaces where some arguments are optional. In FORTRAN 77, all arguments must be specified in the order determined by the interface for all routines. All interfaces will support f95-style OPTIONAL attributes on arguments that are not required. Using routines with optional arguments, such as GEMM, is useful for new development. Spec
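
The type independence and compile-time checking described in this excerpt both come from a standard Fortran 95 mechanism: a generic interface declared in a module. The following is a minimal sketch of that mechanism, assuming nothing about how the SUNPERF module itself is written. The module my_generic_scal, the generic name my_scal, and the specific routines my_sscal and my_dscal are hypothetical stand-ins invented for this illustration, and the sketch assumes a positive increment.

```fortran
! Sketch only: how a Fortran 95 generic name resolves to a type-specific
! routine. All names here are hypothetical; this is not library code.
module my_generic_scal
  implicit none
  private
  public :: my_scal

  integer, parameter :: sp = kind(1.0e0), dp = kind(1.0d0)

  ! One generic name covering a single and a double precision routine
  interface my_scal
    module procedure my_sscal, my_dscal
  end interface

contains

  ! Single precision: scale n elements of x, stride incx (assumed > 0)
  subroutine my_sscal(n, alpha, x, incx)
    integer, intent(in) :: n, incx
    real(sp), intent(in) :: alpha
    real(sp), intent(inout) :: x(*)
    integer :: i
    do i = 1, n
      x(1 + (i - 1) * incx) = alpha * x(1 + (i - 1) * incx)
    end do
  end subroutine my_sscal

  ! Double precision: same operation on double precision data
  subroutine my_dscal(n, alpha, x, incx)
    integer, intent(in) :: n, incx
    real(dp), intent(in) :: alpha
    real(dp), intent(inout) :: x(*)
    integer :: i
    do i = 1, n
      x(1 + (i - 1) * incx) = alpha * x(1 + (i - 1) * incx)
    end do
  end subroutine my_dscal

end module my_generic_scal

program demo_generic
  use my_generic_scal
  implicit none
  real :: xs(3) = (/ 1.0, 2.0, 3.0 /)
  double precision :: xd(3) = (/ 1.0d0, 2.0d0, 3.0d0 /)

  call my_scal(3, 2.0, xs, 1)      ! resolves to my_sscal
  call my_scal(3, 2.0d0, xd, 1)    ! resolves to my_dscal
  print *, xs
  print *, xd
end program demo_generic
```

Because the interface is visible through the USE statement, a call that mixes precisions, such as passing the single precision constant 2.0 together with the double precision array xd, matches neither specific routine and is rejected by the compiler instead of failing at run time.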
