
Forte/Sun Performance Library 6 Collection User's Guide

1. As an example of a program that uses Sun Performance Library routines from user-managed threads, consider a real-time signal processing application running on a 4-processor server. One processor is dedicated to acquiring the data, two processors are dedicated to performing FFTs on the data, and one processor is dedicated to postprocessing the data after the FFTs. The application begins by creating multiple running instances of the function that performs the FFT:

    for (i = 0; i < NCPUS_FOR_FFT; i++) {
        who[i] = i;
        do_fft[i] = 0;
        fft_done_buff_available[i] = 1;
        (void) thr_create((void *) 0, (size_t) 0, fft_func,
                          (void *) &who[i], (long) 0, (thread_t *) 0);
    }

The code below is a simplified implementation of part of fft_func, which is started by thr_create in the loop above. Production code should check the return value from thr_create and should use semaphores rather than busy waits at the synchronization points in the code below.

    cpu_id = *who_am_i;
    while (1) {
        while (!do_fft[cpu_id])
            ;
        rfftf(n, &dataset[0][cpu_id], &scratch[0][cpu_id]);
        while (!fft_done_buff_available[cpu_id])
            ;
        fft_done_buff_available[cpu_id] = 0;
        scopy(n, &dataset[0][cpu_id], 1, &fft_done_buff[0][cpu_id], 1);
        do_fft[cpu_id] = 0;
    }

Sun Performance Library User's Guide, May 2000

CHAPTER 3 SPARC Optimization and Parallel Processing

This chapter describes how to use compiler
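The text recommends semaphores instead of busy waits. The following C sketch shows that pattern under stated assumptions: it uses POSIX pthread_create and unnamed POSIX semaphores (sem_init, sem_wait, sem_post) rather than the Solaris thr_create call shown above, and fake_fft is a hypothetical placeholder for rfftf that only doubles each sample so the result is easy to check. The names slot_t, fft_worker, run_pipeline, and NSAMPLES are illustrative and are not part of Sun Performance Library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

#define NCPUS_FOR_FFT 2   /* worker threads, as in the loop above */
#define NSAMPLES 8        /* samples per dataset (illustrative) */

typedef struct {
    float data[NSAMPLES];
    int stop;             /* set by the producer to shut the worker down */
    sem_t work_ready;     /* replaces the spin on do_fft[cpu_id] */
    sem_t work_done;      /* replaces the spin on fft_done_buff_available[cpu_id] */
} slot_t;

static slot_t slots[NCPUS_FOR_FFT];

/* Hypothetical stand-in for rfftf(): doubles each sample so results are checkable. */
static void fake_fft(float *x, int n) {
    for (int k = 0; k < n; k++)
        x[k] *= 2.0f;
}

static void *fft_worker(void *arg) {
    slot_t *s = arg;
    for (;;) {
        sem_wait(&s->work_ready);   /* block, without spinning, until data arrives */
        if (s->stop)
            break;
        fake_fft(s->data, NSAMPLES);
        sem_post(&s->work_done);    /* tell the postprocessing side the result is ready */
    }
    return NULL;
}

/* Pushes `rounds` batches through the workers; returns the sum of all outputs. */
double run_pipeline(int rounds) {
    pthread_t tid[NCPUS_FOR_FFT];
    double total = 0.0;
    assert(rounds >= 0);

    for (int i = 0; i < NCPUS_FOR_FFT; i++) {
        slots[i].stop = 0;
        sem_init(&slots[i].work_ready, 0, 0);
        sem_init(&slots[i].work_done, 0, 0);
        if (pthread_create(&tid[i], NULL, fft_worker, &slots[i]) != 0)
            abort();                /* thread creation result is checked, as advised */
    }
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < NCPUS_FOR_FFT; i++) {   /* "acquire" a dataset */
            for (int k = 0; k < NSAMPLES; k++)
                slots[i].data[k] = (float)(r + i + k);
            sem_post(&slots[i].work_ready);
        }
        for (int i = 0; i < NCPUS_FOR_FFT; i++) {   /* "postprocess" the results */
            sem_wait(&slots[i].work_done);
            for (int k = 0; k < NSAMPLES; k++)
                total += slots[i].data[k];
        }
    }
    for (int i = 0; i < NCPUS_FOR_FFT; i++) {       /* orderly shutdown */
        slots[i].stop = 1;
        sem_post(&slots[i].work_ready);
        pthread_join(tid[i], NULL);
        sem_destroy(&slots[i].work_ready);
        sem_destroy(&slots[i].work_done);
    }
    return total;
}
```

With the placeholder transform, run_pipeline(1) returns 128.0. Because each worker blocks in sem_wait, an idle FFT thread consumes no processor time, which is the point of replacing the spin loops on do_fft and fft_done_buff_available.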
2.
     &   ier
      stop
  200 format(a5,3a20)
  300 format(i5,3d20.12)   ! i, sol, xexpct, values
  400 format(a60,i20)      ! fail message, sparse solver error number
      end

my_system% f95 -dalign example_1call.f -xlic_lib=sunperf
my_system% a.out
    i              rhs(i)     expected rhs(i)               error
    1  0.200000000000D+01  0.200000000000D+01  0.528466159722D-13
    2  0.200000000000D+01  0.200000000000D+01  0.105249142734D-12
    3  0.100000000000D+01  0.100000000000D+01  0.350830475782D-13
    4 -0.800000000000D+01 -0.800000000000D+01  0.426325641456D-13
    5 -0.500000000000D+00 -0.500000000000D+00  0.660582699652D-14

CODE EXAMPLE 4-2 Solving a Symmetric System (Regular Interface)

my_system% cat example_ss.f
      program example_ss
c
c This program is an example driver that calls the sparse solver.
c It factors and solves a symmetric system.
c
      implicit none
      integer neqns, ier, msglvl, outunt, ldrhs, nrhs
      character mtxtyp*2, pivot*1, ordmthd*3
      double precision handle(150)
      integer colstr(6), rowind(9)
      double precision values(9), rhs(5), xexpct(5)
      integer i
c
c Sparse matrix structure and value arrays. From George and Liu,
c page 3.
c Ax = b, (solve for x) where:
c
c     4.0  1.0  2.0  0.5    2.0          2.0           7.0
c     1.0  0.5  0.0  0.0    0.0          2.0           3.0
c A = 2.0  0.0  3.0  0.0    0.0    x =   1.0    b =    7.0
c     0.5  0.0  0.0  0.625  0.0         -8.0          -4.0
c     2.0  0.0  0.0  0.0   16.0         -0.5          -4.0
c
      data colstr / 1, 6, 7, 8, 9, 10 /
      data rowind / 1, 2, 3
3. Display the current value of the PATH variable by typing:

    % echo $PATH

Review the output for a string of paths containing /opt/SUNWspro/bin. If you find the paths, your PATH variable is already set to access Sun WorkShop development tools. If you do not find the paths, set your PATH environment variable by following the instructions in this section.

To determine if you need to set your MANPATH environment variable:

1. Request the workshop man page by typing:

    % man workshop

2. Review the output, if any.

If the workshop(1) man page cannot be found, or if the man page displayed is not for the current version of the software installed, follow the instructions in this section for setting your MANPATH environment variable.

Note: The information in this section assumes that your Sun WorkShop 6 products were installed in the /opt directory. Contact your system administrator if your Sun WorkShop software is not installed in /opt.

The PATH and MANPATH variables should be set in your home .cshrc file if you are using the C shell, or in your home .profile file if you are using the Bourne or Korn shells.

- To use Sun WorkShop commands, add the following to your PATH variable:

    /opt/SUNWspro/bin

- To access Sun WorkShop man pages with the man command, add the following to your MANPATH variable:

    /opt/SUNWspro/man

For more information about the PATH variable, see the csh(1), sh(1), and ksh(1) man pages. For mo
4. on page 33.

Enabling Trap 6

If an application cannot be compiled using -dalign, enable trap 6 to provide a handler for misaligned data. To enable trap 6 on SPARC, do the following:

1. Place this assembly code in a file called trap6_handler.s:

        .global trap6_handler_
        .text
        .align 4
    trap6_handler_:
        retl
        ta 6

2. Assemble trap6_handler.s:

    my_system% fbe trap6_handler.s

   The first parallelizable subroutine invoked from Sun Performance Library will call a routine named trap6_handler_. If a trap6_handler_ is not specified, Sun Performance Library will call a default handler that does nothing. Not supplying a handler for any misaligned data will cause a trap that will be fatal. fbe(1) is the Sun WorkShop assembler for SPARC platforms.

3. Include trap6_handler.o on the command line:

    my_system% f95 any.f trap6_handler.o -xlic_lib=sunperf

CHAPTER 2 Using Sun Performance Library

This chapter describes using the Sun Performance Library to improve the execution speed of applications written in FORTRAN 77, Fortran 95, or C. Although some modifications to applications might be required to gain peak performance, many applications can benefit significantly from using Sun Performance Library without making source code changes or recompiling.

Improving Application Performance

Use Sun Performance Library in the follo
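The -dalign option makes the compiler assume that double-precision data is aligned on 8-byte (double-word) boundaries. When data comes from a source you do not control, a check like the following C sketch can flag a misaligned buffer before it reaches code built with that assumption. The is_aligned helper is a hypothetical illustration, not part of Sun Performance Library, and it does not replace the trap 6 handler described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of `alignment` bytes, else 0.
 * `alignment` must be a power of two (8 for double-word alignment). */
int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

For example, is_aligned(buf, 8) reports whether buf satisfies the double-word alignment that -dalign code expects.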
5. and C compilers.

Who Should Use This Book

This is a reference manual intended for programmers who have a working knowledge of the Fortran or C language and some understanding of the base LAPACK, BLAS, FFTPACK, VFFTPACK, and LINPACK libraries available from Netlib (http://www.netlib.org).

What Is in This Book

This book is organized into the following chapters and appendixes:

Chapter 1, "Introduction," describes the benefits of using the Sun Performance Library and the features of the Sun Performance Library.

Chapter 2, "Using Sun Performance Library," describes how to use the f77/f95 and C interfaces provided with the Sun Performance Library.

Chapter 3, "SPARC Optimization and Parallel Processing," shows how to use compiler and linking options to maximize library performance for specific SPARC instruction set architectures and different parallel processing modes.

Chapter 4, "Working With Matrices," includes information on matrix storage schemes, matrix types, and sparse matrices.

Appendix A, "Sun Performance Library Routines," lists the Sun Performance Library routines organized according to name, routine, and library.

What Is Not in This Book

This book does not repeat information included in existing LAPACK and LINPACK books or sources on Netlib. Refer to the section "Related Documents and Web Sites" on page 4 for a list of sources that contain reference material for the base routines upon which Su
6. org/sparse
http://math.nist.gov/spblas

LINPACK: http://www.netlib.org/linpack

Related Sun WorkShop 6 Documentation

You can access documentation related to the subject matter of this book in the following ways:

- Through the Internet at the docs.sun.com Web site. You can search for a specific book title, or you can browse by subject, document collection, or product at the following Web site: http://docs.sun.com
- Through the installed Sun WorkShop products on your local system or network. Sun WorkShop 6 HTML documents (manuals, online help, man pages, component readme files, and release notes) are available with your installed Sun WorkShop 6 products. To access the HTML documentation, do one of the following:
  - In any Sun WorkShop or Sun WorkShop TeamWare window, choose Help > About Documentation.
  - In your Netscape Communicator 4.0 or compatible version browser, open the following file: /opt/SUNWspro/docs/index.html

  Contact your system administrator if your Sun WorkShop software is not installed in the /opt directory.

  Your browser displays an index of Sun WorkShop 6 HTML documents. To open a document in the index, click the document's title.

TABLE P-3 lists related Sun WorkShop 6 manuals by document collection.

TABLE P-3 Related Sun WorkShop 6 Documentation by Document Collection
Document Collection | Document Title
Forte De
7. refer to the man pages. Optional arguments are enclosed in square brackets. For example, the SAXPY routine is defined as follows in the man page:

      SUBROUTINE SAXPY([N], ALPHA, X, [INCX], Y, [INCY])
      REAL ALPHA
      INTEGER INCX, INCY, N
      REAL X(*), Y(*)

Note that the arguments N, INCX, and INCY are optional.

Suppose the user tries to call the SAXPY routine with the following arguments:

      USE SUNPERF
      COMPLEX ALPHA
      REAL X(100), Y(100), XA(100,100), RALPHA
      INTEGER INCX, INCY

If mismatches in the type, shape, or number of arguments occur, the compiler issues the following error message:

      ERROR: No specific match can be found for the generic subprogram call "AXPY".

Using the arguments defined above, the following examples show incorrect calls to the SAXPY routine due to type, shape, or number mismatches:

- Incorrect type of the arguments. If SAXPY is called as follows:

      CALL AXPY(100, ALPHA, X, INCX, Y, INCY)

  a compiler error occurs because the variable ALPHA is type COMPLEX, but the interface describes it as being type REAL.

- Incorrect shape of the arguments. If SAXPY is called as follows:

      CALL AXPY(N, RALPHA, XA, INCX, Y, INCY)

  a compiler error occurs because the XA argument is two-dimensional, but the interface is expecting a one-dimensional argument.

- Incorrect number of arguments. If SA
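C has no optional or generic arguments, so a C view of SAXPY has to spell out every argument. The sketch below is a plain reference-style loop showing what the call computes, y := alpha*x + y, with the usual reference-BLAS handling of the increments (a negative increment steps through the vector backwards from its last element). The name saxpy_ref is hypothetical; it is not the Sun Performance Library entry point.

```c
#include <assert.h>

/* Illustrative y := alpha*x + y over n elements, with strides incx and incy.
 * A negative stride starts at the far end of the vector, as in reference BLAS. */
void saxpy_ref(int n, float alpha, const float *x, int incx, float *y, int incy) {
    if (n <= 0 || alpha == 0.0f)
        return;                       /* nothing to do, y is unchanged */
    int ix = (incx < 0) ? (1 - n) * incx : 0;
    int iy = (incy < 0) ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

For example, with x = {1, 2, 3} and y = {10, 10, 10}, saxpy_ref(3, 2.0f, x, 1, y, 1) leaves y = {12, 14, 16}.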
8. symmetric-definite eigenproblem, where the coefficient matrices are in packed storage (simple driver)
SSPGVX or DSPGVX | Computes selected eigenvalues and eigenvectors of a real generalized symmetric-definite eigenproblem, where the coefficient matrices are in packed storage (expert driver)
xSPRFS | Improves the computed solution to a system of linear equations when the coefficient matrix is symmetric indefinite in packed storage
xSPSV | Computes the solution to a system of linear equations where the coefficient matrix is a symmetric matrix in packed storage (simple driver)
xSPSVX | Uses the diagonal pivoting factorization to compute the solution to a system of linear equations where the coefficient matrix is a symmetric matrix in packed storage (expert driver)
SSPTRD or DSPTRD | Reduces a real symmetric matrix stored in packed form to real symmetric tridiagonal form using an orthogonal similarity transform
xSPTRF | Computes the factorization of a symmetric packed matrix using the Bunch-Kaufman diagonal pivoting method
xSPTRI | Computes the inverse of a symmetric indefinite matrix in packed storage using the factorization computed by xSPTRF
xSPTRS | Solves a system of linear equations with the symmetric matrix stored in packed format, using the factorization computed by xSPTRF

Real Symmetric Tridiagonal Matrix
SSTEBZ or DSTEBZ | Computes the eigenvalues of a real symmetric tridiagonal matrix
xSTEDC | Computes all the eigenvalues and eige
xSTEGR
xSTEIN
xSTEQR
SSTERF or DSTERF
SSTEV or DSTEV
9. 6 handler as described in "Enabling Trap 6" on page 15. When using C, do not use -misalign.

Starting Threads

When Sun Performance Library starts threads in shared mode, it uses a stack size that it determines as follows:

1. Checks the value of the STACKSIZE environment variable and interprets the units as kbytes (1024 bytes).
2. Computes the maximum stack size required by Sun Performance Library.
3. Uses the larger of the values determined in steps 1 and 2 for the size of the stack in the created thread.

When Sun Performance Library starts threads in dedicated mode, use the STACKSIZE environment variable to specify a stack size of at least 4 MB:

    setenv STACKSIZE 4000

Parallel Processing Examples

The following sections demonstrate using the PARALLEL environment variable and the compile and linking options for creating code that supports using:

- A single processor
- Multiple processors in shared mode
- Multiple processors in dedicated mode

Using a Single Processor

To use a single processor:

1. Call one or more of the routines.
2. Set PARALLEL equal to 1.
3. Link with -xlic_lib=sunperf specified at the end of the command line.

Do not compile or link with -parallel, -explicitpar, or -autopar.

For example, compile and link with libsunperf.so (default):

    cc -dalign -xarch=... any.c -xlic_lib=sunperf

or

    f77 -dalign -xarch=... any.f -xlic_lib=sunper
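The three-step stack-size rule above can be expressed as a small function. The C sketch below is only an illustration of that rule, not the library's code: choose_stack_bytes is a hypothetical name, and library_min_bytes stands in for the size Sun Performance Library computes in step 2, which this guide does not give. An unset or non-numeric STACKSIZE is treated here as contributing nothing, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* stacksize_env: the string value of STACKSIZE (in kbytes), or NULL if unset.
 * library_min_bytes: hypothetical stand-in for the library's own requirement.
 * Returns the larger of the two, in bytes. */
size_t choose_stack_bytes(const char *stacksize_env, size_t library_min_bytes) {
    size_t from_env = 0;
    if (stacksize_env != NULL) {
        char *end;
        unsigned long kb = strtoul(stacksize_env, &end, 10);
        if (end != stacksize_env && *end == '\0')   /* accept only a plain number */
            from_env = (size_t)kb * 1024u;          /* step 1: kbytes to bytes */
    }
    /* step 3: use the larger of the two sizes */
    return from_env > library_min_bytes ? from_env : library_min_bytes;
}
```

A caller would pass getenv("STACKSIZE") as the first argument. With setenv STACKSIZE 4000, the environment contributes 4000 * 1024 = 4,096,000 bytes.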
10.
     &   / 1, 2, 1, 2, 4, 3, 2, 4 /
      data values / 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0, 6.0d0, 7.0d0,
     &              8.0d0 /
      data rhs / 7.0d0, 38.0d0, 18.0d0, 42.0d0 /
      data xexpct / 1.0d0, 2.0d0, 3.0d0, 4.0d0 /
c
c  initialize solver
c
      mtxtyp = 'su'
      pivot = 'n'
      neqns = 4
      outunt = 6
      msglvl = 0

CODE EXAMPLE 4-3 Solving a Structurally Symmetric System With Unsymmetric Values (Regular Interface) (Continued)

c
c  call regular interface
c
      call dgssin(mtxtyp, pivot, neqns, colstr, rowind,
     &            outunt, msglvl, handle, ier)
      if (ier .ne. 0) goto 110
c
c  ordering and symbolic factorization
c
      ordmthd = 'mmd'
      call dgssor(ordmthd, handle, ier)
      if (ier .ne. 0) goto 110
c
c  numeric factorization
c
      call dgssfa(neqns, colstr, rowind, values, handle, ier)
      if (ier .ne. 0) goto 110
c
c  solution
c
      nrhs = 1
      ldrhs = 4
      call dgsssl(nrhs, rhs, ldrhs, handle, ier)
      if (ier .ne. 0) goto 110
c
c  deallocate sparse solver storage
c
      call dgssda(handle, ier)
      if (ier .ne. 0) goto 110
c
c  print values of sol
c
      write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
      do i = 1, neqns
        write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
      enddo
      stop
  110 continue
c
c  call to sparse solver returns an error
c
      write(6,400)
11. Fortran 95 interfaces and the Netlib LAPACK version 3.0 interfaces. If using LAPACK 90, refer to the documentation provided with that library.

For the base libraries supported by Sun Performance Library, Netlib provides detailed information that can supplement this user's guide. The LAPACK 3.0 Users' Guide describes LAPACK algorithms and how to use the routines. However, these documents do not describe the Sun-specific extensions made to the base routines.

Sun Performance Library Features

Sun Performance Library provides the following optimizations and extensions to the base Netlib libraries:

- Extensions that support Fortran 95 and C language interfaces
- Fortran 95 language features, including type independence, compile-time checking, and optional arguments
- Consistent API across the different libraries in Sun Performance Library
- Compatibility with LAPACK 1.x, LAPACK 2.0, and LAPACK 3.0 libraries
- Increased performance, and in some cases greater accuracy
- Optimizations for specific SPARC instruction set architectures
- Support for 64-bit code on UltraSPARC
- Support for parallel processing compiler options
- Support for multiple processor hardware options

Mathematical Routines

The Sun Performance Library routines are used to solve the following types of linear algebra and numerical problems:

- Elementary vector and matrix operations
  - Vector and matrix products, plane ro
12. SGER or DGER, CGERC or ZGERC, CGERU or ZGERU | Rank-1 update to a general matrix
CHBMV or ZHBMV | Product of a Hermitian matrix in banded storage and a vector
CHEMV or ZHEMV | Product of a Hermitian matrix and a vector
CHER or ZHER | Rank-1 update to a Hermitian matrix
CHER2 or ZHER2 | Rank-2 update to a Hermitian matrix
CHPMV or ZHPMV | Product of a Hermitian matrix in packed storage and a vector

TABLE A-3 BLAS2 (Basic Linear Algebra Subprograms, Level 2) Routines (Continued)
Routine | Function
CHPR or ZHPR | Rank-1 update to a Hermitian matrix in packed storage
CHPR2 or ZHPR2 | Rank-2 update to a Hermitian matrix in packed storage
SSBMV or DSBMV | Product of a symmetric matrix in banded storage and a vector
xSPMV | Product of a symmetric matrix in packed storage and a vector
SSPR or DSPR | Rank-1 update to a real symmetric matrix in packed storage
SSPR2 or DSPR2 | Rank-2 update to a real symmetric matrix in packed storage
SSYMV or DSYMV | Product of a symmetric matrix and a vector
SSYR or DSYR | Rank-1 update to a real symmetric matrix
SSYR2 or DSYR2 | Rank-2 update to a real symmetric matrix
xTBMV | Product of a triangular matrix in banded storage and a vector
xTBSV | Solution to a triangular system of linear equations in banded storage
xTPMV | Product of a triangular matrix in packed storage and a vector
xTPSV | Solution to a triangular system of lin
xTRMV
xTRSV
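To make the packed-storage entries in this table concrete, the following C sketch computes what xSPMV computes, y := alpha*A*x + beta*y, for a symmetric A supplied as its upper triangle packed column by column (the UPLO = 'U' layout). It is an illustrative reference loop under those assumptions, with unit strides only; sspmv_upper_ref and packed_upper_index are hypothetical names, not Sun Performance Library routines.

```c
#include <assert.h>

/* 0-based position of A(i,j), i <= j, in upper packed column-major storage:
 * column j starts after the 1 + 2 + ... + j entries of the earlier columns. */
static int packed_upper_index(int i, int j) {
    assert(0 <= i && i <= j);
    return i + j * (j + 1) / 2;
}

/* Illustrative y := alpha*A*x + beta*y, A symmetric n-by-n in packed upper form. */
void sspmv_upper_ref(int n, float alpha, const float *ap, const float *x,
                     float beta, float *y) {
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int j = 0; j < n; j++) {
            /* symmetry: A(i,j) == A(j,i), so read the copy in the upper triangle */
            int k = (i <= j) ? packed_upper_index(i, j) : packed_upper_index(j, i);
            sum += ap[k] * x[j];
        }
        /* when beta is zero, y is overwritten rather than scaled */
        y[i] = alpha * sum + (beta == 0.0f ? 0.0f : beta * y[i]);
    }
}
```

For the 3-by-3 matrix with rows (2, 1, 0), (1, 3, 4), (0, 4, 5), the packed array is {2, 1, 3, 0, 4, 5}, and multiplying by x = (1, 1, 1) gives (3, 8, 9).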
13. xSPCON | Estimates the reciprocal of the condition number of a symmetric packed matrix using the factorization computed by xSPTRF
SSPEV or DSPEV | Replacement with newer version SSPEVD or DSPEVD suggested. Computes all the eigenvalues and eigenvectors of a symmetric matrix in packed storage (simple driver)
SSPEVX or DSPEVX | Computes selected eigenvalues and eigenvectors of a symmetric matrix in packed storage (expert driver)
SSPEVD or DSPEVD | Computes all the eigenvalues and eigenvectors of a symmetric matrix in packed storage and uses a divide-and-conquer method to calculate eigenvectors
SSPGST or DSPGST | Reduces a real symmetric-definite generalized eigenproblem to standard form, where the coefficient matrices are in packed storage, and uses the factorization computed by SPPTRF or DPPTRF
SSPGVD or DSPGVD | Computes all the eigenvalues and eigenvectors of a real generalized symmetric-definite eigenproblem, where the coefficient matrices are in packed storage, and uses a divide-and-conquer method to calculate eigenvectors

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine | Function
SSPGV or DSPGV | Replacement with newer version SSPGVD or DSPGVD suggested. Computes all the eigenvalues and eigenvectors of a real generalized
SSPGVX or DSPGVX
xSPRFS
xSPSV
xSPSVX
SSPTRD or DSPTRD
xSPTRF
xSPTRI
xSPTRS
14. Sets user-specified ordering permutation
DGSSRP | Returns permutation used by solver
DGSSCO | Returns condition number estimate of coefficient matrix
DGSSDA | De-allocates sparse solver
DGSSPS | Prints solver statistics

Use the regular interface to solve multiple matrices with the same structure but different numerical values, as shown below:

      call dgssin()     ! {initialization, input coefficient matrix structure}
      call dgssor()     ! {fill-reducing ordering, symbolic factorization}
      do m = 1, number_of_structurally_identical_matrices
         call dgssfa()  ! {input coefficient matrix values, numeric factorization}
         do r = 1, number_of_right_hand_sides
            call dgsssl()  ! {triangular solve}
         enddo
      enddo

The one-call interface is not as flexible as the regular interface, but it covers the most common case of factoring a single matrix and solving some number of right-hand sides. Additional calls to dgsssl are allowed to solve for additional right-hand sides, as shown below:

      call dgssfs()     ! {initialization, input coefficient matrix structure}
                        ! {fill-reducing ordering, symbolic factorization}
                        ! {input coefficient matrix values, numeric factorization}
                        ! {triangular solve}
      do r = 1, number_of_right_hand_sides
         call dgsssl()  ! {triangular solve}
      enddo

Routine Calling Order

To solve problems with the sparse solver package, use the sparse solver routines in
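The two call sequences above can be read as a small state machine. The C sketch below encodes only the orderings shown here (dgssin, then dgssor, then dgssfa, which may repeat with new values, each followed by any number of dgsssl calls; or dgssfs followed by further dgsssl calls), plus dgssda at the end as in the code examples. It is a hypothetical checker for illustration: the stage and call names are invented, and where this guide does not say whether another order is accepted (for example, deallocating before factoring), the model conservatively rejects it.

```c
#include <assert.h>

/* Hypothetical model of the documented calling order; not a library API. */
typedef enum { ST_NONE, ST_INIT, ST_ORDERED, ST_FACTORED, ST_FREED } stage_t;
typedef enum { CALL_DGSSIN, CALL_DGSSOR, CALL_DGSSFA, CALL_DGSSSL,
               CALL_DGSSFS, CALL_DGSSDA } call_t;

/* Returns the stage reached by call c from stage s, or -1 if the model rejects it. */
int next_stage(stage_t s, call_t c) {
    switch (c) {
    case CALL_DGSSIN: return s == ST_NONE ? ST_INIT : -1;        /* input structure */
    case CALL_DGSSOR: return s == ST_INIT ? ST_ORDERED : -1;     /* ordering, symbolic */
    case CALL_DGSSFA:                                            /* numeric factorization, */
        return (s == ST_ORDERED || s == ST_FACTORED) ? ST_FACTORED : -1; /* repeatable */
    case CALL_DGSSSL: return s == ST_FACTORED ? ST_FACTORED : -1; /* any number of solves */
    case CALL_DGSSFS: return s == ST_NONE ? ST_FACTORED : -1;    /* one-call interface */
    case CALL_DGSSDA: return s == ST_FACTORED ? ST_FREED : -1;   /* deallocate last */
    }
    return -1;
}

/* Returns 1 if every call in the sequence is allowed by the model, else 0. */
int valid_sequence(const call_t *calls, int n) {
    int s = ST_NONE;
    for (int i = 0; i < n; i++) {
        s = next_stage((stage_t)s, calls[i]);
        if (s < 0)
            return 0;
    }
    return 1;
}
```

For example, the regular-interface loop with two matrices and three solves maps to dgssin, dgssor, dgssfa, dgsssl, dgsssl, dgssfa, dgsssl, dgssda, which the model accepts; dgssin followed directly by dgssfa is rejected because the ordering step is missing.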
15. accuracy of floating-point computations.

Document Collection | Document Title | Description
Standard Library 2 | Standard C++ Class Library Reference | Provides details on the Standard C++ Library.

TABLE P-3 Related Sun WorkShop 6 Documentation by Document Collection (Continued)
Document Collection | Document Title | Description
Standard Library 2 (continued) | Standard C++ Library User's Guide | Describes how to use the Standard C++ Library.
Tools.h++ 7 | Tools.h++ Class Library Reference | Provides details on the Tools.h++ class library.
Tools.h++ 7 | Tools.h++ User's Guide | Discusses use of the C++ classes for enhancing the efficiency of your programs.

TABLE P-4 describes related Solaris documentation available through the docs.sun.com Web site.

TABLE P-4 Related Solaris Documentation
Document Collection | Document Title | Description
Solaris Software Developer | Linker and Libraries Guide | Describes the operations of the Solaris link-editor and runtime linker and the objects on which they operate.
Solaris Software Developer | Programming Utilities Guide | Provides information for developers about the special built-in programming tools that are available in the Solaris operating environment.

CHAPTER 1 Introduction

Sun Performance Library is a set of optimized, high-speed mathematical subroutines for solving linear algebra and other numerically intensive problems. Sun Performance Library is based on a collection of public domain applications available from Netlib at
16. corresponding routines that take advantage of these special storage forms. For example, DGBMV will form the product of a general matrix in banded storage and a vector, and DTPMV will form the product of a triangular matrix in packed storage and a vector.

Banded Storage

A banded matrix is stored so the jth column of the matrix corresponds to the jth column of the Fortran array.

The following code copies a banded general matrix in a general array into banded storage mode.

C     Copy the matrix A from the array AG to the array AB. The
C     matrix is stored in general storage mode in AG and it will
C     be stored in banded storage mode in AB. The code to copy
C     from general to banded storage mode is taken from the
C     comment block in the original DGBFA by Cleve Moler.
C
      NSUB = 1
      NSUPER = 2
      NDIAG = NSUB + 1 + NSUPER
      DO ICOL = 1, N
        I1 = MAX0 (1, ICOL - NSUPER)
        I2 = MIN0 (N, ICOL + NSUB)
        DO IROW = I1, I2
          IROWB = IROW - ICOL + NDIAG
          AB(IROWB,ICOL) = AG(IROW,ICOL)
        END DO
      END DO

Note that this method of storing banded matrices is compatible with the storage method used by LAPACK, BLAS, and LINPACK, but is inconsistent with the method used by EISPACK.

Packed Storage

A packed vector is an alternate representation for a triangular, symmetric, or Hermitian matrix. An array is packed into a vector by storing the elements sequentially, column by column, into the vector. Space for the diagonal elements is alwa
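For readers working in C, the following sketch transcribes the Fortran copy loop above, keeping its 1-based index arithmetic and shifting by one only at the array access. The function name general_to_band and its argument list are hypothetical. Tracing the loop shows that the smallest band row written is NSUB + 1 and the largest is NSUB + NDIAG, so AB needs at least 2*NSUB + NSUPER + 1 rows, and rows 1 through NSUB are left untouched by this copy.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the band of the n-by-n general matrix ag (column-major, leading
 * dimension ldg) into ab (column-major, leading dimension ldab), exactly as the
 * Fortran loop does: A(irow,icol) goes to AB(irow - icol + ndiag, icol). */
void general_to_band(int n, int nsub, int nsuper,
                     const double *ag, int ldg, double *ab, int ldab) {
    int ndiag = nsub + 1 + nsuper;
    assert(ldab >= nsub + ndiag);                        /* largest band row written */
    for (int icol = 1; icol <= n; icol++) {
        int i1 = (icol - nsuper > 1) ? icol - nsuper : 1;   /* MAX0(1, ICOL-NSUPER) */
        int i2 = (icol + nsub < n) ? icol + nsub : n;       /* MIN0(N, ICOL+NSUB) */
        for (int irow = i1; irow <= i2; irow++) {
            int irowb = irow - icol + ndiag;
            /* subtract 1 from each Fortran index to address the C arrays */
            ab[(size_t)(irowb - 1) + (size_t)(icol - 1) * ldab] =
                ag[(size_t)(irow - 1) + (size_t)(icol - 1) * ldg];
        }
    }
}
```

With NSUB = 1 and NSUPER = 2 as in the Fortran code, a 4-by-4 matrix needs a band array with at least 5 rows.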
17. definite matrix
xPOFA | Cholesky factorization of a symmetric positive definite matrix
xPOSL | Solution to a linear system in a Cholesky-factored symmetric positive definite matrix
xPPCO | Cholesky factorization and condition number of a symmetric positive definite matrix in packed storage
xPPDI | Determinant and inverse of a Cholesky-factored symmetric positive definite matrix in packed storage
xPPFA | Cholesky factorization of a symmetric positive definite matrix in packed storage
xPPSL | Solution to a linear system in a Cholesky-factored symmetric positive definite matrix in packed storage
xPTSL | Solution to a linear system in a symmetric positive definite tridiagonal matrix
xQRDC | QR factorization of a general matrix
xQRSL | Solution to a linear system in a QR-factored general matrix
xSICO | UDU factorization and condition number of a symmetric matrix
xSIDI | Determinant, inertia, and inverse of a UDU-factored symmetric matrix
xSIFA | UDU factorization of a symmetric matrix
xSISL | Solution to a linear system in a UDU-factored symmetric matrix
xSPCO | UDU factorization and condition number of a symmetric matrix in packed storage
xSPDI | Determinant, inertia, and inverse of a UDU-factored symmetric matrix in packed storage
xSPFA | UDU factorization of a symmetric matrix in packed storage

TABLE A-9 LINPACK Routines (Continued)
Routine | Function
xSPSL | So
xSVDC
xTRCO
xTRDI
xTRSL
18. from a QR factorization as returned by CGEQRF or ZGEQRF
Generates a unitary matrix Q from an RQ factorization as returned by CGERQF or ZGERQF
Generates a unitary matrix reduced to tridiagonal form by CHETRD or ZHETRD

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine | Function
CUNMBR or ZUNMBR | Multiplies a general matrix with the unitary transformation matrix reduced to bidiagonal form, as determined by CGEBRD or ZGEBRD
CUNMHR or ZUNMHR | Multiplies a general matrix by the unitary matrix reduced to Hessenberg form by CGEHRD or ZGEHRD
CUNMLQ or ZUNMLQ | Multiplies a general matrix by the unitary matrix from an LQ factorization as returned by CGELQF or ZGELQF
CUNMQL or ZUNMQL | Multiplies a general matrix by the unitary matrix from a QL factorization as returned by CGEQLF or ZGEQLF
CUNMQR or ZUNMQR | Multiplies a general matrix by the unitary matrix from a QR factorization as returned by CGEQRF or ZGEQRF
CUNMRQ or ZUNMRQ | Multiplies a general matrix by the unitary matrix from an RQ factorization as returned by CGERQF or ZGERQF
CUNMRZ or ZUNMRZ | Multiplies a general matrix by the unitary matrix from an RZ factorization as returned by CTZRZF or ZTZRZF
CUNMTR or ZUNMTR | Multiplies a general matrix by the unitary transformation matrix reduced to tridiagonal form by CHETRD or ZHETRD

Unitary Matrix in Packed Storage
CUPGTR or Ge
19. http://www.netlib.org. Sun has enhanced these public domain applications and bundled them as the Sun Performance Library. The Sun Performance Library User's Guide explains the Sun-specific enhancements to the base applications available from Netlib. Reference material describing the base routines is available from Netlib and the Society for Industrial and Applied Mathematics (SIAM).

Libraries Included With Sun Performance Library

Sun Performance Library contains enhanced versions of the following standard libraries:

- LAPACK version 3.0: For solving linear algebra problems.
- BLAS1 (Basic Linear Algebra Subprograms): For performing vector-vector operations.
- BLAS2: For performing matrix-vector operations.
- BLAS3: For performing matrix-matrix operations.
- FFTPACK version 4: For performing the fast Fourier transform.
- VFFTPACK version 2.1: A vectorized version of FFTPACK for performing the fast Fourier transform.
- LINPACK: For solving linear algebra problems in legacy applications containing routines that have not been upgraded to LAPACK 3.0.

Note: LAPACK version 3.0 supersedes LINPACK, EISPACK, and all previous versions of LAPACK. Use LAPACK for new development and LINPACK to support legacy applications.

Sun Performance Library is available in both static and dynamic library versions optimized for the V8, V8+, and V9 architectures. Sun Performance Library supports static and shared libraries on S
20. including the USE SUNPERF statement in the program. The USE SUNPERF statement enables the following features:

- Type Independence. In the FORTRAN 77 routines, the type must be specified as part of the name: DGEMM is a double-precision matrix multiply and SGEMM is single precision. With the Fortran 95 interfaces, when calling GEMM, Fortran infers the type from the arguments that are passed. Passing single-precision arguments to GEMM gets results that are equivalent to specifying SGEMM, passing double-precision arguments gets results that are equivalent to DGEMM, and so on. For example, CALL DSCAL(20, 5.26D0, X, 1) could be changed to CALL SCAL(20, 5.26D0, X, 1).

- Compile-Time Checking. In FORTRAN 77, it is generally impossible for the compiler to determine what arguments should be passed to a particular routine. In Fortran 95, the USE SUNPERF statement allows the compiler to determine the number, type, size, and shape of each argument to each Sun Performance Library routine. It can check the calls against the expected values and display errors during compilation.

- Optional f95 Interfaces. In FORTRAN 77, all arguments must be specified in the order determined by the interface for all routines. All interfaces will support f95-style OPTIONAL attributes on arguments that are not required. To determine the optional arguments for a routine
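C has no generic interfaces of the Fortran 95 kind, but the C11 _Generic selection gives a similar compile-time choice based on argument type. The sketch below is only an analogy for how a generic SCAL can resolve to SSCAL or DSCAL: my_sscal, my_dscal, and the scal macro are hypothetical, handle only positive increments, and are not Sun Performance Library's C interface.

```c
#include <assert.h>

/* Hypothetical single- and double-precision scaling, x := alpha*x,
 * standing in for SSCAL and DSCAL (positive incx only). */
void my_sscal(int n, float alpha, float *x, int incx) {
    for (int i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

void my_dscal(int n, double alpha, double *x, int incx) {
    for (int i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

/* The type of the vector argument selects the routine at compile time,
 * much as the Fortran 95 generic SCAL selects SSCAL or DSCAL. */
#define scal(n, alpha, x, incx) _Generic((x),       \
        float *:  my_sscal,                          \
        double *: my_dscal)((n), (alpha), (x), (incx))
```

With this macro, scal(20, 5.26, xd, 1) on a double array calls my_dscal, while the same call on a float array calls my_sscal; a vector of any other type fails to compile, which loosely mirrors the compile-time checking described above.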
21. the same high-quality products you have come to expect from Sun; the only thing that has changed is the name. We believe that the Forte name blends the traditional quality and focus of Sun's core programming tools with the multi-platform, business application deployment focus of the Forte tools, such as Forte Fusion and Forte for Java. The new Forte organization delivers a complete array of tools for end-to-end application development and deployment.

For users of the Sun WorkShop tools, the following is a simple mapping of the old product names in WorkShop 5.0 to the new names in Forte Developer 6.

Old Product Name | New Product Name
Sun Visual WorkShop C++ | Forte C++ Enterprise Edition 6
Sun Visual WorkShop C++ Personal Edition | Forte C++ Personal Edition 6
Sun Performance WorkShop Fortran | Forte for High Performance Computing 6
Sun Performance WorkShop Fortran Personal Edition | Forte Fortran Desktop Edition 6
Sun WorkShop Professional C | Forte C 6
Sun WorkShop University Edition | Forte Developer University Edition 6

In addition to the name changes, there have been major changes to two of the products:

- Forte for High Performance Computing contains all the tools formerly found in Sun Performance WorkShop Fortran and now includes the C++ compiler, so High Performance Computing users need to purchase only one product for all their development needs.
- Forte Fortran Desk
22. 00 format(a5,3a20)
  300 format(i5,3d20.12)   ! i, sol, xexpct, values
  400 format(a60,i20)      ! fail message, sparse solver error number
      end

my_system% f95 -dalign example_ss.f -xlic_lib=sunperf
my_system% a.out
    i              rhs(i)     expected rhs(i)               error
    1  0.200000000000D+01  0.200000000000D+01  0.528466159722D-13
    2  0.200000000000D+01  0.200000000000D+01  0.105249142734D-12
    3  0.100000000000D+01  0.100000000000D+01  0.350830475782D-13
    4 -0.800000000000D+01 -0.800000000000D+01  0.426325641456D-13
    5 -0.500000000000D+00 -0.500000000000D+00  0.660582699652D-14

CODE EXAMPLE 4-3 Solving a Structurally Symmetric System With Unsymmetric Values (Regular Interface)

my_system% cat example_su.f
      program example_su
c
c This program is an example driver that calls the sparse solver.
c It factors and solves a structurally symmetric system
c w/ unsymmetric values.
c
      implicit none
      integer neqns, ier, msglvl, outunt, ldrhs, nrhs
      character mtxtyp*2, pivot*1, ordmthd*3
      double precision handle(150)
      integer colstr(5), rowind(8)
      double precision values(8), rhs(4), xexpct(4)
      integer i
c
c Sparse matrix structure and value arrays. Coefficient matrix
c has a symmetric structure and unsymmetric values.
c Ax = b, (solve for x) where:
c
c     1.0  3.0  0.0  0.0          1.0           7.0
c     2.0  4.0  0.0  7.0          2.0          38.0
c A = 0.0  0.0  6.0  0.0    x =   3.0    b =   18.0
c     0.0  5.0  0.0  8.0          4.0          42.0
c
      data colstr / 1, 3, 6, 7, 9 /
      data rowind
23. 3.

CODE EXAMPLE 4-1 Solving a Symmetric System (One-Call Interface) (Continued)

c Ax = b, (solve for x) where:
c
c     4.0  1.0  2.0  0.5    2.0          2.0           7.0
c     1.0  0.5  0.0  0.0    0.0          2.0           3.0
c A = 2.0  0.0  3.0  0.0    0.0    x =   1.0    b =    7.0
c     0.5  0.0  0.0  0.625  0.0         -8.0          -4.0
c     2.0  0.0  0.0  0.0   16.0         -0.5          -4.0
c
      data colstr / 1, 6, 7, 8, 9, 10 /
      data rowind / 1, 2, 3, 4, 5, 2, 3, 4, 5 /
      data values / 4.0d0, 1.0d0, 2.0d0, 0.5d0, 2.0d0, 0.5d0, 3.0d0,
     &              0.625d0, 16.0d0 /
      data rhs / 7.0d0, 3.0d0, 7.0d0, -4.0d0, -4.0d0 /
      data xexpct / 2.0d0, 2.0d0, 1.0d0, -8.0d0, -0.5d0 /
c
c  set calling parameters
c
      mtxtyp = 'ss'
      pivot = 'n'
      neqns = 5
      nrhs = 1
      ldrhs = 5
      outunt = 6
      msglvl = 0
      ordmthd = 'mmd'
c
c  call single call interface
c
      call dgssfs(mtxtyp, pivot, neqns, colstr, rowind, values,
     &            nrhs, rhs, ldrhs, ordmthd, outunt, msglvl, handle,
     &            ier)
      if (ier .ne. 0) goto 110
c
c  deallocate sparse solver storage
c
      call dgssda(handle, ier)
      if (ier .ne. 0) goto 110
c
c  print values of sol
c
      write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
      do i = 1, neqns
        write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
      enddo
      stop
  110 continue
c
c  call to sparse solver returns an error
c
      write(6,400)
     &  'example FAILED sparse solver error number
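One way to check the coefficient arrays in this example is to multiply the stored matrix by the expected solution and compare the product with the right-hand side. As colstr and rowind show, only the lower triangle (diagonal included) is stored, so each off-diagonal entry has to be applied twice. The C sketch below does that with the 1-based arrays; sym_lower_csc_matvec is a hypothetical helper, not a sparse solver routine.

```c
#include <assert.h>

/* y := A*x for a symmetric n-by-n matrix whose lower triangle (diagonal
 * included) is stored column by column with 1-based colstr/rowind arrays. */
void sym_lower_csc_matvec(int n, const int *colstr, const int *rowind,
                          const double *values, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++) {                       /* column j, 0-based */
        for (int k = colstr[j] - 1; k < colstr[j + 1] - 1; k++) {
            int i = rowind[k] - 1;                      /* row of stored entry */
            y[i] += values[k] * x[j];                   /* stored a(i,j) */
            if (i != j)
                y[j] += values[k] * x[i];               /* mirrored a(j,i) */
        }
    }
}
```

Using colstr = 1, 6, 7, 8, 9, 10, rowind = 1, 2, 3, 4, 5, 2, 3, 4, 5, and the nine values of the example, and taking x = (2, 2, 1, -8, -0.5), the product is (7, 3, 7, -4, -4), which matches the magnitudes in the rhs data statement.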
24.
     &              2, 5 /
      data values / 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0, 6.0d0, 7.0d0,
     &              8.0d0, 9.0d0, 10.0d0 /
      data rhs / 1.0d0, 59.0d0, 24.0d0, 36.0d0, 55.0d0 /
      data xexpct / 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0 /
c
c  initialize solver
c
      mtxtyp = 'uu'
      pivot = 'n'
      neqns = 5
      outunt = 6
      msglvl = 3
      call dgssin(mtxtyp, pivot, neqns, colstr, rowind,
     &            outunt, msglvl, handle, ier)
      if (ier .ne. 0) goto 110

CODE EXAMPLE 4-4 Solving an Unsymmetric System (Regular Interface) (Continued)

c
c  ordering and symbolic factorization
c
      ordmthd = 'mmd'
      call dgssor(ordmthd, handle, ier)
      if (ier .ne. 0) goto 110
c
c  numeric factorization
c
      call dgssfa(neqns, colstr, rowind, values, handle, ier)
      if (ier .ne. 0) goto 110
c
c  solution
c
      nrhs = 1
      ldrhs = 5
      call dgsssl(nrhs, rhs, ldrhs, handle, ier)
      if (ier .ne. 0) goto 110
c
c  deallocate sparse solver storage
c
      call dgssda(handle, ier)
      if (ier .ne. 0) goto 110
c
c  print values of sol
c
      write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
      do i = 1, neqns
        write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
      enddo
      stop
  110 continue
c
c  call to sparse solver returns an error
c
      write(6,400)
     &  'example FAILED sparse solver error number = ', ier
      stop
  200 format(a5,3a20)
  300 format(i5,3d20.12)   ! i, sol, xexpct, values
  400 format(a60,i20)      ! fail message, sparse solver error number
      end
25. ...

        1.0  3.0  0.0  0.0
   A =  2.0  4.0  0.0  7.0
        0.0  0.0  6.0  0.0
        0.0  5.0  0.0  8.0

To represent A in CSC format:

   colptr: 1, 3, 6, 7, 9
   rowind: 1, 2, 1, 2, 4, 3, 2, 4
   values: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0

Unsymmetric Sparse Matrices

An unsymmetric sparse matrix does not satisfy a(i,j) = a(j,i) for all i and j, and its structure has no apparent pattern. When solving an unsymmetric system, the entire matrix must be passed to the solver routines. An example of an unsymmetric matrix is shown below:

        1.0  0.0  0.0  0.0   0.0
        2.0  6.0  0.0  0.0   9.0
   A =  3.0  0.0  7.0  0.0   0.0
        4.0  0.0  0.0  8.0   0.0
        5.0  0.0  0.0  0.0  10.0

To represent A in CSC format:

   colptr: 1, 6, 7, 8, 9, 11
   rowind: 1, 2, 3, 4, 5, 2, 3, 4, 2, 5
   values: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0

Sun Performance Library Sparse BLAS

The Sun Performance Library sparse BLAS package is based on the following two packages:
- Netlib Sparse BLAS package, by Dodson, Grimes, and Lewis, consists of sparse extensions to the Basic Linear Algebra Subroutines that operate on sparse vectors.
- NIST (National Institute of Standards and Technology) Fortran Sparse BLAS Library consists of routines that perform matrix products and solution of triangular systems for sparse matrices in a variety of storage formats.

Refer to the following sources for additional sparse BLAS information:
- For information on the Sun Performance Library Sparse BLAS routines, refer to the secti...
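The CSC arrays for the unsymmetric example can also be produced mechanically. The following C sketch (the function name dense_to_csc is hypothetical, not a library routine) walks a dense, column-major matrix one column at a time, records each nonzero's 1-based row index and value, and stores the 1-based position where each column starts, so the output follows the colptr/rowind/values layout used in this section.

```c
#include <assert.h>

/* Builds 1-based CSC arrays (colptr, rowind, values) from a dense n-by-n
 * matrix held column by column in a[j*n + i] (0-based i, j).
 * colptr needs n+1 entries; rowind and values need room for every
 * nonzero. Returns the number of nonzeros stored. */
static int dense_to_csc(int n, const double *a,
                        int *colptr, int *rowind, double *values)
{
    int nnz = 0;
    assert(n >= 0);
    for (int j = 0; j < n; j++) {
        colptr[j] = nnz + 1;             /* 1-based start of column j+1 */
        for (int i = 0; i < n; i++) {
            double v = a[j * n + i];
            if (v != 0.0) {
                rowind[nnz] = i + 1;     /* 1-based row index */
                values[nnz] = v;
                nnz++;
            }
        }
    }
    colptr[n] = nnz + 1;                 /* one past the last entry */
    return nnz;
}
```

For the 5-by-5 unsymmetric matrix above, it returns 10 nonzeros with colptr = (1, 6, 7, 8, 9, 11), rowind = (1, 2, 3, 4, 5, 2, 3, 4, 2, 5), and values = (1.0, ..., 10.0).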
26. ...EISPACK calls to LAPACK calls.

Several vendors market automatic code-restructuring tools that replace existing code with Sun Performance Library code. For example, a source-to-source conversion tool can replace existing BLAS code structures with calls to the BLAS in Sun Performance Library. These tools can also recognize many user-written matrix multiplications and replace them with calls to the matrix multiplication subroutine in Sun Performance Library.

Fortran 77/95 Interfaces

The Sun Performance Library routines can be called from within a FORTRAN 77, Fortran 95, or C program. However, a C program that calls these Fortran interfaces must still use the FORTRAN 77 calling sequence.

Sun Performance Library f77/f95 interfaces use the following conventions:
- All arguments are passed by reference.
- The number of arguments to a routine is fixed.
- Types of arguments must match.
- Arrays are stored columnwise.
- Indices are based at one, in keeping with standard Fortran practice.

When calling Sun Performance Library routines:
- Do not prototype the subroutines with the Fortran 95 INTERFACE statement. Use the USE SUNPERF statement instead.
- Do not use -ext_names=plain to compile routines that call routines from Sun Performance Library.

Using Fortran 95 Features

This release supports Fortran 95 language features. To use the Sun Performance Library Fortran 95 modules and definitions...
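The sketch below illustrates two of these conventions from C. First, with 1-based i and j, element A(i,j) of a columnwise array with leading dimension lda lives at offset (j-1)*lda + (i-1). Second, a routine that follows the FORTRAN 77 calling sequence receives every argument, scalars included, through a pointer. The routine scale_column_ is a hypothetical stand-in written for this illustration; it is not part of Sun Performance Library.

```c
#include <assert.h>

/* Offset of element A(i,j), with 1-based i and j as in Fortran, in a
 * column-major array whose leading dimension is lda. */
static long fortran_offset(int i, int j, int lda)
{
    assert(i >= 1 && i <= lda && j >= 1);
    return (long)(j - 1) * lda + (i - 1);
}

/* Hypothetical routine written with the FORTRAN 77 calling sequence:
 * every argument, including the scalars m, j, alpha, and lda, is passed
 * by reference. It multiplies the first m entries of column j of the
 * columnwise array a by alpha. */
static void scale_column_(const int *m, const int *j, const double *alpha,
                          double *a, const int *lda)
{
    for (int i = 1; i <= *m; i++)
        a[fortran_offset(i, *j, *lda)] *= *alpha;
}
```

For a 3-by-2 array stored as (1, 2, 3, 4, 5, 6), A(2,2) is at offset 4 (the value 5.0), and scaling column 2 by 10.0 changes the last three entries to 40.0, 50.0, and 60.0.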
27. ...LAPACK versions:
- Reduces linking errors due to changes in subroutine names or argument lists.
- Ensures results are consistent with results generated with previous LAPACK versions.
- Minimizes program terminations caused by differences between argument lists.

With Sun Performance Library, users can safely use programs intended for the original LAPACK 1.x or 2.0. At the same time, developers can gradually upgrade the portions of their applications that use LAPACK 3.0.

Getting Started With Sun Performance Library

This section shows the most basic compiler options used to compile an application that uses the Sun Performance Library. To use the Sun Performance Library, type one of the following commands:

   my_system% f95 -dalign my_file.f -xlic_lib=sunperf

or

   my_system% cc -dalign my_file.c -xlic_lib=sunperf

The routines in the Sun Performance Library are compiled with -dalign. For best performance, compile applications with -dalign as well. If -dalign cannot be used, enable trap 6, which allows misaligned data, as described in the following section.

Additional compiler options exist that optimize application performance for:
- Specific SPARC instruction set architectures, as described in "Compiling for SPARC Platforms" on page 30
- Shared or dedicated parallel processing models, as described in "Optimizing for Parallel Processing"...
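The -dalign option lets the compiler assume that double-precision data is aligned on 8-byte (doubleword) boundaries. As a minimal sketch, assuming that converting a pointer to uintptr_t preserves its address bits (as it does on common platforms), the following C helper checks whether a buffer meets that assumption before it is passed to code compiled with -dalign. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on an 8-byte (doubleword) boundary, the alignment
 * that code compiled with -dalign assumes for double-precision data,
 * and 0 otherwise. */
static int is_doubleword_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % 8u) == 0;
}
```

A buffer declared with 8-byte alignment passes the check, while a pointer 4 bytes into it does not; data that fails the check is the case the trap 6 handler is meant to cover.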
28. Symmetric Matrices

A symmetric matrix is similar to a triangular matrix in that the data in either the upper or the lower triangle corresponds to the elements of the array. The contents of the other elements in the array are assumed, and those array elements are never accessed by routines that process symmetric or Hermitian arrays.

A symmetric matrix can be stored using packed storage:

   [Figure: Symmetric Matrix and Symmetric Array in Packed Storage]

A symmetric banded matrix can be stored using banded storage, as shown below. Elements shown with the symbol x are never accessed by routines that process banded arrays.

   Symmetric Banded Matrix        Symmetric Banded Array in Banded Storage

   a11  a12   0    0              x    a12  a23  a34
   a21  a22  a23   0              a11  a22  a33  a44
    0   a32  a33  a34
    0    0   a43  a44

Tridiagonal Matrices

A tridiagonal matrix has elements only on the main diagonal, the first superdiagonal, and the first subdiagonal. It is stored using three 1-dimensional arrays:

   Tridiagonal Matrix             Tridiagonal Array in Tridiagonal Storage

   a11  a12   0    0              subdiagonal:    a21  a32  a43
   a21  a22  a23   0              diagonal:       a11  a22  a33  a44
    0   a32  a33  a34             superdiagonal:  a12  a23  a34
    0    0   a43  a44

Sparse Matrices

The Sun Performance Library sparse solver package is a collection of routines that efficiently factor and solve sparse linear systems of equations. Use the sparse solver package to:
- Solve symmetric...
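The index mapping behind packed and banded storage is easy to get wrong. The C sketch below encodes the storage conventions documented for LAPACK, on the assumption that the Sun Performance Library LAPACK routines use them unchanged. For the upper triangle, A(i,j) is stored at AP(i + (j-1)j/2); for the lower triangle of an order-n matrix, at AP(i + (j-1)(2n-j)/2); and for a symmetric band matrix with kd superdiagonals stored by its upper triangle, A(i,j) goes to row kd+1+i-j of column j of AB. All indices are 1-based, and the function names are hypothetical.

```c
#include <assert.h>

/* 1-based position in the packed array AP of A(i,j), i <= j,
 * when the upper triangle is packed column by column. */
static int packed_upper_index(int i, int j)
{
    assert(1 <= i && i <= j);
    return i + (j - 1) * j / 2;
}

/* 1-based position in AP of A(i,j), j <= i <= n, when the lower
 * triangle of an order-n matrix is packed column by column. */
static int packed_lower_index(int i, int j, int n)
{
    assert(1 <= j && j <= i && i <= n);
    return i + (j - 1) * (2 * n - j) / 2;
}

/* 1-based row of the band array AB that holds A(i,j), i <= j, for the
 * upper triangle of a symmetric band matrix with kd superdiagonals. */
static int band_upper_row(int i, int j, int kd)
{
    assert(1 <= i && i <= j && j - i <= kd);
    return kd + 1 + i - j;
}
```

For kd = 1, band_upper_row puts a12 in row 1 and a22 in row 2 of column 2, matching the two-row layout (x, a12, a23, a34 over a11, a22, a33, a44) of a symmetric banded array of order 4.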
29. TABLE A-5 Sparse BLAS Routines (Continued)

   Routine                Function
   SBSCSM or DBSCSM       Block compressed sparse column format triangular solve
   SBSRMM or DBSRMM       Block compressed sparse row format matrix-matrix multiply
   SBSRSM or DBSRSM       Block compressed sparse row format triangular solve
   SCOOMM or DCOOMM       Coordinate format matrix-matrix multiply
   SCSCMM or DCSCMM       Compressed sparse column format matrix-matrix multiply
   SCSCSM or DCSCSM       Compressed sparse column format triangular solve
   SCSRMM or DCSRMM       Compressed sparse row format matrix-matrix multiply
   SCSRSM or DCSRSM       Compressed sparse row format triangular solve
   SDIAMM or DDIAMM       Diagonal format matrix-matrix multiply
   SDIASM or DDIASM       Diagonal format triangular solve
   SDOTI, DDOTI,          Computes the dot product of a sparse vector and a full vector
   CDOTUI, or ZDOTUI
   CDOTCI or ZDOTCI       Computes the conjugate dot product of a sparse vector and a full vector
   SELLMM or DELLMM       Ellpack format matrix-matrix multiply
   SELLSM or DELLSM       Ellpack format triangular solve
   xGTHR                  Given a full vector, creates a sparse vector and corresponding index vector
   xGTHRZ                 Given a full vector, creates a sparse vector and corresponding index vector, and zeros the full vector
   SJADMM or DJADMM       Jagged diagonal matr...
   SJADRP or DJADRP
   SJADSM or DJADSM
   SROTI or DROTI
   xSCTR
   SSKYMM or DSKYMM
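A sparse vector in the Netlib sparse BLAS is a pair of arrays: the nonzero values x and their positions indx in the full vector y. The C sketch below shows the gather operation (with the optional zeroing that distinguishes xGTHRZ from xGTHR) and the sparse-times-full dot product computed by xDOTI. It assumes, as in the Fortran sparse BLAS, that indx holds 1-based positions supplied by the caller; the function names gather and sparse_dot are hypothetical and do not reproduce the library's argument lists.

```c
#include <assert.h>

/* Gather: x(k) = y(indx(k)) for k = 1..nz. If zero_y is nonzero, the
 * gathered entries of the full vector y are then set to zero, as xGTHRZ
 * does. indx holds 1-based positions. */
static void gather(int nz, double *y, double *x, const int *indx, int zero_y)
{
    for (int k = 0; k < nz; k++) {
        assert(indx[k] >= 1);
        x[k] = y[indx[k] - 1];
        if (zero_y)
            y[indx[k] - 1] = 0.0;
    }
}

/* Dot product of the sparse vector (x, indx) with the full vector y:
 * sum over k of x(k) * y(indx(k)). */
static double sparse_dot(int nz, const double *x, const int *indx,
                         const double *y)
{
    double sum = 0.0;
    for (int k = 0; k < nz; k++)
        sum += x[k] * y[indx[k] - 1];
    return sum;
}
```

Only the nz stored entries are touched, which is why these operations cost O(nz) rather than O(n) for a full vector of length n.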
30. ...xBDSQR is available as four routines that operate with the following data types:

   SBDSQR    Single data type
   DBDSQR    Double data type
   CBDSQR    Complex data type
   ZBDSQR    Double complex data type

If a routine name is not available for S, D, C, and Z, the x prefix is not used and each routine name is listed.

LAPACK Routines

TABLE A-1 LAPACK (Linear Algebra Package) Routines

   Routine              Function
   Bidiagonal Matrix
   SBDSDC or DBDSDC     Computes the singular value decomposition (SVD) of a bidiagonal matrix using a divide-and-conquer method
   xBDSQR               Computes the SVD of a real upper or lower bidiagonal matrix using the QR algorithm
   Diagonal Matrix
   SDISNA or DDISNA     Computes the reciprocal condition numbers for the eigenvectors of a real symmetric or complex Hermitian matrix
   General Band Matrix
   xGBBRD               Reduces a real or complex general band matrix to upper bidiagonal form
   xGBCON               Estimates the reciprocal of the condition number of a general band matrix using LU factorization
   xGBEQU               Computes row and column scalings to equilibrate a general band matrix and reduce its condition number
   xGBRFS               Refines the solution to a general banded system of linear equations
   xGBSV                Solves a general banded system of linear equations (simple driver)
   xGBSVX               Solves a general banded system of linear equations (expert driver)
   xGBTRF               LU factorization of a general band matrix using partial pivoting with row interchanges
   xGBT...
31. TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

   Routine     Function
   xGETRF      Computes an LU factorization of a general rectangular matrix using partial pivoting with row interchanges
   xGETRI      Computes the inverse of a general matrix using the factorization computed by xGETRF
   xGETRS      Solves a general system of linear equations using the factorization computed by xGETRF
   General Matrix, Generalized Problem (Pair of General Matrices)
   xGGBAK      Forms the right or left eigenvectors of a generalized eigenvalue problem based on the output by xGGBAL
   xGGBAL      Balances a pair of general matrices for the generalized eigenvalue problem
   xGGES       Computes the generalized eigenvalues, Schur form, and left and/or right Schur vectors for two nonsymmetric matrices
   xGGESX      Computes the generalized eigenvalues, Schur form, and left and/or right Schur vectors
   xGGEV       Computes the generalized eigenvalues and the left and/or right generalized eigenvectors for two nonsymmetric matrices
   xGGEVX      Computes the generalized eigenvalues and the left and/or right generalized eigenvectors
   xGGGLM      Solves the GLM (Generalized Linear Regression Model) using the GQR (Generalized QR) factorization
   xGGHRD      Reduces two matrices to generalized upper Hessenberg form using orthogonal transformations
   xGGLSE      Solves the LSE (Constrained Linear Least Squares Problem) using the GRQ (Generalized RQ) factorization
   xGGQRF      Computes the generalized QR factorization of two matrices
   xGGRQF      Computes the generalized RQ fac...
   xGGSVD
   xGGSVP
32. ...Hessenberg form as determined by SGEHRD or DGEHRD
   Generates an orthogonal matrix Q from an LQ factorization as returned by SGELQF or DGELQF
   Generates an orthogonal matrix Q from a QL factorization as returned by SGEQLF or DGEQLF
   Generates an orthogonal matrix Q from a QR factorization as returned by SGEQRF or DGEQRF
   Generates an orthogonal matrix Q from an RQ factorization as returned by SGERQF or DGERQF
   Generates the orthogonal matrix used by SSYTRD or DSYTRD to reduce a symmetric matrix to tridiagonal form
   Multiplies a general matrix by the orthogonal matrix from the reduction to bidiagonal form determined by SGEBRD or DGEBRD
   Multiplies a general matrix by the orthogonal matrix from the reduction to Hessenberg form determined by SGEHRD or DGEHRD
   Multiplies a general matrix by the orthogonal matrix from an LQ factorization as returned by SGELQF or DGELQF
   Multiplies a general matrix by the orthogonal matrix from a QL factorization as returned by SGEQLF or DGEQLF
   Multiplies a general matrix by the orthogonal matrix from a QR factorization as returned by SGEQRF or DGEQRF
   Multiplies a general matrix by the orthogonal matrix returned by STZRZF or DTZRZF
   Multiplies a general matrix by the orthogonal matrix from an RQ factorization as returned by SGERQF or DGERQF
   Multiplies a general matrix by the orthogonal matrix from an RZ factorization as returned by STZRZF or DTZRZF

TABLE A-1 LAPACK (Linear Alg...
33. CALL DGBCON(NORM, N, NSUB, NSUPER, DA, LDA, IPIVOT, DANORM, DRCOND, DWORK, IWORK2, INFO)

   void dgbcon(char norm, int n, int nsub, int nsuper, double *da, int lda, int *ipivot, double danorm, double *drcond, int *info)

Note that the names of the arguments are the same, and that arguments with the same name have the same base type. Scalar arguments that are used only as input values, such as NORM and N, are passed by value in the C version. Arrays, and scalars that are used to return values, are passed by reference.

The Sun Performance Library C interfaces improve on CLAPACK, available on Netlib, which is an f2c translation of the standard libraries. For example, all CLAPACK routine names end in a trailing underscore to maintain compatibility with Fortran compilers, which often append an underscore to routine names in the object (.o) file. The Sun Performance Library C interfaces do not require a trailing underscore.

Sun Performance Library C interfaces use the following conventions:
- Input-only scalars are passed by value rather than by reference, which gives added safety and allows constants to be passed without creating a separate variable to hold their value. Complex and double complex arguments are not considered scalars because C does not implement them as a scalar type.
- Complex scalars can be passed as either structures or arrays of length 2.
- Arguments relating to workspace are not used in Sun Performance Libr...
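The difference between the two calling styles can be sketched in a few lines of C. Below, asum_f77_ is a hypothetical routine written in the FORTRAN 77 style, where every argument (even the input-only scalars n and incx) is a pointer and the result comes back through an argument. asum_c is a hypothetical wrapper in the C-interface style described above: input-only scalars are passed by value, so constants can be written directly in the call. Neither function is part of Sun Performance Library.

```c
#include <assert.h>

/* FORTRAN 77 style: all arguments passed by reference. Sums |x(k)| over
 * n elements taken with stride incx (incx > 0) and stores the result
 * through the result pointer. */
static void asum_f77_(const int *n, const double *x, const int *incx,
                      double *result)
{
    double s = 0.0;
    for (int k = 0; k < *n; k++) {
        double v = x[k * *incx];
        s += v < 0.0 ? -v : v;
    }
    *result = s;
}

/* C-interface style: input-only scalars by value, array by pointer.
 * The wrapper takes the addresses of its own local copies, so callers
 * never need temporary variables for constants. */
static double asum_c(int n, const double *x, int incx)
{
    double result;
    assert(n >= 0 && incx > 0);
    asum_f77_(&n, x, &incx, &result);
    return result;
}
```

With x = (1, -2, 3, -4), asum_c(4, x, 1) returns 10.0, while the Fortran-style call needs variables holding 4 and 1 whose addresses are passed.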
34. TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

   Routine     Function
   xGBTRS      Solves a general banded system of linear equations using the factorization computed by xGBTRF
   General Matrix (Unsymmetric or Rectangular)
   xGEBAK      Forms the right or left eigenvectors of a general matrix by backward transformation on the computed eigenvectors of the balanced matrix output by xGEBAL
   xGEBAL      Balances a general matrix
   xGEBRD      Reduces a general matrix to upper or lower bidiagonal form by an orthogonal transformation
   xGECON      Estimates the reciprocal of the condition number of a general matrix using the factorization computed by xGETRF
   xGEEQU      Computes row and column scalings intended to equilibrate a general rectangular matrix and reduce its condition number
   xGEES       Computes the eigenvalues and Schur factorization of a general matrix (simple driver)
   xGEESX      Computes the eigenvalues and Schur factorization of a general matrix (expert driver)
   xGEEV       Computes the eigenvalues and left and right eigenvectors of a general matrix (simple driver)
   xGEEVX      Computes the eigenvalues and left and right eigenvectors of a general matrix (expert driver)
   xGEGS       Deprecated routine, replaced by xGGES
   xGEGV       Deprecated routine, replaced by xGGEV
   xGEHRD      Reduces a general matrix to upper Hessenberg form by an orthogonal similarity transformation
   xGELQF      Computes the LQ factorization of a gene...
35. Sun Microsystems

Sun Performance Library User's Guide
Sun WorkShop 6
FORTRAN 77, Fortran 95, and C

Sun Microsystems, Inc.
901 San Antonio Road
Palo Alto, CA 94303 U.S.A.
650-960-1300

Part No. 806-3566-10
May 2000, Revision A

Send comments about this document to: docfeedback@sun.com

Copyright 2000 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303-4900 U.S.A. All rights reserved.

This product or document is distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. For Netscape, Netscape Navigator, and the Netscape Communications Corporation logo, the following notice applies: Copyright 1995 Netscape Communications Corporation. All rights reserved.

Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook2, Solaris, SunOS, JavaScript, SunExpress, Sun WorkShop, Sun WorkShop Professional, Sun Performance Library, Sun Performance WorkShop, Sun Visual WorkShop, and Forte are trademarks, registered trademarks, o...
36. ...T3I, DFFT3I, CFFT3I, DFFT3I
   Two-dimensional Fourier synthesis
   Two-dimensional Fourier transform
   Initialize two-dimensional Fourier transform or synthesis
   Three-dimensional Fourier synthesis
   Three-dimensional Fourier transform
   Initialize three-dimensional Fourier transform or synthesis

Other Routines

TABLE A-8 Other Routines

   Routine               Function
   xCNVCOR               Computes convolution or correlation
   xCNVCOR2              Computes two-dimensional convolution or correlation
   xTRANS                Transposes array
   SWIENER or DWIENER    Performs Wiener deconvolution of two signals

LINPACK Routines

TABLE A-9 LINPACK Routines

   Routine               Function
   xCHDC                 Cholesky decomposition of a symmetric positive definite matrix
   xCHDD                 Downdate an augmented Cholesky decomposition
   xCHEX                 Update an augmented Cholesky decomposition with permutations
   xCHUD                 Update an augmented Cholesky decomposition
   xGBCO                 LU factorization and condition number of a general matrix in banded storage
   xGBDI                 Determinant of an LU-factored general matrix in banded storage
   xGBFA                 LU factorization of a general matrix in banded storage
   xGBSL                 Solution to a linear system in an LU-factored matrix in banded storage
   xGECO                 LU factorization and condition numb...
   xGEDI
   xGEFA
   xGESL
   xGTSL
   CHIDI or ZHIDI
   CHIFA or ZHIFA
   CHPCO or ZHPCO
   CHPDI or ZHPDI
   CHPFA or ZHPFA
   xPBCO
37. ...WorkShop 6 Compilers Fortran 77/95
   Fortran Library Reference: Provides details about the library routines supplied with the Fortran compiler.
   Fortran Programming Guide: Discusses issues relating to input/output, libraries, program analysis, debugging, and performance.
   Fortran User's Guide: Provides information on command-line options and how to use the compilers.
   FORTRAN 77 Language Reference: Provides a complete language reference.
   Interval Arithmetic Programming Reference: Describes the intrinsic INTERVAL data type supported by the Fortran 95 compiler.

Forte TeamWare 6 (Sun WorkShop TeamWare 6)
   Sun WorkShop TeamWare 6 User's Guide: Describes how to use the Sun WorkShop TeamWare code management tools.

Forte Developer 6 (Sun WorkShop Visual 6)
   Sun WorkShop Visual User's Guide: Describes how to use Visual to create C++ and Java graphical user interfaces.

Forte / Sun Performance Library 6 (Sun Performance Library 6)
   Sun Performance Library Reference: Discusses the optimized library of subroutines and functions used to perform computational linear algebra and fast Fourier transforms.
   Sun Performance Library User's Guide: Describes how to use the Sun-specific features of the Sun Performance Library, which is a collection of subroutines and functions used to solve linear algebra problems.

Numerical Computation Guide
   Numerical Computation Guide: Describes issues regarding the numerical...
38. ...AXPY is called as follows:

      CALL AXPY(RALPHA, X, INCX, Y)

A compiler error occurs because the compiler cannot find a routine in the AXPY interface group that takes four parameters of the form AXPY(REAL, REAL 1-D ARRAY, INTEGER, REAL 1-D ARRAY).

In the last example, the f95 keyword parameter passing capability allows a user to make essentially the same call:

      CALL AXPY(ALPHA=RALPHA, X=X, INCX=INCX, Y=Y)

This is a valid call to the AXPY interface. Once an OPTIONAL parameter is omitted, every parameter that follows it in the list must be passed by keyword. The following calls to the AXPY interface are valid:

      CALL AXPY(N, RALPHA, X, Y=Y, INCY=INCY)
      CALL AXPY(N, RALPHA, X, INCX, Y)
      CALL AXPY(N, RALPHA, X, Y=Y)
      CALL AXPY(ALPHA=RALPHA, X=X, Y=Y)

Fortran Examples

Getting peak performance from Sun Performance Library for single-processor applications is a matter of identifying code constructs in an application that can be replaced by calls to subroutines in Sun Performance Library. Multiprocessor applications can get additional speed by identifying opportunities for parallelization.

The easiest situation occurs when a block of user code exactly duplicates a capability of Sun Performance Library. Consider the code below:

      ...
      END DO

This is the ma...
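For reference, the operation behind the AXPY interface is y := alpha*x + y over n elements, with x and y read using strides incx and incy. The C sketch below shows those semantics only; it is not the library's implementation, the name axpy is hypothetical here, and for brevity it accepts positive increments only (the reference BLAS also defines behavior for negative increments).

```c
#include <assert.h>

/* y := alpha*x + y over n elements, reading x with stride incx and
 * updating y with stride incy. Positive increments only in this sketch. */
static void axpy(int n, double alpha, const double *x, int incx,
                 double *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    for (int k = 0; k < n; k++)
        y[k * incy] += alpha * x[k * incx];
}
```

For example, with x = (1, 2, 3, 4) and y = (10, 20, 30, 40), axpy(2, 2.0, x, 2, y, 1) uses x(1) and x(3) and updates y to (12, 26, 30, 40).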
39. ...a Hermitian-definite generalized eigenproblem to standard form using the factorization computed by CPOTRF or ZPOTRF

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

   Routine              Function
   CHEGV or ZHEGV       Computes all the eigenvalues and eigenvectors of a complex generalized Hermitian-definite eigenproblem. Replacement with the newer version CHEGVD or ZHEGVD is suggested.
   CHEGVD or ZHEGVD     Computes all the eigenvalues and eigenvectors of a complex generalized Hermitian-definite eigenproblem and uses a divide-and-conquer method to calculate eigenvectors
   CHEGVX or ZHEGVX     Computes selected eigenvalues and eigenvectors of a complex generalized Hermitian-definite eigenproblem
   CHERFS or ZHERFS     Improves the computed solution to a system of linear equations when the coefficient matrix is Hermitian indefinite
   CHESV or ZHESV       Solves a complex Hermitian indefinite system of linear equations (simple driver)
   CHESVX or ZHESVX     Solves a complex Hermitian indefinite system of linear equations (expert driver)
   CHETRD or ZHETRD     Reduces a Hermitian matrix to real symmetric tridiagonal form by using a unitary similarity transformation
   CHETRF or ZHETRF     Computes the factorization of a complex Hermitian indefinite matrix us...
   CHETRI or ZHETRI
   CHETRS or ZHETRS
   CHPCON or ZHPCON
   CHPEV or ZHPEV
   CHPEVX or ZHPEVX
   CHPEVD or ZHPEVD
   CHPGST or ZHPGST
40. ...a reference for the types of LAPACK routines and the FORTRAN 77 interfaces.

Sparse BLAS and Sparse Solver Books and Papers

The following books and papers provide additional information for the sparse BLAS and sparse solver routines:
- Dodson, D. S., R. G. Grimes, and J. G. Lewis. "Sparse Extensions to the Fortran Basic Linear Algebra Subprograms." ACM Transactions on Mathematical Software, Vol. 17, No. 2, June 1991.
- George, A., and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1981.
- Ng, E., and B. W. Peyton. "Block Sparse Cholesky Algorithms on Advanced Uniprocessor Computers." SIAM J. Sci. Comput., 14:1034-1056, 1993.
- Duff, Iain S., Roger G. Grimes, and John G. Lewis. "User's Guide for the Harwell-Boeing Sparse Matrix Collection (Release I)." Technical Report TR/PA/92/86, CERFACS, Lyon, France, October 1992.

Online Resources

Online information describing the performance library routines that form the basis of the Sun Performance Library can be found at:

   LAPACK version 3.0                 http://www.netlib.org/lapack
   BLAS levels 1 through 3            http://www.netlib.org/blas
   FFTPACK version 4                  http://www.netlib.org/fftpack
   VFFTPACK version 2.1               http://www.netlib.org/vfftpack
   Sparse BLAS                        http://www.netlib.org/sparse-blas/index.html
   NIST (National Institute of Standards and Technology) Fortran Sparse BLAS
41. ...programming environment.

Forte C 6 (Sun WorkShop 6 Compilers C)
   C User's Guide: Describes the C compiler options, Sun-specific capabilities such as pragmas, the lint tool, parallelization, migration to a 64-bit operating system, and ANSI/ISO-compliant C.

Forte C++ 6 (Sun WorkShop 6 Compilers C++)
   C++ Library Reference: Describes the C++ libraries, including the C++ Standard Library, Tools.h++ class library, Sun WorkShop Memory Monitor, Iostream, and Complex.
   C++ Migration Guide: Provides guidance on migrating code to this version of the Sun WorkShop C++ compiler.
   C++ Programming Guide: Explains how to use the new features to write more efficient programs and covers templates, exception handling, runtime type identification, cast operations, performance, and multithreaded programs.
   C++ User's Guide: Provides information on command-line options and how to use the compiler.
   Sun WorkShop Memory Monitor User's Manual: Describes how the Sun WorkShop Memory Monitor solves the problems of memory management in C and C++. This manual is only available through your installed product (see /opt/SUNWspro/docs/index.html) and not at the docs.sun.com Web site.

TABLE P-3 Related Sun WorkShop 6 Documentation by Document Collection (Continued)

   Document Collection / Document Title / Description
   Forte for High Performance Computing 6 (Sun...
42. ...and linking options to optimize applications for:
- Specific SPARC instruction set architectures
- 64-bit code
- Parallel processing

Using Sun Performance Library on SPARC Platforms

The Sun Performance Library was compiled using the f95 compiler provided with this release. The Sun Performance Library routines were compiled using -dalign and -xarch set to v8, v8plusa, or v9a. For each -xarch option used to compile the libraries, there is a library compiled with -xparallel and a library compiled without -xparallel.

When linking the program, use -dalign, -xlic_lib=sunperf, and the same -xarch option that was used when compiling. If -dalign cannot be used in the program, supply a trap 6 handler as described in "Getting Started With Sun Performance Library" on page 14. If compiling with a value of -xarch that is not one of v8, v8plusa, or v9a, the compiler driver selects the closest match.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown below:

   my_system% f95 -dalign my_file.f -xlic_lib=sunperf

The -xlic_lib switch has the same effect as using -l to specify the Sun Performance Library and adding -l switches for all of the supporting libraries that Sun Performance Library requires.

Compiling for SPARC Platforms

Applications using Sun Performance Library can be optimized for specific SPARC instruction...
43. ...angular matrix, 42, 78, 79
   triangular matrix in packed storage, 78
   tridiagonal matrix, 44
   type independence, 19

U
   unitary matrix, 79
   unitary matrix in packed storage, 80
   unsymmetric sparse matrix, 47
   upper Hessenberg matrix, 71
   USE SUNPERF
      64-bit code, 31
      enabling Fortran 95 features, 19

V
   VFFTPACK, 11, 86

X
   -xarch, 30
   -xautopar, 33
   -xexplicitpar, 33
   -xlic_lib=sunperf, 14, 29
   -xparallel, 33
   -xtypemap, 31
44. ...
- Types of arguments must match, even after C does type conversion. For example, be careful when passing a single-precision real value, because when no prototype is in scope a C compiler automatically promotes the argument to double precision.
- Arrays are stored columnwise.
- Array indices are based at zero, in conformance with C conventions, rather than being based at one to conform to Fortran conventions. For example, the Fortran interface to IDAMAX, which C programs access as idamax_, would return a 1 to indicate the first element in a vector. The C interface to idamax, which C programs access as idamax, would return a 0 to indicate the first element of a vector. This convention is observed in function return values, permutation vectors, and anywhere else that vector or array indices are used.

Note: Some of the routines in Sun Performance Library use malloc internally, so user codes that make calls to Sun Performance Library and to sbrk may not work correctly.

Sun Performance Library uses global integer registers %g2, %g3, and %g4 in 32-bit mode, and %g2 through %g5 in 64-bit mode, as scratch registers. User code should not use these registers for temporary storage and then call a Sun Performance Library routine, because the data will be overwritten when the Sun Performance Library routine uses these registers.

C Examples

The key to using Sun Performance Library to get peak performance from applications is to recogni...
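The zero-based versus one-based difference can be made concrete with a small C sketch of the search that IDAMAX performs: find the element of largest absolute value, taking the first one on ties. The first function reports the position with the C convention described above and the second with the Fortran convention. Both names are hypothetical; this illustrates the indexing conventions, not the library's code.

```c
#include <assert.h>

/* Position of the element of largest absolute value among n elements of
 * x read with stride incx (incx > 0), counted from 0 as in the C
 * interfaces. Ties go to the first occurrence. */
static int idamax_zero_based(int n, const double *x, int incx)
{
    int best = 0;
    double bestabs;
    assert(n > 0 && incx > 0);
    bestabs = x[0] < 0.0 ? -x[0] : x[0];
    for (int k = 1; k < n; k++) {
        double v = x[k * incx];
        double a = v < 0.0 ? -v : v;
        if (a > bestabs) {
            best = k;
            bestabs = a;
        }
    }
    return best;
}

/* The same answer with the Fortran convention (first element is 1). */
static int idamax_one_based(int n, const double *x, int incx)
{
    return idamax_zero_based(n, x, incx) + 1;
}
```

For x = (-7, 3, 7, 1), the first element wins the tie, so the zero-based answer is 0 and the one-based answer is 1; mixing the two conventions is an off-by-one error.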
45. TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

   Routine     Function
   ...matrix
   xPORFS      Refines the solution to a linear system in a Cholesky-factored symmetric or Hermitian positive definite matrix
   xPOSV       Solves a symmetric or Hermitian positive definite system of linear equations (simple driver)
   xPOSVX      Solves a symmetric or Hermitian positive definite system of linear equations (expert driver)
   xPOTRF      Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix
   xPOTRI      Computes the inverse of a symmetric or Hermitian positive definite matrix using the Cholesky factorization returned by xPOTRF
   xPOTRS      Solves a symmetric or Hermitian positive definite system of linear equations using the Cholesky factorization returned by xPOTRF
   Symmetric or Hermitian Positive Definite Matrix in Packed Storage
   xPPCON      Reciprocal condition number of a Cholesky-factored symmetric positive definite matrix in packed storage
   xPPEQU      Computes equilibration scale factors for a symmetric or Hermitian positive definite matrix in packed storage
   xPPRFS      Refines the solution to a linear system in a Cholesky-factored symmetric or Hermitian positive definite matrix in packed storage
   xPPSV       Solves a linear system in a symmetric or Hermitian positive definite matrix in packed storage (simple driver)
   xPPSVX      Solves a...
   xPPTRF
   xPPTRI
   xPPTRS
46. ...eigenvectors of a generalized symmetric-definite eigenproblem and uses a divide-and-conquer method to calculate eigenvectors
   Improves the computed solution to a system of linear equations when the coefficient matrix is symmetric indefinite
   Solves a real symmetric indefinite system of linear equations (simple driver)

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)

   Routine              Function
   xSYSVX               Solves a real symmetric indefinite system of linear equations (expert driver)
   SSYTRD or DSYTRD     Reduces a symmetric matrix to real symmetric tridiagonal form by using an orthogonal similarity transformation
   xSYTRF               Computes the factorization of a real symmetric indefinite matrix using the diagonal pivoting method
   xSYTRI               Computes the inverse of a symmetric indefinite matrix using the factorization computed by xSYTRF
   xSYTRS               Solves a symmetric indefinite system of linear equations using the factorization computed by xSYTRF
   Triangular Band Matrix
   xTBCON               Estimates the reciprocal condition number of a triangular band matrix
   xTBRFS               Determines error bounds and estimates for solving a triangular banded system of linear equations
   xTBTRS               Solves a triangular banded system of linear equations
   Triangular Matrix, Generalized Problem (Pair of Triangular Matrices)
   xTGEVC               Computes right and/or left generalized eigenvectors of two upper triangular matrices
   xTGEXC               Re...
47. ...d Storage 40
   Matrix Types 41
      General Matrices 42
      Triangular Matrices 42
      Symmetric Matrices 43
      Tridiagonal Matrices 44
   Sparse Matrices 44
      Sparse Solver Matrix Data Formats 45
   Sun Performance Library Sparse BLAS 47
      Naming Conventions 48
   Sparse Solver Routines 50
      Routine Calling Order 51
   Sparse Solver Examples 52
A. Sun Performance Library Routines 65
Index 91

Tables
   TABLE 4-1 Netlib Sparse BLAS Naming Conventions 48
   TABLE 4-2 NIST Fortran Sparse BLAS Routine Naming Conventions 49
   TABLE 4-3 Sparse Solver Routines 50
   TABLE 4-4 Sparse Solver Routine Calling Order 51
   TABLE A-1 LAPACK (Linear Algebra Package) Routines 66
   TABLE A-2 BLAS1 (Basic Linear Algebra Subprograms, Level 1) Routines 80
   TABLE A-3 BLAS2 (Basic Linear Algebra Subprograms, Level 2) Routines 81
   TABLE A-4 BLAS3 (Basic Linear Algebra Subprograms, Level 3) Routines 82
   TABLE A-5 Sparse BLAS Routines 83
   TABLE A-6 Sparse Solver Routines 85
   TABLE A-7 FFTPACK and VFFTPACK (Fast Fourier Transform and Vectorized Fast Fourier Transform) Routines 86
   TABLE A-8 Other Routines 87
   TABLE A-9 LINPACK Routines 88

Preface

This book describes how to use the Sun-specific extensions and features included with the Sun Performance Library subroutines that are supported by the Sun WorkShop 6 FORTRAN 77, Fortran 95...
48. ...array A.

The general form is the most common form. A general matrix, because it is dense, has no special storage scheme. In a general banded matrix, however, the diagonal of the matrix is stored in the row below the upper diagonals. For example, as shown below, the general banded matrix can be represented with banded storage. Elements shown with the symbol x are never accessed by routines that process banded arrays.

   General Banded Matrix              General Banded Array in Banded Storage

   a11  a12  a13   0    0             x    x    a13  a24  a35
   a21  a22  a23  a24   0             x    a12  a23  a34  a45
    0   a32  a33  a34  a35            a11  a22  a33  a44  a55
    0    0   a43  a44  a45            a21  a32  a43  a54  x
    0    0    0   a54  a55

Triangular Matrices

A triangular matrix is stored so that there is a one-to-one correspondence between the nonzero elements of the matrix and the elements of the array, but the elements of the array corresponding to the zero elements of the matrix are never accessed by routines that process triangular arrays.

A triangular matrix can be stored using packed storage:

   [Figure: Triangular Matrix and Triangular Array in Packed Storage]

A triangular banded matrix can be stored using banded storage, as shown below. Elements shown with the symbol x are never accessed by routines that process banded arrays.

   Triangular Banded Matrix           Triangular Banded Array in Banded Storage

   a11   0    0                       a11  a22  a33
   a21  a22   0                       a21  a32  x
    0   a32  a33
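In 1-based terms, general band storage places A(i,j) in row ku+1+i-j of column j of the band array, where ku is the number of superdiagonals and kl the number of subdiagonals. With kl = 1 and ku = 2, this puts the main diagonal in row 3, below the two superdiagonals, as in the 4-by-5 band array for the 5-by-5 general banded matrix. The formula is the LAPACK band-storage convention, assumed here to apply unchanged, and the function name dense_to_band is hypothetical. The C sketch below copies the band of a dense column-major matrix into such an array.

```c
#include <assert.h>

/* Copies the band of an m-by-n dense matrix a (column-major, leading
 * dimension lda) with kl subdiagonals and ku superdiagonals into a band
 * array ab (column-major, leading dimension ldab >= kl+ku+1), using
 * AB(ku+1+i-j, j) = A(i,j) in 1-based terms; with 0-based i and j the
 * row is ku+i-j. Entries of ab outside the band (the x positions) are
 * left untouched. */
static void dense_to_band(int m, int n, int kl, int ku,
                          const double *a, int lda, double *ab, int ldab)
{
    assert(ldab >= kl + ku + 1 && lda >= m);
    for (int j = 0; j < n; j++) {                      /* 0-based column */
        int ilo = j - ku > 0 ? j - ku : 0;             /* first band row */
        int ihi = j + kl < m - 1 ? j + kl : m - 1;     /* last band row  */
        for (int i = ilo; i <= ihi; i++)
            ab[j * ldab + (ku + i - j)] = a[j * lda + i];
    }
}
```

One design caveat: LAPACK factorization routines such as xGBTRF expect kl extra rows at the top of the band array for fill-in (leading dimension at least 2*kl+ku+1), which this storage-only sketch does not reserve.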
49. ...e a user may use parallelization directives, as shown below, to instruct the compiler to parallelize this loop.

Note that a user can also use compiler directives to parallelize a loop with a subroutine call that ordinarily would not be parallelizable. For example, it is ordinarily not possible to parallelize a loop containing a call to some of the linear system solvers, because some vendors have implemented those routines using code that is not MT-safe. Loops containing calls to the expert drivers of the linear system solvers (routines whose names end in SVX) are usually not parallelizable with other implementations of LAPACK. The implementation of LAPACK in Sun Performance Library allows parallelization of loops containing such calls. Because the versions in Sun Performance Library are MT-safe, users of MP platforms can get additional performance by parallelizing these loops.

C Interfaces
Sun Performance Library contains native C interfaces for each of the routines contained in LAPACK, BLAS, FFTPACK, VFFTPACK, and LINPACK. The Sun Performance Library C interfaces have the following features:
- Function names have C names.
- Function interfaces follow C conventions.
- C functions do not contain redundant or unnecessary arguments for a C function.

The following example compares the standard LAPACK Fortran interface and the Sun Performance Library C interfaces for the DGBCON routine.
CA...
50. ...e been parallelized that might be serial in other products.

Improving Performance of Other Libraries
Users of other mathematical libraries can replace the BLAS in their library with the BLAS in Sun Performance Library while leaving other routines unchanged. This is helpful when an application depends on proprietary interfaces in another library that prevent the other library from being completely replaced.

Many commercial math libraries are built around a core of generic BLAS and LAPACK routines, so replacing those generic routines with the highly optimized BLAS and LAPACK routines in Sun Performance Library can give speed improvements on both serial and MP platforms. Because replacing the core routines does not require any code changes, the proprietary library features can still be used.

Even libraries that already have fast core routines may get additional speedups by using Sun Performance Library. For example, if another vendor's core routines are based on BLAS, these routines can be replaced with Sun Performance Library routines, which have SPARC-specific optimizations. Many Sun Performance Library routines have also been parallelized.

Using Tools to Restructure Code
In some cases, other libraries may not directly use the routines in the Sun Performance Library; however, conversion aids might be available. For example, EISPACK users can refer to a conversion chart in the LAPACK Users Manual that shows how to convert EISP...
51. ...ear equations in packed storage
Product of a triangular matrix and a vector
Solution to a triangular system of linear equations

BLAS3 Routines
TABLE A-4 BLAS3 (Basic Linear Algebra Subprograms, Level 3) Routines
Routine            Function
xGEMM              Product of two general matrices
CHEMM or ZHEMM     Product of a Hermitian matrix and a general matrix
CHERK or ZHERK     Rank-k update of a Hermitian matrix
CHER2K or ZHER2K   Rank-2k update of a Hermitian matrix
xSYMM              Product of a symmetric matrix and a general matrix
xSYRK              Rank-k update of a symmetric matrix
xSYR2K             Rank-2k update of a symmetric matrix
xTRMM              Product of a triangular matrix and a general matrix
xTRSM              Solution for a triangular system of equations

Sparse BLAS Routines
TABLE A-5 Sparse BLAS Routines
Routines           Function
xAXPYI             Adds a scalar multiple of a sparse vector X to a full vector Y
SBCOMM or DBCOMM   Block coordinate matrix-matrix multiply
SBDIMM or DBDIMM   Block diagonal format matrix-matrix multiply
SBDISM or DBDISM   Block diagonal format triangular solve
SBELMM or DBELMM   Block Ellpack format matrix-matrix multiply
SBELSM or DBELSM   Block Ellpack format triangular solve
SBSCMM or DBSCMM   Block compressed sparse column format matrix-matrix multiply
SBS...
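The xAXPYI entry in TABLE A-5 works on a sparse vector stored as nz values plus their positions. The following C sketch illustrates the operation, assuming the Netlib sparse BLAS convention y(indx(i)) = y(indx(i)) + a*x(i); the name axpyi_sketch is hypothetical, and indices are 0-based here rather than 1-based as in the Fortran routines.

```c
/* Illustrative sketch (hypothetical name): y(indx(i)) += a * x(i) for the
 * nz stored entries of a sparse vector, following the Netlib sparse BLAS
 * AXPYI convention. Indices in indx are 0-based here; the Fortran routines
 * use 1-based indices. Entries of y not listed in indx are unchanged. */
void axpyi_sketch(int nz, double a, const double *x, const int *indx,
                  double *y)
{
    for (int i = 0; i < nz; i++)
        y[indx[i]] += a * x[i];
}
```

Because only the nz stored entries are touched, the work is proportional to nz rather than to the full length of y, which is the point of a sparse-times-full operation.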
52. ...ebra Package Routines (Continued)
Routine            Function
SORMTR or DORMTR   Multiplies a general matrix by the orthogonal transformation matrix reduced to tridiagonal form by SSYTRD or DSYTRD

Symmetric or Hermitian Positive Definite Band Matrix
xPBCON   Estimates the reciprocal of the condition number of a symmetric or Hermitian positive definite band matrix using the Cholesky factorization returned by xPBTRF
xPBEQU   Computes equilibration scale factors for a symmetric or Hermitian positive definite band matrix
xPBRFS   Refines solution to a symmetric or Hermitian positive definite banded system of linear equations
xPBSTF   Computes a split Cholesky factorization of a real symmetric positive definite band matrix
xPBSV    Solves a symmetric or Hermitian positive definite banded system of linear equations (simple driver)
xPBSVX   Solves a symmetric or Hermitian positive definite banded system of linear equations (expert driver)
xPBTRF   Computes Cholesky factorization of a symmetric or Hermitian positive definite band matrix
xPBTRS   Solves symmetric positive definite banded matrix using the Cholesky factorization computed by xPBTRF

Symmetric or Hermitian Positive Definite Matrix
xPOCON   Estimates the reciprocal of the condition number of a symmetric or Hermitian positive definite matrix using the Cholesky factorization returned by xPOTRF
xPOEQU   Computes equilibration scale factors for a symmetric or Hermitian positive definite m...
53. ...er of a general matrix
Determinant and inverse of an LU-factored general matrix
LU factorization of a general matrix
Solution to a linear system in an LU-factored general matrix
Solution to a linear system in a tridiagonal matrix
UDU factorization and condition number of a Hermitian matrix
Determinant, inertia, and inverse of a UDU-factored Hermitian matrix
UDU factorization of a Hermitian matrix
Solution to a linear system in a UDU-factored Hermitian matrix
UDU factorization and condition number of a Hermitian matrix in packed storage
Determinant, inertia, and inverse of a UDU-factored Hermitian matrix in packed storage
UDU factorization of a Hermitian matrix in packed storage
Solution to a linear system in a UDU-factored Hermitian matrix in packed storage
Cholesky factorization and condition number of a symmetric positive definite matrix in banded storage

TABLE A-9 LINPACK Routines (Continued)
Routine  Function
xPBDI    Determinant of a Cholesky-factored symmetric positive definite matrix in banded storage
xPBFA    Cholesky factorization of a symmetric positive definite matrix in banded storage
xPBSL    Solution to a linear system in a Cholesky-factored symmetric positive definite matrix in banded storage
xPOCO    Cholesky factorization and condition number of a symmetric positive definite matrix
xPODI    Determinant and inverse of a Cholesky-factored symmetric positive...
54. ...other countries. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun graphical user interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. Sun f90/f95 is derived from CRAY CF90, a product of Silicon Graphics, Inc.

THIS PUBLICATION IS PROVIDED "AS IS" AND NO EXPRESS OR IMPLIED WARRANTY IS GIVEN, INCLUDING WARRANTIES CONCERNING THE MERCHANTABILITY OF THE PUBLICATION, ITS FITNESS FOR A PARTICULAR USE, OR THE FACT THAT IT DOES NOT INFRINGE THIRD-PARTY PRODUCTS. THIS DISCLAIMER OF WARRANTY WOULD NOT APPLY TO THE EXTENT THAT IT WERE HELD LEGALLY NULL AND VOID.

Important Note on New Product Names
As part of Sun's new developer product strategy, we have changed the names of our development tools from Sun WorkShop to Forte Developer products. The products, as you can see, are...
55. ...es the factorization of a symmetric tridiagonal matrix
xSTTRS             Computes the solution to a system of linear equations where the coefficient matrix is a symmetric tridiagonal matrix

Symmetric Matrix
xSYCON             Estimates the reciprocal of the condition number of a symmetric matrix using the factorization computed by SSYTRF or DSYTRF
SSYEV or DSYEV     Replacement with newer version SSYEVR or DSYEVR suggested. Computes all eigenvalues and eigenvectors of a symmetric matrix
SSYEVX or DSYEVX   Computes eigenvalues and eigenvectors of a symmetric matrix (expert driver)
SSYEVD or DSYEVD   Replacement with newer version SSYEVR or DSYEVR suggested. Computes all eigenvalues and eigenvectors of a symmetric matrix and uses a divide-and-conquer method to calculate eigenvectors
SSYEVR or DSYEVR   Computes selected eigenvalues and eigenvectors of a symmetric tridiagonal matrix
SSYGST or DSYGST   Reduces a symmetric definite generalized eigenproblem to standard form using the factorization computed by SPOTRF or DPOTRF
SSYGV or DSYGV     Replacement with newer version SSYGVD or DSYGVD suggested. Computes all the eigenvalues and eigenvectors of a generalized symmetric definite eigenproblem
SSYGVX or DSYGVX   Computes selected eigenvalues and eigenvectors of a generalized symmetric definite eigenproblem
SSYGVD or DSYGVD   Computes all the eigenvalues and eigenve...
xSYRFS
xSYSV
56. ...f
or
    f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf

For example, compile and link with libsunperf.a statically:
    cc -dalign -xarch=... any.c -Bstatic -xlic_lib=sunperf -Bdynamic
or
    f77 -dalign -xarch=... any.f -Bstatic -xlic_lib=sunperf -Bdynamic
or
    f95 -dalign -xarch=... any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic

Using Multiple Processors in Shared Mode
To use multiple processors in shared mode:
1. Call one or more of the routines.
2. Set PARALLEL to a number greater than 1.
3. Compile and link with -mt.
4. Link with -xlic_lib=sunperf specified at the end of the command line.
Do not compile or link with -parallel, -explicitpar, or -autopar.

For example, compile and link with libsunperf.so (default):
    cc -dalign -xarch=... any.c -xlic_lib=sunperf -mt
or
    f77 -dalign -xarch=... any.f -xlic_lib=sunperf -mt
or
    f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf -mt

For example, compile and link with libsunperf.a statically:
    cc -dalign -xarch=... any.c -Bstatic -xlic_lib=sunperf -Bdynamic -mt
or
    f77 -dalign -xarch=... any.f -Bstatic -xlic_lib=sunperf -Bdynamic -mt
or
    f95 -dalign -xarch=... any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic -mt

Using Multiple Processors in Dedicated Mode (With Parallelization Options)
To use multiple processors in dedicated mode:
1. Call one or more of the routines.
2. Set PARALLEL to the n...
57. ...or ZHPGVX       ...genvalues and eigenvectors of a generalized Hermitian definite eigenproblem where the coefficient matrices are in packed storage (expert driver)
CHPGVD or ZHPGVD   Computes all the eigenvalues and eigenvectors of a generalized Hermitian definite eigenproblem where the coefficient matrices are in packed storage, and uses a divide-and-conquer method to calculate eigenvectors
CHPRFS or ZHPRFS   Improves the computed solution to a system of linear equations when the coefficient matrix is Hermitian indefinite in packed storage
CHPSV or ZHPSV     Computes the solution to a complex system of linear equations where the coefficient matrix is Hermitian in packed storage (simple driver)
CHPSVX or ZHPSVX   Uses the diagonal pivoting factorization to compute the solution to a complex system of linear equations where the coefficient matrix is Hermitian in packed storage (expert driver)
CHPTRD or ZHPTRD   Reduces a complex Hermitian matrix stored in packed form to real symmetric tridiagonal form
CHPTRF or ZHPTRF   Computes the factorization of a complex Hermitian indefinite matrix in packed storage using the diagonal pivoting method
CHPTRI or ZHPTRI   Computes the inverse of a complex Hermitian indefinite matrix in packed storage using the factorization computed by CHPTRF or ZHPTRF
CHPTRS or ZHPTRS   Solves a complex Hermitian indefinite matrix in packed storage using the factorization computed by CHPTRF or ZHPTRF

Upper Hessenberg Matr...
58. ...iagonal system of linear equations using the LDL factorization returned by xPTTRF

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine            Function
Real Symmetric Band Matrix
SSBEV or DSBEV     Replacement with newer version SSBEVD or DSBEVD suggested. Computes all eigenvalues and eigenvectors of a symmetric band matrix
SSBEVD or DSBEVD   Computes all eigenvalues and eigenvectors of a symmetric band matrix and uses a divide-and-conquer method to calculate eigenvectors
SSBEVX or DSBEVX   Computes selected eigenvalues and eigenvectors of a symmetric band matrix
SSBGST or DSBGST   Reduces symmetric definite banded generalized eigenproblem to standard form
SSBGV or DSBGV     Replacement with newer version SSBGVD or DSBGVD suggested. Computes all eigenvalues and eigenvectors of a generalized symmetric definite banded eigenproblem
SSBGVD or DSBGVD   Computes all eigenvalues and eigenvectors of a generalized symmetric definite banded eigenproblem and uses a divide-and-conquer method to calculate eigenvectors
SSBGVX or DSBGVX   Computes selected eigenvalues and eigenvectors of a generalized symmetric definite banded eigenproblem
SSBTRD or DSBTRD   Reduces symmetric band matrix to real symmetric tridiagonal form by using an orthogonal similarity transform

Symmetric Matrix in Packed Storage
xSPCON
SSP... or DSP...
SSP... or DSP...
59. ...ing the diagonal pivoting method
Computes the inverse of a complex Hermitian indefinite matrix using the factorization computed by CHETRF or ZHETRF
Solves a complex Hermitian indefinite matrix using the factorization computed by CHETRF or ZHETRF

Hermitian Matrix in Packed Storage
Estimates the reciprocal of the condition number of a Hermitian indefinite matrix in packed storage using the factorization computed by CHPTRF or ZHPTRF
Replacement with newer version CHPEVD or ZHPEVD suggested. Computes all the eigenvalues and eigenvectors of a Hermitian matrix in packed storage (simple driver)
Computes selected eigenvalues and eigenvectors of a Hermitian matrix in packed storage (expert driver)
Computes all the eigenvalues and eigenvectors of a Hermitian matrix in packed storage and uses a divide-and-conquer method to calculate eigenvectors
Reduces a Hermitian definite generalized eigenproblem to standard form where the coefficient matrices are in packed storage, using the factorization computed by CPPTRF or ZPPTRF

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine            Function
CHPGV or ZHPGV     Replacement with newer version CHPGVD or ZHPGVD suggested. Computes all the eigenvalues and eigenvectors of a generalized Hermitian definite eigenproblem where the coefficient matrices are in packed storage (simple driver)
CHPGVX or ...      Computes selected ei...
60. ...it arguments:
    void dgbcon(char norm, int n, int nsub, int nsuper, double *da, int lda, int *ipivot, double danorm, double *drcond, int *info);

The following example shows calling the dgbcon routine using 64-bit arguments:
    void dgbcon_64(char norm, long n, long nsub, long nsuper, double *da, long lda, long *ipivot, double danorm, double *drcond, long *info);

Optimizing for Parallel Processing
Note: The Fortran compiler parallelization features require a Sun WorkShop HPC license.

Sun Performance Library can be used with the shared or dedicated modes of parallelization, which are user-selectable at link time. Specifying the parallelization mode improves application performance by using the parallelization enhancements made to Sun Performance Library routines.

The shared multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications that do not use compiler parallelization and that run on a platform shared with other applications.
- Parallelization is implemented with threads library synchronization primitives.

The dedicated multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications using automatic compiler parallelization and running on an MP platform dedicated to a single processor-intensive application.
- Parallelization is impleme...
61. ...itian matrix in packed storage, 70
including routines in development environment, 17
L
LAPACK, 11, 66
LAPACK 90, 12
LAPACK compatibility, 12, 14
LINPACK, 11, 88
M
malloc, 26
man pages, 65
matrix
  banded, 40
  bidiagonal, 66
  diagonal, 66
  general, 42, 66
  general band, 66
  general tridiagonal, 68
  Hermitian, 69
  Hermitian band, 69
  Hermitian in packed storage, 70
  real orthogonal, 72
  real orthogonal in packed storage, 72
  real symmetric band, 75
  real symmetric tridiagonal, 76
  structurally symmetric sparse, 46
  symmetric, 43, 77
  symmetric banded, 44
  symmetric in packed storage, 75
  symmetric or Hermitian positive definite, 73
  symmetric or Hermitian positive definite band, 73
  symmetric or Hermitian positive definite in packed storage, 74
  symmetric or Hermitian positive definite tridiagonal, 74
  symmetric sparse, 45
  trapezoidal, 79
  triangular, 42, 78, 79
  triangular band, 78
  triangular in packed storage, 78
  tridiagonal, 44
  unitary, 79
  unitary in packed storage, 80
  unsymmetric sparse, 47
  upper Hessenberg, 71
misalign, 34
MT-safe routines, 23
N
Netlib, 12
Netlib Sparse BLAS, 48
  naming conventions, 48
NIST Fortran Sparse BLAS, 48
  naming conventions, 49
O
one-call interface, 51
optimizing
  64-bit code, 30, 31
  parallel processing, 33
  SPARC instruction set, 30
optional f95 interfaces, 20
P
packed storage, 40
PARALLEL environment variab...
62. ...ix
xHSEIN   Computes right and/or left eigenvectors of upper Hessenberg matrix using inverse iteration
xHSEQR   Computes eigenvalues and Schur factorization of upper Hessenberg matrix using multishift QR algorithm

Upper Hessenberg Matrix, Generalized Problem (Hessenberg and Triangular Matrix)
xHGEQZ   Implements single/double-shift version of QZ method for finding the generalized eigenvalues of the equation $\det(A - w_i B) = 0$

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine            Function
Real Orthogonal Matrix in Packed Storage
SOPGTR or DOPGTR   Generates an orthogonal transformation matrix from a tridiagonal matrix determined by SSPTRD or DSPTRD
SOPMTR or DOPMTR   Multiplies a general matrix by the orthogonal transformation matrix reduced to tridiagonal form by SSPTRD or DSPTRD

Real Orthogonal Matrix
SORGBR or DORGBR   Generates the orthogonal transformation matrices from reduction to bidiagonal form, as determined by SGEBRD or DGEBRD
SORGHR or DORGHR   Generates the orthogonal transformation matrix reduced to...
SORGLQ or DORGLQ
SORGQL or DORGQL
SORGQR or DORGQR
SORGRQ or DORGRQ
SORGTR or DORGTR
SORMBR or DORMBR
SORMHR or DORMHR
SORMLQ or DORMLQ
SORMQL or DORMQL
SORMQR or DORMQR
SORMR3 or DORMR3
SORMRQ or DORMRQ
SORMRZ or DORMRZ
63. ...ix matrix multiply
Right permutation of a jagged diagonal matrix
Jagged diagonal triangular solve
Applies a Givens rotation to a sparse vector and a full vector
Given a sparse vector and corresponding index vector, puts those elements into a full vector
Skyline format matrix-matrix multiply

TABLE A-5 Sparse BLAS Routines (Continued)
Routines           Function
SSKYSM or DSKYSM   Skyline format triangular solve
SVBRMM or DVBRMM   Variable block sparse row format matrix-matrix multiply
SVBRSM or DVBRSM   Variable block sparse row format triangular solve

Sparse Solver Routines
TABLE A-6 Sparse Solver Routines
Routines  Function
DGSSFS    One-call interface to sparse solver
DGSSIN    Sparse solver initialization
DGSSOR    Fill-reducing ordering and symbolic factorization
DGSSFA    Matrix value input and numeric factorization
DGSSSL    Triangular solve
DGSSUO    Sets user-specified ordering permutation
DGSSRP    Returns permutation used by solver
DGSSCO    Returns condition number estimate of coefficient matrix
DGSSDA    De-allocates sparse solver
DGSSPS    Prints solver statistics

FFTPACK and VFFTPACK Routines
Routines with a V prefix are vectorized routines that belong to VFFTPACK.
TABLE A-7 FFTPACK and VFFTPACK (Fast Fourier Transform and Vectorized Fast Fourier Transform) Routines
Ro...
64. ...le, 34, 35
parallel processing
  dedicated multiprocessor model, 33
  optimizing, 33
  shared multiprocessor model, 33
parallelization model
  dedicated, 33
  shared, 33
R
real orthogonal matrix, 72
real orthogonal matrix in packed storage, 72
real symmetric band matrix, 75
real symmetric tridiagonal matrix, 76
regular interface, 50
replacing routines, 18
routine calling conventions
  C, 25
  f77/f95, 19
routines
  BLAS1, 80
  BLAS2, 81
  BLAS3, 82
  FFTPACK, 86
  LAPACK, 66
  LINPACK, 88
  sparse BLAS, 83
  sparse solvers, 85
  VFFTPACK, 86
S
shared mode, 36
shared model, 33
shared multiprocessor model, 33
single processor, 35
sparse BLAS, 83
sparse matrices
  CSC storage format, 45
  structurally symmetric, 46
  symmetric, 45
  unsymmetric, 47
sparse solver, 85
sparse solver package, 44
  one-call interface, 51
  regular interface, 50
  routine calling order, 51
  routines, 50
  using with C, 44
specifying parallelization mode, 33
STACKSIZE environment variable, 34
structurally symmetric sparse matrix, 46
symmetric banded matrix, 44
symmetric matrix, 43, 77
symmetric matrix in packed storage, 75
symmetric or Hermitian positive definite band matrix, 73
symmetric or Hermitian positive definite matrix, 73
symmetric or Hermitian positive definite matrix in packed storage, 74
symmetric or Hermitian positive definite tridiagonal matrix, 74
symmetric sparse matrix, 45
T
threads, 34
trap 6, 15
trapezoidal matrix, 79
triangular band matrix, 78
tri...
65. ...linear system in a symmetric or Hermitian positive definite matrix in packed storage (expert driver)
Computes Cholesky factorization of a symmetric or Hermitian positive definite matrix in packed storage
Computes the inverse of a symmetric or Hermitian positive definite matrix in packed storage using the Cholesky factorization returned by xPPTRF
Solves a symmetric or Hermitian positive definite system of linear equations where the coefficient matrix is in packed storage, using the Cholesky factorization returned by xPPTRF

Symmetric or Hermitian Positive Definite Tridiagonal Matrix
xPTCON   Estimates the reciprocal of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix using the Cholesky factorization returned by xPTTRF
xPTEQR   Computes all eigenvectors and eigenvalues of a real symmetric or Hermitian positive definite tridiagonal matrix
xPTRFS   Refines solution to a symmetric or Hermitian positive definite tridiagonal system of linear equations
xPTSV    Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations (simple driver)
xPTSVX   Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations (expert driver)
xPTTRF   Computes the LDL factorization of a symmetric or Hermitian positive definite tridiagonal matrix
xPTTRS   Solves a symmetric or Hermitian positive definite trid...
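The LDL factorization listed for xPTTRF can be sketched in a few lines of C. This is an illustrative reimplementation, not the library routine: it assumes a real symmetric positive definite tridiagonal matrix with diagonal d and off-diagonal e, and overwrites d with the diagonal of D and e with the subdiagonal of the unit lower bidiagonal factor L (mirroring, as an assumption, the LAPACK DPTTRF output layout). The name pttrf_sketch is hypothetical.

```c
/* Illustrative sketch (hypothetical name) of an LDL^T factorization of a
 * real symmetric positive definite tridiagonal matrix.
 *   d[0..n-1] : on entry the diagonal of A; on exit the diagonal of D
 *   e[0..n-2] : on entry the off-diagonal of A; on exit the subdiagonal
 *               of the unit lower bidiagonal factor L
 * Returns 0 on success, or k (1-based) if the k-th pivot is not positive,
 * meaning A is not positive definite. */
int pttrf_sketch(int n, double *d, double *e)
{
    for (int i = 0; i < n - 1; i++) {
        if (d[i] <= 0.0)
            return i + 1;
        double l = e[i] / d[i];   /* multiplier L(i+1, i)               */
        d[i + 1] -= l * e[i];     /* Schur complement update of pivot   */
        e[i] = l;
    }
    if (n > 0 && d[n - 1] <= 0.0)
        return n;
    return 0;
}
```

For the 3-by-3 matrix with diagonal (4, 5, 5) and off-diagonal (2, 2), the sketch produces D = diag(4, 4, 4) and multipliers (0.5, 0.5). The factorization costs O(n) operations, which is why tridiagonal solvers are so much cheaper than dense ones.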
66. ...lution to a linear system in a UDU-factored symmetric matrix in packed storage
Singular value decomposition of a general matrix
Condition number of a triangular matrix
Determinant and inverse of a triangular matrix
Solution to a linear system in a triangular matrix

Index
NUMERICS
32-bit addressing, 30
64-bit addressing, 30
64-bit code
  C, 32
  FORTRAN 77, 31
  Fortran 95, 31
  integer promotion, 31
  USE SUNPERF, 31
A
automatic code restructuring tools, 18
B
banded matrix, 40
bidiagonal matrix, 66
BLAS1, 11, 80
BLAS2, 11, 81
BLAS3, 11, 82
C
C
  64-bit code, 32
  calling conventions, 25
C interfaces
  advantages, 24
  compared to Fortran interfaces, 25
CLAPACK, 12
compatibility
  LAPACK, 12, 14
compile-time checking, 19
compressed sparse column (CSC) format, 45
D
-dalign, 15
dedicated mode, 36
dedicated multiprocessor model, 33
diagonal matrix, 66
E
EISPACK, 12
enable trap 6, 15
environment variable
  PARALLEL, 34, 35
  STACKSIZE, 34
F
f77/f95 interfaces
  calling conventions, 19
FFTPACK, 11, 86
FORTRAN 77
  64-bit code, 31
Fortran 95
  64-bit code, 31
  compile-time checking, 19
  optional interfaces, 20
  type independence, 19
  USE SUNPERF, 19
G
general band matrix, 66
general matrix, 42, 66
general tridiagonal matrix, 68
global integer registers, 26
H
Hermitian band matrix, 69
Hermitian matrix, 69
Herm...
67. ...n Performance Library is based.

Typographic Conventions
TABLE P-1 shows the typographic conventions that are used in Sun WorkShop documentation.

TABLE P-1 Typographic Conventions
AaBbCc123: The names of commands, files, and directories; on-screen computer output. Examples: Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123: What you type, when contrasted with on-screen computer output. Example: % su / Password:
AaBbCc123: Book titles, new words or terms, words to be emphasized. Examples: Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this.
AaBbCc123: Command-line placeholder text; replace with a real name or value. Example: To delete a file, type rm filename.

Shell Prompts
TABLE P-2 shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.

TABLE P-2 Shell Prompts
Shell                                            Prompt
C shell                                          %
Bourne shell and Korn shell                      $
C shell, Bourne shell, and Korn shell superuser  #

Access to Sun WorkShop Development Tools
Because Sun WorkShop product components and man pages do not install into the standard /usr/bin and /usr/share/man directories, you must change your PATH and MANPATH environment variables to enable access to Sun WorkShop compilers and tools.

To determine if you need to set your PATH environment variable...
68. ...or ZUPGTR       ...nerates the unitary transformation matrix from a tridiagonal matrix determined by CHPTRD or ZHPTRD
CUPMTR or ZUPMTR   Multiplies a general matrix by the unitary transformation matrix reduced to tridiagonal form by CHPTRD or ZHPTRD

BLAS1 Routines
TABLE A-2 BLAS1 (Basic Linear Algebra Subprograms, Level 1) Routines
Routine                                           Function
SASUM, DASUM, SCASUM, DZASUM                      Sum of the absolute values of a vector
xAXPY                                             Product of a scalar and vector plus a vector
xCOPY                                             Copy a vector
SDOT, DDOT, DSDOT, SDSDOT, CDOTU, ZDOTU,
  DQDOTA, DQDOTI                                  Dot product (inner product)
CDOTC, ZDOTC                                      Dot product conjugating first vector
SNRM2, DNRM2, SCNRM2, DCNRM2, DZNRM2              Euclidean norm of a vector
xROTG                                             Set up Givens plane rotation
xROT, CSROT, ZDROT                                Apply Givens plane rotation
SROTMG, DROTMG                                    Set up modified Givens plane rotation
SROTM, DROTM                                      Apply modified Givens rotation
ISAMAX, IDAMAX, ICAMAX, IZAMAX                    Index of element with maximum absolute value
xSCAL, CSSCAL, ZDSCAL                             Scale a vector
xSWAP                                             Swap two vectors
CVMUL, ZVMUL                                      Compute scaled product of complex vectors

BLAS2 Routines
TABLE A-3 BLAS2 (Basic Linear Algebra Subprograms, Level 2) Routines
Routine  Function
xGBMV    Product of a matrix in banded storage and a vector
xGEMV    Product of a general matrix and a vector
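The index-of-maximum routines in TABLE A-2 (ISAMAX, IDAMAX, and their complex counterparts) return a position, not a value, and the position is 1-based as in Fortran. The following minimal C sketch shows that behavior for real double precision data, assuming unit stride and the reference BLAS convention of returning 0 for an empty vector; idamax_sketch is a hypothetical name, not the library's C interface.

```c
/* Illustrative sketch (hypothetical name) of the IDAMAX operation:
 * returns the 1-based index of the first element with the largest
 * absolute value, or 0 when n <= 0. Unit stride only; the library
 * routine also takes an increment argument. */
int idamax_sketch(int n, const double *x)
{
    if (n <= 0)
        return 0;
    int best = 0;
    double best_abs = x[0] < 0.0 ? -x[0] : x[0];
    for (int i = 1; i < n; i++) {
        double v = x[i] < 0.0 ? -x[i] : x[i];
        if (v > best_abs) {   /* strict > keeps the first of any ties */
            best_abs = v;
            best = i;
        }
    }
    return best + 1;          /* convert to Fortran-style 1-based index */
}
```

For x = (1, -7, 3, 7) the sketch returns 2, because -7 and 7 tie in absolute value and the first occurrence wins. C callers indexing a 0-based array need to subtract 1 from the result.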
69. ...nted with spin locks.

On a dedicated system, the dedicated model can be faster than the shared model due to lower synchronization overhead. On a system running many different tasks, the shared model can make better use of available resources.

Specifying the Parallelization Mode
To specify the parallelization mode:
- Shared model: Use -mt on the link line without one of the compiler parallelization options.
- Dedicated model: Use one of the compiler parallelization options (-xparallel, -xexplicitpar, or -xautopar) on the compile and link lines.
- Single processor: Do not specify any of the compiler parallelization options or -mt on the link line.

Note: Using the shared model with one of the compiler parallelization options, -xparallel, -xexplicitpar, or -xautopar, produces unpredictable behavior.

If compiling with one of the compiler parallelization options:
- Use the same parallelization option on the linking command.
- To use multiple processors, add -mt to the link line and then specify the number of processors at runtime with the PARALLEL environment variable.

For example, to use 24 processors, type the commands shown below:
    my_system% f95 -dalign -mt my_app.f -xlic_lib=sunperf
    my_system% setenv PARALLEL 24
    my_system% a.out

Note: Parallel processing options require using either the -dalign command-line option or establishing a trap...
70. ...nvectors of a symmetric tridiagonal matrix using a divide-and-conquer method
Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using Relatively Robust Representations
Computes selected eigenvectors of a real symmetric tridiagonal matrix using inverse iteration
Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the implicit QL or QR algorithm
Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using a root-free QL or QR algorithm variant
Replacement with newer version SSTEVR or DSTEVR suggested. Computes all eigenvalues and eigenvectors of a real symmetric tridiagonal matrix (simple driver)

TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine            Function
SSTEVX or DSTEVX   Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix (expert driver)
SSTEVD or DSTEVD   Replacement with newer version SSTEVR or DSTEVR suggested. Computes all the eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using a divide-and-conquer method
SSTEVR or DSTEVR   Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using Relatively Robust Representations
xSTSV              Computes the solution to a system of linear equations where the coefficient matrix is a symmetric tridiagonal matrix
xSTTRF             Comput...
71. ...olaris 2.6, Solaris 7, and Solaris 8 and adds support for multiple processors. Sun Performance Library LAPACK routines have been compiled with a Fortran 95 compiler and remain compatible with the Netlib LAPACK version 3.0 library. The Sun Performance Library versions of these routines perform the same operations as the Fortran-callable routines and have the same interface as the standard Netlib versions.

Netlib
Netlib is an online repository of mathematical software, papers, and databases maintained by AT&T Bell Laboratories, the University of Tennessee, Oak Ridge National Laboratory, and professionals from around the world.

Netlib provides many libraries in addition to the seven libraries used in Sun Performance Library. While some of these libraries can appear similar to libraries used with Sun Performance Library, they can be different from and incompatible with Sun Performance Library.

Using routines from other libraries can produce compatibility problems, not only with Sun Performance Library routines but also with the base Netlib LAPACK routines. When using non-Sun Performance Library routines, refer to the documentation provided with that library.

For example, Netlib provides a CLAPACK library, but the CLAPACK interfaces differ from the C interfaces included with Sun Performance Library. A LAPACK 90 library package is also available on Netlib. The LAPACK 90 library contains interfaces that differ from the Sun Performance Library...
72. ...on 3P man pages for the individual routines.
- For more information on the Netlib Sparse BLAS package, refer to http://www.netlib.org/sparse-blas/index.html
- For more information on the NIST Fortran Sparse BLAS routines, refer to http://math.nist.gov/spblas/

Naming Conventions
The Netlib Sparse BLAS and NIST Fortran Sparse BLAS Library routines each use their own naming conventions, as described in the following two sections.

Netlib Sparse BLAS
Each Netlib Sparse BLAS routine has a name of the form Prefix-Root-Suffix, where the:
- Prefix represents the data type.
- Root represents the operation.
- Suffix represents whether or not the routine is a direct extension of an existing dense BLAS routine.

TABLE 4-1 lists the naming conventions for the Netlib Sparse BLAS vector routines.

TABLE 4-1 Netlib Sparse BLAS Naming Conventions
Operation                                 Root of Name   Prefix and Suffix
Dot product                               DOT            S-I, D-I, C-UI, Z-UI, C-CI, Z-CI
Scalar times a vector added to a vector   AXPY           S-I, D-I, C-I, Z-I
Apply Givens rotation                     ROT            S-I, D-I
Gather x into y                           GTHR           S-, D-, C-, Z-, S-Z, D-Z, C-Z, Z-Z
Scatter x into y                          SCTR           S-, D-, C-, Z-

The prefix can be one of the following data types:
- S: SINGLE
- D: DOUBLE
- C: COMPLEX
- Z: COMPLEX*16 or DOUBLE COMPLEX

The I, CI, and UI suffixes denote sparse BLAS routines that are direct extensions to dense BLAS routines.
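The GTHR and SCTR roots in TABLE 4-1 move data between a full vector and the compressed form of a sparse vector (values plus an index list). The following C sketch shows the two directions, assuming the roles of the Netlib reference routines, in which gather reads selected entries of the full vector and scatter writes them back. The function and parameter names (gthr_sketch, sctr_sketch, full, packed, idx) are hypothetical, and indices are 0-based rather than 1-based.

```c
/* Illustrative sketches (hypothetical names) of sparse gather and scatter,
 * following the Netlib sparse BLAS roles of the GTHR and SCTR routines.
 * A sparse vector is held as nz values plus their 0-based positions idx.
 *   gather : packed[k] = full[idx[k]]   (full vector -> compressed form)
 *   scatter: full[idx[k]] = packed[k]   (compressed form -> full vector)
 * Entries of `full` not listed in idx are not touched by scatter. */
void gthr_sketch(int nz, const double *full, double *packed, const int *idx)
{
    for (int k = 0; k < nz; k++)
        packed[k] = full[idx[k]];
}

void sctr_sketch(int nz, const double *packed, const int *idx, double *full)
{
    for (int k = 0; k < nz; k++)
        full[idx[k]] = packed[k];
}
```

The -Z variants in the table (for example DGTHRZ) additionally zero the gathered positions of the full vector; that variant is not shown here.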
73. ontains the corresponding nonzero numerical values.
The following matrix data formats exist for a sparse matrix of neqns equations and nnz nonzeros:
- Symmetric
- Structurally symmetric
- Unsymmetric
The most efficient data representation often depends on the specific problem. The following sections show examples of sparse matrix data formats.
Symmetric Sparse Matrices
A symmetric sparse matrix is a matrix where a(i,j) = a(j,i) for all i and j. Because of this symmetry, only the lower triangular values need to be passed to the solver routines. The upper triangle can be determined from the lower triangle.
An example of a symmetric matrix is shown below. This example is derived from A. George and J. W-H. Liu, Computer Solution of Large Sparse Positive Definite Systems.
        4.0   1.0   2.0   0.5     2.0
        1.0   0.5   0.0   0.0     0.0
  A =   2.0   0.0   3.0   0.0     0.0
        0.5   0.0   0.0   0.625   0.0
        2.0   0.0   0.0   0.0    16.0
To represent A in CSC format:
- colptr: 1, 6, 7, 8, 9, 10
- rowind: 1, 2, 3, 4, 5, 2, 3, 4, 5
- values: 4.0, 1.0, 2.0, 0.5, 2.0, 0.5, 3.0, 0.625, 16.0
Structurally Symmetric Sparse Matrices
A structurally symmetric sparse matrix has nonzero values with the property that if a(i,j) ≠ 0, then a(j,i) ≠ 0 for all i and j. When solving a structurally symmetric system, the entire matrix must be passed to the solver routines.
An example of a structurally symmetric matrix is shown below.
        1.0   3.0   0.0   0.0
        2.0   4.0   0.0   7.0
        0.0   0.0
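As a check on the symmetric CSC arrays above, the following C sketch multiplies the matrix by a vector while reading only the stored lower triangle, applying each off-diagonal entry a second time in its mirrored position. The helper name is hypothetical and it is not a library routine. With x = (2, 2, 1, -8, -0.5) it should reproduce b = (7, 3, 7, -4, -4), which is the system solved in the symmetric sparse solver examples.

```c
#include <assert.h>

/* y := A*x for a symmetric n-by-n matrix A of which only the lower
   triangle (diagonal included) is stored in 1-based CSC arrays, the form
   passed to the sparse solver for a symmetric system. Each off-diagonal
   entry a(i,j) is also applied as a(j,i). Illustrative helper only. */
static void sym_csc_matvec(int n, const int *colptr, const int *rowind,
                           const double *values, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++) {
        /* colptr[j]-1 .. colptr[j+1]-2 are the 0-based slots of column j */
        for (int k = colptr[j] - 1; k < colptr[j + 1] - 1; k++) {
            int i = rowind[k] - 1;           /* convert row to 0-based */
            assert(i >= j);                  /* lower triangle only */
            y[i] += values[k] * x[j];
            if (i != j)
                y[j] += values[k] * x[i];    /* mirrored upper entry */
        }
    }
}
```

Storing only the lower triangle saves roughly half the off-diagonal storage, at the cost of each stored entry being used twice.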
74. orders the generalized Schur decomposition of a real or complex matrix pair using an orthogonal or unitary equivalence transformation
xTGSEN   Reorders the generalized real Schur or Schur decomposition of two matrices and computes the generalized eigenvalues
xTGSJA   Computes the generalized SVD from two upper triangular matrices obtained from xGGSVP
xTGSNA   Estimates reciprocal condition numbers for specified eigenvalues and eigenvectors of two matrices in real Schur or Schur canonical form
xTGSYL   Solves the generalized Sylvester equation
Triangular Matrix in Packed Storage
xTPCON   Estimates the reciprocal of the condition number of a triangular matrix in packed storage
xTPRFS   Determines error bounds and estimates for solving a triangular system of linear equations where the coefficient matrix is in packed storage
xTPTRI   Computes the inverse of a triangular matrix in packed storage
xTPTRS   Solves a triangular system of linear equations where the coefficient matrix is in packed storage
TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine   Function
Triangular Matrix
xTRCON, xTREVC, xTREXC, xTRRFS, xTRSEN, xTRSNA, xTRSYL, xTRTRI, xTRTRS: Estimates the reciprocal of the condition number of a triangular matrix. Computes right and/or left eigenvectors of an upper triangular matrix. Reorders Schur factorization of matrix
75.      &      'example FAILED sparse solver error number', ier
      stop
  200 format(a5,3a20)
  300 format(i5,3d20.12)   ! i/sol/xexpct values
  400 format(a60,i20)      ! fail message, sparse solver error number
      end
my_system% f95 -dalign example_su.f -xlic_lib=sunperf
my_system% a.out
    i              rhs(i)     expected rhs(i)               error
    1  0.100000000000D+01  0.100000000000D+01  0.000000000000D+00
    2  0.200000000000D+01  0.200000000000D+01  0.000000000000D+00
    3  0.300000000000D+01  0.300000000000D+01  0.000000000000D+00
    4  0.400000000000D+01  0.400000000000D+01  0.000000000000D+00
CODE EXAMPLE 4-4 Solving an Unsymmetric System: Regular Interface
my_system% cat example_uu.f
      program example_uu
c
c  This program is an example driver that calls the sparse solver.
c  It factors and solves an unsymmetric system.
c
      implicit none
      integer           neqns, ier, msglvl, outunt, ldrhs, nrhs
      character         mtxtyp*2, pivot*1, ordmthd*3
      double precision  handle(150)
      integer           colstr(6), rowind(10)
      double precision  values(10), rhs(5), xexpct(5)
      integer           i
c
c  Sparse matrix structure and value arrays.  Unsymmetric matrix A.
c  Ax = b, (solve for x) where:
c
c      1.0  0.0  0.0  0.0  0.0          1.0           1.0
c      2.0  6.0  0.0  0.0  9.0          2.0          59.0
c  A = 3.0  0.0  7.0  0.0  0.0     x =  3.0     b =  24.0
c      4.0  0.0  0.0  8.0  0.0          4.0          36.0
c      5.0  0.0  0.0  0.0 10.0          5.0          55.0
c
      data colstr / 1, 6, 7, 8, 9, 11 /
      data rowind / 1, 2, 3, 4, 5, 2, 3, 4, 2
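The unsymmetric example stores every nonzero explicitly, so a general CSC product needs no mirroring. This C sketch (hypothetical helper name, 1-based arrays as in the Fortran data statements) applies colptr = 1, 6, 7, 8, 9, 11 and rowind = 1, 2, 3, 4, 5, 2, 3, 4, 2, 5, with values 1.0 through 10.0 read off the matrix in the comment block, to x = (1, 2, 3, 4, 5); the result should be b = (1, 59, 24, 36, 55).

```c
#include <assert.h>

/* y := A*x for a general (unsymmetric) n-by-n matrix held in 1-based CSC
   arrays, with every nonzero stored explicitly. Illustrative helper only. */
static void csc_matvec(int n, const int *colptr, const int *rowind,
                       const double *values, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++)
        for (int k = colptr[j] - 1; k < colptr[j + 1] - 1; k++)
            y[rowind[k] - 1] += values[k] * x[j];   /* a(row, j) * x(j) */
}
```

Because the work is organized by columns, each x(j) is read once and scattered into the rows listed for column j.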
76. NIST Fortran Sparse BLAS
Each NIST Fortran Sparse BLAS routine has a six-character name of the form XYYYZZ, where:
- X represents the data type.
- YYY represents the sparse storage format.
- ZZ represents the operation.
TABLE 4-2 shows the values for X, YYY, and ZZ.
TABLE 4-2 NIST Fortran Sparse BLAS Routine Naming Conventions
X (data type)
  S     single precision
  D     double precision
YYY (sparse storage format), single-entry formats
  COO   coordinate
  CSC   compressed sparse column
  CSR   compressed sparse row
  DIA   diagonal
  ELL   ellpack
  JAD   jagged diagonal
  SKY   skyline
YYY (sparse storage format), block-entry formats
  BCO   block coordinate
  BSC   block compressed sparse column
  BSR   block compressed sparse row
  BDI   block diagonal
  BEL   block ellpack
  VBR   variable block compressed sparse row
ZZ (operation)
  MM    matrix-matrix product
  SM    solution of triangular system (supported for all formats except COO)
  RP    right permutation (for JAD format only)
Sparse Solver Routines
The Sun Performance Library sparse solver package contains the routines listed in TABLE 4-3.
TABLE 4-3 Sparse Solver Routines
Routine   Function
DGSSFS    One-call interface to sparse solver
DGSSIN    Sparse solver initialization
DGSSOR    Fill-reducing ordering and symbolic factorization
DGSSFA    Matrix value input and numeric factorization
DGSSSL    Triangular solve
Utility Routine   Function
DGSSUO
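The XYYYZZ convention can be read mechanically. The C sketch below is a hypothetical helper, not part of either Sparse BLAS package, that splits a name such as DCSRMM into its type letter, three-letter storage format, and two-letter operation. As an assumption for brevity, it only checks the length and the S/D type letter, not whether the format and operation codes are ones listed in TABLE 4-2.

```c
#include <assert.h>
#include <string.h>

/* Splits a six-character NIST Fortran Sparse BLAS name XYYYZZ into the
   data-type letter X, storage format YYY, and operation ZZ.
   Returns 0 on success, -1 if the name is not six characters long or the
   type letter is not S or D. Hypothetical helper for illustration. */
static int split_nist_name(const char *name, char *type,
                           char format[4], char op[3])
{
    if (strlen(name) != 6 || (name[0] != 'S' && name[0] != 'D'))
        return -1;
    *type = name[0];
    memcpy(format, name + 1, 3);   /* YYY */
    format[3] = '\0';
    memcpy(op, name + 4, 2);       /* ZZ */
    op[2] = '\0';
    return 0;
}
```

For example, DCSRMM decodes to D (double precision), CSR (compressed sparse row), and MM (matrix-matrix product).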
77. r -xarch=v8plusb.
To compile for 64-bit addressing in a 64-bit enabled Solaris operating environment:
- UltraSPARC I or UltraSPARC II systems: use -xarch=v9 or -xarch=v9a.
- UltraSPARC III systems: use -xarch=v9 or -xarch=v9b.
Compiling Code for 64-Bit UltraSPARC
To compile 64-bit code on UltraSPARC, use -xarch=v9[a|b] and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers.
Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:
- Modify the Sun Performance Library routine name. For C, FORTRAN 77, and Fortran 95 code without the USE SUNPERF statement, _64 must be appended to the names of Sun Performance Library routines (for example, dgbcon_64 or CAXPY_64). For f95 code with the USE SUNPERF statement, do not append _64 to the Sun Performance Library routine names. The compiler will infer the correct interface from the presence or absence of INTEGER*8 arguments.
- Promote integers to 64 bits. Double-precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the size of the integers is affected.
To control promotion of integer arguments, do one of the following:
- To promote all integers from 32 bits to 64 bits, compile with -xtypemap=integer:64.
- When using Fortran, to a
78. r service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.
Sun f90/f95 is derived from Cray CF90, a product of Silicon Graphics, Inc.
Federal Acquisitions: Commercial Software. Government Users Subject to Standard License Terms and Conditions.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright 2000 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303-4900, United States. All rights reserved. This product or document is di
79. ral rectangular matrix
xGELS    Computes the least squares solution to an overdetermined system of linear equations using a QR or LQ factorization of A
xGELSD   Computes the minimum norm solution to a linear least squares problem using the SVD of A and a divide and conquer method
xGELSS   Computes the minimum norm solution to a linear least squares problem by using the SVD of a general rectangular matrix (simple driver)
xGELSX   Deprecated routine, replaced by xGELSY
xGELSY   Computes the minimum norm solution to a linear least squares problem using a complete orthogonal factorization
xGEQLF   Computes QL factorization of a general rectangular matrix
xGEQP3   Computes QR factorization with column pivoting of a general rectangular matrix using Level 3 BLAS
xGEQPF   Deprecated routine, replaced by xGEQP3
xGEQRF   Computes QR factorization of a general rectangular matrix
xGERFS   Refines solution to a system of linear equations
xGERQF   Computes RQ factorization of a general rectangular matrix
xGESDD   Computes SVD of a general rectangular matrix using a divide and conquer method
xGESV    Solves a general system of linear equations (simple driver)
xGESVX   Solves a general system of linear equations (expert driver)
xGESVD   Computes SVD of a general rectangular matrix
TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine   Function
xGETRF xG
80. re information about the MANPATH variable, see the man(1) man page. For more information about setting your PATH and MANPATH variables to access this release, see the Sun WorkShop 6 Installation Guide or your system administrator.
Related Documents and Web Sites
A number of books and web sites provide reference information on the routines in the base libraries (LAPACK, LINPACK, BLAS, and so on) upon which the Sun Performance Workshop is based. Sun Performance Library includes extensions to the base libraries that are not described in the books from the Society for Industrial and Applied Mathematics (SIAM) or in the online Netlib documents.
LAPACK and LINPACK Books
The following books augment this manual and provide essential information:
- LAPACK Users' Guide, 3rd ed., Anderson E. and others. SIAM, 1999.
- LINPACK User's Guide, Dongarra J. J. and others. SIAM, 1979.
The LAPACK Users' Guide, 3rd ed. is the official reference for the base LAPACK version 3.0 routines. An online version of the LAPACK 3.0 Users' Guide is available at http://www.netlib.org/lapack/lug/, and the printed version is available from SIAM. Sun Performance Library routines contain performance enhancements, extensions, and features not described in the LAPACK Users' Guide. However, because Sun Performance Library maintains compatibility with the base LAPACK routines, the LAPACK Users' Guide can be used as
81. s selected eigenvalues and eigenvectors of a Hermitian band matrix. Reduces a Hermitian-definite banded generalized eigenproblem to standard form. Replacement with newer version CHBGVD or ZHBGVD suggested; computes all eigenvalues and eigenvectors of a generalized Hermitian-definite banded eigenproblem. Computes all eigenvalues and eigenvectors of a generalized Hermitian-definite banded eigenproblem and uses a divide and conquer method to calculate eigenvectors. Computes selected eigenvalues and eigenvectors of a generalized Hermitian-definite banded eigenproblem. Reduces a Hermitian band matrix to real symmetric tridiagonal form by using a unitary similarity transform.
CHECON or ZHECON   Estimates the reciprocal of the condition number of a Hermitian matrix using the factorization computed by CHETRF or ZHETRF
CHEEV or ZHEEV     Replacement with newer version CHEEVR or ZHEEVR suggested. Computes all eigenvalues and eigenvectors of a Hermitian matrix (simple driver)
CHEEVD or ZHEEVD   Replacement with newer version CHEEVR or ZHEEVR suggested. Computes all eigenvalues and eigenvectors of a Hermitian matrix and uses a divide and conquer method to calculate eigenvectors
CHEEVR or ZHEEVR   Computes selected eigenvalues and the eigenvectors of a complex Hermitian matrix
CHEEVX or ZHEEVX   Computes selected eigenvalues and eigenvectors of a Hermitian matrix (expert driver)
CHEGST or ZHEGST   Reduces
82. set architectures and for 64-bit code. The optimization for each architecture is targeted at one implementation of that architecture and includes optimizations for other architectures when it does not degrade the performance of the primary target.
Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific SPARC instruction set architecture.
Note: Using SPARC-specific optimization options increases application performance on the selected instruction set architecture, but limits code portability. When using these optimization options, the resulting code can be run only on systems using the specific SPARC chip from Sun Microsystems and, in some cases, a specific Solaris operating environment (32- or 64-bit Solaris 7 or Solaris 8).
The SunOS command isalist(1) can be used to display a list of the native instruction sets executable on a particular platform. The names output by isalist are space-separated and are ordered from best to worst performance.
For a detailed description of the different -xarch options, refer to the Fortran User's Guide or C User's Guide.
To compile for 32-bit addressing in a 32-bit enabled Solaris operating environment:
- UltraSPARC I or UltraSPARC II systems: use -xarch=v8plus or -xarch=v8plusa.
- UltraSPARC III systems: use -xarch=v8plus o
83. stributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without the prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. The following notice applies to Netscape, Netscape Navigator, and the Netscape Communications Corporation logo: Copyright 1995 Netscape Communications Corporation. All rights reserved. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook2, Solaris, SunOS, JavaScript, SunExpress, Sun WorkShop, Sun WorkShop Professional, Sun Performance Library, Sun Performance WorkShop, Sun Visual WorkShop, and Forte are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countr
84. structurally symmetric, and unsymmetric coefficient matrices.
- Specify a choice of ordering methods, including user-specified orderings.
The sparse solver package contains interfaces for FORTRAN 77. Fortran 95 and C interfaces are not currently provided. To use the sparse solver routines from Fortran 95, use the FORTRAN 77 interfaces. To use the sparse solver routines with C, append an underscore to the routine name (dgssin_(), dgssor_(), and so on), pass arguments by reference, and use 1-based array indexing.
Sparse Solver Matrix Data Formats
Sparse matrices are usually represented in formats that minimize storage requirements. By taking advantage of the sparsity and not storing zeros, considerable storage space can be saved. The storage format used by the general sparse solver is the compressed sparse column (CSC) format, also called the Harwell-Boeing format.
The CSC format represents a sparse matrix with two integer arrays and one floating-point array. The integer arrays (colptr and rowind) specify the location of the nonzeros of the sparse matrix, and the floating-point array (values) is used for the nonzero values.
The column pointer (colptr) array consists of n+1 elements, where colptr(i) points to the beginning of the ith column, and colptr(i+1) - 1 points to the end of the ith column. The row indices (rowind) array contains the row indices of the nonzero values. The values arrays c
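The following C sketch builds the three CSC arrays from a dense column-major matrix, producing the 1-based colptr and rowind that the solver expects when called from C. The function name and the lower_only switch are assumptions for illustration; for a symmetric matrix the switch keeps only the lower triangle, as the solver requires.

```c
#include <assert.h>

/* Builds 1-based CSC (Harwell-Boeing) arrays from a dense n-by-n matrix
   stored column-major in a. colptr must hold n+1 entries; rowind and
   values must have room for every stored nonzero. If lower_only is
   nonzero, only entries with row >= column are kept (symmetric input).
   Returns the number of nonzeros stored. Hypothetical helper. */
static int dense_to_csc(int n, const double *a, int lower_only,
                        int *colptr, int *rowind, double *values)
{
    int nnz = 0;
    for (int j = 0; j < n; j++) {
        colptr[j] = nnz + 1;                 /* 1-based start of column j+1 */
        for (int i = lower_only ? j : 0; i < n; i++) {
            double v = a[i + j * n];
            if (v != 0.0) {
                rowind[nnz] = i + 1;         /* 1-based row index */
                values[nnz] = v;
                nnz++;
            }
        }
    }
    colptr[n] = nnz + 1;                     /* one past the last nonzero */
    return nnz;
}
```

Applied to the unsymmetric 5-by-5 example matrix, this should yield colptr = 1, 6, 7, 8, 9, 11; applied with lower_only set to the symmetric George and Liu matrix, it should yield colptr = 1, 6, 7, 8, 9, 10.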
85. t bring a large payoff. It might be worthwhile to evaluate the reference to K. If it is a loop index, it may be that the loops shown here are part of a larger code structure, and loops over DGEMV or DGER can often be converted to some form of matrix multiplication. If so, a single call to a matrix multiplication routine will probably bring a much larger payoff than a loop over calls to DGER.
All Sun Performance Library routines are MT-safe (multithread safe). Because the routines are MT-safe, additional performance is possible on MP platforms by using the auto-parallelizing compiler to parallelize loops that contain calls to Sun Performance Library.
An example of an effective combination of a Sun Performance Library routine together with an auto-parallelizing compiler parallelization directive is shown in the following example.
C     KL and KU are the numbers of subdiagonals and superdiagonals of A
C$PAR DOALL
      DO I = 1, N
         CALL DGBMV ('No transpose', N, N, KL, KU, ALPHA, A, LDA,
     $               B(1,I), 1, BETA, C(1,I), 1)
      END DO
Sun Performance Library contains a routine named DGBMV to multiply a banded matrix by a vector. By putting this routine into a properly constructed loop, it is possible to use the routines in Sun Performance Library to multiply a banded matrix by a matrix.
The compiler will not parallelize this loop by default, because the presence of subroutine calls in a loop inhibits parallelization. However, because Sun Performance Library routines are MT saf
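What DGBMV computes, and why the column loop is safe to run in parallel, can be sketched in C with a reference (unoptimized) band matrix-vector product that uses the standard BLAS band layout, in which A(i,j) is stored at row ku+i-j of column j. The names band_gemv and band_times_matrix are hypothetical and this is not the library implementation. Each column of C depends only on the matching column of B, which is what makes the loop iterations independent.

```c
#include <assert.h>
#include <stddef.h>

/* Reference banded matrix-vector product with the meaning of
   DGBMV('N', ...): y := alpha*A*x + beta*y, where the m-by-n matrix A has
   kl subdiagonals and ku superdiagonals and is held column-major in BLAS
   band format, A(i,j) at a[(ku + i - j) + j*lda] (0-based i, j). */
static void band_gemv(int m, int n, int kl, int ku, double alpha,
                      const double *a, int lda, const double *x,
                      double beta, double *y)
{
    assert(lda >= kl + ku + 1);
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];   /* beta = 0 ignores old y */
    for (int j = 0; j < n; j++) {
        double t = alpha * x[j];
        int ilo = (j - ku > 0) ? j - ku : 0;          /* first row in band */
        int ihi = (j + kl < m - 1) ? j + kl : m - 1;  /* last row in band  */
        for (int i = ilo; i <= ihi; i++)
            y[i] += t * a[(ku + i - j) + (size_t)j * lda];
    }
}

/* C := alpha*A*B + beta*C for an n-by-n band matrix A, one band_gemv per
   column of B and C. Column i of C reads only column i of B, so the
   iterations do not depend on one another. */
static void band_times_matrix(int n, int kl, int ku, double alpha,
                              const double *a, int lda,
                              const double *b, int ldb,
                              double beta, double *c, int ldc)
{
    for (int i = 0; i < n; i++)
        band_gemv(n, n, kl, ku, alpha, a, lda, b + (size_t)i * ldb,
                  beta, c + (size_t)i * ldc);
}
```

Multiplying a band matrix by the identity this way should return the matrix itself in dense column-major form, which is a quick sanity check of the band indexing.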
86. tations: 1-, 2-, and infinity-norms; rank-1, rank-2, rank-k, and rank-2k updates
- Linear systems: Solve full-rank systems, compute error bounds, solve Sylvester equations, refine a computed solution, equilibrate a coefficient matrix
- Least squares: Full-rank, generalized linear regression, rank-deficient, linear equality constrained
- Eigenproblems: Eigenvalues, generalized eigenvalues, eigenvectors, generalized eigenvectors, Schur vectors, generalized Schur vectors
- Matrix factorizations or decompositions: SVD, generalized SVD, QL and LQ, QR and RQ, Cholesky, LU, Schur, LDL^T and UDU^T
- Support operations: Condition number, in-place or out-of-place transpose, inverse, determinant, inertia
- Sparse matrices: Solve symmetric, structurally symmetric, and unsymmetric coefficient matrices using direct methods and a choice of fill-reducing ordering algorithms, including user-specified orderings
- Convolution and correlation in one and two dimensions
- Fast Fourier transforms, Fourier synthesis, cosine and quarter-wave cosine transforms, sine and quarter-wave sine transforms
- Complex vector FFTs and FFTs in two and three dimensions
Compatibility With Previous LAPACK Versions
The Sun Performance Library routines that are based on LAPACK support the expanded capabilities and improved algorithms in LAPACK 3.0, but are completely compatible with both LAPACK 1.x and LAPACK 2.0. Maintaining compatibility with previous LAP
87. the order shown in TABLE 4-4.
TABLE 4-4 Sparse Solver Routine Calling Order
One-Call Interface (for solving a single matrix)
  Start
  DGSSFS   Initialize, order, factor, solve
  DGSSSL   Additional solves (optional): repeat dgsssl() as needed
  DGSSDA   Deallocate working storage
  Finish
  End of One-Call Interface
TABLE 4-4 Sparse Solver Routine Calling Order (Continued)
Regular Interface (for solving multiple matrices with the same structure)
  Start
  DGSSIN   Initialize
  DGSSOR   Order
  DGSSFA   Factor
  DGSSSL   Solve: repeat dgssfa() or dgsssl() as needed
  DGSSDA   Deallocate working storage
  Finish
  End of Regular Interface
Sparse Solver Examples
CODE EXAMPLE 4-1 shows solving a symmetric system using the one-call interface, and CODE EXAMPLE 4-2 on page 55 shows solving a symmetric system using the regular interface.
CODE EXAMPLE 4-1 Solving a Symmetric System: One-Call Interface
my_system% cat example_1call.f
      program example_1call
c
c  This program is an example driver that calls the sparse solver.
c  It factors and solves a symmetric system, by calling the
c  one-call interface.
c
      implicit none
      integer           neqns, ier, msglvl, outunt, ldrhs, nrhs
      character         mtxtyp*2, pivot*1, ordmthd*3
      double precision  handle(150)
      integer           colstr(6), rowind(9)
      double precision  values(9), rhs(5), xexpct(5)
      integer           i
c
c  Sparse matrix structure and value arrays.  From George and Liu,
c  page
88. top Edition is identical to the former Sun Performance WorkShop Personal Edition, except that the Fortran compilers in that product no longer support the creation of automatically parallelized or explicit, directive-based parallel code. This capability is still supported in the Fortran compilers in Forte for High Performance Computing.
We appreciate your continued use of our development products and hope that we can continue to fulfill your needs into the future.
Contents
Preface 1
1. Introduction 11
   Libraries Included With Sun Performance Library 11
   Netlib 12
   Sun Performance Library Features 13
   Mathematical Routines 13
   Compatibility With Previous LAPACK Versions 14
   Getting Started With Sun Performance Library 14
   Enabling Trap 6 15
2. Using Sun Performance Library 17
   Improving Application Performance 17
   Replacing Routines With Sun Performance Library Routines 17
   Improving Performance of Other Libraries 18
   Using Tools to Restructure Code 18
   Fortran 77/95 Interfaces 19
   Using Fortran 95 Features 19
   Fortran Examples 22
   C Interfaces 24
   C Examples 26
3. SPARC Optimization and Parallel Processing 29
   Using Sun Performance Library on SPARC Platforms 29
   Compiling for SPARC Platforms 30
   Compiling Code for 64-Bit UltraSPARC 31
   Optimizing for Parallel Processing 33
   Specifying the Parallelization Mode 33
   Starting Threads 34
   Parallel Processing Examples 35
4. Working With Matrices 39
   Matrix Storage Schemes 39
   Banded Storage 40
   Packe
89. torization of two matrices. Computes the generalized singular value decomposition. Computes an orthogonal or unitary matrix as a preprocessing step for calculating the generalized singular value decomposition.
General Tridiagonal Matrix
xGTCON   Estimates the reciprocal of the condition number of a tridiagonal matrix, using the LU factorization as computed by xGTTRF
xGTRFS   Refines solution to a general tridiagonal system of linear equations
xGTSV    Solves a general tridiagonal system of linear equations (simple driver)
xGTSVX   Solves a general tridiagonal system of linear equations (expert driver)
TABLE A-1 LAPACK (Linear Algebra Package) Routines (Continued)
Routine   Function
xGTTRF   Computes an LU factorization of a general tridiagonal matrix, using partial pivoting and row exchanges
xGTTRS   Solves a general tridiagonal system of linear equations using the factorization computed by x
Hermitian Band Matrix
CHBEV or ZHBEV, CHBEVD or ZHBEVD, CHBEVX or ZHBEVX, CHBGST or ZHBGST, CHBGV or ZHBGV, CHBGVD or ZHBGVD, CHBGVX or ZHBGVX, CHBTRD or ZHBTRD: Replacement with newer version CHBEVD or ZHBEVD suggested; computes all eigenvalues and eigenvectors of a Hermitian band matrix. Computes all eigenvalues and eigenvectors of a Hermitian band matrix and uses a divide and conquer method to calculate eigenvectors. Compute
Hermitian Matrix
90. trix-vector product y ← Ax + y, which can be performed with the DGEMV subroutine.
As another example, consider the following code fragment:
      DO I = 1, N
         IF (V2(I,K) .LT. 0.0) THEN
            V2(I,K) = 0.0
         ELSE
            DO J = 1, M
               X(J,I) = X(J,I) + V1(J,K)*V2(I,K)
            END DO
         END IF
      END DO
In other cases, a block of code can be equivalent to several Sun Performance Library calls, or contain a mixture of code that can be replaced together with code that has no natural replacement in Sun Performance Library. One way to rewrite the code with Sun Performance Library is shown below:
      DO I = 1, N
         IF (V2(I,K) .LT. 0.0) THEN
            V2(I,K) = 0.0
         END IF
      END DO
      CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)
An f95-specific example is also shown:
      CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)
The code to replace negative numbers with zero in V2 has no natural analog in Sun Performance Library, so that code is pulled out of the outer loop. With that code removed to its own loop, the rest of the loop can be recognized as a rank-1 update of the general matrix X (X := X + V1(:,K)*V2(:,K)^T), which can be accomplished using the DGER routine from BLAS.
Note that if there are many negative or zero values in V2, it may be that the majority of the time is not spent in the rank-1 update, and so replacing that code with the call to DGER might no
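The restructuring can be checked on small data. This C sketch (hypothetical function names, column-major arrays, 0-based indices) runs the original loop and the clamp-then-rank-1-update version side by side; the rank-1 step follows DGER's definition A := alpha*x*y^T + A. For finite inputs the two produce the same X, because a clamped V2 entry of zero contributes nothing to the update.

```c
#include <assert.h>
#include <stddef.h>

/* The original loop: clamp negative v2 entries to zero, otherwise add
   v1 * v2[i] to column i of the m-by-n matrix x. */
static void original_loop(int m, int n, double *x, int ldx,
                          const double *v1, double *v2)
{
    for (int i = 0; i < n; i++) {
        if (v2[i] < 0.0) {
            v2[i] = 0.0;
        } else {
            for (int j = 0; j < m; j++)
                x[j + (size_t)i * ldx] += v1[j] * v2[i];
        }
    }
}

/* Reference rank-1 update with DGER semantics: a := alpha*xv*yv^T + a. */
static void rank1_update(int m, int n, double alpha, const double *xv,
                         const double *yv, double *a, int lda)
{
    for (int i = 0; i < n; i++) {
        double t = alpha * yv[i];
        for (int j = 0; j < m; j++)
            a[j + (size_t)i * lda] += xv[j] * t;
    }
}

/* The restructured code: clamp first, then one rank-1 update. */
static void restructured_loop(int m, int n, double *x, int ldx,
                              const double *v1, double *v2)
{
    for (int i = 0; i < n; i++)
        if (v2[i] < 0.0)
            v2[i] = 0.0;
    rank1_update(m, n, 1.0, v1, v2, x, ldx);
}
```

In a real program the rank1_update call is where DGER would go; the clamping loop stays in user code.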
91. umber of available processors.
3. Link with -xlic_lib=sunperf specified at the end of the command line. Compile and link with -parallel, -explicitpar, or -autopar.
For example, to compile and link with libsunperf_mt.so (default):
cc -dalign -xarch=... -xparallel any.c -xlic_lib=sunperf
or
f77 -dalign -xarch=... -parallel any.f -xlic_lib=sunperf
or
f95 -dalign -xarch=... -parallel any.f95 -xlic_lib=sunperf
For example, to compile and link with libsunperf_mt.a statically:
cc -dalign -xarch=... -xparallel any.c -Bstatic -xlic_lib=sunperf -Bdynamic
or
f77 -dalign -xarch=... -parallel any.f -Bstatic -xlic_lib=sunperf -Bdynamic
or
f95 -dalign -xarch=... -parallel any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic
CHAPTER 4 Working With Matrices
Most matrices can be stored in ways that save both storage space and computation time. Sun Performance Library uses the following storage schemes:
- Banded storage
- Packed storage
The Sun Performance Library processes matrices that are in one of four forms:
- General
- Triangular
- Symmetric
- Tridiagonal
Storage schemes and matrix types are described in the following sections.
Matrix Storage Schemes
Some Sun Performance Library routines that work with arrays stored normally have
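The packed storage scheme keeps one triangle of an n-by-n matrix in a one-dimensional array of n(n+1)/2 elements, column by column. The C sketch below gives the 1-based position of A(i,j) for the upper and lower triangles, assuming the usual LAPACK packed layout; the helper names are hypothetical and are not Sun Performance Library routines.

```c
#include <assert.h>
#include <stddef.h>

/* Position of A(i,j) inside a packed array AP that holds one triangle of
   an n-by-n matrix column by column. i and j are 1-based and the result
   is a 1-based index into AP. Illustrative helpers only. */

/* Upper triangle (i <= j): AP(i + (j-1)*j/2) = A(i,j). */
static size_t packed_upper_index(size_t i, size_t j)
{
    assert(1 <= i && i <= j);
    return i + (j - 1) * j / 2;
}

/* Lower triangle (i >= j): AP(i + (j-1)*(2n-j)/2) = A(i,j). */
static size_t packed_lower_index(size_t n, size_t i, size_t j)
{
    assert(1 <= j && j <= i && i <= n);
    return i + (j - 1) * (2 * n - j) / 2;
}
```

For n = 3, the last element A(3,3) lands at position 6 = n(n+1)/2 in both layouts, confirming that no space is spent on the other triangle.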
92. using an orthogonal or unitary similarity transformation. Determines error bounds and estimates for a triangular system of linear equations. Reorders the Schur factorization of a matrix so that a selected cluster of eigenvalues appears in the leading positions on the diagonal of the upper triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace. Estimates the reciprocal condition numbers of selected eigenvalues and eigenvectors of an upper quasi-triangular matrix. Solves Sylvester matrix equation. Computes the inverse of a triangular matrix. Solves a triangular system of linear equations.
Trapezoidal Matrix
xTZRQF   Deprecated routine, replaced by routine xTZRZF
xTZRZF   Reduces a rectangular upper trapezoidal matrix to upper triangular form by means of orthogonal transformations
Unitary Matrix
CUNGBR or ZUNGBR, CUNGHR or ZUNGHR, CUNGLQ or ZUNGLQ, CUNGQL or ZUNGQL, CUNGQR or ZUNGQR, CUNGRQ or ZUNGRQ, CUNGTR or ZUNGTR: Generates the unitary transformation matrices from reduction to bidiagonal form, as determined by CGEBRD or ZGEBRD. Generates the unitary transformation matrix from reduction to Hessenberg form, as determined by CGEHRD or ZGEHRD. Generates a unitary matrix Q from an LQ factorization, as returned by CGELQF or ZGELQF. Generates a unitary matrix Q from a QL factorization, as returned by CGEQLF or ZGEQLF. Generates a unitary matrix Q
93. utine   Function
COSQB, DCOSQB, VCOSQB, VDCOSQB   Cosine quarter-wave synthesis
COSQF, DCOSQF, VCOSQF, VDCOSQF   Cosine quarter-wave transform
COSQI, DCOSQI, VCOSQI, VDCOSQI   Initialize cosine quarter-wave transform and synthesis
COST, DCOST, VCOST, VDCOST       Cosine even-wave transform
COSTI, DCOSTI, VCOSTI, VDCOSTI   Initialize cosine even-wave transform
EZFFTB   EZ Fourier synthesis
EZFFTF   EZ Fourier transform
EZFFTI   Initialize EZ Fourier transform and synthesis
RFFTB, DFFTB, CFFTB, ZFFTB, VRFFTB, VDFFTB, VCFFTB, VZFFTB   Fourier synthesis
RFFTF, DFFTF, CFFTF, ZFFTF, VRFFTF, VDFFTF, VCFFTF, VZFFTF   Fourier transform
RFFTI, DFFTI, CFFTI, ZFFTI, VRFFTI, VDFFTI, VCFFTI, VZFFTI   Initialize Fourier transform and synthesis
SINQB, DSINQB, VSINQB, VDSINQB   Sine quarter-wave synthesis
SINQF, DSINQF, VSINQF, VDSINQF   Sine quarter-wave transform
SINQI, DSINQI, VSINQI, VDSINQI   Initialize sine quarter-wave transform and synthesis
TABLE A-7 FFTPACK and VFFTPACK (Fast Fourier Transform and Vectorized Fast Fourier Transform) Routines (Continued)
Routine   Function
SINT, DSINT, VSINT, VDSINT       Sine odd-wave transform
SINTI, DSINTI, VSINTI, VDSINTI   Initialize sine odd-wave transform
RFFT2B, DFFT2B, CFFT2B, ZFFT2B
RFFT2F, DFFT2F, CFFT2F, ZFFT2F
RFFT2I, DFFT2I, CFFT2I, ZFFT2I
RFFT3B, DFFT3B, CFFT3B, ZFFT3B
RFFT3F, DFFT3F, CFFT3F, ZFFT3F
RFF
94. veloper 6 / Sun WorkShop 6 Release Documents
  About Sun WorkShop 6 Documentation: Describes the documentation available with this Sun WorkShop release and how to access it.
  What's New in Sun WorkShop 6: Provides information about the new features in the current and previous release of Sun WorkShop.
  Sun WorkShop 6 Release Notes: Contains installation details and other information that was not available until immediately before the final release of Sun WorkShop 6. This document complements the information that is available in the component readme files.
Forte Developer 6 / Sun WorkShop 6
  Analyzing Program Performance With Sun WorkShop 6: Explains how to use the new Sampling Collector and Sampling Analyzer (with examples and a discussion of advanced profiling topics) and includes information about the command-line analysis tool er_print, the LoopTool and LoopReport utilities, and the UNIX profiling tools prof, gprof, and tcov.
  Debugging a Program With dbx: Provides information on using dbx commands to debug a program, with references to how the same debugging operations can be performed using the Sun WorkShop Debugging window.
TABLE P-3 Related Sun WorkShop 6 Documentation by Document Collection (Continued)
Document Collection   Document Title   Description
  Introduction to Sun WorkShop: Acquaints you with the basic program development features of the Sun WorkShop integrated progr
95. ver error number
Solving an Unsymmetric System: Regular Interface (Continued)
my_system% f95 -dalign example_uu.f -xlic_lib=sunperf
my_system% a.out
    i              rhs(i)     expected rhs(i)               error
    1  0.100000000000D+01  0.100000000000D+01  0.000000000000D+00
    2  0.200000000000D+01  0.200000000000D+01  0.000000000000D+00
    3  0.300000000000D+01  0.300000000000D+01  0.000000000000D+00
    4  0.400000000000D+01  0.400000000000D+01  0.000000000000D+00
    5  0.500000000000D+01  0.500000000000D+01  0.000000000000D+00
APPENDIX A Sun Performance Library Routines
This appendix lists the Sun Performance Library routines by library, routine name, and function.
For a description of the function and a listing of the Fortran and C interfaces, refer to the section 3P man pages for the individual routines. For example, to display the man page for the SBDSQR routine, type man -s 3P sbdsqr. The man page routine names use lowercase letters.
For many routines, separate routines exist that operate on different data types. Rather than list each routine separately, a lowercase x is used in a routine name to denote single, double, complex, and double complex data types. For example, the routine xB
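Because the tables use a lowercase x in place of the precision letter, a routine name can be expanded mechanically. The hypothetical C helper below substitutes S (single), D (double), C (complex), and Z (double complex). It does not know which variants actually exist: not every routine is provided in all four forms (the Hermitian routines, for example, appear only as C and Z variants).

```c
#include <assert.h>
#include <string.h>

/* Expands a table entry such as "xGESV" into the four names obtained by
   replacing the leading lowercase x with S, D, C, and Z. Returns 0 on
   success, -1 if the name does not start with x or is too long for the
   32-character output slots. Hypothetical helper for illustration. */
static int expand_x_prefix(const char *name, char out[4][32])
{
    static const char prefixes[4] = {'S', 'D', 'C', 'Z'};
    size_t len = strlen(name);
    if (len == 0 || len >= 32 || name[0] != 'x')
        return -1;
    for (int p = 0; p < 4; p++) {
        memcpy(out[p], name, len + 1);   /* copy including the terminator */
        out[p][0] = prefixes[p];
    }
    return 0;
}
```

For example, xGESV expands to SGESV, DGESV, CGESV, and ZGESV.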
96. void promoting all integers, change INTEGER or INTEGER*4 declarations to INTEGER*8.
When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change
      CALL DSCAL(20, 5.26D0, X, 1)
to
      CALL DSCAL(20_8, 5.26D0, X, 1_8)
This example assumes USE SUNPERF is included in the code.
The following example shows calling CAXPY from FORTRAN 77 or Fortran 95 using 32-bit arguments:
      SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX    ALPHA
      INTEGER    INCX, INCY, N
      COMPLEX    X(*), Y(*)
The following example shows calling CAXPY from FORTRAN 77 or Fortran 95 without the USE SUNPERF statement using 64-bit arguments:
      SUBROUTINE CAXPY_64 (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX    ALPHA
      INTEGER*8  INCX, INCY, N
      COMPLEX    X(*), Y(*)
The following example shows calling CAXPY from Fortran 95 with the USE SUNPERF statement using 64-bit arguments:
      SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX    ALPHA
      INTEGER*8  INCX, INCY, N
      COMPLEX    X(*), Y(*)
In C routines, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following example shows calling the dgbcon routine using 32 b
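To show what the 64-bit interface changes (only the integer arguments widen; the floating-point data is untouched), the following C sketch implements the CAXPY operation, y := alpha*x + y on single-precision complex vectors, with every integer argument declared as int64_t. It is a reference illustration with a hypothetical name, not the Sun Performance Library routine, and it follows the BLAS convention that a negative increment steps backward through the vector.

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>

/* y := alpha*x + y for single-precision complex vectors, with n, incx,
   and incy all 64 bits wide as in a _64 interface. Reference sketch of
   the CAXPY operation; not the library routine. */
static void ref_caxpy_64(int64_t n, float complex alpha,
                         const float complex *x, int64_t incx,
                         float complex *y, int64_t incy)
{
    if (n <= 0 || alpha == 0.0f)
        return;                              /* nothing to do */
    /* BLAS convention: a negative increment starts at the far end. */
    int64_t ix = (incx >= 0) ? 0 : (1 - n) * incx;
    int64_t iy = (incy >= 0) ? 0 : (1 - n) * incy;
    for (int64_t k = 0; k < n; k++, ix += incx, iy += incy)
        y[iy] += alpha * x[ix];
}
```

Calling code built this way passes 64-bit integers everywhere, which is the same change the _64 suffix or INTEGER*8 declarations make in the Fortran examples.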
97. ...wing ways to improve the speed of user code without making any code changes:

- Use Sun Performance Library routines instead of the base Netlib routines. See the next section, Replacing Routines With Sun Performance Library Routines.
- Use Sun Performance Library to speed up the other libraries if an application already uses libraries in addition to those in the Sun Performance Library. See Improving Performance of Other Libraries on page 18.
- Use tools that automatically modify an application to use Sun Performance Library. See Using Tools to Restructure Code on page 18.

Replacing Routines With Sun Performance Library Routines

Many applications are built using one or more of the base Netlib libraries supported by the Sun Performance Library. Third-party vendors can also use BLAS and LAPACK as building blocks in their applications. Because Sun Performance Library maintains the same interfaces and functionality as these libraries, base Netlib routines can be replaced with Sun Performance Library routines.

Sun Performance Library can be included in a user's development environment to improve application performance on single-processor and multiprocessor (MP) platforms. Sun Performance Library routines can be faster than the corresponding Netlib routines or than routines from other vendors that perform similar functions. The serial speed of many Sun Performance Library routines has been increased and many routines hav
98. ...

CODE EXAMPLE 4-2  Solving a Symmetric System: Regular Interface (Continued)

          data values / 4.0d0, 1.0d0, 2.0d0, 0.5d0, 2.0d0, 0.5d0,
         &              3.0d0, 0.625d0, 16.0d0 /
          data rhs / 7.0d0, 3.0d0, 7.0d0, -4.0d0, -4.0d0 /
          data xexpct / 2.0d0, 2.0d0, 1.0d0, -8.0d0, -0.5d0 /
    c
    c  initialize solver
    c
          mtxtyp = 'ss'
          pivot = 'n'
          neqns = 5
          outunt = 6
          msglvl = 0
    c
    c  call regular interface
    c
          call dgssin(mtxtyp, pivot, neqns, colstr, rowind,
         &            outunt, msglvl, handle, ier)
          if (ier .ne. 0) goto 110
    c
    c  ordering and symbolic factorization
    c
          ordmthd = 'mmd'
          call dgssor(ordmthd, handle, ier)
          if (ier .ne. 0) goto 110
    c
    c  numeric factorization
    c
          call dgssfa(neqns, colstr, rowind, values, handle, ier)
          if (ier .ne. 0) goto 110
    c
    c  solution
    c
          nrhs = 1
          ldrhs = 5
          call dgsssl(nrhs, rhs, ldrhs, handle, ier)
          if (ier .ne. 0) goto 110
    c
    c  deallocate sparse solver storage
    c
          call dgssda(handle, ier)
          if (ier .ne. 0) goto 110
    c
    c  print values of sol
    c
          write(6,200) 'i', 'rhs(i)', 'expected rhs(i)', 'error'
          do i = 1, neqns
            write(6,300) i, rhs(i), xexpct(i), (rhs(i)-xexpct(i))
          enddo
          stop
      110 continue
    c
    c  call to sparse solver returns an error
    c
          write(6,400)
         &  ' example: FAILED sparse solver error number = ', ier
          stop
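One way to check the data of CODE EXAMPLE 4-2 by hand is to multiply the stored matrix by xexpct and confirm that the product equals rhs. The sketch below does this in C for a symmetric matrix whose lower triangle is stored column by column in 1-based colstr/rowind/values arrays. It is an illustration only. The function name sym_lower_ccs_matvec is hypothetical and is not a Sun Performance Library routine. The sketch assumes colstr = {1, 6, 7, 8, 9, 10} and rowind = {1, 2, 3, 4, 5, 2, 3, 4, 5}, which are the row indices implied by that colstr for a lower triangle whose first column is full.

```c
#include <assert.h>

/* Hypothetical sketch: y = A*x for a symmetric matrix whose lower
 * triangle is stored by columns in 1-based colstr/rowind/values arrays
 * (Fortran-style pointers, as in the sparse solver example).
 * Each off-diagonal entry a(i,j), i > j, also stands for a(j,i). */
void sym_lower_ccs_matvec(int neqns, const int *colstr, const int *rowind,
                          const double *values, const double *x, double *y)
{
    for (int i = 0; i < neqns; i++)
        y[i] = 0.0;
    for (int j = 0; j < neqns; j++) {
        /* convert 1-based pointers into 0-based C offsets */
        for (int k = colstr[j] - 1; k < colstr[j + 1] - 1; k++) {
            int i = rowind[k] - 1;
            assert(i >= j);                 /* lower triangle only */
            y[i] += values[k] * x[j];       /* entry a(i,j) */
            if (i != j)
                y[j] += values[k] * x[i];   /* mirrored entry a(j,i) */
        }
    }
}
```

Applied to xexpct = (2, 2, 1, -8, -0.5), this gives (7, 3, 7, -4, -4), which is the rhs vector the solver starts from.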
99. ...ys reserved even if the values of the diagonal elements are known, such as in a unit diagonal matrix.

An upper triangular matrix, or a symmetric matrix whose upper triangle is stored in general storage in the array A, can be transferred to packed storage in the array AP as shown below. This code comes from the comment block of the LAPACK routine DTPTRI.

          JC = 1
          DO J = 1, N
             DO I = 1, J
                AP(JC+I-1) = A(I,J)
             END DO
             JC = JC + J
          END DO

Similarly, a lower triangular matrix, or a symmetric matrix whose lower triangle is stored in general storage in the array A, can be transferred to packed storage in the array AP as shown below:

          JC = 1
          DO J = 1, N
             DO I = J, N
                AP(JC+I-J) = A(I,J)
             END DO
             JC = JC + N - J + 1
          END DO

Matrix Types

The general matrix is the most common matrix form, and most operations performed by the Sun Performance Library can be done on general arrays. In many cases, there are also routines that work with the other forms of arrays. For example, DGEMM forms the product of two general matrices, and DTRMM forms the product of a triangular matrix and a general matrix.

General Matrices

A general matrix is stored so that there is a one-to-one correspondence between the elements of the matrix and the elements of the array. Element Aij of a matrix A is stored in element A(I,J) of the correspon
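The two DTPTRI packing loops translate directly to C once the indices are shifted to start at 0 and A is read in column-major order, where A(i,j) is a[i + j*lda]. The sketch below makes that translation under those assumptions. The function names pack_upper and pack_lower are hypothetical. Both functions fill n*(n+1)/2 elements of ap.

```c
#include <assert.h>

/* Hypothetical 0-based C translation of the upper-triangle packing loop.
 * a is column-major with leading dimension lda: A(i,j) == a[i + j*lda]. */
void pack_upper(int n, const double *a, int lda, double *ap)
{
    assert(lda >= n);
    int jc = 0;                        /* start of column j inside ap */
    for (int j = 0; j < n; j++) {
        for (int i = 0; i <= j; i++)   /* rows 0..j of column j */
            ap[jc + i] = a[i + j * lda];
        jc += j + 1;                   /* column j contributed j+1 entries */
    }
}

/* Hypothetical 0-based C translation of the lower-triangle packing loop. */
void pack_lower(int n, const double *a, int lda, double *ap)
{
    assert(lda >= n);
    int jc = 0;
    for (int j = 0; j < n; j++) {
        for (int i = j; i < n; i++)    /* rows j..n-1 of column j */
            ap[jc + i - j] = a[i + j * lda];
        jc += n - j;                   /* column j contributed n-j entries */
    }
}
```

Consider the 3-by-3 matrix whose columns are (1,2,3), (4,5,6), and (7,8,9). For that matrix, pack_upper produces {1, 4, 5, 7, 8, 9} and pack_lower produces {1, 2, 3, 5, 6, 9}.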
100. ...ze opportunities to transform user-written code sequences into calls to Sun Performance Library functions. The following code sequence, adapted from LAPACK, shows one example:

    int i;
    float a[n], b[n], largest;

    largest = a[0];
    for (i = 0; i < n; i++) {
        if (a[i] > largest)
            largest = a[i];
        if (b[i] > largest)
            largest = b[i];
    }

No subroutine in Sun Performance Library exactly replicates the functionality of the code above. However, the code can be accelerated by replacing it with several calls to Sun Performance Library, as shown below:

    int large_index;
    float a[n], b[n], largest;

    large_index = isamax(n, a, 1);
    largest = a[large_index];
    large_index = isamax(n, b, 1);
    if (b[large_index] > largest)
        largest = b[large_index];

Because isamax selects the element of largest absolute value, this replacement is guaranteed to match the original loop only when a and b contain no negative values.

Note the differences between the call to the native C isamax in Sun Performance Library above and the call shown below to a comparable function in CLAPACK:

    /* 1. Declare a scratch variable so that 1 can be passed by reference */
    int one = 1;

    /* 2. Append an underscore to conform to the FORTRAN naming system */
    /* 3. Pass all arguments, even scalar input-only ones, by reference */
    /* 4. Subtract one to convert from FORTRAN indexing conventions     */
    large_index = isamax_(&n, a, &one) - 1;
    largest = a[large_index];
    large_index = isamax_(&n, b, &one) - 1;
    if (b[large_index] > largest)
        largest = b[large_index];
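The following self-contained sketch makes the absolute-value caveat concrete. It uses a hypothetical stand-in, ref_isamax, that follows the standard BLAS ISAMAX rules: it returns the 1-based index of the first element of largest absolute value, or 0 when n < 1 or incx <= 0. It is not the Sun Performance Library routine. The wrapper largest_of_two, also hypothetical, applies the two-call replacement and converts the 1-based result to a 0-based index.

```c
#include <assert.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical stand-in with standard BLAS ISAMAX semantics:
 * 1-based index of the first element of largest absolute value. */
int ref_isamax(int n, const float *x, int incx)
{
    if (n < 1 || incx <= 0)
        return 0;
    int best = 0;
    float best_abs = abs_f(x[0]);
    for (int i = 1; i < n; i++) {
        float v = abs_f(x[i * incx]);
        if (v > best_abs) {            /* strict '>' keeps the first maximum */
            best_abs = v;
            best = i;
        }
    }
    return best + 1;
}

/* Hypothetical two-call replacement for the loop above. It matches the
 * loop only when a and b hold no negative values. */
float largest_of_two(int n, const float *a, const float *b)
{
    assert(n >= 1);
    int k = ref_isamax(n, a, 1) - 1;   /* 1-based -> 0-based */
    float largest = a[k];
    k = ref_isamax(n, b, 1) - 1;
    if (b[k] > largest)
        largest = b[k];
    return largest;
}
```

With a = {1, 5, 3, 2} and b = {0.5, 4, 6, 0}, the replacement returns 6, which agrees with the loop. With a = {1, -9, 2} and b = {0, 0, 0}, it returns 0, because isamax picks -9. The loop would return 2 for that input.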
