Intel(R) Math Kernel Library for Linux* OS User's Guide
Contents
1. -DENDEARLY: Terminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then run only the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. To avoid the residual check for a problem that terminates early, set the "threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. It also sometimes gives a better picture to compile with -DASYOUGO2 when using -DENDEARLY.
    Usage notes on -DENDEARLY follow:
    • -DENDEARLY stops the problem after a few iterations of DGEMM on the block size (the bigger the block size, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 or so output elements before the problem completes.
    • Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end, because that is what LU (a decomposition of a matrix into a product of lower (L) and upper (U) triangular matrices) does. -DENDEARLY is likely to terminate before it starts to slow down.
    • -DENDEARLY terminates the problem early with an HPL Error exit. This means that you need to ignore the missing residual results, which are wrong because the problem never completed. However, you can get an idea of what the initial performance was, and if it looks good, then run
2. Figure 10-1 Intel MKL Help in the Eclipse IDE
    [Figure: the Eclipse Help contents tree with "Intel(R) Math Kernel Library Help" expanded to the Intel Math Kernel Library Reference Manual and its chapters (BLAS and Sparse BLAS Routines, LAPACK Routines, ScaLAPACK Routines, Sparse Solver Routines, Vector Mathematical Functions, Statistical Functions, Fourier Transform Functions, and so on)]
    Searching the Intel Web Site from the Eclipse IDE
    The Intel MKL plugin tunes Eclipse Help search to target http://www.intel.com, so that when you are connected to the Internet and run a search from the Eclipse Help pane, the search hits at the site are shown through a separate link. Figure 10-2 shows search results for "VML Functions" in Eclipse Help. In the figure, "1 hit" means an entry hit to the respective site. Click "Intel.com (1 hit)" to open the list of actual hits to the Intel Web site.
    Figure 10-2 Hits to the Intel Web Site in the Eclipse
3. Example 6-3 Setting an Affinity Mask by Operating System Means Using the Intel Compiler (continued)
            ...omp_get_thread_num();
            CPU_ZERO(&new_mask);
            // 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
            CPU_SET(tid == 0 ? 0 : 2, &new_mask);
            if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
                printf("Error: sched_getaffinity(%d, sizeof(was_mask), &was_mask)\n", tid);
            }
            if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
                printf("Error: sched_setaffinity(%d, sizeof(new_mask), &new_mask)\n", tid);
            }
            printf("tid=%d new_mask=%08X was_mask=%08X\n", tid,
                   *(unsigned int *)(&new_mask), *(unsigned int *)(&was_mask));
        }
        // Call Intel MKL FFT function
        return 0;
    }
    See the Linux Programmer's Manual (in man pages format) for particulars of the sched_setaffinity function used in the above example.
    Operating on Denormals
    The IEEE 754-2008 standard, "An IEEE Standard for Binary Floating-Point Arithmetic", defines denormal (or subnormal) numbers as non-zero numbers smaller in magnitude than the smallest possible normalized number for a specific floating-point format. Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower t
4. LP64, and MPI version. These libraries are listed in Table 3-6, Table 3-7, or Table 3-8. For example, for the IA-32 architecture, choose one of -lmkl_blacs, -lmkl_blacs_intelmpi, or -lmkl_blacs_openmpi, depending on the MPI version you use; in particular, for Intel MPI 3.x, choose -lmkl_blacs_intelmpi.
    <MKL core libraries> is <MKL LAPACK & MKL kernel libraries> for ScaLAPACK, and <MKL kernel libraries> for Cluster FFTs.
    <MKL kernel libraries> are processor-optimized kernels, the threading library, and the system library for threading support, linked as described in section "Listing Libraries on a Link Line".
    <MKL LAPACK & kernel libraries> are the LAPACK library and <MKL kernel libraries>.
    The grouping symbols -Wl,--start-group and -Wl,--end-group are required for static linking.
    <<MPI> linker script> corresponds to the MPI version. For instance, for Intel MPI 3.x, use <Intel MPI 3.x linker script>.
    For example, if you are using Intel MPI 3.x and want to statically use the LP64 interface with ScaLAPACK and have only one MPI process per core (and thus do not employ threading), specify the following linker options:
        -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_scalapack_lp64.a $MKLPATH/libmkl_blacs_intelmpi_lp64.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -static_mpi -Wl,--end-group -lpthread -lm
    For more examples, see Ex
5. $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
    Examples for Linking a Fortran Application
    These examples illustrate linking of an application whose main module is in Fortran under the following conditions:
    • Intel MPI 3.0 is installed in /opt/intel/mpi/3.0.
    • $MKLPATH is a user-defined variable containing <mkl_directory>/lib/64.
    • You use the Intel Fortran Compiler 10.0 or higher.
    To link with ScaLAPACK for a cluster of systems based on the IA-64 architecture, use the following libraries:
        /opt/intel/mpi/3.0/bin/mpiifort <user files to link> -L$MKLPATH -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread
    To link with Cluster FFT for a cluster of systems based on the IA-64 architecture, use the following libraries:
        /opt/intel/mpi/3.0/bin/mpiifort <user files to link> $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
    A binary linked with ScaLAPACK runs the same way as any other MPI application (refer to the documentation that comes with the MPI implementation). For instance, the script mpirun is used in the case of MPICH2 and OpenMPI, and the number of MPI processes is set by -np. In the case of MPICH 2.0 and all Intel MPIs, you should start the
6. libmkl_intelmpi_lp64.so | LP64 version of BLACS routines supporting Intel MPI 2.0 and 3.x and MPICH2
    locale/en_US/mkl_msg.cat | Catalog of Intel MKL messages in English
    locale/ja_JP/mkl_msg.cat | Catalog of Intel MKL messages in Japanese
    Table 3-8 Detailed Structure of the IA-64 Architecture Directory lib/64
    File | Contents
    Static Libraries
    Interface layer
    libmkl_blas95_ilp64.a | Fortran 95 interface library for BLAS for the Intel Fortran compiler. Supports the ILP64 interface.
    libmkl_blas95_lp64.a | Fortran 95 interface library for BLAS for the Intel Fortran compiler. Supports the LP64 interface.
    libmkl_intel_ilp64.a | ILP64 interface library for the Intel compilers
    libmkl_intel_lp64.a | LP64 interface library for the Intel compilers
    libmkl_intel_sp2dp.a | SP2DP interface library for the Intel compilers
    libmkl_gf_ilp64.a | ILP64 interface library for the GNU Fortran compiler
    libmkl_gf_lp64.a | LP64 interface library for the GNU Fortran compiler
    libmkl_lapack95_ilp64.a | Fortran 95 interface library for LAPACK for the Intel Fortran compiler. Supports the ILP64 interface.
    libmkl_lapack95_lp64.a | Fortran 95 interface library for LAPACK for the Intel Fortran compiler. Supports the LP64 interface.
    Threading layer
    libmkl_intel_thread.a | Threading library for the Intel compilers
    libmkl_gnu_thread.a | Threading library for the GNU Fortran and C compilers
    libmkl_sequential.a | Seq
7. replace -lmkl_core with $MKLPATH/libmkl_core.a, where $MKLPATH is the appropriate user-defined environment variable. See specific examples in the Linking Examples section.
        <files to link>
        -L<MKL path> -I<MKL include>
        [-I<MKL include>/{32|em64t/{ilp64|lp64}|64/{ilp64|lp64}}]
        [-lmkl_blas{95|95_ilp64|95_lp64}]
        [-lmkl_lapack{95|95_ilp64|95_lp64}]
        [<cluster components>]
        -lmkl_{intel|intel_ilp64|intel_lp64|intel_sp2dp|gf|gf_ilp64|gf_lp64}
        -lmkl_{intel_thread|gnu_thread|pgi_thread|sequential}
        [-lmkl_lapack] -lmkl_core
        [{-liomp5|-lguide}] [-lpthread] [-lm]
    See Selecting Libraries to Link for details of this syntax usage and specific recommendations on which libraries to link depending on your Intel MKL usage scenario.
    See Working with the Intel Math Kernel Library Cluster Software on linking with libraries denoted as <cluster components>.
    In case of static linking, enclose the cluster components, interface, threading, and computational libraries in grouping symbols (for example, -Wl,--start-group $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group). See specific examples in the Linking Examples section.
    The order of listing libraries on the link line is essential, except for the libraries enclosed in
8. the grouping symbols above.
    Selecting Libraries to Link
    This section recommends which libraries to link depending on your Intel MKL usage scenario and provides details of the linking in subsections:
    • Linking with Fortran 95 Interface Libraries
    • Linking with Threading Libraries
    • Linking with Computational Libraries
    • Linking with Compiler Support RTLs
    • Linking with System Libraries
    • Linking Examples
    Linking with Fortran 95 Interface Libraries
    The libmkl_blas95.a and libmkl_lapack95.a libraries contain Fortran 95 interfaces for BLAS and LAPACK, respectively, which are compiler-dependent. In the Intel MKL package, they are prebuilt for the Intel Fortran compiler. If you are using a different compiler, build these libraries before using the interface. (See Fortran 95 Interfaces to LAPACK and BLAS and Compiler-dependent Functions and Fortran 90 Modules for more information.)
    Linking with Threading Libraries
    Several compilers that Intel MKL supports use the OpenMP threading technology. Starting with version 10.0, Intel MKL supports implementations of the OpenMP technology that these compilers provide. To make use of this support, you need to link with the appropriate library in the Threading Layer and Compiler Support Run-time Library (RTL).
    Threading Layer: Each Intel MKL threading library contains the same code compiled by the respective compiler (Intel, GNU, and PGI compilers on Linux OS).
    RTL: This layer includes
9. Architecture Directory lib/em64t
    File | Contents
    Computational layer
    libmkl_cdft_core.a | Cluster version of FFTs
    libmkl_core.a | Kernel library for the Intel 64 architecture
    libmkl_scalapack_ilp64.a | ScaLAPACK routine library supporting the ILP64 interface
    libmkl_scalapack_lp64.a | ScaLAPACK routine library supporting the LP64 interface
    libmkl_solver_ilp64.a | Deprecated. Empty library for backward compatibility
    libmkl_solver_ilp64_sequential.a | Deprecated. Empty library for backward compatibility
    libmkl_solver_lp64.a | Deprecated. Empty library for backward compatibility
    libmkl_solver_lp64_sequential.a | Deprecated. Empty library for backward compatibility
    Table 3-7 Detailed Structure of the Intel 64 Architecture Directory lib/em64t
    File | Contents
    RTL
    libguide.a | Legacy OpenMP run-time library for static linking
    libiomp5.a | Compatibility OpenMP run-time library for static linking
    libmkl_blacs_ilp64.a | ILP64 version of BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10; ANL MPICH version 1.2.5.2
    libmkl_blacs_intelmpi_ilp64.a | ILP64 version of BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
    libmkl_blacs_intelmpi_lp64.a | LP64 version of BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
    libmkl_blacs_intelmpi20_ilp64.a | A soft link to lib/em64t/libmkl_blacs_intelmpi_ilp64.a
    libmkl_blacs_ | A soft
10. Benchmark for Clusters for distributed memory systems.
    Intel Optimized LINPACK Benchmark for Linux OS
    Intel Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results.
    This benchmark should not be used to report LINPACK 100 performance, because that is a compiled-code only benchmark. This is a shared-memory (SMP) implementation which runs on a single platform. Do not confuse this benchmark with:
    • MP LINPACK, which is a distributed memory version of the same benchmark.
    • LINPACK, the library, which has been expanded upon by the LAPACK library.
    Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your genuine Intel processor systems more easily than with the High Performance Linpack (HPL) benchmark. Use this package to benchmark your SMP machine.
    Additional information on this software, as well as on other Intel software performance products, is available at http://www.intel.com/software/products.
    Contents
    The Intel Optimized LINPACK Benchmark for Linux OS contains the following files, loc
11. Intel MKL threading. The sequential mode may be helpful when using Intel MKL with programs threaded with some non-Intel compilers or in other situations where you need a non-threaded version of the library (for instance, in some MPI cases). To set the sequential mode, in the Threading layer, choose the sequential library.
    Add the POSIX threads library (pthread) to your link line for the sequential mode because the sequential library depends on pthread.
    See also: Directory Structure in Detail; Using the Intel MKL Parallelism; Avoiding Conflicts in the Execution Environment; Linking Examples.
    ¹ Except the LAPACK deprecated routine lacon.
    Support for ILP64 Programming
    The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing huge arrays, with more than 2^31-1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type.
    The LP64 and ILP64 interfaces are implemented in the Interface layer (see Layered Model Concept and Directory Structure in Detail for more information). The ILP64 interface provides for the following:
    • Support for huge data arrays (with more than 2^31-1 elements)
    • Compiling your Fortran code with the -i8 compiler option
    The LP64 interface provides compatibility with the previous Intel MKL versions because "LP64" is just a new name for the only interface that the Intel MKL versions lower than 9.1 pro
12. Intel package includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratories, and neither the University nor ICL endorse or promote this product. Although HPL 2.0 is redistributable under certain conditions, this particular package is subject to the Intel MKL license.
    Intel MKL has introduced a new functionality into MP LINPACK, which is called a hybrid build, while continuing to support the older version. The term "hybrid" refers to special optimizations added to take advantage of mixed OpenMP/MPI parallelism.
    If you want to use one MPI process per node and to achieve further parallelism by means of OpenMP, use the hybrid build. In general, the hybrid build is useful when the number of MPI processes per core is less than one. If you want to rely exclusively on MPI for parallelism and use one MPI process per core, use the non-hybrid build.
    In addition to supplying certain hybrid prebuilt binaries, Intel MKL supplies some hybrid prebuilt libraries for Intel MPI to take advantage of the additional OpenMP optimizations.
    If you wish to use an MPI version other than Intel MPI, you can do so by using the MP LINPACK source provided. You can use the source to build a non-hybrid version that may be used in a hybrid mode, but it would be missing some of the optimizations added to the hybrid version.
    Non-hybrid builds are the default of the source code makefiles provided. In so
13. You do not need to change your source code to use the functions. To call them:
    • Include the header file mkl_boost_ublas_matrix_prod.hpp in your code (from the Intel MKL include directory).
    • Add appropriate Intel MKL libraries to the link line (see Linking Your Application with the Intel Math Kernel Library).
    The list of expressions that are substituted follows:
        prod(m1, m2)
        prod(trans(m1), m2)
        prod(trans(conj(m1)), m2)
        prod(conj(trans(m1)), m2)
        prod(m1, trans(m2))
        prod(trans(m1), trans(m2))
        prod(trans(conj(m1)), trans(m2))
        prod(conj(trans(m1)), trans(m2))
        prod(m1, trans(conj(m2)))
        prod(trans(m1), trans(conj(m2)))
        prod(trans(conj(m1)), trans(conj(m2)))
        prod(conj(trans(m1)), trans(conj(m2)))
        prod(m1, conj(trans(m2)))
        prod(trans(m1), conj(trans(m2)))
        prod(trans(conj(m1)), conj(trans(m2)))
        prod(conj(trans(m1)), conj(trans(m2)))
    These expressions are substituted in the release mode only (with the NDEBUG preprocessor symbol defined). Supported uBLAS versions are Boost 1.34.1, 1.35.0, 1.36.0, and 1.37.0. To get them, visit www.boost.org.
    A code example provided in the <mkl_directory>/examples/ublas/source/sylvester.cpp file illustrates usage of the Intel MKL uBLAS header file for solving a special case of the Sylvester equation.
    To run the Intel MKL ublas examples, specify the BOOST_ROOT parameter in the ma
14. are relevant to the Intel MKL cluster software are libiomp and libguide, which are the libraries for the OpenMP code compiled with an Intel compiler. Both libiomp and libguide support the threaded code in Intel MKL.
    In other cases where RTL dependencies might arise, the functions are delivered as source code and you need to compile the code with whatever compiler you are using for your application. In particular, Fortran 90 modules result in compiler-specific code generation requiring RTL support, so Intel MKL delivers these modules as source code.
    Mixed-language Programming with Intel MKL
    Appendix A lists the programming languages supported for each Intel MKL function domain. However, you can call Intel MKL routines from different language environments. This section explains how to do this using mixed-language programming.
    Calling LAPACK, BLAS, and CBLAS Routines from C Language Environments
    Not all Intel MKL function domains support both C and Fortran environments. To use Intel MKL Fortran-style functions in C/C++ environments, you should observe certain conventions, which are discussed for LAPACK and BLAS in the subsections below.
    CAUTION: Avoid calling BLAS95/LAPACK95 from C/C++. Such calls require skills in manipulating the descriptor of a deferred-shape array, which is the Fortran 90 type. Moreover, BLAS95/LAPACK95 routines contain links to a Fortran RTL.
    LAPACK and BLAS
    Becaus
15. daemon before running an application; the execution is driven by the script mpiexec.
    For further linking examples, see the support website for Intel products at http://www.intel.com/software/products/support.
    Getting Assistance for Programming in the Eclipse IDE
    This chapter discusses features of the Intel Math Kernel Library (Intel MKL) that assist you while programming in the Eclipse IDE:
    • The Intel MKL Reference Manual viewable from within the IDE
    • Eclipse Help search tuned to target the Intel Web sites
    • Context-sensitive help in the Eclipse C/C++ Development Tools (CDT)
    • Code/Content Assist in the Eclipse CDT
    The Intel MKL plugin for Eclipse Help provides the first three features (see Table 3-2 for the plugin location after installation). To use the plugin, copy it to the plugins folder of your Eclipse directory.
    The last feature is native to the Eclipse CDT. See the Code Assist section in Eclipse Help.
    Viewing the Intel MKL Reference Manual in the Eclipse IDE
    To view the Reference Manual in Eclipse:
    1. Select Help > Help Contents from the menu.
    2. In the Help tab, under All Topics, click Intel(R) Math Kernel Library Help.
    3. In the Help tree that expands, click Intel Math Kernel Library Reference Manual (see Figure 10-1).
    The Intel MKL Help Index is also available in Eclipse, and the Reference Manual is included in the Eclipse Help search.
16. domains support, see Intel Math Kernel Library Language Interfaces Support.
    Reason: In case your function domain does not directly support the needed environment, you can use mixed-language programming (see Mixed-language Programming with Intel MKL). For a list of language-specific interface libraries and modules and an example of how to generate them, see also Using Language-Specific Interfaces with Intel MKL.
    If your system is based on the Intel 64 or IA-64 architecture, identify whether your application performs calculations with huge data arrays (of more than 2^31-1 elements).
    Reason: To operate on huge data arrays, you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default, LP64, interface, where integers are 32-bit (see Support for ILP64 Programming).
    Identify whether and how your application is threaded:
    • Threaded with the Intel compiler
    • Threaded with a third-party compiler
    • Not threaded
    Reason: The compiler you use to thread your application determines which threading library you should link with your application. For applications threaded with a third-party compiler, you may need to use Intel MKL in the sequential mode (for more information, see Sequential Mode of the Library and Linking with Threading Libraries).
    Table 2-2 What You Need to Know Before You Begin (continued)
    Number of threads: Determine the number of threads you want Intel MKL to
17. for an example of how to generate these libraries and modules.
    See Appendix G in the Intel MKL Reference Manual for details of FFTW to Intel MKL wrappers.
    Fortran 95 Interfaces to LAPACK and BLAS
    Fortran 95 interfaces are compiler-dependent. Intel MKL provides the interface libraries and modules precompiled with the Intel Fortran compiler. Additionally, the Fortran 95 interfaces and wrappers are delivered as sources. (For more information, see Compiler-dependent Functions and Fortran 90 Modules.) If you are using a different compiler, build the appropriate library and modules with your compiler and link the library as a user's library:
    1. Go to the respective directory <mkl_directory>/interfaces/blas95 or <mkl_directory>/interfaces/lapack95.
    2. Type one of the following commands:
           make lib32 INSTALL_DIR=<user_dir>    (for the IA-32 architecture)
           make libem64t [interface=lp64|ilp64] INSTALL_DIR=<user_dir>    (for the Intel 64 architecture)
           make lib64 [interface=lp64|ilp64] INSTALL_DIR=<user_dir>    (for the IA-64 architecture)
    NOTE: Parameter INSTALL_DIR is required.
    As a result, the required library is built and installed in the <user_dir>/lib/<arch> directory, and the .mod files are built and installed in the <user_dir>/include/<arch>[/{lp64|ilp64}] directory, where <arch> is one of {32, em64t, 64}.
    By default, the ifort compiler is assumed. You may change the compile
18. for linking with the Absoft compilers as well.
    Table 3-7 Detailed Structure of the Intel 64 Architecture Directory lib/em64t
    File | Contents
    Static Libraries
    Interface layer
    libmkl_blas95_ilp64.a | Fortran 95 interface library for BLAS for the Intel Fortran compiler. Supports the ILP64 interface.
    libmkl_blas95_lp64.a | Fortran 95 interface library for BLAS for the Intel Fortran compiler. Supports the LP64 interface.
    libmkl_gf_ilp64.a | ILP64 interface library for the GNU Fortran and Absoft compilers
    libmkl_gf_lp64.a | LP64 interface library for the GNU Fortran and Absoft compilers
    libmkl_intel_ilp64.a | ILP64 interface library for the Intel compilers
    libmkl_intel_lp64.a | LP64 interface library for the Intel compilers
    libmkl_intel_sp2dp.a | SP2DP interface library for the Intel compilers
    libmkl_lapack95_ilp64.a | Fortran 95 interface library for LAPACK for the Intel Fortran compiler. Supports the ILP64 interface.
    libmkl_lapack95_lp64.a | Fortran 95 interface library for LAPACK for the Intel Fortran compiler. Supports the LP64 interface.
    Threading layer
    libmkl_gnu_thread.a | Threading library for the GNU Fortran and C compilers
    libmkl_intel_thread.a | Threading library for the Intel compilers
    libmkl_pgi_thread.a | Threading library for the PGI compiler
    libmkl_sequential.a | Sequential library
    Table 3-7 Detailed Structure of the Intel 64
19. how to replace the memory functions that Intel MKL uses by default with your own functions.
    Using the Intel MKL Parallelism
    Intel MKL is extensively parallelized. The following routines and functions are threaded:
    • Direct sparse solver.
    • LAPACK:
      - Linear equations, computational routines:
        Factorization: ?getrf, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
        Solving: ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
      - Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
      - Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr
      - Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc
      Note that a number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, etc.
    • Level 1 and Level 2 BLAS functions:
      - Level 1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, drot/srot
      - Level 2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv
      Note that these functions are threaded only for Intel 64 architecture, Intel Core 2 Duo, and Intel Core i7 processors.
    • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
    • VML.
    • FFT.
    Intel MKL is thread-safe, which means tha
20. intelmpi20_lp64.a | link to lib/em64t/libmkl_blacs_intelmpi_lp64.a
    libmkl_blacs_lp64.a | LP64 version of BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10; ANL MPICH version 1.2.5.2
    libmkl_blacs_openmpi_ilp64.a | ILP64 version of BLACS routines supporting OpenMPI
    libmkl_blacs_openmpi_lp64.a | LP64 version of BLACS routines supporting OpenMPI
    libmkl_blacs_sgimpt_ilp64.a | ILP64 version of BLACS routines supporting SGI MPT
    libmkl_blacs_sgimpt_lp64.a | LP64 version of BLACS routines supporting SGI MPT
    Table 3-7 Detailed Structure of the Intel 64 Architecture Directory lib/em64t
    File | Contents
    Dynamic Libraries
    Interface layer
    libmkl_gf_ilp64.so | ILP64 interface library for the GNU Fortran and Absoft compilers
    libmkl_gf_lp64.so | LP64 interface library for the GNU Fortran and Absoft compilers
    libmkl_intel_ilp64.so | ILP64 interface library for the Intel compilers
    libmkl_intel_lp64.so | LP64 interface library for the Intel compilers
    libmkl_intel_sp2dp.so | SP2DP interface library for the Intel compilers
    Threading layer
    libmkl_gnu_thread.so | Threading library for the GNU Fortran and C compilers
    libmkl_intel_thread.so | Threading library for the Intel compilers
    libmkl_pgi_thread.so | Threading library for the PGI compiler
    libmkl_sequential.so | Sequential library
21. picks up the chosen interface, and the computational library uses the interfaces and OpenMP implementation (or non-threaded mode) chosen in the first two layers.
    To learn which libraries to link with your application, see Chapter 5.
    Table 3-3 provides more details of each layer.
    Table 3-3 Intel MKL Layers
    Layer | Description
    Interface Layer | Matches compiled code of your application with the threading and/or computational parts of the library. This layer provides:
      • LP64 and ILP64 interfaces (see Support for ILP64 Programming for details).
      • Compatibility with compilers that return function values differently.
      • A mapping between single-precision names and double-precision names for applications using Cray-style naming (SP2DP interface). The SP2DP interface supports Cray-style naming in applications targeted for the Intel 64 or IA-64 architecture and using the LP64 interface. It provides a mapping between single-precision names (for both real and complex types) in the application and double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for BLAS functions ?GEMM:
            SGEMM -> DGEMM
            DGEMM -> DGEMM
            CGEMM -> ZGEMM
            ZGEMM -> ZGEMM
        Mind that no changes are made to double-precision names.
    Threading Layer | This layer:
      • Provides a way to link threaded Intel MKL with different threading compilers.
      • Enables you to link with a threaded or seq
22. range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)
    where a is of dimension lda-by-n, which is at least N^2 elements, instead of the packed routine:
        call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)
    where ap is of dimension N*(N+1)/2.
    FFT Functions
    Additional conditions can improve performance of the FFT functions.
    Applications based on the IA-32 or Intel 64 architecture: The addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by the cache line size, which equals:
    • 32 bytes for the Intel Pentium III processors
    • 64 bytes for the Intel Pentium 4 processors and processors using Intel 64 architecture
    Applications based on the IA-64 architecture: The leading dimension values, in bytes (n*element_size), of two-dimensional arrays should not be a power of two.
    Hardware Configuration Tips
    Dual-Core Intel Xeon processor 5100 series systems: To get the best Intel MKL performance on Dual-Core Intel Xeon processor 5100 series systems, enable the Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this functionality, use the appropriate BIOS settings, as described in your BIOS documentation.
    The use of Hyper-Threading Technology: Hyper-Threading Technology (HT Technol
23. script corresponding to your system architecture and command shell as explained in Table 2-1:
    Table 2-1 Scripts to Set the Environment Variables
    Architecture | Shell | Script File
    IA-32 | C | mklvars32.csh
    IA-32 | Bash and Bourne (sh) | mklvars32.sh
    Intel 64 | C | mklvarsem64t.csh
    Intel 64 | Bash and Bourne (sh) | mklvarsem64t.sh
    IA-64 | C | mklvars64.csh
    IA-64 | Bash and Bourne (sh) | mklvars64.sh
    For further configuring of the library, see Chapter 4.
    Using the Web-based Linking Advisor
    Use the Intel MKL Linking Advisor to determine the libraries and options to specify on your link or compilation line. The tool is available at http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor.
    The Linking Advisor requests information about your system and on how you intend to use Intel MKL (link dynamically or statically, use threaded or sequential mode, etc.). The tool automatically generates the appropriate link line for your application.
    For more information on linking with Intel MKL, see Chapter 5 and specifically Table 5-1 for a list of non-cluster Intel MKL libraries to link against.
    Using Intel MKL Code Examples
    The Intel MKL package includes code examples, located in the examples subdirectory of the installation directory. Use the examples to determine:
    • Whether Intel MKL is working on your system
    • How you should call the library
    • How to link the library
    The examples are grouped in subdirectories mainly by
24. steps, using different factorization methods, etc. It can be a large waste of time to run a huge problem to completion only to discover it ran 0.01% slower than your previous best problem.
    There are 3 options to reduce the search time:
    • -DASYOUGO
    • -DENDEARLY
    • -DASYOUGO2
    Use -DASYOUGO2 cautiously because it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. These options provide a lot of useful DGEMM performance information at the cost of around 0.2% performance loss.
    If you want to use the old HPL, simply omit these options and recompile from scratch. To do this, try "make arch=<arch> clean_arch_all".
    -DASYOUGO: Gives performance data as the run proceeds. The performance always starts off higher and then drops because this actually happens in LU decomposition. The ASYOUGO performance estimate is usually an overestimate (because the LU decomposition slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs, and this is always an overestimate as compared to ASYOUGO2, which measures actually achieved DGEMM performance. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides. So, refer to the description of the -DASYOUGO2 option below for the details of the output.
25. the layered linking model, you must link your application with only one computational library. However, certain Intel MKL function domains require several computational link libraries. For each Intel MKL function domain, Table 5-3 lists the computational libraries that you must include in the link line. For more information on linking with ScaLAPACK and Cluster FFTs, see also Linking with ScaLAPACK and Cluster FFTs.

Table 5-3 Computational Libraries to Link by Function Domain
Function domains: BLAS, CBLAS, Sparse BLAS, LAPACK, VML, VSL, Iterative Sparse Solvers, Trust Region Solver, FFT, Trigonometric Transform Functions, Poisson Library; Direct Sparse Solver/PARDISO Solver; ScaLAPACK; ScaLAPACK LP64 interface; ScaLAPACK ILP64 interface
IA-32 architecture, static: libmkl_core.a; libmkl_core.a; libmkl_scalapack_core.a libmkl_core.a; n/a; n/a
IA-32 architecture, dynamic: libmkl_core.so; libmkl_lapack.so libmkl_core.so; libmkl_scalapack_core.so libmkl_lapack.so libmkl_core.so; n/a; n/a
Intel 64 or IA-64 architecture, static: libmkl_core.a; libmkl_core.a; See below; libmkl_scalapack_lp64.a libmkl_core.a; libmkl_scalapack_ilp64.a libmkl_lapack.so libmkl_core.a
Intel 64 or IA-64 architecture, dynamic: libmkl_core.so; libmkl_lapack.so libmkl_core.so; See below; libmkl_scalapack_lp64.so libmkl_lapack.so libm
26. the problem to completion without -DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number.
Though -DENDEARLY terminates early, HPL treats the problem as completed and computes the Gflop rating as though the problem ran to completion. Ignore this erroneously high rating.
The bigger the problem, the more closely the last update that -DENDEARLY returns matches what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. For this reason, you are advised to use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation for problems just starting.
The best known compile options for the Itanium 2 processor are, with the Intel compiler:
-O2 -ipo -ipo_obj -ftz -IPF_fltacc -IPF_fma -unroll -w -tpp2

-DASYOUGO2
Gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. You should beware of this overhead, although for big problems it is less than 1/10th of a percent.
Here is a sample ASYOUGO2 output. The first 3 non-intrusive numbers can also be found in ASYOUGO and ENDEARLY output, so it suffices to describe these numbers here:
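The inflated rating follows from how a Gflop/s figure is formed: a fixed operation count for the full problem divided by the elapsed time. The sketch below assumes the standard HPL operation count for an N-by-N solve, (2/3)N^3 + (3/2)N^2 flops; the function name is hypothetical.

```c
#include <assert.h>

/* Gflop/s rating assuming the standard HPL operation count for solving
   an N-by-N system, (2/3)N^3 + (3/2)N^2 flops, over the wall time. */
double hpl_gflops(double n, double seconds)
{
    assert(n > 0.0 && seconds > 0.0);
    return ((2.0 / 3.0) * n * n * n + 1.5 * n * n) / seconds / 1.0e9;
}
```

If a -DENDEARLY run stops after one tenth of the time a full run would take, the full-problem operation count is divided by a ten times shorter time, so the rating is ten times too high: for N = 1000, hpl_gflops(1000.0, 1.0) is about 0.668, while hpl_gflops(1000.0, 0.1) is about 6.68.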
27. using OpenMP directives and compile the program with Intel compilers, Intel MKL and the program will both use the same threading library. Intel MKL tries to determine whether it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads unless you specifically request Intel MKL to do so via the MKL_DYNAMIC functionality. However, Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If your program is threaded by some other means, Intel MKL may operate in multithreaded mode, and the performance may suffer due to overuse of the resources.
Here are several cases with recommendations depending on the threading model you employ:

Table 6-1 How to Avoid Conflicts in the Execution Environment for Your Threading Model
Threading model:
• You thread the program using OS threads (pthreads on Linux OS).
• You thread the program using OpenMP directives and/or pragmas and compile the program using a compiler other than a compiler from Intel.
• There are multiple programs running on a multiple-CPU system, for example, a parallelized program that runs using MPI for communication, in which each processor is treated as a node.
Discussion:
If more than one thread calls Intel MKL, and the function being called is threaded, it may be important that you turn off Intel MKL threading. Se
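For the pthreads case, one way to request single-threaded Intel MKL from inside the program is sketched below. It rests on an assumption: that Intel MKL reads the MKL_NUM_THREADS environment variable when it initializes, so the call must come before the first Intel MKL call and before the application starts its own threads. Exporting the variable in the shell, calling mkl_set_num_threads(1), or linking the sequential library are the alternatives. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Sketch: ask Intel MKL for one thread per call so that several pthreads
   calling Intel MKL do not each spawn a full team of threads.
   Assumption: Intel MKL reads MKL_NUM_THREADS at initialization, so this
   must run before the first Intel MKL call. Returns 0 on success. */
int request_single_threaded_mkl(void)
{
    /* Last argument 1: overwrite any value inherited from the shell. */
    return setenv("MKL_NUM_THREADS", "1", 1);
}
```

Call it at the top of main, before creating worker threads.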
28. ware
Linking with ScaLAPACK and Cluster FFTs ... 9-1
Setting the Number of Threads ... 9-3
Using Shared Libraries ... 9-3
Building ScaLAPACK Tests ... 9-4
Examples for Linking with ScaLAPACK and Cluster FFT ... 9-4
Examples for Linking a C Application ... 9-4
Examples for Linking a Fortran Application ... 9-5
Getting Assistance for Programming in the Eclipse IDE
Viewing the Intel MKL Reference Manual in the Eclipse IDE ... 10-1
Searching the Intel Web Site from the Eclipse IDE ... 10-3
Using Context-Sensitive Help in the Eclipse IDE CDT ... 10-4
LINPACK and MP LINPACK Benchmarks
Intel Optimized LINPACK Benchmark for Linux OS ... 11-1
Contents ... 11-1
Running the Software ... 11-2
Known Limitations ... 11-3
Intel Optimized MP LINPACK Benchmark for Clusters ... 11-4
Contents ... 11-5
Building the MP LINPACK
29. 1, i;
MKL_Complex16 a[N], b[N], c;
n = N;
for( i = 0; i < n; i++ ){
  a[i].re = (double)i; a[i].im = (double)i * 2.0;
  b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;
}
zdotc( &c, &n, a, &inca, b, &incb );
printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.re, c.im );
}

Below is the C++ implementation:
Example 7-2 Calling a Complex BLAS Level 1 Function from C++
#include <complex>
#include <iostream>
#define MKL_Complex16 std::complex<double>
#include "mkl.h"

#define N 5

int main()
{
  int n, inca = 1, incb = 1, i;
  std::complex<double> a[N], b[N], c;
  n = N;
  for( i = 0; i < n; i++ ){
    a[i] = std::complex<double>(i, i*2.0);
    b[i] = std::complex<double>(n-i, i*2.0);
  }
  zdotc( &c, &n, a, &inca, b, &incb );
  std::cout << "The complex dot product is: " << c << std::endl;
  return 0;
}

The example below uses CBLAS:
Example 7-3 Using CBLAS Interface Instead of Calling BLAS Directly from C
#include <stdio.h>
#include "mkl.h"
typedef struct{ double re; double im; } complex16;
/* No separate prototype is needed: mkl.h declares cblas_zdotc_sub(),
   whose vector and result arguments are void pointers. */
#define N 5
int main()
{
int n, inca = 1, incb =
30. 1, i;
complex16 a[N], b[N], c;
n = N;
for( i = 0; i < n; i++ ){
  a[i].re = (double)i; a[i].im = (double)i * 2.0;
  b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;
}
cblas_zdotc_sub( n, a, inca, b, incb, &c );
printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.re, c.im );
}

Support for Boost uBLAS Matrix-matrix Multiplication
If you are used to uBLAS, you can perform BLAS matrix-matrix multiplication in C++ using Intel MKL substitution of Boost uBLAS functions. uBLAS is the Boost C++ open-source library that provides BLAS functionality for dense, packed, and sparse matrices. The library uses an expression template technique for passing expressions as function arguments, which enables evaluating vector and matrix expressions in one pass without temporary matrices. uBLAS provides two modes:
• Debug (safe) mode, default. Checks types and conformance.
• Release (fast) mode. Does not check types and conformance. To enable this mode, use the NDEBUG preprocessor symbol.
The documentation for the Boost uBLAS is available at www.boost.org.
Intel MKL provides overloaded prod() functions for substituting uBLAS dense matrix-matrix multiplication with the Intel MKL gemm calls. Though these functions break uBLAS expression templates and introduce temporary matrices, the performance advantage can be considerable for matrix sizes that are not too small (roughly over 50
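To cross-check the output of Examples 7-1 to 7-3 without Intel MKL, the plain-C sketch below computes the same conjugated dot product, the sum over i of conj(a[i]) * b[i], for unit increments. The zcomplex type and both function names are hypothetical; the struct mirrors the re/im layout of complex16 in Example 7-3.

```c
#include <assert.h>

/* Plain-C stand-in for a double-complex number (hypothetical name). */
typedef struct { double re; double im; } zcomplex;

/* Reference version of the conjugated dot product that zdotc and
   cblas_zdotc_sub compute: sum of conj(x[i]) * y[i], unit increments. */
zcomplex ref_zdotc(int n, const zcomplex *x, const zcomplex *y)
{
    zcomplex s = { 0.0, 0.0 };
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        /* (xr - i*xi) * (yr + i*yi) = (xr*yr + xi*yi) + i*(xr*yi - xi*yr) */
        s.re += x[i].re * y[i].re + x[i].im * y[i].im;
        s.im += x[i].re * y[i].im - x[i].im * y[i].re;
    }
    return s;
}

/* Fills a and b exactly as Examples 7-1 to 7-3 do. */
void fill_example_vectors(int n, zcomplex *a, zcomplex *b)
{
    int i;
    for (i = 0; i < n; i++) {
        a[i].re = (double)i;       a[i].im = (double)i * 2.0;
        b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;
    }
}
```

For n = 5 the sum works out to 140 + 20i, so each of the three examples should report a real part of 140.00 and an imaginary part of 20.00.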
31. 4 INTEGER MKL_LONG without specifying KIND

Browsing the Intel MKL Include Files
The Reference Manual does not explain which integer parameters of a function become 64-bit and which remain 32-bit for ILP64. To find this out, browse the include files, examples, and tests for the ILP64 interface details. For the location of these files, see Table 3-2. Start with browsing the include files listed in Table A-2.
Some function domains that support only a Fortran interface (see Table A-1) provide header files for C/C++ in the include directory. Such .h files enable using a Fortran binary interface from C/C++ code. These files can also be used to understand the ILP64 usage.

Limitations
All Intel MKL function domains support ILP64 programming with the following exceptions:
• FFTW interfaces to Intel MKL:
  - FFTW 2.x wrappers do not support ILP64.
  - FFTW 3.2 wrappers support ILP64 by a dedicated set of functions plan_guru64.
• GMP arithmetic functions do not support ILP64.

Directory Structure in Detail
The information in the tables below shows a detailed structure of the Intel MKL architecture-specific directories. For the list of additional interface libraries that can be generated in these directories using makefiles in the interfaces directory, see Using Language-Specific Interfaces with Intel MKL. For the contents of the doc directory, see Contents of the Documentati
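The LP64/ILP64 choice comes down to the width of integer arguments. The sketch below uses a hypothetical my_mkl_int type that switches on the MKL_ILP64 macro (assumed here to be the macro the Intel MKL headers test), plus a hypothetical check for one common reason to need ILP64: an n-by-n array whose element count exceeds the 32-bit limit of 2^31-1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the integer type of Intel MKL parameters.
   Assumption: like the Intel MKL headers, it switches on MKL_ILP64
   (compile with -DMKL_ILP64 to select the ILP64 interface). */
#ifdef MKL_ILP64
typedef int64_t my_mkl_int;   /* ILP64: 64-bit integer parameters */
#else
typedef int32_t my_mkl_int;   /* LP64: 32-bit integer parameters */
#endif

/* Returns 1 if the element count of an n-by-n matrix fits in a 32-bit
   signed integer (LP64 suffices), 0 if a 64-bit count is needed. */
int count_fits_lp64(int64_t n)
{
    assert(n >= 0);
    if (n > INT32_MAX)
        return 0;                        /* n*n is larger still */
    return n * n <= (int64_t)INT32_MAX;  /* n*n < 2^62, no overflow */
}
```

count_fits_lp64(46340) returns 1, while count_fits_lp64(46341) returns 0 because 46341^2 = 2147488281 exceeds 2147483647.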
32. ACS routines supporting Intel MPI 2.0/3.x and MPICH2
LP64 version of BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
A soft link to lib/64/libmkl_blacs_intelmpi_ilp64.a
A soft link to lib/64/libmkl_blacs_intelmpi_lp64.a
LP64 version of BLACS routines supporting the following MPICH versions:
• Myricom MPICH version 1.2.5.10
• ANL MPICH version 1.2.5.2
ILP64 version of BLACS routines supporting OpenMPI
LP64 version of BLACS routines supporting OpenMPI
ILP64 version of BLACS routines supporting SGI MPT
LP64 version of BLACS routines supporting SGI MPT

Table 3-8 Detailed Structure of the IA-64 Architecture Directory lib/64 (continued)
File | Contents
Dynamic Libraries
Interface layer
libmkl_gf_ilp64.so | ILP64 interface library for the GNU Fortran compiler
libmkl_gf_lp64.so | LP64 interface library for the GNU Fortran compiler
libmkl_intel_ilp64.so | ILP64 interface library for the Intel compilers
libmkl_intel_lp64.so | LP64 interface library for th
libmkl_intel_sp2dp.so
Threading layer
libmkl_gnu_thread.so
libmkl_intel_thread.so
libmkl_sequential.so
Computational layer
libmkl_core.so
libmkl_i2p.so
libmkl_lapack.so
libmkl_scalapack_ilp64.so
libmkl_scalapack_lp64.so
libmkl_vml_i2p.so
RTL
libguide.so
libiomp5.so
libmkl_blacs_intelmpi_ilp64.so
libmkl_blacs_intelmpi_lp64.so
locale/en_US/mkl_msg.cat
locale/ja_JP/mkl_msg.cat
33. ATH, and NLSPATH. Section Automating Setting of Environment Variables explains how to automate setting of these variables at startup.
For information on how to set up environment variables for threading, see Setting the Number of Threads Using an OpenMP Environment Variable.

Automating Setting of Environment Variables
To automate setting of the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables, add mklvars*.sh to your shell profile so that each time you log in, the script automatically executes and sets the paths to the appropriate Intel MKL directories. To do this with a local user account, edit the following files by adding the appropriate script to the path manipulation section right before exporting variables:
• bash: ~/.bash_profile, ~/.bash_login, or ~/.profile
  # setting up MKL environment for bash
  <absolute_path_to_installed_MKL>/tools/environment/mklvars<arch>.sh
• sh: ~/.profile
  # setting up MKL environment for sh
  <absolute_path_to_installed_MKL>/tools/environment/mklvars<arch>.sh
• csh: ~/.login
  # setting up MKL environment for csh
  <absolute_path_to_installed_MKL>/tools/environment/mklvars<arch>.csh
In the above commands, replace mklvars<arch> with mklvars32, mklvarsem64t, or mklvars64.
If you have super user permissions, you can add the same commands to a ge
34. Col=001280 Fract=0.050 Mflops=42454.99 (DT= 9.5 DF= 34.1 DMF=38322.78)
The problem size was N=16000 with a block size of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. Here, the fraction of columns completed is 1280/16000 = 0.08. Only up to 40 outputs are printed, at various places through the matrix decomposition, at the fractions:
0.005, 0.010, 0.015, 0.020, 0.025, 0.030, 0.035, 0.040, 0.045, 0.050, 0.055, 0.060, 0.065, 0.070, 0.075, 0.080, 0.085, 0.090, 0.095, 0.100, 0.105, 0.110, 0.115, 0.120, 0.125, 0.130, 0.135, 0.140, 0.145, 0.150, 0.155, 0.160, 0.165, 0.170, 0.175, 0.180, 0.185, 0.190, 0.195, 0.200, 0.205, 0.210, 0.215, 0.220, 0.225, 0.230, 0.235, 0.240, 0.245, 0.250, 0.255, 0.260, 0.265, 0.270, 0.275, 0.280, 0.285, 0.290, 0.295, 0.300, 0.305, 0.310, 0.315, 0.320, 0.325, 0.330, 0.335, 0.340, 0.345, 0.350, 0.355, 0.360, 0.365, 0.370, 0.375, 0.380, 0.385, 0.390, 0.395, 0.400, 0.405, 0.410, 0.415, 0.420, 0.425, 0.430, 0.435, 0.440, 0.445, 0.450, 0.455, 0.460, 0.465, 0.470, 0.475, 0.480, 0.485, 0.490, 0.495, 0.515, 0.535, 0.555, 0.575, 0.595, 0.615, 0.635, 0.655, 0.675, 0.695, 0.795, 0.895
However, this problem size is so small, and the block size so big by comparison, that as soon as it prints the value for 0.045, it is already through 0.08 of the columns. On a really big problem, the fractional number will be more accurate. It never prints more than the 112 numbers above. So, smaller problems will have fewer than 112 updates, and the biggest problems will have precisely 112 update
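Because each progress line is a fixed set of labeled fields, a small parser is enough to log runs and compare them. The sketch below assumes the Col=..., Fract=..., Mflops=... layout of the sample line above; the helper names are hypothetical and not part of MP LINPACK.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds KEY in an ASYOUGO progress line and parses
   the number that follows it, skipping an optional '=' and blanks.
   Returns 1 and stores the value on success, 0 if KEY or a number is
   missing. It uses the first occurrence of KEY, so a key must not also
   appear inside an earlier field. */
int asyougo_field(const char *line, const char *key, double *value)
{
    const char *p;
    char *end;
    assert(line != NULL && key != NULL && value != NULL);
    p = strstr(line, key);
    if (p == NULL)
        return 0;
    p += strlen(key);
    while (*p == '=' || *p == ' ')
        p++;
    *value = strtod(p, &end);
    return end != p;
}

/* Fraction of the N columns already factored, for example 1280/16000. */
double column_fraction(double col, double n)
{
    assert(n > 0.0);
    return col / n;
}
```

Applied to the sample line, asyougo_field(line, "Col", &col) stores 1280, and column_fraction(col, 16000.0) gives the 0.08 computed in the text.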
35. Table 3-9 Contents of the doc Directory ... 3-20
Table 5-1 Typical Libraries to List on a Link Line ... 5-1
Table 5-2 Selecting Threading Libraries ... 5-4
Table 5-3 Computational Libraries to Link by Function Domain ... 5-5
Table 6-1 How to Avoid Conflicts in the Execution Environment for Your Threading Model ... 6-4
Table 6-2 Environment Variables for Threading Controls ... 6-9
Table 6-3 Interpretation of MKL_DOMAIN_NUM_THREADS Values ... 6-12
Table 7-1 Interface Libraries and Modules ... 7-1
Table 11-1 Contents of the LINPACK Benchmark ... 11-2
Table 11-2 Contents of the MP LINPACK Benchmark ... 11-5
List of Examples
Example 6-1 Changing the Number of Threads ... 6-5
Example 6-2 Setting the Number of Threads to One ... 6-10
Example 6-3 Setting an Affinity Mask by Operating System Means Using the Intel Compiler ... 6-16
Example 6-4 Redefining Memory Functions ... 6-19
Example 7-1 Calling a Complex BLAS Level 1 Function from C ... 7-8
Example 7-2 Calling a Complex BLAS Level 1 Function from C++ ... 7-9
Example 7-3 Using CBLAS Interface Inst
36. Table 3-7 Detailed Structure of the Intel 64 Architecture Directory lib/em64t
File | Contents
Computational layer
libmkl_avx.so | Kernel optimized for the Intel Advanced Vector Extensions (Intel AVX)
libmkl_core.so | Library dispatcher for dynamic load of processor-specific kernel
libmkl_def.so | Default kernel library
libmkl_mc.so | Kernel library for processors based on the Intel Core microarchitecture
libmkl_mc3.so | Kernel library for the Intel Core i7 processors
libmkl_lapack.so | LAPACK and DSS/PARDISO routines and drivers
libmkl_scalapack_ilp64.so | ScaLAPACK routine library supporting the ILP64 interface
libmkl_scalapack_lp64.so | ScaLAPACK routine library supporting the LP64 interface
libmkl_vml_avx.so | VML/VSL optimized for the Intel Advanced Vector Extensions (Intel AVX)
libmkl_vml_def.so | VML/VSL part of default kernels
libmkl_vml_mc.so | VML/VSL for processors based on the Intel Core microarchitecture
libmkl_vml_mc3.so | VML/VSL for the Intel Core i7 processors
libmkl_vml_p4n.so | VML/VSL for the Intel Xeon processor using the Intel 64 architecture
libmkl_vml_mc2.so | VML/VSL for 45nm Hi-k Intel Core 2 and Intel Xeon processor families
RTL
libguide.so | Legacy OpenMP run-time library for dynamic linking
libiomp5.so | Compatibility OpenMP run-time library for dynamic linking
libmkl_intelmpi_ilp64.so | ILP64 version of BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
37. IDE Help Search
[Figure: Eclipse Help view with the search expression "VML Functions"; Local Help shows 1-10 of 93 hits, and Web Search links show Google (1 hit), Eclipse.org (1 hit), and Intel.com (1 hit).]

Using Context-Sensitive Help in the Eclipse IDE CDT
You can view context-sensitive help in the Eclipse CDT editor through infopop windows and F1 Help.

Infopop Window
An infopop window is a popup description of a C function.
NOTE In the current release, infopop windows are provided only for VML functions.
To get the description of an Intel MKL function in the editor, hover the mouse over the function name.
Figure 10-3 Infopop Window with an Intel MKL Function Description
[Figure: infopop in the Eclipse editor showing the name and prototype of d_backward_trig_transform.]
38. Intel Math Kernel Library for Linux OS User's Guide
March 2009
Document Number: 314774-009US
World Wide Web: http://www.intel.com/software/products/

Version | Version Information | Date
001 | Original issue. Documents Intel Math Kernel Library (Intel MKL) 9.0 gold release. | September 2006
002 | Documents Intel MKL 9.1 beta release. "Getting Started" and "LINPACK and MP LINPACK Benchmarks" chapters and "Support for Third-Party and Removed Interfaces" appendix added. Existing chapters extended. Document restructured. List of examples added. | January 2007
003 | Documents Intel MKL 9.1 gold release. Existing chapters extended. Document restructured. More aspects of ILP64 interface discussed. Section "Configuring the Eclipse IDE CDT to Link with Intel MKL" added to chapter 3. Cluster content is organized into one separate chapter 9, "Working with Intel Math Kernel Library Cluster Software", and restructured; appropriate links added. | June 2007
004 | Documents Intel MKL 10.0 Beta release. Layered design model has been described in chapter 3 and the content of the entire book adjusted to the model. Automation of setting environment variables at startup has been described in chapter 4. New Intel MKL threading controls have been described in chapter 6. The User's Guide for Intel MKL merged with the one for Intel MKL Cluster Edition to reflect consolidation of the respective products. | September 2007
005 | Docu
39. Intel MKL function domains and programming languages. For example, the examples/spblas subdirectory contains a makefile to build the Sparse BLAS examples, and the examples/vmlc subdirectory contains the makefile to build the C VML examples. Source code for the examples is in the next-level sources subdirectory.
See also High-level Directory Structure.

Compiler Support
Intel MKL supports compilers identified in the Release Notes. However, the library has been successfully used with other compilers as well.
Intel MKL provides a set of include files to simplify program development by specifying enumerated values and prototypes for the respective functions (for the list of include files, see Table A-2). Calling Intel MKL functions from your application without an appropriate include file may lead to incorrect behavior of the functions.

Before You Begin Using Intel MKL
Before you begin using Intel MKL, learning a few important concepts will help you get off to a good start, as shown in Table 2-2.
Table 2-2 What You Need to Know Before You Begin
Target platform
Identify the architecture of your target machine:
• IA-32 or compatible
• Intel 64 or compatible
• IA-64 (Itanium processor family)
Reason: Because Intel MKL libraries are located in directories corresponding to your particular architecture (see Architecture Support), you should provide proper paths on your link lines (see Linking Examples). To configure your development env
40. Interface | mkl_dss.f77, mkl_dss.f90 | mkl_dss.h
• RCI Iterative Solvers | mkl_rci.fi | mkl_rci.h
• ILU Factorization
Optimization Solver Routines | mkl_rci.fi | mkl_rci.h
Vector Mathematical Functions | mkl_vml.f77, mkl_vml.fi | mkl_vml.h

Table A-2 Include Files (continued)
Function domain | Fortran include files | C or C++ include files
Vector Statistical Functions | mkl_vml.f77, mkl_vsl.fi | mkl_vsl.h
Fourier Transform Functions | mkl_dfti.f90 | mkl_dfti.h
Cluster Fourier Transform Functions | mkl_cdft.f90 | mkl_cdft.h
Partial Differential Equations Support Routines:
• Trigonometric Transforms | mkl_trig_transforms.f90 | mkl_trig_transforms.h
• Poisson Solvers | mkl_poisson.f90 | mkl_poisson.h
GMP interface | | mkl_gmp.h
Service routines | | mkl_service.h
Memory allocation routines | | i_malloc.h
MKL examples interface | | mkl_example.h

Support for Third-Party Interfaces
This appendix briefly describes certain third-party interfaces that Intel Math Kernel Library (Intel MKL) supports.

GMP Functions
The Intel MKL implementation of GMP arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. The interfaces of such functions fully match the GNU Multiple Precision (GMP) Arithmetic Library. For specifications of these functions, please see http://www.intel.com/software/products/mkl/docs/gnump/WebHelp/.
If you c
41. KLPATH -I$MKLINCLUDE -I$MKLINCLUDE/32 -lmkl_blas95 -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

Linking on Intel 64 and IA-64 Architecture Systems
In these examples:
MKLPATH=$MKLROOT/lib/em64t for the Intel 64 architecture
MKLPATH=$MKLROOT/lib/ia64 for the IA-64 architecture
MKLINCLUDE=$MKLROOT/include
• Static linking of myprog.f and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
• Static linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
• Dynamic linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
• Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:
See Fortran 95 Interfaces to LAPACK and BLAS for infor
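The dynamic examples above follow the layered model: one interface library, one threading library, the core library, then run-time libraries. The hypothetical helper below assembles those flags for the Intel compilers on Intel 64 or IA-64 systems; the ILP64 variant (libmkl_intel_ilp64) is an assumption by analogy with the LP64 examples.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the dynamic-link libraries in layered order
   (interface, threading, computational, run-time) into buf.
   ilp64:    nonzero selects the ILP64 interface library (assumption)
   threaded: nonzero selects the Intel threading layer plus libiomp5,
             zero selects the sequential layer
   Returns 1 on success, 0 if buf is too small. */
int mkl_dynamic_libs(char *buf, size_t size, int ilp64, int threaded)
{
    const char *iface = ilp64 ? "-lmkl_intel_ilp64" : "-lmkl_intel_lp64";
    const char *thread = threaded ? "-lmkl_intel_thread" : "-lmkl_sequential";
    const char *omp = threaded ? " -liomp5" : "";
    int n;
    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%s %s -lmkl_core%s -lpthread", iface, thread, omp);
    return n >= 0 && (size_t)n < size;
}
```

Appending the LP64 results to ifort myprog.f -L$MKLPATH -I$MKLINCLUDE reproduces the two dynamic LP64 link lines listed above.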
42. L FFTs
libfftw2x_cdft_SINGLE.a | Single-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs
libfftw2x_cdft_DOUBLE.a | Double-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs
Modules, in architecture- and interface-specific subdirectories of the Intel MKL include directory
blas95.mod (1) | Fortran 95 interface module for BLAS (BLAS95)
lapack95.mod | Fortran 95 interface module for LAPACK (LAPACK95)
f95_precision.mod | Fortran 95 definition of precision parameters for BLAS95 and LAPACK95
mkl95_blas.mod | Fortran 95 interface module for BLAS (BLAS95), identical to blas95.mod. To be removed in one of the future releases
mkl95_lapack.mod | Fortran 95 interface module for LAPACK (LAPACK95), identical to lapack95.mod. To be removed in one of the future releases
mkl95_precision.mod | Fortran 95 definition of precision parameters for BLAS95 and LAPACK95, identical to f95_precision.mod. To be removed in one of the future releases
(1) Prebuilt for the Intel Fortran compiler.
(2) FFTW3 interfaces are integrated with Intel MKL. Look into <mkl directory>/interfaces/fftw3x*/makefile for options defining how to build and where to place the standalone library with the wrappers.
See Fortran 95 Interfaces to LAPACK and BLAS
43. L functions in the threaded or sequential mode. The default value is parallel.
export=<file_name>
    Specifies the full name of the file that contains the list of entry-point functions to be included in the shared object. The default name is user_list (no extension).
name=<so_name>
    Specifies the name of the library to be created. By default, the name of the created library is mkl_custom.so.
xerbla=<error_handler>
    Specifies the name of the object file <user_xerbla>.o that contains the user's error handler. The makefile adds this error handler to the library for use instead of the default Intel MKL error handler xerbla. If you omit this parameter, the native Intel MKL xerbla is used. See the description of the xerbla function in the Intel MKL Reference Manual on how to develop your own error handler.
MKLROOT=<MKL_directory>
    Specifies the location of Intel MKL libraries used to build the custom shared object. By default, the builder uses the Intel MKL installation directory.
All parameters are optional.
In the simplest case, the command line is:
make ia32
and the missing parameters have default values. This command creates the mkl_custom.so library for processors using the IA-32 architecture. The command takes the list of functions from the user_list file and uses the native Intel MKL error handler xerbla.
An example of a more complex case f
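A custom handler passed through xerbla=<error_handler> can be as small as the sketch below. The prototype is an assumption, to be checked against the xerbla description in the Intel MKL Reference Manual; the user_xerbla_* globals are hypothetical and only show that the handler can record the failing routine and argument position for the application.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Last error recorded by the handler (hypothetical globals). */
char user_xerbla_routine[32];
int user_xerbla_info;

/* Sketch of a user error handler to compile into user_xerbla.o.
   Assumption: the C-callable form is
       void xerbla(const char *srname, const int *info, int len)
   where srname is a blank-padded routine name of length len that need
   not be NUL-terminated, and *info is the position of the bad argument. */
void xerbla(const char *srname, const int *info, int len)
{
    int n = len;
    assert(srname != NULL && info != NULL);
    if (n < 0)
        n = 0;
    while (n > 0 && srname[n - 1] == ' ')   /* drop Fortran-style padding */
        n--;
    if (n > (int)sizeof user_xerbla_routine - 1)
        n = (int)sizeof user_xerbla_routine - 1;
    memcpy(user_xerbla_routine, srname, (size_t)n);
    user_xerbla_routine[n] = '\0';
    user_xerbla_info = *info;
    fprintf(stderr, "Intel MKL error: parameter %d of %s had an illegal value\n",
            user_xerbla_info, user_xerbla_routine);
}
```

Compiled with something like gcc -c -fPIC user_xerbla.c -o user_xerbla.o, the object would then be passed to the builder as xerbla=user_xerbla.o.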
44. LAS, ScaLAPACK routines | Yes | | †
DSS/PARDISO solvers | Yes | Yes | Yes
Other Direct and Iterative Sparse Solver routines | Yes | Yes | Yes
Vector Mathematical Library (VML) functions | Yes | Yes | Yes
Vector Statistical Library (VSL) functions | Yes | Yes | Yes
Fourier Transform functions (FFT) | | Yes | Yes
Cluster FFT functions | | Yes | Yes

Table A-1 Language Interfaces Support (continued)
Function Domain | FORTRAN 77 interface | Fortran 90/95 interface | C/C++ interface
Trigonometric Transform routines | | Yes | Yes
Fast Poisson, Laplace, and Helmholtz Solver (Poisson Library) routines | | Yes | Yes
Optimization (Trust-Region) Solver routines | Yes | Yes | Yes
GMP arithmetic functions | | | Yes
Service routines (including memory allocation) | | | Yes
† Supported using a mixed-language programming call. See Table A-2 for the respective header file.

Table A-2 lists available header files for all Intel MKL function domains.
Table A-2 Include Files
Function domain | Fortran include files | C or C++ include files
All function domains | mkl.fi | mkl.h
BLAS Routines | blas.f90, mkl_blas.fi | mkl_blas.h
BLAS-like Extension Transposition Routines | mkl_trans.fi | mkl_trans.h
CBLAS Interface to BLAS | | mkl_cblas.h
Sparse BLAS Routines | mkl_spblas.fi | mkl_spblas.h
LAPACK Routines | lapack.f90, mkl_lapack.fi | mkl_lapack.h
ScaLAPACK Routines | | mkl_scalapack.h
All Sparse Solver Routines | mkl_solver.f90 | mkl_solver.h
• PARDISO | mkl_pardiso.f77, mkl_pardiso.f90 | mkl_pardiso.h
• DSS
45. L_FFT | MKL_VML
<uses> ::= [ <space symbol>* ] ( <space symbol> | <equality sign> | <comma symbol> ) [ <space symbol>* ]
<number of threads> ::= <positive number>
<positive number> ::= <decimal positive number> | <octal number> | <hexadecimal number>
In the syntax above, MKL_BLAS indicates the BLAS function domain, MKL_FFT indicates non-cluster FFTs, and MKL_VML indicates the Vector Mathematics Library.
For example:
MKL_ALL=2, MKL_BLAS=1, MKL_FFT=4
The global variables MKL_ALL, MKL_BLAS, MKL_FFT, and MKL_VML, as well as the interface for the Intel MKL threading control functions, can be found in the mkl.h header file.
Table 6-3 illustrates how values of MKL_DOMAIN_NUM_THREADS are interpreted.

Table 6-3 Interpretation of MKL_DOMAIN_NUM_THREADS Values
Value of MKL_DOMAIN_NUM_THREADS | Interpretation
MKL_ALL=4 | All parts of Intel MKL should try four threads. The actual number of threads may still be different because of the MKL_DYNAMIC setting or system resource issues. The setting is equivalent to MKL_NUM_THREADS=4.
MKL_ALL=1, MKL_BLAS=4 | All parts of Intel MKL should try one thread, except for BLAS, which is suggested to try fo
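The grammar above can be read mechanically. The sketch below is a hypothetical parser for MKL_DOMAIN_NUM_THREADS strings, written to illustrate the syntax rather than to reproduce how Intel MKL parses them. The set of delimiters it accepts between entries (blank, comma, semicolon, colon) is an assumption; the <uses> separator and the decimal/octal/hexadecimal counts follow the rules above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of MKL_DOMAIN_NUM_THREADS; -1 means "not set". */
struct domain_threads {
    long all, blas, fft, vml;
};

/* Sketch of a parser for values such as "MKL_ALL=2, MKL_BLAS=1, MKL_FFT=4".
   A domain name is followed by blanks, '=' or ',' (the <uses> rule), then
   the count; strtol with base 0 accepts decimal, octal (leading 0) and
   hexadecimal (leading 0x). Entry delimiters are an assumption.
   Returns 0 on success, -1 on a syntax error. */
int parse_domain_threads(const char *s, struct domain_threads *out)
{
    static const char *const names[4] = { "MKL_ALL", "MKL_BLAS", "MKL_FFT", "MKL_VML" };
    long *slots[4];
    assert(s != NULL && out != NULL);
    slots[0] = &out->all;
    slots[1] = &out->blas;
    slots[2] = &out->fft;
    slots[3] = &out->vml;
    out->all = out->blas = out->fft = out->vml = -1;

    for (;;) {
        int i, match = -1;
        char *end;
        long v;
        while (*s == ' ' || *s == ',' || *s == ';' || *s == ':')
            s++;                                  /* entry delimiter */
        if (*s == '\0')
            return 0;
        for (i = 0; i < 4 && match < 0; i++) {
            size_t len = strlen(names[i]);
            if (strncmp(s, names[i], len) == 0 &&
                !isalnum((unsigned char)s[len]) && s[len] != '_') {
                match = i;
                s += len;
            }
        }
        if (match < 0)
            return -1;                            /* unknown domain name */
        while (*s == ' ' || *s == '=' || *s == ',')
            s++;                                  /* <uses> separator */
        v = strtol(s, &end, 0);
        if (end == s || v <= 0)
            return -1;                            /* missing or non-positive count */
        *slots[match] = v;
        s = end;
    }
}
```

Parsing "MKL_ALL=2, MKL_BLAS=1, MKL_FFT=4" sets all = 2, blas = 1 and fft = 4 and leaves vml at -1; a value such as "MKL_ALL 0x4" is read as 4.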
46. NPACK. That is, if you do not use the new options explained in section Options to Reduce Search Time, these changes are disabled. The primary purpose of the additions is to assist you in finding solutions.
HPL requires a long time to search for many different parameters. In MP LINPACK, the goal is to get the best possible number.
Given that the input is not fixed, there is a large parameter space you must search over. An exhaustive search of all possible inputs is improbably large even for a powerful cluster. MP LINPACK optionally prints information on performance as it proceeds. You can also terminate early.
Save time by compiling with -DENDEARLY -DASYOUGO2 (described in the Options to Reduce Search Time section) and using a negative threshold (do not use a negative threshold on the final run that you intend to submit as a Top500 entry). Set the threshold in line 13 of the HPL 2.0 input file HPL.dat.
If you are going to run a problem to completion, do it with -DASYOUGO (see Options to Reduce Search Time).
Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible.

Options to Reduce Search Time
Running huge problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also huge: not only can you run any size problem, but over a number of block sizes, grid layouts, lookahead
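For context on what a negative threshold skips, the sketch below computes a scaled residual of the form HPL 2.0 reports, ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N), and compares it with a threshold, treating a negative threshold as "skip the check" as the text describes. Using DBL_EPSILON for eps and the function names are assumptions of this sketch; it is not HPL's code.

```c
#include <assert.h>
#include <float.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* HPL 2.0-style scaled residual for a row-major n-by-n matrix A:
   ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * n).
   Assumption: eps = DBL_EPSILON; HPL computes its own machine epsilon. */
double scaled_residual(int n, const double *A, const double *x, const double *b)
{
    double r = 0.0, a = 0.0, xn = 0.0, bn = 0.0;
    int i, j;
    assert(n > 0);
    for (i = 0; i < n; i++) {
        double row_abs = 0.0, ax = 0.0;
        for (j = 0; j < n; j++) {
            row_abs += abs_d(A[i * n + j]);
            ax += A[i * n + j] * x[j];
        }
        if (row_abs > a) a = row_abs;            /* ||A||_oo: max row sum */
        if (abs_d(ax - b[i]) > r) r = abs_d(ax - b[i]);
        if (abs_d(x[i]) > xn) xn = abs_d(x[i]);
        if (abs_d(b[i]) > bn) bn = abs_d(b[i]);
    }
    return r / (DBL_EPSILON * (a * xn + bn) * n);
}

/* Hypothetical pass/fail logic: returns 1 (passed), 0 (failed), or
   -1 (skipped because the threshold is negative; *resid untouched). */
int residual_check(int n, const double *A, const double *x, const double *b,
                   double threshold, double *resid)
{
    if (threshold < 0.0)
        return -1;
    *resid = scaled_residual(n, A, x, b);
    return *resid < threshold;
}
```

With an exact solution the residual is 0 and the check passes; a solution that is off by 0.1 in one component gives a scaled residual many orders of magnitude above a typical threshold such as 16.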
47. PARDISO routines and drivers
Pentium 4 processor kernel library
Kernel library for processors based on the Intel Core microarchitecture (except Intel Core Duo and Intel Core Solo processors, for which mkl_p4p.so is intended)
Kernel library for the Intel Core i7 processors
Kernel library for the Intel Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3), including Intel Core Duo and Intel Core Solo processors
ScaLAPACK routines
VML/VSL part of default kernel for old Intel Pentium processors
VML/VSL default kernel for newer Intel architecture processors
VML/VSL part of Pentium 4 processor kernel
VML/VSL for processors based on the Intel Core microarchitecture
VML/VSL for 45nm Hi-k Intel Core 2 and Intel Xeon processor families
VML/VSL for the Intel Core i7 processors
VML/VSL for Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3)

Table 3-6 Detailed Structure of the IA-32 Architecture Directory lib/32 (continued)
File | Contents
RTL
libguide.so | Legacy OpenMP run-time library for dynamic linking
libiomp5.so | Compatibility OpenMP run-time library for dynamic linking
libmkl_blacs_intelmpi.so | BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
locale/en_US/mkl_msg.cat | Catalog of Intel MKL messages in English
locale/ja_JP/mkl_msg.cat | Catalog of Intel MKL messages in Japanese
(1) To be used
48. Reference Manual
mklman90_j.pdf | Intel MKL 9.0 Reference Manual in Japanese
mklsupport.txt | Information on package number for customer support reference
redist.txt | List of redistributable files
Release Notes.pdf | Intel MKL Release Notes
userguide.pdf | Intel MKL User's Guide (this document)

Viewing Man Pages
The Intel MKL man pages are located in the directory specified in Table 3-2. To access man pages, add this directory to the MANPATH environment variable. If you performed the Setting Environment Variables step of the Getting Started process, this is done automatically.
To view the man page for an Intel MKL function, enter the following command in your command shell:
man <function base name>
In this release, <function base name> is the function name with omitted prefixes denoting data type, precision, or function domain.
Examples:
• For the BLAS function ddot, enter man dot
• For the ScaLAPACK function pzgeql2, enter man pgeql2
• For the FFT function DftiCommitDescriptor, enter man CommitDescriptor
NOTE Function names in the man command are case-sensitive.

Configuring Your Development Environment
This chapter explains how to configure your development environment for use with the Intel Math Kernel Library (Intel MKL).
Chapter 2 explains how to set the environment variables INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FP
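The three man page examples above suggest a simple prefix-stripping rule. The hypothetical helper below reproduces exactly those cases; it is a heuristic for illustration, not the rule Intel MKL uses to name its man pages, and it would mangle names that merely begin with s, d, c or z.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical heuristic covering only the three documented cases:
   "Dfti" domain prefix  (DftiCommitDescriptor -> CommitDescriptor),
   ScaLAPACK p + type    (pzgeql2 -> pgeql2),
   leading type letter   (ddot -> dot).
   Writes the base name to out; returns 1 on success, 0 if out is too small. */
int man_base_name(const char *fn, char *out, size_t size)
{
    const char *prefix = "";
    const char *rest = fn;
    int n;
    assert(fn != NULL && out != NULL && size > 0);
    if (strncmp(fn, "Dfti", 4) == 0) {
        rest = fn + 4;
    } else if (fn[0] == 'p' && fn[1] != '\0' && strchr("sdcz", fn[1]) != NULL) {
        prefix = "p";                 /* keep the ScaLAPACK 'p' */
        rest = fn + 2;
    } else if (fn[0] != '\0' && strchr("sdcz", fn[0]) != NULL) {
        rest = fn + 1;                /* drop the precision letter */
    }
    n = snprintf(out, size, "%s%s", prefix, rest);
    return n >= 0 && (size_t)n < size;
}
```

The string it produces is what you would pass to man, for example man dot for ddot.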
49. ScaLAPACK tests:
• For the IA-32 architecture, add libmkl_scalapack_core.a to your link command.
• For the IA-64 and Intel 64 architectures, add libmkl_scalapack_lp64.a or libmkl_scalapack_ilp64.a, depending upon the desired interface.
Examples for Linking with ScaLAPACK and Cluster FFT
For detailed information on the structure of the Intel MKL architecture-specific directories and the names of the cluster libraries to link, see Directory Structure in Detail.
Examples for Linking a C Application
These examples illustrate linking of an application whose main module is in C under the following conditions:
• MPICH2 1.0.7 or higher is installed in /opt/mpich.
• $MKLPATH is a user-defined variable containing <mkl_directory>/lib/32.
• You use the Intel C Compiler 10.0 or higher.
To link with ScaLAPACK for a cluster of systems based on the IA-32 architecture, use the following libraries:
    /opt/mpich/bin/mpicc <user files to link> -L$MKLPATH -lmkl_scalapack_core -lmkl_blacs_intelmpi -lmkl_lapack -lmkl_intel -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread
To link with Cluster FFT for a cluster of systems based on the IA-32 architecture, use the following libraries:
    /opt/mpich/bin/mpicc <user files to link> $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi.a $MKLPATH/libmkl_intel.a
50. aLAPACK and Cluster FFTs
To link a program that calls ScaLAPACK and/or Cluster FFTs, you need to know how to link a message-passing interface (MPI) application first.
Use MPI scripts to do this. For example, mpicc and mpif77 are C and FORTRAN 77 scripts, respectively, that use the correct MPI header files. The location of these scripts and the MPI library depends on your MPI implementation. For example, for the default installation of MPICH, /opt/mpich/bin/mpicc and /opt/mpich/bin/mpif77 are the compiler scripts and /opt/mpich/lib/libmpich.a is the MPI library.
Check the documentation that comes with your MPI implementation for implementation-specific details of linking.
To link with the Intel MKL ScaLAPACK and/or Cluster FFTs, use the following general form:
    <<MPI> linker script> <files to link> -L<MKL path> -Wl,--start-group <MKL cluster library> <BLACS> <MKL core libraries> -Wl,--end-group
where
<MPI> is one of several MPI implementations (MPICH, Intel MPI 2.x/3.x, and so on);
<MKL cluster library> is one of the ScaLAPACK or Cluster FFT libraries for the appropriate architecture, which are listed in Table 3-6, Table 3-7, or Table 3-8. For example, for the IA-32 architecture, it is one of -lmkl_scalapack_core or -lmkl_cdft_core;
<BLACS> is the BLACS library corresponding to your architecture, programming interface (LP64 or
51. amples for Linking with ScaLAPACK and Cluster FFT.
TIP Use the Web-based Linking Advisor to quickly choose the appropriate set of <MKL cluster library>, <BLACS>, and <MKL core libraries>.
For information on linking with Intel MKL libraries, see Linking Your Application with the Intel Math Kernel Library.
Setting the Number of Threads
The OpenMP software responds to the environment variable OMP_NUM_THREADS. Intel MKL also has other mechanisms to set the number of threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control).
Make sure that the relevant environment variables have the same and correct values on all the nodes. Intel MKL versions 10.0 and higher no longer set the default number of threads to one, but depend on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS.
CAUTION Avoid over-prescribing the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per node are both greater than one. The product of MPI ranks per node and the number of threads per node should not exceed the number of physical cores per node.
The best way to set an environment variab
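The CAUTION above reduces to simple arithmetic. As a minimal sketch, the hypothetical helper max_threads_per_rank (not part of Intel MKL) computes the largest per-rank thread count that keeps ranks per node times threads per rank within the physical core count. The core and rank counts are assumed to be inputs you obtain from your own system and MPI launcher.

```c
/* Hypothetical helper, not an Intel MKL function.
   Returns the largest threads-per-rank value such that
   ranks_per_node * threads_per_rank <= physical_cores_per_node.
   Never returns less than 1: with more ranks than cores the limit
   cannot be met, and one thread per rank is the least bad choice. */
int max_threads_per_rank(int physical_cores_per_node, int ranks_per_node)
{
    if (ranks_per_node <= 0 || physical_cores_per_node <= ranks_per_node)
        return 1;
    return physical_cores_per_node / ranks_per_node;  /* integer division rounds down */
}
```

For example, 4 MPI ranks on a 16-core node leave room for 4 threads per rank; that value would then be exported as OMP_NUM_THREADS or MKL_NUM_THREADS identically on every node.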
52. Chapter 1, Overview: Introduces the Intel MKL usage information and describes this document's notational conventions.
Chapter 2, Getting Started: Describes post-installation steps and gives information needed to start using Intel MKL after its installation.
Chapter 3, Intel Math Kernel Library Structure: Discusses the structure of the Intel MKL directory after installation.
Chapter 4, Configuring Your Development Environment: Explains how to configure Intel MKL with your development environment.
Chapter 5, Linking Your Application with the Intel Math Kernel Library: Explains which libraries should be linked with your application for your particular platform; discusses how to build custom dynamic libraries.
Chapter 6, Managing Performance and Memory: Discusses Intel MKL threading; shows coding techniques and gives hardware configuration tips for improving performance of the library; explains features of the Intel MKL memory management.
Chapter 7, Language-specific Usage Options: Discusses mixed-language programming and the use of language-specific interfaces.
Chapter 8, Coding Tips: Presents coding tips that may be helpful to your specific needs.
Chapter 9, Working with the Intel Math Kernel Library Cluster Software: Discusses usage of ScaLAPACK and Cluster FFTs; explains linking of your application with these function domains, including C and For
53. aries for distribution.
Intel MKL Custom Shared Object Builder
The custom shared object builder enables you to create a dynamic library (shared object) containing the selected functions. The builder is located in the tools/builder directory and contains a makefile and a definition file with the list of functions.
¹ See Fortran 95 Interfaces to LAPACK and BLAS for information on how to build Fortran 95 LAPACK and BLAS interface libraries.
NOTE The objects in Intel MKL static libraries are position-independent code (PIC), which is not typical for static libraries. Therefore, the custom shared object builder can create a shared object from a subset of Intel MKL functions by picking the respective object files from the static libraries.
Using the Builder
To build a custom shared object, use the following command:
    make target <options>
Possible values for target:
• ia32, for processors that use the IA-32 architecture
• em64t, for processors that use the Intel 64 architecture
• ipf, for processors that use the IA-64 architecture
The <options> placeholder stands for the list of parameters that define macros to be used by the makefile:
• interface = {lp64|ilp64}: Defines whether to use the LP64 or ILP64 programming interface for the Intel 64 or IA-64 architecture. The default value is lp64.
• threading = {parallel|sequential}: Defines whether to use the Intel MK
54. as been enhanced, as well as the description of the layered model concept.
Description of the SP2DP interface has been added to Chapter 3.
The Web-based linking advisor has been described and referenced in chapters 2 and 5.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change witho
55. ased linking advisor available from http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/.
Linking on IA-32 Architecture Systems
In these examples, MKLPATH=$MKLROOT/lib/ia32 and MKLINCLUDE=$MKLROOT/include:
1. Static linking of myprog.f and parallel Intel MKL:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
2. Dynamic linking of myprog.f and parallel Intel MKL:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
3. Static linking of myprog.f and sequential version of Intel MKL:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
4. Dynamic linking of myprog.f and sequential version of Intel MKL:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_sequential -lmkl_core -lpthread
5. Static linking of myprog.f, Fortran 95 LAPACK interface¹, and parallel Intel MKL:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/32 -lmkl_lapack95 -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
6. Static linking of myprog.f, Fortran 95 BLAS interface¹, and parallel Intel MKL:
    ifort myprog.f -L$M
56. ated in the benchmarks/linpack subdirectory in the Intel MKL directory (see Table 3-2).
Table 11-1 Contents of the LINPACK Benchmark
benchmarks/linpack/
linpack_itanium: The 64-bit program executable for a system based on the Intel Itanium 2 processor.
linpack_xeon32: The 32-bit program executable for a system based on the Intel Xeon processor or Intel Xeon processor MP with or without Streaming SIMD Extensions 3 (SSE3).
linpack_xeon64: The 64-bit program executable for a system with the Intel Xeon processor using Intel 64 architecture.
runme_itanium: A sample shell script for executing a pre-determined problem set for linpack_itanium. OMP_NUM_THREADS set to 8 processors.
runme_xeon32: A sample shell script for executing a pre-determined problem set for linpack_xeon32. OMP_NUM_THREADS set to 2 processors.
runme_xeon64: A sample shell script for executing a pre-determined problem set for linpack_xeon64. OMP_NUM_THREADS set to 4 processors.
lininput_itanium: Input file for the pre-determined problem for the runme_itanium script.
lininput_xeon32: Input file for the pre-determined problem for the runme_xeon32
lininput_xeon64
lin_itanium.txt
lin_xeon32.txt
lin_xeon64.txt
help.lpk
xhelp.lpk
Running the Software
To obtain results for the pre-determined sample problem sizes on a given system, type one of the following, as appropriate:
57. ax and more code examples.
MKL_DYNAMIC
The MKL_DYNAMIC environment variable enables Intel MKL to dynamically change the number of threads.
The default value of MKL_DYNAMIC is TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE.
When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the maximum number you specify.
For example, MKL_DYNAMIC set to TRUE enables optimal choice of the number of threads in the following cases:
• If the requested number of threads exceeds the number of physical cores (perhaps because of hyper-threading), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores.
• If Intel MKL detects the presence of MPI but cannot determine whether it has been called in a thread-safe mode (it is impossible to detect this with MPICH 1.2.x, for instance), and MKL_DYNAMIC has not been changed from its default value of TRUE, Intel MKL will run one thread.
When MKL_DYNAMIC is FALSE, Intel MKL tries not to deviate from the number of threads the user requested. However, setting MKL_DYNAMIC=FALSE does not ensure that Intel MKL will use the number of threads that you request. The library may have no choice on this number for such reasons as system resources. Additionally, the library may examine the problem and use a different number of threads than the
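The two MKL_DYNAMIC=TRUE cases above can be summarized as a decision function. This is a simplified model written only as a reading aid: modeled_thread_count is hypothetical, and Intel MKL's real heuristics also weigh system resources and problem size, so it is not a description of the library's internals.

```c
#include <stdbool.h>

/* Simplified model (an assumption, not Intel MKL's internal logic) of how
   MKL_DYNAMIC affects the thread count in the cases described above. */
int modeled_thread_count(int requested, int physical_cores,
                         bool mkl_dynamic, bool mpi_thread_safety_unknown)
{
    if (!mkl_dynamic)
        return requested;   /* FALSE: MKL tries to honor the request (not guaranteed) */
    if (mpi_thread_safety_unknown)
        return 1;           /* MPI present, thread-safe mode cannot be determined */
    /* Oversubscription, e.g. from hyper-threading: scale down to physical cores. */
    return requested > physical_cores ? physical_cores : requested;
}
```

With 4 physical cores, a request for 8 threads models to 4 when MKL_DYNAMIC is TRUE and stays 8 when it is FALSE.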
58. ble, for example, the KMP_AFFINITY environment variable using the Intel OpenMP library
• A system function, as explained below
Consider the following performance issue:
• The system has two sockets with two cores each, for a total of four cores (CPUs).
• The two-thread parallel application that calls the Intel MKL FFT happens to run faster than in four threads, but the performance in two threads is very unstable.
Example 6-3 resolves this issue. The code example calls the system function sched_setaffinity to bind the threads to the cores on different sockets. Then the Intel MKL FFT function is called.
Compile your application with the Intel compiler using the following command:
    icc test_application.c -openmp
where test_application.c is the filename for the application.
Build the application. Run it in two threads, for example, by using the environment variable to set the number of threads:
    env OMP_NUM_THREADS=2 ./a.out
Example 6-3 Setting an Affinity Mask by Operating System Means Using the Intel Compiler
    // Set affinity mask
    #define _GNU_SOURCE  // for using the GNU CPU affinity
                         // (works with the appropriate kernel and glibc)
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <omp.h>
    int main(void) {
        int NCPUs = sysconf(_SC_NPROCESSORS_CONF);
        printf("Using thread affinity on %i NCPUs\n", NCPUs);
    #pragma omp parallel default(shared)
        {
            cpu_set_t new_mask;
            cpu_set_t was_mask;
            int tid
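Example 6-3 relies on the glibc affinity calls. The minimal sketch below shows the core of that technique for a single thread, assuming Linux with glibc: it reads the thread's allowed CPU set with sched_getaffinity, then pins the calling thread to the lowest-numbered allowed CPU with sched_setaffinity. The helper name pin_to_first_allowed_cpu is hypothetical. Spreading two threads across sockets, as the example intends, would additionally require knowing which CPU numbers belong to which socket on your system.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t, CPU_SET and sched_setaffinity in glibc */
#include <sched.h>

/* Hypothetical helper: pins the calling thread to the lowest-numbered CPU
   in its current affinity mask. Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, mask;

    /* pid 0 means "the calling thread" for both calls. */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            return sched_setaffinity(0, sizeof mask, &mask) == 0 ? cpu : -1;
        }
    }
    return -1;   /* empty mask: should not happen on a running thread */
}
```

Inside an OpenMP parallel region, each thread would make its own call with a CPU chosen from its thread number, which is the role the new_mask variable plays in Example 6-3.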
59. broad support for Fortran and C/C++ programming. However, not all function domains support both Fortran and C interfaces (see Table A-1 in Appendix A). For example, LAPACK has no C interface. You can call functions comprising such domains from C using mixed-language programming.
If you want to use LAPACK or BLAS, which support Fortran, in the Fortran 95 environment, additional effort may be initially required to build compiler-specific interface libraries and modules from the source code provided with Intel MKL.
This chapter focuses on mixed-language programming and the use of language-specific interfaces. It explains the use of Intel MKL in C language environments for function domains that provide only Fortran interfaces, as well as the usage of language-specific interfaces, specifically the Fortran 95 interfaces to LAPACK and BLAS. The chapter also discusses compiler-dependent functions to explain why Fortran 90 modules are supplied as sources. A separate section guides you through the process of running examples to invoke Intel MKL functions from Java.
Using Language-Specific Interfaces with Intel MKL
You can create the following interface libraries and modules using the respective makefiles located in the interfaces directory.
Table 7-1 Interface Libraries and Modules
File name: Contains
Libraries, in Intel MKL architecture-specific directories
libmkl_blas95.a¹: Fortran 95 wrappers for BLAS (BLAS95) for IA-32 architecture
60. commands depending on the shell you use. For example, for a bash shell, use the export commands:
    export <VARIABLE NAME>=<value>
For example:
    export MKL_NUM_THREADS=4
    export MKL_DOMAIN_NUM_THREADS="MKL_ALL=1, MKL_BLAS=4"
    export MKL_DYNAMIC=FALSE
For the csh or tcsh shell, use the setenv commands (a plain set command defines only a shell variable, which is not passed to the environment of the program):
    setenv <VARIABLE NAME> <value>
For example:
    setenv MKL_NUM_THREADS 4
    setenv MKL_DOMAIN_NUM_THREADS "MKL_ALL=1, MKL_BLAS=4"
    setenv MKL_DYNAMIC FALSE
Dispatching Intel Advanced Vector Extensions (Intel AVX)
Intel MKL provides optimized kernels for Intel AVX. To have the Intel AVX instructions dispatched on Intel AVX-enabled hardware or simulation, use the Intel MKL service function mkl_enable_instructions. This function enables dispatching new Intel AVX instructions. Call this function before any other Intel MKL function call. For the function description, see the Intel MKL Reference Manual.
NOTE Successful execution of this function does not guarantee new instructions to be dispatched. A particular instruction will be dispatched if the hardware is Intel AVX-enabled and the function is already optimized to dispatch this instruction. However, if you do not call this function, new instructions will not be dispatched.
Tips
As the Intel AVX instruction set is evolving, the behavior of mkl_enable_instructions may change with future Intel MKL releases. R
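A value such as "MKL_ALL=1, MKL_BLAS=4" is a list of domain=count pairs. The sketch below is a hypothetical parser (domain_threads is not an Intel MKL function) that shows one reading of such a value: a domain-specific entry wins, otherwise MKL_ALL applies. Intel MKL does its own parsing; consult Using Additional Threading Control for the exact rules.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the thread count for `domain` from a spec
   like "MKL_ALL=1, MKL_BLAS=4". A domain-specific entry takes precedence,
   MKL_ALL is the fallback, and -1 means neither entry is present. */
int domain_threads(const char *spec, const char *domain)
{
    int all = -1, specific = -1;
    size_t dlen = strlen(domain);
    const char *p = spec;

    while (*p != '\0') {
        /* Skip separators: commas and whitespace. */
        while (*p == ',' || isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        /* Read the domain name up to '=', ',' or whitespace. */
        const char *name = p;
        while (*p != '\0' && *p != '=' && *p != ',' && !isspace((unsigned char)*p)) p++;
        size_t nlen = (size_t)(p - name);
        while (isspace((unsigned char)*p)) p++;

        if (*p != '=') {                       /* malformed pair: skip it */
            while (*p != '\0' && *p != ',') p++;
            continue;
        }
        p++;                                   /* skip '=' */

        char *end;
        long value = strtol(p, &end, 10);
        if (end == p) {                        /* no number after '=' */
            while (*p != '\0' && *p != ',') p++;
            continue;
        }
        p = end;

        if (nlen == 7 && strncmp(name, "MKL_ALL", 7) == 0)
            all = (int)value;
        else if (nlen == dlen && strncmp(name, domain, dlen) == 0)
            specific = (int)value;

        while (*p != '\0' && *p != ',') p++;   /* move to the next pair */
    }
    return specific >= 0 ? specific : all;
}
```

Under this reading, the value used in the export and setenv examples gives BLAS 4 threads and every other domain 1 thread.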
61. correlation functions mitigates the same difficulty of the VSL interface, which assumes a similar lifecycle for task descriptors. The wrapper utilizes the ESSL-like interface for those functions, which is simpler for the case of 1-dimensional data. The JNI stub additionally encapsulates the MKL functions into the ESSL-like wrappers written in C and so packs the lifecycle of a task descriptor into a single call to the native method.
The wrappers meet the JNI Specification versions 1.1 and 5.0 and should work with virtually every modern implementation of Java.
The examples and the Java part of the wrappers are written for the Java language described in The Java Language Specification (First Edition) and extended with the feature of inner classes (this refers to the late 1990s). This level of language version is supported by all versions of the Sun Java Development Kit (JDK) developer toolkit and compatible implementations starting from version 1.1.5, or by all modern versions of Java.
The level of C language is Standard C, that is, C89, with additional assumptions about integer and floating-point data types required by the Intel MKL interfaces and the JNI header files. That is, the native float and double data types must be the same as the JNI jfloat and jdouble data types, respectively, and the native int must be 4 bytes long.
Running the Examples
The Java examples support all the C and C++ compilers that Intel MKL does. The mak
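The type assumptions in the last paragraph can be checked at build time. A minimal sketch, assuming a C11 compiler (for _Static_assert) and relying on the JNI definitions of jint, jfloat, and jdouble as a 32-bit integer, a 32-bit IEEE 754 float, and a 64-bit IEEE 754 double; jni.h itself is not included, and native_types_match_jni is a hypothetical name. Checking bit widths and mantissa sizes is a practical proxy rather than a full IEEE 754 conformance test.

```c
#include <float.h>
#include <limits.h>

/* Compile-time checks (C11): the build fails if an assumption is violated. */
_Static_assert(sizeof(int) * CHAR_BIT == 32, "native int must be 4 bytes (jint)");
_Static_assert(sizeof(float) * CHAR_BIT == 32 && FLT_MANT_DIG == 24,
               "native float must match jfloat");
_Static_assert(sizeof(double) * CHAR_BIT == 64 && DBL_MANT_DIG == 53,
               "native double must match jdouble");

/* Run-time variant of the same checks, usable from a C89 toolchain. */
int native_types_match_jni(void)
{
    return sizeof(int) * CHAR_BIT == 32
        && sizeof(float) * CHAR_BIT == 32 && FLT_MANT_DIG == 24
        && sizeof(double) * CHAR_BIT == 64 && DBL_MANT_DIG == 53;
}
```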
62. d against Intel MPI 3.2.
New: Example of an MP LINPACK benchmark input file for a pure MPI binary and the IA-64 architecture.
New: Sample run script for the IA-64 architecture and a hybrid binary statically linked against Intel MPI 3.2.
New: Sample run script for the IA-64 architecture and a hybrid binary dynamically linked against Intel MPI 3.2.
New: Example of an MP LINPACK benchmark input file for a hybrid binary and the IA-64 architecture.
New: Sample utility that tests the DGEMM speed across the cluster.
Building the MP LINPACK
There are a few included sample architecture makefiles. You can edit them to fit your specific configuration. Specifically:
• Set TOPdir to the directory that MP LINPACK is being built in.
• You may set MPI variables, that is, MPdir, MPinc, and MPlib.
• Specify the location of Intel MKL and of files to be used (LAdir, LAinc, LAlib).
• Adjust compiler and compiler/linker options.
• Specify the version of MP LINPACK you are going to build (hybrid or non-hybrid) by setting the version parameter for the make command. For example:
    make arch=em64t version=hybrid install
For some sample cases, like Linux systems based on the Intel 64 architecture, the makefiles contain values that must be common. However, you need to be familiar with building an HPL and picking appropriate values for these variables.
New Features
The toolset is basically identical w
63. e types to define complex data. You can also redefine the types with your own types before including the mkl_types.h header file. The only requirement is that the types must be compatible with the Fortran complex layout, that is, the complex type must be a pair of real numbers for the values of real and imaginary parts.
For example, you can use the following definitions in your C++ code:
    #define MKL_Complex8 std::complex<float>
and
    #define MKL_Complex16 std::complex<double>
See Example 7-2 for details. You can also define these types in the command line:
    -DMKL_Complex8="std::complex<float>"
    -DMKL_Complex16="std::complex<double>"
Calling BLAS Functions that Return the Complex Values in C/C++ Code
Complex values that functions return are handled differently in C and Fortran. Because BLAS is Fortran-style, you need to be careful when handling a call from C to a BLAS function that returns complex values. However, in addition to normal function calls, Fortran enables calling functions as though they were subroutines, which provides a mechanism for returning the complex value correctly when the function is called from a C program. When a Fortran function is called as a subroutine, the return value is the first parameter in the calling sequence. You can use this feature to call a BLAS function from C.
The following example shows how a call to a Fortran function as a subroutine converts to a call from C and the hidde
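To make the hidden-result convention concrete, here is a self-contained sketch. dotc_sub is a hypothetical stand-in written in C, not an Intel MKL routine: it computes the conjugated dot product that the BLAS function zdotc returns, but it delivers the value through its first argument and takes every argument by address, which is how a Fortran complex function looks when called as a subroutine. It assumes positive increments.

```c
#include <complex.h>

/* Hypothetical stand-in, NOT an Intel MKL routine. It mimics the shape of a
   Fortran complex function (such as zdotc) called as a subroutine from C:
   the result comes back through the first, hidden argument, and every
   argument is passed by address, Fortran style. Positive increments only. */
void dotc_sub(double complex *result, const int *n,
              const double complex *x, const int *incx,
              const double complex *y, const int *incy)
{
    double complex sum = 0.0;
    for (int k = 0; k < *n; ++k)
        sum += conj(x[k * *incx]) * y[k * *incy];   /* conj(x_k) * y_k */
    *result = sum;
}
```

For x = (1+i, 2) and y = (3, i), the call dotc_sub(&r, &n, x, &inc, y, &inc) with n = 2 and inc = 1 stores 3 - i in r. With Intel MKL, the same argument order would be passed to the library routine; check the Reference Manual for its exact prototype.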
64. e Intel compilers
SP2DP interface library for the Intel compilers
Threading library for the GNU Fortran and C compilers
Threading library for the Intel compilers
Sequential library
Library dispatcher for dynamic load of processor-specific kernel library
Kernel library for the IA-64 architecture
LAPACK and DSS/PARDISO routines and drivers
ScaLAPACK routine library supporting the ILP64 interface
ScaLAPACK routine library supporting the LP64 interface
VML kernel for the IA-64 architecture
Legacy OpenMP run-time library for dynamic linking
Compatibility OpenMP run-time library for dynamic linking
ILP64 version of BLACS routines supporting Intel MPI 2.0/3.x and MPICH2
LP64 version of BLACS routines supporting Intel MPI 2.0 and 3.x and MPICH2
Catalog of Intel MKL messages in English
Catalog of Intel MKL messages in Japanese
Accessing the Intel MKL Documentation
This section details the contents of the Intel MKL documentation directory and explains how to access man pages for the library.
Contents of the Documentation Directory
Table 3-9 shows the contents of the doc subdirectory in the Intel MKL installation directory.
Table 3-9 Contents of the doc Directory
File name: Comment
Install.txt: Intel MKL Installation Guide
mkl_documentation.htm: Overview and links for the Intel MKL documentation
mklEULA.txt: Intel MKL end user license
mklman.pdf: Intel MKL
65. e LAPACK and BLAS routines are Fortran-style, when calling them from C-language programs, follow the Fortran-style calling conventions:
• Pass variables by address, not by value. Function calls in Example 7-2 and Example 7-3 illustrate this.
• Store your data in Fortran style, that is, column-major rather than row-major order.
With row-major order, adopted in C, the last array index changes most quickly and the first one changes most slowly when traversing the memory segment where the array is stored. With Fortran-style column-major order, the last index changes most slowly whereas the first one changes most quickly (as illustrated by Figure 7-1 for a two-dimensional array).
Figure 7-1 Column-major Order versus Row-major Order
[Figure: (A) column-major order (Fortran style) and (B) row-major order (C style)]
For example, if a two-dimensional matrix A of size m x n is stored densely in a one-dimensional array B, you can access a matrix element like this:
    A[i][j] = B[i*n+j] in C (i=0, ..., m-1, j=0, ..., n-1)
    A(i,j) = B((j-1)*m+i) in Fortran (i=1, ..., m, j=1, ..., n)
When calling LAPACK or BLAS routines from C, be aware that because the Fortran language is case-insensitive, the routine names can be both upper-case or lower-case, with or without the trailing underscore. For example, these names are equivalent:
• LAPACK: dgetrf, DGETRF, dgetrf_, DGETRF_
• BLAS: dgem
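The two index formulas above translate directly into code. This sketch uses 0-based indices in both cases, so the Fortran formula appears as j*m + i, and adds a hypothetical helper, row_to_col_major, that copies a C row-major matrix into the column-major layout LAPACK and BLAS expect.

```c
#include <stddef.h>

/* Offset of element (i, j), 0-based, in a densely stored m x n matrix. */
size_t row_major_index(size_t i, size_t j, size_t n) { return i * n + j; } /* C style */
size_t col_major_index(size_t i, size_t j, size_t m) { return j * m + i; } /* Fortran style */

/* Hypothetical helper: copies a row-major m x n matrix `src` into
   column-major storage `dst` (both hold m * n doubles). */
void row_to_col_major(size_t m, size_t n, const double *src, double *dst)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            dst[col_major_index(i, j, m)] = src[row_major_index(i, j, n)];
}
```

For the 2 x 3 matrix with rows (1 2 3) and (4 5 6), the column-major copy holds 1, 4, 2, 5, 3, 6: each column is contiguous, which is the "first index changes most quickly" rule.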
66. [Screenshot residue: Infopop window with the description of a Trigonometric Transform routine]
F1 Help
F1 Help displays the list of relevant documentation topics for a keyword. To get F1 Help for an Intel MKL function in the editor window:
1. Hover the mouse over the function name.
2. Press F1 or double-click the name. This displays two lists:
• The list of links to the relevant topics in the product documentation displays in the Related Topics page under See also. The Intel MKL Help Index establishes the relevance (see Figure 10-4). Typically, one link displays in this list for each function.
• The list of search results for the function name displays in the Related Topics page under Dynamic Help (see Figure 10-5).
3. Click a link to open the associated Help topic.
Figure 10-4 F1 Help in the Eclipse IDE
Figure 10-5 F1 Help Search in the Eclipse IDE CDT
LINPACK and MP LINPACK Benchmarks
This chapter describes the Intel Optimized LINPACK Benchmark for the Linux OS for shared-memory systems and Intel Optimized MP LINPACK
67. ead of Calling BLAS Directly
Example 8-1 Aligning Addresses at 16-byte Boundaries
List of Figures
Figure 7-1 Column-major Order versus Row-major Order
Figure 10-1 Intel MKL Help in the Eclipse IDE
Figure 10-2 Hits to the Intel Web Site in the Eclipse IDE Help
Figure 10-3 Infopop Window with an Intel MKL Function Description
Figure 10-4 F1 Help in the Eclipse IDE
Figure 10-5 F1 Help Search in the Eclipse IDE CDT
Overview
The Intel Math Kernel Library (Intel MKL) offers highly optimized, thread-safe math routines for science, engineering, and financial applications that require maximum performance.
Technical Support
Intel provides a support web site, which contains a rich repository of self-help information, including getting started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel MKL support website at http://www.intel.com/software/products/support.
About This Document
Read this document after you have installed Intel MKL on your system. If you have not completed the installation, see the Intel Math Kernel Library Installation Guide (file Install.txt).
The Intel MKL User's Guide provides usage info
68. ecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt binary for the IA-32 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
New: Prebuilt binary for the Intel 64 architecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt binary for the Intel 64 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
New: Prebuilt binary for the IA-64 architecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt binary for the IA-64 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
files are prebuilt hybrid executables.
New: Prebuilt hybrid binary for the IA-32 architecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt hybrid binary for the IA-32 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
New: Prebuilt hybrid binary for the Intel 64 architecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt hybrid binary for the Intel 64 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
New: Prebuilt hybrid binary for the IA-64 architecture and Linux OS. Statically linked against Intel MPI 3.2.
New: Prebuilt hybrid binary for the IA-64 architecture and Linux OS. Dynamically linked against Intel MPI 3.2.
Table 11-2 Contents of the MP LINPACK Benchmark
benchmarks/mp_linpack/
Next 3 files a
69. Linking with Computational Libraries
Linking with Compiler Support RTLs
Linking with System Libraries
Linking Examples
Building Custom Shared Objects
Intel MKL Custom Shared Object Builder
Using the Builder
Specifying a List of Functions
Distributing Your Custom Shared Object
Chapter 6 Managing Performance and Memory
Using the Intel MKL Parallelism
Techniques to Set the Number of Threads
Avoiding Conflicts in the Execution Environment
Setting the Number of Threads Using an OpenMP Environment Variable
Changing the Number of Threads at Run Time
Using Additional Threading Control
Dispatching Intel Advanced Vector Extensions (Intel AVX)
Tips and Techniques to Improve Performance
Coding Tech
70. efault is not. For instance, arrays dynamically allocated using malloc are aligned at 8-byte boundaries, not 16-byte boundaries. If you need numerically identical outputs, use mkl_malloc to get the properly aligned workspace, as shown below.
Example 8-1 Aligning Addresses at 16-byte Boundaries
    /* C language */
    #include <stdlib.h>
    #include <mkl.h>        /* declares mkl_malloc and mkl_free */
    void *darray;
    int workspace;
    /* Allocate workspace aligned on 16-byte boundary */
    darray = mkl_malloc( sizeof(double)*workspace, 16 );
    /* call the program using MKL */
    mkl_app( darray );
    /* Free workspace */
    mkl_free( darray );
    ! Fortran language
    double precision darray
    pointer (p_wrk, darray(1))
    integer workspace
    integer*8 mkl_malloc
    ! Allocate workspace aligned on 16-byte boundary
    p_wrk = mkl_malloc( 8*workspace, 16 )
    ! call the program using MKL
    call mkl_app( darray )
    ! Free workspace
    call mkl_free( p_wrk )
Working with the Intel Math Kernel Library Cluster Software
This chapter discusses the usage of the Intel Math Kernel Library (Intel MKL) ScaLAPACK and Cluster FFTs. See Chapter 3 for details about the Intel MKL directory structure, including the available documentation in the doc directory.
For information on MP LINPACK Benchmark for Clusters, see Chapter 11.
Intel MKL ScaLAPACK and Cluster FFTs support MPI implementations identified in the Intel MKL Release Notes.
Linking with Sc
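mkl_malloc is the Intel MKL way to obtain aligned workspace. For code paths that do not link Intel MKL, a standard-C alternative is C11 aligned_alloc; the sketch below (both helper names are hypothetical) allocates doubles on a 16-byte boundary and checks the resulting alignment. Memory obtained this way is released with free, not mkl_free.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n doubles aligned on a 16-byte boundary
   using C11 aligned_alloc. Release the result with free(). */
double *alloc_doubles_aligned16(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 15) / 16 * 16;   /* C11 requires a multiple of the alignment */
    if (bytes == 0)
        bytes = 16;
    return aligned_alloc(16, bytes);
}

/* Returns 1 if p lies on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```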
71. efer to the Release Notes for release-specific details of the function behavior.
Tips and Techniques to Improve Performance
This section provides some tips and techniques for improving performance.
Coding Techniques
To obtain the best performance with Intel MKL, ensure the following data alignment in your source code:
• Align arrays at 16-byte boundaries.
• Make sure leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16.
• For two-dimensional arrays, avoid leading dimension values divisible by 2048.
LAPACK Packed Routines
The routines with the names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position (the second and third letters, respectively) operate on the matrices in the packed format (see LAPACK Routine Naming Conventions sections in the Intel MKL Reference Manual). Their functionality is strictly equivalent to the functionality of the unpacked routines with the names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower.
If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate N²/2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations).
For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine:
    call dsyevx(jobz
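The leading-dimension rules can be applied mechanically when you allocate a two-dimensional array. A minimal sketch, assuming the leading dimension is counted in elements: the hypothetical padded_leading_dimension returns the smallest value at or above n that satisfies both rules, which you would then pass as the leading dimension argument and use for the allocation.

```c
#include <stddef.h>

/* Returns 1 if the leading dimension `ld` (in elements) follows both tips:
   ld * elem_size divisible by 16, and ld not divisible by 2048. */
int ld_follows_tips(size_t ld, size_t elem_size)
{
    return (ld * elem_size) % 16 == 0 && ld % 2048 != 0;
}

/* Hypothetical helper: smallest leading dimension >= n that follows the tips.
   The loop always ends: some value within the next 32 satisfies both rules. */
size_t padded_leading_dimension(size_t n, size_t elem_size)
{
    size_t ld = n;
    while (!ld_follows_tips(ld, elem_size))
        ++ld;
    return ld;
}
```

For double-precision data, n = 1001 is padded to 1002 and n = 2048 to 2050, while n = 1000 already satisfies both rules.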
72. efile intended to run the examples also needs the make utility, which is typically provided with the Linux OS distribution.
To run Java examples, the JDK developer toolkit is required for compiling and running Java code. A Java implementation must be installed on the computer or available via the network. You may download the JDK from the vendor website.
The examples should work for all versions of JDK. However, they were tested only with the following Java implementations:
• J2SE SDK 1.4.2, JDK 5.0 and 6.0 from Sun Microsystems, Inc. (http://sun.com/). Supports only processors using the IA-32 and Intel 64 architectures.
• JRockit JDK 1.4.2 and 5.0 from BEA Systems, Inc. (http://bea.com/). Supports processors using the IA-32, Intel 64, and IA-64 architectures.
Note that the Java run-time environment (JRE) system, which may be pre-installed on your computer, is not enough. You need the JDK developer toolkit that supports the following set of tools:
• java
• javac
• javah
• javadoc
To make these tools available for the examples makefile, set the JAVA_HOME environment variable and add the JDK binaries directory to the system PATH, for example, using the bash shell:
    export JAVA_HOME=/home/<user name>/jdk1.5.0_09
    export PATH=${JAVA_HOME}/bin:${PATH}
You may also need to clear the JDK_HOME environment variable, if it is assigned a value:
    unset JDK_HOME
To start the examples, use t
73. ...compiler, link in the libiomp/libguide version that comes with Intel MKL.

If you link with dynamic versions of libiomp/libguide (recommended), that is, use libiomp5.so or libguide.so, make sure LD_LIBRARY_PATH is defined correctly. See Setting Environment Variables for details.

Linking with System Libraries

To use the Intel MKL FFT, Trigonometric Transform, or Poisson, Laplace, and Helmholtz Solver routines, link in the math support system library by adding -lm to the link line.

On Linux OS, libiomp/libguide both rely on the native pthread library for multi-threading. Any time libiomp/libguide is required, add -lpthread to your link line afterwards (the order of listing libraries is important).

Linking Examples

This section provides specific linking examples that use Intel compilers on systems based on the IA-32, Intel 64, and IA-64 architectures. The following examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace the ifort linker with icc.

NOTE: If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking.

See also Examples for Linking with ScaLAPACK and Cluster FFT.

For assistance in finding the right link line, use the Web b
74. Example 6-1 Changing the Number of Threads (continued)

      INTEGER N, I, J
      PARAMETER (N=1000)
      REAL*8 A(N,N), B(N,N), C(N,N)
      POINTER (A_PTR,A), (B_PTR,B), (C_PTR,C)
      REAL*8 ALPHA, BETA
      INTEGER*8 MKL_MALLOC
      integer ALLOC_SIZE
      integer NTHRS
      ALLOC_SIZE = 8*N*N
      A_PTR = MKL_MALLOC(ALLOC_SIZE,128)
      B_PTR = MKL_MALLOC(ALLOC_SIZE,128)
      C_PTR = MKL_MALLOC(ALLOC_SIZE,128)
      ALPHA = 1.1
      BETA = 1.2
      DO I=1,N
        DO J=1,N
          A(I,J) = I+J
          B(I,J) = I*J
          C(I,J) = 0.0
        END DO
      END DO
      CALL DGEMM('N','N',N,N,N,ALPHA,A,N,B,N,BETA,C,N)
      print *,'Row A C'
      DO i=1,10
        write(*,'(I4,F20.8,F20.8)') I, A(1,I), C(1,I)
      END DO
      CALL OMP_SET_NUM_THREADS(1)
      DO I=1,N
        DO J=1,N
          A(I,J) = I+J
          B(I,J) = I*J
          C(I,J) = 0.0
        END DO
      END DO
      CALL DGEMM('N','N',N,N,N,ALPHA,A,N,B,N,BETA,C,N)
      print *,'Row A C'
      DO i=1,10
        write(*,'(I4,F20.8,F20.8)') I, A(1,I), C(1,I)
      END DO
      CALL OMP_SET_NUM_THREADS(2)
      DO I=1,N
        DO J=1,N
          A(I,J) = I+J
          B(I,J) = I*J
          C(I,J) = 0.0
        END DO
      END DO
      CALL DGEMM('N','N',N,N,N,ALPHA,A,N,B,N,BETA,C,N)
      print *,'Row A C'
      DO i=1,10
        write(*,'(I4,F20.8,F20.8)') I, A(1,I), C(1,I)
      END DO
      STOP
      END

Using Additional Threading Control

Intel MKL provides optional threading controls, that is, the environment variables and service functions that are independent of OpenMP. They behave similar
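MKL_MALLOC(ALLOC_SIZE,128) in the example above returns memory aligned on a 128-byte boundary, which also satisfies a 16-byte alignment requirement. If you want the same alignment for buffers you allocate yourself in C, a minimal sketch using POSIX posix_memalign() looks like this. posix_memalign is a stand-in chosen here, not part of Intel MKL, and the helper names are hypothetical:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p lies on a multiple of `alignment` bytes. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocates an n-by-n array of doubles on a 128-byte boundary, like
   MKL_MALLOC(ALLOC_SIZE,128) in the example. Returns NULL on failure;
   release the memory with free(). */
double *alloc_matrix_aligned(size_t n)
{
    void *p = NULL;
    /* The alignment must be a power of two and a multiple of sizeof(void *). */
    if (posix_memalign(&p, 128, n * n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}
```

Memory obtained this way is released with free(); memory obtained from MKL_MALLOC has its own release routine in Intel MKL (see the Reference Manual), so do not mix the two.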
75. ...than with normalized floating-point numbers. You can mitigate this performance issue by setting the appropriate bit fields in the MXCSR floating-point control register to flush denormals to zero (FTZ) or to replace any denormals loaded from memory with zero (DAZ). Check your compiler documentation to determine whether it has options to control FTZ and DAZ. Note that these compiler options may slightly affect accuracy.

FFT Optimized Radices

You can improve the performance of Intel MKL FFT if the length of your data vector permits factorization into powers of optimized radices. In Intel MKL, the optimized radices are 2, 3, 5, 7, and 11.

Using the Intel MKL Memory Management

Intel MKL has memory management software that controls memory buffers for use by the library functions. New buffers that the library allocates when your application calls certain functions (Level 3 BLAS or FFT) are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers, and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may repo
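To check whether a transform length already factors into the optimized radices 2, 3, 5, 7, and 11, you can use a small helper like the one below. It is plain C with hypothetical names, not an Intel MKL function:

```c
#include <assert.h>

/* Returns 1 if n > 0 factors completely into the radices 2, 3, 5, 7
   and 11, and 0 otherwise. */
int has_optimized_radices(unsigned long n)
{
    static const unsigned long radices[] = {2, 3, 5, 7, 11};
    if (n == 0)
        return 0;
    for (int r = 0; r < 5; ++r)
        while (n % radices[r] == 0)
            n /= radices[r];
    return n == 1; /* anything left over is a non-optimized prime factor */
}

/* Smallest length >= n whose factors are all optimized radices. */
unsigned long next_optimized_length(unsigned long n)
{
    while (!has_optimized_radices(n))
        ++n;
    return n;
}
```

next_optimized_length() only reports a nearby length. Padding or truncating the data changes the transform itself, so use a different length only if your application can accept that.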
76. ...the Intel Optimized LINPACK Benchmark without setting the number of threads, it will default to the number of cores according to the OS. You can find the settings for this environment variable in the runme_* sample scripts. If the settings do not already match the situation for your machine, edit the script.

Known Limitations

The following limitations are known for the Intel Optimized LINPACK Benchmark for Linux OS:
• Intel Optimized LINPACK Benchmark is threaded to effectively use multiple processors. So, in multi-processor systems, best performance will be obtained with Hyper-Threading Technology turned off, which ensures that the operating system assigns threads to physical processors only.
• If an incomplete data input file is given, the binaries may either hang or fault. See the sample data input files and/or the extended help for insight into creating a correct data input file.

Intel Optimized MP LINPACK Benchmark for Clusters

The Intel Optimized MP LINPACK Benchmark for Clusters is based on modifications and additions to HPL 2.0 from Innovative Computing Laboratories (ICL) at the University of Tennessee, Knoxville (UTK). The Intel Optimized MP LINPACK Benchmark for Clusters can be used for Top 500 runs (see http://www.top500.org). To use the benchmark, you need to be intimately familiar with the HPL distribution and usage. The Intel Optimized MP LINPACK Benchmark for Clusters prov
77. ...the makefile found in the Intel MKL Java examples directory:
    make {so32|soem64t|so64|lib32|libem64t|lib64} [function=...] [compiler=...]

If you type the make command and omit the target (for example, so32), the makefile prints the help info, which explains the targets and parameters.

For the examples list, see the examples.lst file in the Java examples directory.

Known Limitations

This section explains limitations of Java examples.

Functionality

It is possible that some Intel MKL functions will not work if called from the Java environment by using a wrapper like those provided with the Intel MKL Java examples. Only those specific CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions listed in the Intel MKL Java Examples section were tested with the Java environment. So, you may use the Java wrappers for these CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions in your Java applications.

Performance

The functions from Intel MKL are expected to work faster than similar functions written in pure Java. However, the main goal of these wrappers is to provide code examples, not maximum performance. So, an Intel MKL function called from a Java application will probably work slower than the same function called from a program written in C/C++ or Fortran.

Known bugs

There are a number of known bugs in Intel MKL (identified in the Release Notes), as well as incompatibilities be
78. ...checking
J
Java examples
L
language interfaces support
language-specific interfaces
LAPACK
  calling routines from C
  Fortran 95 interfaces to
  packed routines performance
layered model
library
  run-time, compatibility OpenMP
  run-time, legacy OpenMP
  structure
license, end user, location
link command examples
link libraries
  computational
  for Intel(R) 64 architecture
  threading
linking
  with Cluster FFT
  with ScaLAPACK
LINPACK benchmark
M
memory functions, redefining
memory management
memory renaming
mixed-language programming
module, Fortran 95
MP LINPACK benchmark
  hybrid version
multi-core performance
N
notational conventions
number of threads
  changing at run time
  changing with OpenMP environment variable
  Intel(R) MKL choice, particular cases
  setting for cluster
  techniques to set
numerical stability
O
OpenMP
  compatibility run-time library
  legacy run-time library
OpenMP run-time library
P
parallel performance
parallelism
PARDISO OOC configuration file
performance
  coding techniques to gain
  hardware tips to gain
  multi-core
  of LAPACK packed routines
  with denormals
  with subnormals
R
RTL
ru
79. Table 7-1 Interface Libraries and Modules (continued)
File name | Contains
libmkl_blas95_lp64.a | Fortran 95 wrappers for BLAS (BLAS95) supporting LP64 interface
libmkl_blas95_ilp64.a | Fortran 95 wrappers for BLAS (BLAS95) supporting ILP64 interface
libmkl_lapack95.a | Fortran 95 wrappers for LAPACK (LAPACK95) for IA-32 architecture
libmkl_lapack95_lp64.a | Fortran 95 wrappers for LAPACK (LAPACK95) supporting LP64 interface
libmkl_lapack95_ilp64.a | Fortran 95 wrappers for LAPACK (LAPACK95) supporting ILP64 interface
libfftw2xc_intel.a | Interfaces for FFTW version 2.x (C interface for Intel compiler) to call Intel MKL FFTs
libfftw2xc_gnu.a | Interfaces for FFTW version 2.x (C interface for GNU compiler) to call Intel MKL FFTs
libfftw2xf_intel.a | Interfaces for FFTW version 2.x (Fortran interface for Intel compiler) to call Intel MKL FFTs
libfftw2xf_gnu.a | Interfaces for FFTW version 2.x (Fortran interface for GNU compiler) to call Intel MKL FFTs
libfftw3xc_intel.a | Interfaces for FFTW version 3.x (C interface for Intel compiler) to call Intel MKL FFTs
libfftw3xc_gnu.a | Interfaces for FFTW version 3.x (C interface for GNU compiler) to call Intel MKL FFTs
libfftw3xf_intel.a | Interfaces for FFTW version 3.x (Fortran interface for Intel compiler) to call Intel MKL FFTs
libfftw3xf_gnu.a | Interfaces for FFTW version 3.x (Fortran interface for GNU compiler) to call Intel MK
80. ...application, for example, mkl_intel_lp64, mkl_intel_thread, mkl_core, and iomp5 (compilers typically require library names rather than library file names, so omit the lib prefix and .a extension). To learn how to choose the libraries, see Selecting Libraries to Link. The name of the particular setting where libraries are specified depends upon the compiler integration.

Configuring the Eclipse IDE CDT 3.x

To configure Eclipse IDE CDT 3.x to link with Intel MKL, follow the instructions below.

For Standard Make projects:
1. Go to the C/C++ Include Paths and Symbols property page and set the Intel MKL include path to <mkl directory>/include.
2. Go to C/C++ Project Paths > Libraries and set the Intel MKL libraries to link with your applications, for example, <mkl directory>/lib/em64t/libmkl_intel_lp64.a, <mkl directory>/lib/em64t/libmkl_intel_thread.a, and <mkl directory>/lib/em64t/libmkl_core.a. To learn how to choose the libraries, see Selecting Libraries to Link.

Note that with the Standard Make, the above settings are needed for the CDT internal functionality only. The compiler/linker will not automatically pick up these settings, and you will still have to specify them directly in the makefile.

For Managed Make projects, you can specify settings for a particular build. To do this:
1. Go to C/C++ Build > Tool Settings. All the settings you need to specify are on this page. Names of the part
81. ...particular settings depend upon the compiler integration and therefore are not given below.
2. If the compiler integration supports include path options, set the Intel MKL include path to <mkl directory>/include.
3. If the compiler integration supports library path options, set a path to the Intel MKL libraries for the target architecture, such as <mkl directory>/lib/em64t.
4. Specify the names of the Intel MKL libraries to link with your application, for example, mkl_intel_lp64, mkl_intel_thread, mkl_core, and iomp5 (compilers typically require library names rather than library file names, so omit the lib prefix and .a extension). To learn how to choose the libraries, see Selecting Libraries to Link.

Configuring the Out-of-Core (OOC) DSS/PARDISO Solver

When using the configuration file for the OOC DSS/PARDISO Solver, be aware that the maximum length of the path lines in the file is 1000 characters.

For more information, see the Sparse Solver Routines chapter in the Intel MKL Reference Manual.

Linking Your Application with the Intel Math Kernel Library

This chapter discusses linking your applications with the Intel Math Kernel Library (Intel MKL) for the Linux OS. The chapter provides information on the libraries that should be linked with your application, presents linking examples, and explains how to build custom shared objects.

To link w
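Before handing a configuration file to the OOC solver, you may want to verify the 1000-character limit. The sketch below checks every line of the file contents held in memory; it assumes lines are separated by '\n', and the helper name is hypothetical. Consult the Sparse Solver Routines chapter for the actual file format.

```c
#include <assert.h>
#include <stddef.h>

#define OOC_MAX_LINE 1000 /* path-line limit stated for the OOC configuration file */

/* Returns the 1-based number of the first line in `text` that is longer
   than max_len characters, or 0 if every line fits. */
size_t first_overlong_line(const char *text, size_t max_len)
{
    size_t line = 1, len = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n') {
            ++line;  /* start counting the next line */
            len = 0;
        } else if (++len > max_len) {
            return line;
        }
    }
    return 0;
}
```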
82. ...provides some additional enhancements and bug fixes designed to make the HPL usage more convenient, as well as explain Intel Message Passing Interface (MPI) settings that may enhance performance. The benchmarks/mp_linpack directory adds techniques to minimize search times frequently associated with long runs.

The Intel Optimized MP LINPACK Benchmark for Clusters is an implementation of the Massively Parallel MP LINPACK benchmark by means of HPL code. It solves a random dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. You can solve any size (N) system of equations that fits into memory. The benchmark uses full row pivoting to ensure the accuracy of the results.

Use the Intel Optimized MP LINPACK Benchmark for Clusters on a distributed memory machine. On a shared memory machine, use the Intel Optimized LINPACK Benchmark.

Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your systems based on genuine Intel processors more easily than with the HPL benchmark. Use the Intel Optimized MP LINPACK Benchmark to benchmark your cluster. The prebuilt binaries require that Intel MPI 3.x be installed on the cluster. The run-time version of Intel MPI is free and can be downloaded from www.intel.com/software/products/cluster.

The
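Two quantities mentioned above are easy to estimate before a run: the largest N whose N-by-N real*8 matrix fits a given memory budget (8*N^2 bytes), and the performance rate derived from the measured time. The sketch below assumes the operation count 2/3 N^3 + 3/2 N^2, which is the convention HPL uses when it reports GFLOPS; the function names are hypothetical.

```c
#include <assert.h>

/* Largest N such that an N-by-N matrix of 8-byte reals (real*8) fits in
   budget_bytes, that is, 8*N*N <= budget_bytes. Uses an integer binary
   search, so no floating-point square root (and no libm) is needed. */
unsigned long long max_order_for_bytes(unsigned long long budget_bytes)
{
    unsigned long long cap = budget_bytes / 8;   /* number of doubles that fit */
    unsigned long long lo = 0, hi = 1ULL << 31;  /* invariant: lo*lo <= cap < hi*hi */
    while (hi - lo > 1) {
        unsigned long long mid = lo + (hi - lo) / 2;
        if (mid * mid <= cap)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* GFLOPS for an order-n solve that took `seconds`, assuming the
   operation count 2/3 n^3 + 3/2 n^2. */
double hpl_gflops(double n, double seconds)
{
    return ((2.0 / 3.0) * n * n * n + 1.5 * n * n) / seconds / 1.0e9;
}
```

Pass only part of the available memory as the budget so that the OS and MPI buffers still have room; how much headroom to leave is an assumption you should tune for your cluster. For a distributed run, the matrix is spread across all processes, so the budget is roughly the combined memory of the nodes.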
83. ...ing post-installation steps.

Checking Your Installation

After installing Intel MKL, verify that the library is properly installed and configured:
1. Check that your installation directory was created. By default, Intel MKL installs in one of the following directories:
   • /opt/intel/mkl/RR.r.y.xxx, where RR.r is the version number, y is the release-update number, and xxx is the package number, for example, /opt/intel/mkl/10.2.0.004
   • <Intel Compiler Pro directory>/mkl, for example, /opt/intel/Compiler/11.1/015/mkl
2. If you want to keep multiple versions of Intel MKL installed on your system, update your build scripts to point to the correct Intel MKL version.
3. Check that the following six files appear in the tools/environment directory:
   mklvars32.sh
   mklvars32.csh
   mklvarsem64t.sh
   mklvarsem64t.csh
   mklvars64.sh
   mklvars64.csh
   Use these files to assign Intel MKL-specific values to several environment variables (see Setting Environment Variables on how to do it).
4. To understand how the Intel MKL directories are structured, see Chapter 3.

Setting Environment Variables

When the installation of Intel MKL for Linux OS is complete, set the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables in the command shell using one of the script files in the tools/environment directory. Choose the
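Step 3 can be scripted. The following sketch is a hypothetical helper, not part of Intel MKL; it assumes you pass the path of your tools/environment directory and uses POSIX access() to report which of the six scripts are missing:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* The six environment scripts listed in step 3. */
static const char *const kEnvScripts[] = {
    "mklvars32.sh",    "mklvars32.csh",
    "mklvarsem64t.sh", "mklvarsem64t.csh",
    "mklvars64.sh",    "mklvars64.csh",
};

/* Prints each script missing from env_dir and returns how many are missing. */
int count_missing_env_scripts(const char *env_dir)
{
    char path[4096];
    int missing = 0;
    for (size_t i = 0; i < sizeof kEnvScripts / sizeof kEnvScripts[0]; ++i) {
        snprintf(path, sizeof path, "%s/%s", env_dir, kEnvScripts[i]);
        if (access(path, F_OK) != 0) {  /* file does not exist or is unreachable */
            printf("missing: %s\n", path);
            ++missing;
        }
    }
    return missing;
}
```

For example, on a complete installation in the default location, count_missing_env_scripts("/opt/intel/mkl/10.2.0.004/tools/environment") is expected to return 0.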
84. ...environment for use with Intel MKL, set your environment variables using the script corresponding to your architecture (see Setting Environment Variables for details).

Table 2-2 What You Need to Know Before You Begin (continued)

Mathematical problem
  Identify all Intel MKL function domains that you require:
  • BLAS
  • Sparse BLAS
  • LAPACK
  • PBLAS
  • ScaLAPACK
  • Sparse Solver routines
  • Vector Mathematical Library functions
  • Vector Statistical Library functions
  • Fourier Transform functions (FFT)
  • Cluster FFT
  • Trigonometric Transform routines
  • Poisson, Laplace, and Helmholtz Solver routines
  • Optimization (Trust-Region) Solver routines
  • GMP arithmetic functions
  Reason: The function domain you intend to use narrows the search in the Reference Manual for specific routines you need. Additionally, if you are using the Intel MKL cluster software, your link line is function-domain specific (see Working with the Intel Math Kernel Library Cluster Software). Coding tips may also depend on the function domain (see Tips and Techniques to Improve Performance).

Programming language
  Though Intel MKL provides support for both Fortran and C/C++ programming, not all the function domains support a particular language environment, for example, C/C++ or Fortran 90/95. Identify the language interfaces that your function

Range of integer data

Threading model
85. ...with Intel MKL, choose one library from the Interface layer, one library from the Threading layer, one (and typically the only) library from the Computational layer, and, if necessary, add run-time libraries. Table 5-1 lists typical sets of Intel MKL libraries to link with your application.

Table 5-1 Typical Libraries to List on a Link Line
Architecture, linking | Interface layer | Threading layer | Computational layer | RTL
IA-32 architecture, static linking | libmkl_intel.a | libmkl_intel_thread.a | libmkl_core.a | libiomp5.so
IA-32 architecture, dynamic linking | libmkl_intel.so | libmkl_intel_thread.so | libmkl_core.so | libiomp5.so
Intel 64 and IA-64 architectures, static linking | libmkl_intel_lp64.a | libmkl_intel_thread.a | libmkl_core.a | libiomp5.so
Intel 64 and IA-64 architectures, dynamic linking | libmkl_intel_lp64.so | libmkl_intel_thread.so | libmkl_core.so | libiomp5.so

For exceptions and alternatives to the libraries listed above, see Selecting Libraries to Link.

See also:
Listing Libraries on a Link Line
Working with the Intel Math Kernel Library Cluster Software

Listing Libraries on a Link Line

To link with Intel MKL libraries, specify paths and libraries on the link line as shown below.

NOTE: The syntax below is for dynamic linking. For static linking, replace each library name preceded with -l with the path to the library file, for example,
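The one-library-per-layer rule in Table 5-1 can be expressed as a small lookup. The sketch below encodes only the four rows of the table (it does not cover the exceptions described in Selecting Libraries to Link), and the type and function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* One library from each layer, as in one row of Table 5-1. */
struct mkl_link_set {
    const char *interface_lib;
    const char *threading_lib;
    const char *computational_lib;
    const char *rtl;
};

/* intel64_or_ia64: 0 selects IA-32, nonzero selects Intel 64 / IA-64.
   dynamic: 0 selects static linking, nonzero selects dynamic linking. */
struct mkl_link_set typical_link_set(int intel64_or_ia64, int dynamic)
{
    struct mkl_link_set s;
    if (intel64_or_ia64)
        s.interface_lib = dynamic ? "libmkl_intel_lp64.so" : "libmkl_intel_lp64.a";
    else
        s.interface_lib = dynamic ? "libmkl_intel.so" : "libmkl_intel.a";
    s.threading_lib = dynamic ? "libmkl_intel_thread.so" : "libmkl_intel_thread.a";
    s.computational_lib = dynamic ? "libmkl_core.so" : "libmkl_core.a";
    s.rtl = "libiomp5.so"; /* the table lists the shared RTL in every row */
    return s;
}

/* Returns 1 if `name` is one of the four libraries in s. */
int uses_library(struct mkl_link_set s, const char *name)
{
    return strcmp(s.interface_lib, name) == 0 ||
           strcmp(s.threading_lib, name) == 0 ||
           strcmp(s.computational_lib, name) == 0 ||
           strcmp(s.rtl, name) == 0;
}
```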
86. ...with the HPL 2.0 distribution. There are a few changes that are optionally compiled in and disabled until you specifically request them. These new features are:
ASYOUGO: Provides non-intrusive performance information while runs proceed. There are only a few outputs, and this information does not impact performance. This is especially useful because many runs can go for hours without any information.
ASYOUGO2: Provides slightly intrusive additional performance information by intercepting every DGEMM call.
ASYOUGO2_DISPLAY: Displays the performance of all the significant DGEMMs inside the run.
ENDEARLY: Displays a few performance hints and then terminates the run early.
FASTSWAP: Inserts the LAPACK-optimized DLASWP into HPL's code. This may yield a benefit for Itanium 2 processors. You can experiment with this to determine best results.
HYBRID: Establishes the Hybrid OpenMP/MPI mode of MP LINPACK, providing the possibility to use threaded Intel MKL and prebuilt MP LINPACK hybrid libraries.
WARNING: Use this option only with an Intel compiler and the Intel MPI library version 3.1 or higher. It is also recommended that you use compiler version 10.0 or higher.

Benchmarking a Cluster

To benchmark a cluster, follow the sequence of steps below (some of them are optional). Pay special attention to the iterative steps 3 and 4. They make a loop that searches for HPL parameters (specified in HPL.dat) that enable you to reach the top performance of yo
87. ...libmkl_core.so | libmkl_scalapack_ilp64.so, libmkl_lapack.so, libmkl_core.so

Table 5-3 Computational Libraries to Link by Function Domain (continued)
Function domain | IA-32 architecture, static | IA-32 architecture, dynamic | Intel 64 or IA-64 architecture, static | Intel 64 or IA-64 architecture, dynamic
Cluster Fourier Transform Functions (1) | libmkl_cdft_core.a, libmkl_core.a | n/a | libmkl_cdft_core.a, libmkl_core.a | n/a

(1) Add also the library with BLACS routines corresponding to the used MPI. For details, see Linking with ScaLAPACK and Cluster FFTs.

See also Linking with Compiler Support RTLs.

Linking with Compiler Support RTLs

You are strongly encouraged to dynamically link in the compatibility OpenMP run-time library libiomp or the legacy OpenMP run-time library libguide. Link with libiomp and libguide dynamically even if other libraries are linked statically.

Linking to the static OpenMP run-time library is not recommended because it is very easy with complex software to link in more than one copy of the library. This causes performance problems (too many threads) and may cause correctness problems if more than one copy is initialized.

If you link with libiomp/libguide statically, the version of libiomp/libguide you link with depends on which compiler you use:
• If you use the Intel compiler, link in the libiomp/libguide version that comes with the compiler, that is, use the -openmp option.
• If you do not use the Intel compil
88. ...make command, for instance, when using Boost version 1.37.0:
    make lib32 BOOST_ROOT=<your_path>/boost_1_37_0

Invoking Intel MKL Functions from Java Applications

This section describes examples that are provided with the Intel MKL package and illustrate calling the library functions from Java.

Intel MKL Java Examples

To demonstrate binding with Java, Intel MKL includes a set of Java examples in the following directory:
    <mkl directory>/examples/java

The examples are provided for the following MKL functions:
• gemm, gemv, and dot families from CBLAS
• The complete set of non-cluster FFT functions
• ESSL (1)-like functions for one-dimensional convolution and correlation
• VSL Random Number Generators (RNG), except user-defined ones and file subroutines
• VML functions, except GetErrorCallBack, SetErrorCallBack, and ClearErrorCallBack

You can see the example sources in the following directory:
    <mkl directory>/examples/java/examples

The examples are written in Java. They demonstrate usage of the MKL functions with the following variety of data:
• 1- and 2-dimensional data sequences
• real and complex types of the data
• single and double precision

However, the wrappers used in the examples do not:
• Demonstrate the use of huge arrays (>2 billion elements)
• Demonstrate processing of arrays in na

(1) IBM Engineering Scientific Subroutine Library (ESSL)
89. ...variable, such as OMP_NUM_THREADS, is your login environment. Remember that changing this value on the head node and then doing your run, as you do on a shared-memory (SMP) system, does not change the variable on all the nodes, because mpirun starts a fresh default shell on all of the nodes. To change the number of threads on all the nodes, add a line at the top of .bashrc, as follows:
    OMP_NUM_THREADS=1; export OMP_NUM_THREADS

You can run multiple CPUs per node using MPICH. To do this, build MPICH to enable multiple CPUs per node. Be aware that certain MPICH applications may fail to work perfectly in a threaded environment (see the Known Limitations section in the Release Notes). If you encounter problems with MPICH and the number of threads is set greater than one, first try setting the number of threads to one and see whether the problem persists.

Using Shared Libraries

All needed shared libraries must be visible on all the nodes at run time. To achieve this, point to these libraries with the LD_LIBRARY_PATH environment variable in the .bashrc file.

If Intel MKL is installed only on one node, link statically when building your Intel MKL applications rather than use shared libraries.

The Intel compilers or GNU compilers can be used to compile a program that uses Intel MKL. However, make sure that the MPI implementation and compiler match up correctly.

Building ScaLAPACK Tests

To build
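Because each process that mpirun starts reads the variable from its own fresh shell environment, a program only sees the value that was set on its node. The sketch below shows how a C program might read OMP_NUM_THREADS itself, falling back to a default when the variable is unset or invalid. It is an illustration with a hypothetical helper name; the OpenMP run-time library performs its own parsing.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Reads a positive thread count from the environment variable `name`
   (for example, OMP_NUM_THREADS). Returns `fallback` if the variable is
   unset, empty, not a whole decimal number, or out of range. */
int threads_from_env(const char *name, int fallback)
{
    const char *v = getenv(name);
    char *end;
    long n;
    if (v == NULL || *v == '\0')
        return fallback;
    n = strtol(v, &end, 10);
    if (*end != '\0' || n <= 0 || n > INT_MAX) /* trailing junk or bad value */
        return fallback;
    return (int)n;
}
```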
90. ...dgemm, DGEMM, dgemm_, DGEMM_

See Example 7-2 on how to call BLAS routines from C.

CBLAS

Instead of calling BLAS routines from a C-language program, you can use the CBLAS interface. CBLAS is a C-style interface to the BLAS routines. You can call CBLAS routines using regular C-style calls. When using the CBLAS interface, the header file mkl.h will simplify the program development because it specifies enumerated values as well as prototypes of all the functions. The header determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. Example 7-3 illustrates the use of the CBLAS interface.

Using Complex Types in C/C++

As described in the Building Applications document for the Intel Fortran Compiler, C/C++ does not directly implement the Fortran types COMPLEX(4) and COMPLEX(8). However, you can write equivalent structures. The type COMPLEX(4) consists of two 4-byte floating-point numbers. The first of them is the real-number component, and the second one is the imaginary-number component. The type COMPLEX(8) is similar to COMPLEX(4) except that it contains two 8-byte floating-point numbers.

Intel MKL provides complex types MKL_Complex8 and MKL_Complex16, which are structures equivalent to the Fortran complex types COMPLEX(4) and COMPLEX(8), respectively. These types are defined in the mkl_types.h header file. You can use thes
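As a sketch of the equivalence described above, a structure with two doubles, real part first, has the same 16-byte layout as COMPLEX(8) on the usual Linux ABIs. The type name complex16 and the helper are hypothetical; in real code, use MKL_Complex16 from mkl_types.h.

```c
#include <assert.h>

/* Layout-compatible stand-in for Fortran COMPLEX(8): two 8-byte reals,
   real part first, imaginary part second. */
typedef struct {
    double real;
    double imag;
} complex16;

/* Complex product (a.real + i*a.imag) * (b.real + i*b.imag). */
complex16 c16_mul(complex16 a, complex16 b)
{
    complex16 r;
    r.real = a.real * b.real - a.imag * b.imag;
    r.imag = a.real * b.imag + a.imag * b.real;
    return r;
}
```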
91. ...Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
* Other names and brands may be claimed as the property of others.
Copyright (C) 2006-2009, Intel Corporation. All rights reserved.

Contents
Chapter 1 Overview
  Technical Support
  About This Document
  Related Information
  Document Organization
  Notational Conventions
Chapter 2 Getting Started
  Checking Your Installation
  Setting Environment Variables
  Using the Web-based Linking Advisor
  Using Intel MKL Code Examples
  Compiler Support
  Before You Begin Using Intel MKL
Chapter 3 Intel Math Kernel Library Structure
  Architecture Support
  High-level Directory Structure
  Layered Model Co
92. Example 6-1 Changing the Number of Threads (continued)

    int m=SIZE, n=SIZE, k=SIZE, lda=SIZE, ldb=SIZE, ldc=SIZE, i=0, j=0;
    char transa='n', transb='n';
    for (i=0; i<SIZE; i++) {
        for (j=0; j<SIZE; j++) {
            a[i*SIZE+j] = (double)(i+j);
            b[i*SIZE+j] = (double)(i*j);
            c[i*SIZE+j] = (double)0;
        }
    }
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    printf("row\ta\tc\n");
    for (i=0; i<10; i++) {
        printf("%d\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }
    omp_set_num_threads(1);
    for (i=0; i<SIZE; i++) {
        for (j=0; j<SIZE; j++) {
            a[i*SIZE+j] = (double)(i+j);
            b[i*SIZE+j] = (double)(i*j);
            c[i*SIZE+j] = (double)0;
        }
    }
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    printf("row\ta\tc\n");
    for (i=0; i<10; i++) {
        printf("%d\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }
    omp_set_num_threads(2);
    for (i=0; i<SIZE; i++) {
        for (j=0; j<SIZE; j++) {
            a[i*SIZE+j] = (double)(i+j);
            b[i*SIZE+j] = (double)(i*j);
            c[i*SIZE+j] = (double)0;
        }
    }
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    printf("row\ta\tc\n");
    for (i=0; i<10; i++) {
        printf("%d\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }
    delete [] a;
    delete [] b;
    delete [] c;
}

******* Fortran language *******
      PROGRAM DGEMM_DIFF_THREADS
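To make the row-major cblas_dgemm call in the example concrete, here is an unoptimized reference sketch of what it computes: C = alpha*A*B + beta*C, where element (i, j) of a row-major matrix is stored at x[i*ldx + j]. It is meant only for checking results on small inputs; the function name is hypothetical, and this is not how Intel MKL implements dgemm.

```c
#include <assert.h>

/* Reference semantics of
   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
               m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
   A is m-by-k, B is k-by-n, C is m-by-n, all stored row-major. */
void ref_dgemm_rowmajor(int m, int n, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double beta, double *c, int ldc)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i * lda + p] * b[p * ldb + j];
            /* BLAS convention: C need not be initialized when beta is zero. */
            c[i * ldc + j] = (beta == 0.0)
                ? alpha * sum
                : alpha * sum + beta * c[i * ldc + j];
        }
}
```

Comparing a few entries of c from cblas_dgemm against this function on a small SIZE is a quick way to confirm that the leading dimensions and storage order are set up as intended.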
93. ...information on how to build Fortran 95 LAPACK and BLAS interface libraries.

    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

6. Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

7. Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

8. Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface:
    ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_blas95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

Building Custom Shared Objects

Custom shared objects enable you to reduce the collection of functions available in Intel MKL libraries to those required to solve your particular problems, which helps to save disk space and build your own dynamic libr
94. ...some cases, the use of the hybrid mode is required for external reasons. If there is a choice, the non-hybrid code may be faster. To use the non-hybrid code in a hybrid mode, use the threaded version of Intel MKL BLAS, link with a thread-safe MPI, and call the function MPI_Init_thread() so as to indicate a need for MPI to be thread-safe.

Intel MKL also provides prebuilt binaries that are dynamically linked against Intel MPI libraries.

NOTE: Performance of statically and dynamically linked prebuilt binaries may be different. The performance of both depends on the version of Intel MPI you are using. You can build binaries statically linked against a particular version of Intel MPI by yourself.

Contents

The Intel Optimized MP LINPACK Benchmark for Clusters (MP LINPACK Benchmark) includes the HPL 2.0 distribution in its entirety, as well as the modifications delivered in the files listed in Table 11-2 and located in the benchmarks/mp_linpack subdirectory in the Intel MKL directory (see Table 3-2).

Table 11-2 Contents of the MP LINPACK Benchmark
benchmarks/mp_linpack/
  testing/ptest/HPL_pdtest.c: HPL 2.0 code modified to display captured DGEMM information in ASYOUGO2_DISPLAY (see details in the New Features section) if it was captured.
  src/blas/HPL_dgemm.c: HPL 2.0 code modified to capture DGEMM information, if desired, from ASYOUGO2_DISPLAY.
  src/grid/HPL_grid_init.c: HPL 2.0 code modified to do additional grid experiments originally no
95. Version | Description | Date
... | ...Documents Intel MKL 10.0 Gold release. Configuring of Eclipse CDT 4.0 to link with Intel MKL has been described in chapter 3. Compatibility OpenMP run-time library (libiomp) has been described. | October 2007
006 | Documents Intel MKL 10.1 beta release. Information on dummy libraries in Table "High-level directory structure" has been further detailed. Information on the Intel MKL configuration file removed. Section "Accessing Man Pages" has been added to chapter 3. Section "Support for Boost uBLAS Matrix-Matrix Multiplication" has been added to chapter 7. Chapter "Getting Assistance for Programming in the Eclipse IDE" has been added. | May 2008
007 | Documents Intel MKL 10.1 gold release. Linking examples for IA-32 architecture and section "Linking with Computational Libraries" have been added to chapter 5. Integration of DSS/PARDISO into the layered structure has been documented. Two Fortran code examples have been added. | August 2008
008 | Documents Intel MKL 10.2 beta release. Prebuilt Fortran 95 interface libraries and modules for BLAS and LAPACK have been described. Support for Intel Advanced Vector Extensions (Intel AVX) has been documented. Discontinuation of support for dummy libraries and legacy linking model has also been documented. Chapter 5 has been restructured. | January 2009
009 | Documents Intel MKL 10.2 gold release. The document has been considerably restructured. The Getting Started chapter h... | March 2009
96. ...n parameter (result) gets exposed:

Normal Fortran function call:
    result = cdotc( n, x, 1, y, 1 )
A call to the function as a subroutine:
    call cdotc( result, n, x, 1, y, 1 )
A call to the function from C:
    cdotc( &result, &n, x, &one, y, &one )

NOTE: Intel MKL has both upper-case and lower-case entry points in the Fortran-style (case-insensitive) BLAS, with or without the trailing underscore. So, all these names are equivalent and acceptable: cdotc, CDOTC, cdotc_, CDOTC_.

The above example shows one of the ways to call several level 1 BLAS functions that return complex values from your C and C++ applications. An easier way is to use the CBLAS interface. For instance, you can call the same function using the CBLAS interface as follows:
    cblas_cdotc_sub( n, x, 1, y, 1, &result );

NOTE: The complex value comes last on the argument list in this case.

The following examples show use of the Fortran-style BLAS interface from C and C++, as well as the CBLAS (C language) interface.

The example below illustrates a call from a C program to the complex BLAS Level 1 function zdotc(). This function computes the dot product of two double-precision complex vectors. In this example, the complex dot product is returned in the structure c.

Example 7-1 Calling a Complex BLAS Level 1 Function from C
    #include "mkl.h"
    #define N 5
    int main()
    {
        MKL_INT n = N, inca = 1, incb
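The C call cdotc( &result, &n, x, &one, y, &one ) above follows the Fortran calling convention: every argument, including the scalar n and the increments, is passed by address, and the complex result comes back through the first argument. The sketch below writes a hypothetical routine my_zdotc_ (not an Intel MKL entry point) with that convention and zdotc semantics, the sum of conj(x_i)*y_i, to show why &one is passed instead of a literal 1. It handles positive increments only.

```c
#include <assert.h>

typedef struct { double real; double imag; } dcomplex; /* hypothetical COMPLEX(8) stand-in */

/* Fortran-style signature: all arguments by address, result first.
   Computes sum over i of conj(x[i*incx]) * y[i*incy]. */
void my_zdotc_(dcomplex *result, const int *n,
               const dcomplex *x, const int *incx,
               const dcomplex *y, const int *incy)
{
    dcomplex s = {0.0, 0.0};
    assert(*incx > 0 && *incy > 0); /* negative strides are not handled here */
    for (int i = 0; i < *n; ++i) {
        const dcomplex xv = x[i * *incx], yv = y[i * *incy];
        /* conj(xv) * yv = (xr - i*xi) * (yr + i*yi) */
        s.real += xv.real * yv.real + xv.imag * yv.imag;
        s.imag += xv.real * yv.imag - xv.imag * yv.real;
    }
    *result = s;
}
```

A call then looks exactly like the cdotc call above: my_zdotc_(&result, &n, x, &one, y, &one), with n and one declared as int variables.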
97. n-time library 7-4; compatibility OpenMP 3-5; legacy OpenMP 3-5. S: ScaLAPACK, linking with 9-1; stability, numerical 8-1; subnormal number, performance 6-17; support, technical 1-1; syntax, linking, cluster software 9-1. T: technical support 1-1; thread safety, of Intel(R) MKL 6-2; threading: avoiding conflicts 6-4; environment variables and functions 6-8; Intel(R) MKL behavior, particular cases 6-10; Intel(R) MKL controls 6-8; see also number of threads. U: uBLAS, matrix-matrix multiplication, substitution with Intel MKL functions 7-10; unstable output, numerically, getting rid of 8-1; usage information 1-1.
98. nal Conventions. Italic: Italic is used for emphasis and also indicates document names in body text, for example: see Intel MKL Reference Manual. Monospace lowercase: Indicates filenames, directory names, and pathnames, for example: libmkl_core.a, /opt/intel/mkl/10.2.0.004. Monospace lowercase mixed with uppercase: Indicates commands and command-line options, for example: icc myprog.c -L$MKLPATH -I$MKLINCLUDE -lmkl -lguide -lpthread; C/C++ code fragments, for example: a = new double [SIZE*SIZE]; UPPERCASE MONOSPACE: Indicates system variables, for example: $MKLPATH. Monospace italic: Indicates a parameter in discussions: routine parameters, for example: lda; makefile parameters, for example: functions_list; etc. When enclosed in angle brackets, indicates a placeholder for an identifier, an expression, a string, a symbol, or a value, for example: <mkl directory>. Substitute one of these items for the placeholder. [ items ]: Square brackets indicate that the items enclosed in brackets are optional. { item | item }: Braces indicate that only one of the items listed between braces should be selected. A vertical bar ( | ) separates the items. Getting Started: This chapter helps you get started with the Intel Math Kernel Library (Intel MKL) on Linux OS by providing the basic information needed to start using the library, includ
99. ncept 3-3; Sequential Mode of the Library 3-5; Support for ILP64 Programming 3-6; Directory Structure in Detail 3-8; Accessing the Intel MKL Documentation 3-20; Contents of the Documentation Directory 3-20; Viewing Man Pages 3-20. Chapter 4 Configuring Your Development Environment: Automating Setting of Environment Variables 4-1; Configuring the Eclipse IDE CDT to Link with Intel MKL 4-2; Configuring the Eclipse IDE CDT 4.0 4-2; Configuring the Eclipse IDE CDT 3.x 4-3; Configuring the Out-of-Core (OOC) DSS/PARDISO Solver 4-4. Chapter 5 Linking Your Application with the Intel Math Kernel Library: Listing Libraries on a Link Line 5-2; Selecting Libraries to Link 5-3; Linking with Fortran 95 Interface Libraries 5-3; Linking with Threading Libraries
100. For the bash shell, enter: export OMP_NUM_THREADS=<number of threads to use>. For the csh or tcsh shell, enter: setenv OMP_NUM_THREADS <number of threads to use>. See Using Additional Threading Control on how to set the number of threads using Intel MKL environment variables, for example, MKL_NUM_THREADS. Changing the Number of Threads at Run Time: You cannot change the number of threads during run time using the environment variables. However, you can call OpenMP API functions from your program to change the number of threads during run time. The following sample code shows how to change the number of threads during run time using the omp_set_num_threads() routine. See also Techniques to Set the Number of Threads. The following example shows both C and Fortran code examples. To run this example in the C language, use the omp.h header file from the Intel compiler package. If you do not have the Intel compiler but wish to explore the functionality in the example, use the Fortran API for omp_set_num_threads() rather than the C version, for example: omp_set_num_threads_( &i_one ); Example 6-1 Changing the Number of Threads: // ******* C language ******* #include "omp.h" #include "mkl.h" #include <stdio.h> #define SIZE 1000 int main(int argc, char *argv[]) { double *a, *b, *c; a = new double [SIZE*SIZE]; b = new double [SIZE*SIZE]; c = new double [SIZE*SIZE]; double alpha = 1, beta = 1; int
101. 11-8; New Features 11-9; Benchmarking a Cluster 11-9. Appendix A Intel Math Kernel Library Language Interfaces Support. Appendix B Support for Third-Party Interfaces: GMP Functions B-1; FFTW Interface Support B-1. Index. List of Tables: Table 1-1 Notational Conventions 1-3; Table 2-1 Scripts to Set the Environment Variables 2-2; Table 2-2 What You Need to Know Before You Begin 2-3; Table 3-1 Architecture-specific Implementations 3-1; Table 3-2 High-level Directory Structure 3-2; Table 3-3 Intel MKL Layers 3-4; Table 3-4 Compiling for the ILP64 and LP64 Interfaces 3-6; Table 3-5 Integer Types 3-7; Table 3-6 Detailed Structure of the IA-32 Architecture Directory lib/32 3-9; Table 3-7 Detailed Structure of the Intel 64 Architecture Directory lib/em64t 3-12; Table 3-8 Detailed Structure of the IA-64 Architecture Directory lib/64
102. neral system file in /etc/profile (for bash and sh) or in /etc/csh.login (for csh). Before uninstalling Intel MKL, remove the above commands from all profile files where the script execution was added, to avoid problems logging in later. Configuring the Eclipse IDE CDT to Link with Intel MKL: This section describes how to configure the Eclipse Integrated Development Environment (IDE) C/C++ Development Tools (CDT) 3.x and 4.0 to link with Intel MKL. TIP: After linking your CDT with Intel MKL, you can benefit from the Eclipse-provided code assist feature. See the Code/Context Assist description in Eclipse Help. Configuring the Eclipse IDE CDT 4.0: Before configuring Eclipse IDE CDT 4.0, make sure to turn on the automatic makefile generation. To configure Eclipse CDT 4.0 to link with Intel MKL, follow the instructions below: 1. If the tool-chain/compiler integration supports include path options, go to C/C++ General > Paths and Symbols > Includes and set the Intel MKL include path, that is, <mkl directory>/include. 2. If the tool-chain/compiler integration supports library path options, go to C/C++ General > Paths and Symbols > Library Paths and set the Intel MKL library path for the target architecture, such as <mkl directory>/lib/em64t. 3. Go to C/C++ Build > Settings > Tool Settings and specify the names of the Intel MKL libraries to link with your appl
103. niques 6-14; Hardware Configuration Tips 6-15; Managing Multi-core Performance 6-15; Operating on Denormals 6-17; FFT Optimized Radices 6-17; Using the Intel MKL Memory Management 6-18; Redefining Memory Functions 6-18. Chapter 7 Language-specific Usage Options: Using Language-Specific Interfaces with Intel MKL 7-1; Mixed-language Programming with Intel MKL 7-5; Calling LAPACK, BLAS, and CBLAS Routines from C Language Environments 7-5; Using Complex Types in C/C++ 7-7; Calling BLAS Functions that Return the Complex Values in C/C++ Code 7-7; Support for Boost uBLAS Matrix-matrix Multiplication 7-10; Invoking Intel MKL Functions from Java Applications 7-12. Chapter 8 Coding Tips: Aligning Data for Numerical Stability 8-1. Chapter 9 Working with the Intel Math Kernel Library Cluster Soft
104. ogy is especially effective when each thread performs different types of operations and when there are under-utilized resources on the processor. However, Intel MKL fits neither of these criteria, because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling HT Technology. See Using the Intel MKL Parallelism for information on the default number of threads, changing this number, and other relevant details. If you run with HT enabled, performance may suffer especially if you run on fewer threads than physical cores. Moreover, if, for example, there are two threads to every physical core, the thread scheduler may assign two threads to some cores and ignore the other cores altogether. If you are using the OpenMP library of the Intel Compiler, read the respective User Guide on how to best set the thread affinity interface to avoid this situation. For Intel MKL, it is recommended that you set KMP_AFFINITY=granularity=fine,compact,1,0. Managing Multi-core Performance: You can obtain best performance on systems with multi-core processors by requiring that threads do not migrate from core to core. To do this, bind threads to the CPU cores by setting an affinity mask to threads. Use one of the following options: OpenMP facilities (recommended, if availa
105. ollows: make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o. In this case, the command creates the mkl_small.so library for processors using the IA-32 architecture. The command takes the list of functions from the my_func_list.txt file and uses the user's error handler my_xerbla.o. The process is similar for processors using the Intel 64 or IA-64 architecture. Specifying a List of Functions: In the list of functions provided in the user_list file, adjust function names to the required interface. For example, for Fortran functions, append an underscore character "_" to the names as a suffix: dgemm_ ddot_ dgetrf_. If selected functions have several processor-specific versions, they all will be automatically included in the custom library and managed by the dispatcher. See the <mkl directory>/tools/builder folder for complete lists of functions in different function domains. Distributing Your Custom Shared Object: To enable use of your custom shared object in a threaded mode, distribute libiomp5.so along with the custom shared object. Managing Performance and Memory: This chapter shows different ways to get the best performance with the Intel Math Kernel Library (Intel MKL): it first discusses Intel MKL parallelism, then explains coding techniques, and finally provides hardware configuration tips to improve the performance of the library. The chapter also explains the Intel MKL memory management and shows
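Preparing the function list by hand is error-prone when the list is long. The following C sketch generates Fortran-interface names by appending the underscore suffix described above. The helper build_fortran_list is hypothetical and not part of the Intel MKL builder; it assumes the builder accepts one name per line, which you should confirm against the function lists in the tools/builder folder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the contents of a function-list file in which every routine name
 * gets the trailing underscore used by the Fortran interface.
 * Assumption: one name per line. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if out is too small. */
static int build_fortran_list(const char *const names[], size_t count,
                              char *out, size_t outsz)
{
    size_t used = 0;
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        int n = snprintf(out + used, outsz - used, "%s_\n", names[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1; /* the output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

The resulting text can then be written to the file passed to the export parameter of the make command.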
106. on Directory. For the contents of subdirectories in the benchmarks directory, see LINPACK and MP LINPACK Benchmarks. Note that in Intel MKL 10.2, libraries that provided link-line compatibility with Intel MKL versions 9.x and lower were removed. Table 3-6 Detailed Structure of the IA-32 Architecture Directory lib/32, Static Libraries. Interface layer: libmkl_blas95.a: Fortran 95 interface library for BLAS for the Intel Fortran compiler; libmkl_gf.a: Interface library for the GNU Fortran compiler; libmkl_intel.a: Interface library for the Intel compilers; libmkl_lapack95.a: Fortran 95 interface library for LAPACK for the Intel Fortran compiler. Threading layer: libmkl_gnu_thread.a: Threading library for the GNU Fortran and C compilers; libmkl_intel_thread.a: Threading library for the Intel compilers; libmkl_pgi_thread.a: Threading library for the PGI compiler; libmkl_sequential.a: Sequential library. Computational layer: libmkl_cdft_core.a: Cluster version of FFTs; libmkl_core.a: Kernel library for the IA-32 architecture; libmkl_scalapack_core.a: ScaLAPACK routines; libmkl_solver.a: Deprecated. Empty library for backward compatibility; libmkl_solver_sequential.a: Deprecated. Empty library for backward compatibility. RTL: libguide.a, libiomp5.a, libmkl_blacs.a, libmkl_blacs_intelmpi.a, libmkl_blacs_intelmpi20.a, libmkl_blacs_openmpi.a; contents: Legacy OpenMP
107. onal Edition or Intel Fortran Compiler Professional Edition. (1) Fortran 95 interface to BLAS. (2) Fortran 95 interface to LAPACK. Layered Model Concept: Starting with release 10.0, Intel MKL uses a layered model. There are four essential parts of the library: 1. Interface layer; 2. Threading layer; 3. Computational layer; 4. Compiler Support Run-time libraries. Each part consists of several libraries, each handling a different case within that part. For example: On systems based on the Intel 64 or IA-64 architecture, the libmkl_intel_lp64.a library in the Interface layer adapts Intel MKL to the use of 32-bit integer types and to the way Intel compilers return function values. The libmkl_intel_thread.a library in the Threading layer adapts Intel MKL to the OpenMP implementation used by Intel compilers, and the libmkl_sequential.a library adapts Intel MKL to the non-threaded mode. The Computational layer is the bulk of Intel MKL. The library in this layer contains only code needed for pure computations, without adaptation to interfaces or OpenMP threading. Because it is organized this way, Intel MKL avoids duplication of the same code in different libraries and thus considerably saves space. You can combine Intel MKL independent libraries to meet your needs by linking with one library in each part, layer by layer. Once the interface library is selected, the threading library you select
108. or Threading Controls (columns: Environment Variable | Service Function | Comment | Equivalent OpenMP Environment Variable): MKL_NUM_THREADS | mkl_set_num_threads | Suggests the number of threads to use | OMP_NUM_THREADS; MKL_DOMAIN_NUM_THREADS | mkl_domain_set_num_threads | Suggests the number of threads for a particular function domain | (none); MKL_DYNAMIC | mkl_set_dynamic | Enables Intel MKL to dynamically change the number of threads | OMP_DYNAMIC. NOTE: The functions take precedence over the respective environment variables. In particular, if in your application you want Intel MKL to use a given number of threads and do not want users of your application to change this via environment variables, set this number of threads by a call to mkl_set_num_threads(), which will have full precedence over any environment variables being set. The example below illustrates the use of the Intel MKL function mkl_set_num_threads() to set one thread. Example 6-2 Setting the Number of Threads to One: // ******* C language ******* #include <omp.h> #include <mkl.h> mkl_set_num_threads( 1 ); // ******* Fortran language ******* call mkl_set_num_threads( 1 ). The section further explains the Intel MKL environment variables for threading control. See the Intel MKL Reference Manual for the detailed description of the threading control functions, their parameters, calling synt
109. r command name with an additional parameter of make: FC=<compiler>. For example, the command make libem64t FC=pgf95 INSTALL_DIR=<user_pgf95 dir> interface=lp64 builds the required library and .mod files and installs them in subdirectories of <user_pgf95 dir>. To delete the library from the building directory, use the following commands: make clean32 INSTALL_DIR=<user_dir> for the IA-32 architecture; make cleanem64t INSTALL_DIR=<user_dir> for the Intel 64 architecture; make clean64 INSTALL_DIR=<user_dir> for the IA-64 architecture; make clean INSTALL_DIR=<user_dir> for all the architectures. NOTE: Setting INSTALL_DIR to the Intel MKL installation directory (INSTALL_DIR=<mkl_directory>) in a build or clean command above will replace or delete the Intel MKL prebuilt Fortran 95 library and modules. Though this is possible only if you have administrative rights, you are strongly discouraged from doing this. Compiler-dependent Functions and Fortran 90 Modules: Compiler-dependent functions occur whenever the compiler inserts into the object code function calls that are resolved in its run-time library (RTL). Linking such code without the appropriate RTL will result in undefined symbols. Intel MKL has been designed to minimize RTL dependencies. Where dependencies occur, a supporting RTL is shipped with Intel MKL. The only examples of such RTLs, except those that
110. re prebuilt libraries: lib_hybrid/32/libhpl_hybrid.a: New. Prebuilt library with the hybrid version of MP LINPACK for the IA-32 architecture and Intel MPI 3.2; lib_hybrid/em64t/libhpl_hybrid.a: New. Prebuilt library with the hybrid version of MP LINPACK for the Intel 64 architecture and Intel MPI 3.2; lib_hybrid/64/libhpl_hybrid.a: New. Prebuilt library with the hybrid version of MP LINPACK for the IA-64 architecture and Intel MPI 3.2. Next 18 files refer to run scripts: bin_intel/ia32/runme_ia32: New. Sample run script for the IA-32 architecture and a pure MPI binary statically linked against Intel MPI 3.2; bin_intel/ia32/runme_ia32_dynamic: New. Sample run script for the IA-32 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2; bin_intel/ia32/HPL_serial.dat: New. Example of an MP LINPACK benchmark input file for a pure MPI binary and the IA-32 architecture; bin_intel/ia32/runme_hybrid_ia32: New. Sample run script for the IA-32 architecture and a hybrid binary statically linked against Intel MPI 3.2; bin_intel/ia32/runme_hybrid_ia32_dynamic: New. Sample run script for the IA-32 architecture and a hybrid binary dynamicall; further files: bin_intel/ia32/HPL_hybrid.dat, bin_intel/em64t/runme_em64t, bin_intel/em64t/runme_em64t_dynamic, bin_intel/em64t/HPL_serial.dat, bin_intel/em64t/runme_hybrid_em64t, bin_intel/em64t/runme_hybrid_em64t_dynamic
111. rectory>/include/em64t/ilp64; <mkl directory>/include/em64t/lp64; <mkl directory>/interfaces/blas95; <mkl directory>/interfaces/fftw2x_cdft; <mkl directory>/interfaces/fftw2xc; <mkl directory>/interfaces/fftw2xf; <mkl directory>/interfaces/fftw3xc; <mkl directory>/interfaces/fftw3xf; <mkl directory>/interfaces/lapack95; <mkl directory>/lib/32. Table 3-2 High-level Directory Structure, Contents column: Intel MKL main directory (for the default installation directory, see Checking Your Installation); Shared-memory (SMP) version of the LINPACK benchmark; Message-passing interface (MPI) version of the LINPACK benchmark; Documentation for the stand-alone Intel MKL; Examples directory (each subdirectory has source and data files); INCLUDE files for the library routines, as well as for tests and examples; BLAS95 and LAPACK95 .mod files for the IA-32 architecture and Intel Fortran compiler; BLAS95 and LAPACK95 .mod files for the IA-64 architecture, Intel Fortran compiler, and ILP64 interface; BLAS95 and LAPACK95 .mod files for the IA-64 architecture, Intel Fortran compiler, and LP64 interface; BLAS95 and LAPACK95 .mod files for the Intel 64 architecture (formerly, Intel EM64T), Intel Fortran compiler, and ILP64 interface; BLAS95 and LAPACK95 .mod files for the Intel 64 architecture, Intel Fortran compiler, and LP64 interface; Fortran 95 interfaces to BLAS and a makefile to build the library; MPI FFTW 2.x interfaces to the Intel MKL Cluster FFTs; FFTW 2.x interfaces to the Intel MKL FFTs (C interface); FFTW 2.x interfaces to the Intel MKL FFTs (Fortran in
112. rmation for the library. The usage information covers the organization, configuration, performance, and accuracy of Intel MKL, specifics of routine calls in mixed-language programming, linking, and more. This guide describes OS-specific usage of Intel MKL, along with OS-independent features. It contains usage information for all Intel MKL function domains listed in Table A-1 in Appendix A. This User's Guide provides the following information: Describes post-installation steps to help you start using the library; Shows you how to configure the library with your development environment; Acquaints you with the library structure; Explains how to link your application to the library and provides simple usage scenarios; Describes how to code, compile, and run your application with Intel MKL. This guide is intended for Linux OS programmers with beginner to advanced experience in software development. Related Information: To reference how to use the library in your application, use this guide in conjunction with the following documents: The Intel MKL Reference Manual, which provides reference information on routine functionalities, parameter descriptions, interfaces, calling syntaxes, and return values; The Intel Math Kernel Library for Linux OS Release Notes. Document Organization: The document contains the following chapters and appendices: Chapter 1 Ch
113. rol; Managing Multi-core Performance. (1) Except the LAPACK deprecated routine ?lacon. Techniques to Set the Number of Threads: Use one of the following techniques to change the number of threads to use in Intel MKL: Set one of the OpenMP or Intel MKL environment variables: OMP_NUM_THREADS, MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS. Call one of the OpenMP or Intel MKL functions: omp_set_num_threads(), mkl_set_num_threads(), mkl_domain_set_num_threads(). When choosing the appropriate technique, take into account the following rules: The Intel MKL threading controls take precedence over the OpenMP controls. A function call takes precedence over any environment variables. The exception, which is a consequence of the previous rule, is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS. See Using Additional Threading Control for more details. The environment variables cannot be used to change run-time behavior in the course of the run, because they are read only once at the first call to Intel MKL. Avoiding Conflicts in the Execution Environment: Certain situations can cause conflicts in the execution environment that make the use of threads in Intel MKL problematic. This section briefly discusses why these problems exist and how to avoid them. If you thread the program
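Taken together, the precedence rules above form a strict order: mkl_set_num_threads(), then MKL_NUM_THREADS, then omp_set_num_threads(), then OMP_NUM_THREADS, then the default. The C sketch below models that order so the rules can be checked case by case. The thread_controls struct and effective_threads function are hypothetical and not an Intel MKL API; MKL_DOMAIN_NUM_THREADS is deliberately left out, and the result is only the suggested count, since the library may still use fewer threads, for example when MKL_DYNAMIC is in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented precedence rules; not an Intel MKL
 * API. A value of 0 means "this control was not set". */
typedef struct {
    int mkl_function; /* value passed to mkl_set_num_threads() */
    int mkl_env;      /* value of MKL_NUM_THREADS              */
    int omp_function; /* value passed to omp_set_num_threads() */
    int omp_env;      /* value of OMP_NUM_THREADS              */
} thread_controls;

/* Returns the suggested thread count implied by the rules:
 * 1. Intel MKL controls take precedence over OpenMP controls.
 * 2. A function call takes precedence over an environment variable,
 *    except that omp_set_num_threads() does not override MKL_NUM_THREADS
 *    (a consequence of rule 1). */
static int effective_threads(const thread_controls *c, int default_threads)
{
    assert(c != NULL && default_threads > 0);
    const int ordered[4] = {c->mkl_function, c->mkl_env,
                            c->omp_function, c->omp_env};
    for (int i = 0; i < 4; ++i)
        if (ordered[i] > 0)
            return ordered[i];
    return default_threads; /* nothing set: the OpenMP default applies */
}
```

For instance, with MKL_NUM_THREADS=4 set and omp_set_num_threads(8) called, the model suggests 4 threads, matching the exception stated in the rules.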
114. rt this behavior as a memory leak. In addition to calling the mkl_free_buffers() function, you can release memory in your program by setting an environment variable. The memory management software is turned on by default, which leaves memory allocated by calls to Level 3 BLAS and FFT until the program ends. To disable this behavior of the memory management software, set the MKL_DISABLE_FAST_MM environment variable to any value. This configures the memory management software to allocate and free memory from call to call. Disabling this feature will negatively impact performance of routines such as the Level 3 BLAS, especially for small problem sizes. Using one of these methods to release memory will not necessarily stop programs from reporting memory leaks and, in fact, may increase the number of such reports if you make multiple calls to the library, thereby requiring new allocations with each call. Memory not released by one of the methods described previously will be released by the system when the program ends. Redefining Memory Functions: In C/C++ programs, you can replace the Intel MKL memory functions that the library uses by default with your own functions. To do this, use the memory renaming feature. Memory Renaming: Intel MKL memory management by default uses standard C run-time memory functions to allocate or free memory. These functions can be replaced using memory renaming. Intel MKL accesses the memory function
115. run-time libraries of the Intel compiler: the compatibility OpenMP run-time library libiomp and the legacy OpenMP run-time library libguide. The compatibility library libiomp is an extension of libguide that provides support for one additional threading compiler on Linux OS (GNU). That is, a program threaded with a GNU compiler can safely be linked with Intel MKL and libiomp. So, you are encouraged to use libiomp rather than libguide. Table 5-2 shows different scenarios, depending on the threading compiler used, and the possibilities for each scenario to choose the threading libraries and RTL when using Intel MKL (static cases only). Table 5-2 Selecting Threading Libraries (columns: Compiler | Application Threaded? | Threading Layer | Recommended RTL | Comment): Intel | Does not matter | libmkl_intel_thread.a | libiomp5.so; PGI | Yes | libmkl_pgi_thread.a or libmkl_sequential.a | PGI-supplied | Use of libmkl_sequential.a removes threading from Intel MKL calls; PGI | No | libmkl_intel_thread.a | libiomp5.so; PGI | No | libmkl_pgi_thread.a | PGI-supplied; PGI | No | libmkl_sequential.a | None; gnu | Yes | libmkl_gnu_thread.a | libiomp5.so or GNU OpenMP run-time library | libiomp5 offers superior scaling performance; gnu | Yes | libmkl_sequential.a | None; gnu | No | libmkl_intel_thread.a | libiomp5.so; other | Yes | libmkl_sequential.a | None; other | No | libmkl_intel_thread.a | libiomp5.so. Linking with Computational Libraries: Typically, with
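Because Table 5-2 is a decision table, it can be encoded as data and queried. The C sketch below copies the table rows into an array and returns every row that matches a compiler and whether the application is threaded. The type and function names are hypothetical, written only for this illustration; the strings come from Table 5-2 and cover static linking only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Whether the application itself is threaded; APP_ANY = "Does not matter". */
typedef enum { APP_NOT_THREADED, APP_THREADED, APP_ANY } app_threading;

typedef struct {
    const char   *compiler;        /* "intel", "pgi", "gnu", or "other" */
    app_threading threaded;
    const char   *threading_layer; /* Intel MKL threading-layer library */
    const char   *rtl;             /* recommended run-time library      */
} threading_choice;

/* Rows of Table 5-2 (static linking only). */
static const threading_choice TABLE_5_2[] = {
    {"intel", APP_ANY,          "libmkl_intel_thread.a", "libiomp5.so"},
    {"pgi",   APP_THREADED,     "libmkl_pgi_thread.a or libmkl_sequential.a",
                                "PGI-supplied"},
    {"pgi",   APP_NOT_THREADED, "libmkl_intel_thread.a", "libiomp5.so"},
    {"pgi",   APP_NOT_THREADED, "libmkl_pgi_thread.a",   "PGI-supplied"},
    {"pgi",   APP_NOT_THREADED, "libmkl_sequential.a",   "None"},
    {"gnu",   APP_THREADED,     "libmkl_gnu_thread.a",
                                "libiomp5.so or GNU OpenMP run-time library"},
    {"gnu",   APP_THREADED,     "libmkl_sequential.a",   "None"},
    {"gnu",   APP_NOT_THREADED, "libmkl_intel_thread.a", "libiomp5.so"},
    {"other", APP_THREADED,     "libmkl_sequential.a",   "None"},
    {"other", APP_NOT_THREADED, "libmkl_intel_thread.a", "libiomp5.so"},
};

/* Stores up to max matching rows in out and returns the total number of
 * matching rows (which may exceed max). */
static size_t lookup_threading(const char *compiler, int app_is_threaded,
                               const threading_choice *out[], size_t max)
{
    const size_t rows = sizeof TABLE_5_2 / sizeof TABLE_5_2[0];
    const app_threading want = app_is_threaded ? APP_THREADED : APP_NOT_THREADED;
    size_t found = 0;
    assert(compiler != NULL);
    for (size_t i = 0; i < rows; ++i) {
        const threading_choice *row = &TABLE_5_2[i];
        if (strcmp(row->compiler, compiler) != 0)
            continue;
        if (row->threaded != APP_ANY && row->threaded != want)
            continue;
        if (found < max)
            out[found] = row;
        ++found;
    }
    return found;
}
```

Keeping the rows as data means that a change to the table is a one-line edit rather than a change to branching logic.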
116. run-time library for static linking; Compatibility OpenMP run-time library for static linking; BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10, ANL MPICH version 1.2.5.2; BLACS routines supporting Intel MPI 2.0/3.x and MPICH2; A soft link to lib/32/libmkl_blacs_intelmpi.a; BLACS routines supporting OpenMPI. Table 3-6 Detailed Structure of the IA-32 Architecture Directory lib/32 (continued), Dynamic Libraries. Interface layer: libmkl_gf.so: Interface library for the GNU Fortran compiler; libmkl_intel.so: Interface library for the Intel compilers. Threading layer: libmkl_gnu_thread.so: Threading library for the GNU Fortran and C compilers; libmkl_intel_thread.so: Threading library for the Intel compilers; libmkl_pgi_thread.so: Threading library for the PGI compiler; libmkl_sequential.so: Sequential library. Computational layer: libmkl_core.so: Library dispatcher for dynamic load of processor-specific kernel library; libmkl_def.so: Default kernel library (Intel Pentium, Pentium Pro, Pentium II, and Pentium III processors); libmkl_lapack.so: LAPACK and DSS; further files in this layer: libmkl_p4.so, libmkl_p4m.so, libmkl_p4m3.so, libmkl_p4p.so, libmkl_scalapack_core.so, libmkl_vml_def.so, libmkl_vml_ia.so, libmkl_vml_p4.so, libmkl_vml_p4m.so, libmkl_vml_p4m2.so, libmkl_vml_p4m3.so, libmkl_vml_p4p.so
117. ry structure, reduce its size, and add usage flexibility. See also Layered Model Concept. Architecture Support: Intel MKL for Linux OS provides three architecture-specific implementations. Table 3-1 lists the supported architectures and directories where each architecture-specific implementation is located. Table 3-1 Architecture-specific Implementations (columns: Architecture | Location): IA-32 or compatible | <mkl directory>/lib/32; Intel 64 or compatible | <mkl directory>/lib/em64t; IA-64 | <mkl directory>/lib/64. See a detailed structure of these directories in Table 3-6, Table 3-7, and Table 3-8. See also High-level Directory Structure. High-level Directory Structure: Table 3-2 shows a high-level directory structure of Intel MKL after installation. Table 3-2 High-level Directory Structure, Directory column: <mkl directory>; <mkl directory>/benchmarks/linpack; <mkl directory>/benchmarks/mp_linpack; <mkl directory>/doc; <mkl directory>/examples; <mkl directory>/include; <mkl directory>/include/32; <mkl directory>/include/64/ilp64; <mkl directory>/include/64/lp64
118. s. The Mflops value is an estimate based on 1280 columns of LU being completed. However, with lookahead steps, sometimes that work is not actually completed when the output is made. Nevertheless, this is a good estimate for comparing identical runs. The 3 numbers in parentheses are intrusive ASYOUGO2 add-ins. DT is the total time processor 0 has spent in DGEMM. DF is the number of billion operations that have been performed in DGEMM by one processor. Hence, the performance of processor 0 (in Gflops) in DGEMM is always DF/DT. Using the number of DGEMM flops as a basis instead of the number of LU flops, you get a lower bound on performance of your run by looking at DMF, which can be compared to Mflops above. (It uses the global LU time, but the DGEMM flops are computed under the assumption that the problem is evenly distributed among the nodes, as only HPL's node (0,0) returns any output.) Note that when using the above performance monitoring tools to compare different HPL.dat input data sets, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some input data. For instance, when you try very small problems, the performance drop-off from the initial values to end values is very rapid. The larger the problem, the smaller the drop-off, and it is probably safe to use the first few performance values to estimate the difference between a problem size of 700000 and 701000, for instance. Another factor that influences
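The DF/DT relation above is simple arithmetic, but the units are easy to mix up when comparing against the Mflops column. The C sketch below spells it out: DF is in billions of operations, DT in seconds, so DF/DT is already in Gflops, and multiplying by 1000 gives Mflops. The function names and the sample values in the usage note are hypothetical, not taken from real HPL output.

```c
#include <assert.h>

/* df_billion_ops: DF, billions of floating-point operations that
 *                 processor 0 performed in DGEMM.
 * dt_seconds:     DT, total seconds processor 0 spent in DGEMM.
 * Returns processor 0's DGEMM rate in Gflops (DF/DT), or 0.0 when no
 * DGEMM time has been recorded yet. */
static double dgemm_gflops(double df_billion_ops, double dt_seconds)
{
    assert(df_billion_ops >= 0.0);
    if (dt_seconds <= 0.0)
        return 0.0;
    return df_billion_ops / dt_seconds;
}

/* Converts Gflops to Mflops for comparison with the Mflops column. */
static double gflops_to_mflops(double gflops)
{
    return gflops * 1000.0;
}
```

For example, hypothetical values DF = 120 and DT = 10 give 12 Gflops, that is, 12000 Mflops for processor 0 in DGEMM.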
119. s by pointers i_malloc, i_free, i_calloc, and i_realloc, which are visible at the application level. These pointers initially hold addresses of the standard C run-time memory functions malloc, free, calloc, and realloc, respectively. You can programmatically redefine values of these pointers to the addresses of your application's memory management functions. Redirecting the pointers is the only correct way to use your own set of memory management functions. If you call your own memory functions without redirecting the pointers, the memory will get managed by two independent memory management packages, which may cause unexpected memory issues. How to Redefine Memory Functions: To redefine memory functions, use the following procedure: 1. Include the i_malloc.h header file in your code. This header file contains all declarations required for replacing the memory allocation functions. The header file also describes how memory allocation can be replaced in those Intel libraries that support this feature. 2. Redefine values of pointers i_malloc, i_free, i_calloc, and i_realloc prior to the first call to Intel MKL functions. Example 6-4 Redefining Memory Functions: #include "i_malloc.h" i_malloc = my_malloc; i_calloc = my_calloc; i_realloc = my_realloc; i_free = my_free; // Now you may call Intel MKL functions. Language-specific Usage Options: The Intel Math Kernel Library (Intel MKL) provides
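The redirection mechanism can be illustrated without Intel MKL. The C sketch below mimics it with two hypothetical pointers, lib_malloc and lib_free, which stand in for i_malloc and i_free; they are not the declarations from i_malloc.h, and lib_sum_squares is a toy "library" routine invented for the example. What it shows is the point made above: once the pointers are redirected before the first library call, every allocation and release made by the library passes through the application's own memory package, so the two never get mixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for i_malloc and i_free (NOT the declarations from
 * i_malloc.h): the toy library allocates only through these pointers,
 * which initially point to the C run-time functions. */
static void *(*lib_malloc)(size_t) = malloc;
static void (*lib_free)(void *) = free;

/* Toy library routine that needs scratch memory. Returns 0 on success. */
static int lib_sum_squares(const int *v, size_t n, long *result)
{
    long *scratch = lib_malloc(n * sizeof *scratch);
    long total = 0;
    if (scratch == NULL && n > 0)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        scratch[i] = (long)v[i] * v[i];
        total += scratch[i];
    }
    lib_free(scratch);
    *result = total;
    return 0;
}

/* Application-side memory package: counts allocations and live blocks. */
static int total_allocs = 0;
static int live_blocks = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        ++total_allocs;
        ++live_blocks;
    }
    return p;
}

static void counting_free(void *p)
{
    if (p != NULL)
        --live_blocks;
    free(p);
}
```

As in step 2 of the procedure, assign the pointers (here lib_malloc = counting_malloc; lib_free = counting_free;) before the first library call; redirecting only one of them would split allocation and release across two packages.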
120. script; Input file for pre-determined problem for the runme_xeon64 script; Result of the runme_itanium script execution; Result of the runme_xeon32 script execution; Result of the runme_xeon64 script execution; Simple help file; Extended help file. runme_itanium, runme_xeon32, runme_xeon64. To run the software for other problem sizes, see the extended help included with the program. Extended help can be viewed by running the program executable with the -e option: ./xlinpack_itanium -e, ./xlinpack_xeon32 -e, ./xlinpack_xeon64 -e. The pre-defined data input files lininput_itanium, lininput_xeon32, and lininput_xeon64 are provided merely as examples. Different systems have different numbers of processors or amounts of memory and thus require new input files. The extended help can be used for insight into proper ways to change the sample input files. Each input file requires at least the following amount of memory: lininput_itanium: 16 GB; lininput_xeon32: 2 GB; lininput_xeon64: 16 GB. If the system has less memory than the above sample data input requires, you may need to edit or create your own data input files, as explained in the extended help. Each sample script, in particular, uses the OMP_NUM_THREADS environment variable to set the number of processors it is targeting. To optimize performance on a different number of physical processors, change that line appropriately. If you run t
...t the number of threads to one by any of the available means (see Techniques to Set the Number of Threads).

This is more problematic because setting the OMP_NUM_THREADS environment variable affects both the compiler's threading library and libiomp (libguide). In this case, choose the threading library that matches the layered Intel MKL with the OpenMP compiler you employ (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, link with the appropriate threading library, libmkl_sequential.a or libmkl_sequential.so (see High-level Directory Structure).

The threading software will see multiple processors on the system even though each processor has a separate MPI process running on it. In this case, one of the solutions is to set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). Section Intel Optimized MP LINPACK Benchmark for Clusters discusses another solution for a hybrid (OpenMP + MPI) mode.

See also:
Using Additional Threading Control
Linking with Compiler Support RTLs

Setting the Number of Threads Using an OpenMP Environment Variable

You can set the number of threads using the environment variable OMP_NUM_THREADS. To change the number of threads, use the appropriate command in the command shell in which the program is going to run, for example:
...t all Intel MKL functions work correctly during simultaneous execution by multiple threads. In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other.

The library uses OpenMP threading software, so you can use the environment variable OMP_NUM_THREADS, or the equivalent OpenMP run-time function calls, to specify the number of threads. Intel MKL also offers variables that are independent of OpenMP, such as MKL_NUM_THREADS, and equivalent Intel MKL functions for thread management. The Intel MKL variables are always inspected first, then the OpenMP variables are examined, and if neither is used, the OpenMP software chooses the default number of threads.

Starting with Intel MKL 10.0, the OpenMP software determines the default number of threads. For Intel OpenMP libraries, the default number of threads is equal to the number of logical processors in your system. To achieve higher performance, set the number of threads to the number of real processors or physical cores, as summarized in Techniques to Set the Number of Threads.

See also:
Setting the Number of Threads Using an OpenMP Environment Variable
Changing the Number of Threads at Run Time
Using Additional Threading Cont
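The lookup order described for thread-count settings has three steps: the Intel MKL variable is checked first, then the OpenMP variable, then the run-time default. It can be modeled as a small function. This is a hedged sketch of the documented precedence, not Intel MKL's implementation. It reads MKL_NUM_THREADS and OMP_NUM_THREADS with getenv and treats a missing, non-numeric, or non-positive value as unset. The caller supplies the fallback default, for example the number of logical processors. The function names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a positive thread count; return 0 if absent or invalid. */
static int parse_threads(const char *s) {
    if (s == NULL || *s == '\0') return 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > INT_MAX) return 0;
    return (int)v;
}

/* Model of the documented precedence: the Intel MKL variable is inspected
   first, then the OpenMP variable; if neither is usable, fall back to the
   run-time default supplied by the caller. */
int resolve_num_threads(const char *mkl_num_threads,
                        const char *omp_num_threads,
                        int runtime_default) {
    int n = parse_threads(mkl_num_threads);
    if (n > 0) return n;
    n = parse_threads(omp_num_threads);
    if (n > 0) return n;
    return runtime_default;
}

/* Convenience wrapper reading both variables from the environment. */
int resolve_num_threads_from_env(int runtime_default) {
    return resolve_num_threads(getenv("MKL_NUM_THREADS"),
                               getenv("OMP_NUM_THREADS"),
                               runtime_default);
}
```

Keep in mind that the resolved value is only a suggestion: Intel MKL may still use a different number of threads than the one suggested.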
...t in HPL 2.0.
src/pgesv/HPL_pdgesvK2.c      HPL 2.0 code modified to do ASYOUGO and ENDEARLY modifications
src/pgesv/HPL_pdgesv0.c       HPL 2.0 code modified to do ASYOUGO, ASYOUGO2, and ENDEARLY modifications

Table 11-2 Contents of the MP LINPACK Benchmark (continued)

benchmarks/mp_linpack/
testing/ptest/HPL.dat         HPL 2.0 sample HPL.dat modified
Make.ia32                     (New) Sample architecture makefile for processors using the IA-32 architecture and Linux OS
Make.em64t                    (New) Sample architecture makefile for processors using the Intel 64 architecture and Linux OS
Make.ipf                      (New) Sample architecture makefile for the IA-64 architecture and Linux OS
HPL.dat                       A repeat of testing/ptest/HPL.dat in the top-level directory

Next six files are prebuilt executables readily available for simple performance testing.
bin_intel/ia32/xhpl_ia32                     (New) Prebuilt binary for the IA-32 archit...
bin_intel/ia32/xhpl_ia32_dynamic
bin_intel/em64t/xhpl_em64t
bin_intel/em64t/xhpl_em64t_dynamic
bin_intel/ipf/xhpl_ipf
bin_intel/ipf/xhpl_ipf_dynamic

Next six files:
bin_intel/ia32/xhpl_hybrid_ia32
bin_intel/ia32/xhpl_hybrid_ia32_dynamic
bin_intel/em64t/xhpl_hybrid_em64t
bin_intel/em64t/xhpl_hybrid_em64t_dynamic
bin_intel/ipf/xhpl_hybrid_ipf
bin_intel/ipf/xhpl_hybrid_ipf_dynamic
...terface
FFTW 3.x interfaces to the Intel MKL FFTs (C interface)
FFTW 3.x interfaces to the Intel MKL FFTs (Fortran interface)
Fortran 95 interfaces to LAPACK and a makefile to build the library
Static libraries and shared objects for the IA-32 architecture

Table 3-2 High-level Directory Structure (continued)

Directory                                          Contents
<mkl directory>/lib/64                             Static libraries and shared objects for the IA-64 architecture (Itanium processor family)
<mkl directory>/lib/em64t                          Static libraries and shared objects for the Intel 64 architecture
<mkl directory>/man/en_US/man3                     Man pages for Intel MKL functions for the stand-alone Intel MKL
<mkl directory>/tests                              Source and data files for tests
<mkl directory>/tools/builder                      Tools for creating custom dynamically linkable libraries
<mkl directory>/tools/environment                  Shell scripts to set environmental variables in the user shell
<mkl directory>/tools/plugins/com.intel.mkl.help   Eclipse IDE plug-in with Intel MKL Reference Manual in WebHelp format. See mkl_documentation.htm for more information
<Intel Compiler Pro directory>/documentation/en_US/mkl   Documentation for Intel MKL included in the Intel C++ Compiler Professional Edition or Intel Fortran Compiler Professional Edition
<Intel Compiler Pro directory>/man/en_US/man3      Man pages for Intel MKL functions in the Intel C++ Compiler Professi
...the examples. To browse the documentation, open the index file in the docs directory (created by the build script):

    <mkl directory>/examples/java/docs/index.html

The Java wrappers for CBLAS, VML, VSL RNG, and FFT establish an interface that directly corresponds to the underlying native functions, so you can refer to the Intel MKL Reference Manual for their functionality and parameters. Interfaces for the ESSL-like functions are described in the generated documentation for the com.intel.mkl.ESSL class.

Each wrapper consists of the interface part for Java and a JNI stub written in C. You can find the sources in the following directory:

    <mkl directory>/examples/java/wrappers

Both the Java and C parts of the wrapper for CBLAS and VML demonstrate the straightforward approach, which you may use to cover additional CBLAS functions.

The wrapper for FFT is more complicated because it needs to support the lifecycle of FFT descriptor objects. To compute a single Fourier transform, an application needs to call the FFT software several times with the same copy of the native FFT descriptor. The wrapper provides the handler class to hold the native descriptor while the virtual machine runs Java bytecode.

The wrapper for VSL RNG is similar to the one for FFT. The wrapper provides the handler class to hold the native descriptor of the stream state.

The wrapper for the convolution and
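The FFT wrapper has to preserve a specific descriptor lifecycle: create the descriptor once, use the same native object for many calls, then free it. That lifecycle is sketched below in plain C. The handle type and functions (fft_handle, fft_handle_create, fft_handle_reorder, fft_handle_free) are hypothetical stand-ins, not the Intel MKL DFTI API. To stay self-contained, the "transform" here is only the bit-reversal reordering step of a radix-2 FFT. The permutation is precomputed at creation, which is exactly the kind of setup state that makes keeping the descriptor alive across calls worthwhile.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical native descriptor: transform length plus a precomputed
   bit-reversal permutation (one setup step of a radix-2 FFT). */
typedef struct {
    size_t n;       /* transform length, a power of two */
    size_t *perm;   /* perm[i] = i with its log2(n) bits reversed */
} fft_handle;

/* Create once: the setup cost is paid here, not on every call. */
fft_handle *fft_handle_create(size_t n) {
    if (n == 0 || (n & (n - 1)) != 0) return NULL;   /* power of two only */
    fft_handle *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->perm = malloc(n * sizeof *h->perm);
    if (h->perm == NULL) { free(h); return NULL; }
    h->n = n;
    size_t bits = 0;
    while (((size_t)1 << bits) < n) bits++;
    for (size_t i = 0; i < n; i++) {
        size_t r = 0;
        for (size_t b = 0; b < bits; b++)
            if (i & ((size_t)1 << b)) r |= (size_t)1 << (bits - 1 - b);
        h->perm[i] = r;
    }
    return h;
}

/* Use many times: reorder `in` into `out` with the stored permutation. */
void fft_handle_reorder(const fft_handle *h, const double *in, double *out) {
    for (size_t i = 0; i < h->n; i++) out[h->perm[i]] = in[i];
}

/* Free once, after the last call that needs the descriptor. In a JNI
   wrapper, the Java-side handler object would trigger this (assumed design). */
void fft_handle_free(fft_handle *h) {
    if (h != NULL) { free(h->perm); free(h); }
}
```

For n = 8, reordering 0, 1, ..., 7 yields 0, 4, 2, 6, 1, 5, 3, 7 on every call with the same handle. Recreating the handle per call would repeat the setup, and freeing it early would break later calls. This is why a wrapper keeps the native object alive between calls.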
...the performance drop-off is the grid dimensions P and Q. For big problems, the performance tends to fall off less from the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps.

Using these tools will greatly increase the amount of data you can test.

Intel Math Kernel Library Language Interfaces Support

Table A-1 shows the language interfaces that Intel Math Kernel Library (Intel MKL) provides for each function domain, and Table A-2 lists the respective header files. However, Intel MKL routines can be called from other languages using mixed-language programming. See Mixed-language Programming with Intel MKL for an example of how to call Fortran routines from C/C++.

Table A-1 Language Interfaces Support

Function Domain                                   FORTRAN 77   Fortran 90/95   C/C++
                                                  interface    interface       interface
Basic Linear Algebra Subprograms (BLAS)           Yes          Yes             via CBLAS
BLAS-like extension transposition routines        Yes                          Yes
Sparse BLAS Level 1                               Yes          Yes             via CBLAS
Sparse BLAS Level 2 and 3                         Yes          Yes             Yes
LAPACK routines for solving systems of linear
  equations                                       Yes          Yes             †
LAPACK routines for solving least squares
  problems, eigenvalue and singular value
  problems, and Sylvester's equations             Yes          Yes             †
Auxiliary and utility LAPACK routines             Yes                          †
Parallel Basic Linear Algebra Subprograms (PB...  Yes
...tion 7-4
configuration file, for OOC DSS/PARDISO 4-4
configuring development environment 4-1
  Eclipse CDT 4-2
context-sensitive Help, for Intel(R) MKL, in Eclipse CDT 10-4
custom shared object 5-9, 5-11
  building 5-9
  specifying list of functions 5-11
  specifying makefile parameters 5-10
D
data alignment 8-2
denormal number, performance 6-17
denormal, performance 6-17
development environment, configuring 4-1
directory structure
  documentation 3-20
  high-level 3-1
  in-detail 3-8
dispatching of AVX instructions 6-13
documentation 3-20
  for Intel(R) MKL, viewing in Eclipse IDE 10-1
Eclipse CDT
  configuring 4-2
  searching the Intel Web site 10-3
Eclipse CDT, Intel(R) MKL Help 10-1
  context-sensitive 10-4
end user license, location 3-20
environment variables, setting 4-1
examples
  code 2-2
  linking, general 5-7
  ScaLAPACK, Cluster FFT, linking with 9-4
F
FFT functions, data alignment 6-15
FFT interface, optimized radices 6-17
FFTW interface support B-1
Fortran 95 interfaces to LAPACK and BLAS 7-3
G
GNU Multiple Precision Arithmetic Library B-1
H
Help, for Intel(R) MKL in Eclipse CDT 10-1
HT Technology, see Hyper-Threading technology
hybrid version, of MP LINPACK 11-4
Hyper-Threading Technology, configuration tip 6-15
ILP64 programming, support for 3-6
instability, numerical, getting rid of 8-1
installation, c
...tive memory
• Check correctness of function parameters
• Demonstrate performance optimizations

The examples use the Java Native Interface (JNI) developer framework to bind with Intel MKL. The JNI documentation is available from http://java.sun.com/javase/6/docs/technotes/guides/jni.

The Java example set includes JNI wrappers that perform the binding. The wrappers do not depend on the examples and may be used in your Java applications. The wrappers for CBLAS, FFT, VML, VSL RNG, and ESSL-like convolution and correlation functions do not depend on each other.

To build the wrappers, just run the examples (see Running the Examples for details). The makefile builds the wrapper binaries. After running the makefile, you can run the examples, which will determine whether the wrappers were built correctly. As a result of running the examples, the following directories will be created in <mkl directory>/examples/java:
• docs
• include
• classes
• bin
• results

The directories docs, include, classes, and bin will contain the wrapper binaries and documentation; the directory results will contain the testing results.

For a Java programmer, the wrappers are the following Java classes:
• com.intel.mkl.CBLAS
• com.intel.mkl.DFTI
• com.intel.mkl.ESSL
• com.intel.mkl.VML
• com.intel.mkl.VSL

Documentation for the particular wrapper and example classes will be generated from the Java sources while building and running
...to their OpenMP equivalents, but take precedence over them in the sense that the MKL-specific threading controls are inspected first. By using these controls along with OpenMP variables, you can thread the part of the application that does not call Intel MKL and the library independently of each other.

These controls enable you to specify the number of threads for Intel MKL independently of the OpenMP settings. Although Intel MKL may actually use a different number of threads from the number suggested, the controls will also enable you to instruct the library to try using the suggested number when the number used in the calling application is unavailable.

NOTE. Intel MKL does not always have a choice of the number of threads, for reasons such as system resources.

Use of the Intel MKL threading controls in your application is optional. If you do not use them, the library will mainly behave the same way as Intel MKL 9.1 with respect to threading, with the possible exception of a different default number of threads.

Section "Number of User Threads" in the "Fourier Transform Functions" chapter of the Intel MKL Reference Manual shows how the Intel MKL threading controls help to set the number of threads for the FFT computation.

Table 6-2 lists the Intel MKL environment variables for threading control, their equivalent functions, and their OMP counterparts.

Table 6-2 Environment Variables f
...tran-specific linking examples, and describes the supported MPI.

Getting Assistance for Programming in the Eclipse IDE: Discusses Intel MKL features that assist you while programming in the Eclipse IDE.

LINPACK and MP LINPACK Benchmarks: Describes the Intel Optimized LINPACK Benchmark for Linux OS and the Intel Optimized MP LINPACK Benchmark for Clusters.

Intel Math Kernel Library Language Interfaces Support: Summarizes information on language interfaces that Intel MKL provides for each function domain, including the respective header files.

Support for Third-Party Interfaces: Describes some interfaces that Intel MKL supports.

The document also includes an Index.

Notational Conventions

The following term is used to refer to the operating system:

Linux OS    This term refers to information that is valid on all supported Linux operating systems.

The following notation is used in reference to Intel MKL directories:

<mkl directory>    The main directory where Intel MKL is installed. Replace this placeholder with the specific pathname in the configuring, linking, and building instructions. For more information, see Getting Started.

<Intel Compiler Pro directory>    The installation directory for the Intel C++ Compiler Professional Edition or Intel Fortran Compiler Professional Edition. For more information, see Getting Started.

Table 1-1 lists the other notational conventions.

Table 1-1 Notatio
...tween different versions of JDK. The examples and wrappers include workarounds for these problems. Look at the source code in the examples and wrappers for comments that describe the workarounds.

Coding Tips

This chapter discusses programming with the Intel Math Kernel Library (Intel MKL) to provide coding tips that meet certain specific needs, such as numerical stability. Similarly, Chapter 7 focuses on general language-specific programming options, and Chapter 6 provides tips relevant to performance and memory management.

Aligning Data for Numerical Stability

If linear algebra routines (LAPACK, BLAS) are applied to input data that are bit-for-bit identical, but the arrays are aligned differently or the computations are performed either on different platforms or with different numbers of threads, the outputs may not be bit-for-bit identical, though they will deviate only within the appropriate error bounds. The Intel MKL version may also affect numerical stability of the output, as the routines may be implemented differently in different versions. With a given Intel MKL version, the outputs will be bit-for-bit identical provided all the following conditions are met:
• the outputs are obtained on the same platform;
• the inputs are bit-for-bit identical;
• the input arrays are aligned identically at 16-byte boundaries;
• Intel MKL is run in the sequential mode.

Unlike the first two conditions, which you control, the alignment of arrays by d
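Of the conditions listed under Aligning Data for Numerical Stability, array alignment is one you can check and enforce in your own code. The sketch below tests whether a pointer lies on a 16-byte boundary and allocates double arrays on such a boundary with C11 aligned_alloc. The helper names are hypothetical. Aligning the arrays identically does not by itself remove the other sources of variation (platform, thread count, Intel MKL version).

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Allocates n doubles starting on a 16-byte boundary (C11 aligned_alloc).
   C11 requires the size to be a multiple of the alignment, so round up.
   Release the result with free(). Returns NULL on failure or overflow. */
double *alloc_doubles_aligned16(size_t n) {
    if (n > (SIZE_MAX - 15u) / sizeof(double)) return NULL;
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 15u) & ~(size_t)15u;   /* round up to a multiple of 16 */
    if (bytes == 0) bytes = 16u;
    return aligned_alloc(16, bytes);
}
```

Allocating every input array for a given computation this way makes the alignment condition hold across runs, so that only the remaining conditions (same platform, identical inputs, sequential mode) need attention.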
...uential library
Computational layer
libmkl_cdft_core.a                      Cluster version of FFTs
libmkl_core.a                           Kernel library for the IA-64 architecture
libmkl_scalapack_ilp64.a                ScaLAPACK routine library supporting the ILP64 interface
libmkl_scalapack_lp64.a                 ScaLAPACK routine library supporting the LP64 interface
libmkl_solver_ilp64.a                   Deprecated. Empty library for backward compatibility
libmkl_solver_ilp64_sequential.a        Deprecated. Empty library for backward compatibility
libmkl_solver_lp64.a                    Deprecated. Empty library for backward compatibility
libmkl_solver_lp64_sequential.a         Deprecated. Empty library for backward compatibility

Table 3-8 Detailed Structure of the IA-64 Architecture Directory lib/64 (continued)

File                                    Contents
RTL
libguide.a                              Legacy OpenMP run-time library for static linking
libiomp5.a                              Compatibility OpenMP run-time library for static linking
libmkl_blacs_ilp64.a                    ILP64 version of BLACS routines supporting the following MPICH versions:
                                        • Myricom MPICH version 1.2.5.10
                                        • ANL MPICH version 1.2.5.2
libmkl_blacs_intelmpi_ilp64.a           ILP64 version of BL...
libmkl_blacs_intelmpi_lp64.a
libmkl_blacs_intelmpi20_ilp64.a
libmkl_blacs_intelmpi20_lp64.a
libmkl_blacs_lp64.a
libmkl_blacs_openmpi_ilp64.a
libmkl_blacs_openmpi_lp64.a
libmkl_blacs_sgimpt_ilp64.a
libmkl_blacs_sgimpt_lp64.a
...uential mode of the library. The layer is compiled for different environments (threaded or sequential) and compilers (from Intel, GNU, and so on).

Table 3-3 Intel MKL Layers (continued)

Layer                   Description
Computational Layer     Heart of Intel MKL. This layer has only one library for each combination of architecture and supported OS. The Computational layer accommodates multiple architectures through identification of architecture features and chooses the appropriate binary code at run time.
Compiler Support        Intel MKL provides compiler support RTLs only for Intel compilers: the compatibility OpenMP run-time library (libiomp) and the legacy OpenMP run-time library (libguide). To thread using third-party threading compilers, use libraries in the Threading layer or an appropriate compatibility library (for more information, see Linking with Threading Libraries).
Run-time Libraries
(RTL)

Sequential Mode of the Library

You can use Intel MKL in a sequential (non-threaded) mode. In this mode, Intel MKL runs unthreaded code. However, it is thread-safe, which means that you can use it in a parallel region in your OpenMP code. The sequential mode requires no compatibility OpenMP or legacy OpenMP run-time library and does not respond to the environment variable OMP_NUM_THREADS or its Intel MKL equivalents.

You should use the library in the sequential mode only if you have a particular reason not to use
...ur cluster:

1. Install HPL and make sure HPL is functional on all the nodes.
2. You may run nodeperf.c (included in the distribution) to see the performance of DGEMM on all the nodes. Compile nodeperf.c with your MPI and Intel MKL. For example:

       mpiicc -O3 nodeperf.c -L$MKLPATH $MKLPATH/libmkl_intel_lp64.a \
         -Wl,--start-group $MKLPATH/libmkl_sequential.a \
         $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread

   Launching nodeperf.c on all the nodes is especially helpful in a very large cluster. nodeperf enables quick identification of the potential problem spot without numerous small MP LINPACK runs around the cluster in search of the bad node. It goes through all the nodes, one at a time, and reports the performance of DGEMM followed by some host identifier. Therefore, the higher the DGEMM performance, the faster that node was performing.
3. Edit HPL.dat to fit your cluster needs. Read through the HPL documentation for ideas on this. However, you should use at least 4 nodes.
4. Make an HPL run, using compile options such as ASYOUGO, ASYOUGO2, or ENDEARLY to aid in your search. These options enable you to gain insight into the performance sooner than HPL would normally give this insight.

   When doing so, follow these recommendations:
   • Use MP LINPACK, which is a patched version of HPL, to save time in the search. All performance-intrusive features are compile-optional in MP LI
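nodeperf reports a DGEMM rate per node so that a slow node stands out. Once you have such per-node figures, flagging outliers is simple arithmetic. The hypothetical helpers below convert a timed n-by-n DGEMM into GFLOP/s, using the standard 2 * n^3 operation count. They then mark nodes whose rate falls below a chosen fraction of the median. The threshold and function names are assumptions, and nodeperf's actual output format is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* GFLOP/s for an n x n x n DGEMM (2 * n^3 floating-point operations)
   that took `seconds` to run. */
double dgemm_gflops(double n, double seconds) {
    return 2.0 * n * n * n / seconds / 1e9;
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sets flags[i] = 1 for nodes whose rate is below `fraction` of the median
   rate over all nodes, 0 otherwise. Returns the number of flagged nodes,
   or -1 if the temporary copy cannot be allocated. */
int flag_slow_nodes(const double *gflops, size_t count, double fraction,
                    int *flags) {
    if (count == 0) return 0;
    double *sorted = malloc(count * sizeof *sorted);
    if (sorted == NULL) return -1;
    memcpy(sorted, gflops, count * sizeof *sorted);
    qsort(sorted, count, sizeof *sorted, cmp_double);
    double median = (count % 2) ? sorted[count / 2]
                                : 0.5 * (sorted[count / 2 - 1] + sorted[count / 2]);
    free(sorted);
    int slow = 0;
    for (size_t i = 0; i < count; i++) {
        flags[i] = gflops[i] < fraction * median;
        slow += flags[i];
    }
    return slow;
}
```

For instance, with rates of 40.0, 41.0, 39.5, 20.0, and 40.5 GFLOP/s and a fraction of 0.9, only the fourth node is flagged: 20.0 is below 0.9 times the median of 40.0.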
...ur threads.
MKL_VML=2    VML should try two threads. The setting affects no other part of Intel MKL.

Be aware that the domain-specific settings take precedence over the overall ones. For example, the "MKL_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests trying four threads for BLAS, regardless of a later setting of MKL_NUM_THREADS. Likewise, a function call "mkl_domain_set_num_threads(4, MKL_BLAS);" suggests the same, regardless of later calls to mkl_set_num_threads(). However, a function call with input "MKL_ALL", such as "mkl_domain_set_num_threads(4, MKL_ALL);", is equivalent to "mkl_set_num_threads(4)", and thus it will be overwritten by later calls to mkl_set_num_threads. Similarly, the environment setting of MKL_DOMAIN_NUM_THREADS with "MKL_ALL=4" will be overwritten with MKL_NUM_THREADS=2.

Whereas the MKL_DOMAIN_NUM_THREADS environment variable enables you to set several variables at once, for example, "MKL_BLAS=4,MKL_FFT=2", the corresponding function does not take string syntax. So, to do the same with function calls, you may need to make several calls, which in this example are as follows:

    mkl_domain_set_num_threads(4, MKL_BLAS);
    mkl_domain_set_num_threads(2, MKL_FFT);

Setting the Environment Variables for Threading Control

To set the environment variables used for threading control, in the command shell in which the program is going to run, enter the export or set
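The override rules for MKL_DOMAIN_NUM_THREADS and mkl_domain_set_num_threads can be summarized as a small state model. A per-domain suggestion, once set, wins over the overall one, and an MKL_ALL setting behaves exactly like the overall setting. The sketch below is a hypothetical model of those documented rules, not Intel MKL code. It does not call mkl_domain_set_num_threads or mkl_set_num_threads, and its domain list is limited to BLAS, FFT, and VML.

```c
/* Hypothetical model of the documented precedence rules. */
enum domain { DOM_BLAS, DOM_FFT, DOM_VML, DOM_COUNT };

typedef struct {
    int global;                 /* like MKL_NUM_THREADS / mkl_set_num_threads */
    int per_domain[DOM_COUNT];  /* 0 means "no domain-specific setting" */
} thread_settings;

void settings_init(thread_settings *s, int default_threads) {
    s->global = default_threads;
    for (int d = 0; d < DOM_COUNT; d++) s->per_domain[d] = 0;
}

/* Overall setting; an MKL_ALL setting amounts to the same thing, so a
   later overall setting overwrites it. */
void settings_set_all(thread_settings *s, int n) { s->global = n; }

/* Domain-specific setting; survives later overall settings. */
void settings_set_domain(thread_settings *s, enum domain d, int n) {
    s->per_domain[d] = n;
}

/* Suggested number of threads for a domain. */
int settings_suggested(const thread_settings *s, enum domain d) {
    return s->per_domain[d] > 0 ? s->per_domain[d] : s->global;
}
```

For example, setting BLAS to 4 and then the overall value to 2 leaves BLAS at 4, while FFT and VML follow the overall value of 2. This matches the MKL_BLAS=4 example above.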
...urrently use the GMP library, you need to modify INCLUDE statements in your programs to mkl_gmp.h.

FFTW Interface Support

Intel MKL offers two collections of wrappers for the FFTW interface (www.fftw.org). The wrappers are the superstructure of FFTW to be used for calling the Intel MKL Fourier transform functions. These collections correspond to the FFTW versions 2.x and 3.x and the Intel MKL versions 7.0 and later.

These wrappers enable using Intel MKL Fourier transforms to improve the performance of programs that use FFTW, without changing the program source code. See the "FFTW Interface to Intel Math Kernel Library" appendix in the Intel MKL Reference Manual for details on the use of the wrappers.

Index

A
Advanced Vector Extensions, dispatching the instructions 6-13
affinity mask 6-16
aligning data 8-2
benchmark 11-1
BLAS
  calling routines from C 7-5
  Fortran 95 interfaces to 7-3
C
C/C++
  calling LAPACK, BLAS, CBLAS from 7-5
  calling BLAS functions in C 7-7
  complex BLAS Level 1 function from C 7-8
  complex BLAS Level 1 function from C++ 7-9
  Fortran-style routines from C 7-5
CBLAS 7-6
CBLAS, code example 7-10
Cluster FFT, linking with 9-1
cluster software 9-1
  linking examples 9-4
  linking syntax 9-1
coding
  data alignment 8-1
  mixed-language calls 7-7
  techniques to improve performance 6-14
compatibility OpenMP run-time library 3-5
compiler support 2-2
compiler-dependent func
...use.
Reason: Intel MKL is based on OpenMP threading. By default, the OpenMP software sets the number of threads that Intel MKL uses. If you need a different number, you have to set it yourself using one of the available mechanisms. For more information, see Using the Intel MKL Parallelism.

Linking model: Decide which linking model is appropriate for linking your application with Intel MKL libraries:
• Static
• Dynamic
Reason: The link line syntax and libraries for static and dynamic linking are different. For the list of link libraries for static and dynamic models, linking examples, and other relevant topics, like how to save disk space by creating a custom dynamic library, see Linking Your Application with the Intel Math Kernel Library.

MPI used: Decide what MPI you will use with the Intel MKL cluster software. You are strongly encouraged to use Intel MPI 3.x.
Reason: To link your application with ScaLAPACK and/or Cluster FFT, the libraries corresponding to your particular MPI should be listed on the link line (see Working with the Intel Math Kernel Library Cluster Software).

Intel Math Kernel Library Structure

The chapter discusses the structure of the Intel Math Kernel Library (Intel MKL), including the Intel MKL directory structure, architecture-specific implementations, supported programming interfaces, and more.

Starting with version 10.0, Intel MKL uses a layered model to streamline the libra
...ut notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

This document contains information on products in the design phase of development.

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel Leap ahead, Intel Leap ahead logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, IPLink, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDChar
...value suggested. For example, if you attempt to do a size-one matrix-matrix multiply across eight threads, the library may instead choose to use only one thread because it is impractical to use eight threads in this event.

Note also that if Intel MKL is called in a parallel region, it will use only one thread by default. If you want the library to use nested parallelism, and the thread within a parallel region is compiled with the same OpenMP compiler as Intel MKL is using, you may experiment with setting MKL_DYNAMIC to FALSE and manually increasing the number of threads.

In general, set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, to use nested parallelism where the library is already called from a parallel section.

MKL_DOMAIN_NUM_THREADS

The MKL_DOMAIN_NUM_THREADS environment variable suggests the number of threads for a particular function domain.

MKL_DOMAIN_NUM_THREADS accepts a string value <MKL-env-string>, which must have the following format:

    <MKL-env-string> ::= <MKL-domain-env-string> { <delimiter> <MKL-domain-env-string> }
    <delimiter> ::= [ <space-symbol>* ] ( <space-symbol> | <comma-symbol> | <semicolon-symbol> | <colon-symbol> ) [ <space-symbol>* ]
    <MKL-domain-env-string> ::= <MKL-domain-env-name> <uses> <number-of-threads>
    <MKL-domain-env-name> ::= MKL_ALL | MKL_BLAS | MK
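As an illustration of the MKL_DOMAIN_NUM_THREADS grammar, the following hypothetical parser accepts strings such as "MKL_BLAS=4, MKL_FFT=2" and records a thread count per domain. It makes three assumptions. First, the <uses> rule is cut off in this excerpt, so the sketch accepts only optional spaces and an optional equality sign between a name and its number. Second, the domain names are limited to MKL_ALL, MKL_BLAS, MKL_FFT, and MKL_VML, as used in this chapter. Third, an unknown name makes the parse fail. This is not how Intel MKL itself parses the variable.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Index 0 = MKL_ALL, 1 = MKL_BLAS, 2 = MKL_FFT, 3 = MKL_VML.
   A stored value of 0 means the domain was not mentioned. */
enum { N_DOMAINS = 4 };
static const char *const domain_names[N_DOMAINS] = {
    "MKL_ALL", "MKL_BLAS", "MKL_FFT", "MKL_VML"
};

/* <delimiter> characters from the grammar: space, comma, semicolon, colon. */
static int is_delim(char c) {
    return c == ' ' || c == ',' || c == ';' || c == ':';
}

/* Parses s into threads[]; returns 0 on success, -1 on a syntax error. */
int parse_domain_threads(const char *s, int threads[N_DOMAINS]) {
    for (int d = 0; d < N_DOMAINS; d++) threads[d] = 0;
    const char *p = s;
    while (*p != '\0') {
        while (is_delim(*p)) p++;               /* skip <delimiter> */
        if (*p == '\0') break;
        size_t len = 0;                         /* <MKL-domain-env-name> */
        while (isalnum((unsigned char)p[len]) || p[len] == '_') len++;
        int d = -1;
        for (int k = 0; k < N_DOMAINS; k++)
            if (strlen(domain_names[k]) == len &&
                strncmp(p, domain_names[k], len) == 0) d = k;
        if (d < 0) return -1;                   /* unknown domain name */
        p += len;
        while (*p == ' ') p++;                  /* assumed <uses>: spaces */
        if (*p == '=') p++;                     /* and/or one '=' */
        while (*p == ' ') p++;
        char *end = NULL;
        long n = strtol(p, &end, 10);           /* <number-of-threads> */
        if (end == p || n < 1 || n > INT_MAX) return -1;
        threads[d] = (int)n;
        p = end;
        if (*p != '\0' && !is_delim(*p)) return -1;
    }
    return 0;
}
```

For instance, "MKL_BLAS=4, MKL_FFT=2" yields 4 for MKL_BLAS and 2 for MKL_FFT, and the other domains stay at 0 (not mentioned). An unknown name such as MKL_LAPACK makes the parse fail.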
...vided. Choose the ILP64 interface if your application uses Intel MKL for calculations with huge data arrays, or if the library may be used this way in the future.

Intel MKL provides the same include directory for the ILP64 and LP64 interfaces.

Compiling for LP64/ILP64

Table 3-4 shows how to compile for the ILP64 and LP64 interfaces.

Table 3-4 Compiling for the ILP64 and LP64 Interfaces

Fortran
  Compiling for ILP64    ifort -i8 -I<mkl directory>/include
  Compiling for LP64     ifort -I<mkl directory>/include
C or C++
  Compiling for ILP64    icc -DMKL_ILP64 -I<mkl directory>/include
  Compiling for LP64     icc -I<mkl directory>/include

CAUTION. Linking an application compiled with the -i8 or -DMKL_ILP64 option to the LP64 libraries may result in unpredictable consequences and erroneous output.

Coding for ILP64

You do not need to change existing code if you are not using the ILP64 interface.

To migrate to ILP64 or write new code for ILP64, use appropriate types for parameters of the Intel MKL functions and subroutines (see Table 3-5).

Table 3-5 Integer Types

                                        Fortran                            C or C++
32-bit integers                         INTEGER*4 or INTEGER(KIND=4)       int
Universal integers for ILP64/LP64:      INTEGER without specifying KIND    MKL_INT
  • 64-bit for ILP64
  • 32-bit otherwise
Universal integers for ILP64/LP64:      INTEGER*8 or INTEGER(KIND=8)       MKL_INT64
  • 64-bit integers
FFT interface integers for ILP64/LP6
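The distinction between the LP64 and ILP64 integer types can be mirrored in your own code with a compile-time switch keyed on the same -DMKL_ILP64 macro shown in Table 3-4. The typedef below is a hypothetical application-side alias (app_int), not Intel MKL's own MKL_INT definition. The helper shows the check that motivates choosing ILP64: whether a dimension or element count fits in a 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias following the same -DMKL_ILP64 switch:
   64-bit with ILP64, 32-bit otherwise (compare MKL_INT in Table 3-5). */
#ifdef MKL_ILP64
typedef int64_t app_int;
#else
typedef int32_t app_int;
#endif

/* Nonzero if a dimension or element count can be passed through a
   32-bit integer parameter, as the LP64 interface requires. */
int fits_lp64(size_t count) {
    return count <= (size_t)INT32_MAX;
}
```

For example, a 50,000 by 50,000 matrix has 2,500,000,000 elements, more than INT32_MAX (2,147,483,647), so fits_lp64() returns 0 for that count. This is the kind of huge array for which the guide recommends the ILP64 interface.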
...y linked against Intel MPI 3.2
(New) Example of an MP LINPACK benchmark input file for a hybrid binary and the IA-32 architecture
(New) Sample run script for the Intel 64 architecture and a pure MPI binary statically linked against Intel MPI 3.2
(New) Sample run script for the Intel 64 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2
(New) Example of an MP LINPACK benchmark input file for a pure MPI binary and the Intel 64 architecture
(New) Sample run script for the Intel 64 architecture and a hybrid binary statically linked against Intel MPI 3.2
(New) Sample run script for the Intel 64 architecture and a hybrid binary dynamically linked against Intel MPI 3.2

Table 11-2 Contents of the MP LINPACK Benchmark (continued)

benchmarks/mp_linpack/
bin_intel/em64t/HPL_hybrid.dat            (New) Example of an MP LINPACK benchmark input file for a hybrid binary and the Intel 64 architecture
bin_intel/ipf/runme_ia64                  (New) Sample run script for the IA-64 architecture and a pure MPI binary statically linked against Intel MPI 3.2
bin_intel/ipf/runme_ia64_dynamic          (New) Sample run script for the IA-64 architecture and a pure MPI binary dynamically linke...
bin_intel/ipf/HPL_serial.dat
bin_intel/ipf/runme_hybrid_ia64
bin_intel/ipf/runme_hybrid_ia64_dynamic
bin_intel/ipf/HPL_hybrid.dat
nodeperf.c