
Intel(R) Math Kernel Library for Linux* OS User's Guide

1. Contents (fragment):
   Dynamic Libraries in the IA-32 Architecture Directory lib/ia32
   Detailed Structure of the Intel(R) 64 Architecture Directories
   Static Libraries in the lib/intel64 Directory
   Dynamic Libraries in the Intel(R) 64 Architecture Directory lib/intel64
   Index

Introducing the Intel(R) Math Kernel Library

The Intel(R) Math Kernel Library (Intel(R) MKL) improves the performance of scientific, engineering, and financial software that solves large computational problems. Intel MKL provides a set of linear algebra routines, fast Fourier transforms, and vectorized math and random number generation functions, all optimized for the latest Intel(R) processors, including processors with multiple cores (see the Intel(R) MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel processors.

Intel MKL provides the following major functionality:
• Linear algebra, implemented in LAPACK (solvers and eigensolvers) plus Level 1, 2, and 3 BLAS, offering the vector, vector-matrix, and matrix-matrix operations needed for complex mathematical software. If you prefer the FORTRAN 90/95 programming language, you can call LAPACK driver and computational subroutines through specially designed interfaces with reduced numbers of arguments. A C interface to LAPACK routines is also available.
• Sca
2. The last feature is native to the Eclipse IDE CDT. See the Code Assist description in Eclipse IDE Help for details.

Viewing the Intel(R) Math Kernel Library Reference Manual in the Eclipse IDE

To view the Reference Manual in Eclipse:
1. Select Help > Help Contents from the menu.
2. In the Help tab, under All Topics, click Intel(R) Math Kernel Library Help.
3. In the Help tree that expands, click Intel Math Kernel Library Reference Manual.

The Intel MKL Help Index is also available in Eclipse, and the Reference Manual is included in the Eclipse Help search.

[Figure: Eclipse Help window with the Intel(R) Math Kernel Library Help tree expanded to show the Reference Manual chapters]
3. export PATH=${JAVA_HOME}/bin:${PATH}

You may also need to clear the JDK_HOME environment variable, if it is assigned a value:

unset JDK_HOME

To start the examples, use the makefile found in the Intel MKL Java examples directory:

make {soia32|sointel64|libia32|libintel64} [function=...] [compiler=...]

If you type the make command and omit the target (for example, soia32), the makefile prints the help info, which explains the targets and parameters. For the examples list, see the examples.lst file in the Java examples directory.

Known Limitations of the Java Examples

This section explains limitations of the Java examples.

Functionality
Some Intel(R) Math Kernel Library (Intel(R) MKL) functions may fail to work if called from the Java environment through a wrapper like those provided with the Intel MKL Java examples. Only the specific CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions listed in the Intel MKL Java Examples section were tested with the Java environment, so you may use the Java wrappers for these functions in your Java applications.

Performance
Intel MKL functions are expected to run faster than similar functions written in pure Java. However, the main goal of these wrappers is to provide code examples, not maximum performance. So an Intel MKL function called from a Java application will prob
4. The Linking Advisor requests information about your system and about how you intend to use Intel MKL (link dynamically or statically, use threaded or sequential mode, and so on). The tool automatically generates the appropriate link line for your application.

See Also
• Linking Your Application with the Intel(R) Math Kernel Library
• Examples for Linking with ScaLAPACK and Cluster FFT

What You Need to Know Before You Begin Using the Intel(R) Math Kernel Library

Target platform
Identify the architecture of your target machine:
• IA-32 or compatible
• Intel(R) 64 or compatible
Reason: Because Intel MKL libraries are located in directories corresponding to your particular architecture (see Architecture Support), you should provide proper paths on your link lines (see Linking Examples). To configure your development environment for use with Intel MKL, set your environment variables using the script corresponding to your architecture (see Setting Environment Variables for details).

Mathematical problem
Identify all Intel MKL function domains that you require:
• BLAS
• Sparse BLAS
• LAPACK
• PBLAS
• ScaLAPACK
• Sparse Solver routines
• Vector Mathematical Library functions (VML)
• Vector Statistical Library functions
• Fourier Transform functions (FFT)

Remaining table rows (content not included in this excerpt): Programming language, Range of integer data, Threading model, Number of threads, Linking model 2
5. RTL, third-party threading compilers use libraries in the Threading layer or an appropriate compatibility library.

See Also
• Using the ILP64 Interface vs. LP64 Interface
• Linking Your Application with the Intel(R) Math Kernel Library
• Linking with Threading Libraries

Accessing the Intel(R) Math Kernel Library Documentation

Contents of the Documentation Directories

Most of the Intel(R) Math Kernel Library (Intel(R) MKL) documentation is installed at <Composer XE directory>/Documentation/<locale>/mkl. For example, the documentation in English is installed at <Composer XE directory>/Documentation/en_US/mkl. However, some Intel MKL-related documents are installed one or two levels up. The following table lists MKL-related documentation.

File name: Comment

Files in <Composer XE directory>/Documentation
  <locale>/clicense or <locale>/flicense: Common end user license for the Intel(R) C++ Composer XE 2011 or Intel(R) Fortran Composer XE 2011, respectively
  mklsupport.txt: Information on package number for customer support reference

Contents of <Composer XE directory>/Documentation/<locale>/mkl
  redist.txt: List of redistributable files
  mkl_documentation.htm: Overview and links for the Intel MKL documentation
  mkl_manual/index.htm: Intel MKL Reference Manual in an uncompressed HTML format
  Release_Notes.htm: Intel MKL Release Notes
  mkl_u
6. • A function call takes precedence over any environment variables. The exception, which is a consequence of the previous rule, is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS. See Using Additional Threading Control for more details.
• You cannot change run-time behavior in the course of the run using the environment variables, because they are read only once, at the first call to Intel MKL.

Setting the Number of Threads Using an OpenMP Environment Variable

You can set the number of threads using the environment variable OMP_NUM_THREADS. To change the number of threads, use the appropriate command in the command shell in which the program is going to run, for example:
• For the bash shell, enter: export OMP_NUM_THREADS=<number of threads to use>
• For the csh or tcsh shell, enter: setenv OMP_NUM_THREADS <number of threads to use>

See Also
• Using Additional Threading Control

Changing the Number of Threads at Run Time

You cannot change the number of threads during run time using environment variables. However, you can call OpenMP API functions from your program to change the number of threads during run time. The following sample code shows how to change the number of threads during run time using the omp_set_num_threads() routine. See also Techniques to Set the Number of Threads.
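The precedence rules quoted in these excerpts can be summarized in a few lines of C. This is only a sketch of the rules as stated, not Intel MKL code: the `thread_settings` structure, the `effective_threads` function, and the `fallback` parameter are hypothetical names introduced for illustration, a value of 0 stands for "not set", and the domain-specific controls (MKL_DOMAIN_NUM_THREADS, mkl_domain_set_num_threads) are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented precedence rules.
   A field value of 0 means "this control was not used". */
typedef struct {
    int mkl_call;  /* value passed to mkl_set_num_threads()   */
    int mkl_env;   /* value of MKL_NUM_THREADS                */
    int omp_call;  /* value passed to omp_set_num_threads()   */
    int omp_env;   /* value of OMP_NUM_THREADS                */
} thread_settings;

/* Returns the thread count that wins under the stated rules:
   Intel MKL controls are inspected before OpenMP controls, and within
   each group a function call outranks the environment variable.
   As a consequence, omp_set_num_threads() does not override
   MKL_NUM_THREADS. The fallback is an assumption of this sketch. */
static int effective_threads(const thread_settings *s, int fallback)
{
    assert(s != NULL && fallback > 0);
    if (s->mkl_call > 0) return s->mkl_call;
    if (s->mkl_env  > 0) return s->mkl_env;
    if (s->omp_call > 0) return s->omp_call;
    if (s->omp_env  > 0) return s->omp_env;
    return fallback;
}
```

For instance, with MKL_NUM_THREADS set to 4 and a later omp_set_num_threads(8) call, this model selects 4, which matches the exception described in the first rule.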
7. libmkl_sequential.a or libmkl_sequential.so (see High-level Directory Structure).

Case: There are multiple programs running on a multiple-CPU system, for example, a parallelized program that runs using MPI for communication in which each processor is treated as a node.
Discussion: The threading software will see multiple processors on the system even though each processor has a separate MPI process running on it. In this case, one of the solutions is to set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). Section Intel(R) Optimized MP LINPACK Benchmark for Clusters discusses another solution for a Hybrid (OpenMP + MPI) mode.

See Also
• Using Additional Threading Control
• Linking with Compiler Support RTLs

Techniques to Set the Number of Threads

Use one of the following techniques to change the number of threads to use in the Intel(R) Math Kernel Library (Intel(R) MKL):
• Set one of the OpenMP or Intel MKL environment variables:
  • OMP_NUM_THREADS
  • MKL_NUM_THREADS
  • MKL_DOMAIN_NUM_THREADS
• Call one of the OpenMP or Intel MKL functions:
  • omp_set_num_threads()
  • mkl_set_num_threads()
  • mkl_domain_set_num_threads()

When choosing the appropriate technique, take into account the following rules:
• The Intel MKL threading controls take precedence over the OpenMP controls because they are inspected first.
8. Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments

Not all Intel(R) Math Kernel Library (Intel(R) MKL) function domains support both C and Fortran environments. To use Intel MKL Fortran-style functions in C/C++ environments, observe certain conventions, which are discussed for LAPACK and BLAS in the subsections below.

CAUTION: Avoid calling BLAS95/LAPACK95 from C/C++. Such calls require skills in manipulating the descriptor of a deferred-shape array, which is the Fortran 90 type. Moreover, BLAS95/LAPACK95 routines contain links to a Fortran RTL.

LAPACK and BLAS

Because LAPACK and BLAS routines are Fortran-style, when calling them from C-language programs, follow the Fortran-style calling conventions:
• Pass variables by address, not by value. Function calls in "Example: Calling a Complex BLAS Level 1 Function from C" and "Example: Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrate this.
• Store your data in Fortran style, that is, in column-major rather than row-major order. With the row-major order adopted in C, the last array index changes most quickly and the first one changes most slowly when traversing the memory segment where the array is stored. With Fortran-style column-major order, the last index changes most slowly whereas the first index changes most quickly, as illustrated by the figure below for a two-dimensional array.

[Figure: element ordering of a two-dimensional array in column-major order; the figure content is not recoverable from this excerpt]
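The difference between the two storage orders comes down to how a pair of indices maps to a memory offset. The sketch below shows both mappings and a conversion from a row-major C array to column-major storage. The function names are hypothetical helpers written for this illustration; they are not part of Intel MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the zero-based element (i, j) in column-major storage.
   ld is the distance between consecutive columns (at least the row count):
   the first index i changes most quickly. */
static size_t col_major(size_t i, size_t j, size_t ld) { return i + j * ld; }

/* Offset of the same element in row-major (C-style) storage.
   ld is the distance between consecutive rows (at least the column count):
   the last index j changes most quickly. */
static size_t row_major(size_t i, size_t j, size_t ld) { return i * ld + j; }

/* Copy a rows x cols matrix from row-major order into column-major order,
   producing the layout that Fortran-style BLAS and LAPACK routines expect. */
static void to_col_major(const double *src, double *dst,
                         size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[col_major(i, j, rows)] = src[row_major(i, j, cols)];
}
```

For a 2 x 3 matrix stored in C as {1, 2, 3, 4, 5, 6}, the column-major copy holds the columns one after another: {1, 4, 2, 5, 3, 6}.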
9. Launching nodeperf.c on all the nodes is especially helpful in a very large cluster. nodeperf enables quick identification of the potential problem spot without numerous small MP LINPACK runs around the cluster in search of the bad node. It goes through all the nodes, one at a time, and reports the performance of DGEMM followed by some host identifier. Therefore, the higher the DGEMM performance, the faster that node was performing.

3. Edit HPL.dat to fit your cluster needs. Read through the HPL documentation for ideas on this. Note, however, that you should use at least 4 nodes.

4. Make an HPL run, using compile options such as ASYOUGO, ASYOUGO2, or ENDEARLY to aid in your search. These options enable you to gain insight into the performance sooner than HPL would normally give this insight.

When doing so, follow these recommendations:
• Use MP LINPACK, which is a patched version of HPL, to save time in the search. All performance-intrusive features are compile-optional in MP LINPACK. That is, if you do not use the new options to reduce search time, these features are disabled. The primary purpose of the additions is to assist you in finding solutions. HPL requires a long time to search for many different parameters. In MP LINPACK, the goal is to get the best possible number. Given that the input is not fixed, there is a large parameter space you must search over. An exhaustive search of all possible inputs is improbably lar
10. To achieve higher performance, set the number of threads to the number of real processors or physical cores, as summarized in Techniques to Set the Number of Threads.

See Also
• Managing Multi-core Performance

Threaded Functions and Problems

The following Intel(R) Math Kernel Library (Intel(R) MKL) function domains are threaded:
• Direct sparse solver.
• LAPACK. For the list of threaded routines, see Threaded LAPACK Routines.
• Level1 and Level2 BLAS. For the list of threaded routines, see Threaded BLAS Level1 and Level2 Routines.
• All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
• All mathematical VML functions.
• FFT. For the list of FFT transforms that can be threaded, see Threaded FFT Problems.

Threaded LAPACK Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.

The following LAPACK routines are threaded:
• Linear equations, computational routines:
  • Factorization: ?getrf, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
  • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
• Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
• Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr
• Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, spt
11. associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel(R) Streaming SIMD Extensions 2 (Intel(R) SSE2), Intel(R) Streaming SIMD Extensions 3 (Intel(R) SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101

Architecture Support

Intel(R) Math Kernel Library (Intel(R) MKL) for Linux OS provides two architecture-specific implementations. The following table lists the supported architectures and directories where each architecture-specific implementation is located.

Architecture: Location
  IA-32 or compatible: <mkl directory>/lib/ia32
  Intel(R) 64 or compatible:
12. <mkl directory>/lib/intel64

See Also
• High-level Directory Structure
• Detailed Structure of the IA-32 Architecture Directory lib/ia32
• Detailed Structure of the Intel(R) 64 Architecture Directory lib/intel64

High-level Directory Structure

Directory: Contents
  <mkl directory>: Installation directory of the Intel(R) Math Kernel Library (Intel(R) MKL). By default, /opt/intel/composerxe-2011.y.xxx/mkl, where y is the release-update number and xxx is the package number.

Subdirectories of <mkl directory>
  bin: Scripts to set environmental variables in the user shell
  bin/ia32: Shell scripts for the IA-32 architecture
  bin/intel64: Shell scripts for the Intel(R) 64 architecture
  benchmarks/linpack: Shared-memory (SMP) version of the LINPACK benchmark
  benchmarks/mp_linpack: Message-passing interface (MPI) version of the LINPACK benchmark
  examples: Examples directory. Each subdirectory has source and data files
  include: INCLUDE files for the library routines, as well as for tests and examples
  include/ia32: Fortran 95 .mod files for the IA-32 architecture and Intel(R) Fortran compiler
  include/intel64/lp64: Fortran 95 .mod files for the Intel(R)

Directories whose descriptions fall outside this excerpt: include/intel64/ilp64, interfaces/blas95, interfaces/fftw2x_cdft, interfaces/fftw3x_cdft, interfaces/fftw2xc, interfaces/fftw2xf, interfaces/fftw3xc, interfaces/fftw3xf
13. processor using the Intel(R) 64 architecture
  VML/VSL for processors based on the Intel(R) Core(TM) microarchitecture
  VML/VSL for 45nm Hi-k Intel(R) Core(TM)2 and Intel Xeon(R) processor families
  VML/VSL for the Intel(R) Core(TM) i7 processors
  VML/VSL optimized for the Intel(R) Advanced Vector Extensions (Intel(R) AVX)

File: Contents
  libmkl_scalapack_lp64.so: ScaLAPACK routine library supporting the LP64 interface
  libmkl_scalapack_ilp64.so: ScaLAPACK routine library supporting the ILP64 interface
  libmkl_cdft_core.so: Cluster version of FFT functions

Run-time Libraries (RTL)
  libmkl_intelmpi_lp64.so: LP64 version of BLACS routines supporting Intel MPI and MPICH2
  libmkl_intelmpi_ilp64.so: ILP64 version of BLACS routines supporting Intel MPI and MPICH2
  locale/en_US/mkl_msg.cat: Catalog of Intel(R) Math Kernel Library (Intel(R) MKL) messages in English
  locale/ja_JP/mkl_msg.cat: Catalog of Intel MKL messages in Japanese. Available only if the Intel(R) MKL package provides Japanese localization. Please see the Release Notes for this information
14. 2 or later
Reason: To link your application with ScaLAPACK and/or Cluster FFT, the libraries corresponding to your particular MPI should be listed on the link line (see Working with the Cluster Software).

Structure of the Intel(R) Math Kernel Library

Optimization Notice

Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel compilers,
15. 64 architecture, Intel Fortran compiler, and LP64 interface

Contents column, continued (the matching directory names fall outside this excerpt):
  Fortran 95 .mod files for the Intel(R) 64 architecture, Intel Fortran compiler, and ILP64 interface
  Fortran 95 interfaces to BLAS and a makefile to build the library
  MPI FFTW 2.x interfaces to the Intel MKL Cluster FFTs
  MPI FFTW 3.x interfaces to the Intel MKL Cluster FFTs
  FFTW 2.x interfaces to the Intel MKL FFTs (C interface)
  FFTW 2.x interfaces to the Intel MKL FFTs (Fortran interface)
  FFTW 3.x interfaces to the Intel MKL FFTs (C interface)
  FFTW 3.x interfaces to the Intel MKL FFTs (Fortran interface)

Directory: Contents
  interfaces/lapack95: Fortran 95 interfaces to LAPACK and a makefile to build the library
  lib/ia32: Static libraries and shared objects for the IA-32 architecture
  lib/intel64: Static libraries and shared objects for the Intel(R) 64 architecture
  tests: Source and data files for tests
  tools/builder: Tools for creating custom dynamically linkable libraries
  tools/plugins/com.intel.mkl.help: Eclipse IDE plug-in with Intel MKL Reference Manual in WebHelp format. See mkl_documentation.htm for more information

Subdirectories of <Composer XE directory>. By default, /opt/intel/composerxe-2011.y.xxx
  Documentation/en_US/mkl: Intel MKL documentation
  man/en_US/man3: Man pages for Intel MKL functions. No directory for man pages is created in locales other than en_US even if a directory for the localized documentat
16. 95 wrappers for LAPACK (LAPACK95) supporting ILP64 interface

Contains column, continued (the matching file names fall outside this excerpt):
  Interfaces for FFTW version 2.x (C interface for Intel(R) compilers) to call Intel MKL FFTs
  Interfaces for FFTW version 2.x (C interface for GNU compilers) to call Intel MKL FFTs
  Interfaces for FFTW version 2.x (Fortran interface for Intel compilers) to call Intel MKL FFTs
  Interfaces for FFTW version 2.x (Fortran interface for GNU compiler) to call Intel MKL FFTs
  Interfaces for FFTW version 3.x (C interface for Intel compiler) to call Intel MKL FFTs
  Interfaces for FFTW version 3.x (C interface for GNU compilers) to call Intel MKL FFTs
  Interfaces for FFTW version 3.x (Fortran interface for Intel compilers) to call Intel MKL FFTs
  Interfaces for FFTW version 3.x (Fortran interface for GNU compilers) to call Intel MKL FFTs
  Single-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs

File name: Contains
  libfftw2x_cdft_DOUBLE.a: Double-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs
  libfftw3x_cdft.a: Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs
  libfftw3x_cdft_ilp64.a: Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs, supporting the ILP64 interface

Modules, in architecture- and interface-specific subdirectories of the Intel MKL include directory
  blas95.mod: Fortran 95 i
17. CBLAS Interface Instead of Calling BLAS Directly from C illustrates the use of the CBLAS interface.

C Interface to LAPACK

Instead of calling LAPACK routines from a C-language program, you can use the C interface to LAPACK provided by Intel MKL. The C interface to LAPACK is a C-style interface to the LAPACK routines. This interface supports matrices in row-major and column-major order, which you can define in the first function argument, matrix_order. Use the mkl_lapacke.h header file with the C interface to LAPACK. The header file specifies constants and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. You can find examples of the C interface to LAPACK in the examples/lapacke subdirectory in the Intel MKL installation directory.

Using Complex Types in C/C++

As described in the Building Applications document for the Intel(R) Fortran Compiler XE, C/C++ does not directly implement the Fortran types COMPLEX(4) and COMPLEX(8). However, you can write equivalent structures. The type COMPLEX(4) consists of two 4-byte floating-point numbers. The first of them is the real-number component, and the second one is the imaginary-number component. The type COMPLEX(8) is similar to COMPLEX(4) except that it contains two 8-byte floating-point numbers. Intel(R) Math Kernel Library (Intel(R) MKL) provides c
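The "equivalent structures" mentioned above can be sketched in plain C as follows. The type names `complex4_t` and `complex8_t` and the helper `dotc` are hypothetical and chosen for this illustration; they are not the types that Intel MKL itself defines. The helper is a plain-C reference for the conjugated dot product, the quantity that the BLAS function zdotc computes, so the sketch needs no library to run.

```c
#include <assert.h>

/* Hypothetical C equivalents of the Fortran complex types:
   the real component comes first, the imaginary component second. */
typedef struct { float  re; float  im; } complex4_t;  /* COMPLEX(4): two 4-byte floats */
typedef struct { double re; double im; } complex8_t;  /* COMPLEX(8): two 8-byte floats */

/* Reference conjugated dot product: sum over k of conj(x[k]) * y[k].
   With x = a + bi and y = c + di, conj(x) * y = (ac + bd) + (ad - bc)i. */
static complex8_t dotc(const complex8_t *x, const complex8_t *y, int n)
{
    complex8_t acc = {0.0, 0.0};
    assert(n >= 0);
    for (int k = 0; k < n; ++k) {
        acc.re += x[k].re * y[k].re + x[k].im * y[k].im;
        acc.im += x[k].re * y[k].im - x[k].im * y[k].re;
    }
    return acc;
}
```

Because each structure is just two consecutive floating-point values, an array of such structures has the same memory layout as a Fortran complex array, which is what makes passing it by address to a Fortran-style routine workable.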
18. Complex BLAS Level 1 Function from C
• Example: Using CBLAS Interface Instead of Calling BLAS Directly from C

Example: Calling a Complex BLAS Level 1 Function from C

The example below illustrates a call from a C program to the complex BLAS Level 1 function zdotc(). This function computes the dot product of two double-precision complex vectors. In this example, the complex dot product is returned in the structure c.

    #include <stdio.h>
    #include "mkl.h"
    #define N 5

    int main(void)
    {
        MKL_INT n = N, inca = 1, incb = 1, i;
        MKL_Complex16 a[N], b[N], c;

        /* Fill the input vectors */
        for (i = 0; i < n; i++) {
            a[i].real = (double)i;       a[i].imag = (double)i * 2.0;
            b[i].real = (double)(n - i); b[i].imag = (double)i * 2.0;
        }

        /* Fortran-style call: every argument is passed by address */
        zdotc(&c, &n, a, &inca, b, &incb);
        printf("The complex dot product is: ( %6.2f, %6.2f)\n", c.real, c.imag);
        return 0;
    }

Example: Calling a Complex BLAS Level 1 Function from C++

Below is the C++ implementation:

    #include <complex>
    #include <iostream>
    // Map the MKL complex type to std::complex before including mkl.h
    #define MKL_Complex16 std::complex<double>
    #include "mkl.h"
    #define N 5

    int main()
    {
        int n, inca = 1, incb = 1, i;
        std::complex<double> a[N], b[N], c;
        n = N;

        // Fill the input vectors
        for (i = 0; i < n; i++) {
            a[i] = std::complex<double>(i, i * 2.0);
            b[i] = std::complex<double>(n - i, i * 2.0);
        }

        // Fortran-style call: every argument is passed by address
        zdotc(&c, &n, a, &inca, b, &incb);
        std::cout << "The complex dot product is: " << c << std::endl;
        return 0;
    }

Exa
19. IA-32 architecture. The command takes the list of functions from the user_list file and uses the native Intel MKL error handler xerbla.

An example of a more complex case follows:

make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o

In this case, the command creates the mkl_small.so library for processors using the IA-32 architecture. The command takes the list of functions from the my_func_list.txt file and uses the user's error handler my_xerbla.o. The process is similar for processors using the Intel(R) 64 architecture.

See Also
• Using the Single Dynamic Library Interface

Specifying a List of Functions

To specify functions in the user_list file, use complete, function domain-specific lists of functions in the <mkl directory>/tools/builder folder. Adjust function names to the required interface. For example, for Fortran functions, append an underscore character "_" to the names as a suffix:

dgemm_
ddot_
dgetrf_

TIP: Names of Fortran-style routines (BLAS, LAPACK, etc.) can be either upper-case or lower-case, with or without the trailing underscore. For example, these names are equivalent:
BLAS: dgemm, DGEMM, dgemm_, DGEMM_
LAPACK: dgetrf, DGETRF, dgetrf_, DGETRF_

Properly capitalize names of C support functions in the function list. To do this, follow the guidelines below:
1. In the mkl_service.h include file, look up a #define directive for your function.
2. Take the function name from the repla
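The equivalence described in the TIP can be expressed as a small comparison helper, for example to check a function list for duplicate entries before running the builder. This is a sketch only: `same_routine` is a hypothetical helper, not part of the builder tools, and it is slightly more permissive than the TIP because it also accepts mixed-case spellings.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if two Fortran-style routine names refer to the same routine,
   ignoring letter case and at most one trailing underscore; otherwise 0. */
static int same_routine(const char *a, const char *b)
{
    size_t la, lb;
    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    if (la > 0 && a[la - 1] == '_') --la;   /* drop one trailing underscore */
    if (lb > 0 && b[lb - 1] == '_') --lb;
    if (la != lb) return 0;
    for (size_t k = 0; k < la; ++k) {
        if (tolower((unsigned char)a[k]) != tolower((unsigned char)b[k]))
            return 0;
    }
    return 1;
}
```

Under this helper, dgemm, DGEMM, dgemm_, and DGEMM_ all compare equal, while dgetrf and dgetrs do not.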
20. Libraries in the Intel(R) 64 Architecture Directory lib/intel64

File: Contents

Interface layer
  libmkl_rt.so: Single Dynamic Library interface library
  libmkl_intel_lp64.so: LP64 interface library for the Intel compilers
  libmkl_intel_ilp64.so: ILP64 interface library for the Intel compilers
  libmkl_intel_sp2dp.so: SP2DP interface library for the Intel compilers
  libmkl_gf_lp64.so: LP64 interface library for the GNU Fortran compilers
  libmkl_gf_ilp64.so: ILP64 interface library for the GNU Fortran compilers

Threading layer
  libmkl_intel_thread.so: Threading library for the Intel compilers
  libmkl_gnu_thread.so: Threading library for the GNU Fortran and C compilers
  libmkl_pgi_thread.so: Threading library for the PGI compiler
  libmkl_sequential.so: Sequential library

Computational layer
  libmkl_core.so: Library dispatcher for dynamic load of processor-specific kernel
  libmkl_def.so: Default kernel library
  libmkl_mc.so: Kernel library for processors based on the Intel(R) Core(TM) microarchitecture
  libmkl_mc3.so: Kernel library for the Intel(R) Core(TM) i7 processors
  libmkl_avx.so: Kernel optimized for the Intel(R) Advanced Vector Extensions (Intel(R) AVX)
  libmkl_vml_def.so: VML/VSL part of default kernels
  libmkl_vml_p4n.so: VML/VSL for the Intel(R) Xeon(R)

Files whose descriptions fall outside this excerpt: libmkl_vml_mc.so, libmkl_vml_mc2.so, libmkl_vml_mc3.so, libmkl_vml_avx.so
21. MKL Fourier transform functions. These collections correspond to the FFTW versions 2.x and 3.x and the Intel MKL versions 7.0 and later. These wrappers enable using Intel MKL Fourier transforms to improve the performance of programs that use FFTW without changing the program source code. See the "FFTW Interface to Intel(R) Math Kernel Library" appendix in the Intel MKL Reference Manual for details on the use of the wrappers.

Directory Structure in Detail

Tables in this section show contents of the Intel(R) Math Kernel Library (Intel(R) MKL) architecture-specific directories.
23. arrays on 16-byte boundaries, use mkl_malloc() in place of system-provided memory allocators, as shown in the code example below. Sequential mode of Intel MKL removes the influence of non-deterministic parallelism.

Aligning Addresses on 16-byte Boundaries

    // ******* C language *******
    ...
    #include <stdlib.h>
    ...
    void *darray;
    int workspace;
    ...
    // Allocate workspace aligned on 16-byte boundary
    darray = mkl_malloc( sizeof(double)*workspace, 16 );
    ...
    // call the program using MKL
    mkl_app( darray );
    ...
    // Free workspace
    mkl_free( darray );

    ! ******* Fortran language *******
    ...
    double precision darray
    pointer (p_wrk,darray(1))
    integer workspace
    ...
    ! Allocate workspace aligned on 16-byte boundary
    p_wrk = mkl_malloc( 8*workspace, 16 )
    ...
    ! call the program using MKL
    call mkl_app( darray )
    ...
    ! Free workspace
    call mkl_free(p_wrk)

Working with the Intel(R) Math Kernel Library Cluster Software
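To make the idea of an aligned allocation concrete, here is a minimal portable sketch of one classic way to obtain it: over-allocate with malloc(), round the address up to the requested boundary, and remember the original pointer just in front of the aligned block. This is a stand-in written for illustration, not how mkl_malloc() is implemented; `aligned_malloc_sketch` and `aligned_free_sketch` are hypothetical names, and the sketch does not guard against size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes whose address is a multiple of alignment.
   alignment must be a power of two and at least sizeof(void *). */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    void *raw;
    uintptr_t start, aligned;

    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    /* Room for the payload, the worst-case padding, and one saved pointer */
    raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL) return NULL;

    /* Round up the first usable address to the next aligned boundary */
    start   = (uintptr_t)raw + sizeof(void *);
    aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Store the pointer returned by malloc() right before the aligned block */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from aligned_malloc_sketch(). */
static void aligned_free_sketch(void *p)
{
    if (p != NULL) free(((void **)p)[-1]);
}
```

A caller would use the pair the same way the excerpt uses mkl_malloc() and mkl_free(): request `sizeof(double) * workspace` bytes with an alignment of 16, use the buffer, then release it with the matching free routine.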
25. code modified to do ASYOUGO and ENDEARLY modifications.
src/pgesv/HPL_pdgesv0.c: HPL 2.0 code modified to do ASYOUGO, ASYOUGO2, and ENDEARLY modifications.
testing/ptest/HPL.dat: HPL 2.0 sample HPL.dat, modified.
Make.ia32: (New) Sample architecture makefile for processors using the IA-32 architecture and Linux OS.
Make.intel64: (New) Sample architecture makefile for processors using the Intel(R) 64 architecture and Linux OS.
HPL.dat: A repeat of testing/ptest/HPL.dat in the top-level directory.

Next six files are prebuilt executables, readily available for simple performance testing.
bin_intel/ia32/xhpl_ia32: (New) Prebuilt binary for the IA-32 architecture and Linux OS. Statically linked against Intel(R) MPI 3.2.
bin_intel/ia32/xhpl_ia32_dynamic: (New) Prebuilt binary for the IA-32 architecture and Linux OS. Dynamically linked against Intel(R) MPI 3.2.
bin_intel/intel64/xhpl_intel64: (New) Prebuilt binary for the Intel(R) 64 architecture and Linux OS. Statically linked against Intel(R) MPI 3.2.
bin_intel/intel64/xhpl_intel64_dynamic: (New) Prebuilt binary for the Intel(R) 64 architecture and Linux OS. Dynamically linked against Intel(R) MPI 3.2.

Next six files are prebuilt hybrid executables.
bin_intel/ia32/xhpl_hybrid_ia32: (New) Prebuilt hybrid binary for the IA-32 architecture and Linux OS. Statically linked against Intel(R) MPI 3.2.
bin_intel/ia32/xhpl_hybrid_ia32_dynamic: (New) Prebuilt hybrid binary for t
26. domain-env-string>
<delimiter> ::= [ <space-symbol>* ] ( <space-symbol> | <comma-symbol> | <semicolon-symbol> | <colon-symbol> ) [ <space-symbol>* ]
<MKL-domain-env-string> ::= <MKL-domain-env-name> <uses> <number-of-threads>
<MKL-domain-env-name> ::= MKL_ALL | MKL_BLAS | MKL_FFT | MKL_VML
<uses> ::= [ <space-symbol>* ] ( <space-symbol> | <equality-sign> | <comma-symbol> ) [ <space-symbol>* ]
<number-of-threads> ::= <positive-number>
<positive-number> ::= <decimal-positive-number> | <octal-number> | <hexadecimal-number>

In the syntax above, MKL_BLAS indicates the BLAS function domain, MKL_FFT indicates non-cluster FFTs, and MKL_VML indicates the Vector Mathematics Library.

For example,
MKL_ALL 2 : MKL_BLAS 1 : MKL_FFT 4
MKL_ALL=2 : MKL_BLAS=1 : MKL_FFT=4
MKL_ALL=2, MKL_BLAS=1, MKL_FFT=4
MKL_ALL=2; MKL_BLAS=1; MKL_FFT=4
MKL_ALL = 2 MKL_BLAS 1 , MKL_FFT 4
MKL_ALL,2: MKL_BLAS 1, MKL_FFT,4

The global variables MKL_ALL, MKL_BLAS, MKL_FFT, and MKL_VML, as well as the interface for the Intel(R) Math Kernel Library (Intel(R) MKL) threading control functions, can be found in the mkl.h header file.

The table below illustrates how values of MKL_DOMAIN_NUM_THREADS are interpreted.

Value of Interp
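To make the MKL_DOMAIN_NUM_THREADS syntax above more concrete, here is a small, lenient parser sketch in C. It is not how Intel MKL parses the variable: the type domain_threads and the function parse_domain_threads are hypothetical, and the sketch treats space, comma, semicolon, colon, and the equality sign uniformly as separators, so it accepts a superset of the grammar. Calling strtol with base 0 covers the decimal, octal, and hexadecimal number forms.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Thread counts per function domain; 0 means "not specified". */
typedef struct {
    int all, blas, fft, vml;
} domain_threads;

/* Parse a string such as "MKL_ALL=2, MKL_BLAS=1, MKL_FFT=4".
 * Returns 0 on success and -1 on an unknown domain name or a
 * missing or non-positive thread count. */
int parse_domain_threads(const char *s, domain_threads *out)
{
    static const char seps[] = " ,;:=";
    memset(out, 0, sizeof *out);
    for (;;) {
        size_t len;
        int *slot;
        char *end;
        long n;

        s += strspn(s, seps);            /* skip delimiters */
        if (*s == '\0')
            return 0;                    /* whole string consumed */

        len = strcspn(s, seps);          /* length of the domain name */
        if (len == 7 && strncmp(s, "MKL_ALL", 7) == 0)       slot = &out->all;
        else if (len == 8 && strncmp(s, "MKL_BLAS", 8) == 0) slot = &out->blas;
        else if (len == 7 && strncmp(s, "MKL_FFT", 7) == 0)  slot = &out->fft;
        else if (len == 7 && strncmp(s, "MKL_VML", 7) == 0)  slot = &out->vml;
        else
            return -1;

        s += len;
        s += strspn(s, seps);            /* skip the <uses> part */
        n = strtol(s, &end, 0);          /* base 0: decimal, octal, or hex */
        if (end == s || n <= 0 || n > INT_MAX)
            return -1;
        if (*end != '\0' && strchr(seps, *end) == NULL)
            return -1;                   /* trailing junk after the number */
        *slot = (int)n;
        s = end;
    }
}
```

For instance, parsing "MKL_ALL 2 : MKL_BLAS 1 : MKL_FFT 4" fills in all = 2, blas = 1, fft = 4 and leaves vml at 0.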
27. Using Parallelism of the Intel(R) Math Kernel Library

The Intel(R) Math Kernel Library (Intel(R) MKL) is extensively parallelized. See Threaded Functions and Problems for lists of threaded functions and problems that can be threaded. Intel MKL is thread-safe, which means that all Intel MKL functions (except the LAPACK deprecated routine ?lacon) work correctly during simultaneous execution by multiple threads. In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other.

The library uses OpenMP threading software, so you can use the environment variable OMP_NUM_THREADS to specify the number of threads or the equivalent OpenMP run-time function calls. Intel MKL also offers variables that are independent of OpenMP, such as MKL_NUM_THREADS, and equivalent Intel MKL functions for thread management. The Intel MKL variables are always inspected first, then the OpenMP variables are examined, and if neither is used, the OpenMP software chooses the default number of threads. By default, Intel MKL uses the number of threads equal to the number of physical cores on the system.
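The inspection order described above (Intel MKL variable first, then the OpenMP variable, then a default) can be illustrated with a short C sketch. This is an illustration of the stated precedence only, not the library's implementation; the function names parse_positive and resolve_num_threads are hypothetical, and the sketch simplifies by accepting only a single positive decimal integer for each variable.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Convert an environment-style string to a positive int.
 * Returns 0 if the string is unset, empty, or not a positive integer. */
int parse_positive(const char *text)
{
    char *end;
    long n;
    if (text == NULL || *text == '\0')
        return 0;
    n = strtol(text, &end, 10);
    if (*end != '\0' || n <= 0 || n > INT_MAX)
        return 0;
    return (int)n;
}

/* Pick a thread count using the precedence described in the text:
 * the MKL-specific value wins, then the OpenMP value, then the default. */
int resolve_num_threads(const char *mkl_value, const char *omp_value,
                        int default_threads)
{
    int n = parse_positive(mkl_value);
    if (n > 0)
        return n;
    n = parse_positive(omp_value);
    if (n > 0)
        return n;
    return default_threads;
}
```

In a real program the first two arguments would typically come from getenv("MKL_NUM_THREADS") and getenv("OMP_NUM_THREADS"), and the default would be the number of physical cores.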
28. i_realloc = my_realloc; i_free = my_free; // Now you may call Intel MKL functions

Language-specific Usage Options

The Intel(R) Math Kernel Library (Intel(R) MKL) provides broad support for Fortran and C/C++ programming. However, not all function domains support both Fortran and C interfaces. For example, LAPACK has no C interface. You can call functions comprising such domains from C using mixed-language programming. If you want to use LAPACK or BLAS, which support Fortran, in the Fortran 95 environment, additional effort may be initially required to build compiler-specific interface libraries and modules from the source code provided with Intel MKL.

Optimization Notice

Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines
29. in what relates to threading, with the possible exception of a different default number of threads. Section "Number of User Threads" in the "Fourier Transform Functions" chapter of the Intel MKL Reference Manual shows how the Intel MKL threading controls help to set the number of threads for the FFT computation.

The table below lists the Intel MKL environment variables for threading control, their equivalent functions, and OMP counterparts:

Environment Variable: MKL_NUM_THREADS. Service Function: mkl_set_num_threads. Comment: Suggests the number of threads to use. Equivalent OpenMP Environment Variable: OMP_NUM_THREADS.
Environment Variable: MKL_DOMAIN_NUM_THREADS. Service Function: mkl_domain_set_num_threads. Comment: Suggests the number of threads for a particular function domain. Equivalent OpenMP Environment Variable: none listed.
Environment Variable: MKL_DYNAMIC. Service Function: mkl_set_dynamic. Comment: Enables Intel MKL to dynamically change the number of threads. Equivalent OpenMP Environment Variable: OMP_DYNAMIC.

NOTE: The functions take precedence over the respective environment variables. Therefore, if you want Intel MKL to use a given number of threads in your application and do not want users of your application to change this number using environment variables, set the number of threads by a call to mkl_set_num_threads(), which will have full precedence over any environment variables being set.

The example below illustrates the use of the Intel MKL function mkl_set_num_threads() to set one thread.

// ******* C language *******
#include <o
30. optimized radices are 2, 3, 5, 7, 11, and 13.

Using Memory Management

Memory Management Software of the Intel(R) Math Kernel Library

Intel(R) Math Kernel Library (Intel(R) MKL) has memory management software that controls memory buffers for use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers, and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak.

The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes.

Redefining Memory Functions

In C/C++ programs, you can replace Intel(R) Math Kernel Library (Intel(R) MKL) memory functions that the library uses by default with your own functions. To do this, use the memory renami
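The general mechanism behind redefining memory functions is a set of function pointers that a library calls instead of calling malloc and free directly. The sketch below shows that pattern in plain C under stated assumptions: the pointers hook_malloc and hook_free, the counter live_blocks, and the routine sum_of_squares are all hypothetical stand-ins for illustration and are not Intel MKL symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical replaceable allocator hooks; they default to the
 * system allocator, as a library would before any redefinition. */
void *(*hook_malloc)(size_t) = malloc;
void (*hook_free)(void *) = free;

/* Number of blocks currently allocated through the custom functions. */
size_t live_blocks = 0;

/* User-supplied replacements that track outstanding allocations. */
void *my_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

void my_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Stand-in for a library routine that needs a scratch buffer and
 * obtains it through the hooks. Returns -1.0 if allocation fails. */
double sum_of_squares(const double *x, size_t n)
{
    double *scratch = (double *)hook_malloc(n * sizeof *scratch);
    double total = 0.0;
    size_t i;
    if (scratch == NULL)
        return -1.0;
    for (i = 0; i < n; i++)
        scratch[i] = x[i] * x[i];
    for (i = 0; i < n; i++)
        total += scratch[i];
    hook_free(scratch);
    return total;
}
```

Assigning hook_malloc = my_malloc and hook_free = my_free before the first library call redirects every internal allocation to the user's functions. This is also why the redirection has to happen before any buffers are allocated: a block obtained from one allocator must not be released by another.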
31. that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you fin
32. Contents of the Intel(R) Optimized MP LINPACK Benchmark for Clusters

The Intel Optimized MP LINPACK Benchmark for Clusters (MP LINPACK Benchmark) includes the HPL 2.0 distribution in its entirety, as well as the modifications delivered in the files listed in the table below and located in the benchmarks/mp_linpack/ subdirectory in the Intel(R) Math Kernel Library (Intel(R) MKL) directory.

Directory/File in benchmarks/mp_linpack/: Contents
testing/ptest/HPL_pdtest.c: HPL 2.0 code modified to display captured DGEMM information in ASYOUGO2_DISPLAY if it was captured (for details, see New Features).
src/blas/HPL_dgemm.c: HPL 2.0 code modified to capture DGEMM information, if desired, from ASYOUGO2_DISPLAY.
src/grid/HPL_grid_init.c: HPL 2.0 code modified to do additional grid experiments originally not in HPL 2.0.
src/pgesv/HPL_pdgesvK2.c: HPL 2.0
33. 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 0.310 0.315 0.320 0.325 0.330 0.335 0.340 0.345 0.350 0.355 0.360 0.365 0.370 0.375 0.380 0.385 0.390 0.395 0.400 0.405 0.410 0.415 0.420 0.425 0.430 0.435 0.440 0.445 0.450 0.455 0.460 0.465 0.470 0.475 0.480 0.485 0.490 0.495 0.515 0.535 0.555 0.575 0.595 0.615 0.635 0.655 0.675 0.695 0.795 0.895

However, this problem size is so small and the block size so big by comparison that as soon as it prints the value for 0.045, it was already through 0.08 fraction of the columns. On a really big problem, the fractional number will be more accurate. It never prints more than the 112 numbers above. So, smaller problems will have fewer than 112 updates, and the biggest problems will have precisely 112 updates.

Mflops is an estimate based on 1280 columns of LU being completed. However, with lookahead steps, sometimes that work is not actually completed when the output is made. Nevertheless, this is a good estimate for comparing identical runs.

The 3 numbers in parentheses are intrusive ASYOUGO2 add-ins. DT is the total time processor 0 has spent in DGEMM. DF is the number of billion operations that have been performed in DGEMM by one processor. Hence
34. 2
- Cluster FFT
- Trigonometric Transform routines
- Poisson, Laplace, and Helmholtz Solver routines
- Optimization (Trust-Region) Solver routines
- GMP arithmetic functions

Reason: The function domain you intend to use narrows the search in the Reference Manual for specific routines you need. Additionally, if you are using the Intel MKL cluster software, your link line is function-domain specific (see Working with the Cluster Software). Coding tips may also depend on the function domain (see Tips and Techniques to Improve Performance).

Intel MKL provides support for both Fortran and C/C++ programming. Identify the language interfaces that your function domains support (see Intel(R) Math Kernel Library Language Interfaces Support).

Reason: Intel MKL provides language-specific include files for each function domain to simplify program development (see Language Interfaces Support, by Function Domain). For a list of language-specific interface libraries and modules and an example how to generate them, see also Using Language-Specific Interfaces with Intel(R) MKL.

If your system is based on the Intel 64 architecture, identify whether your application performs calculations with large data arrays (of more than 2^31-1 elements).

Reason: To operate on large data arrays, you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default, LP64, interface, where integers are 32-bit (see Using the ILP64 Interface vs. LP64 I
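The LP64 versus ILP64 decision above comes down to whether an array's element count still fits in a 32-bit signed integer. The sketch below assumes the threshold is 2^31-1 elements (the largest 32-bit signed value, which is how the garbled figure in the extracted text is read here); the function names matrix_elements and needs_ilp64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of elements in a rows-by-cols array, computed in 64 bits
 * so the product itself cannot overflow for realistic sizes. */
uint64_t matrix_elements(uint32_t rows, uint32_t cols)
{
    return (uint64_t)rows * (uint64_t)cols;
}

/* Returns 1 if the element count exceeds 2^31-1, that is, if a
 * 32-bit integer (LP64 interface) can no longer index the array
 * and the ILP64 interface is needed; returns 0 otherwise. */
int needs_ilp64(uint64_t elements)
{
    return elements > (uint64_t)INT32_MAX;
}
```

For example, a 50000 x 50000 matrix has 2.5 billion elements, so needs_ilp64(matrix_elements(50000, 50000)) reports that the ILP64 interface is required, while a 40000 x 40000 matrix (1.6 billion elements) still fits the LP64 interface.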
35. 32 Architecture Directories

Static Libraries in the IA-32 Architecture Directory lib/ia32

File: Contents

Interface layer
libmkl_intel.a: Interface library for the Intel(R) compilers
libmkl_blas95.a: Fortran 95 interface library for BLAS for the Intel(R) Fortran compiler
libmkl_lapack95.a: Fortran 95 interface library for LAPACK for the Intel Fortran compiler
libmkl_gf.a: Interface library for the GNU Fortran compiler

Threading layer
libmkl_intel_thread.a: Threading library for the Intel compilers
libmkl_gnu_thread.a: Threading library for the GNU Fortran and C compilers
libmkl_pgi_thread.a: Threading library for the PGI compiler
libmkl_sequential.a: Sequential library

Computational layer
libmkl_core.a: Kernel library for the IA-32 architecture
libmkl_solver.a: Deprecated. Empty library for backward compatibility
libmkl_solver_sequential.a: Deprecated. Empty library for backward compatibility
libmkl_scalapack_core.a: ScaLAPACK routines
libmkl_cdft_core.a: Cluster version of FFT functions

Run-time Libraries (RTL)
libmkl_blacs.a: BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10; ANL MPICH version 1.2.5.2
libmkl_blacs_intelmpi.a: BLACS routines supporting Intel MPI and MPICH2
libmkl_blacs_intelmpi20.a: A soft link to lib/32/libmkl_blacs_intelmpi.a
libmkl_blacs_openmpi.a: BLACS routine
36. 64.a, libmkl_blacs_intelmpi_ilp64.a, libmkl_blacs_intelmpi20_lp64.a, libmkl_blacs_intelmpi20_ilp64.a, libmkl_blacs_openmpi_lp64.a, libmkl_blacs_openmpi_ilp64.a, libmkl_blacs_sgimpt_lp64.a, libmkl_blacs_sgimpt_ilp64.a

Contents:
Kernel library for the Intel(R) 64 architecture
Deprecated. Empty library for backward compatibility
Deprecated. Empty library for backward compatibility
Deprecated. Empty library for backward compatibility
Deprecated. Empty library for backward compatibility
ScaLAPACK routine library supporting the LP64 interface
ScaLAPACK routine library supporting the ILP64 interface
Cluster version of FFT functions
LP64 version of BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10; ANL MPICH version 1.2.5.2
ILP64 version of BLACS routines supporting the following MPICH versions: Myricom MPICH version 1.2.5.10; ANL MPICH version 1.2.5.2
LP64 version of BLACS routines supporting Intel MPI and MPICH2
ILP64 version of BLACS routines supporting Intel MPI and MPICH2
A soft link to lib/intel64/libmkl_blacs_intelmpi_lp64.a
A soft link to lib/intel64/libmkl_blacs_intelmpi_ilp64.a
LP64 version of BLACS routines supporting OpenMPI
ILP64 version of BLACS routines supporting OpenMPI
LP64 version of BLACS routines supporting SGI MPT
ILP64 version of BLACS routines supporting SGI MPT

Dynamic
37. ACS> <MKL core libraries> -Wl,--end-group

where the placeholders stand for paths and libraries as explained in the following table:

<MKL cluster library>: One of ScaLAPACK or Cluster FFT libraries for the appropriate architecture, which are listed in Directory Structure in Detail. For example, for IA-32 architecture, it is either -lmkl_scalapack_core or -lmkl_cdft_core.
<BLACS>: The BLACS library corresponding to your architecture, programming interface (LP64 or ILP64), and MPI version. Available BLACS libraries are listed in Directory Structure in Detail. For example, for the IA-32 architecture, choose one of -lmkl_blacs, -lmkl_blacs_intelmpi, or -lmkl_blacs_openmpi, depending on the MPI version you use; specifically, for Intel MPI 3.x, choose -lmkl_blacs_intelmpi.
<MKL core libraries>: <MKL LAPACK & MKL kernel libraries> for ScaLAPACK, and <MKL kernel libraries> for Cluster FFTs.
<MKL kernel libraries>: Processor-optimized kernels, threading library, and system library for threading support, linked as described in Listing Libraries on a Link Line.
<MKL LAPACK & kernel libraries>: The LAPACK library and <MKL kernel libraries>.
<MPI>: One of several MPI implementations (MPICH, Intel MPI, and so on).
<<MPI> linker script>: A linker script that corresponds to the MPI version. For instance, for Intel MPI 3.x, use <Intel MPI 3.x linke
38. DGEMM call.
ASYOUGO2_DISPLAY: Displays the performance of all the significant DGEMMs inside the run.
ENDEARLY: Displays a few performance hints and then terminates the run early.
FASTSWAP: Inserts the LAPACK-optimized DLASWP into HPL's code. You can experiment with this to determine best results.
HYBRID: Establishes the Hybrid OpenMP/MPI mode of MP LINPACK, providing the possibility to use threaded Intel(R) Math Kernel Library (Intel(R) MKL) and prebuilt MP LINPACK hybrid libraries.
CAUTION: Use this option only with an Intel compiler and the Intel(R) MPI library version 3.1 or higher. You are also recommended to use the compiler version 10.0 or higher.

Benchmarking a Cluster

To benchmark a cluster, follow the sequence of steps below (some of them are optional). Pay special attention to the iterative steps 3 and 4. They make a loop that searches for HPL parameters (specified in HPL.dat) that enable you to reach the top performance of your cluster.

1. Install HPL and make sure HPL is functional on all the nodes.
2. You may run nodeperf.c (included in the distribution) to see the performance of DGEMM on all the nodes. Compile nodeperf.c with your MPI and Intel(R) Math Kernel Library (Intel(R) MKL). For example:
mpiicc -O3 nodeperf.c -L$MKLPATH $MKLPATH/libmkl_intel_lp64.a -Wl,--start-group $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
39. H/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_blas95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

See Also
Fortran 95 Interfaces to LAPACK and BLAS
Examples for linking a C application using cluster components
Examples for linking a Fortran application using cluster components
Using the Single Dynamic Library Interface

Building Custom Shared Objects

Custom shared objects reduce the collection of functions available in Intel MKL libraries to those required to solve your particular problems, which helps to save disk space and build your own dynamic libraries for distribution.

The Intel(R) Math Kernel Library (Intel(R) MKL) custom shared object builder enables you to create a dynamic library (shared object) containing the selected functions. The builder is located in the tools/builder directory and contains a makefile and a definition file with the list of functions.

NOTE: The objects in Intel MKL static libraries are position-independent code (PIC), which is not typical for static libraries. Therefore, the cus
40. Intel(R) Math Kernel Library for Linux* OS User's Guide

MKL 10.3 - Linux* OS
Document Number: 315930-012US

Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products describ
41. K

The MP LINPACK Benchmark contains a few sample architecture makefiles. You can edit them to fit your specific configuration. Specifically:
- Set TOPdir to the directory that MP LINPACK is being built in.
- You may set MPI variables, that is, MPdir, MPinc, and MPlib.
- Specify the location of the Intel(R) Math Kernel Library (Intel(R) MKL) and of files to be used (LAdir, LAinc, LAlib).
- Adjust compiler and compiler/linker options.
- Specify the version of MP LINPACK you are going to build (hybrid or non-hybrid) by setting the version parameter for the make command. For example:
make arch=intel64 version=hybrid install

For some sample cases, like Linux systems based on the Intel(R) 64 architecture, the makefiles contain values that must be common. However, you need to be familiar with building an HPL and picking appropriate values for these variables.

New Features of Intel(R) Optimized MP LINPACK Benchmark

The toolset is basically identical with the HPL 2.0 distribution. There are a few changes that are optionally compiled in and disabled until you specifically request them. These new features are:

ASYOUGO: Provides non-intrusive performance information while runs proceed. There are only a few outputs, and this information does not impact performance. This is especially useful because many runs can go for hours without any information.
ASYOUGO2: Provides slightly intrusive additional performance information by intercepting every
42. K benchmarks to help you obtain high LINPACK benchmark results on your systems based on genuine Intel(R) processors more easily than with the HPL benchmark. Use the Intel(R) Optimized MP LINPACK Benchmark to benchmark your cluster. The prebuilt binaries require that Intel(R) MPI 3.x be installed on the cluster. The run-time version of Intel MPI is free and can be downloaded from www.intel.com/software/products.

The Intel package includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratories, and neither the University nor ICL endorse or promote this product. Although HPL 2.0 is redistributable under certain conditions, this particular package is subject to the Intel(R) Math Kernel Library (Intel(R) MKL) license.

Intel MKL has introduced a new functionality into MP LINPACK, which is called a hybrid build, while continuing to support the older version. The term hybrid refers to special optimizations added to take advantage of mixed OpenMP/MPI parallelism.

If you want to use one MPI process per node and to achieve further parallelism by means of OpenMP, use the hybrid build. In general, the hybrid build is useful when the number of MPI processes per core is less than one. If you want to rely exclusively on MPI for parallelism and use one MPI process per core, use the non-hybrid build.

In addition to supplying certain hybrid prebuilt binaries, Intel MKL supplies some hybrid prebuilt libraries for Int
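The rule of thumb above for choosing between the hybrid and non-hybrid builds can be written as a tiny decision function. This is only an illustration of the stated guideline; the enum build_kind and the function choose_build are hypothetical names, and real deployments may weigh other factors.

```c
#include <assert.h>

/* The two MP LINPACK build flavors discussed in the text. */
enum build_kind { BUILD_NON_HYBRID, BUILD_HYBRID };

/* Apply the guideline: fewer than one MPI process per core suggests
 * the hybrid build (OpenMP threads fill the remaining cores); one MPI
 * process per core or more suggests the non-hybrid build. */
enum build_kind choose_build(int mpi_processes, int cores)
{
    assert(mpi_processes > 0 && cores > 0);
    return (mpi_processes < cores) ? BUILD_HYBRID : BUILD_NON_HYBRID;
}
```

For example, one MPI process on an 8-core node gives choose_build(1, 8) == BUILD_HYBRID, while eight MPI processes on the same node give BUILD_NON_HYBRID.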
43. LAPACK (SCAlable LAPACK) with its support functionality including the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). Available with Intel MKL for Linux and Windows operating systems.
- Direct sparse solver, an iterative sparse solver, and a supporting set of sparse BLAS (level 1, 2, and 3) for solving sparse systems of equations.
- Multidimensional discrete Fourier transforms (1D, 2D, 3D) with a mixed radix support (not limited to sizes of powers of 2). Distributed versions of these functions are provided for use on clusters on the Linux and Windows operating systems.
- A set of vectorized transcendental functions called the Vector Math Library (VML). For most of the supported processors, the Intel MKL VML functions offer greater performance than the libm (scalar) functions, while keeping the same high accuracy.
- The Vector Statistical Library (VSL), which offers high performance vectorized random number generators for several probability distributions, convolution and correlation routines, and summary statistics functions.

Intel MKL is thread-safe and extensively threaded using the OpenMP technology. For details see the Intel(R) MKL Reference Manual.
45. 'N',N,N,N,ALPHA,A,N,B,N,BETA,C,N)
print *,'Row A C'
DO i=1,10
write(*,'(I4,F20.8,F20.8)') I, A(1,I), C(1,I)
END DO
STOP
END

Using Additional Threading Control

Intel(R) MKL-specific Environment Variables for Threading Control

The Intel(R) Math Kernel Library (Intel(R) MKL) provides optional threading controls, that is, the environment variables and service functions that are independent of OpenMP. They behave similarly to their OpenMP equivalents but take precedence over them, in the sense that the MKL-specific threading controls are inspected first. By using these controls along with OpenMP variables, you can thread the part of the application that does not call Intel MKL and the library independently from each other.

These controls enable you to specify the number of threads for Intel MKL independently of the OpenMP settings. Although Intel MKL may actually use a different number of threads from the number suggested, the controls will also enable you to instruct the library to try using the suggested number when the number used in the calling application is unavailable.

NOTE: Sometimes Intel MKL does not have a choice on the number of threads for certain reasons, such as system resources.

Use of the Intel MKL threading controls in your application is optional. If you do not use them, the library will mainly behave the same way as Intel MKL 9.1
46. SSL
- com.intel.mkl.VML
- com.intel.mkl.VSL

Documentation for the particular wrapper and example classes will be generated from the Java sources while building and running the examples. To browse the documentation, open the index file in the docs directory (created by the build script):
<mkl directory>/examples/java/docs/index.html

The Java wrappers for CBLAS, VML, VSL RNG, and FFT establish the interface that directly corresponds to the underlying native functions, so you can refer to the Intel MKL Reference Manual for their functionality and parameters. Interfaces for the ESSL-like functions are described in the generated documentation for the com.intel.mkl.ESSL class.

Each wrapper consists of the interface part for Java and a JNI stub written in C. You can find the sources in the following directory:
<mkl directory>/examples/java/wrappers

Both Java and C parts of the wrapper for CBLAS and VML demonstrate the straightforward approach, which you may use to cover additional CBLAS functions.

The wrapper for FFT is more complicated because it needs to support the lifecycle for FFT descriptor objects. To compute a single Fourier transform, an application needs to call the FFT software several times with the same copy of the native FFT descriptor. The wrapper provides the handler class to hold the native descriptor while the virtual machine runs Java bytecode.

The wrapper for VSL RNG is similar to the one for FFT. The
47. TS
Setting the Number of Threads
Using Shared Libraries
Building ScaLAPACK Tests
Examples for Linking with ScaLAPACK and Cluster FFT
Examples for Linking a C Application
Examples for Linking a Fortran Application

Chapter 9: Programming with Intel(R) Math Kernel Library in the Eclipse* Integrated Development Environment (IDE)
Configuring the Eclipse IDE CDT to Link with Intel(R) Math Kernel Library
Configuring the Eclipse IDE CDT 3.x
Configuring the Eclipse IDE CDT 4.0
Getting Assistance for Programming in the Eclipse IDE
Viewing the Intel(R) Math Kernel Library Reference Manual in the Eclipse IDE
Searching the Intel Web Site from the Eclipse IDE

Chapter 10: LINPACK and MP LINPACK Benchmarks
Intel(R) Optimized LINPACK Benchmark for Linux OS
48. _intel_thread -lmkl_core -liomp5 -lpthread

Static linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:

ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread

Dynamic linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:

ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:

ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:

ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (call appropriate functions or set environment variables to choose threaded or sequential mode and to set the interface):

ifort myprog.f -lmkl_rt

Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface:

ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPAT
49. Searching the Intel Web Site from the Eclipse IDE

The Intel(R) Math Kernel Library (Intel(R) MKL) plug-in tunes Eclipse Help search to target http://www.intel.com so that when you are connected to the Internet and run a search from the Eclipse Help pane, the search hits at the site are shown through a separate link. The following figure shows search results for "VML Functions" in Eclipse Help. In the figure, "1 hit" means an entry hit to the respective site. Click "Intel.com (1 hit)" to open the list of actual hits to the Intel Web site.

LINPACK and MP LINPACK Benchmarks

Optimization Notice

Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines that
50. ably work slower than the same function called from a program written in C/C++ or Fortran.

Known bugs

There are a number of known bugs in Intel MKL (identified in the Release Notes), as well as incompatibilities between different versions of JDK. The examples and wrappers include workarounds for these problems. Look at the source code in the examples and wrappers for comments that describe the workarounds.

Coding Tips

This section discusses programming with the Intel(R) Math Kernel Library (Intel(R) MKL) to provide coding tips that meet certain specific needs, such as consistent results of computations.

Aligning Data for Consistent Results

Routines in the Intel(R) Math Kernel Library (Intel(R) MKL) may return different results from run to run on the same system. This is usually due to a change in the order in which floating-point operations are performed. The two most influential factors are array alignment and parallelism. Array alignment can determine how internal loops order floating-point operations. Non-deterministic parallelism may change the order in which computational tasks are executed. While these results may differ, they should still fall within acceptable computational error bounds. To better assure identical results from run to run, do the following:
• Align input arrays on 16-byte boundaries
• Run Intel MKL in the sequential mode

To align input
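The alignment tip above can be illustrated with a minimal sketch. It assumes a POSIX system and uses the standard posix_memalign call so that it stays self-contained; the helper names alloc_aligned16 and is_aligned16 are hypothetical and are not part of Intel MKL (the library ships its own allocator, which the full guide describes).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign under strict -std= modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on a 16-byte boundary.
   Returns NULL on failure; release the memory with free(). */
static double *alloc_aligned16(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}

/* Hypothetical helper: report whether an address lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

A caller would allocate every input array through alloc_aligned16, pass the pointers to the library routine, and free them afterwards. Checking is_aligned16 on arrays obtained elsewhere is a cheap way to find out whether alignment could explain run-to-run differences.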
51. ally executes and sets the paths to the appropriate Intel(R) Math Kernel Library (Intel(R) MKL) directories. To do this with a local user account, edit the following files by adding the appropriate script to the path manipulation section right before exporting variables:

Shell: bash
Files: ~/.bash_profile, ~/.bash_login, or ~/.profile
Commands:
# setting up MKL environment for bash
. <absolute_path_to_installed_MKL>/bin[/<arch>]/mklvars[<arch>].sh [<arch>] [mod] [lp64|ilp64]

Shell: sh
Files: ~/.profile
Commands:
# setting up MKL environment for sh
. <absolute_path_to_installed_MKL>/bin[/<arch>]/mklvars[<arch>].sh [<arch>] [mod] [lp64|ilp64]

Shell: csh
Files: ~/.login
Commands:
# setting up MKL environment for csh
source <absolute_path_to_installed_MKL>/bin[/<arch>]/mklvars[<arch>].csh [<arch>] [mod] [lp64|ilp64]

In the above commands, replace <arch> with ia32 or intel64.

If you have super user permissions, add the same commands to a general-system file in /etc/profile (for bash and sh) or in /etc/csh.login (for csh).

CAUTION: Before uninstalling Intel MKL, remove the above commands from all profile files where the script execution was added. Otherwise you may experience problems logging in.

See Also
• Scripts to Set Environment Variables

Compiler Support

Intel(R) Math Kernel Library (Intel(R) MKL) supports compilers identif
52. and 1.37.0. To get them, visit www.boost.org.

A code example provided in the <mkl directory>/examples/ublas/source/sylvester.cpp file illustrates usage of the Intel MKL uBLAS header file for solving a special case of the Sylvester equation.

To run the Intel MKL uBLAS examples, specify the BOOST_ROOT parameter in the make command, for instance, when using Boost version 1.37.0:

make libia32 BOOST_ROOT=<your_path>/boost_1_37_0

See Also
• Using Code Examples

Invoking Intel(R) Math Kernel Library Functions from Java Applications

Intel MKL Java Examples

To demonstrate binding with Java, Intel(R) Math Kernel Library (Intel(R) MKL) includes a set of Java examples in the following directory:

<mkl directory>/examples/java

The examples are provided for the following MKL functions:
• ?gemm, ?gemv, and ?dot families from CBLAS
• The complete set of non-cluster FFT functions
• ESSL-like functions for one-dimensional convolution and correlation
• VSL Random Number Generators (RNG), except user-defined ones and file subroutines
• VML functions, except GetErrorCallBack, SetErrorCallBack, and ClearErrorCallBack

You can see the example sources in the following directory:

<mkl directory>/examples/java/examples

The examples are written in Java. They demonstrate usage of the MKL functions with the following variety of data:
• 1- and 2-dimensional data sequences
• Real and complex types of the data
• Sing
53. ations over multiple threads unless you specifically request Intel MKL to do so via the MKL_DYNAMIC functionality. However, Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If your program is threaded by some other means, Intel MKL may operate in multithreaded mode, and the performance may suffer due to overuse of the resources.

The following table considers several cases where the conflicts may arise and provides recommendations depending on your threading model:

Threading model: You thread the program using OS threads (pthreads on Linux OS).
Discussion: If more than one thread calls Intel MKL, and the function being called is threaded, it may be important that you turn off Intel MKL threading. Set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads).

Threading model: You thread the program using OpenMP directives and/or pragmas and compile the program using a compiler other than a compiler from Intel.
Discussion: This is more problematic because setting of the OMP_NUM_THREADS environment variable affects both the compiler's threading library and libiomp. In this case, choose the threading library that matches the layered Intel MKL with the OpenMP compiler you employ (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, you should link with the appropriate threading library
54. benefit from the Eclipse-provided code assist feature. See the Code/Context Assist description in the Eclipse IDE Help.

Configuring the Eclipse IDE CDT 3.x

To configure Eclipse IDE C/C++ Development Tools (CDT) 3.x to link with Intel(R) Math Kernel Library (Intel(R) MKL), follow the instructions below:
• For Standard Make projects:
  • Go to the C/C++ Include Paths and Symbols property page and set the Intel MKL include path to <mkl directory>/include.
  • Go to C/C++ Project Paths > Libraries and set the Intel MKL libraries to link with your applications, for example, <mkl directory>/lib/intel64/libmkl_intel_lp64.a, <mkl directory>/lib/intel64/libmkl_intel_thread.a, and <mkl directory>/lib/intel64/libmkl_core.a.
  Note that with Standard Make, the above settings are needed for the CDT internal functionality only. The compiler/linker will not automatically pick up these settings, and you will still have to specify them directly in the makefile.
• For Managed Make projects, you can specify settings for a particular build. To do this:
  • Go to C/C++ Build > Tool Settings. All the settings you need to specify are on this page. Names of the particular settings depend on the compiler integration and therefore are not explained below.
  • If the compiler integration supports include path options, set the Intel MKL include path to <mkl directory>/inclu
55. binary statically linked against Intel MPI 3.2.
• (New) Sample run script for the IA-32 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2.
• (New) Example of an MP LINPACK benchmark input file for a pure MPI binary and the IA-32 architecture.
• (New) Sample run script for the IA-32 architecture and a hybrid binary statically linked against Intel MPI 3.2.
• (New) Sample run script for the IA-32 architecture and a hybrid binary dynamically linked against Intel MPI 3.2.
• (New) Example of an MP LINPACK benchmark input file for a hybrid binary and the IA-32 architecture.
• (New) Sample run script for the Intel(R) 64 architecture and a pure MPI binary statically linked against Intel MPI 3.2.
• (New) Sample run script for the Intel(R) 64 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2.
• (New) Example of an MP LINPACK benchmark input file for a pure MPI binary and the Intel(R) 64 architecture.
• (New) Sample run script for the Intel(R) 64 architecture and a hybrid binary statically linked against Intel MPI 3.2.
• (New) Sample run script for the Intel(R) 64 architecture and a hybrid binary dynamically linked against Intel MPI 3.2.
• (New) Example of an MP LINPACK benchmark input file for a hybrid binary and the Intel(R) 64 architecture.
• (New) Sample utility that tests the DGEMM speed across the cluster.

Building the MP LINPAC
56. bsite for Intel products at http://www.intel.com/software/products/support

See Also
• Directory Structure in Detail

Examples for Linking a C Application

These examples illustrate linking of an application whose main module is in C under the following conditions:
• MPICH2 1.0.7 or higher is installed in /opt/mpich.
• $MKLPATH is a user-defined variable containing <mkl_directory>/lib/ia32.
• You use the Intel(R) C++ Compiler 10.0 or higher.

To link with ScaLAPACK for a cluster of systems based on the IA-32 architecture, use the following link line:

/opt/mpich/bin/mpicc <user files to link> -L$MKLPATH -lmkl_scalapack_core -lmkl_blacs_intelmpi -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

To link with Cluster FFT for a cluster of systems based on the IA-32 architecture, use the following link line:

/opt/mpich/bin/mpicc <user files to link> -Wl,--start-group $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi.a $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

See Also
• Linking with ScaLAPACK and Cluster FFTs

Examples for Linking a Fortran Application

These examples illustrate linking of an application whose main module is in Fortran under the following conditions:
• Intel MPI 3.0 is installed in /opt/intel/mpi/3.0.

Working with the Intel(R) Math Kernel Library Cluster So
57. cement part of that directive. For example, the #define directive for the mkl_disable_fast_mm function is

#define mkl_disable_fast_mm MKL_Disable_Fast_MM

Capitalize its name in the list like this: MKL_Disable_Fast_MM.

For the names of the Fortran support functions, see the tip.

NOTE: If selected functions have several processor-specific versions, the builder will automatically include them all in the custom library and the dispatcher will manage them.

Distributing Your Custom Shared Object

To enable use of your custom shared object in a threaded mode, distribute libiomp5.so along with the custom shared object.

Managing Performance and Memory
58. cessors. Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101

Linking with ScaLAPACK and Cluster FFTs

The Intel(R) Math Kernel Library (Intel(R) MKL) ScaLAPACK and Cluster FFTs support MPI implementations identified in the Intel MKL Release Notes.

To link a program that calls ScaLAPACK or Cluster FFTs, you need to know how to link a message-passing interface (MPI) application first.

Use mpi scripts to do this. For example, mpicc and mpif77 are C and FORTRAN 77 scripts, respectively, that use the correct MPI header files. The location of these scripts and the MPI library depends on your MPI implementation. For example, for the default installation of MPICH, /opt/mpich/bin/mpicc and /opt/mpich/bin/mpif77 are the compiler scripts and /opt/mpich/lib/libmpich.a is the MPI library.

Check the documentation that comes with your MPI implementation for implementation-specific details of linking.

To link with the Intel(R) Math Kernel Library (Intel(R) MKL) ScaLAPACK and/or Cluster FFTs, use the following general form:

<<MPI> linker script> <files to link> -L <MKL path> -Wl,--start-group <MKL cluster library> <BL
59. cision array with element_size = 8, avoid leading dimensions 256, 512, 768, 1024, ... (elements).

LAPACK Packed Routines

The routines with names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position (the second and third letters, respectively) operate on matrices in the packed format (see the LAPACK "Routine Naming Conventions" sections in the Intel MKL Reference Manual). Their functionality is strictly equivalent to the functionality of the unpacked routines with names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower.

If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate n²/2 more memory than the memory required by a respective packed routine, where n is the problem size (the number of equations).

For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine:

call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)

where a is the dimension lda-by-n, which is at least n² elements, instead of the packed routine:

call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)

where ap is the dimension n*(n+1)/2.

FFT Functions

Additional conditions can improve
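The memory trade-off between packed and unpacked storage described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper names are hypothetical, and the index formula assumes the conventional LAPACK upper-triangle packing (columns stored one after another), written here with 0-based indices.

```c
#include <assert.h>
#include <stddef.h>

/* Elements needed for packed storage of an n-by-n symmetric or triangular matrix. */
static size_t packed_elems(size_t n)
{
    return n * (n + 1) / 2;
}

/* Elements needed for unpacked (full) storage with leading dimension lda >= n. */
static size_t full_elems(size_t lda, size_t n)
{
    return lda * n;
}

/* 0-based position in the packed array ap of element (i, j), i <= j,
   when the upper triangle is packed column by column. */
static size_t packed_index_upper(size_t i, size_t j)
{
    return i + j * (j + 1) / 2;
}
```

For n = 1000 and lda = n, packed storage needs 500500 elements and full storage needs 1000000, so the unpacked routine costs 499500 extra elements, which is roughly n²/2 as stated in the text.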
60. Using Language-Specific Interfaces with Intel(R) Math Kernel Library

This section discusses mixed-language programming and the use of language-specific interfaces with Intel(R) Math Kernel Library (Intel(R) MKL).

See also Appendix G in the Intel MKL Reference Manual for details of the FFTW interfaces to Intel MKL.

Interface Libraries and Modules

You can create the following interface libraries and modules using the respective makefiles located in the interfaces directory.

Libraries, in Intel(R) Math Kernel Library (Intel(R) MKL) architecture-specific directories

File name: Contains
libmkl_blas95.a: Fortran 95 wrappers for BLAS (BLAS95) for IA-32 architecture.
libmkl_blas95_lp64.a: Fortran 95 wrappers for BLAS (BLAS95) supporting LP64 interface.
libmkl_blas95_ilp64.a: Fortran 95 wrappers for BLAS (BLAS95) supporting ILP64 interface.
libmkl_lapack95.a: Fortran 95 wrappers for LAPACK (LAPACK95) for IA-32 architecture.
libmkl_lapack95_lp64.a: Fortran 95 wrappers for LAPACK (LAPACK95) supporting LP64 interface.
libmkl_lapack95_ilp64.a: Fortran

Further file names in the table: libfftw2xc_intel.a, libfftw2xc_gnu.a, libfftw2xf_intel.a, libfftw2xf_gnu.a, libfftw3xc_intel.a, libfftw3xc_gnu.a, libfftw3xf_intel.a, libfftw3xf_gnu.a, tw2x_cdft_SINGLE
61. de
• If the compiler integration supports library path options, set a path to the Intel MKL libraries for the target architecture, such as <mkl directory>/lib/intel64.
• Specify the names of the Intel MKL libraries to link with your application, for example, mkl_intel_lp64, mkl_intel_thread_lp64, mkl_core, and iomp5 (compilers typically require library names rather than library file names, so omit the "lib" prefix and "a" extension).

See Also
• Selecting Libraries to Link

Configuring the Eclipse IDE CDT 4.0

Before configuring Eclipse IDE C/C++ Development Tools (CDT) 4.0, make sure to turn on the automatic makefile generation.

To configure Eclipse CDT 4.0 to link with Intel MKL, follow the instructions below:
1. If the tool-chain/compiler integration supports include path options, go to C/C++ General > Paths and Symbols > Includes and set the Intel(R) Math Kernel Library (Intel(R) MKL) include path, that is, <mkl directory>/include.
2. If the tool-chain/compiler integration supports library path options, go to C/C++ General > Paths and Symbols > Library Paths and set the Intel MKL library path for the target architecture, such as <mkl directory>/lib/intel64.
3. Go to C/C++ Build > Settings > Tool Settings and specify the names of the Intel MKL libraries to link with your application, for example, mkl_intel_lp64, mkl_intel_thread_lp64, mkl_core, and iomp5 (compilers typically require library names rathe
62. Intel(R) Optimized LINPACK Benchmark for Linux OS

Intel(R) Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results.

Do not use this benchmark to report LINPACK 100 performance because that is a compiled-code only benchmark. This is a shared-memory (SMP) implementation which runs on a single platform. Do not confuse this benchmark with:
• MP LINPACK, which is a distributed memory version of the same benchmark.
• LINPACK, the library, which has been expanded upon by the LAPACK library.

Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your genuine Intel(R) processor systems more easily than with the High Performance Linpack (HPL) benchmark. Use this package to benchmark your SMP machine.

Additional information on this software as well as other Intel(R) software performance products is available at http://www.intel.com/software/products/.

Contents of the Intel(R) Optimized LINPACK Bench
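The step "converts that time into a performance rate" can be made concrete with a short sketch. It assumes the conventional LINPACK operation count of 2/3·N³ + 2·N² floating-point operations for factoring and solving an N-by-N system; the source does not state which count the Intel binary reports, and the function names below are hypothetical.

```c
#include <assert.h>

/* Conventional LINPACK operation count for an n-by-n dense solve:
   2/3*n^3 for the LU factorization plus 2*n^2 for the triangular solves.
   Assumption: the guide does not say which formula the benchmark uses. */
static double linpack_flops(double n)
{
    return (2.0 / 3.0) * n * n * n + 2.0 * n * n;
}

/* Performance rate in Gflop/s for a solve that took `seconds`.
   Returns a negative value when the timing is not positive. */
static double linpack_gflops(double n, double seconds)
{
    if (seconds <= 0.0)
        return -1.0;
    return linpack_flops(n) / seconds / 1.0e9;
}
```

Under this count, a problem of size N = 1000 solved in one second corresponds to about 0.67 Gflop/s. Because the count grows with N³, the same wall-clock time at a larger N reports a much higher rate.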
63. e

Columns: Function Domain, FORTRAN 77 interface, Fortran 90/95 interface, C/C++ interface

Function domains:
• Trigonometric Transform routines
• Fast Poisson, Laplace, and Helmholtz Solver (Poisson Library) routines
• Optimization (Trust-Region) Solver routines
• GMP arithmetic functions
• Support functions (including memory allocation)

Yes Yes Yes Yes Yes Yes Yes Yes Yes

† Supported using a mixed language programming call. See Intel(R) MKL Include Files for the respective header file.

Include Files

Function domain: Fortran Include Files; C/C++ Include Files
All function domains: mkl.fi; mkl.h
BLAS Routines: blas.f90, mkl_blas.fi; mkl_blas.h
BLAS-like Extension Transposition Routines: mkl_trans.fi; mkl_trans.h
CBLAS Interface to BLAS: (none); mkl_cblas.h
Sparse BLAS Routines: mkl_spblas.fi; mkl_spblas.h
LAPACK Routines: lapack.f90, mkl_lapack.fi; mkl_lapack.h
C Interface to LAPACK: (none); mkl_lapacke.h
ScaLAPACK Routines: (none); mkl_scalapack.h
All Sparse Solver Routines: mkl_solver.f90; mkl_solver.h
  PARDISO: mkl_pardiso.f77, mkl_pardiso.f90; mkl_pardiso.h
  DSS Interface: mkl_dss.f77, mkl_dss.f90; mkl_dss.h
  RCI Iterative Solvers, ILU Factorization: mkl_rci.fi; mkl_rci.h
Optimization Solver Routines: mkl_rci.fi; mkl_rci.h
Vector Mathematical Functions: mkl_vml.f77; mkl_vml.h mkl_v
64. • A mapping between single-precision names and double-precision names for applications using Cray-style naming (SP2DP interface).

SP2DP interface supports Cray-style naming in applications targeted for the Intel 64 architecture and using the ILP64 interface. SP2DP interface provides a mapping between single-precision names (for both real and complex types) in the application and double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for BLAS functions ?GEMM:

SGEMM -> DGEMM
DGEMM -> DGEMM
CGEMM -> ZGEMM
ZGEMM -> ZGEMM

Mind that no changes are made to double-precision names.

Threading Layer

This layer:
• Provides a way to link threaded Intel MKL with different threading compilers.
• Enables you to link with a threaded or sequential mode of the library.

This layer is compiled for different environments (threaded or sequential) and compilers (from Intel, GNU, and so on).

Computational Layer

Heart of Intel MKL. This layer has only one library for each combination of architecture and supported OS. The Computational layer accommodates multiple architectures through identification of architecture features and chooses the appropriate binary code at run time.

Compiler Support Run-time Libraries (RTL)

To support threading with Intel compilers, Intel MKL uses the compiler support RTL of the Intel(R) C++ Composer XE or Intel(R) Fortran Composer XE. To thread using
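The SP2DP name mapping shown for ?GEMM can be sketched as a prefix substitution. This is only an illustration of the rule in the example: the function sp2dp_name is hypothetical, it is not an Intel MKL API, and it handles only names whose first letter is the precision prefix, so routines with compound prefixes are outside its scope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Intel MKL: copy `name` into `out` and
   replace a leading single-precision prefix (S or C) with its
   double-precision counterpart (D or Z). Double-precision names are
   copied unchanged. Output is truncated to fit out_size. */
static void sp2dp_name(const char *name, char *out, size_t out_size)
{
    if (out_size == 0)
        return;
    snprintf(out, out_size, "%s", name);
    switch (out[0]) {
    case 'S': out[0] = 'D'; break;
    case 's': out[0] = 'd'; break;
    case 'C': out[0] = 'Z'; break;
    case 'c': out[0] = 'z'; break;
    default:  break;  /* D, Z, and anything else stay as they are */
    }
}
```

Calling sp2dp_name("SGEMM", buf, sizeof buf) leaves "DGEMM" in buf, while "DGEMM" and "ZGEMM" pass through unchanged, matching the four mappings listed in the text.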
65. e Also
• Intel MKL Function Domains

What's New

This User's Guide documents the Intel(R) Math Kernel Library (Intel(R) MKL) 10.3 Update 3.

The following product changes have been documented and enhancements done in this User's Guide:
• The list of FFT optimized radices has been updated.
• General coding techniques that improve performance have been explained in more detail.
• Description of memory management software has been corrected.
• Detected bugs have been fixed.

Related Information

To reference how to use the library in your application, use this guide in conjunction with the following documents:
• The Intel(R) Math Kernel Library Reference Manual, which provides reference information on routine functionalities, parameter descriptions, interfaces, calling syntaxes, and return values.
• The Intel(R) Math Kernel Library for Linux OS Release Notes.

Getting Started
66. e data types, respectively, and the native int must be 4 bytes long.

1 IBM Engineering Scientific Subroutine Library (ESSL).

See Also
• Running the Examples

Running the Java Examples

The Java examples support all the C and C++ compilers that the Intel(R) Math Kernel Library (Intel(R) MKL) does. The makefile intended to run the examples also needs the make utility, which is typically provided with the Linux OS distribution.

To run Java examples, the JDK developer toolkit is required for compiling and running Java code. A Java implementation must be installed on the computer or available via the network. You may download the JDK from the vendor website.

The examples should work for all versions of JDK. However, they were tested only with the following Java implementations for all the supported architectures:
• J2SE SDK 1.4.2, JDK 5.0 and 6.0 from Sun Microsystems, Inc. (http://sun.com/)
• JRockit JDK 1.4.2 and 5.0 from Oracle Corporation (http://oracle.com/)

Note that the Java run-time environment (JRE) system, which may be pre-installed on your computer, is not enough. You need the JDK developer toolkit that supports the following set of tools:
• java
• javac
• javah
• javadoc

To make these tools available for the examples makefile, set the JAVA_HOME environment variable and add the JDK binaries directory to the system PATH, for example, using the bash shell:

export JAVA_HOME=/home/<user name>/jdk1.5.0_09
67. eads to the cores on different sockets. Then the Intel MKL FFT function is called.

    #define _GNU_SOURCE  // for using the GNU CPU affinity
                         // (works with the appropriate kernel and glibc)
    // Set affinity mask
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <omp.h>

    int main(void) {
        int NCPUs = sysconf(_SC_NPROCESSORS_CONF);
        printf("Using thread affinity on %i NCPUs\n", NCPUs);
    #pragma omp parallel default(shared)
        {
            cpu_set_t new_mask;
            cpu_set_t was_mask;
            int tid = omp_get_thread_num();

            CPU_ZERO(&new_mask);
            // 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
            CPU_SET(tid == 0 ? 0 : 2, &new_mask);

            if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
                printf("Error: sched_getaffinity(%d, sizeof(was_mask), &was_mask)\n", tid);
            }
            if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
                printf("Error: sched_setaffinity(%d, sizeof(new_mask), &new_mask)\n", tid);
            }
            printf("tid=%d new_mask=%08X was_mask=%08X\n", tid,
                   *(unsigned int*)(&new_mask), *(unsigned int*)(&was_mask));
        }
        // Call Intel MKL FFT function
        return 0;
    }

Compile the application with the Intel compiler using the following command:

icc test_application.c -openmp

where test_application.c is the filename for the application.

Build the application. Run it in two threads fo
68. Checking Your Installation

After installing the Intel(R) Math Kernel Library (Intel(R) MKL), verify that the library is properly installed and configured:

1. Intel MKL installs in <Composer XE directory>. Check that the subdirectory of <Composer XE directory> referred to as <mkl directory> was created. By default, this is /opt/intel/composerxe-2011.y.xxx/mkl, where y is the release-update number and xxx is the package number.
2. If you want to keep multiple versions of Intel MKL installed on your system, update your build scripts to point to the correct Intel MKL version.
3. Check that the following files appear in the <mkl directory>/bin directory and its subdirectories:
   mklvars.sh
   mklvars.csh
   ia32/mklvars_ia32.sh
   ia32/mklvars_ia32.csh
   intel64/mklvars_intel64.sh
   intel64/mklvars_intel64.csh
   Use these files to assign Intel MKL-specific values to several environment variables.
4. To understand how the Intel MKL directories are structured, see Intel(R) Math Kernel Library Structure.

See Also
• Setting Environment Variables

Setting Environment Variables

See Also
• Set
69. ecture

ScaLAPACK (1)
  IA-32 architecture, static linking: libmkl_scalapack_core.a libmkl_core.a
  IA-32 architecture, dynamic linking: libmkl_scalapack_core.so libmkl_core.so
  Intel(R) 64 architecture, static linking: See below
  Intel(R) 64 architecture, dynamic linking: See below

ScaLAPACK, LP64 interface (1)
  IA-32 architecture, static linking: n/a (2)
  IA-32 architecture, dynamic linking: n/a
  Intel(R) 64 architecture, static linking: libmkl_scalapack_lp64.a libmkl_core.a
  Intel(R) 64 architecture, dynamic linking: libmkl_scalapack_lp64.so libmkl_core.so

ScaLAPACK, ILP64 interface (1)
  IA-32 architecture, static linking: n/a
  IA-32 architecture, dynamic linking: n/a
  Intel(R) 64 architecture, static linking: libmkl_scalapack_ilp64.a libmkl_core.a
  Intel(R) 64 architecture, dynamic linking: libmkl_scalapack_ilp64.so libmkl_core.so

Cluster Fourier Transform Functions (1)
  IA-32 architecture, static linking: libmkl_cdft_core.a libmkl_core.a
  IA-32 architecture, dynamic linking: libmkl_cdft_core.so libmkl_core.so
  Intel(R) 64 architecture, static linking: libmkl_cdft_core.a libmkl_core.a
  Intel(R) 64 architecture, dynamic linking: libmkl_cdft_core.so libmkl_core.so

(1) Add also the library with BLACS routines corresponding to the used MPI. For details, see Linking with ScaLAPACK and Cluster FFTs.
(2) Not applicable.

See Also
• Linking with ScaLAPACK and Cluster FFTs
• Using the Web-based Linking Advisor

Linking with Compiler Support RTLs

You are strongly encouraged to dynamically link in the compatibility OpenMP run-time library libiomp. Link with libiomp dynamically even if other libraries are linked statically.

Linking to the static OpenMP run-time library is not recommended because it is very easy with complex software to link in more than one copy of the library. This causes performance problems (too many threads) and may cause correctness problems if more than one copy is initialized.

If you link with dynamic version of libiomp
70. ed in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, i960, Intel, the Intel log
71. Contents of the Intel(R) Optimized LINPACK Benchmark
Running the Software
Known Limitations of the Intel(R) Optimized LINPACK Benchmark
Intel(R) Optimized MP LINPACK Benchmark for Clusters
  Overview of the Intel(R) Optimized MP LINPACK Benchmark for Clusters
  Contents of the Intel(R) Optimized MP LINPACK Benchmark for Clusters
  Building the MP LINPACK
  New Features of Intel(R) Optimized MP LINPACK Benchmark
  Benchmarking a Cluster
  Options to Reduce Search Time
Appendix A Intel(R) Math Kernel Library Language Interfaces Support
  Language Interfaces Support, by Function Domain
  Include Files
Appendix B Support for Third-Party Interfaces
  GMP Functions
  FFTW Interface Support
Appendix C Directory Structure in Detail
  Detailed Structure of the IA-32 Architecture Directories
    Static Libraries in the IA-32 Architecture Directory lib/ia32
72. el(R) MPI to take advantage of the additional OpenMP optimizations. If you wish to use an MPI version other than Intel MPI, you can do so by using the MP LINPACK source provided. You can use the source to build a non-hybrid version that may be used in a hybrid mode, but it would be missing some of the optimizations added to the hybrid version. Non-hybrid builds are the default of the source code makefiles provided. In some cases, the use of the hybrid mode is required for external reasons. If there is a choice, the non-hybrid code may be faster. To use the non-hybrid code in a hybrid mode, use the threaded version of Intel MKL BLAS, link with a thread-safe MPI, and call function MPI_Init_thread() so as to indicate a need for MPI to be thread-safe. Intel MKL also provides prebuilt binaries that are dynamically linked against Intel MPI libraries.
NOTE: Performance of statically and dynamically linked prebuilt binaries may be different. The performance of both depends on the version of Intel MPI you are using. You can build binaries statically linked against a particular version of Intel MPI by yourself.
Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for n
73. Contents
Legal Information 3
Introducing the Intel(R) Math Kernel Library 9
Getting Help and Support 11
Notational Conventions 13
Chapter 1 Overview
Document Overview 15
What's New 15
Related Information 15
Chapter 2 Getting Started
Checking Your Installation 17
Setting Environment Variables 18
Scripts to Set Environment Variables 18
Automating the Process of Setting Environment Variables 19
Compiler Support 20
Using Code Examples 20
Using the Web-based Linking Advisor
74. ernel Library (Intel(R) MKL) directories:
<mkl directory>: The main directory where Intel MKL is installed. Replace this placeholder with the specific pathname in the configuring, linking, and building instructions. The default value of this placeholder is /opt/intel/composerxe-2011.y.xxx/mkl, where y is the release-update number and xxx is the package number.
<Composer XE directory>: The installation directory for the Intel(R) C++ Composer XE 2011 or Intel(R) Fortran Composer XE 2011. The default value of this placeholder is /opt/intel/composerxe-2011.y.xxx.
The following font conventions are used in this document:
Italic: Italic is used for emphasis and also indicates document names in body text, for example: see Intel MKL Reference Manual.
Monospace lowercase: Indicates filenames, directory names, and pathnames, for example: libmkl_core.a, /opt/intel/composerxe-2011.0.004
Monospace lowercase mixed with uppercase: Indicates:
• Commands and command-line options, for example: icc myprog.c -L$MKLPATH -I$MKLINCLUDE -lmkl -liomp5 -lpthread
• C/C++ code fragments, for example: a = new double [SIZE*SIZE];
UPPERCASE MONOSPACE: Indicates system variables, for example, $MKLPATH.
Monospace italic: Indicates a parameter in discussions:
• Routine parameters, for example, lda
• Makefile parameters, for example, functions_list
When enclosed in angle brackets, indicate
75. essors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the Intel Compiler User and Reference Guides under Compiler Options. Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel micropro
76. f tsf n i a li SIZE c i SIZE omp_set_num_threads 2 for i 0 i lt SIZE i i for j 0 j lt SIZE j a i SIZE j double i j b i SIZE j double i j c i SIZE j double 0 cblas_dgemm CblasRowMajor CblasNoTrans CblasNoTrans m n k alpha a lda b ldb beta c ldc printf row ta tc n for i 0 i lt 10 i printf Sd tSf tsf n i a li SIZE c i SIZE delete a 50 Managing Performance and Memory 5 KKKKKKK Fortran language kkkkkkx k ROGRAM DGEMM_DIFF_THREADS TEGER N I J RAMETER N 1000 AL 8 A N N B N N C N N AL 8 ALPHA BETA TEGER 8 MKL_MALLOC teger ALLOC_SIZE teger NTHRS LLOC_SIZE 8 N N _PTR MKL_MALLOC ALLOC_SIZE 128 PTR MKL_MALLOC ALLOC_SIZE 128 _PTR MKL_MALLOC ALLOC_SIZE 128 LPHA D F E F b Oe ete tls a 0 Q B o eal H D gt QWProo Il u 0 0 END DO END DO CALL DGEMM N N N N N ALPHA A N B N BETA C N print Row A C DO i 1 10 write 14 F20 8 F20 8 I A 1 1 C 1 1 END DO CALL OMP_SET_NUM_THREADS 1 DO I 1 N DO J 1 N A I J Itd B I J 1 j C I J 0 0 END DO END DO CALL DGEMM N N N N N ALPHA A N B N BETA C N print Row A C DO i 1 10 write 14 F20 8 F20 8 I A 1 1 C 1 1 END DO CALL OMP_SET_NUM_THREADS 2 DO I 1 N DO J 1 N A I J Itd B I J tj C I J 0 0 END DO END DO CALL DGEMM N
77. fault value is lp64.
threading = {parallel, sequential}: Defines whether to use the Intel(R) Math Kernel Library (Intel(R) MKL) in the threaded or sequential mode. The default value is parallel.
export = <file name>: Specifies the full name of the file that contains the list of entry-point functions to be included in the shared object. The default name is user_list (no extension).
name = <so name>: Specifies the name of the library to be created. By default, the name of the created library is mkl_custom.so.
xerbla = <error handler>: Specifies the name of the object file <user_xerbla>.o that contains the user's error handler. The makefile adds this error handler to the library for use instead of the default Intel MKL error handler xerbla. If you omit this parameter, the native Intel MKL xerbla is used. See the description of the xerbla function in the Intel MKL Reference Manual on how to develop your own error handler.
MKLROOT = <mkl directory>: Specifies the location of Intel MKL libraries used to build the custom shared object. By default, the builder uses the Intel MKL installation directory.
All the above parameters are optional. In the simplest case, the command line is make ia32, and the missing options have default values. This command creates the mkl_custom.so library for processors using the
78. ffectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision 20101101
Known Limitations of the Intel(R) Optimized LINPACK Benchmark
The following limitations are known for the Intel Optimized LINPACK Benchmark for Linux OS:
• Intel Optimized LINPACK Benchmark is threaded to effectively use multiple processors. So, in multi-processor systems, best performance will be obtained with Hyper-Threading technology turned off, which ensures that the operating system assigns threads to physical processors only.
• If an incomplete data input file is given, the binaries may either hang or fault. See the sample data input files and/or the extended help for insight into creating a correct data input file.
Intel(R) Optimized MP LINPACK Benchmark for Clusters
Overview of the Intel(R) Optimized MP LINPACK Benchmark for Clusters
The Intel(R) Op
79. • $MKLPATH is a user-defined variable containing <mkl directory>/lib/64
• You use the Intel(R) Fortran Compiler 10.0 or higher
To link with ScaLAPACK for a cluster of systems based on the Intel(R) 64 architecture, use the following link line:
/opt/intel/mpi/3.0/bin/mpiifort <user files to link> -L$MKLPATH -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
To link with Cluster FFT for a cluster of systems based on the Intel(R) 64 architecture, use the following link line:
/opt/intel/mpi/3.0/bin/mpiifort <user files to link> -Wl,--start-group $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
See Also
• Linking with ScaLAPACK and Cluster FFTs
Programming with Intel(R) Math Kernel Library in the Eclipse Integrated Development Environment (IDE)
Configuring the Eclipse IDE CDT to Link with Intel(R) Math Kernel Library
This section explains how to configure the Eclipse Integrated Development Environment (IDE) C/C++ Development Tools (CDT) 3.x and 4.0 to link with Intel(R) Math Kernel Library (Intel(R) MKL).
TIP: After linking your CDT with Intel MKL, you can
80. g MKL_DYNAMIC=FALSE does not ensure that Intel MKL will use the number of threads that you request. The library may have no choice on this number for such reasons as system resources. Additionally, the library may examine the problem and use a different number of threads than the value suggested. For example, if you attempt to do a size-one matrix-matrix multiply across eight threads, the library may instead choose to use only one thread because it is impractical to use eight threads in this event. Note also that if Intel MKL is called in a parallel region, it will use only one thread by default. If you want the library to use nested parallelism, and the thread within a parallel region is compiled with the same OpenMP compiler as Intel MKL is using, you may experiment with setting MKL_DYNAMIC to FALSE and manually increasing the number of threads. In general, set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, to use nested parallelism where the library is already called from a parallel section.
MKL_DOMAIN_NUM_THREADS
The MKL_DOMAIN_NUM_THREADS environment variable suggests the number of threads for a particular function domain. MKL_DOMAIN_NUM_THREADS accepts a string value <MKL-env-string>, which must have the following format:
<MKL-env-string> ::= <MKL-domain-env-string> { <delimiter> <MKL
81. ge even for a powerful cluster. MP LINPACK optionally prints information on performance as it proceeds. You can also terminate early.
• Save time by compiling with -DENDEARLY -DASYOUGO2 and using a negative threshold (do not use a negative threshold on the final run that you intend to submit as a Top500 entry). Set the threshold in line 13 of the HPL 2.0 input file HPL.dat.
• If you are going to run a problem to completion, do it with -DASYOUGO.
5. Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible.
See Also
• Options to Reduce Search Time
Options to Reduce Search Time
Running large problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also large: not only can you run any size problem, but over a number of block sizes, grid layouts, lookahead steps, using different factorization methods, and so on. It can be a large waste of time to run a large problem to completion only to discover it ran 0.01% slower than your previous best problem. Use the following options to reduce the search time:
• -DASYOUGO
• -DENDEARLY
• -DASYOUGO2
Use -DASYOUGO2 cautiously because it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. These options provide a lot of useful DGEMM
82. ge-specific Usage Options
Using Language-Specific Interfaces with Intel(R) Math Kernel Library 61
Interface Libraries and Modules 62
Fortran 95 Interfaces to LAPACK and BLAS 63
Compiler-dependent Functions and Fortran 90 Modules 64
Mixed-language Programming with the Intel(R) Math Kernel Library 65
Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments 65
Using Complex Types in C/C++ 66
Calling BLAS Functions that Return the Complex Values in C/C++ Code 67
Support for Boost uBLAS Matrix-matrix Multiplication 69
Invoking Intel(R) Math Kernel Library Functions from Java Applications 70
Intel MKL Java Examples 70
Running the Java Examples 72
Known Limitations of the Java Examples 73
Chapter 7 Coding Tips
Aligning Data for Consistent Results 75
Chapter 8 Working with the Intel(R) Math Kernel Library Cluster Software
Linking with ScaLAPACK and Cluster FF
83. group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
See Also
• Fortran 95 Interfaces to LAPACK and BLAS
• Examples for linking a C application using cluster components
• Examples for linking a Fortran application using cluster components
• Using the Single Dynamic Library Interface
Linking on Intel(R) 64 Architecture Systems
The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace the ifort linker with icc.
NOTE: If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking.
In these examples, MKLPATH=$MKLROOT/lib/intel64, MKLINCLUDE=$MKLROOT/include:
• Static linking of myprog.f and parallel Intel(R) Math Kernel Library (Intel(R) MKL) supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl
84. he threshold parameter in HPL.dat to a negative number when testing ENDEARLY. It also sometimes gives a better picture to compile with -DASYOUGO2 when using -DENDEARLY.
Usage notes on -DENDEARLY follow:
• -DENDEARLY stops the problem after a few iterations of DGEMM on the block size (the bigger the blocksize, the further it gets). It prints only 5 or 6 updates, whereas -DASYOUGO prints about 46 or so output elements before the problem completes.
• Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end (because that is what LU does). -DENDEARLY is likely to terminate before it starts to slow down.
• -DENDEARLY terminates the problem early with an HPL Error exit. It means that you need to ignore the missing residual results, which are wrong because the problem never completed. However, you can get an idea what the initial performance was, and if it looks good, then run the problem to completion without -DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number.
• Though -DENDEARLY terminates early, HPL treats the problem as completed and computes Gflop rating as though the problem ran to completion. Ignore this erroneously high rating.
• The bigger the problem, the m
85. he IA-32 architecture and Linux OS. Dynamically linked against Intel(R) MPI 3.2.
bin_intel/intel64/xhpl_hybrid_intel64: (New) Prebuilt hybrid binary for the Intel(R) 64 architecture and Linux OS. Statically linked against Intel(R) MPI 3.2.
bin_intel/intel64/xhpl_hybrid_intel64_dynamic: (New) Prebuilt hybrid binary for the Intel(R) 64 and Linux OS. Dynamically linked against Intel(R) MPI 3.2.
Next 3 files are prebuilt libraries
lib_hybrid/ia32/libhpl_hybrid.a: (New) Prebuilt library with the hybrid version of MP LINPACK for the IA-32 architecture and Intel MPI 3.2.
lib_hybrid/intel64/libhpl_hybrid.a: (New) Prebuilt library with the hybrid version of MP LINPACK for the Intel(R) 64 architecture and Intel MPI 3.2.
Next 18 files refer to run scripts
bin_intel/ia32/runme_ia32: (New) Sample run script for the IA-32 architecture and a pure MPI
bin_intel/ia32/runme_ia32_dynamic
bin_intel/ia32/HPL_serial.dat
bin_intel/ia32/runme_hybrid_ia32
bin_intel/ia32/runme_hybrid_ia32_dynamic
bin_intel/ia32/HPL_hybrid.dat
bin_intel/intel64/runme_intel64
bin_intel/intel64/runme_intel64_dynamic
bin_intel/intel64/HPL_serial.dat
bin_intel/intel64/runme_hybrid_intel64
bin_intel/intel64/runme_hybrid_intel64_dynamic
bin_intel/intel64/HPL_hybrid.dat
nodeperf.c
See Also
• High-level Directory Structure
86. he software for other problem sizes, see the extended help included with the program. Extended help can be viewed by running the program executable with the -e option:
xlinpack_xeon32 -e
xlinpack_xeon64 -e
The pre-defined data input files lininput_xeon32 and lininput_xeon64 are provided merely as examples. Different systems have different number of processors or amount of memory and thus require new input files. The extended help can be used for insight into proper ways to change the sample input files. Each input file requires at least the following amount of memory:
lininput_xeon32: 2 GB
lininput_xeon64: 16 GB
If the system has less memory than the above sample data input requires, you may need to edit or create your own data input files, as explained in the extended help. Each sample script uses the OMP_NUM_THREADS environment variable to set the number of processors it is targeting. To optimize performance on a different number of physical processors, change that line appropriately. If you run the Intel Optimized LINPACK Benchmark without setting the number of threads, it will default to the number of cores according to the OS. You can find the settings for this environment variable in the runme_* sample scripts. If the settings do not yet match the situation for your machine, edit the script.
87. ibraries. All needed shared libraries must be visible on all the nodes at run time. To achieve this, point to these libraries with the LD_LIBRARY_PATH environment variable in the .bashrc file. If the Intel(R) Math Kernel Library (Intel(R) MKL) is installed only on one node, link statically when building your Intel MKL applications rather than use shared libraries. The Intel(R) compilers or GNU compilers can be used to compile a program that uses Intel MKL. However, make sure that the MPI implementation and compiler match up correctly.
Building ScaLAPACK Tests
To build ScaLAPACK tests:
• For the IA-32 architecture, add libmkl_scalapack_core.a to your link command.
• For the Intel(R) 64 architecture, add libmkl_scalapack_lp64.a or libmkl_scalapack_ilp64.a, depending upon the desired interface.
Examples for Linking with ScaLAPACK and Cluster FFT
This section provides examples of linking with ScaLAPACK and Cluster FFT. Note that a binary linked with ScaLAPACK runs the same way as any other MPI application (refer to the documentation that comes with your MPI implementation). For instance, the script mpirun is used in the case of MPICH2 and OpenMPI, and a number of MPI processes is set by -np. In the case of MPICH 2.0 and all Intel MPIs, start the daemon before running your application; the execution is driven by the script mpiexec. For further linking examples, see the support we
88. ied in the Release Notes. However, the library has been successfully used with other compilers as well. Intel MKL provides a set of include files to simplify program development by specifying enumerated values and prototypes for the respective functions. Calling Intel MKL functions from your application without an appropriate include file may lead to incorrect behavior of the functions.
See Also
• Intel(R) MKL Include Files
Using Code Examples
The Intel(R) Math Kernel Library (Intel(R) MKL) package includes code examples, located in the examples subdirectory of the installation directory. Use the examples to determine:
• Whether Intel MKL is working on your system
• How you should call the library
• How to link the library
The examples are grouped in subdirectories mainly by Intel MKL function domains and programming languages. For example, the examples/spblas subdirectory contains a makefile to build the Sparse BLAS examples, and the examples/vmlc subdirectory contains the makefile to build the C VML examples. Source code for the examples is in the next-level sources subdirectory.
See Also
• High-level Directory Structure
Using the Web-based Linking Advisor
Use the Intel(R) Math Kernel Library (Intel(R) MKL) Linking Advisor to determine the libraries and options to specify on your link or compilation line. The tool is available at http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
89. if, for example, there are two threads to every physical core, the thread scheduler may assign two threads to some cores and ignore the other cores altogether. If you are using the OpenMP library of the Intel Compiler, read the respective User Guide on how to best set the thread affinity interface to avoid this situation. For Intel MKL, apply the following setting:
set KMP_AFFINITY=granularity=fine,compact,1,0
See Also
• Using Parallelism of the Intel(R) Math Kernel Library
Managing Multi-core Performance
You can obtain best performance on systems with multi-core processors by requiring that threads do not migrate from core to core. To do this, bind threads to the CPU cores by setting an affinity mask to threads. Use one of the following options:
• OpenMP facilities (recommended, if available), for example, the KMP_AFFINITY environment variable using the Intel OpenMP library
• A system function, as explained below
Consider the following performance issue:
• The system has two sockets with two cores each, for a total of four cores (CPUs)
• The two-thread parallel application that calls the Intel(R) Math Kernel Library (Intel(R) MKL) FFT happens to run faster than in four threads, but the performance in two threads is very unstable
The following code example shows how to resolve this issue by setting an affinity mask by operating system means using the Intel(R) compiler. The code calls the system function sched_setaffinity to bind the thr
91. ing Libraries 36
Sequential Mode of the Library 36
Selecting Libraries in the Threading Layer and RTL 37
Linking with Computational Libraries 38
Linking with Compiler Support RTLs 38
Linking with System Libraries 39
Linking Examples 39
Linking on IA-32 Architecture Systems 39
Linking on Intel(R) 64 Architecture Systems 40
Building Custom Shared Objects 42
Using the Custom Shared Object Builder 42
Specifying a List of Functions 43
Distributing Your Custom Shared Object 44
Chapter 5 Managing Performance and Memory
Using Parallelism of the Intel(R) Math Kernel Library 45
Threaded Functions and Problems 46
Avoiding Conflicts in the Execution Environment 48
Techniques to Set the Number of Threads
93. ion is created in the respective locales. For more information, see Contents of the Documentation Directories.
Layered Model Concept
Intel(R) Math Kernel Library (Intel(R) MKL) is structured to support multiple compilers and interfaces, different OpenMP implementations, both serial and multiple threads, and a wide range of processors. Conceptually, Intel MKL can be divided into distinct parts to support different interfaces, threading models, and core computations:
1. Interface Layer
2. Threading Layer
3. Computational Layer
You can combine Intel MKL libraries to meet your needs by linking with one library in each part layer-by-layer. Once the interface library is selected, the threading library you select picks up the chosen interface, and the computational library uses interfaces and OpenMP implementation (or non-threaded mode) chosen in the first two layers. To support threading with different compilers, one more layer is needed, which contains libraries not included in Intel MKL:
• Compiler-support run-time libraries (RTL).
The following table provides more details of each layer.
Layer | Description
Interface Layer | Matches compiled code of your application with the threading and/or computational parts of the library. This layer provides:
• LP64 and ILP64 interfaces.
• Compatibility with compilers that return function values differently
94. ipt parameters
Script | Architecture (required, when applicable) | Addition of a Path to Fortran 95 Modules (optional) | Interface (optional)
mklvars_ia32 | n/a | mod | n/a
mklvars_intel64 | n/a | mod | lp64 (default), ilp64
mklvars | ia32, intel64 | mod | lp64 (default), ilp64
n/a: Not applicable.
For example:
• The command mklvars_ia32.sh sets environment variables for the IA-32 architecture and adds no path to the Fortran 95 modules.
• The command mklvars_intel64.sh mod ilp64 sets environment variables for the Intel(R) 64 architecture and adds the path to the Fortran 95 modules for the ILP64 interface to the FPATH environment variable.
• The command mklvars.sh intel64 mod sets environment variables for the Intel(R) 64 architecture and adds the path to the Fortran 95 modules for the LP64 interface to the FPATH environment variable.
NOTE: Supply the parameter specifying the architecture first, if it is needed. Values of the other two parameters can be listed in any order.
See Also
• High-level Directory Structure
• Intel(R) MKL Interface Libraries and Modules
• Fortran 95 Interfaces to LAPACK and BLAS
• Setting the Number of Threads Using an OpenMP Environment Variable
Automating the Process of Setting Environment Variables
To automate setting of the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables, add mklvars*.sh to your shell profile so that each time you login, the script automatic
95. jor order (Fortran-style) B: Row-major order (C-style)
For example, if a two-dimensional matrix A of size m x n is stored densely in a one-dimensional array B, you can access a matrix element like this:
A[i][j] = B[i*n+j] in C (i=0, ..., m-1, j=0, ..., n-1)
A(i,j) = B((j-1)*m+i) in Fortran (i=1, ..., m, j=1, ..., n)
When calling LAPACK or BLAS routines from C, be aware that because the Fortran language is case-insensitive, the routine names can be both upper-case or lower-case, with or without the trailing underscore. For example, the following names are equivalent:
• LAPACK: dgetrf, DGETRF, dgetrf_, and DGETRF_
• BLAS: dgemm, DGEMM, dgemm_, and DGEMM_
See Example "Calling a Complex BLAS Level 1 Function from C++" on how to call BLAS routines from C. See also the Intel(R) MKL Reference Manual for a description of the C interface to LAPACK functions.
CBLAS
Instead of calling BLAS routines from a C-language program, you can use the CBLAS interface. CBLAS is a C-style interface to the BLAS routines. You can call CBLAS routines using regular C-style calls. Use the mkl.h header file with the CBLAS interface. The header file specifies enumerated values and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation.
Example Using
96. File: Contents

Interface layer
libmkl_intel_lp64.a: LP64 interface library for the Intel compilers
libmkl_intel_ilp64.a: ILP64 interface library for the Intel compilers
libmkl_intel_sp2dp.a: SP2DP interface library for the Intel compilers
libmkl_blas95_lp64.a: Fortran 95 interface library for BLAS for the Intel(R) Fortran compiler. Supports the LP64 interface
libmkl_blas95_ilp64.a: Fortran 95 interface library for BLAS for the Intel(R) Fortran compiler. Supports the ILP64 interface
libmkl_lapack95_lp64.a: Fortran 95 interface library for LAPACK for the Intel(R) Fortran compiler. Supports the LP64 interface
libmkl_lapack95_ilp64.a: Fortran 95 interface library for LAPACK for the Intel(R) Fortran compiler. Supports the ILP64 interface
libmkl_gf_lp64.a: LP64 interface library for the GNU Fortran compilers
libmkl_gf_ilp64.a: ILP64 interface library for the GNU Fortran compilers

Threading layer
libmkl_intel_thread.a: Threading library for the Intel compilers
libmkl_gnu_thread.a: Threading library for the GNU Fortran and C compilers
libmkl_pgi_thread.a: Threading library for the PGI compiler
libmkl_sequential.a: Sequential library

File

Computational layer
libmkl_core.a
libmkl_solver_lp64.a
libmkl_solver_lp64_sequential.a
libmkl_solver_ilp64.a
libmkl_solver_ilp64_sequential.a
libmkl_scalapack_lp64.a
libmkl_scalapack_ilp64.a
libmkl_cdft_core.a

Run-time Libraries (RTL)
libmkl_blacs_lp64.a
libmkl_blacs_ilp64.a
libmkl_blacs_intelmpi_lp
97. le and double precision. However, the wrappers used in the examples do not:
• Demonstrate the use of large arrays (>2 billion elements)
• Demonstrate processing of arrays in native memory
• Check correctness of function parameters
• Demonstrate performance optimizations

The examples use the Java Native Interface (JNI developer framework) to bind with Intel MKL. The JNI documentation is available from http://java.sun.com/javase/6/docs/technotes/guides/jni/.

The Java example set includes JNI wrappers that perform the binding. The wrappers do not depend on the examples and may be used in your Java applications. The wrappers for CBLAS, FFT, VML, VSL RNG, and ESSL-like convolution and correlation functions do not depend on each other.

To build the wrappers, just run the examples. The makefile builds the wrapper binaries. After running the makefile, you can run the examples, which will determine whether the wrappers were built correctly. As a result of running the examples, the following directories will be created in <mkl directory>/examples/java:
• docs
• include
• classes
• bin
• _results

The directories docs, include, classes, and bin will contain the wrapper binaries and documentation; the directory _results will contain the testing results.

For a Java programmer, the wrappers are the following Java classes:
• com.intel.mkl.CBLAS
• com.intel.mkl.DFTI
• com.intel.mkl.E
98. le below helps explain what threading library and RTL you should choose under different scenarios when using Intel MKL (static cases only):

Compiler | Application Threaded? | Threading Layer | RTL Recommended | Comment
Intel | Does not matter | libmkl_intel_thread.a | libiomp5.so |
PGI | Yes | libmkl_pgi_thread.a or libmkl_sequential.a | PGI supplied | Use of libmkl_sequential.a removes threading from Intel MKL calls.
PGI | No | libmkl_intel_thread.a | libiomp5.so |
PGI | No | libmkl_pgi_thread.a | PGI supplied |
PGI | No | libmkl_sequential.a | None |
gnu | Yes | libmkl_gnu_thread.a | libiomp5.so or GNU OpenMP run-time library | libiomp5 offers superior scaling performance.
gnu | Yes | libmkl_sequential.a | None |
gnu | No | libmkl_intel_thread.a | libiomp5.so |
other | Yes | libmkl_sequential.a | None |
other | No | libmkl_intel_thread.a | libiomp5.so |

Linking with Computational Libraries

If you are not using the Intel(R) Math Kernel Library (Intel(R) MKL) cluster software, you need to link your application with only one computational library:
• libmkl_core.a in case of static linking
• libmkl_core.so in case of dynamic linking

ScaLAPACK and Cluster FFT require more computational libraries. The following table lists computational libraries that you must list on the link line for these function domains.

Function domain | IA-32 Architecture, Static Linking | IA-32 Architecture, Dynamic Linking | Intel(R) 64 Architecture | Intel(R) 64 Archit
99. lean INSTALL_DIR=<user dir>

CAUTION: Even if you have administrative rights, avoid setting INSTALL_DIR=../.. or INSTALL_DIR=<mkl directory> in a build or clean command above because these settings replace or delete the Intel MKL prebuilt Fortran 95 library and modules.

Compiler-dependent Functions and Fortran 90 Modules

Compiler-dependent functions occur whenever the compiler inserts into the object code function calls that are resolved in its run-time library (RTL). Linking of such code without the appropriate RTL will result in undefined symbols. Intel(R) Math Kernel Library (Intel(R) MKL) has been designed to minimize RTL dependencies.

In cases where RTL dependencies might arise, the functions are delivered as source code and you need to compile the code with whatever compiler you are using for your application.

In particular, Fortran 90 modules result in the compiler-specific code generation requiring RTL support. Therefore, Intel MKL delivers these modules compiled with the Intel(R) compiler, along with source code to be used with different compilers.

Mixed-language Programming with the Intel(R) Math Kernel Library

Appendix A "Intel(R) Math Kernel Library Language Interfaces Support" lists the programming languages supported for each Intel(R) Math Kernel Library (Intel(R) MKL) function domain. However, you can call Intel MKL routines from different language environments
100. lers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #20101101

Linking Quick Start

To link with Intel(R) Math Kernel Library (Intel(R) MKL), choose one library from the Interface layer, one library from the Threading layer, one (and typically the only) library from the Computational layer, and, if necessary, add run-time libraries. The following table lists Intel MKL libraries to link with your application.

 | Interface layer | Threading layer | Computational layer | RTL
IA-32 architecture, static linking | libmkl_intel.a | libmkl_intel_thread.a | libmkl_core.a | libiomp5.so
IA-32 architecture, dynamic linking | libmkl_rt.so | | |
Intel(R) 64 architecture, static linking | libmkl_intel_lp64.a | libmkl_intel_thread.a | libmkl_core.a | libiomp5.so
Intel(R) 64 architecture, dynamic linking | libmkl_rt.so | | |

For exceptions and alternatives to the libraries listed above, see Selecting Libraries to Link.

See Also
• Using the Web-based Linking Advisor
• Using the Single Dynamic Library Interface
• Li
101. make sure the LD_LIBRARY_PATH environment variable is defined correctly.

See Also
• Scripts to Set Environment Variables

Linking with System Libraries

To use the Intel(R) Math Kernel Library (Intel(R) MKL) FFT, Trigonometric Transform, or Poisson, Laplace, and Helmholtz Solver routines, link in the math support system library by adding "-lm" to the link line.

On Linux OS, the libiomp library relies on the native pthread library for multi-threading. Any time libiomp is required, add -lpthread to your link line afterwards (the order of listing libraries is important).

Linking Examples

See Also
• Using the Web-based Linking Advisor
• Examples for Linking with ScaLAPACK and Cluster FFT

Linking on IA-32 Architecture Systems

The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace the ifort linker with icc.

NOTE: If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking.

In these examples, MKLPATH=$MKLROOT/lib/ia32, MKLINCLUDE=$MKLROOT/include.

• Static linking of myprog.f and parallel Intel(R) Math Kernel Library (Intel(R) MKL):
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_th
102. mark

The Intel Optimized LINPACK Benchmark for Linux OS contains the following files, located in the ./benchmarks/linpack/ subdirectory in the Intel(R) Math Kernel Library (Intel(R) MKL) directory:

File in ./benchmarks/linpack/ | Description
linpack_xeon32 | The 32-bit program executable for a system based on Intel(R) Xeon(R) processor or Intel(R) Xeon(R) processor MP with or without Streaming SIMD Extensions 3 (SSE3).
linpack_xeon64 | The 64-bit program executable for a system with Intel(R) Xeon(R) processor using Intel(R) 64 architecture.
runme_xeon32 | A sample shell script for executing a pre-determined problem set for linpack_xeon32. OMP_NUM_THREADS set to 2 processors.
runme_xeon64 | A sample shell script for executing a pre-determined problem set for linpack_xeon64. OMP_NUM_THREADS set to 4 processors.
lininput_xeon32 | Input file for pre-determined problem for the runme_xeon32 script.
lininput_xeon64 | Input file for pre-determined problem for the runme_xeon64 script.
lin_xeon32.txt | Result of the runme_xeon32 script execution.
lin_xeon64.txt | Result of the runme_xeon64 script execution.
help.lpk | Simple help file.
xhelp.lpk | Extended help file.

See Also
• High-level Directory Structure

Running the Software

To obtain results for the pre-determined sample problem sizes on a given system, type one of the following, as appropriate:
./runme_xeon32
./runme_xeon64

To run t
103. metic functions do not support ILP64.

See Also
• High-level Directory Structure
• Intel(R) MKL Include Files
• Language Interfaces Support, by Function Domain
• Layered Model Concept
• Directory Structure in Detail

Linking with Fortran 95 Interface Libraries

The libmkl_blas95*.a and libmkl_lapack95*.a libraries contain Fortran 95 interfaces for BLAS and LAPACK, respectively, which are compiler-dependent. In the Intel(R) Math Kernel Library (Intel(R) MKL) package, they are prebuilt for the Intel(R) Fortran compiler. If you are using a different compiler, build these libraries before using the interface.

See Also
• Fortran 95 Interfaces to LAPACK and BLAS
• Compiler-dependent Functions and Fortran 90 Modules

Linking with Threading Libraries

Sequential Mode of the Library

You can use Intel(R) Math Kernel Library (Intel(R) MKL) in a sequential (non-threaded) mode. In this mode, Intel MKL runs unthreaded code. However, it is thread-safe (except the LAPACK deprecated routine ?lacon), which means that you can use it in a parallel region in your OpenMP code. The sequential mode requires no compatibility OpenMP run-time library and does not respond to the environment variable OMP_NUM_THREADS or its Intel MKL equivalents.

You should use the library in the sequential mode only if you have a particular reason not to use Intel MKL threading. The sequential mode may be helpful when using Intel MKL with programs threaded with some non-Intel c
104. ml1 90

Vector Statistical Functions | mkl_vsl.f77, mkl_vsl.f90 | mkl_vsl_functions.h
Fourier Transform Functions | mkl_dfti.f90 | mkl_dfti.h
Cluster Fourier Transform Functions | mkl_cdft.f90 | mkl_cdft.h
Partial Differential Equations Support Routines:
  Trigonometric Transforms | mkl_trig_transforms.f90 | mkl_trig_transforms.h
  Poisson Solvers | mkl_poisson.f90 | mkl_poisson.h
GMP interface | | mkl_gmp.h
Support functions | mkl_service.f90, mkl_service.fi | mkl_service.h
Memory allocation routines | | i_malloc.h
MKL examples interface | | mkl_example.h

See Also
• Language Interfaces Support, by Function Domain

Support for Third-Party Interfaces

GMP Functions

Intel(R) Math Kernel Library (Intel(R) MKL) implementation of GMP arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. The interfaces of such functions fully match the GNU Multiple Precision (GMP) Arithmetic Library. For specifications of these functions, please see http://software.intel.com/en-us/sites/products/documentation/hpc/mkl/gnump/index.htm.

If you currently use the GMP library, you need to modify INCLUDE statements in your programs to mkl_gmp.h.

FFTW Interface Support

Intel(R) Math Kernel Library (Intel(R) MKL) offers two collections of wrappers for the FFTW interface (www.fftw.org). The wrappers are the superstructure of FFTW to be used for calling the Intel
105. mp.h>
    #include <mkl.h>
    mkl_set_num_threads(1);
    // ******* Fortran language *******
    call mkl_set_num_threads(1)

See the Intel MKL Reference Manual for the detailed description of the threading control functions, their parameters, calling syntax, and more code examples.

MKL_DYNAMIC

The MKL_DYNAMIC environment variable enables the Intel(R) Math Kernel Library (Intel(R) MKL) to dynamically change the number of threads.

The default value of MKL_DYNAMIC is TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE.

When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the maximum number you specify.

For example, MKL_DYNAMIC set to TRUE enables optimal choice of the number of threads in the following cases:
• If the requested number of threads exceeds the number of physical cores (perhaps because of hyper-threading), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores.
• If you are able to detect the presence of MPI, but cannot determine if it has been called in a thread-safe mode (it is impossible to detect this with MPICH 1.2.x, for instance), and MKL_DYNAMIC has not been changed from its default value of TRUE, Intel MKL will run one thread.

When MKL_DYNAMIC is FALSE, Intel MKL tries not to deviate from the number of threads the user requested. However, settin
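The first case above (a request that exceeds the number of physical cores) can be pictured with a toy model in C. This is only a sketch of the rule as this section states it, not Intel MKL's actual logic. The function name `model_mkl_threads` and its parameters are hypothetical.

```c
#include <assert.h>

/* Toy model, NOT Intel MKL code: the number of threads that would be used
   for a given request. 'dynamic' stands for the MKL_DYNAMIC setting
   (nonzero means TRUE). */
static int model_mkl_threads(int requested, int physical_cores, int dynamic)
{
    assert(requested > 0 && physical_cores > 0);
    if (dynamic && requested > physical_cores) {
        /* MKL_DYNAMIC is TRUE: scale down to the number of physical cores. */
        return physical_cores;
    }
    /* Otherwise this model simply honors the request. With MKL_DYNAMIC set to
       TRUE the real library may still choose fewer threads; the model does
       not try to capture that. */
    return requested;
}
```

For a machine with 4 physical cores and hyper-threading enabled, `model_mkl_threads(8, 4, 1)` gives 4, while `model_mkl_threads(8, 4, 0)` gives 8.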
106. mple "Using CBLAS Interface Instead of Calling BLAS Directly from C"

This example uses CBLAS:

    #include <stdio.h>
    #include "mkl.h"
    typedef struct { double re; double im; } complex16;
    #define N 5
    int main(void)
    {
        int n, inca = 1, incb = 1, i;
        complex16 a[N], b[N], c;
        n = N;
        for (i = 0; i < n; i++) {
            a[i].re = (double)i;       a[i].im = (double)i * 2.0;
            b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;
        }
        /* mkl.h already declares cblas_zdotc_sub, so no extra prototype is needed */
        cblas_zdotc_sub(n, a, inca, b, incb, &c);
        printf("The complex dot product is: ( %6.2f, %6.2f )\n", c.re, c.im);
        return 0;
    }

Support for Boost uBLAS Matrix-matrix Multiplication

If you are used to uBLAS, you can perform BLAS matrix-matrix multiplication in C++ using the Intel(R) Math Kernel Library (Intel(R) MKL) substitution of Boost uBLAS functions. uBLAS is the Boost C++ open source library that provides BLAS functionality for dense, packed, and sparse matrices. The library uses an expression template technique for passing expressions as function arguments, which enables evaluating vector and matrix expressions in one pass without temporary matrices. uBLAS provides two modes:
• Debug (safe) mode, default. Checks types and conformance.
• Release (fast) mode. Does not check types and conformance. To enable this mode, use the NDEBUG preprocessor symbol.

The documenta
107. ng feature.

Memory Renaming

Intel MKL memory management by default uses standard C run-time memory functions to allocate or free memory. These functions can be replaced using memory renaming.

Intel MKL accesses the memory functions by pointers i_malloc, i_free, i_calloc, and i_realloc, which are visible at the application level. These pointers initially hold addresses of the standard C run-time memory functions malloc, free, calloc, and realloc, respectively. You can programmatically redefine values of these pointers to the addresses of your application's memory management functions.

Redirecting the pointers is the only correct way to use your own set of memory management functions. If you call your own memory functions without redirecting the pointers, the memory will get managed by two independent memory management packages, which may cause unexpected memory issues.

How to Redefine Memory Functions

To redefine memory functions, use the following procedure:
1. Include the i_malloc.h header file in your code. This header file contains all declarations required for replacing the memory allocation functions. The header file also describes how memory allocation can be replaced in those Intel libraries that support this feature.
2. Redefine values of pointers i_malloc, i_free, i_calloc, and i_realloc prior to the first call to MKL functions, as shown in the following example:

    #include "i_malloc.h"
    i_malloc = my_malloc;
    i_calloc = my_calloc
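The pointer-redirection technique itself can be tried without Intel MKL installed. The sketch below is a stand-in under stated assumptions: `demo_malloc` and `demo_free` play the role of the `i_malloc` and `i_free` pointers, and `counting_malloc`, `counting_free`, `install_counting_allocator`, and `demo_library_work` are hypothetical names invented for the illustration. None of them come from i_malloc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the i_malloc and i_free pointers (hypothetical names). Like the
   real ones, they initially hold the addresses of the standard C functions. */
static void *(*demo_malloc)(size_t) = malloc;
static void (*demo_free)(void *) = free;

/* Application-side replacements that count live blocks. */
static size_t live_blocks = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++live_blocks;
    return p;
}

static void counting_free(void *p)
{
    if (p != NULL) {
        assert(live_blocks > 0);
        --live_blocks;
    }
    free(p);
}

/* Redirect the pointers; do this before the first library call. */
static void install_counting_allocator(void)
{
    demo_malloc = counting_malloc;
    demo_free = counting_free;
}

/* Stand-in for a library routine: it allocates only through the pointers,
   so it picks up whatever allocator the application installed. Returns the
   number of counted blocks alive while the routine runs. */
static size_t demo_library_work(size_t n)
{
    double *scratch = demo_malloc(n * sizeof *scratch);
    size_t seen = live_blocks;
    demo_free(scratch);
    return seen;
}
```

Before `install_counting_allocator()` is called, `demo_library_work` goes through plain `malloc` and the counter never moves. After the call, every allocation made by the routine is visible to the application's allocator, which is the point of redirecting the pointers rather than calling your own functions directly.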
108. nstruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or e
109. nterface

Identify whether and how your application is threaded:
• Threaded with the Intel(R) compiler
• Threaded with a third-party compiler
• Not threaded
Reason: The compiler you use to thread your application determines which threading library you should link with your application. For applications threaded with a third-party compiler, you may need to use Intel MKL in the sequential mode (for more information, see Sequential Mode of the Library and Linking with Threading Libraries).

Determine the number of threads you want Intel MKL to use.
Reason: Intel MKL is based on the OpenMP threading. By default, the OpenMP software sets the number of threads that Intel MKL uses. If you need a different number, you have to set it yourself using one of the available mechanisms. For more information, see Using Parallelism of the Intel(R) Math Kernel Library.

Decide which linking model is appropriate for linking your application with Intel MKL libraries:
• Static
• Dynamic
Reason: The link line syntax and libraries for static and dynamic linking are different. For the list of link libraries for static and dynamic models, linking examples, and other relevant topics, like how to save disk space by creating a custom dynamic library, see Linking Your Application with the Intel(R) Math Kernel Library.

MPI used

Decide what MPI you will use with the Intel MKL cluster software. You are strongly encouraged to use Intel(R) MPI 3
110. nterface module for BLAS (BLAS95)
lapack95.mod: Fortran 95 interface module for LAPACK (LAPACK95)
f95_precision.mod: Fortran 95 definition of precision parameters for BLAS95 and LAPACK95
mkl95_blas.mod: Fortran 95 interface module for BLAS (BLAS95), identical to blas95.mod. To be removed in one of the future releases
mkl95_lapack.mod: Fortran 95 interface module for LAPACK (LAPACK95), identical to lapack95.mod. To be removed in one of the future releases
mkl95_precision.mod: Fortran 95 definition of precision parameters for BLAS95 and LAPACK95, identical to f95_precision.mod. To be removed in one of the future releases
mkl_service.mod: Fortran 95 interface module for Intel MKL support functions

1 Prebuilt for the Intel(R) Fortran compiler
2 FFTW3 interfaces are integrated with Intel MKL. Look into <mkl directory>/interfaces/fftw3x*/makefile for options defining how to build and where to place the standalone library with the wrappers.

See Also
• Fortran 95 Interfaces to LAPACK and BLAS

Fortran 95 Interfaces to LAPACK and BLAS

Fortran 95 interfaces are compiler-dependent. Intel(R) Math Kernel Library (Intel(R) MKL) provides the interface libraries and modules precompiled with the Intel(R) Fortran compiler. Additionally, the Fortran 95 interfaces and wrappers are delivered as sources. (For more information, see Compiler-dependent Functions and Fortran 90 Modules.) If you are using a different compiler, build the appropria
111. nux OS User's Guide

The following example shows both C and Fortran code examples. To run this example in the C language, use the omp.h header file from the Intel(R) compiler package. If you do not have the Intel compiler but wish to explore the functionality in the example, use Fortran API for omp_set_num_threads rather than the C version. For example, omp_set_num_threads_( &i_one );

    // ******* C language *******
    #include "omp.h"
    #include "mkl.h"
    #include <stdio.h>
    #include <stdlib.h>
    #define SIZE 1000
    int main(int args, char *argv[]) {
        /* malloc replaces the C++ new operator so that the example compiles as C */
        double *a = malloc(sizeof(double) * SIZE * SIZE);
        double *b = malloc(sizeof(double) * SIZE * SIZE);
        double *c = malloc(sizeof(double) * SIZE * SIZE);
        double alpha = 1, beta = 1;
        int m = SIZE, n = SIZE, k = SIZE, lda = SIZE, ldb = SIZE, ldc = SIZE, i = 0, j = 0;
        char transa = 'n', transb = 'n';
        for (i = 0; i < SIZE; i++) {
            for (j = 0; j < SIZE; j++) {
                a[i*SIZE+j] = (double)(i+j);
                b[i*SIZE+j] = (double)(i*j);
                c[i*SIZE+j] = (double)0;
            }
        }
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
        printf("row\ta\tc\n");
        for (i = 0; i < 10; i++) {
            printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
        }
        omp_set_num_threads(1);
        for (i = 0; i < SIZE; i++) {
            for (j = 0; j < SIZE; j++) {
                a[i*SIZE+j] = (double)(i+j);
                b[i*SIZE+j] = (double)(i*j);
                c[i*SIZE+j] = (double)0;
            }
        }
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
        printf("row\ta\tc\n");
        for (i = 0; i < 10; i++) {
            printf("%d:\t
112. o, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow, the Intel Sponsors of Tomorrow logo, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

Other names and brands may be claimed as the property of others.

Java and all Java based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

Copyright (C) 2006 - 2011, Intel Corporation. All rights reserved.

Optimization Notice

Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and Intel Math Kernel Library for L
113. ompilers or in other situations where you need a non-threaded version of the library (for instance, in some MPI cases). To set the sequential mode, in the Threading layer, choose the sequential library.

Add the POSIX threads library (pthread) to your link line for the sequential mode because the sequential library depends on pthread.

See Also
• Directory Structure in Detail
• Using Parallelism of the Intel(R) Math Kernel Library
• Avoiding Conflicts in the Execution Environment
• Linking Examples

Selecting Libraries in the Threading Layer and RTL

Several compilers that Intel(R) Math Kernel Library (Intel(R) MKL) supports use the OpenMP threading technology. Intel MKL supports implementations of the OpenMP technology that these compilers provide. To make use of this support, you need to link with the appropriate library in the Threading Layer and Compiler Support Run-time Library (RTL).

Threading Layer

Each Intel MKL threading library contains the same code compiled by the respective compiler (Intel, gnu and PGI compilers on Linux OS).

RTL

This layer includes libiomp, the compatibility OpenMP run-time library of the Intel compiler. In addition to the Intel compiler, libiomp provides support for one additional threading compiler on Linux OS (GNU). That is, a program threaded with a GNU compiler can safely be linked with Intel MKL and libiomp.

The tab
114. omplex types MKL_Complex8 and MKL_Complex16, which are structures equivalent to the Fortran complex types COMPLEX(4) and COMPLEX(8), respectively. The MKL_Complex8 and MKL_Complex16 types are defined in the mkl_types.h header file. You can use these types to define complex data. You can also redefine the types with your own types before including the mkl_types.h header file. The only requirement is that the types must be compatible with the Fortran complex layout, that is, the complex type must be a pair of real numbers for the values of real and imaginary parts.

For example, you can use the following definitions in your C++ code:

    #define MKL_Complex8 std::complex<float>

and

    #define MKL_Complex16 std::complex<double>

See Example "Calling a Complex BLAS Level 1 Function from C++" for details. You can also define these types in the command line:

    -DMKL_Complex8="std::complex<float>"
    -DMKL_Complex16="std::complex<double>"

Calling BLAS Functions that Return the Complex Values in C/C++ Code

Complex values that functions return are handled differently in C and Fortran. Because BLAS is Fortran-style, you need to be careful when handling a call from C to a BLAS function that returns complex values. However, in addition to normal function calls, Fortran enables calling functions as though they were subroutines, which provides a mechanism for returning the complex value correc
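The layout requirement (a complex value is a pair of reals, real part first) can be checked in plain C without Intel MKL. The sketch below uses a hypothetical type `demo_complex16` in place of MKL_Complex16, and a hypothetical helper `to_interleaved`. It assumes nothing beyond the rule stated in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MKL_Complex16: real part first, then imaginary. */
typedef struct {
    double re;
    double im;
} demo_complex16;

/* A Fortran-compatible layout has no padding: exactly two doubles per element. */
_Static_assert(sizeof(demo_complex16) == 2 * sizeof(double),
               "complex type must be a plain pair of doubles");
_Static_assert(offsetof(demo_complex16, im) == sizeof(double),
               "imaginary part must directly follow the real part");

/* Copy n complex values into an interleaved array re0, im0, re1, im1, ...
   A plain byte copy is enough because the two layouts are identical. */
static void to_interleaved(const demo_complex16 *z, size_t n, double *out)
{
    assert(z != NULL && out != NULL);
    memcpy(out, z, n * sizeof *z);
}
```

Any replacement type that passes the same two compile-time checks stores its values the way a Fortran routine expects to read them, which is why a user-defined pair type can be substituted for the default one.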
117. or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #20101101

Getting Help and Support

Intel provides a support web site that contains a rich repository of self help information, including getting started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel(R) MKL support website at http://www.intel.com/software/products/support/.

The Intel(R) Math Kernel Library documentation integrates into the Eclipse integrated development environment (IDE). See Getting Assistance for Programming in the Eclipse IDE.

Notational Conventions

The following term is used in reference to the operating system:
Linux OS: This term refers to information that is valid on all supported Linux operating systems.

The following notations are used to refer to Intel(R) Math K
119. ore(TM) Duo and Intel(R) Core(TM) Solo processors for which mkl_p4p.so is intended
Kernel library for the Intel(R) Core(TM) i7 processors
VML/VSL part of default kernel for old Intel(R) Pentium(R) processors
VML/VSL default kernel for newer Intel(R) architecture processors
VML/VSL part of Pentium(R) 4 processor kernel
VML/VSL for processors based on the Intel(R) Core(TM) microarchitecture
VML/VSL for 45nm Hi-k Intel(R) Core(TM)2 and Intel Xeon(R) processor families
VML/VSL for the Intel(R) Core(TM) i7 processors
VML/VSL for Pentium(R) 4 processor with Streaming SIMD Extensions 3 (SSE3)
VML/VSL optimized for the Intel(R) Advanced Vector Extensions (Intel(R) AVX)
ScaLAPACK routines
Cluster version of FFT functions

Run-time Libraries (RTL)
File                        Contents
libmkl_blacs_intelmpi.so    BLACS routines supporting Intel MPI and MPICH2
locale/en_US/mkl_msg.cat    Catalog of Intel(R) Math Kernel Library (Intel(R) MKL) messages in English
locale/ja_JP/mkl_msg.cat    Catalog of Intel MKL messages in Japanese. Available only if the Intel(R) MKL package provides Japanese localization. Please see the Release Notes for this information.

Detailed Structure of the Intel(R) 64 Architecture Directories

Static Libraries in the lib/intel64 Directory

File
Interface layer
libmkl_intel_lp64.a
libmkl_intel_ilp64.a
120. ore accurately the last update that -DENDEARLY returns is close to what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. For this reason, use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation to problems just starting.

-DASYOUGO2

-DASYOUGO2 gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. Be aware of this overhead, although for big problems it is less than 0.1%.

Here is a sample ASYOUGO2 output (the first 3 non-intrusive numbers can be found in ASYOUGO and ENDEARLY, so it suffices to describe these numbers here):

Col=001280 Fract=0.050 Mflops=42454.99 (DT= 9.5 DF= 34.1 DMF=38322.78)

The problem size was N=16000 with a block size of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. Here, the fraction of columns completed is 1280/16000=0.08. Only up to 40 outputs are printed, at various places through the matrix decomposition: fractions 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040
121. ot using the ILP64 interface. To migrate to ILP64 or write new code for ILP64, use appropriate types for parameters of the Intel MKL functions and subroutines:

Integer Types                                Fortran                           C or C++
32-bit integers                              INTEGER*4 or INTEGER(KIND=4)      int
Universal integers for ILP64/LP64:           INTEGER without specifying KIND   MKL_INT
  • 64-bit for ILP64
  • 32-bit otherwise
Universal integers for ILP64/LP64:           INTEGER*8 or INTEGER(KIND=8)      MKL_INT64
  • 64-bit integers
FFT interface integers for ILP64/LP64        INTEGER without specifying KIND   MKL_LONG

Browsing the Intel MKL Include Files

The Reference Manual does not explain which integer parameters of a function become 64-bit and which remain 32-bit for ILP64. To find out, browse the include files, examples, and tests for the ILP64 interface details. Some function domains that support only a Fortran interface provide header files for C/C++ in the include directory. Such *.h files enable using a Fortran binary interface from C/C++ code. These files can also be used to understand the ILP64 usage.

Limitations

All Intel MKL function domains support ILP64 programming with the following exceptions:
• FFTW interfaces to Intel MKL:
  • FFTW 2.x wrappers do not support ILP64.
  • FFTW 3.2 wrappers support ILP64 by a dedicated set of functions plan_guru64.
• GMP arith
122. pend on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel(R) compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS.

CAUTION Avoid over-prescribing the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per MPI rank are both greater than one. The product of MPI ranks per node and the number of threads per MPI rank should not exceed the number of physical cores per node.

The best place to set an environment variable, such as OMP_NUM_THREADS, is your login environment. Remember that changing this value on the head node and then doing your run, as you do on a shared-memory (SMP) system, does not change the variable on all the nodes because mpirun starts a fresh default shell on all of the nodes. To change the number of threads on all the nodes, in .bashrc, add a line at the top, as follows:

OMP_NUM_THREADS=1; export OMP_NUM_THREADS

You can run multiple CPUs per node using MPICH. To do this, build MPICH to enable multiple CPUs per node. Be aware that certain MPICH applications may fail to work perfectly in a threaded environment (see the Known Limitations section in the Release Notes). If you encounter problems with MPICH and the number of threads is set to more than one, first try setting the number of threads to one and see whether the problem persists.

See Also
Using Shared L
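The CAUTION in this chunk amounts to a small piece of arithmetic, sketched below in Python. The function name and its arguments are hypothetical helpers invented for this illustration; they are not part of Intel MKL, OpenMP, or any MPI implementation, and the core count must come from your own knowledge of the node.

```python
def threads_per_rank(physical_cores_per_node, mpi_ranks_per_node):
    """Largest thread count per MPI rank that does not oversubscribe a node.

    Hypothetical helper: keeps ranks_per_node * threads <= physical cores.
    """
    if physical_cores_per_node < 1 or mpi_ranks_per_node < 1:
        raise ValueError("core and rank counts must be positive")
    # Never return less than one thread, even if ranks already exceed cores.
    return max(1, physical_cores_per_node // mpi_ranks_per_node)


if __name__ == "__main__":
    # Example: 8 physical cores per node shared by 2 MPI ranks.
    threads = threads_per_rank(8, 2)
    # Print a line in the form this guide suggests adding to .bashrc.
    print(f"OMP_NUM_THREADS={threads}; export OMP_NUM_THREADS")
```

With one MPI rank per core the helper returns 1, which corresponds to the non-threaded configuration.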
123. performance information at the cost of around 0.2% performance loss. If you want to use the old HPL, simply omit these options and recompile from scratch. To do this, try "make arch=<arch> clean_arch_all".

-DASYOUGO

-DASYOUGO gives performance data as the run proceeds. The performance always starts off higher and then drops because this actually happens in LU decomposition (a decomposition of a matrix into a product of a lower (L) and upper (U) triangular matrices). The ASYOUGO performance estimate is usually an overestimate (because the LU decomposition slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs, and this is always an overestimate as compared to ASYOUGO2, which measures actually achieved DGEMM performance. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides. So, refer to the description of the -DASYOUGO2 option for the details of the output.

-DENDEARLY

-DENDEARLY terminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then only run the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. To avoid the residual check for a problem that terminates early, set t
124. performance of the FFT functions. The addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by cache line size, which equals:
• 32 bytes for the Intel(R) Pentium(R) III processors
• 64 bytes for the Intel(R) Pentium(R) 4 processors and processors using Intel(R) 64 architecture

Hardware Configuration Tips

Dual-Core Intel(R) Xeon(R) processor 5100 series systems
To get the best performance with the Intel(R) Math Kernel Library (Intel(R) MKL) on Dual-Core Intel(R) Xeon(R) processor 5100 series systems, enable the Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this functionality, use the appropriate BIOS settings, as described in your BIOS documentation.

The use of Hyper-Threading Technology
Hyper-Threading Technology (HT Technology) is especially effective when each thread performs different types of operations and when there are under-utilized resources on the processor. However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling HT Technology.

If you run with HT enabled, performance may be especially impacted if you run on fewer threads than physical cores. Moreover
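The cache-line rule stated at the start of this chunk can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: both function names are hypothetical, and the default of 64 bytes is the value this guide gives for Pentium 4 and Intel 64 processors.

```python
import math


def padded_leading_dimension(n, element_size, cache_line=64):
    """Smallest leading dimension >= n whose size in bytes is a multiple of cache_line.

    Hypothetical helper: n is in elements, element_size and cache_line in bytes.
    """
    if n < 1 or element_size < 1 or cache_line < 1:
        raise ValueError("all arguments must be positive")
    # ld * element_size is divisible by cache_line exactly when ld is a multiple of step.
    step = cache_line // math.gcd(cache_line, element_size)
    # Round n up to the next multiple of step (ceiling division).
    return -(-n // step) * step


def is_cache_aligned(address, cache_line=64):
    """True if a byte address is divisible by the cache line size."""
    return address % cache_line == 0


if __name__ == "__main__":
    # A 1001-column matrix of 8-byte doubles is padded to 1008 columns.
    print(padded_leading_dimension(1001, 8))
```

Padding the leading dimension this way only helps if the first element of the array is itself aligned, which is why the second check is included.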
125. r ILP64 interface, respectively:
• libmkl_intel_lp64.a or libmkl_intel_ilp64.a for static linking
• libmkl_intel_lp64.so or libmkl_intel_ilp64.so for dynamic linking

The ILP64 interface provides for the following:
• Support large data arrays (with more than 2^31-1 elements)
• Enable compiling your Fortran code with the -i8 compiler option

The LP64 interface provides compatibility with the previous Intel MKL versions because "LP64" is just a new name for the only interface that the Intel MKL versions lower than 9.1 provided. Choose the ILP64 interface if your application uses Intel MKL for calculations with large data arrays or may do so in the future. Intel MKL provides the same include directory for the ILP64 and LP64 interfaces.

Compiling for LP64/ILP64

The table below shows how to compile for the ILP64 and LP64 interfaces:

Fortran
  Compiling for ILP64    ifort -i8 -I<mkl directory>/include
  Compiling for LP64     ifort -I<mkl directory>/include
C or C++
  Compiling for ILP64    icc -DMKL_ILP64 -I<mkl directory>/include
  Compiling for LP64     icc -I<mkl directory>/include

CAUTION Linking of an application compiled with the -i8 or -DMKL_ILP64 option to the LP64 libraries may result in unpredictable consequences and erroneous output.

Coding for ILP64

You do not need to change existing code if you are n
126. r example, by using the environment variable to set the number of threads:

env OMP_NUM_THREADS=2 ./a.out

See the Linux Programmer's Manual (in man pages format) for particulars of the sched_setaffinity function used in the above example.

Operating on Denormals

The IEEE 754-2008 standard, "An IEEE Standard for Binary Floating-Point Arithmetic", defines denormal (or subnormal) numbers as non-zero numbers smaller than the smallest possible normalized numbers for a specific floating-point format. Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel(R) Math Kernel Library (Intel(R) MKL) functions that consume denormals to run slower than with normalized floating-point numbers.

You can mitigate this performance issue by setting the appropriate bit fields in the MXCSR floating-point control register to flush denormals to zero (FTZ) or to replace any denormals loaded from memory with zero (DAZ). Check your compiler documentation to determine whether it has options to control FTZ and DAZ. Note that these compiler options may slightly affect accuracy.

FFT Optimized Radices

You can improve the performance of the Intel(R) Math Kernel Library (Intel(R) MKL) FFT if the length of your data vector permits factorization into powers of optimized radices. In Intel MKL, the
127. r script>. For example, if you are using Intel MPI 3.x, want to statically use the LP64 interface with ScaLAPACK, and have only one MPI process per core (and thus do not use threading), specify the following linker options:

-L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_scalapack_lp64.a $MKLPATH/libmkl_blacs_intelmpi_lp64.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -static_mpi -Wl,--end-group -lpthread -lm

NOTE Grouping symbols -Wl,--start-group and -Wl,--end-group are required for static linking.

TIP Use the Web-based Linking Advisor to quickly choose the appropriate set of <MKL cluster Library>, <BLACS>, and <MKL core libraries>.

See Also
• Linking Your Application with the Intel(R) Math Kernel Library
• Examples for Linking with ScaLAPACK and Cluster FFT

Setting the Number of Threads

The OpenMP software responds to the environment variable OMP_NUM_THREADS. Intel(R) Math Kernel Library (Intel(R) MKL) also has other mechanisms to set the number of threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control). Make sure that the relevant environment variables have the same and correct values on all the nodes. Intel MKL versions 10.0 and higher no longer set the default number of threads to one, but de
128. r than library file names, so omit the "lib" prefix and "a" extension. To learn how to choose the libraries, see Selecting Libraries to Link. The name of the particular setting where libraries are specified depends upon the compiler integration.

Getting Assistance for Programming in the Eclipse IDE

Intel MKL provides an Eclipse IDE plug-in (com.intel.mkl.help) that contains the Intel(R) Math Kernel Library (Intel(R) MKL) Reference Manual (see High-level Directory Structure for the plug-in location after the library installation). To install the plug-in, do one of the following:
• Use the Eclipse IDE Update Manager (recommended). To invoke the Manager, use the Help > Software Updates command in your Eclipse IDE.
• Copy the plug-in to the plugins folder of your Eclipse IDE directory. In this case, if you use earlier C/C++ Development Tools (CDT) versions (3.x, 4.x), delete or rename the index subfolder in the eclipse/configuration/org.eclipse.help.base folder of your Eclipse IDE to avoid delays in Index updating.

The following Intel MKL features assist you while programming in the Eclipse IDE:
• The Intel MKL Reference Manual viewable from within the IDE
• Eclipse Help search tuned to target the Intel Web sites
• Code/Content Assist in the Eclipse IDE CDT

The Intel MKL plug-in for Eclipse IDE provides the first two features.
129. rd, ?hptrd, ?steqr, ?stedc
• Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz, zhgeqz

A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on.

Threaded BLAS Level1 and Level2 Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. The following routines are threaded for Intel(R) Core(TM)2 Duo and Intel(R) Core(TM) i7 processors:
• Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot
• Level2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv

Threaded FFT Problems

The following characteristics of a specific problem determine whether your FFT computation may be threaded:
• rank
• domain
• size/length
• precision (single or double)
• placement (in-place or out-of-place)
• strides
• number of transforms
• layout (for example, interleaved or split layout of complex data)

Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.

One-dimensional (1D) transforms
1D transforms are threaded in many cases. 1D complex-to-comple
130. read.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL:
  ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
• Static linking of myprog.f and sequential version of Intel MKL:
  ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
• Dynamic linking of myprog.f and sequential version of Intel MKL:
  ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_sequential -lmkl_core -lpthread
• Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (call the mkl_set_threading_layer function or set the value of the MKL_THREADING_LAYER environment variable to choose threaded or sequential mode):
  ifort myprog.f -lmkl_rt
• Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL:
  ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_lapack95 -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
• Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL:
  ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_blas95 -Wl,--start
131. retation
Value of MKL_DOMAIN_NUM_THREADS, followed by its interpretation:
MKL_ALL=4               All parts of Intel MKL should try four threads. The actual number of threads may be still different because of the MKL_DYNAMIC setting or system resource issues. The setting is equivalent to MKL_NUM_THREADS = 4.
MKL_ALL=1, MKL_BLAS=4   All parts of Intel MKL should try one thread, except for BLAS, which is suggested to try four threads.
MKL_VML=2               VML should try two threads. The setting affects no other part of Intel MKL.

Be aware that the domain-specific settings take precedence over the overall ones. For example, the "MKL_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests trying four threads for BLAS, regardless of later setting MKL_NUM_THREADS, and a function call "mkl_domain_set_num_threads(4, MKL_BLAS);" suggests the same, regardless of later calls to mkl_set_num_threads(). However, a function call with input "MKL_ALL", such as "mkl_domain_set_num_threads(4, MKL_ALL);", is equivalent to "mkl_set_num_threads(4)", and thus it will be overwritten by later calls to mkl_set_num_threads. Similarly, the environment setting of MKL_DOMAIN_NUM_THREADS with "MKL_ALL=4" will be overwritten with MKL_NUM_THREADS = 2.

Whereas the MKL_DOMAIN_NUM_THREADS environment variable enables you to set several variables at once, for example, "MKL_BLAS=4,MKL_FFT=2", the corresponding function does not take string syntax. So, to do the same with the f
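The precedence rules described in this chunk can be modelled in a few lines. The Python sketch below is only a reading aid, not Intel MKL's implementation: the function names are hypothetical, the "NAME=value" syntax with comma separators is assumed from the examples in the table, and effects such as MKL_DYNAMIC or system resource limits are ignored.

```python
import re


def parse_domain_threads(value):
    """Parse a string such as "MKL_ALL=1, MKL_BLAS=4" into {"MKL_ALL": 1, "MKL_BLAS": 4}.

    Assumed syntax: NAME=value pairs; anything else in the string is skipped.
    """
    pairs = re.findall(r"(MKL_[A-Z]+)\s*=\s*(\d+)", value)
    return {name: int(count) for name, count in pairs}


def suggested_threads(domain, domain_num_threads="", num_threads=None):
    """Thread count suggested for one domain under the precedence described above.

    domain_num_threads models MKL_DOMAIN_NUM_THREADS; num_threads models
    MKL_NUM_THREADS (None if unset). Returns None when nothing applies.
    """
    settings = parse_domain_threads(domain_num_threads)
    if domain in settings:
        # A domain-specific setting wins over every overall setting.
        return settings[domain]
    if num_threads is not None:
        # MKL_NUM_THREADS overrides an MKL_ALL entry.
        return num_threads
    return settings.get("MKL_ALL")


if __name__ == "__main__":
    # BLAS keeps its four threads even though MKL_NUM_THREADS is 2.
    print(suggested_threads("MKL_BLAS", "MKL_ALL=1, MKL_BLAS=4", num_threads=2))
```

The second branch reflects the statement that "MKL_ALL=4" is overwritten by MKL_NUM_THREADS = 2.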
132. See Also
• Benchmarking a Cluster

Intel(R) Math Kernel Library Language Interfaces Support

Language Interfaces Support, by Function Domain

The following table shows language interfaces that Intel(R) Math Kernel Library (Intel(R) MKL) provides for each function domain. However, Intel MKL routines can be called from other languages using mixed-language programming. See Mixed-language Programming with Intel(R) MKL for an example of how to call Fortran routines from C/C++.

Function Domain (interface columns: FORTRAN 77 interface, Fortran 90/95 interface, C/C++ interface)
Basic Linear Algebra Subprograms (BLAS): Yes, Yes, via CBLAS
BLAS-like extension transposition routines: Yes, Yes
Sparse BLAS Level 1: Yes, Yes, via CBLAS
Sparse BLAS Level 2 and 3: Yes, Yes, Yes
LAPACK routines for solving systems of linear equations: Yes, Yes, Yes
LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations: Yes, Yes, Yes
Auxiliary and utility LAPACK routines: Yes, Yes
Parallel Basic Linear Algebra Subprograms (PBLAS): Yes
ScaLAPACK routines: Yes †
DSS/PARDISO solvers: Yes, Yes, Yes
Other Direct and Iterative Sparse Solver routines: Yes, Yes, Yes
Vector Mathematical Library (VML) functions: Yes, Yes, Yes
Vector Statistical Library (VSL) functions: Yes, Yes, Yes
Fourier Transform functions (FFT): Yes, Yes
Cluster FFT functions: Yes, Yes
133. By default, Intel threading is used. See the Intel MKL Reference Manual for details of the mkl_set_threading_layer function.

Setting the Interface Layer

Available interfaces depend on the architecture of your system. On systems based on the Intel(R) 64 architecture, LP64 and ILP64 interfaces are available. To set one of these interfaces at run time, use the mkl_set_interface_layer function or the MKL_INTERFACE_LAYER environment variable. The following table provides values to be used to set each interface.

Interface Layer    Value of MKL_INTERFACE_LAYER    Value of the Parameter of mkl_set_interface_layer
LP64               LP64                            MKL_INTERFACE_LP64
ILP64              ILP64                           MKL_INTERFACE_ILP64

If the mkl_set_interface_layer function is called, the environment variable MKL_INTERFACE_LAYER is ignored. By default, the LP64 interface is used. See the Intel MKL Reference Manual for details of the mkl_set_interface_layer function.

See Also
• Layered Model Concept
• Directory Structure in Detail

Using the ILP64 Interface vs. LP64 Interface

The Intel(R) Math Kernel Library (Intel(R) MKL) ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays, with more than 2^31-1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. The LP64 and ILP64 interfaces are implemented in the Interface layer. Link with the following interface libraries for the LP64 o
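The choice between the two interfaces comes down to whether the largest array can be indexed with a 32-bit integer. The Python sketch below illustrates that check; the function name is a hypothetical helper, and the returned strings are the MKL_INTERFACE_LAYER values from the table above.

```python
import math

# Largest index representable in a signed 32-bit integer (2^31 - 1).
INT32_MAX = 2**31 - 1


def interface_layer_for(*dims):
    """Return "ILP64" if an array with the given dimensions has more than
    2^31 - 1 elements, otherwise "LP64". Hypothetical helper for illustration.
    """
    if not dims or any(d < 0 for d in dims):
        raise ValueError("pass one or more non-negative dimensions")
    elements = math.prod(dims)
    return "ILP64" if elements > INT32_MAX else "LP64"


if __name__ == "__main__":
    # A 50000 x 50000 matrix has 2.5e9 elements, beyond the 32-bit limit.
    print(interface_layer_for(50000, 50000))
```

A program launched with MKL_INTERFACE_LAYER set to the returned value would still need to be compiled with matching integer types, as the CAUTION elsewhere in this guide warns.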
134. s a placeholder for an identifier, an expression, a string, a symbol, or a value, for example, <mkl directory>. Substitute one of these items for the placeholder.
Square brackets indicate that the items enclosed in brackets are optional.
{ item | item }  Braces indicate that only one of the items listed between braces should be selected. A vertical bar ( | ) separates the items.

Overview

Document Overview

The Intel(R) Math Kernel Library (Intel(R) MKL) User's Guide provides usage information for the library. The usage information covers the organization, configuration, performance, and accuracy of Intel MKL, specifics of routine calls in mixed-language programming, linking, and more. This guide describes OS-specific usage of Intel MKL, along with OS-independent features. It contains usage information for all Intel MKL function domains.

This User's Guide provides the following information:
• Describes post-installation steps to help you start using the library
• Shows you how to configure the library with your development environment
• Acquaints you with the library structure
• Explains how to link your application with the library and provides simple usage scenarios
• Describes how to code, compile, and run your application with Intel MKL

This guide is intended for Linux OS programmers with beginner to advanced experience in software development. Se
135. s supporting OpenMPI

Dynamic Libraries in the IA-32 Architecture Directory lib/ia32

File                      Contents
Interface layer
libmkl_rt.so              Single Dynamic Library interface library
libmkl_intel.so           Interface library for the Intel compilers
libmkl_gf.so              Interface library for the GNU Fortran compiler
Threading layer
libmkl_intel_thread.so    Threading library for the Intel compilers
libmkl_gnu_thread.so      Threading library for the GNU Fortran and C compilers
libmkl_pgi_thread.so      Threading library for the PGI compiler
libmkl_sequential.so      Sequential library
Computational layer
libmkl_core.so            Library dispatcher for dynamic load of processor-specific kernel library
libmkl_def.so             Default kernel library (Intel(R) Pentium(R), Pentium(R) Pro, Pentium(R) II, and Pentium(R) III processors)
libmkl_p4.so              Pentium(R) 4 processor kernel library
libmkl_p4p.so             Kernel library for the Intel(R) Pentium(R) 4 processor with Streaming SIMD Extensions 3 (SSE3), including Intel(R) Core(TM) Duo and Intel(R) Core(TM) Solo processors
libmkl_p4m3.so
libmkl_vml_def.so
libmkl_vml_ia.so
libmkl_vml_p4.so
libmkl_vml_p4m.so
libmkl_vml_p4m2.so
libmkl_vml_p4m3.so
libmkl_vml_p4p.so
libmkl_vml_avx.so
libmkl_scalapack_core.so
libmkl_cdft_core.so
libmkl_p4m.so             Kernel library for processors based on the Intel(R) Core(TM) microarchitecture (except Intel(R) C
136. sed Linking Advisor
• Selecting Libraries to Link
• Linking Examples
• Working with the Cluster Software

Selecting Libraries to Link

This section recommends which libraries to link against depending on your Intel MKL usage scenario and provides details of the linking.

Linking with Interface Libraries

Using the Single Dynamic Library Interface

Intel(R) Math Kernel Library (Intel(R) MKL) provides the Single Dynamic Library interface (SDL interface). It enables you to dynamically select the interface and threading layer for Intel MKL. To use the SDL interface, put only one library on your link line: libmkl_rt.so. For example:

icc application.c -lmkl_rt

Setting the Threading Layer

To set the threading layer at run time, use the mkl_set_threading_layer function or the MKL_THREADING_LAYER environment variable. The following table lists available threading layers along with values to be used to set each layer.

Threading Layer                 Value of MKL_THREADING_LAYER    Value of the Parameter of mkl_set_threading_layer
Intel threading                 INTEL                           MKL_THREADING_INTEL
Sequential mode of Intel MKL    SEQUENTIAL                      MKL_THREADING_SEQUENTIAL
GNU threading                   GNU                             MKL_THREADING_GNU
PGI threading                   PGI                             MKL_THREADING_PGI

If the mkl_set_threading_layer function is called, the environment variable MKL_THREADING_LAYER is ignored.
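As one way to use the environment variable from the table above, the Python sketch below builds an environment for launching a program that was linked with libmkl_rt.so. The helper name and the "./application" path in the comment are hypothetical; only the variable name and its four values come from this guide.

```python
import os

# MKL_THREADING_LAYER values listed in the table above, keyed by a lowercase label.
THREADING_LAYERS = {
    "intel": "INTEL",
    "sequential": "SEQUENTIAL",
    "gnu": "GNU",
    "pgi": "PGI",
}


def env_with_threading_layer(layer, base_env=None):
    """Return a copy of the environment with MKL_THREADING_LAYER set.

    Hypothetical helper: base_env defaults to the current process environment
    and is never modified in place.
    """
    key = layer.lower()
    if key not in THREADING_LAYERS:
        raise ValueError(f"unknown threading layer: {layer!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_THREADING_LAYER"] = THREADING_LAYERS[key]
    return env


if __name__ == "__main__":
    env = env_with_threading_layer("sequential")
    print(env["MKL_THREADING_LAYER"])
    # A launcher could then pass the environment on, for example:
    #   subprocess.run(["./application"], env=env)
```

Because a call to mkl_set_threading_layer inside the program makes the library ignore the variable, this approach only applies to programs that do not call that function.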
137. serguide/index.htm  Intel MKL User's Guide in an uncompressed HTML format (this document)

Viewing Man Pages

To access man pages for the Intel(R) Math Kernel Library (Intel(R) MKL), add the man pages directory to the MANPATH environment variable. If you performed the Setting Environment Variables step of the Getting Started process, this is done automatically.

To view the man page for an Intel MKL function, enter the following command in your command shell:

man <function base name>

In this release, <function base name> is the function name with omitted prefixes denoting data type, precision, or function domain. Examples:
• For the BLAS function ddot, enter man dot
• For the ScaLAPACK function pzgeql2, enter man pgeql2
• For the FFT function DftiCommitDescriptor, enter man CommitDescriptor

NOTE Function names in the man command are case-sensitive.

See Also
• High-level Directory Structure
• Setting Environment Variables

Linking Your Application with the Intel(R) Math Kernel Library

Optimization Notice
Intel compilers, associated libraries, and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options f
138. What You Need to Know Before You Begin Using the Intel(R) Math Kernel Library
Chapter 3 Structure of the Intel(R) Math Kernel Library
    Architecture Support
    High-level Directory Structure
    Layered Model Concept
    Accessing the Intel(R) Math Kernel Library Documentation
        Contents of the Documentation Directories
        Viewing Man Pages
Chapter 4 Linking Your Application with the Intel(R) Math Kernel Library
    Linking Quick Start
    Listing Libraries on a Link Line
    Selecting Libraries to Link
        Linking with Interface Libraries
            Using the Single Dynamic Library Interface
            Using the ILP64 Interface vs. LP64 Interface
            Linking with Fortran 95 Interface Libraries
        Linking with Thread
139. Setting the Number of Threads Using an OpenMP Environment Variable
    Changing the Number of Threads at Run Time
    Using Additional Threading Control
        Intel(R) MKL-specific Environment Variables for Threading Control
        MKL_DYNAMIC
        MKL_DOMAIN_NUM_THREADS
        Setting the Environment Variables for Threading Control
Tips and Techniques to Improve Performance
    Coding Techniques
    Hardware Configuration Tips
    Managing Multi-core Performance
    Operating on Denormals
    FFT Optimized Radices
Using Memory Management
    Memory Management Software of the Intel(R) Math Kernel Library
    Redefining Memory Functions
Chapter 6 Langua
140. sting Libraries on a Link Line
• Working with the Cluster Software

Listing Libraries on a Link Line

To link with Intel(R) Math Kernel Library, specify paths and libraries on the link line as shown below.

NOTE The syntax below is for dynamic linking. For static linking, replace each library name preceded with "-l" with the path to the library file. For example, replace -lmkl_core with $MKLPATH/libmkl_core.a, where $MKLPATH is the appropriate user-defined environment variable.

<files to link>
-L<MKL path> -I<MKL include>
[-I<MKL include>/{ia32|intel64|{ilp64|lp64}}]
[-lmkl_blas{95|95_ilp64|95_lp64}]
[-lmkl_lapack{95|95_ilp64|95_lp64}]
[<cluster components>]
-lmkl_{intel|intel_ilp64|intel_lp64|intel_sp2dp|gf|gf_ilp64|gf_lp64}
-lmkl_{intel_thread|gnu_thread|pgi_thread|sequential}
-lmkl_core
-liomp5 [-lpthread] [-lm]

In case of static linking, enclose the cluster components, interface, threading, and computational libraries in grouping symbols (for example, -Wl,--start-group $MKLPATH/libmkl_cdft_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group).

The order of listing libraries on the link line is essential, except for the libraries enclosed in the grouping symbols above.

See Also
• Using the Web-ba
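The NOTE in this chunk describes a mechanical rewrite from dynamic to static linking, which the Python sketch below illustrates. Both function names are hypothetical, the library names passed in are examples taken from this guide, and $MKLPATH is left as a literal string for the shell to expand.

```python
def dynamic_link_flags(libs):
    """Turn library base names such as "mkl_core" into -l flags for dynamic linking."""
    return " ".join(f"-l{name}" for name in libs)


def static_link_flags(libs, mklpath="$MKLPATH"):
    """Replace each -l<name> with the path to lib<name>.a and wrap the archives
    in the grouping symbols required for static linking.

    Hypothetical helper: mklpath is a user-defined variable or directory.
    """
    archives = [f"{mklpath}/lib{name}.a" for name in libs]
    return " ".join(["-Wl,--start-group", *archives, "-Wl,--end-group"])


if __name__ == "__main__":
    libs = ["mkl_intel_lp64", "mkl_sequential", "mkl_core"]
    print(dynamic_link_flags(libs))
    print(static_link_flags(libs))
```

Inside the group the archive order does not matter, which matches the statement above that ordering is essential only outside the grouping symbols.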
142. te library and modules with your compiler and link the library as a user's library:
1. Go to the respective directory <mkl directory>/interfaces/blas95 or <mkl directory>/interfaces/lapack95
2. Type one of the following commands depending on your architecture:
• For the IA-32 architecture,
  make libia32 INSTALL_DIR=<user dir>
• For the Intel(R) 64 architecture,
  make libintel64 [interface=lp64|ilp64] INSTALL_DIR=<user dir>

NOTE Parameter INSTALL_DIR is required.

As a result, the required library is built and installed in the <user dir>/lib directory, and the .mod files are built and installed in the <user dir>/include/<arch>[/{lp64|ilp64}] directory, where <arch> is one of {ia32, intel64}.

By default, the ifort compiler is assumed. You may change the compiler with an additional parameter of make: FC=<compiler>. For example, the command

make libintel64 FC=pgf95 INSTALL_DIR=<user pgf95 dir> interface=lp64

builds the required library and .mod files and installs them in subdirectories of <user pgf95 dir>.

To delete the library from the building directory, use the following commands:
• For the IA-32 architecture,
  make cleania32 INSTALL_DIR=<user dir>
• For the Intel(R) 64 architecture,
  make cleanintel64 [interface=lp64|ilp64] INSTALL_DIR=<user dir>
• For all the architectures,
  make c
143. tel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code, and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries, and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision 20101101. Detailed Structure of the IA
144. the performance of processor 0 (in Gflops) in DGEMM is always DF/DT. Using the number of DGEMM flops as a basis instead of the number of LU flops, you get a lower bound on performance of the run by looking at DMF, which can be compared to Mflops above. (It uses the global LU time, but the DGEMM flops are computed under the assumption that the problem is evenly distributed amongst the nodes, as only HPL's node (0,0) returns any output.) Note that when using the above performance monitoring tools to compare different HPL.dat input data sets, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some input data. For instance, when you try very small problems, the performance drop-off from the initial values to end values is very rapid. The larger the problem, the less the drop-off, and it is probably safe to use the first few performance values to estimate the difference between a problem size 700000 and 701000, for instance. Another factor that influences the performance drop-off is the grid dimensions (P and Q). For big problems, the performance tends to fall off less from the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps. Using these tools can greatly increase the amount of data you can test.
145. timized MP LINPACK Benchmark for Clusters is based on modifications and additions to HPL 2.0 from Innovative Computing Laboratories (ICL) at the University of Tennessee, Knoxville (UTK). The Intel Optimized MP LINPACK Benchmark for Clusters can be used for Top 500 runs (see http://www.top500.org). To use the benchmark, you need to be intimately familiar with the HPL distribution and usage. The Intel Optimized MP LINPACK Benchmark for Clusters provides some additional enhancements and bug fixes designed to make the HPL usage more convenient, as well as explain Intel(R) Message Passing Interface (MPI) settings that may enhance performance. The benchmarks/mp_linpack directory adds techniques to minimize search times frequently associated with long runs. The Intel(R) Optimized MP LINPACK Benchmark for Clusters is an implementation of the Massively Parallel MP LINPACK benchmark by means of HPL code. It solves a random dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. You can solve any size (N) system of equations that fits into memory. The benchmark uses full row pivoting to ensure the accuracy of the results. Use the Intel Optimized MP LINPACK Benchmark for Clusters on a distributed memory machine. On a shared memory machine, use the Intel Optimized LINPACK Benchmark. Intel provides optimized versions of the LINPAC
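The step "converts that time into a performance rate" in the excerpt above can be illustrated with a small C sketch. It is not taken from the benchmark sources: the operation count (2/3)N^3 + (3/2)N^2 is the figure commonly associated with HPL and is used here as an assumption, and the names `lu_solve_flops` and `gflops` are hypothetical.

```c
/* Sketch only: turns a problem size N and a measured wall time into Gflop/s.
   ASSUMPTION: the solve is credited with (2/3)N^3 + (3/2)N^2 operations,
   the count commonly associated with HPL; check your HPL version's output. */

/* Nominal floating-point operation count for factoring and solving
   an N x N dense system. */
static double lu_solve_flops(double n)
{
    return (2.0 / 3.0) * n * n * n + 1.5 * n * n;
}

/* Performance rate in Gflop/s for a solve of size n that took `seconds`. */
static double gflops(double n, double seconds)
{
    return lu_solve_flops(n) / seconds / 1.0e9;
}
```

A caller would pass the N from the run and the measured factor-and-solve time, for example `gflops(10000.0, elapsed_seconds)`. Because the operation count is fixed by N, a shorter time directly means a higher reported rate.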
146. ting the Number of Threads Using an OpenMP Environment Variable
   Scripts to Set Environment Variables
   When the installation of the Intel(R) Math Kernel Library (Intel(R) MKL) for Linux OS is complete, set the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables in the command shell using one of the script files in the bin subdirectory of the Intel MKL installation directory. Choose the script corresponding to your system architecture and command shell as explained in the following table:
   Architecture              Shell                   Script File
   IA-32                     C                       ia32/mklvars_ia32.csh
   IA-32                     Bash and Bourne (sh)    ia32/mklvars_ia32.sh
   Intel(R) 64               C                       intel64/mklvars_intel64.csh
   Intel(R) 64               Bash and Bourne (sh)    intel64/mklvars_intel64.sh
   IA-32 and Intel(R) 64     C                       mklvars.csh
   IA-32 and Intel(R) 64     Bash and Bourne (sh)    mklvars.sh
   Running the Scripts
   The scripts accept parameters to specify the following:
   - Architecture.
   - Addition of a path to Fortran 95 modules precompiled with the Intel(R) Fortran compiler to the FPATH environment variable. Supply this parameter only if you are using the Intel(R) Fortran compiler.
   - Interface of the Fortran 95 modules. This parameter is needed only if you requested addition of a path to the modules.
   Usage and values of these parameters depend on the script name (regardless of the extension). The following table lists values of the scr
147. tion for the Boost uBLAS is available at www.boost.org. Intel MKL provides overloaded prod() functions for substituting uBLAS dense matrix-matrix multiplication with the Intel MKL gemm calls. Though these functions break uBLAS expression templates and introduce temporary matrices, the performance advantage can be considerable for matrix sizes that are not too small (roughly, over 50). You do not need to change your source code to use the functions. To call them:
   - Include the header file mkl_boost_ublas_matrix_prod.hpp in your code (from the Intel MKL include directory).
   - Add appropriate Intel MKL libraries to the link line.
   The list of expressions that are substituted follows:
      prod( m1, m2 )
      prod( trans(m1), m2 )
      prod( trans(conj(m1)), m2 )
      prod( conj(trans(m1)), m2 )
      prod( m1, trans(m2) )
      prod( trans(m1), trans(m2) )
      prod( trans(conj(m1)), trans(m2) )
      prod( conj(trans(m1)), trans(m2) )
      prod( m1, trans(conj(m2)) )
      prod( trans(m1), trans(conj(m2)) )
      prod( trans(conj(m1)), trans(conj(m2)) )
      prod( conj(trans(m1)), trans(conj(m2)) )
      prod( m1, conj(trans(m2)) )
      prod( trans(m1), conj(trans(m2)) )
      prod( trans(conj(m1)), conj(trans(m2)) )
      prod( conj(trans(m1)), conj(trans(m2)) )
   These expressions are substituted in the release mode only (with NDEBUG preprocessor symbol defined). Supported uBLAS versions are Boost 1.34.1, 1.35.0, 1.36.0
149. tly when the function is called from a C program. When a Fortran function is called as a subroutine, the return value is the first parameter in the calling sequence. You can use this feature to call a BLAS function from C. The following example shows how a call to a Fortran function as a subroutine converts to a call from C and the hidden parameter result gets exposed:
   Normal Fortran function call:             result = cdotc( n, x, 1, y, 1 )
   A call to the function as a subroutine:   call cdotc( result, n, x, 1, y, 1 )
   A call to the function from C:            cdotc( &result, &n, x, &one, y, &one )
   NOTE: Intel(R) Math Kernel Library (Intel(R) MKL) has both upper-case and lower-case entry points in the Fortran-style (case-insensitive) BLAS, with or without the trailing underscore. So, all these names are equivalent and acceptable: cdotc, CDOTC, cdotc_, and CDOTC_.
   The above example shows one of the ways to call several level 1 BLAS functions that return complex values from your C and C++ applications. An easier way is to use the CBLAS interface. For instance, you can call the same function using the CBLAS interface as follows:
      cblas_cdotc_sub( n, x, 1, y, 1, &result )
   NOTE: The complex value comes last on the argument list in this case.
   The following examples show use of the Fortran-style BLAS interface from C and C++, as well as the CBLAS (C language) interface:
   - Example "Calling a Complex BLAS Level 1 Function from C"
   - Example "Calling a
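The subroutine-style convention described above (result first, every scalar passed by address) can be tried without linking any library. The C sketch below uses a hypothetical stand-in named `demo_cdotc`; it is not the Intel MKL entry point and only mimics the argument order, and the `complex8` type and `dotc_from_c` wrapper are likewise illustrative names. It assumes positive increments.

```c
#include <stddef.h>

/* Single-precision complex value laid out like Fortran COMPLEX: (real, imag). */
typedef struct { float re; float im; } complex8;

/* HYPOTHETICAL stand-in for a Fortran-style cdotc entry point (NOT the MKL
   routine). The hidden result is exposed as the first argument, and n, incx,
   and incy are passed by address, as in the call from C shown above.
   Computes sum(conj(x[i]) * y[i]); assumes *incx > 0 and *incy > 0. */
static void demo_cdotc(complex8 *result, const int *n,
                       const complex8 *x, const int *incx,
                       const complex8 *y, const int *incy)
{
    float re = 0.0f, im = 0.0f;
    ptrdiff_t ix = 0, iy = 0;
    for (int i = 0; i < *n; ++i) {
        /* conj(a) * b = (ar*br + ai*bi) + i*(ar*bi - ai*br) */
        re += x[ix].re * y[iy].re + x[ix].im * y[iy].im;
        im += x[ix].re * y[iy].im - x[ix].im * y[iy].re;
        ix += *incx;
        iy += *incy;
    }
    result->re = re;
    result->im = im;
}

/* C-friendly wrapper: hides the by-address scalars and returns the value. */
static complex8 dotc_from_c(int n, const complex8 *x, const complex8 *y)
{
    complex8 result;
    int one = 1;
    demo_cdotc(&result, &n, x, &one, y, &one);
    return result;
}
```

The wrapper shows why a named variable such as `one` is needed: Fortran-style routines take addresses, so a literal 1 cannot be passed directly from C.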
150. tom shared object builder can create a shared object from a subset of Intel MKL functions by picking the respective object files from the static libraries.
   Using the Custom Shared Object Builder
   To build a custom shared object, use the following command:
      make target [<options>]
   The following table lists possible values of target and explains what the command does for each value:
   Value        Comment
   libia32      The command builds static libraries for processors that use the IA-32 architecture. The builder uses the interface, threading, and computational libraries.
   libintel64   The command builds static libraries for processors that use the Intel(R) 64 architecture. The builder uses the interface, threading, and computational libraries.
   soia32       The command builds dynamic libraries for processors that use the IA-32 architecture. The builder uses the Single Dynamic Library (SDL) interface library libmkl_rt.so.
   sointel64    The command builds dynamic libraries for processors that use the Intel(R) 64 architecture. The builder uses the SDL interface library libmkl_rt.so.
   help         The command prints Help on the custom shared object builder.
   The <options> placeholder stands for the list of parameters that define macros to be used by the makefile. The following table describes these parameters:
   Parameter [Values]       Description
   interface = {lp64|ilp64} Defines whether to use LP64 or ILP64 programming interface for the Intel 64 architecture. The de
151. unction calls, you may need to make several calls, which in this example are as follows:
      mkl_domain_set_num_threads( 4, MKL_BLAS );
      mkl_domain_set_num_threads( 2, MKL_FFT );
   Setting the Environment Variables for Threading Control
   To set the environment variables used for threading control, in the command shell in which the program is going to run, enter the export or setenv commands, depending on the shell you use. For example, for a bash shell, use the export commands:
      export <VARIABLE NAME>=<value>
   For example:
      export MKL_NUM_THREADS=4
      export MKL_DOMAIN_NUM_THREADS="MKL_ALL=1, MKL_BLAS=4"
      export MKL_DYNAMIC=FALSE
   For the csh or tcsh shell, use the setenv commands (set defines only a shell variable, which the program does not inherit):
      setenv <VARIABLE NAME> <value>
   For example:
      setenv MKL_NUM_THREADS 4
      setenv MKL_DOMAIN_NUM_THREADS "MKL_ALL=1, MKL_BLAS=4"
      setenv MKL_DYNAMIC FALSE
   Tips and Techniques to Improve Performance
   Coding Techniques
   To obtain the best performance with the Intel(R) Math Kernel Library (Intel(R) MKL), ensure the following data alignment in your source code:
   - Align arrays on 16-byte boundaries. See Aligning Addresses on 16-byte Boundaries for how to do it.
   - Make sure leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, where element_size is the size of an array element in bytes.
   - For two-dimensional arrays, avoid leading dimension values divisible by 2048 bytes. For example, for a double-pre
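The three data-alignment rules above can be sketched in C as follows. This is only an illustration under stated assumptions: it uses the C11 standard function `aligned_alloc` rather than any Intel MKL allocation routine, and the helper names `alloc_aligned16`, `is_aligned16`, and `pick_leading_dim` are hypothetical.

```c
#include <stdlib.h>
#include <stdint.h>

/* Allocate at least `bytes` bytes on a 16-byte boundary; release with free().
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the request is rounded up. */
static void *alloc_aligned16(size_t bytes)
{
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;
    return aligned_alloc(16, rounded);
}

/* Nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Smallest leading dimension ld >= n (in elements) whose size in bytes,
   ld * elem_size, is divisible by 16 but not by 2048.
   Returns 0 if elem_size is 0 or itself a multiple of 2048 (no valid ld). */
static size_t pick_leading_dim(size_t n, size_t elem_size)
{
    if (elem_size == 0 || elem_size % 2048 == 0)
        return 0;
    for (size_t ld = n; ; ++ld) {
        size_t row_bytes = ld * elem_size;
        if (row_bytes % 16 == 0 && row_bytes % 2048 != 0)
            return ld;
    }
}
```

For a double-precision matrix with 256 elements per column, `pick_leading_dim(256, sizeof(double))` pads the leading dimension because 256 * 8 bytes is exactly 2048. The buffer would then be allocated with `alloc_aligned16(ld * columns * sizeof(double))`.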
152. wrapper provides the handler class to hold the native descriptor of the stream state. The wrapper for the convolution and correlation functions mitigates the same difficulty of the VSL interface, which assumes a similar lifecycle for "task descriptors". The wrapper utilizes the ESSL-like interface for those functions, which is simpler for the case of 1-dimensional data. The JNI stub additionally encapsulates the MKL functions into the ESSL-like wrappers written in C and so "packs" the lifecycle of a task descriptor into a single call to the native method. The wrappers meet the JNI Specification versions 1.1 and 5.0 and should work with virtually every modern implementation of Java. The examples and the Java part of the wrappers are written for the Java language described in "The Java Language Specification (First Edition)" and extended with the feature of "inner classes" (this refers to the late 1990s). This level of language version is supported by all versions of the Sun Java Development Kit (JDK) developer toolkit and compatible implementations starting from version 1.1.5, or by all modern versions of Java. The level of C language is "Standard C" (that is, C89) with additional assumptions about integer and floating-point data types required by the Intel MKL interfaces and the JNI header files. That is, the native float and double data types must be the same as JNI jfloat and jdoubl
153. x (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions, depending on the architecture:
   Architecture   Conditions
   Intel(R) 64    N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1.
   IA-32          N is a power of 2, log2(N) > 13, and the transform is single-precision.
                  N is a power of 2, log2(N) > 14, and the transform is double-precision.
   Any            N is composite, log2(N) > 16, and input/output strides equal 1.
   1D real-to-complex and complex-to-real transforms are not threaded.
   1D complex-to-complex transforms using split-complex layout are not threaded.
   Prime-size complex-to-complex 1D transforms are not threaded.
   Multidimensional transforms
   All multidimensional transforms on large-volume data are threaded.
   Avoiding Conflicts in the Execution Environment
   Certain situations can cause conflicts in the execution environment that make the use of threads in Intel(R) Math Kernel Library (Intel(R) MKL) problematic. This section briefly discusses why these problems exist and how to avoid them. If you thread the program using OpenMP directives and compile the program with Intel(R) compilers, Intel MKL and the program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its oper
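The threading conditions for 1D complex-to-complex transforms listed in the excerpt above can be written as a predicate. The C sketch below is only one reading of that table, not library code: Intel MKL exposes no such function, and the names `arch_t`, `prec_t`, and `c2c_1d_is_threaded` are hypothetical. It covers the interleaved complex layout only, and log2(N) > k is expressed as N > 2^k.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ARCH_IA32, ARCH_INTEL64 } arch_t;
typedef enum { PREC_SINGLE, PREC_DOUBLE } prec_t;

static bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True if n has a divisor other than 1 and itself (trial division). */
static bool is_composite(size_t n)
{
    if (n < 4)
        return false;
    for (size_t d = 2; d <= n / d; ++d)
        if (n % d == 0)
            return true;
    return false;
}

/* Encodes the table rows for a 1D c2c transform of size n with interleaved
   complex layout. unit_strides means input and output strides both equal 1. */
static bool c2c_1d_is_threaded(arch_t arch, prec_t prec, size_t n,
                               bool out_of_place, bool unit_strides)
{
    if (is_pow2(n)) {
        /* Intel 64 row: log2(N) > 9, double precision, out-of-place, strides 1 */
        if (arch == ARCH_INTEL64 && prec == PREC_DOUBLE &&
            n > ((size_t)1 << 9) && out_of_place && unit_strides)
            return true;
        /* IA-32 rows: log2(N) > 13 for single, log2(N) > 14 for double */
        if (arch == ARCH_IA32 && prec == PREC_SINGLE && n > ((size_t)1 << 13))
            return true;
        if (arch == ARCH_IA32 && prec == PREC_DOUBLE && n > ((size_t)1 << 14))
            return true;
    }
    /* "Any" row: composite N with log2(N) > 16 and unit strides */
    return is_composite(n) && n > ((size_t)1 << 16) && unit_strides;
}
```

Prime sizes never satisfy any row, which matches the statement that prime-size 1D transforms are not threaded.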
