Fortran Programming Guide

1. Example: Initialize variables with Hollerith:

    demo% cat FourA8.f
          double complex x(2)
          data x /16Habcdefghijklmnop, 16Hqrstuvwxyz012345/
          write( 6, '(4A8)' ) x
          end
    demo% f77 -silent -o FourA8 FourA8.f
    demo% FourA8
    abcdefghijklmnopqrstuvwxyz012345
    demo%

If you pass Hollerith constants as arguments, or if you use them in expressions or comparisons, they are interpreted as character-type expressions. If needed, you can initialize a data item of a compatible type with a Hollerith and then pass it to other routines.

Chapter 7 Porting 113

Example:

          program respond
          integer yes, no
          integer ask
          data yes, no / 3hyes, 2hno /
          if ( ask() .eq. yes ) then
             print *, 'You may proceed'
          else
             print *, 'Request Rejected'
          endif
          end
          integer function ask()
          double precision solaris, response
          integer yes, no
          data yes, no / 3hyes, 2hno /
          data solaris / 7hSOLARIS /
     10   format( 'What system? ' )
     20   format( a8 )
          write( 6, 10 )
          read ( 5, 20 ) response
          ask = no
          if ( response .eq. solaris ) ask = yes
          return
          end

Nonstandard Coding Practices

Fortran Programming Guide May 2000

As a general rule, porting an application program from one system and compiler to another can be made easier by eliminating any nonstandard coding. Optimizations or work-arounds that were successful on one system might only obscure and confuse compilers on other systems. In particular, optimized h
2. k end do return end

In the preceding example, DOSERIAL applies only to the i loop and not to the j loop, regardless of whether the call to the subroutine is inlined.

Default Scoping Rules for Sun-Style Directives

For Sun-style (C$PAR) explicit directives, the compiler uses default rules to determine whether a scalar or array is shared or private. You can override the default rules to specify the attributes of scalars or arrays referenced inside a loop. (With Cray-style MIC$ directives, all variables that appear in the loop must be explicitly declared either shared or private on the DOALL directive.)

The compiler applies these default rules:

    - All scalars are treated as private. A local copy of a scalar is made available for each thread executing the loop, and that local copy is used by that thread only.
    - All array references are treated as shared references. Any write of an array element by one thread is visible to all threads. No synchronization is performed on accesses to shared variables.

If inter-iteration dependencies exist in a loop, parallel execution may produce erroneous results. You must ensure that these cases do not arise. The compiler may sometimes be able to detect such a situation at compile time and issue a warning, but it does not disable parallelization of such loops.

Chapter 10 SPARC Parallelization 175

Example: Potential problem through equivalence:

          equivalence (a(1), y)
    C$PA
3. In the example, variables x, y, and i are STOREBACK variables.

REDUCTION(varlist)

The REDUCTION(varlist) qualifier specifies that all variables in the list varlist are reduction variables for the DOALL loop. A reduction variable (or array) is one whose partial values can be individually computed on various processors, and whose final value can be computed from all its partial values. The presence of a list of reduction variables requests the compiler to handle a DOALL loop as a reduction loop by generating parallel reduction code for it.

Example: Specify a reduction variable:

    C$PAR DOALL REDUCTION(x)
          do i = 1, n
             x = x + a(i)
          end do

In the preceding example, the variable x is a (sum) reduction variable; the i loop is a (sum) reduction loop.

SCHEDTYPE(t)

SCHEDTYPE(t) specifies the scheduling type t to be used to schedule the DOALL loop.

TABLE 10-6 DOALL SCHEDTYPE Qualifiers

Action:

    - Use static scheduling for this DO loop. This is the default scheduling for Sun-style DOALL, for both f77 and f95. Distribute all iterations uniformly to all available threads. Example: With 1000 iterations and 4 processors, each thread gets one chunk of 250 contiguous iterations.
    - Use self-scheduling for this DO loop. Each thread gets one chunk of chunksize iterations at a time, distributed in a nondeterministic order, until all iterations are processed. Chu
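The idea behind a sum reduction (partial values computed independently, then combined) can be sketched in plain serial Fortran. This is only an illustration, not the code the compiler generates: the program name, the array contents, and the chunk count of 4 are assumptions chosen to mirror the "1000 iterations and 4 processors" static-scheduling example, and each chunk merely stands in for the work one thread would do.

```fortran
program reduction_sketch
  implicit none
  integer, parameter :: n = 1000, nchunks = 4   ! hypothetical sizes
  real :: a(n), partial(nchunks), x
  integer :: i, c, lo, hi

  a = 1.0            ! sample data (assumption for this sketch)
  partial = 0.0

  ! Each chunk of 250 contiguous iterations stands in for one thread.
  do c = 1, nchunks
     lo = (c - 1) * (n / nchunks) + 1
     hi = c * (n / nchunks)
     do i = lo, hi
        partial(c) = partial(c) + a(i)
     end do
  end do

  ! Combine the partial values into the final reduction value.
  x = 0.0
  do c = 1, nchunks
     x = x + partial(c)
  end do

  print *, 'sum =', x
end program reduction_sketch
```

Because the partial sums are added in a different order than a single serial loop would use, the result can differ in the last bits for general floating-point data, even though it is exact for the sample values here.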
4. Fortran Programming Guide May 2000 66 7 Example Use XlistE to show errors and warnings demo 77 XlistE silent Repeat f demo cat Repeat lst FILE Repeat f program repeat 4 CALL nwfrk pni A ERR 418 argument pnl is real but dummy argument is integer 4 See Repeat f line 14 4 CALL nwfrk pni A ERR 317 variable pnl referenced as integer 4 across repeat nwfrk prnok in line 21 but set as real by repeat in line 2 subroutine subrl 10 CALE x 0 5 WAR 348 recursive call for subr1 See dynamic calls Repeat f line 3 subroutine nwfrk I PRINT prnok ix fork ERR 418 argument ix is integer 4 but dummy argument is real See Repeat f line 20 subroutine unreach_sub 24 SUBROUTINE unreach_sub WAR 338 subroutine unreach_sub isn t called from program Date Wed Feb 24 10 40 32 1999 Files 2 Sources 1 libraries 1 Lines 26 Sources 26 Library subprograms 2 Routines 5 MAIN 1 Subroutines 3 Functions 1 Messages 5 Errors 3 Warnings 2 demo Chapter5 Program Analysis and Debugging Compiling the same program with lt 1 1 5 also produces a cross reference table on Initialization DATA or extended declaration TABL iw 153 21 16 20 LI NC M DC DC DM C DC standard outp
5. -Xlist

The -Xlist options provide a valuable way to analyze a source program for inconsistencies and possible runtime problems. The analysis performed by the compiler is global, across subprograms. -Xlist reports errors in alignment, agreement in number and type for subprogram arguments, common block, parameter, and various other kinds of errors. -Xlist also can be used to make detailed source code listings and cross-reference tables.

Note: Not all the -Xlist suboptions are available with f95.

GPC Overview

Global program checking (GPC), invoked by the -Xlistx option, does the following:

    - Enforces type-checking rules of Fortran more stringently than usual, especially between separately compiled routines
    - Enforces some portability restrictions needed to move programs between different machines or operating systems
    - Detects legal constructions that nevertheless might be suboptimal or error-prone
    - Reveals other potential bugs and obscurities

In particular, global checking reports problems such as:

    - Interface problems
        - Conflicts in number and type of dummy and actual arguments
        - Wrong types of function values
        - Possible conflicts due to data type mismatches in common blocks between different subprograms
    - Usage problems
        - Function used as a subroutine or subroutine used as a function
        - Declared but unused functions, subroutines, variables, and labels
        - Referenced but not declared func
6. return end TABLE 11 4 Passing Simple Data Types Fortran calls C integer i real r external CSim i 100 call CSim i r void csim_ int 1 float r er i Fortran Programming Guide May 2000 192 COMPLEX Data Pass a Fortran COMPLEX data item as a pointer to a C struct of two float or two double data types TABLE 11 5 Passing COMPLEX Data Types Fortran calls C C calls Fortran complex w struct iloat ry 19 y double complex z struct cpx dl external CCmplx struct cpx w 2 call CCmp1x w z struct dpx double r i are struct dpx d2 SSS aS ee ae Tee struct dpx z amp d2 struct cpx float r i fomplx_ w z struct dpx double r i void cemplx_ He Struct cpx w subroutine FCmplx w z struct dpx z complex w double complex z w gt r 2 w 32 007 w lt i 007 z 66 67 94 1 z gt r 66 67 return lt i 94 1 end In 64 bit environments and compiling with xarch v9 COMPLEX values are returned in registers Character Strings Passing strings between C and Fortran routines is not recommended because there is no standard interface However note the following All C strings are passed by reference Fortran calls pass an additional argument for every argument with character type in the argument list The extra argument gives the length of the string and is equivalent to a C long in
7. f77 -Xlisth: Halt on errors

With -Xlisth, compilation stops if errors are detected while cross-checking the program. In this case, the report is redirected to stdout instead of the .lst file.

-XlistI: List and cross-check include files

If -XlistI is the only suboption used, include files are shown or scanned along with the standard -Xlist output (line-numbered listing, error messages, and a cross-reference table).

    - Listing: If the listing is not suppressed, then the include files are listed in place. Files are listed as often as they are included. The files are:
        - Source files
        - #include files
        - INCLUDE files
    - Cross-Reference Table: If the cross-reference table is not suppressed, the following files are all scanned while the cross-reference table is generated:
        - Source files
        - #include files
        - INCLUDE files

The default is not to show include files.

-XlistL: Show listing and cross-routine errors

Use -XlistL to produce only a listing and a list of cross-routine errors. This suboption by itself does not show a cross-reference table. The default is to show the listing and cross-reference table.

-Xlistln: Set the page length for pagination to n lines

Use -Xlistl to set the page length to something other than the default page size. For example, -Xlistl45 sets the page length to 45 lines. The default is 66. With n=0 (-Xlistl0), this option shows listing
8. If you pass bad handles:

    - libFposix_c returns an error code (ENOHANDLE).
    - libFposix core dumps with a segmentation fault.

Of course, the checking is time-consuming, and libFposix_c is several times slower. Both POSIX libraries come in static and dynamic forms.

The POSIX bindings provided are for IEEE Standard 1003.9-1992. IEEE 1003.9 is a binding of 1003.1-1990 to FORTRAN (X3.9-1978). For more information, see these POSIX.1 documents:

    - ISO/IEC 9945-1:1990
    - IEEE Standard 1003.1-1990
    - IEEE Order number SH13680
    - IEEE CS Catalog number 1019

To find out precisely what POSIX is, you need both the 1003.9 and the POSIX.1 documents. The POSIX library for f95 is libposix9.

Chapter 4 Libraries 61

Shippable Libraries

If your executable uses a Sun dynamic library that is listed in the runtime libraries README file, your license includes the right to redistribute the library to your customer. This README file is located in the READMEs directory:

    <install-point>/SUNWspro/READMEs/

Do not redistribute or otherwise disclose the header files, source code, object modules, or static libraries of object modules in any form. Refer to your software license for more details.

CHAPTER 5

Program Analysis and Debugging

This chapter presents a number of Sun Fortran compiler features that facilitate program analysis and debugging.

Global Program Checking
9.
    demo% f95 DetExcFlg.F
    demo% a.out
    Highest priority exception is: underflow
    invalid  divide  overflo  underflo  inexact
       0       0        0        1         1
    (1 = exception is raised; 0 = it is not)
    demo%

IEEE Extreme Value Functions

The compilers provide a set of functions that can be called to return a special IEEE extreme value. These values, such as infinity or minimum normal, can be used directly in an application program.

Example: A convergence test based on the smallest number supported by the hardware would look like:

          IF ( delta .LE. r_min_normal() ) RETURN

The values available are listed in the following table:

TABLE 6-3 Functions Returning IEEE Values

    IEEE Value       Double Precision    Single Precision
    infinity         d_infinity          r_infinity
    quiet NaN        d_quiet_nan         r_quiet_nan
    signaling NaN    d_signaling_nan     r_signaling_nan
    min normal       d_min_normal        r_min_normal
    min subnormal    d_min_subnormal     r_min_subnormal
    max subnormal    d_max_subnormal     r_max_subnormal
    max normal       d_max_normal        r_max_normal

The two NaN values (quiet and signaling) are unordered and should not be used in comparisons such as IF ( X .ne. r_quiet_nan() ) THEN... To determine whether some value is a NaN, use the function ir_isnan(r) or id_isnan(d).

The Fortran names for these functions are listed in these man pages:

    - libm_double(3f)
    - libm_single(3f)
    - ieee_functions(3m)

Also se
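The advice on testing for NaN can be illustrated with a short sketch. It does not use the Sun library functions named in the excerpt; instead it assumes a compiler that provides the standard intrinsic module ieee_arithmetic (Fortran 2003), with ieee_is_nan standing in for ir_isnan and the intrinsic tiny() standing in for r_min_normal. The program and variable names are hypothetical.

```fortran
program nan_check_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  real :: x, delta

  x = 0.0
  x = ieee_value(x, ieee_quiet_nan)   ! produce a quiet NaN of x's kind

  ! A NaN is unordered: under IEEE semantics it compares unequal to
  ! every value, itself included, so comparing against a NaN constant
  ! tells you nothing about whether a value is a NaN.
  if (x /= x) print *, 'x /= x holds, which only a NaN can do'

  ! An explicit inquiry is the clear way to ask the question.
  if (ieee_is_nan(x)) print *, 'x is a NaN'

  ! tiny() returns the smallest positive normal number of its
  ! argument's kind, which suits a convergence test like the one above.
  delta = tiny(delta)
  if (delta <= tiny(delta)) print *, 'converged'
end program nan_check_sketch
```

Aggressive floating-point optimization flags on some compilers can change how NaN comparisons behave, which is one more reason to prefer the explicit inquiry function.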
10. 17 PRINT prnok ix fork ERR 418 argument ix is integer 4 but dummy argument is real See Repeat f line 20 Date Wed Feb 24 10 40 32 1999 Files 2 Sources 1 libraries 1 Lines 26 Sources 26 Library subprograms 2 Routines 5 MAIN 1 Subroutines 3 Functions 1 Messages 5 Errors 3 Warnings 2 demo Chapter5 Program Analysis and Debugging 5 Example Explain a message and find a type mismatch in program 506660 f demo cat ShoGetc f CHARACTER 1 6 1 getc c END demo 77 silent ShoGetc f Compile program demo a out Program waits for input 2 Type Z keyboard This causes run time message Why Note IEEE floating point exception flags raised Invalid Operation See the Numerical Computation Guide ieee_flags 3M demo 77 XlistE silent ShoGetc f Compile with Global Program Checking demo cat ShoGetc 1st and view listing FILE ShoGetc f program MAIN 2 1 getc c WAR 320 variable i set but never referenced 2 1 getc c ERR 412 function 8660 used as real but declared as integer 4 Here is the error function must be declared INTEGER 2 1 getc c WAR 320 variable c set but never referenced demo cat ShoGetc f Modify program to declare getc INTEGER and run again CHARACTER 1 c INTEGER getc 1 getc c END demo 77 silent ShoGetc f demo a out 2 Type Z on ke
11.
    Allocate local variables on stack           -stackvar
    Enable Sun-style MP directives              -mp=sun
    Enable Cray-style MP directives             -mp=cray
    Enable OpenMP directives                    -mp=openmp
    Compile for OpenMP parallelization          -openmp

Notes on these options:

    - -reduction requires -autopar.
    - -autopar includes -depend and loop structure optimization.
    - -parallel is equivalent to -autopar -explicitpar.
    - -noautopar, -noexplicitpar, -noreduction are the negations.
    - Parallelization options can be in any order, but they must be all lowercase.
    - Reduction operations are not analyzed for explicitly parallelized loops.
    - Use of any of the parallelization options requires a Sun WorkShop HPC license.
    - -openmp is a macro for the combination of options: -mp=openmp -stackvar -explicitpar
    - The options -loopinfo, -vpara, and -mp must be used in conjunction with one of the parallelization options -autopar, -explicitpar, or -parallel.

The following table summarizes the f77 and f95 Sun-style parallel directives:

TABLE 10-2 Sun-Style Parallel Directives

    Parallel Directive                      Purpose
    C$PAR TASKCOMMON                        Declares a common block private to each thread
    C$PAR DOALL [optional qualifiers]       Parallelizes next loop, if possible
    C$PAR DOSERIAL                          Inhibits parallelization of next loop
    C$PAR DOSERIAL*                         Inhibits parallelization of loop nest

Cray-style directives are similar (see page 176), but use a MIC$ sentinel instead of C$PAR, and with different optiona
12. The linker searches for libraries at several locations and in a certain prescribed order. Some of these locations are standard paths, while others depend on the compiler options -Rpath, -llibrary, and -Ldir and the environment variable LD_LIBRARY_PATH.

Search Order for Standard Library Paths

The standard library search paths used by the linker are determined by the installation path, and they differ for static and dynamic loading. <install-point> is the path to where the Fortran compilers have been installed. In a standard install of the software this is /opt.

Static Linking

While building the executable file, the static linker searches for any libraries in the following paths (among others), in the specified order:

    <install-point>/SUNWspro/lib      Sun shared libraries
    /usr/ccs/lib/                     Standard location for SVr4 software
    /usr/lib                          Standard location for UNIX software

These are the default paths used by the linker.

Dynamic Linking

The dynamic linker searches for shared libraries at runtime, in the specified order:

    - Paths specified by user with -Rpath
    - <install-point>/SUNWspro/lib/
    - /usr/lib (standard UNIX default)

The search paths are built into the executable.

LD_LIBRARY_PATH Environment Variable

Use the LD_LIBRARY_PATH environment variable to specify directory paths that the linker should search for libraries specified with the -llibrary option. Multiple directories can be
13. format edit descriptors 109 Fortran features and extensions 12 libraries 60 utilities 13 fpp command 13 free format 4 fsimple option 139 IOINIT library routine 4 L 1x option 49 labels unused Xlist 64 Ldir option 49 110877 60 libFposix 60 1ibM77 60 libraries 43 to 62 dynamic creating 55 naming 58 position independent code 56 specifying 50 tradeoffs 56 in general 43 linking 44 load map 44 math 60 optimized 141 POSIX 61 provided with Sun WorkShop Fortran 60 redistributable 62 search order command line options 49 LD_LIBRARY_PATH 48 paths 47 shared See dynamic static creating 51 on SPARC V9 58 ordering routines 55 recompile and replace module 55 tradeoffs 51 Sun Performance Library 14 142 VMS 60 libv77 61 line width output Xlist 74 line numbered listing Xlist 65 linking binding options B d 57 consistent compile and link 46 libraries 44 ieee_functions 85 ieee_handler 85 91 1 _retrospective 83 97 ieee_values 85 INCLUDE 26 include files list and cross checking with 11561 73 inconsistency arguments checking Xlist 64 named common blocks checking Xlist 64 indirect addressing data dependency 149 inexact floating point arithmetic 83 information files READMEs 16 initialization 191 inlining calls with 04 7 input output 19 to 34 accessing files 19 binary 28 comparing Fortran and CI O 189 dd c
14.
    Using Optimized Libraries
    Eliminating Performance Inhibitors
    Further Reading

10 SPARC Parallelization
    Essential Concepts
    Speedups: What to Expect
    Steps to Parallelizing a Program
    Data Dependency Issues
    Parallel Options and Directives Summary
    Number of Threads
    Stacks, Stack Sizes, and Parallelization
    Automatic Parallelization
    Loop Parallelization
    Arrays, Scalars, and Pure Scalars
    Automatic Parallelization Criteria
    Automatic Parallelization With Reduction Operations
    Explicit Parallelization
    Parallelizable Loops
    Sun-Style Parallelization Directives
    Cray-Style Parallelization Directives
    Environment Variables
    PARALLEL and OMP_NUM_THREADS
    SUNW_MP_THR_IDLE
    Debugging Parallelized Programs
    First Steps at Debugging
    Debugging Parallel Code With dbx

C-Fortran Interface
    Compatibility Issues
    Function or Subroutine?
    Data Type Compatibility
    Case Sensitivity
    Underscores in Routine Names
    Argument-Passing by Reference or Value
    Argument Order
    Array Indexing and Order
    File Descriptors and stdio
    File Permissions
    Libraries and Linking With the f77 or f95 Command
    Fortran Initialization Routines
    Passing Data Arguments by Reference
    Simple Data Types
    COMPLEX Data
    Character Strings
    One-Dimensional Arrays
15. For instance, prior to the IEEE standard, if you multiplied two very small numbers on a computer, you could get zero. Most mainframes and minicomputers behaved that way. With IEEE arithmetic, gradual underflow expands the dynamic range of computations.

For example, consider a 32-bit processor with 1.0E-38 as the machine's epsilon (used here to mean the smallest normalized value representable on the machine). Multiply two small numbers. In older arithmetic, you would get 0.0, but with IEEE arithmetic and the same word length, you get 1.40130E-45. Underflow tells you that you have an answer smaller than the machine naturally represents. This result is accomplished by "stealing" some bits from the mantissa and shifting them over to the exponent. The result, a denormalized number, is less precise in some sense, but more precise in another. The deep implications are beyond this discussion. If you are interested, consult Computer, January 1980, Volume 13, Number 1, particularly J. Coonen's article, "Underflow and the Denormalized Numbers."

Most scientific programs have sections of code that are sensitive to roundoff, often in an equation solution or matrix factorization. Without gradual underflow, programmers are left to implement their own methods of detecting the approach of an inaccuracy threshold. Otherwise they must abandon the quest for a robust, stable implementation of their algorithm.

For more details on these topics, see the Sun WorkShop Numerical Comp
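Gradual underflow is easy to observe directly. The following minimal sketch assumes that default REAL is 32-bit IEEE single precision and that the platform does not flush subnormal results to zero; the operand values 1.0e-30 and 1.0e-15 are assumptions picked so that the exact product, 1.0e-45, lies below the smallest normal number.

```fortran
program gradual_underflow_sketch
  implicit none
  real :: a, b, x

  a = 1.0e-30          ! hypothetical small operands
  b = 1.0e-15
  x = a * b            ! exact product 1.0e-45 is below tiny(x)

  print *, 'smallest normal value, tiny(x):', tiny(x)
  print *, 'a * b                         :', x

  if (x > 0.0 .and. x < tiny(x)) then
     print *, 'result is subnormal: gradual underflow in effect'
  else if (x == 0.0) then
     print *, 'result was flushed to zero'
  end if
end program gradual_underflow_sketch
```

With gradual underflow the product is rounded to the nearest subnormal value rather than to zero, at the cost of having very few significant bits left.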
16. Recognized Reduction Operations

The following table lists the reduction operations that are recognized by f77 and f95.

TABLE 10-3 Recognized Reduction Operations

    Mathematical Operations        Fortran Statement Templates
    Sum                            s = s + v(i)
    Product                        s = s * v(i)
    Dot product                    s = s + v(i) * u(i)
    Minimum                        s = amin( s, v(i) )
    Maximum                        s = amax( s, v(i) )
    OR                             do i = 1, n
                                      b = b .or. v(i)
                                   end do
    AND                            b = .true.
                                   do i = 1, n
                                      b = b .and. v(i)
                                   end do
    Count of non-zero elements     k = 0
                                   do i = 1, n
                                      if ( v(i) .ne. 0 ) k = k + 1
                                   end do

All forms of the MIN and MAX function are recognized.

Numerical Accuracy and Reduction Operations

Floating-point sum or product reduction operations may be inaccurate due to the following conditions:

    - The order in which the calculations are performed in parallel is not the same as when performed serially on a single processor.
    - The order of calculation affects the sum or product of floating-point numbers. Hardware floating-point addition and multiplication are not associative. Roundoff, overflow, or underflow errors may result depending on how the operands associate. For example, (X*Y)*Z and X*(Y*Z) may not have the same numerical significance.

In some situations, the error may not be acceptable.

Example: Overflow and underflow, with and without reduction:

    demo% cat t3.f
          real A(10002), result, MAXFLOAT
          MAXFLOAT = r_m
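The point that hardware floating-point arithmetic is not associative can be seen with three numbers. This sketch uses addition rather than multiplication, and the values are assumptions chosen so that one grouping loses the small operand entirely; it assumes 32-bit IEEE default REAL and a compiler that does not reassociate floating-point expressions.

```fortran
program assoc_sketch
  implicit none
  real :: x, y, z, left, right

  x = 1.0e30           ! hypothetical operands
  y = -1.0e30
  z = 1.0

  left  = (x + y) + z  ! x + y is exactly 0, so this gives 1
  right = x + (y + z)  ! z is absorbed into y, so this gives 0

  print *, '(x + y) + z =', left
  print *, 'x + (y + z) =', right
end program assoc_sketch
```

A parallel reduction effectively regroups the operands in this way, which is why its result can differ from the serial loop.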
17. -Ldir Options

The -Ldir option adds the dir directory path to the library search list. The linker searches for libraries first in any directories specified by the -L options, and then in the standard directories. This option is useful only if it is placed preceding the -llibrary options to which it applies.

Library Search Path and Order: Dynamic Linking

With dynamic libraries, changing the library search path and order of loading differs from the static case. Actual linking takes place at runtime rather than build time.

Specifying Dynamic Libraries at Build Time

When building the executable file, the linker records the paths to shared libraries in the executable itself. These search paths can be specified using the -Rpath option. This is in contrast to the -Ldir option, which indicates at buildtime where to find the library specified by a -llibrary option, but does not record this path into the binary executable.

The directory paths that were built in when the executable was created can be viewed using the dump command.

Example: List the directory paths built into a.out:

    f77 program.f -R/home/proj/libs -L/home/proj/libs -lmylib
    dump -Lv a.out | grep RPATH
    RPATH /home/proj/libs:/opt/SUNWspro/lib

Specifying Dynamic Libraries at Runtime

At runtime, the linker determines where to find the dynamic libraries that an executable needs from:

    - The value of LD_LIBRARY_PATH at runtime
    - The paths that
18. Use of the OPEN statement is optional in those cases where default conventions can be assumed. If the first operation on a logical unit is an I/O statement other than OPEN or INQUIRE, the file fort.n is referenced, where n is the logical unit number (except for 0, 5, and 6, which have special meaning).

These files need not exist before program execution. If the first operation on the file is not an OPEN or INQUIRE statement, they are created.

Example: The WRITE in the following code creates the file fort.25 if it is the first input/output operation on that unit:

    demo% cat TestUnit.f
          IU=25
          WRITE( IU, '(I4)' ) IU
          END
    demo%

The preceding program opens the file fort.25 and writes a single formatted record onto that file:

    demo% f77 -silent -o testunit TestUnit.f
    demo% testunit
    demo% cat fort.25
      25
    demo%

Passing File Names to Programs

The file system does not have any automatic facility to associate a logical unit number in a Fortran program with a physical file. However, there are several satisfactory ways to communicate file names to a Fortran program.

Via Runtime Arguments and GETARG

The library routine getarg(3F) can be used to read the command-line arguments at runtime into a character variable. The argument is interpreted as a file name and used in the OPEN statement FILE= specifier:

    demo% cat testarg.f
          CHARACTER outfile
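Passing a file name on the command line can be sketched as follows. This is not the guide's testarg.f, which is cut off above; it is a hedged illustration that swaps in the standard intrinsics command_argument_count and get_command_argument (Fortran 2003) for the getarg(3F) library routine. The program name, the 256-character buffer, and the fallback name fort.25 are assumptions.

```fortran
program testarg_sketch
  implicit none
  character(len=256) :: outfile     ! hypothetical buffer size
  integer :: ios

  ! Take the first command-line argument as the output file name,
  ! falling back to a default name when none is given.
  if (command_argument_count() < 1) then
     outfile = 'fort.25'
  else
     call get_command_argument(1, outfile)
  end if

  open (unit=25, file=trim(outfile), status='replace', &
        action='write', iostat=ios)
  if (ios /= 0) then
     print *, 'cannot open ', trim(outfile)
     stop 1
  end if

  write (25, '(I4)') 25             ! one formatted record
  close (25)
end program testarg_sketch
```

Checking iostat keeps a bad path from aborting the run with a runtime error, and trim() removes the blank padding that a fixed-length character variable carries.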
19. but must be used with the -xmaxopt=n flag to set a maximum optimization level. See the f77(1) and f95(1) man pages for details.

Optimization With Runtime Profile Feedback

The compiler applies its optimization strategies at level -O3 and above much more efficiently if combined with -xprofile=use. With this option, the optimizer is directed by a runtime execution profile produced by the program (compiled with -xprofile=collect) with typical input data. The feedback profile indicates to the compiler where optimization will have the greatest effect. This may be particularly important with -O5. Here's a typical example of profile collection with higher optimization levels:

    demo% f95 -o prg -fast -xprofile=collect prg.f
    demo% prg
    demo% f95 -o prgx -fast -O5 -xprofile=use:prg.profile prg.f
    demo% prgx

The first compilation in the example generates an executable that produces statement coverage statistics when run. The second compilation uses this performance data to guide the optimization of the program.

(See the Fortran User's Guide for details on -xprofile options.)

-dalign

With -dalign the compiler is able to generate double-word load/store instructions whenever possible. Programs that do much data motion may benefit significantly when compiled with this option. (It is one of the options selected by -fast.) The double-word instructions are almost twice as fast as the equivalent single-word operations.
20. environment variables 178 explicit criteria 160 Index 209 specifying static or dynamic 57 mixing C and Fortran 191 search order 47 1 Ldir 49 troubleshooting errors 50 lint like checking across routines Xlist 63 listing cross references with 11 56 75 line numbered with diagnostics Xlist 63 XlistL 73 logical unit 19 attached at runtime 24 loop unrolling and portability 117 with unroll 139 1V77 option 61 M m linker option for load map 45 macros with make 37 make 35 38 command 37 macros 37 makefile 35 suffix rules 38 makefile 35 man pages 14 MANPATH path to man pages 14 maps common blocks Xlist 74 equivalence blocks 11 5 74 MAXCPUS directive qualifier 168 177 measuring program performance See performance profiling monitor variables graphically dbx 78 multifile tapes 33 multithreading See parallelization N nonstandard_arithmetic 85 non stopping I O 28 nonstandard coding 4 obscure optimizations 115 precision considerations 110 strip mining 116 time functions 105 troubleshooting guidelines 118 uninitialized variables 115 unrolled loops 117 position independent code pic 56 POSIX bindings libFposix 60 Library 61 pragma See directives preattached logical units 24 preconnected units 21 preprocessor 13 preserve case 186 preserving precision 110 print asa command 13 PRIVATE directive qualifier 168 177
21. for more information.

Sun compilers can automatically generate multithreaded object code to run on multiprocessor systems. The Fortran compilers focus on DO loops as the primary language element supporting parallelism. Parallelization distributes the computational work of a loop over several processors without requiring modifications to the Fortran source program.

The choice of which loops to parallelize and how to distribute them can be left entirely up to the compiler (-autopar), specified explicitly by the programmer with source code directives (-explicitpar), or done in combination (-parallel).

Note: Programs that do their own (explicit) thread management should not be compiled with any of the compiler's parallelization options. Explicit multithreading (calls to libthread primitives) cannot be combined with routines compiled with these parallelization options.

Not all loops in a program can be profitably parallelized. Loops containing only a small amount of computational work (compared to the overhead spent starting and synchronizing parallel tasks) may actually run more slowly when parallelized. Also, some loops cannot be safely parallelized at all; they would compute different results when run in parallel due to dependencies between statements or iterations.

Implicit loops (IF loops and Fortran 95 array syntax, for example) as well as explicit DO loops are candidates for automatic parallelization by the Fortran com
22. from a particular 50-line computation is the voltage. Further, assume that the only values that are possible are +5v, 0, -5v.

It is possible to carefully arrange each part of the calculation to coerce each sub-result to the correct range:

    if computed value is greater than 4.0, return 5.0
    if computed value is between -4.0 and +4.0, return 0
    if computed value is less than -4.0, return -5.0

Furthermore, since Inf is not an allowed value, you need special logic to ensure that big numbers are not multiplied.

Chapter 6 Floating Point Arithmetic 1

IEEE arithmetic allows the logic to be much simpler. The computation can be written in the obvious fashion, and only the final result need be coerced to the correct value, since Inf can occur and can be easily tested. Furthermore, the special case of 0/0 can be detected and dealt with as you wish. The result is easier to read and faster in executing, since you don't do unneeded comparisons.

SPARC: Excessive Underflow

If two very small numbers are multiplied, the result underflows. If you know in advance that the operands in a multiplication (or subtraction) may be small and underflow is likely, run the calculation in double precision and convert the result to single precision later.

For example, a dot product loop like this:

          real sum, a(maxn), b(maxn)
          do i = 1, n
             sum = sum + a(i)*b(i)
          enddo

where a and b are known to have small elements, should
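The advice to accumulate in double precision and convert afterwards can be sketched like this. The example is an illustration under stated assumptions, not the guide's own continuation: the array size, the element value 1.0e-25, and the final square root (which brings the result back into single-precision range) are all hypothetical, and default REAL is assumed to be 32-bit IEEE single precision.

```fortran
program dot_sketch
  implicit none
  integer, parameter :: n = 4            ! hypothetical size
  real :: a(n), b(n), sum_s, result_s, result_d
  double precision :: sum_d
  integer :: i

  a = 1.0e-25      ! small elements: each product is about 1.0e-50,
  b = 1.0e-25      ! far below the single-precision range

  sum_s = 0.0
  sum_d = 0.0d0
  do i = 1, n
     sum_s = sum_s + a(i) * b(i)               ! product underflows in single
     sum_d = sum_d + dble(a(i)) * dble(b(i))   ! product is representable in double
  end do

  ! Convert to single precision only at the end.
  result_s = sqrt(sum_s)
  result_d = real(sqrt(sum_d))

  print *, 'single accumulation:', result_s
  print *, 'double accumulation:', result_d
end program dot_sketch
```

On typical IEEE hardware the single-precision accumulation is expected to lose the products entirely, while the double-precision path yields a value near 2.0e-25, which fits comfortably in single precision once the square root has been taken.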
23. sccs create makefile commonblock startupcore.f computepts.f pattern.f
    demo%

Checking Files Out and In

Once your source code is under SCCS control, you use SCCS for two main tasks: to check out a file so that you can edit it, and to check in a file you have finished editing.

Check out a file with the sccs edit command. For example:

    demo% sccs edit computepts.f

SCCS then makes a writable copy of computepts.f in the current directory, and records your login name. Other users cannot check the file out while you have it checked out, but they can find out who has checked it out.

When you have completed your editing, check in the modified file with the sccs delget command. For example:

    demo% sccs delget computepts.f

This command causes the SCCS system to:

    - Make sure that you are the user who checked out the file by comparing login names
    - Prompt for a comment from you on the changes
    - Make a record of what was changed in this editing session
    - Delete the writable copy of computepts.f from the current directory
    - Replace it by a read-only copy with the SCCS keywords expanded

The sccs delget command is a composite of two simpler SCCS commands, delta and get. The delta command performs the first three tasks in the list above; the get command performs the last two tasks.

Libraries

This chapter describes how to use and create libraries of subprograms. Both static and dynamic librari
24. useful introduction to these issues. A list of books that cover the subject much more deeply appears at the end of the chapter.

Optimization and performance tuning is an art that depends heavily on being able to determine what to optimize or tune.

CHAPTER 9

Choice of Compiler Options

Choice of the proper compiler options is the first step in improving performance. Sun compilers offer a wide range of options that affect the object code. In the default case, where no options are explicitly stated on the compile command line, most options are off. To improve performance, these options must be explicitly selected.

Performance options are normally off by default because most optimizations force the compiler to make assumptions about a user's source code. Programs that conform to standard coding practices and do not introduce hidden side effects should optimize correctly. However, programs that take liberties with standard practices might run afoul of some of the compiler's assumptions. The resulting code might run faster, but the computational results might not be correct.

Recommended practice is to first compile with all options off, verify that the computational results are correct and accurate, and use these initial results and performance profile as a baseline. Then, proceed in steps, recompiling with additional options and comparing execution results and performance against the baseline. If numerical results change, the
25. 1 end do In the example the variable y has been specified as a variable whose value should be shared among the iterations of the i loop READONLY varlist The READONLY varlist qualifier specifies that all scalars and arrays in the list varlist are read only for the DOALL loop Read only scalars and arrays are a special class of shared scalars and arrays that are not modified in any iteration of the DOALL loop Specifying scalars and arrays as READONLY indicates to the compiler that it does not need to use a separate copy of that scalar variable or array for each thread of the DOALL loop Example Specify a read only variable x 3 CSPAR DOALL SHARE EADONLY x do i 1 b i end do In the preceding example x is a shared variable but the compiler can rely on the fact that its value will not be modified in any iteration of the i loop because of its READONLY specification Chapter 10 SPARC Parallelization 169 STOREBACK varlist A STOREBACK scalar variable or array is one whose value is computed in a DOALL loop The computed value can be used after the termination of the loop In other words the last loop iteration values of storeback scalars or arrays are visible after the DOALL loop Example Specify the loop index variable as storeback CSPAR DOALL PRIVATE x STOREBACK x i do i 1 n x end do 1 Xx In the preceding example both t
26. 8192
coredump(blocks) 0
nofiles(descriptors) 64
vmemory(kbytes) unlimited
demo$ ulimit -s 65536
demo$ ulimit -s
65536

Each helper thread of a multithreaded program has its own thread stack. This stack mimics the initial thread stack but is unique to the thread. The thread's PRIVATE arrays and variables (local to the thread) are allocated on the thread stack. The default size is 2 Megabytes on SPARC V9 (UltraSPARC) platforms, 1 Megabyte otherwise. The size is set with the STACKSIZE environment variable:

    demo% setenv STACKSIZE 8192     <- Set thread stack size to 8 Mb (C shell)
    or
    demo$ STACKSIZE=8192            (Bourne/Korn shell)
    demo$ export STACKSIZE

Setting the thread stack size to a value larger than the default may be necessary for some parallelized Fortran codes. However, it may not be possible to know just how large it should be, except by trial and error, especially if private/local arrays are involved. If the stack size is too small for a thread to run, the program will abort with a segmentation fault.

Automatic Parallelization

With the -autopar and -parallel options, the f77 and f95 compilers automatically find DO loops that can be parallelized effectively. These loops are then transformed to distribute their iterations evenly over the available processors. The compiler generates the thread calls needed to make this happen.

Loop Parallelization

The compi
27. Alignments in Bytes Pass by Reference 77 andcc 185 Data Sizes and Alignment in Bytes Pass by Reference 95 andcc 186 Comparing I O Between Fortran and C 190 Passing Simple Data Types 2 Passing COMPLEX Data Types 193 Passing a CHARACTER string 194 Passing a One Dimensional Array 194 Passing a Two Dimensional Array 195 Passing FORTRAN 77 STRUCTURE Records 196 Passing Fortran 95 Derived Types 197 Passing a FORTRAN 77 POINTER 8 Passing Simple Data Arguments by Value FORTRAN 77 Calling C 199 Functions Returning a REAL or float Value 0 Function Returning COMPLEX Data 201 A Function Returning a CHARACTER String 202 Emulating Labeled COMMON 3 Alternate Returns 204 TABLE 10 7 TABLE 10 8 TABLE 11 1 TABLE 11 2 TABLE 11 3 TABLE 11 4 TABLE 11 5 TABLE 11 6 TABLE 11 7 TABLE 11 8 TABLE 11 9 TABLE 11 10 TABLE 11 11 TABLE 11 12 TABLE 11 13 TABLE 11 14 TABLE 11 15 TABLE 11 16 TABLE 11 17 Fortran Programming Guide May 2000 xiv Preface This guide presents the essential information programmers need to develop efficient applications using the Sun WorkShop Fortran compilers 77 Fortran 77 and 95 Fortran 95 It presents issues relating to input output program development use and creation of software libraries program analysis and debugging numerical accuracy porting performance optimization parallelization and the C Fortran interface Discussion of
28. Fortran routine with alternate returns The Sun Fortran implementation returns the int value of the expression on the RETURN statement This is implementation dependent and its use should be avoided TABLE 11 17 Alternate Returns C calls Fortran Running the Example int altret_ int demo ce c tst c main demo 77 o alt alt f tst o alt f int k m altret k 0 demo alt m altret_ amp k 1 2 printf Sd 6 k m The C routine receives the return value 2 from the Fortran routine because it executed the SUBROUTINE ALTRET I RETURN 2 statement INTEGER 1 gt 1 1 1 EQ 0 RETURN 1 IF I GT 0 RETURN 2 RETURN END 204 Fortran Programming Guide May 2000 6 C directive 187 C option 77 CSPAR Sun style directives 165 call in parallelized loops 160 inhibiting optimization 142 passing arguments by reference or value 188 call graph profile gprof 125 call graphs with Xlistc option 72 carriage control 109 case sensitivity 186 C Fortran interface array indexing 188 call arguments and ordering 188 case sensitivity 186 comparing I O 189 compatibility issues 183 function compared to subroutine 184 function names 187 192 passing data by value 198 199 203 sharing I O 203 CHUNKSIZE directive qualifier 178 CMIC Cray style directives 176 command line passing runtime arguments 23 redirection and piping 25 com
29. GETARG library routine 20 23 GETC library routine 33 GETENV library routine 20 23 global program checking strictness setting 74 global program checking See Xlist option gprof command 123 graphically monitor variables dbx 78 GSS directive qualifier 172 GUIDED directive qualifier 178 H help command line 17 Hollerith data 111 IDATE VMS routine 1 IEEE Institute of Electronic and Electrical Engineers 82 IEEE arithmetic standard 82 754 continue with wrong answer 101 exception handling 84 exceptions 83 excessive overflow 102 gradual underflow 84 100 interfaces 85 signal handler 94 underflow handling 84 ieee_flags 83 7 Index 7 extensions and features 12 external C functions 187 names 186 F f77_init 191 f90_init 191 FACTORING directive qualifier 172 fast option 137 features and extensions 12 feedback performance profiling 138 file names on INCLUDE statements 26 passing to programs 22 files internal 29 opening scratch files 21 passing file names to programs 22 110 permissions C Fortran interface 190 preconnected 21 standard error 21 standard input 21 standard output 21 tape 32 fix and continue dbx 78 fln files 65 floating point arithmetic 81 to 111 considerations 100 denormalized number 100 exceptions 83 IEEE 82 underflow 100 See also IEEE arithmetic 82 fns disable underflow 85
30. STRUCTURE ECORD siginfo sip location sip fault address actions you take H H H END This 77 example would have to be modified to run on SPARC V9 architectures xarch v9 or v9a by replacing all INTEGER declarations within each STRUCTURE with INTEGER 8 If the handler routine enabled by ieee_handler is in Fortran as shown in the example the routine should not make any reference to its first argument sig This first argument is passed by value to the routine and can only be referenced as loc sig The value is the signal number Detecting an Exception by Handler The following examples show how to create handler routines to detect floating point exceptions Chapter 6 Floating Point Arithmetic 3 Example Detect exception and abort demo cat DetExcHan f EXTERNAL myhandler REAL r 14 2 8 0 0 1 1 _handler set division myhandler t 8 D EGER FUNCTION myhandler sig code context EGER sig code context 5 ALL abort END demo 77 silent DetExcHan f demo a out abort called Abort core dumped demo HAHHA SIGFPE is generated whenever that floating point exception occurs When the SIGFPE is detected control passes to the myhandler function which immediately aborts Compile with g and use dbx
31. TABLE 4 1 Major Libraries Provided With the Compilers Name libF77 LibF77_mt 1ibM77 libfsu libfui libf ai Llibf77compat libV77 libpfc libsunmath libFposix libposix9 libFposix_c Library 77 functions nonmath 77 functions nonmath multithread safe 77 math library 95 support intrinsics 95 interface 95 array intrinsics libraries 95 77 I O compatibility library VMS library Library used with Pascal Fortran and C Library of Sun math functions POSIX bindings 95 POSIX interface POSIX bindings for extra runtime checking See also the math_libraries README file for more information Fortran Programming Guide May 2000 60 VMS Library The 11077 library is the VMS library which contains two special VMS routines idate and time To use either of these routines include the 1V77 option For idate and time there is a conflict between the VMS version and the version that traditionally is available in UNIX environments If you use the 1V77 option you get the VMS compatible versions of the idate and time routines See the Fortran Library Reference Manual and the FORTRAN 77 Language Reference Manual for details on these routines POSIX Library There are two versions of POSIX bindings provided with Fortran 77 libFposix which is just the bindings 1Fposix libFposix_c which does some runtime checking to make sure you are passing correct handles 1Fposix_c
32. Two Dimensional Arrays 5 Structures 196 Pointers 198 Passing Data Arguments by Value 198 Functions That Return a Value 9 Returning 8 Simple Data Type 0 Returning COMPLEX Data 0 Returning a CHARACTER String 1 Labeled COMMON 203 Contents xi 11 Sharing I O Between Fortran and C 3 Alternate Returns 204 Index 205 Fortran Programming Guide May 2000 Tables READMEs of Interest 6 csh sh ksh Redirection and Piping on the Command Line 26 Major Libraries Provided With the Compilers 0 Xlist Suboptions 70 Summary of xlist Suboptions 71 ieee_flags action mode in out Argument Values 7 ieee_flags Argument Meanings 87 Functions Returning IEEE Values 90 Arguments for ieee_handler action exception handler 92 Fortran Time Functions 106 Summary Nonstandard VMS Fortran System Routines 7 77 Maximum Characters in Data Types 112 Some Effective Performance Options 136 Parallelization Options 9 Sun Style Parallel Directives 0 Recognized Reduction Operations 7 Explicit Parallelization Problems 162 DOALL Qualifiers 168 DOALL SCHEDTYPE Qualifiers 172 xiii TABLE 1 1 TABLE 2 1 TABLE 4 1 TABLE 5 1 TABLE 5 2 TABLE 6 1 TABLE 6 2 TABLE 6 3 TABLE 6 4 TABLE 7 1 TABLE 7 2 TABLE 7 3 TABLE 9 1 TABLE 10 1 TABLE 10 2 TABLE 10 3 TABLE 10 4 TABLE 10 5 TABLE 10 6 DOALL Qualifiers Cray Style 177 DOALL Cray Scheduling 178 Data Sizes and
33. actually linker options, but they are recognized by the compiler and passed on to the linker.

-Bdynamic | -Bstatic
-Bdynamic sets the preference for shared, dynamic binding whenever possible. -Bstatic restricts binding to static libraries only. When both static and dynamic versions of a library are available, use this option to toggle between preferences on the command line:

    f77 prog.f -Bdynamic -lwells -Bstatic -lsurface

-dy | -dn
Allows or disallows dynamic linking for the entire executable. This option may appear on the command line only once. -dy allows dynamic, shared libraries to be linked. -dn does not allow linking of dynamic libraries.

Binding in 64-Bit Environments

Some static system libraries, such as libm.a and libc.a, are not available on 64-bit Solaris operating environments. These are supplied as dynamic libraries only. Use of -dn in these environments will result in an error indicating that some static system libraries are missing. Ending the compiler command line with -Bstatic will have the same effect. To link with static versions of specific libraries, use a command line that looks something like:

    f77 -o prog prog.f -Bstatic -labc -lxyz -Bdynamic

Here the user's libabc.a and libxyz.a files are linked (rather than libabc.so or libxyz.so), and the final -Bdynamic ensures that the remaining libraries, including system libraries, are dynamically linked. In more complicated
34. address 2 stopi at amp MAIN 0x68 dbx run Run program Running a out process id 18803 stopped in MAIN at 0x11230 MAIN 0x68 fdivs 13 sf 2 2 dbx where Shows the line number of the exception lt 1 MAIN line 7 in LocExcHan F dbx list 7 Displays the source code line 7 CS bes dbx cont Continue after breakpoint enter handler routine Signal 8 code 3 at hex address 11230 abort called signal ABRT Abort in _kill at Oxef6el18a4 _kill 0x8 bgeu _kil1 0x30 Current function is exhandler 24 CALL abort dbx quit demo Of course there are easier ways to determine the source line that caused the error However this example does serve to show the basics of exception handling Disabling All Signal Handlers With 77 some system signal handlers for trapping interrupts bus errors segmentation violations or illegal instructions are automatically enabled by default Although generally you would not want to turn off this default behavior you can do so by compiling a C program that sets the global C variable 77_no_handlers to 1 and linking into your executable program demo cat NoHandlers c int 77_no_handlers 1 demo cc c NoHandlers c demos 77 NoHandlers o MyProgram f Otherwise by default 77_no_handlers is 0 The setting takes effect just before execution is transferred to the user program 96 Fortran Programming Guide May 2000 This variable is in the global n
35. be run in double precision to preserve numeric accuracy real a maxn b maxn double sum do i l n sum sum a i dble b i enddo Doing so may also improve performance due to the software resolution of excessive underflows caused by the original loop However there is no hard and fast rule here experiment with your intensely computational code to determine the most profitable solutions Fortran Programming Guide May 2000 102 Interval Arithmetic The Sun WorkShop 6 Fortran 95 compiler 95 supports intervals as an intrinsic data type An interval is the closed compact set a b z a gt 2 gt b defined by a pair of numbers a gt b Intervals can be used to m Solve nonlinear problems m Perform rigorous error analysis m Detect sources of numerical instability By introducing intervals as an intrinsic data type to Fortran 95 all of the applicable syntax and semantics of Fortran 95 become immediately available to the developer Besides the INTERVAL data types 95 includes the following interval extensions to Fortran 95 m Three classes of INTERVAL relational operators a Certainly Possibly Set Intrinsic INTERVAL specific operators such as INF SUP WID and HULL INTERVAL input output edit descriptors including single number input output Interval extensions to arithmetic trigonometric and other mathematical functions Expression context dependent INTERVAL constants Mixed
36. be worthwhile to evaluate the performance of your program with each.

Position-Independent Code and -pic

Position-independent code (PIC) can be bound to any address in a program without requiring relocation by the link editor. Such code is inherently sharable between simultaneous processes. Thus, if you are building a dynamic, shared library, you must compile the component routines to be position-independent by using compiler options -pic or -PIC.

In position-independent code, each reference to a global item is compiled as a reference through a pointer into a global offset table. Each function call is compiled in a relative addressing mode through a procedure linkage table. The size of the global offset table is limited to 8 Kbytes on SPARC processors. The -PIC compiler option is similar to -pic, but -PIC allows the global offset table to span the range of 32-bit addresses.

There is a more flexible compiler flag with f77 and f95, -xcode=v, for specifying the code address space of a binary object. With this compiler flag, 32-, 44-, or 64-bit absolute addresses can be generated, as well as small and large model position-independent code. -xcode=pic13 is equivalent to -pic, and -xcode=pic32 is equivalent to -PIC. See the f77(1) and f95(1) man pages, or the Fortran User's Guide, for details.

Binding Options

You can specify dynamic or static library binding when you compile. These options are
37. computepts.o: computepts.f commonblock
startupcore.o: startupcore.f

make uses default rules to compile computepts.f and startupcore.f. Similarly, suffix rules for .f90 files will also invoke the f95 compiler. However, there are no suffix rules currently defined for .f95 (Fortran 95 source files) or .mod (Fortran 95 module files).

Version Tracking and Control With SCCS

SCCS stands for Source Code Control System. SCCS provides a way to:
- Keep track of the evolution of a source file (its change history)
- Prevent a source file from being simultaneously changed by other developers
- Keep track of the version number by providing version stamps

The basic three operations of SCCS are:
- Putting files under SCCS control
- Checking out a file for editing
- Checking in a file

This section shows you how to use SCCS to perform these tasks, using the previous program as an example. Only basic SCCS is described and only three SCCS commands are introduced: create, edit, and delget.

Controlling Files With SCCS

Putting files under SCCS control involves:
- Making the SCCS directory
- Inserting SCCS ID keywords into the files (this is optional)
- Creating the SCCS files

Making the SCCS Directory

To begin, you must create the SCCS subdirectory in the directory in which your program is being developed. Use this command:

    demo% mkdir SCCS

SCCS must be in uppercase.

Inserting SCCS
38. cross checking with summaries line numbers common block maps and equivalence block maps This is the strictest level of checking with maximum error detection 77 Xlistwlnnn Set width of output line to n columns Use Xlistw to set the width of the output line For example lt 11 5601 32 sets the page width to 132 columns The default is 79 Xlistwar nnn Suppress warning nnn in the report Use Xlistwar to suppress a specific warning message from the output reports If nnn is not specified then all warning messages are suppressed from printing For example Xlistwar338 suppresses warning message number 338 To suppress more than one but not all warnings use this option repeatedly Fortran Programming Guide May 2000 74 Show cross reference table and cross routine 56 11 errors XlistX produces a cross reference table and cross routine error list but no source listing Some Examples Using Suboptions Example Use Xlistwarnnn to suppress two warnings from a preceding example demo 77 Xlistwar338 Xlistwar348 XlistE silent Repeat f demo cat Repeat lst FILE Repeat f program repeat 4 CALL nwfrk pni A ERR 418 argument pnl is real but dummy argument is integer 4 See Repeat f line 14 4 CALL nwfrk pni A ERR 317 variable pnl referenced as integer 4 across repeat nwfrk prnok in line 21 but set as real by repeat in line 2 subroutine nwfrk
39. distributed to all available threads. At least m iterations must be assigned to each thread. There can be one final smaller residual chunk. If m is not provided, the compiler selects a value. Example: With 1000 iterations and GSS(10) and 4 threads, distribute 250 iterations to the first thread, then 187 to the second thread, then 140 to the third thread, and so on.

Scheduling Type: STATIC, SELF(chunksize), FACTORING(m), GSS(m)

Multiple Qualifiers

Qualifiers can appear multiple times with cumulative effect. In the case of conflicting qualifiers, the compiler issues a warning message, and the qualifier appearing last prevails.

Example: A three-line Sun-style directive (note conflicting MAXCPUS, SHARED, and PRIVATE qualifiers):

    C$PAR DOALL MAXCPUS(4), READONLY(S), PRIVATE(A,B,X), MAXCPUS(2)
    C$PAR DOALL SHARED(B,X,Y), PRIVATE(Y,Z)
    C$PAR DOALL READONLY(T)

Example: A one-line equivalent of the preceding three lines:

    C$PAR DOALL MAXCPUS(2), PRIVATE(A,Y,Z), SHARED(B,X), READONLY(S,T)

DOSERIAL Directive

The DOSERIAL directive disables parallelization of the specified loop. This directive applies to the one loop immediately following it.

Example: Exclude one loop from parallelization:

          do i = 1, n
    C$PAR DOSERIAL
             do j = 1, n
                do k = 1, n
                   ...
                end do
             end do
          end do

In the exampl
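The guided self-scheduling figures quoted above (1000 iterations, a minimum chunk of 10, 4 threads, giving 250, 187, 140, and so on) can be reproduced with a few lines of arithmetic. The following is only a sketch, written in Python as a calculator. It assumes that each chunk is the remaining iteration count divided by the number of threads, rounded down, and never smaller than m except for the final residual chunk. That assumption matches the three numbers in the text, but the compiler's actual algorithm may differ in detail. The name gss_chunks is hypothetical.

```python
def gss_chunks(iterations, threads, min_chunk):
    """Return the successive chunk sizes under the assumed GSS rule."""
    chunks = []
    remaining = iterations
    while remaining > 0:
        # Assumed rule: a 1/threads share of what is left, at least min_chunk.
        size = max(min_chunk, remaining // threads)
        # The last chunk may be a smaller residual.
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    print(gss_chunks(1000, 4, 10))
```

Under this model the chunk sizes shrink as the loop progresses, so early chunks keep scheduling overhead low while late, small chunks even out the load across threads.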
40. environment variable will have its search paths altered. You might see unexpected results or a degradation in performance.

Library Search Path and Order (Static Linking)

Use the -llibrary compiler option to name additional libraries for the linker to search when resolving external references. For example, the option -lmylib adds the library libmylib.so or libmylib.a to the search list. The linker looks in the standard directory paths to find the additional libmylib library. The -L option (and the LD_LIBRARY_PATH environment variable) creates a list of paths that tell the linker where to look for libraries outside the standard paths. Were libmylib.a in directory /home/proj/libs, then the option -L/home/proj/libs would tell the linker where to look when building the executable:

    demo% f77 -o pgram part1.o part2.o -L/home/proj/libs -lmylib

Command-Line Order for -llibrary Options

For any particular unresolved reference, libraries are searched only once, and only for symbols that are undefined at that point in the search. If you list more than one library on the command line, then the libraries are searched in the order in which they are found on the command line. Place -llibrary options as follows:
- Place the -llibrary option after any .f, .for, .F, .f95, or .o files.
- If you call functions in libx, and they reference functions in liby, then place -lx before -ly.

Command-Line Order for
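The ordering rule above (libraries are searched once, left to right, and only for symbols still undefined at that point) is easier to see with a toy model. The sketch below is written in Python purely for illustration and is not how the real linker is implemented. The function name, the library names x and y, and the symbols fx and fy are all hypothetical.

```python
def unresolved_after_link(needed, command_line_order, libraries):
    """Simulate one left-to-right pass over static libraries.

    needed: symbols left undefined by the object files.
    libraries: {library: {symbol it defines: [symbols that routine references]}}.
    Returns the set of symbols still undefined at the end of the pass.
    """
    undefined = set(needed)
    defined = set()
    for name in command_line_order:
        members = libraries[name]
        progress = True
        while progress:  # keep extracting from this library until nothing new resolves
            progress = False
            for symbol in sorted(undefined):
                if symbol in members:
                    undefined.discard(symbol)
                    defined.add(symbol)
                    # A routine pulled in brings its own references with it.
                    undefined.update(r for r in members[symbol] if r not in defined)
                    progress = True
    return undefined


if __name__ == "__main__":
    libs = {"x": {"fx": ["fy"]}, "y": {"fy": []}}
    print(unresolved_after_link({"fx"}, ["x", "y"], libs))  # everything resolves
    print(unresolved_after_link({"fx"}, ["y", "x"], libs))  # fy is left undefined
```

With y listed first, nothing in it is needed yet, so it is passed over. By the time fx is extracted from x and introduces a reference to fy, the search has already moved past y.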
41. had been specified by -R at the time the executable file was built. As noted earlier, use of LD_LIBRARY_PATH can have unexpected side effects and is not recommended.

Fixing Errors During Dynamic Linking

When the dynamic linker cannot locate a needed library, it issues this error message:

    ld.so: prog: fatal: libmylib.so: can't open file

The message indicates that the libraries are not where they are supposed to be. Perhaps you specified paths to shared libraries when the executable was built, but the libraries have subsequently been moved. For example, you might have built a.out with your own dynamic libraries in /my/libs/ and then later moved the libraries to another directory.

Use ldd to determine where the executable expects to find the libraries:

    demo% ldd a.out
    libsolib.so => /export/home/proj/libsolib.so
    libF77.so.4 => /opt/SUNWspro/lib/libF77.so.4
    libc.so.1 => /usr/lib/libc.so.1
    libdl.so.1 => /usr/lib/libdl.so.1

If possible, move or copy the libraries into the proper directory, or make a soft link to the directory (using ln -s) in the directory that the linker is searching. Or, it could be that LD_LIBRARY_PATH is not set correctly. Check that LD_LIBRARY_PATH includes the path to the needed libraries at runtime.

Creating Static Libraries

Static library files are built from precompiled object files (.o files) using the ar(1) utility. The linker extracts
42. if only 60% of a program's execution runs in parallel, the maximum increase in speed is 2.5, independent of the number of processors. And with just four processors, the theoretical speedup for this program (assuming maximum efficiency) would be just 1.8 and not 4. With overhead, the actual speedup would be less.

As with any optimization, choice of loops is critical. Parallelizing loops that participate only minimally in the total program execution time has only minimal effect. To be effective, the loops that consume the major part of the runtime must be parallelized. The first step, therefore, is to determine which loops are significant and to start from there.

Problem size also plays an important role in determining the fraction of the program running in parallel and, consequently, the speedup. Increasing the problem size increases the amount of work done in loops. A triply nested loop could see a cubic increase in work. If the outer loop in the nest is parallelized, a small increase in problem size could contribute to a significant performance improvement (compared to the unparallelized performance).

Steps to Parallelizing a Program

Here is a very general outline of the steps needed to parallelize an application:
1. Optimize. Use the appropriate set of compiler options to get the best serial performance on a single processor.
2. Profile. Using typical test data, determine the performance profi
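The speedup figures quoted for a program that is 60% parallel (a ceiling of 2.5, and about 1.8 on four processors) follow from Amdahl's law, where the ideal speedup on n processors is 1 / ((1 - p) + p / n) for a parallel fraction p. As a rough check, here is a sketch in Python used only as a calculator. The function names are illustrative and not part of the guide, and the model ignores all parallelization overhead.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction):
    """Upper bound on speedup as the processor count grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    p = 0.60
    print(f"limit: {amdahl_limit(p):.2f}")
    for n in (2, 4, 8, 16):
        print(f"{n:2d} processors: {amdahl_speedup(p, n):.2f}")
```

Because the serial 40% never shrinks, adding processors beyond a handful buys very little for this program. Raising the parallel fraction, for example by parallelizing the loops that dominate the runtime, matters more than adding hardware.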
43. information about the new features in the current and previous release of Sun WorkShop Contains installation details and other information that was not available until immediately before the final release of Sun WorkShop 6 This document complements the information that is available in the component readme files Explains how to use the new Sampling Collector and Sampling Analyzer with examples and a discussion of advanced profiling topics and includes information about the command line analysis tool er_print the LoopTool and LoopReport utilities and UNIX profiling tools prof gprof and tcov Provides information on using dbx commands to debug a program with references to how the same debugging operations can be performed using the Sun WorkShop Debugging window Acquaints you with the basic program development features of the Sun WorkShop integrated programming environment Document Title About Sun WorkShop 6 Documentation What s New in Sun WorkShop 6 Sun WorkShop 6 Release Notes Analyzing Program Performance With Sun WorkShop 6 Debugging a Program With dbx Introduction to Sun WorkShop TABLE P 3 Document Collection Forte Developer 6 Sun WorkShop 6 Release Documents Forte Developer 6 Sun WorkShop 6 Fortran Programming Guide May 2000 TABLE P 3 Related Sun WorkShop 6 Documentation by Document Collection Continued Description Describes the C compiler opti
44. is no way to tell where one record begins or ends Thus it is impossible to BACKSPACE 8 FORM BINARY file because there is no way of telling where to backspace to A READ ona BINARY file will read as much data as needed to fill the variables on the input list WRITE statement Data is written to the file in binary with as many bytes transferred as specified by the output list Fortran Programming Guide May 2000 28 READ statement Data is read into the variables on the input list transferring as many bytes as required by the list Because there are no record marks on the file there will be no end of record error detection The only errors detected are end of file or abnormal system errors INQUIRE statement INQUIRE on a file opened with FORM BINARY returns FORM BINARY ACCESS SEQUENTIAL DIRECT NO FORMATTED NO UNFORMATTED YES RECL AND NEXTREC are undefined BACKSPACE statement Not allowed returns an error DFILE statement Truncates file at current position as usual EWIND statement Repositions file to beginning of data as usual Internal Files An internal file is an object of type CHARACTER such as a variable substring array element of an array or field of a structured record Internal file READ can be from a constant character string I O on internal files simulates f
45. local/lib -ltestlib

Replacement in a Static Library

It is not necessary to recompile an entire library if only a few elements need recompiling. The -r option of ar permits replacement of individual elements in a static library.

Example: Recompile and replace a single routine in a static library:

    f77 -c point.f
    ar -r testlib.a point.o

Ordering Routines in a Static Library

To order the elements in a static library when it is being built by ar, use the commands lorder(1) and tsort(1):

    demo% ar -cr mylib.a `lorder exg.o fofx.o diffz.o | tsort`

Creating Dynamic Libraries

Dynamic library files are built by the linker ld from precompiled object modules that can be bound into the executable file after execution begins. Another feature of a dynamic library is that modules can be used by other executing programs in the system without duplicating modules in each program's memory. For this reason, a dynamic library is also a shared library.

A dynamic library offers the following features:
- The object modules are not bound into the executable file by the linker during the compile-link sequence; such binding is deferred until runtime.
- A shared library module is bound into system memory when the first running program references it. If any subsequent running program references it, that reference is mapped to this first copy.
- Maintaining programs is easier with dynamic libraries. Installing an upd
46. dbx includes a data-collection feature that has the same functionality as the Collector. The command-line utility er_print(1), which prints out an ASCII version of the various Analyzer displays, operates as a command-line sampling analyzer. Details can be found in the Sun WorkShop manual Analyzing Program Performance With Sun WorkShop.

The time Command

The simplest way to gather basic data about program performance and resource utilization is to use the time(1) command or, in csh, the set time command. Running the program with the time command prints a line of timing information on program termination.

    demo% time myprog
    The Answer is 543.01
    6.5u 17.1s 1:16 31% 11+21k 354+210io 135pf+0w
    demo%

The interpretation is: user system wallclock resources memory I/O paging
- user: 6.5 seconds in user code, approximately
- system: 17.1 seconds in system code for this task, approximately
- wallclock: 1 minute 16 seconds to complete
- resources: 31% of system resources dedicated to this program
- memory: 11 Kilobytes of shared program memory, 21 kilobytes of private data memory
- I/O: 354 reads, 210 writes
- paging: 135 page faults, 0 swapouts

Multiprocessor Interpretation of time Output

Timing results are interpreted in a different way when the program is run in parallel in a multiprocessor environment. Since /bin/time accumulates the user time on different threads, only wall clock time is used
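The fields of the sample time output are related: the 31% figure is consistent with CPU time (user plus system) divided by elapsed wall-clock time, which is how the csh built-in reports its percentage. The following sketch, in Python as a calculator, checks that arithmetic for the numbers shown above. The helper names are hypothetical.

```python
def elapsed_seconds(field):
    """Convert an elapsed-time field such as '1:16' to seconds."""
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def cpu_percent(user_s, system_s, elapsed_field):
    """CPU time as a percentage of wall-clock time."""
    return 100.0 * (user_s + system_s) / elapsed_seconds(elapsed_field)


if __name__ == "__main__":
    # 6.5 s user + 17.1 s system over 1 min 16 s of wall-clock time
    print(round(cpu_percent(6.5, 17.1, "1:16")))  # 31
```

For a multithreaded run the same ratio can exceed 100%, because user time is summed over all threads while wall-clock time is not.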
47. might not be aware that I/O could take place within a parallelized loop. Consider a user-supplied exception handler that prints output when it catches an arithmetic exception (like divide by zero). If a parallelized loop provokes an exception, the implicit I/O from the handler may cause I/O deadlocks and a system hang.

In general:
- The library libF77_mt is MT safe, but mostly not MT hot.
- You cannot do recursive (nested) I/O if you compile with -mt.

As an informal definition, an interface is MT safe if:
- It can be simultaneously invoked by more than one thread of control.
- The caller is not required to do any explicit synchronization before calling the function.
- The interface is free of data races.

A data race occurs when the content of an address in memory is being updated by more than one thread, and that address is not protected by a lock. The value of that memory address is therefore nondeterministic: the two threads race to update the address (but in this case, the one who gets there last wins).

An interface is generally called MT hot if the implementation has been tuned for performance advantage, using the techniques of multithreading. See the Solaris Multithreaded Programming Guide for details.

Sun-Style Parallelization Directives

Sun-style directives are enabled by default (or with the -mp=sun option) when compiling with the -explicitpar or -parallel options. Sun Par
48. of Sun Microsystems Inc in the U S and other countries All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International Inc in the U S and other countries Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems Inc The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems Inc for its users and licensees Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry Sun holds a non exclusive license from Xerox to the Xerox Graphical User Interface which license also covers Sun s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun s written license agreements Sun 90 95 is derived from Cray CF90 a product of Silicon Graphics Inc Federal Acquisitions Commercial Software Government Users Subject to Standard License Terms and Conditions DOCUMENTATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS REPRESENTATIONS AND WARRANTIES INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY FITNESS FOR A PARTICULAR PURPOSE OR NON INFRINGEMENT ARE DISCLAIMED EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID Copyright 2000 Sun Microsystems Inc 901 San Antonio Road Palo Alto CA 94303 4900 Etats Unis Tous droits r serv s Ce produit ou document est distribu avec des l
49. or the man page ieee_handler(3m) for instructions on how to trap the various exceptions. On most machines, these exceptions simply abort the run.

Two numbers can differ by 6 x 10^29 and still have the same floating-point form. Here is an example of different numbers with the same representation:

      real*4 x, y
      x = 99999990e+29
      y = 99999996e+29
      write (*, 10) x, x
 10   format('99,999,990 x 10^29 = ', e14.8, ' = ', z8)
      write (*, 20) y, y
 20   format('99,999,996 x 10^29 = ', e14.8, ' = ', z8)
      end

The output is:

99,999,990 x 10^29 = 0.99999993E+37 = 7CF0BDC1
99,999,996 x 10^29 = 0.99999993E+37 = 7CF0BDC1

In this example, the difference is 6 x 10^29. The reason for this indistinguishable, wide gap is that in IEEE single-precision arithmetic you are guaranteed only six decimal digits for any one decimal-to-binary conversion. You may be able to convert seven or eight digits correctly, but it depends on the number.

Program Fails Without Warning

If the program fails without warning and runs different lengths of time between failures, then:
- Compile with minimal optimization (-O1). If the program then works, compile only selective routines with higher optimization levels.
- Understand that optimizers must make assumptions about the program. Nonstandard coding or constructs can cause problems. Almost no optimizer handles all programs at all levels of optimiza
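The single-precision example in this excerpt can also be checked by comparing raw bit patterns directly. The following is a minimal sketch, not taken from the guide: it assumes a compiler whose default REAL is 32-bit IEEE single precision, and the names real4_bits and bits_of are hypothetical, chosen only for illustration.

```fortran
module real4_bits
  implicit none
  ! Assumption: default REAL and this integer kind are both 32 bits wide.
  integer, parameter :: i4 = selected_int_kind(9)
contains
  ! Return the raw bit pattern of a default REAL as a 32-bit integer.
  function bits_of(r) result(b)
    real, intent(in) :: r
    integer(i4) :: b
    b = transfer(r, 0_i4)
  end function bits_of
end module real4_bits

program same_representation
  use real4_bits
  implicit none
  real :: x, y
  x = 99999990e+29
  y = 99999996e+29
  ! Print both bit patterns in hexadecimal.
  write (*, '(a, z8.8)') 'bits of x: ', bits_of(x)
  write (*, '(a, z8.8)') 'bits of y: ', bits_of(y)
  ! spacing() reports the gap to the next representable value near x.
  write (*, '(a, es12.5)') 'spacing near x: ', spacing(x)
  if (x == y) write (*, '(a)') 'x and y compare equal'
end program same_representation
```

The spacing value printed for x is the distance between neighbouring single-precision numbers at that magnitude. Because it is larger than the 6 x 10^29 difference between the two decimal constants, both constants can round to the same stored value.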
50. program might have questionable code which needs careful analysis to locate and reprogram.

If performance does not improve significantly, or degrades, as a result of adding optimization options, the coding might not provide the compiler with opportunities for further performance improvements. The next step would then be to analyze and restructure the program at the source code level to achieve better performance.

Performance Option Reference

The compiler options listed in the following table provide the user with a repertoire of strategies to improve the performance of a program over default compilation. Only some of the compilers' more potent performance options appear in the table. A more complete list can be found in the Fortran User's Guide.

TABLE 9-1 Some Effective Performance Options

Action                                                          Option
Uses a combination of optimization options together            -fast
Sets compiler optimization level to n                           -On (-O = -O3)
Specifies general target hardware                               -xtarget=sys
Specifies a particular Instruction Set Architecture             -xarch=isa
Optimizes using performance profile data (with -O5)             -xprofile=use
Unrolls loops by n                                              -unroll=n
Permits simplifications and optimization of floating-point      -fsimple=1|2
Performs dependency analysis to optimize loops                  -depend

Some of these options increase compilation time because they invoke a deeper analysis of the program. Some options work best when routines are collected into files along with the routines that call t
51. specified, separated by a colon. Typically, the LD_LIBRARY_PATH variable contains two lists of colon-separated directories separated by a semicolon:

dirlist1;dirlist2

The directories in dirlist1 are searched first, followed by any explicit -Ldir directories specified on the command line, followed by dirlist2 and the standard directories.

That is, if the compiler is called with any number of occurrences of -L, as in:

f77 ... -Lpath1 ... -Lpathn ...

then the search order is:

dirlist1 path1 ... pathn dirlist2 standard_paths

When the LD_LIBRARY_PATH variable contains only one colon-separated list of directories, it is interpreted as dirlist2.

In the Solaris operating environment, a similar environment variable, LD_LIBRARY_PATH_64, can be used to override LD_LIBRARY_PATH when searching for 64-bit dependencies. See the Solaris Linker and Libraries Guide and the ld(1) man page for details.
- On a 32-bit SPARC processor, LD_LIBRARY_PATH_64 is ignored.
- If only LD_LIBRARY_PATH is defined, it is used for both 32-bit and 64-bit linking.
- If both LD_LIBRARY_PATH and LD_LIBRARY_PATH_64 are defined, 32-bit linking will be done using LD_LIBRARY_PATH and 64-bit linking with LD_LIBRARY_PATH_64.

Note: Use of the LD_LIBRARY_PATH environment variable with production software is strongly discouraged. Although useful as a temporary mechanism for influencing the runtime linker's search path, any dynamic executable that can reference this
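The splitting rule for LD_LIBRARY_PATH described in this excerpt can be sketched in a few lines. This is only an illustration of the rule as stated (text before the semicolon is dirlist1, the rest is dirlist2, and a value without a semicolon is all dirlist2); it is not the linker's implementation, and the module name, function names, and sample directories are hypothetical.

```fortran
module ld_path_lists
  implicit none
contains
  ! Part of the value before the semicolon: searched before any -L directories.
  function dirlist1_of(value) result(list)
    character(len=*), intent(in) :: value
    character(len=len(value)) :: list
    integer :: semi
    semi = index(value, ';')
    list = ''
    if (semi > 1) list = value(:semi-1)
  end function dirlist1_of

  ! Part of the value after the semicolon: searched after the -L directories.
  ! With no semicolon (semi == 0) the whole value is treated as dirlist2.
  function dirlist2_of(value) result(list)
    character(len=*), intent(in) :: value
    character(len=len(value)) :: list
    integer :: semi
    semi = index(value, ';')
    list = value(semi+1:)
  end function dirlist2_of
end module ld_path_lists

program show_search_order
  use ld_path_lists
  implicit none
  ! Hypothetical sample value, for illustration only.
  character(len=*), parameter :: sample = '/opt/a/lib:/opt/b/lib;/usr/local/lib'
  print '(a)', '1. dirlist1: ' // trim(dirlist1_of(sample))
  print '(a)', '2. -Ldir directories from the command line'
  print '(a)', '3. dirlist2: ' // trim(dirlist2_of(sample))
  print '(a)', '4. standard paths'
end program show_search_order
```

Keeping the two halves as separate functions mirrors the way the text treats them: they occupy different positions in the search order, with the -L directories in between.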
52. the compiler command-line options and their use can be found in the companion book, the Fortran User's Guide.

This guide is intended for scientists, engineers, and programmers who have a working knowledge of the Fortran language and wish to learn how to use the Sun Fortran compilers effectively. Familiarity with the Solaris operating environment or UNIX in general is also assumed.

Multiplatform Release

This Sun WorkShop Fortran release supports versions 2.6, 7, and 8 of the Solaris SPARC Platform Edition Operating Environment. See the README files fortran_77 and fortran_95 in the Sun WorkShop READMEs directory for information regarding availability of this release of the f77 and f95 compilers on specific platforms. See page 16.

Access to Sun WorkShop Development Tools

Because Sun WorkShop product components and man pages do not install into the standard /usr/bin/ and /usr/share/man directories, you must change your PATH and MANPATH environment variables to enable access to Sun WorkShop compilers and tools.

To determine if you need to set your PATH environment variable:
- Display the current value of the PATH variable by typing: echo $PATH
- Review the output for a string of paths containing /opt/SUNWspro/bin/. If you find the paths, your PATH variable is already set to access Sun WorkShop development tools. If you do not find the paths, set your PATH environment variable by following the ins
53. the form tmp.FAAAxnnnnn, where nnnnn is replaced by the current process ID, AAA is a string of three characters, and x is a letter; the AAA and x make the file name unique. This file is deleted upon termination of the program or execution of a CLOSE statement, unless (with f77) STATUS='KEEP' is specified in the CLOSE statement.

Already Open

If the file has already been opened by the program, you can use a subsequent OPEN statement to change some of the file's characteristics; for example, BLANK and FORM. In this case, you would specify only the file's logical unit number and the parameters to change.

Preconnected Units

Three unit numbers are automatically associated with specific standard I/O files at the start of program execution. These preconnected units are standard input, standard output, and standard error:
- Standard input is logical unit 5 (also Fortran 95 unit 100)
- Standard output is logical unit 6 (also Fortran 95 unit 101)
- Standard error is logical unit 0 (also Fortran 95 unit 102)

Typically, standard input receives input from the workstation keyboard; standard output and standard error display output on the workstation screen.

In all other cases where a logical unit number but no FILE= name is specified on an OPEN statement, a file is opened with a name of the form fort.n, where n is the logical unit number.

Opening Files Without an OPEN Statement
54. these basic -Xlist suboptions alone.

TABLE 5-1 Xlist Suboptions

Generated Report                              Option
Errors, listing, cross-reference              -Xlist
Errors only                                   -XlistE
Errors and source listing only                -XlistL
Errors and cross-reference table only         -XlistX
Errors and call graph only                    -Xlistc

The following table summarizes all -Xlist suboptions.

TABLE 5-2 Summary of -Xlist Suboptions

Option                    Action
-Xlist (no suboption)     Shows errors, listing, and cross-reference table
-Xlistc                   Shows call graphs and errors (f77 only)
-XlistE                   Shows errors
-Xlisterr[nnn]            Suppresses error nnn in the verification report
-Xlistf                   Produces fast output
-Xlistflndir              Puts the .fln files in dir (f77 only)
-Xlisth                   Shows errors from cross-checking, stop compilation (f77 only)
-XlistI                   Lists and cross-checks include files
-XlistL                   Shows the listing and errors
-Xlistln                  Sets page breaks
-Xlisto name              Renames the -Xlist output report file
-Xlists                   Suppresses unreferenced symbols from cross-reference (f77 only)
-Xlistvn                  Sets checking "strictness" level (f77 only)
-Xlistw[nnn]              Sets the width of output lines (f77 only)
-Xlistwar[nnn]            Suppresses warning nnn in the report
-XlistX                   Shows just the cross-reference table and errors

Xlist Suboption Reference

This section describes the -Xlist suboptions. As noted, some are only a
55. to find the location of the exception.

Locating an Exception by Handler

Example: Locate an exception (print address) and abort:

demo% cat LocExcHan.F
#include "f77_floatingpoint.h"
      EXTERNAL Exhandler
      INTEGER Exhandler, i, ieee_handler
      REAL r / 14.2 /, s / 0.0 /, t
C Detect division by zero
      i = ieee_handler( 'set', 'division', Exhandler )
      t = r/s
      END

      INTEGER FUNCTION Exhandler( sig, sip, uap )
      INTEGER sig
      STRUCTURE /fault/
         INTEGER address
      END STRUCTURE
      STRUCTURE /siginfo/
         INTEGER si_signo
         INTEGER si_code
         INTEGER si_errno
         RECORD /fault/ fault
      END STRUCTURE
      RECORD /siginfo/ sip
      WRITE (*,10) sip.si_signo, sip.si_code, sip.fault.address
 10   FORMAT('Signal ',i4,' code ',i4,' at hex address ', Z8 )
      Exhandler = 1
      CALL abort()
      END
demo% f77 -silent -g LocExcHan.F
demo% a.out
Signal    8 code    3 at hex address    11230
abort: called
Abort (core dumped)
demo%

In SPARC V9 environments, replace the INTEGER declarations within each STRUCTURE with INTEGER*8, and the i4 formats with i8.

In most cases, knowing the actual address of the exception is of little use, except with dbx:

demo% dbx a.out
(dbx) stopi at 0x11230     Set breakpoint at
56. using the libthread primitives, do not use any of the compilers' parallelization options; the compilers cannot parallelize code that has already been parallelized with user calls to the threads library.

Parallelizable Loops

A loop is appropriate for explicit parallelization if:
- It is a DO loop, but not a DO WHILE or Fortran 95 array syntax.
- The values of array variables for each iteration of the loop do not depend on the values of array variables for any other iteration of the loop.
- If the loop changes a scalar variable, that variable's value is not used after the loop terminates. Such scalar variables are not guaranteed to have a defined value after the loop terminates, since the compiler does not automatically ensure a proper storeback for them.
- For each iteration, any subprogram that is invoked inside the loop does not reference or change values of array variables for any other iteration.
- The DO loop index must be an integer.

Scoping Rules: Private and Shared

A private variable or array is private to a single iteration of a loop. The value assigned to a private variable or array in one iteration is not propagated to any other iteration of the loop.

A shared variable or array is shared with all other iterations. The value assigned to a shared variable or array in an iteration is seen by other iterations of the loop.

If an explicitly parallelized loop contains shared references, then you must ensure that
57. want to compile these codes with the following options when porting to an UltraSPARC-II platform, for example:

-fast -xarch=v9a -xchip=ultra2 -xtypemap=real:64,double:64,integer:64

These options automatically promote all default REAL variables and constants to REAL*8, and COMPLEX to COMPLEX*16. Only undeclared variables or variables declared as simply REAL or COMPLEX are promoted; variables declared explicitly (for example, REAL*4) are not promoted. All single-precision REAL constants are also promoted to REAL*8. (Set -xarch and -xchip appropriately for the target platform.) To also promote default DOUBLE PRECISION data to REAL*16, change the double:64 to double:128 in the -xtypemap example.

The -xtypemap option is preferred over -dbl and -r8 and -i2. See the Fortran User's Guide and the f77(1) or f95(1) man pages for details.

To further recreate the original mainframe environment, it is probably preferable to stop on overflows, division by zero, and invalid operations. Compile the main program with -ftrap=common to ensure this.

Data Representation

The FORTRAN 77 Language Reference Manual, Fortran User's Guide, and the Sun Numerical Computation Guide discuss in detail the hardware representation of data objects in Fortran. Differences between data representations across systems and hardware platforms usually generate the mo
58. write/append to myoutput
Redirect standard error to a file
Pipe standard output to input of another program
Pipe standard error and output to another program

See the csh, ksh, and sh man pages for details on redirection and piping on the command line.

f77: VAX/VMS Logical File Names

If you are porting from VMS FORTRAN to FORTRAN 77, the VMS-style logical file names in the INCLUDE statement are mapped to UNIX path names. The environment variable LOGICALNAMEMAPPING defines the mapping between the logical names and the UNIX path name. If the environment variable LOGICALNAMEMAPPING is set and the -vax, -xl, or -xld compiler options are used, the compiler interprets VMS logical file names on the INCLUDE statement.

The compiler sets the environment variable to a string with the following syntax:

"lname1=path1; lname2=path2; ..."

Each lname is a logical name, and each path is the path name of a directory (without a trailing /). All blanks are ignored when parsing this string. Any trailing /list or /nolist is stripped from the file name in the INCLUDE statement. Logical names in a file name are delimited by the first colon in the VMS file name. The compiler converts file names of the form:

lname1:file

to:

path1/file

Uppercase and lowercase are significant in logical names. If a logical name is encountered on the INCLUDE statement that was not specified b
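The conversion of a VMS-style name into a UNIX path, as described in this excerpt, amounts to splitting at the first colon and substituting the directory. The sketch below only illustrates that rule for a single known logical name; it is not the compiler's code, and the names vms_names and map_include are hypothetical. Leaving an unknown logical name unchanged is an assumption made here for the sketch, not a statement about what the compiler does.

```fortran
module vms_names
  implicit none
contains
  ! Convert "lname:file" to "path/file" when the logical name matches.
  ! The comparison is case sensitive, as the text says logical names are.
  function map_include(name, lname, path) result(unix_name)
    character(len=*), intent(in) :: name, lname, path
    character(len=len(name)+len(path)+1) :: unix_name
    integer :: colon
    colon = index(name, ':')          ! first colon delimits the logical name
    unix_name = name                  ! assumption: unknown names stay as-is
    if (colon > 1) then
      if (name(:colon-1) == lname) then
        unix_name = trim(path) // '/' // name(colon+1:)
      end if
    end if
  end function map_include
end module vms_names

program demo_vms_names
  use vms_names
  implicit none
  ! Hypothetical logical name and directory, for illustration only.
  print '(a)', trim(map_include('lib1:defs.h', 'lib1', '/usr/local/include'))
  print '(a)', trim(map_include('LIB1:defs.h', 'lib1', '/usr/local/include'))
end program demo_vms_names
```

The second call differs from the first only in the case of the logical name, so it does not match and the name is returned as given.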
59. The source files startupcore.f, computepts.f, and pattern.f can be identified by initialized data of the form:

      CHARACTER*50 SCCSID
      DATA SCCSID/"%Z%%M% %I% %E%\n"/

When this file is processed by SCCS, compiled, and the object file processed by the SCCS what command, the following is displayed:

demo% f77 -c pattern.f
demo% what pattern
pattern:
        pattern.f 1.2 96/06/10

You can also create a PARAMETER named CTIME that is automatically updated whenever the file is accessed with get.

      CHARACTER*(*) CTIME
      PARAMETER ( CTIME="%E%" )

INCLUDE files can be annotated with a Fortran comment containing the SCCS stamp.

Note: Use of single-letter derived type component names in Fortran 95 source code files can conflict with SCCS keyword recognition. For example, the Fortran 95 structure component reference X%Y%Z, when passed through SCCS, will become XZ after an SCCS get. Care should be taken not to define structure components with single letters when using SCCS on Fortran 95 programs. For example, had the structure reference in the Fortran 95 program been to X%YY%Z, the %YY% would not have been interpreted by SCCS as a keyword reference. Alternatively, the SCCS get -k option will retrieve the file without expanding SCCS keyword IDs.

Creating SCCS Files

Now you can put these files under control of SCCS with the SCCS create command: demo
60. [gprof call graph listing for MAIN_, diffr_, deriv_, cheb1_, dissip_, fftb_, cos, sin, init_, output_, and input_]

The flat profile overview:

granularity: each sample hit covers 2 byte(s) for 0.08% of 12.84 seconds

   %   cumulative    self               self     total
 time    seconds    seconds    calls   ms/call  ms/call   name
 45.2      5.81      5.81      6000     0.97     0.97    fftb_ [6]
 20.1      8.39      2.58      6000     0.43     1.40    cheb1_ [5]
  9.1      9.56      1.17      3000     0.39     0.39    dissip_ [8]
  8.6     10.67      1.11      3000     0.37     1.88    deriv_ [7]
  7.1     11.58      0.92      1000     0.92    11.91    diffr_ [4]
  4.8     12.20      0.62      2001     0.31     0.31    code_ [9]
  2.5     12.53      0.33     69000     0.00     0.00    __exp [10]
  0.9     12.64      0.11      1000     0.11     0.11    shock_ [11]

Function Line

The function line [5] in the preceding example reveals that:
- cheb1 was called 6000 times: 3000 from deriv, 3000 from diffr.
- 2.58 seconds were spent in cheb1 itself.
- 5.81 seconds were spent in routines called by cheb1.
- 65.7% of the execution time of the program was within cheb1.

Parent Lines

The parent lines above [5] indicate that
61. qref_(int[][10], int *);
    ...
    int m[20][10];
    int sum = 0;
    ...
    qref_(m, &sum);

      SUBROUTINE QREF(A, TOTAL)
      INTEGER A(10,20), TOTAL
      DO I = 1, 10
        DO J = 1, 20
          TOTAL = TOTAL + A(I,J)
        END DO
      END DO

Fortran calls C

      REAL Q(10,20)
      Q(3,5) = 1.0
      CALL FIXQ(Q)

    void fixq_(float a[20][10])
    { ... }

Structures

C and FORTRAN 77 structures and Fortran 95 derived types can be passed to each other's routines as long as the corresponding elements are compatible.

TABLE 11-9 Passing FORTRAN 77 STRUCTURE Records

C calls Fortran

    struct point {
        float x, y, z;
    };
    void fflip_(struct point *);
    ...
    struct point d;
    struct point *ptx = &d;
    ...
    fflip_(ptx);

      SUBROUTINE FFLIP(P)
      STRUCTURE /POINT/
         REAL X, Y, Z
      END STRUCTURE
      RECORD /POINT/ P
      REAL T
      T = P.X
      P.X = P.Y
      P.Y = T
      P.Z = -2.*P.Z

Fortran calls C

      STRUCTURE /POINT/
         REAL X, Y, Z
      END STRUCTURE
      RECORD /POINT/ BASE
      EXTERNAL FLIP
      ...
      CALL FLIP(BASE)

    struct point {
        float x, y, z;
    };
    void flip_(struct point *v)
    {
        float t;
        t = v->x;
        v->x = v->y;
        v->y = t;
        v->z = -2.*(v->z);
    }

TABLE 11-10 Passing Fortran 95 Derived Types

Fortran 95 calls C

      TYPE point
        SEQUENCE
        REAL :: x, y, z
      END TYPE point
      TYPE (point) base

C calls Fortran 95

    struct point {
        float x, y, z;
    };
    extern void fflip_(struct point *);
    ...
    struct point
62. 40
C Get first arg as output file name for unit 51
      CALL getarg( 1, outfile )
      OPEN( 51, FILE=outfile )
      WRITE( 51, *) 'Writing to file: ', outfile
      END
demo% f77 -silent -o tstarg testarg.f
demo% tstarg AnyFileName
demo% cat AnyFileName
 Writing to file: AnyFileName
demo%

Via Environment Variables and GETENV

Similarly, the library routine getenv(3F) can be used to read the value of any environment variable at runtime into a character variable that in turn is interpreted as a file name.

demo% cat testenv.f
      CHARACTER outfile*40
C Get $OUTFILE as output file name for unit 51
      CALL getenv( 'OUTFILE', outfile )
      OPEN( 51, FILE=outfile )
      WRITE( 51, *) 'Writing to file: ', outfile
      END
demo% f77 -silent -o tstenv testenv.f
demo% setenv OUTFILE EnvFileName
demo% tstenv
demo% cat EnvFileName
 Writing to file: EnvFileName
demo%

When using getarg or getenv, care should be taken regarding leading or trailing blanks. (FORTRAN 77 programs can use the library function LNBLNK; Fortran 95 programs can use the intrinsic function TRIM.) Additional flexibility to accept relative path names can be programmed along the lines of the FULLNAME function in the example at the beginning of this chapter.

f77: Logical Unit Preattachment Using IOINIT

The library routine IOINIT can also be used with f77 to attach logical units to specific files at runtime. IO
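The caution about leading and trailing blanks in this excerpt can be illustrated with standard intrinsics. This is a minimal sketch under stated assumptions: it uses the Fortran 2003 intrinsic get_environment_variable rather than the getenv(3F) library routine shown in the guide, and the helper clean_name, the module name, and the fallback file name default.out are hypothetical.

```fortran
module file_name_from_env
  implicit none
contains
  ! Shift a name left so it has no leading blanks; the caller applies
  ! TRIM to drop the trailing blanks when the name is used.
  function clean_name(raw) result(name)
    character(len=*), intent(in) :: raw
    character(len=len(raw)) :: name
    name = adjustl(raw)
  end function clean_name
end module file_name_from_env

program env_file_name
  use file_name_from_env
  implicit none
  character(len=40) :: outfile
  integer :: status
  ! status is nonzero if OUTFILE is unset or does not fit in outfile.
  call get_environment_variable('OUTFILE', outfile, status=status)
  if (status /= 0) outfile = 'default.out'   ! hypothetical fallback name
  outfile = clean_name(outfile)
  open (51, file=trim(outfile))
  write (51, *) 'Writing to file: ', trim(outfile)
  close (51)
end program env_file_name
```

Passing trim(outfile) to OPEN keeps the blank padding of the fixed-length character variable out of the file name.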
63. f77 and f95 to explicitly indicate which loops to parallelize and what strategy to use.

The Sun WorkShop 6 Fortran compilers will accept both Sun-style and Cray-style parallelization directives to facilitate porting explicitly parallelized programs from other platforms.

The Fortran 95 compiler will also accept the OpenMP Fortran parallelization directives. The OpenMP Fortran specification is available at http://www.openmp.org/. The OpenMP directives, library routines, and environment variables are summarized in Appendix E of the Fortran User's Guide.

Explicit parallelization of a program requires prior analysis and deep understanding of the application code as well as the concepts of shared-memory parallelization.

DO loops are marked for parallelization by directives placed immediately before them. The compiler options -parallel or -explicitpar must be used for DO loops to be recognized and parallel code generated. Parallelization directives are comment lines that tell the compiler to parallelize (or not to parallelize) the DO loop that follows the directive. Directives are also called pragmas.

Take care when choosing which loops to mark for parallelization. The compiler generates threaded, parallel code for all loops marked with parallelization directives, even if there are data dependencies that will cause the loop to compute incorrect results when run in parallel.

If you do your own multithreaded coding
64. f77 program can OPEN a file READONLY or with ACTION='READ', 'WRITE', or 'READWRITE'. f95 supports the ACTION= specifier, but not READONLY.

Fortran tries to open a file with the maximum permissions possible, first for both reading and writing, then for each separately.

This event occurs transparently and is of concern only if you try to perform a READ, WRITE, or ENDFILE operation but you do not have permission. Magnetic tape operations are an exception to this general freedom, since you can have write permissions on a file, but not have a write ring on the tape.

Libraries and Linking With the f77 or f95 Command

To link the proper Fortran and C libraries, use the f77 or f95 command to invoke the linker.

Example 1: Use f95 to link:

demo% cc -c someCroutine.c
demo% f95 theF95routine.f someCroutine.o      (the linking step)
demo% a.out
 4.0 4.5
 8.0 9.0
demo%

Fortran Initialization Routines

Main programs compiled by f77 and f95 call dummy initialization routines f77_init or f90_init in the library at program start-up. The routines in the library are dummies that do nothing. The calls the compilers generate pass pointers to the program's arguments and environment. These calls provide software hooks the programmer can use to supply their own routines, in C, to initialize their program in any cus
65. Analyzing Program Performance with Sun WorkShop, Sun Microsystems, Inc.
FORTRAN Optimization, by Michael Metcalf, Academic Press, 1985
High Performance Computing, by Kevin Dowd, O'Reilly & Associates, 1993

CHAPTER 10 SPARC: Parallelization

This chapter presents an overview of multiprocessor parallelization and describes the capabilities of Sun's Fortran compilers. Implementation differences between f77 and f95 are noted.

Note: Fortran parallelization features require a Sun WorkShop HPC license.

Essential Concepts

Parallelizing (or multithreading) an application compiles the program to run on a multiprocessor system or in a multithreaded environment. Parallelization enables a single task, such as a DO loop, to run over multiple processors (or threads) with a potentially significant execution speedup.

Before an application program can be run efficiently on a multiprocessor system like the Ultra 60, Sun Enterprise Server 6500, or Sun Enterprise Server 10000, it needs to be multithreaded. That is, tasks that can be performed in parallel need to be identified and reprogrammed to distribute their computations across multiple processors or threads.

Multithreading an application can be done manually by making appropriate calls to the libthread primitives. However, a significant amount of analysis and reprogramming might be required. (See the Solaris Multithreaded Programming Guide
66. C Fortran language support. The compilers are components of the Sun Performance WorkShop 6.

The Fortran 90 compiler, f90, of previous releases of the Sun Performance WorkShop has been renamed f95 in Sun WorkShop 6. The f90 command is now an alias for f95; both invoke the Sun Performance WorkShop 6 Fortran 95 compiler.

Standards Conformance

- f77 was designed to be compatible with the ANSI X3.9-1978 Fortran standard and the corresponding International Organization for Standardization (ISO) 1539-1980, as well as standards FIPS 69-1, BS 6832, and MIL-STD-1753.
- f95 was designed to be compatible with the ANSI X3.198-1992, ISO/IEC 1539:1991, and ISO/IEC 1539:1997 standards documents.
- Floating-point arithmetic for both compilers is based on IEEE standard 754-1985 and international standard IEC 60559:1989.
- On SPARC platforms, both compilers provide support for the optimization-exploiting features of SPARC V8 and SPARC V9, including the UltraSPARC implementation. These features are defined in the SPARC Architecture Manuals, Version 8 (ISBN 0-13-825001-4) and Version 9 (ISBN 0-13-099227-5), published by Prentice-Hall for SPARC International.
- In this document, "Standard" means conforming to the versions of the standards listed above. "Non-standard" or "Extension" refers to features that go beyond these versions of these standards.

The responsible standards bodies may revise these standards from
67.   CHARACTER*9 out
      ieeer = ieee_flags( 'get', 'exception', '', out )
      PRINT *, out, ' flag raised'

Also, to determine if the overflow exception flag is raised, set the input argument in to overflow. On return, if out equals overflow, then the overflow exception flag is raised; otherwise it is not raised.

      ieeer = ieee_flags( 'get', 'exception', 'overflow', out )
      IF ( out .eq. 'overflow' ) PRINT *, 'overflow flag raised'

Example: Clear the invalid exception:

      ieeer = ieee_flags( 'clear', 'exception', 'invalid', out )

Example: Clear all exceptions:

      ieeer = ieee_flags( 'clear', 'exception', 'all', out )

Example: Set rounding direction to zero:

      ieeer = ieee_flags( 'set', 'direction', 'tozero', out )

Example: Set rounding precision to double:

      ieeer = ieee_flags( 'set', 'precision', 'double', out )

Turning Off All Warning Messages With ieee_flags

Calling ieee_flags with an action of clear, as shown in the following example, resets any uncleared exceptions. Put this call before the program exits to suppress system warning messages about floating-point exceptions at program termination.

Example: Clear all accrued exceptions with ieee_flags():

      i = ieee_flags( 'clear', 'exception', 'all', out )

Detecting an Exception With ieee_flags

The following example demonstrates how to determine which floating-point exceptions have been raised by earlier
68. Control With SCCS
    Controlling Files With SCCS
    Checking Files Out and In

4. Libraries
    Understanding Libraries
    Specifying Linker Debugging Options
    Generating a Load Map
    Listing Other Information
    Consistent Compiling and Linking
    Setting Library Search Paths and Order
    Search Order for Standard Library Paths
    LD_LIBRARY_PATH Environment Variable
    Library Search Path and Order: Static Linking
    Library Search Path and Order: Dynamic Linking
    Creating Static Libraries
    Tradeoffs for Static Libraries
    Creation of a Simple Static Library
    Creating Dynamic Libraries
    Tradeoffs for Dynamic Libraries
    Position-Independent Code and -pic
    Binding Options
    Naming Conventions
    A Simple Dynamic Library
    Libraries Provided with Sun Fortran Compilers
    VMS Library
    POSIX Library
    Shippable Libraries

5. Program Analysis and Debugging
    Global Program Checking (-Xlist)
    GPC Overview
    How to Invoke Global Program Checking
    Some Examples of -Xlist and Global Program Checking
    Suboptions for Global Checking Across Routines
    -Xlist Suboption Reference
    Some Examples Using Suboptions
    Special Compiler Options
    Subscript Bounds (-C)
    Undeclared Variable Types (-u)
    Version Checking (-V)
    Interactive Debugging With dbx and Sun WorkShop
    f77: Viewing Compiler Listing Diagnostics

6. Floating Point Arit
69.   EXTERNAL flip
      ...
      CALL flip( base )

    struct point d;
    struct point *ptx = &d;
    ...
    fflip_(ptx);

    struct point {
        float x, y, z;
    };
    void flip_(struct point *v)
    {
        float t;
        t = v->x;
        v->x = v->y;
        v->y = t;
        v->z = -2.*(v->z);
    }

      SUBROUTINE FFLIP( P )
      TYPE POINT
        REAL :: X, Y, Z
      END TYPE POINT
      TYPE (POINT) P
      REAL :: T
      T = P%X
      P%X = P%Y
      P%Y = T
      P%Z = -2.*P%Z

Pointers

A FORTRAN 77 pointer can be passed to a C routine as a pointer to a pointer because the Fortran routine passes arguments by reference.

TABLE 11-11 Passing a FORTRAN 77 POINTER

C calls Fortran

    extern void fpass_(float **);
    ...
    float *x;
    float **p2x;
    p2x = &x;
    fpass_(p2x);

      SUBROUTINE FPASS( P2X )
      REAL X
      POINTER (P2X, X)
      X = 0.

Fortran calls C

      REAL X
      POINTER (P2X, X)
      EXTERNAL PASS
      P2X = MALLOC64(4)
      X = 0.
      CALL PASS(P2X)

    void pass_(float **x)
    {
        **x = 100.1;
    }

C pointers are compatible with Fortran 95 scalar pointers, but not array pointers.

Passing Data Arguments by Value

Call by value is available only for simple data with FORTRAN 77, and only by Fortran routines calling C routines. There is no way for a C routine to call a Fortran routine and pass arguments by value. It is not possible to pass arrays, character strings, or structures by value. These are best passed by reference.

Use the nonstandard
70. Fortran function %VAL(arg) as an argument in the call.

In the following example, the Fortran routine passes x by value and y by reference. The C routine increments both x and y, but only y is changed in the caller.

TABLE 11-12 Passing Simple Data Arguments by Value: FORTRAN 77 Calling C

Fortran 77 calls C

      REAL x, y
      x = 1.
      y = 0.
      PRINT *, x, y
      CALL value( %VAL(x), y )
      PRINT *, x, y
      END

    void value_(float x, float *y)
    {
        printf("%f, %f\n", x, *y);
        x = x + 1.;
        *y = *y + 1.;
        printf("%f, %f\n", x, *y);
    }

Compiling and running produces output:

 1.00000 0.            x and y from Fortran
1.000000, 0.000000     x and y from C
2.000000, 1.000000     new x and y from C
 1.00000 1.00000       new x and y from Fortran

Functions That Return a Value

A Fortran function that returns a value of type BYTE (FORTRAN 77 only), INTEGER, REAL, LOGICAL, DOUBLE PRECISION, or REAL*16 (SPARC only) is equivalent to a C function that returns a compatible type (see TABLE 11-1 and TABLE 11-2). There are two extra arguments for the return values of character functions, and one extra argument for the return values of complex functions.

Returning a Simple Data Type

The following example returns a REAL or float value. BYTE, INTEGER, LOGICAL, DOUBLE PRECISION, and REAL*16 are treated in a similar way.

TABLE 11-13 Functions Returning a REAL or float Value

C calls Fortran

    float r, s;
    extern float fadd1_();
    r = 8.0;
    s = fadd1_(&r);

      real function fadd1(p)
      real p
      fa
71. ID Keywords

Some developers put one or more SCCS ID keywords into each file, but that is optional. These keywords are later identified with a version number each time the files are checked in with an SCCS get or delget command. There are three likely places to put these strings:
- Comment lines
- Parameter statements
- Initialized data

The advantage of using keywords is that the version information appears in the source listing and compiled object program. If preceded by the string @(#), the keywords in the object file can be printed using the what command.

Included header files that contain only parameter and data definition statements do not generate any initialized data, so the keywords for those files usually are put in comments or in parameter statements. In some files, like ASCII data files or makefiles, the SCCS information will appear in comments.

SCCS keywords appear in the form %keyword% and are expanded into their values by the SCCS get command. The most commonly used keywords are:

%Z% expands to the identifier string @(#) recognized by the what command.
%M% expands to the name of the source file.
%I% expands to the version number of this SCCS-maintained file.
%E% expands to the current date.

For example, you could identify the makefile with a make comment containing these keywords:

#      %Z%%M%      %I%      %E%
72. INIT looks in the environment for names of a user-specified form and then opens the corresponding logical unit for sequential formatted I/O. Names must be of the general form PREFIXnn, where the particular PREFIX is specified in the call to IOINIT, and nn is the logical unit to be opened. Unit numbers less than 10 must include the leading 0. See the Sun Fortran Library Reference and the IOINIT(3F) man page. (The IOINIT facility is not implemented for f95.)

Example: Associate physical files ini1.inp and ini1.out in the current directory to logical units 1 and 2.

First, set the environment variables.

With ksh or sh:

demo$ TST01=ini1.inp
demo$ TST02=ini1.out
demo$ export TST01 TST02

With csh:

demo% setenv TST01 ini1.inp
demo% setenv TST02 ini1.out

The program ini1.f reads 1 and writes 2:

demo% cat ini1.f
      CHARACTER PRFX*8
      LOGICAL CCTL, BZRO, APND, VRBOSE
      DATA CCTL, BZRO, APND, PRFX, VRBOSE
     &     /.TRUE., .FALSE., .FALSE., 'TST', .FALSE./
      CALL IOINIT( CCTL, BZRO, APND, PRFX, VRBOSE )
      READ( 1, *) I, B, N
      WRITE( 2, *) I, B, N
      END
demo%

With environment variables and ioinit, ini1.f reads ini1.inp and writes to ini1.out:

demo% cat ini1.inp
 12 3.14159012 6
demo% f77 -silent -o tstinit ini1.f
demo% tstinit
demo% cat ini1.out
 12 3.14159 6
demo%

IOINIT is adequate for most programs as written. However, it is writte
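The PREFIXnn naming rule described in this excerpt, including the required leading zero for units below 10, can be sketched as follows. This is an illustration of the naming convention only, not the IOINIT routine itself: the names ioinit_names and unit_variable are hypothetical, the sketch assumes unit numbers from 0 to 99, and it uses the standard Fortran 2003 intrinsic get_environment_variable to look the names up.

```fortran
module ioinit_names
  implicit none
contains
  ! Build the environment variable name PREFIXnn for a logical unit.
  ! The i2.2 edit descriptor supplies the leading zero for units below 10.
  function unit_variable(prefix, unit) result(name)
    character(len=*), intent(in) :: prefix
    integer, intent(in) :: unit          ! assumed to be in the range 0..99
    character(len=len_trim(prefix)+2) :: name
    write (name, '(a, i2.2)') trim(prefix), unit
  end function unit_variable
end module ioinit_names

program show_ioinit_names
  use ioinit_names
  implicit none
  character(len=256) :: file_name
  integer :: unit, status
  do unit = 1, 2
    ! Look up TST01 and TST02, the names used in the example above.
    call get_environment_variable(unit_variable('TST', unit), file_name, &
                                  status=status)
    if (status == 0) then
      print '(a, i0, 2a)', 'unit ', unit, ' would be attached to ', &
                           trim(file_name)
    else
      print '(a, i0, 3a)', 'unit ', unit, ': ', unit_variable('TST', unit), &
                           ' is not set (or too long)'
    end if
  end do
end program show_ioinit_names
```

Trimming the prefix inside the function matters because the example declares PRFX as an 8-character variable, so the value 'TST' arrives padded with blanks.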
73. FORTRAN 77 and C Data Types

TABLE 11-1 shows the sizes and allowable alignments for FORTRAN 77 data types. It assumes no compilation options affecting alignment or promoting default data sizes are applied. (See also the FORTRAN 77 Language Reference Manual.)

TABLE 11-1

FORTRAN 77 Data Type     C Data Type                        Size (bytes)
BYTE X                   char x                             1
CHARACTER X              unsigned char x                    1
CHARACTER*n X            unsigned char x[n]                 n
COMPLEX X                struct {float r,i;} x;             8
COMPLEX*8 X              struct {float r,i;} x;             8
DOUBLE COMPLEX X         struct {double dr,di;} x;          16
COMPLEX*16 X             struct {double dr,di;} x;          16
COMPLEX*32 X             struct {long double dr,di;} x;     32
DOUBLE PRECISION X       double x                           8
REAL X                   float x                            4
REAL*4 X                 float x                            4
REAL*8 X                 double x                           8
REAL*16 X                long double x                      16
INTEGER X                int x                              4
INTEGER*2 X              short x                            2
INTEGER*4 X              int x                              4
INTEGER*8 X              long long int x                    8
LOGICAL X                int x                              4
LOGICAL*1 X              char x                             1
LOGICAL*2 X              short x                            2
LOGICAL*4 X              int x                              4
LOGICAL*8 X              long long int x                    8

SPARC: Fortran 95 and C Data Types

The following table similarly compares the Fortran 95 data types with C.

TABLE 11-2 Data Sizes and Alignment (in Bytes) Pass by Reference (f95 and cc)

Fortran 95 Data Type     C Data Type                        Size    Alignment
CHARACTER x              unsigned char x                    1       1
CHA
74. C$PAR DOALL
          do i = 1, n

In the example, since the scalar variable y has been equivalenced to a(1), we have a conflict with y as private and a as shared by default, leading to possibly erroneous results when the parallelized i loop is executed. No diagnostic is issued in these situations. You can fix the example by using C$PAR DOALL PRIVATE(y).

Cray-Style Parallelization Directives

Parallel directives have two forms: Sun style and Cray style. The f77 and f95 default is Sun style (-mp=sun). To use Cray-style directives, you must compile with -mp=cray. Mixing program units compiled with both Sun and Cray directives can produce incorrect results.

A major difference between Sun and Cray directives is that Cray style requires explicit scoping of every scalar and array in the loop as either SHARED or PRIVATE.

The following table shows Cray-style directive syntax.

    !MIC$ DOALL
    !MIC$& SHARED( v1, v2, ... )
    !MIC$& PRIVATE( u1, u2, ... )
           ...optional qualifiers

Cray Directive Syntax

A parallel directive consists of one or more directive lines. A directive line is defined with the same syntax as Sun style, except:

- The sentinels are CMIC$, *MIC$, or !MIC$, but only !MIC$ is recognized with f95 free format.
- Every variable or array referenced in the loop appears in a SHARED or PRIVATE qualifier.

The Cray directives are similar to Sun styl
75. RACTER(LEN=n) x       unsigned char x[n];               n      1
    COMPLEX x              struct {float r,i;} x;            8      4
    COMPLEX(KIND=4) x      struct {float r,i;} x;            8      4
    COMPLEX(KIND=8) x      struct {double dr,di;} x;         16     4/8
    COMPLEX(KIND=16) x     struct {long double dr,di;} x;    32     4/8/16
    DOUBLE COMPLEX x       struct {double dr,di;} x;         16     4/8
    DOUBLE PRECISION x     double x;                         8      4
    REAL x                 float x;                          4      4
    REAL(KIND=4) x         float x;                          4      4
    REAL(KIND=8) x         double x;                         8      4/8
    REAL(KIND=16) x        long double x;                    16     4/8/16
    INTEGER x              int x;                            4      4
    INTEGER(KIND=1) x      signed char x;                    1      4
    INTEGER(KIND=2) x      short x;                          2      4
    INTEGER(KIND=4) x      int x;                            4      4
    INTEGER(KIND=8) x      long long int x;                  8      4
    LOGICAL x              int x;                            4      4
    LOGICAL(KIND=1) x      signed char x;                    1      4
    LOGICAL(KIND=2) x      short x;                          2      4
    LOGICAL(KIND=4) x      int x;                            4      4
    LOGICAL(KIND=8) x      long long int x;                  8      4

Case Sensitivity

C and Fortran take opposite perspectives on case sensitivity:

- C is case sensitive: case matters.
- Fortran ignores case.

The f77 and f95 default is to ignore case by converting subprogram names to lowercase. It converts all uppercase letters to lowercase letters, except within character-string constants.

There are two usual solutions to the uppercase/lowercase problem:

- In the C subprogram, make the name of the C function all lowercase.
- Compile the Fortran program with the -U option, which tells the compiler to preserve existing uppercase/lowercase di
76. Since the user time displayed includes the time spent on all the processors, it can be quite large and is not a good measure of performance. A better measure is the real time, which is the wall clock time. This also means that to get an accurate timing of a parallelized program you must run it on a quiet system dedicated to just your program.

The gprof Profiling Command

The gprof(1) command provides a detailed postmortem analysis of program timing at the subprogram level, including how many times a subprogram was called, who called it, whom it called, and how much time was spent in the routine and by the routines it called.

To enable gprof profiling, compile and link the program with the -pg option:

    demo% f77 -o Myprog -fast -pg Myprog.f
    demo% Myprog
    demo% gprof Myprog

The program must complete normally for gprof to obtain meaningful timing information.

At program termination, the file gmon.out is automatically written in the working directory. This file contains the profiling data that will be interpreted by gprof.

Invoking gprof produces a report on standard output. An example is shown on the next pages. Not only the routines in your program are listed but also the library procedures and the routines they call.

The report is mostly two profiles of how the total time is distributed across the program procedures: the call graph and the flat profile. They are preceded by an exp
77. Three major approaches are worth mentioning here.

Removing I/O From Key Loops

I/O within a loop or loop nest enclosing the significant computational work of a program will seriously degrade performance. The amount of CPU time spent in the I/O library might be a major portion of the time spent in the loop. (I/O also causes process interrupts, thereby degrading program throughput.) By moving I/O out of the computation loop wherever possible, the number of calls to the I/O library can be greatly reduced.

Eliminating Subprogram Calls

Subroutines called deep within a loop nest could be called thousands of times. Even if the time spent in each routine per call is small, the total effect might be substantial. Also, subprogram calls inhibit optimization of the loop that contains them because the compiler cannot make assumptions about the state of registers over the call.

Automatic inlining of subprogram calls (using -inline=x,y,..z, or -O4) is one way to let the compiler replace the actual call with the subprogram itself (pulling the subprogram into the loop). The subprogram source code for the routines that are to be inlined must be found in the same file as the calling routine.

There are other ways to eliminate subprogram calls:

- Use statement functions. If the external function being called is a simple math function, it might be possible to rewrite the function as a statement function or set of st
78. VATE(A)

This one-line directive is equivalent to the three-line directive that follows.

    C$PAR DOALL
    C$PAR& SHARED(I,K,X,V)
    C$PAR& PRIVATE(A)

TASKCOMMON Directive

The TASKCOMMON directive declares variables in a global COMMON block as thread-private: every variable declared in a common block becomes a private variable to the thread, but remains global within the thread. Only named COMMON blocks can be declared TASKCOMMON.

The syntax of the directive is:

    C$PAR TASKCOMMON common_block_name

The directive must appear immediately before or after every COMMON declaration for that named block.

This directive is effective only when compiled with -explicitpar or -parallel. Otherwise, the directive is ignored and the block is treated as a regular COMMON block.

Variables declared in TASKCOMMON blocks are treated as thread-private variables in all the DOALL loops and routines called from within the DOALL loops. Each thread gets its own copy of the COMMON block, so data written by one thread is not directly visible to other threads. During serial portions of the program, accesses are to the initial thread's copy of the COMMON block.

Variables in TASKCOMMON blocks should not appear on any DOALL qualifiers, such as PRIVATE, SHARED, READONLY, and so on.

It is an error to declare a common block as task common in some but not all compilation
79. Viewing Compiler Listing Diagnostics

Use the error utility program to view compiler diagnostics merged with the source code. error inserts compiler diagnostics above the relevant line in the source file. The diagnostics include the standard compiler error and warning messages, but not the -Xlist error and warning messages.

Note: The error utility rewrites your source files and does not work if the source files are read-only, or are in a read-only directory.

error(1) is included as part of a "developer" installation of the Solaris operating environment; it can also be installed from the package SUNWbtool.

Facilities also exist in the Sun WorkShop for viewing compiler diagnostics. See Introduction to Sun WorkShop.

Floating-Point Arithmetic

This chapter considers floating-point arithmetic and suggests strategies for avoiding and detecting numerical computation errors.

For a detailed examination of floating-point computation on SPARC and x86 processors, see the Sun Numerical Computation Guide.

Introduction

Sun's floating-point environment on SPARC and x86 implements the arithmetic model specified by the IEEE Standard 754 for Binary Floating-Point Arithmetic. This environment enables you to develop robust, high-performance, portable numerical applications. It also provides tools to investigate any unusual behavior by a numerical program.

In
80. X and Y.

Example: Direct access, formatted:

          OPEN( 2, FILE='inven.db', ACCESS='DIRECT', RECL=200,
         &      FORM='FORMATTED', ERR=90 )
          READ( 2, FMT='(I10,F10.3)', REC=13, ERR=30 ) X, Y

This program opens a file for direct access, formatted I/O, with a fixed record length of 200 bytes. It then reads the thirteenth record and converts it with the format (I10,F10.3).

For formatted files, the size of the record written is determined by the FORMAT statement. In the preceding example, the FORMAT statement defines a record of 20 characters or bytes. More than one record can be written by a single formatted write if the amount of data on the list is larger than the record size specified in the FORMAT statement. In such a case, each subsequent record is given successive record numbers.

Example: Direct access, formatted, multiple record write:

          OPEN( 21, ACCESS='DIRECT', RECL=200, FORM='FORMATTED' )
          WRITE( 21, '(10F10.3)', REC=11 ) (X(J), J=1,100)

The write to direct access unit 21 creates 10 records of 10 elements each (since the format specifies 10 elements per record); these records are numbered 11 through 20.

Binary I/O

Sun WorkShop Fortran 95 and Fortran 77 extend the OPEN statement to allow declaration of a "binary" I/O file.

Opening a file with FORM='BINARY' has roughly the same effect as FORM='UNFORMATTED', except that no record lengths are embedded in the file. Without this data, there
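The direct-access reads and writes described in this excerpt can be tried with a small self-contained program. This is a sketch in free-form standard Fortran rather than the guide's own example; the file name inven.tmp, the record contents, and the program name are made up for illustration, and RECL is set to exactly the 20 characters that the format produces.

```fortran
program direct_demo
  implicit none
  integer :: k, j
  real :: x

  ! Each record holds one I10 and one F10.3 field: 20 characters.
  open (2, file='inven.tmp', access='direct', recl=20, &
        form='formatted', status='replace')

  ! Write records 1 through 15, one record per WRITE.
  do k = 1, 15
    write (2, '(I10,F10.3)', rec=k) k, 0.5 * k
  end do

  ! Read the thirteenth record directly, without reading the first twelve.
  read (2, '(I10,F10.3)', rec=13) j, x
  print '(A,I0,A,F6.3)', 'record ', j, ' holds ', x

  ! Remove the scratch file when done.
  close (2, status='delete')
end program direct_demo
```

The REC= specifier is what distinguishes direct access from sequential access: the record number, not the file position left by the previous statement, selects the data.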
82. character variables. You can then use internal I/O to fill and empty these buffers. This facility does not integrate with the rest of Fortran I/O and even has its own set of tape logical units. Refer to the man pages for complete information.

Fortran Formatted I/O for Tape

The Fortran I/O statements provide facilities for transparent access to formatted, sequential files on magnetic tape. There is no limit on formatted record size, and records may span tape blocks.

Fortran Unformatted I/O for Tape

Using the Fortran I/O statements to connect a magnetic tape for unformatted access is less satisfactory. The implementation of unformatted records implies that the size of a record (plus eight characters of overhead) cannot be bigger than the buffer size.

As long as this restriction is complied with, the I/O system does not write records that span physical tape blocks, writing short blocks when necessary. This representation of unformatted records is preserved (even though it is inappropriate for tapes) so that files can be freely copied between disk and tapes.

Since the block-spanning restriction does not apply to tape reads, files can be copied from tape to disk without any special considerations.

Tape File Representation

A Fortran data file is represented on tape by a sequence of data records followed by an endfile record. The data is grouped into blocks, with maximum block s
84. make Command

The make command can be invoked with no arguments, simply:

    demo% make

The make utility looks for a file named makefile or Makefile in the current directory and takes its instructions from that file.

The make utility:

- Reads makefile to determine all the target files it must process, the files they depend on, and the commands needed to build them.
- Finds the date and time each file was last changed.
- Rebuilds any target file that is older than any of the files it depends on, using the commands from makefile for that target.

Macros

The make utility's macro facility allows simple, parameterless string substitutions. For example, the list of relocatable files that make up the target program pattern can be expressed as a single macro string, making it easier to change.

A macro string definition has the form:

    NAME = string

Use of a macro string is indicated by $(NAME), which is replaced by make with the actual value of the macro string.

This example adds a macro definition naming all the object files to the beginning of makefile:

    OBJ = pattern.o computepts.o startupcore.o

Now the macro can be used in both the list of dependencies as well as on the f77 link command for target pattern in makefile:

    pattern: $(OBJ)
            f77 $(OBJ) -lcore77 -lcore -lsunwindow -lpixrect -o pattern

For macro strings with single-letter names, the parentheses may be omitted.

Ove
85. Parallelization Directives Syntax

A parallel directive consists of one or more directive lines. A Sun-style directive line is defined as follows:

    C$PAR Directive [ Qualifiers ]      <- Initial directive line
    C$PAR& [More_Qualifiers]            <- Optional continuation lines

- A directive line is case insensitive.
- A directive line begins with a five-character sentinel: C$PAR, *$PAR, or !$PAR.
- With f77 and f95 fixed format:
  - An initial directive line has a blank in column 6.
  - A continuation directive line has a nonblank in column 6.
  - Columns beyond 72 are ignored unless the -e option is specified.
- With f95 free format:
  - Leading blanks are allowed before the sentinel.
  - The only sentinel recognized is !$PAR.
- Qualifiers, if any, follow directives on the same line or continuation lines.
- Multiple qualifiers on one line are separated by commas.
- Spaces before, after, or within a directive or qualifier are ignored.

The Sun-style parallel directives are:

    Directive     Action
    TASKCOMMON    Declares variables in a COMMON block to be thread-private
    DOALL         Parallelizes the next loop
    DOSERIAL      Does not parallelize the next loop
    DOSERIAL*     Does not parallelize the next nest of loops

Examples of Sun-style parallel directives:

    C$PAR TASKCOMMON ALPHA            Declare block private
          COMMON /ALPHA/BZ,BY(100)
    C$PAR DOALL                       No qualifiers
    C$PAR DOSERIAL
    C$PAR DOALL SHARED(I,K,X,V), PRI
86. name space of the program; do not use f77_no_handlers as the name of a variable anywhere else in the program.

With f95, no signal handlers are on by default.

Retrospective Summary

The ieee_retrospective function queries the floating-point status registers to find out which exceptions have accrued, and a message is printed to standard error to inform you which exceptions were raised but not cleared. This function is automatically called by Fortran 77 programs at normal program termination (CALL EXIT). The message typically looks like this; the format may vary with each compiler release:

    Note: IEEE floating-point exception flags raised:
        Division by Zero;
    IEEE floating-point exception traps enabled:
        inexact; underflow; overflow; invalid operation;
    See the Numerical Computation Guide, ieee_flags(3M), ieee_handler(3M)

Fortran 95 programs do not call ieee_retrospective automatically. A Fortran 95 program would need to call ieee_retrospective explicitly (and link with -lf77compat).

Debugging IEEE Exceptions

In most cases, the only indication that any floating-point exceptions (such as overflow, underflow, or invalid operation) have occurred is the retrospective summary message at program termination. Locating where the exception occurred requires exception trapping be enabled. This can be done by either compiling with the -ftrap=common option or by establishing an exception handler rou
87. Also, five types of floating-point exception are identified:

- Invalid. Operations with mathematically invalid operands, for example, 0.0/0.0, sqrt(-1.0), and log(-37.8).
- Division by zero. Divisor is zero and dividend is a finite nonzero number, for example, 9.9/0.0.
- Overflow. Operation produces a result that exceeds the range of the exponent, for example, MAXDOUBLE+0.0000000000001e308.
- Underflow. Operation produces a result that is too small to be represented as a normal number, for example, MINDOUBLE * MINDOUBLE.
- Inexact. Operation produces a result that cannot be represented with infinite precision, for example, 2.0/3.0, log(1.1), and 0.1 in input.

The implementation of the IEEE standard is described in the Sun Numerical Computation Guide.

-ftrap=mode Compiler Options

The -ftrap=mode option enables trapping for floating-point exceptions. If no signal handler has been established by an ieee_handler() call, the exception terminates the program with a memory dump core file. See Fortran User's Guide for details on this compiler option. For example, to enable trapping for overflow, division by zero, and invalid operations, compile with -ftrap=common.

Note: You must compile the application's main program with -ftrap= for trapping to be enabled.

Floating-Point Exceptions and Fortran

Programs compiled by f77 automatically display a list of accrued floating point
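Three of the exception types listed in this excerpt can be observed from standard Fortran as well. The following is a minimal sketch that uses the Fortran 2003 intrinsic module IEEE_EXCEPTIONS rather than the Sun ieee_flags and ieee_handler routines the guide discusses; it assumes a compiler and processor that support that module and non-halting mode, and the program name is hypothetical. The operands are declared VOLATILE so that the operations are evaluated at run time.

```fortran
program show_flags
  use, intrinsic :: ieee_exceptions
  implicit none
  real, volatile :: zero, one, big, r
  logical :: raised

  zero = 0.0
  one = 1.0
  big = huge(1.0)

  ! Record exceptions as flags instead of halting, then clear all flags.
  call ieee_set_halting_mode(ieee_all, .false.)
  call ieee_set_flag(ieee_all, .false.)

  r = one / zero                       ! division by zero
  call ieee_get_flag(ieee_divide_by_zero, raised)
  print *, 'one/zero  =', r, ' divide-by-zero flag: ', raised

  r = big * big                        ! overflow
  call ieee_get_flag(ieee_overflow, raised)
  print *, 'big*big   =', r, ' overflow flag:       ', raised

  r = zero / zero                      ! invalid operation
  call ieee_get_flag(ieee_invalid, raised)
  print *, 'zero/zero =', r, ' invalid flag:        ', raised
end program show_flags
```

Because the flags are sticky, each one stays set until it is explicitly cleared, which is the same property that lets a retrospective summary report accrued exceptions at program termination.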
88. and tuning for one particular architecture can cause degradations in performance elsewhere. This is discussed later in the chapters on performance and tuning. However, the following issues are worth considering with regards to porting in general.

Uninitialized Variables

Some systems automatically initialize local and COMMON variables to zero or some "not-a-number" (NaN) value. However, there is no standard practice, and programs should not make assumptions regarding the initial value of any variable. To assure maximum portability, a program should initialize all variables.

Aliasing Across Calls

Aliasing occurs when the same storage address is referenced by more than one name. This happens when actual arguments to a subprogram overlap between themselves or between COMMON variables within the subprogram. For example, arguments X and Z refer to the same storage locations, as do B and H:

          COMMON /INS/B(100)
          REAL S(100), T(100)
          ...
          CALL SUB(S,T,S,B,100)
          ...
          SUBROUTINE SUB(X,Y,Z,H,N)
          REAL X(N),Y(N),Z(N),H(N)
          COMMON /INS/B(100)
          ...

Avoid aliasing in this manner in all portable code. The results on some systems and with higher optimization levels could be unpredictable.

Obscure Optimizations

Legacy codes may contain source-code restructurings of ordinary computational DO loops intended to cause older vectorizing compilers to generate optimal code for a particular architecture. In most cases, these restructurings are no lo
89. standard I/O file structure. For the stdin, stdout, and stderr streams, the file structure need not be explicitly referenced, so it is possible to share them.

If a Fortran main program calls C to do I/O, the Fortran I/O library must be initialized at program startup to connect units 0, 5, and 6 to stderr, stdin, and stdout, respectively. The C function must take the Fortran I/O environment into consideration to perform I/O on open file descriptors.

However, if a C main program calls a Fortran subprogram to do I/O, the automatic initialization of the Fortran I/O library to connect units 0, 5, and 6 to stderr, stdin, and stdout is lacking. This connection is normally made by a Fortran main program. If a Fortran function attempts to reference the stderr stream (unit 0) without the normal Fortran main program I/O initialization, output will be written to fort.0 instead of to the stderr stream.

The C main program can initialize Fortran I/O and establish the preconnection of units 0, 5, and 6 by calling the f_init() FORTRAN 77 library routine at the start of the program and, optionally, f_exit() at termination.

Remember: even though the main program is in C, you should link with f77.

Alternate Returns

Fortran's alternate returns mechanism is obsolescent and should not be used if portability is an issue. There is no equivalent in C to alternate returns, so the only concern would be for a C routine calling a
90. ieee_handler('set', 'overflow', SIGFPE_ABORT)

    demo% f77 -g -silent myprog.F
    demo% dbx a.out
    Reading symbolic information for a.out
    Reading symbolic information for rtld /usr/lib/ld.so.1
    Reading symbolic information for libF77.so.3
    Reading symbolic information for libc.so.1
    Reading symbolic information for libdl.so.1
    (dbx) catch FPE
    (dbx) run
    Running: a.out
    (process id 19793)
    signal FPE (floating point overflow) in MAIN at line 55 in file "myprog.F"
       55           w = rmax * 200.          ! Cause of the overflow
    (dbx) cont                               ! Continue execution to completion
    Note: IEEE floating-point exception flags raised:
        Inexact; Division by Zero; Underflow;    ! There were other exceptions
    floating-point exception traps enabled:
        overflow;
    See the Numerical Computation Guide...
    execution completed, exit code is 0
    (dbx)

To be selective, the example introduces the #include, which required renaming the source file with a .F suffix and calling ieee_handler(). You could go further and create your own handler function to be invoked on the overflow exception to do some application-specific analysis and print intermediary or debug results before aborting.

Further Numerical Adventures

This section addresses some real world problems that involve arithmetic operations that may unwittingly generate invalid, division by zero, overflow, underflow, or inexact exceptions.
91. updated dynamic library on a system immediately affects all the applications that use it without requiring relinking of the executable.

Tradeoffs for Dynamic Libraries

Dynamic libraries introduce some additional tradeoff considerations:

- Smaller a.out file. Deferring binding of the library routines until execution time means that the size of the executable file is less than the equivalent executable calling a static version of the library; the executable file does not contain the binaries for the library routines.
- Possibly smaller process memory utilization. When several processes using the library are active simultaneously, only one copy of the library resides in memory and is shared by all processes.
- Possibly increased overhead. Additional processor time is needed to load and link-edit the library routines during runtime. Also, the library's position-independent coding might execute more slowly than the relocatable coding in a static library.
- Possible overall system performance improvement. Reduced memory utilization due to library sharing should result in better overall system performance (reduced I/O access time from memory swapping).

Performance profiles among programs vary greatly from one to another. It is not always possible to determine or estimate in advance the performance improvement (or degradation) between dynamic versus static libraries. However, if both forms of a needed library are available to you, it would
92. statement functions. Statement functions are compiled in-line and can be optimized.

- Push the loop into the subprogram. That is, rewrite the subprogram so that it can be called fewer times (outside the loop) and operate on a vector or array of values per call.

Rationalizing Tangled Code

Complicated conditional operations within a computationally intensive loop can dramatically inhibit the compiler's attempt at optimization. In general, a good rule to follow is to eliminate all arithmetic and logical IFs, replacing them with block IFs:

Original Code:

              IF(A(I)-DELTA) 10,10,11
    10        XA(I) = XB(I)*B(I,I)
              XY(I) = XA(I) - A(I)
              GOTO 13
    11        XA(I) = Z(I)
              XY(I) = Z(I)
              IF(QZDATA.LT.0.) GOTO 12
              ICNT = ICNT + 1
              ROX(ICNT) = XA(I)-DELTA/2.
    12        SUM = SUM + X(I)
    13        SUM = SUM + XA(I)

Untangled Code:

              IF(A(I).LE.DELTA) THEN
                XA(I) = XB(I)*B(I,I)
                XY(I) = XA(I) - A(I)
              ELSE
                XA(I) = Z(I)
                XY(I) = Z(I)
                IF(QZDATA.GE.0.) THEN
                  ICNT = ICNT + 1
                  ROX(ICNT) = XA(I)-DELTA/2.
                ENDIF
                SUM = SUM + X(I)
              ENDIF
              SUM = SUM + XA(I)

Using block IF not only improves the opportunities for the compiler to generate optimal code, it also improves readability and assures portability.

Further Reading

The following reference books provide more details:

- Numerical Computation Guide, Sun Microsystems, Inc
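The advice in this excerpt to push the loop into the subprogram can be pictured with a small before-and-after program. This is an illustrative sketch in free-form Fortran; the names push_loop, scale_one, and scale_all and the arithmetic are made up, and a modern compiler may well inline such small internal procedures on its own.

```fortran
program push_loop
  implicit none
  integer, parameter :: n = 5
  real :: x(n), y(n)
  integer :: i

  x = (/ (real(i), i = 1, n) /)

  ! Before: one subprogram call per element, inside the loop.
  do i = 1, n
    y(i) = scale_one(x(i))
  end do
  print *, y

  ! After: a single call; the loop now lives inside the subprogram.
  call scale_all(x, y, n)
  print *, y

contains

  real function scale_one(v)
    real, intent(in) :: v
    scale_one = 2.0 * v + 1.0
  end function scale_one

  subroutine scale_all(v, w, m)
    integer, intent(in) :: m
    real, intent(in) :: v(m)
    real, intent(out) :: w(m)
    integer :: k
    do k = 1, m
      w(k) = 2.0 * v(k) + 1.0
    end do
  end subroutine scale_all

end program push_loop
```

Both versions compute the same values; the second makes one call instead of n and leaves the compiler a plain loop with no calls in its body to optimize.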
93. indicates the maximum number of characters that will fit into certain data types. In this table, data types without an explicit size are default types, subject to promotion by the f77 command-line flags -dbl, -r8, or -xtypemap=.

TABLE 7-3 f77: Maximum Characters in Data Types

                        Maximum Number of Standard ASCII Characters
    Data Type           No -i2, -i4,   -i2    -i4    -r8    -dbl
                        -r8, -dbl
    BYTE                1              1      1      1      1
    COMPLEX             8              8      8      16     16
    COMPLEX*16          16             16     16     16     16
    COMPLEX*32          32             32     32     32     32
    DOUBLE COMPLEX      16             16     16     32     32
    DOUBLE PRECISION    8              8      8      16     16
    INTEGER             4              2      4      4      8
    INTEGER*2           2              2      2      2      2
    INTEGER*4           4              4      4      4      4
    INTEGER*8           8              8      8      8      8
    LOGICAL             4              2      4      4      8
    LOGICAL*1           1              1      1      1      1
    LOGICAL*2           2              2      2      2      2
    LOGICAL*4           4              4      4      4      4
    LOGICAL*8           8              8      8      8      8
    REAL                4              4      4      8      8
    REAL*4              4              4      4      4      4
    REAL*8              8              8      8      8      8
    REAL*16             16             16     16     16     16

When storing standard ASCII characters with normal Fortran:

- With -r8, unspecified size INTEGER and LOGICAL do not hold double.
- With -dbl, unspecified size INTEGER and LOGICAL do hold double.

The storage is allocated with both options, but it is unavailable in normal Fortran with -r8.

Options -i2, -r8, and -dbl are now considered obsolete; use -xtypemap instead.
94. invocation of a subprogram then has its own unique store of local variables maintained on the stack, and no two invocations will interfere with each other.

Local subprogram variables can be made automatic variables that reside on the stack either by listing them on an AUTOMATIC statement or by compiling the subprogram with the -stackvar option. However, local variables initialized in DATA statements must be rewritten to be initialized in actual assignments.

Note: Allocating local variables to the stack can cause stack overflow. See "Stacks, Stack Sizes, and Parallelization" about increasing the size of the stack.

Inhibitors to Explicit Parallelization

In general, the compiler parallelizes a loop if you explicitly direct it to. There are exceptions: some loops the compiler will not parallelize.

The following are the primary detectable inhibitors that might prevent explicitly parallelizing a DO loop:

- The DO loop is nested inside another DO loop that is parallelized. This exception holds for indirect nesting, too. If you explicitly parallelize a loop that includes a call to a subroutine, then even if you request the compiler to parallelize loops in that subroutine, those loops are not run in parallel at runtime.
- A flow control statement allows jumping out of the DO loop.
- The index variable of the loop is subject to side effects, such as being equivalenced.

By compiling with -vpara, you will get diagnostic messages whe
95.       MAXFLOAT = r_max_normal()
          do 10 i = 1, 10000, 2
             A(i) = MAXFLOAT
             A(i+1) = -MAXFLOAT
    10    continue
          A(5001) = -MAXFLOAT
          A(5002) = MAXFLOAT
          do 20 i = 1, 10002                 ! Add up the array
             RESULT = RESULT + A(i)
    20    continue
          write(6,*) RESULT
          end
    demo% setenv PARALLEL 2                  {Number of processors is 2}
    demo% f77 -silent -autopar t3.f
    demo% a.out
       0.                                    {Without reduction, 0. is correct}
    demo% f77 -silent -autopar -reduction t3.f
    demo% a.out
       Inf                                   {With reduction, Inf. is not correct}
    demo%

Example: Roundoff: get the sum of 100,000 random numbers between -1 and +1:

    demo% cat t4.f
          parameter ( n = 100000 )
          double precision d_lcrans, lb / -1.0 /, s, ub / 1.0 /, v(n)
          s = d_lcrans ( v, n, lb, ub )      ! Get n random nos. between -1 and +1
          s = 0.0
          do i = 1, n
             s = s + v(i)
          end do
          write(*, '(" s = ", e21.15)') s
          end
    demo% f77 -autopar -reduction t4.f

Results vary with the number of processors. The following table shows the sum of 100,000 random numbers between -1 and +1.

    Number of Processors    Output
    1                       s = 0.568582080884714E+02
    2                       s = 0.568582080884722E+02
    3                       s = 0.568582080884721E+02
    4                       s = 0.568582080884724E+02

In this situation, roundoff error on the order of 10^-14 is acceptable for data that is random to begin with. For more information, see the Sun Numerical Computation Guide.

Explicit Parallelization

This section describes the source code directives recognized by f7
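The roundoff effect described in this excerpt does not require a parallel machine to observe: it comes from adding the same numbers in a different order. The sketch below is serial, free-form Fortran that mimics the grouping a two-thread reduction might use. It is an assumption-laden illustration, not what the compiler generates: it uses the standard RANDOM_NUMBER intrinsic in place of d_lcrans, so the values differ from the table above, and the names are hypothetical.

```fortran
program sum_order
  implicit none
  integer, parameter :: n = 100000
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: v(n), forward, split, part(2)
  integer :: i

  call random_seed()          ! processor-dependent seed
  call random_number(v)       ! uniform values in [0, 1)
  v = 2.0_dp * v - 1.0_dp     ! shift to [-1, 1)

  ! Serial order: one running sum over the whole array.
  forward = 0.0_dp
  do i = 1, n
    forward = forward + v(i)
  end do

  ! Grouping a two-thread reduction might use: two partial sums,
  ! one per half of the array, combined at the end.
  part = 0.0_dp
  do i = 1, n / 2
    part(1) = part(1) + v(i)
  end do
  do i = n / 2 + 1, n
    part(2) = part(2) + v(i)
  end do
  split = part(1) + part(2)

  print '(A,ES23.15)', ' one running sum:  ', forward
  print '(A,ES23.15)', ' two partial sums: ', split
  print '(A,ES10.2)',  ' difference:       ', forward - split
end program sum_order
```

Any difference between the two results comes only from the changed order of the additions, since floating-point addition is not associative.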
96. Array Indexing and Order

Array indexing and order differ between Fortran and C.

Array Indexing

C arrays always start at zero, but by default Fortran arrays start at 1. There are two usual ways of approaching indexing.

- You can use the Fortran default, as in the preceding example. Then the Fortran element B(2) is equivalent to the C element b[1].
- You can specify that the Fortran array B starts at B(0), as follows:

          INTEGER B(0:2)

  This way, the Fortran element B(1) is equivalent to the C element b[1].

Array Order

Fortran arrays are stored in column-major order:

    A(3,2):  A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2)

C arrays are stored in row-major order:

    A[3][2]:  A[0][0] A[0][1] A[1][0] A[1][1] A[2][0] A[2][1]

For one-dimensional arrays, this is no problem. For two-dimensional and higher arrays, be aware of how subscripts appear and are used in all references and declarations; some adjustments might be necessary.

For example, it may be confusing to do part of a matrix manipulation in C and the rest in Fortran. It might be preferable to pass an entire array to a routine in the other language and perform all the matrix manipulation in that routine, to avoid doing part in C and part in Fortran.

File Descriptors and stdio

Fortran I/O channels are in terms of unit numbers. The I/O system does not deal with unit numbers but with file descriptors. The Fortran runtime system trans
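The column-major storage order described in this excerpt can be checked from Fortran itself. In the following sketch (free-form Fortran, names chosen only for illustration), each element is labeled with its own subscripts and the array is then flattened in storage order with the RESHAPE intrinsic.

```fortran
program array_order
  implicit none
  integer :: a(3, 2), flat(6)
  integer :: i, j

  ! Label each element with its subscripts: a(i,j) = 10*i + j.
  do j = 1, 2
    do i = 1, 3
      a(i, j) = 10 * i + j
    end do
  end do

  ! RESHAPE takes elements in array element (storage) order.
  flat = reshape(a, (/ 6 /))
  print '(6I4)', flat
end program array_order
```

The program prints 11 21 31 12 22 32, that is, A(1,1), A(2,1), A(3,1), A(1,2), A(2,2), A(3,2): the first subscript varies fastest. A C routine that receives this storage as a two-dimensional array would therefore need to declare it with the dimensions reversed, or index it as a transposed array.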
97. demo% f77 -o libtestlib.so.1 -G -pic -ztext -hlibtestlib.so.1 *.f
    delte.f:
        delte:
        q_fixx:
    dropx.f:
        dropx:
    etc.f:
        q_fill:
        q_step:
        q_node:
        q_warn:
    evalx.f:
        evalx:
    linkz.f:
        linkz:
    markx.f:
        markx:
    point.f:
        point:
    Linking:

-G tells the linker to build a dynamic library.

-ztext warns you if it finds anything other than position-independent code, such as relocatable text.

Example: Make an executable file a.out using the dynamic library:

    demo% f77 -o trylib -R`pwd` trylib.f libtestlib.so.1
    trylib.f:
     MAIN main:
    demo% file trylib
    trylib: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
    demo% ldd trylib
        libtestlib.so.1 => /export/home/U/Tests/libtestlib.so.1
        libF77.so.4 => /opt/SUNWspro/lib/libF77.so.4
        libc.so.1 => /usr/lib/libc.so.1
        libdl.so.1 => /usr/lib/libdl.so.1

Note that the example uses the -R option to bind into the executable the path (the current directory) to the dynamic library.

The file command shows that the executable is dynamically linked.

The ldd command shows that the executable, trylib, uses some shared libraries, including libtestlib.so.1; libf77, libdl, and libc are included by default by f77.

Libraries Provided with Sun Fortran Compilers

The table shows the libraries installed with the compilers.
98. b/values-xi.o  [ ET_REL ]
    debug: file=direct.o  [ ET_REL ]
    debug: file=/opt/SUNWspro/lib/libM77.a  [ archive ]
    debug: file=/opt/SUNWspro/lib/libF77.so  [ ET_DYN ]
    debug: file=/opt/SUNWspro/lib/libsunmath.a  [ archive ]

See the Linker and Libraries Guide for further information on these linker options.

Consistent Compiling and Linking

Ensuring a consistent choice of compiling and linking options is critical whenever compilation and linking are done in separate steps. Compiling any part of a program with any of the following options requires linking with the same options:

    -a, -autopar, -Bx, -fast, -G, -Lpath, -lname, -mt, -xmemalign, -nolib,
    -norunpath, -p, -pg, -xlibmopt, -xlic_lib=name, -xprofile=p

Example: Compiling sbr.f with -a and smain.f without it, then linking in separate steps (-a invokes tcov old-style profiling):

    demo% f77 -c -a sbr.f
    demo% f77 -c smain.f
    demo% f77 -a sbr.o smain.o        (link step; pass -a to the linker)

Also, a number of options require that all source files be compiled with that option. These include:

    -aligncommon, -autopar, -dx, -dalign, -dbl, -explicitpar, -f, -misalign,
    -native, -parallel, -pentium, -xarch=a, -xcache=c, -xchip=c, -xF,
    -xtarget=t, -ztext

See the f77(1) and f95(1) man pages and the Fortran User's Guide for details on all compiler options.

Setting Library Search Paths and Order
99. block computepts.f pattern.f startupcore.f
    demo%

Assume both pattern.f and computepts.f have an INCLUDE of commonblock, and you wish to compile each .f file and link the three relocatable files, along with a series of libraries, into a program called pattern.

The makefile looks like this:

    demo% cat makefile
    pattern: pattern.o computepts.o startupcore.o
            f77 pattern.o computepts.o startupcore.o -lcore77 \
            -lcore -lsunwindow -lpixrect -o pattern
    pattern.o: pattern.f commonblock
            f77 -c pattern.f
    computepts.o: computepts.f commonblock
            f77 -c computepts.f
    startupcore.o: startupcore.f
            f77 -c -u startupcore.f
    demo%

The first line of makefile indicates that making pattern depends on pattern.o, computepts.o, and startupcore.o. The next line and its continuations give the command for making pattern from the relocatable .o files and libraries.

Each entry in makefile is a rule expressing a target object's dependencies and the commands needed to make that object. The structure of a rule is:

    target: dependencies_list
    TAB build_commands

- Dependencies. Each entry starts with a line that names the target file, followed by all the files the target depends on.
- Commands. Each entry has one or more subsequent lines that specify the Bourne shell commands that will build the target file for this entry. Each of these command lines must be indented by a tab character.
100.
          call point(x)
          print*, 'value ', x
          end
    demo% f77 -o trylib trylib.f test_lib/testlib.a
    trylib.f:
     MAIN:
    demo%

Notice that the main program calls only two of the routines in the library. You can verify that the uncalled routines in the library were not loaded into the executable file by looking for them in the list of names in the executable displayed by nm:

    demo% nm trylib | grep FUNC | grep point
    [146]   |     70016|     152|FUNC |GLOB |0    |8      |point_
    demo% nm trylib | grep FUNC | grep evalx
    [165]   |     69848|     152|FUNC |GLOB |0    |8      |evalx_
    demo% nm trylib | grep FUNC | grep delte
    demo% nm trylib | grep FUNC | grep markx
    demo% ..etc

In the preceding example, grep finds entries in the list of names only for those library routines that were actually called.

Another way to reference the library is through the -llibrary and -Lpath options. Here, the library's name would have to be changed to conform to the libname.a convention:

    demo% mv test_lib/testlib.a test_lib/libtestlib.a
    demo% f77 -o trylib trylib.f -Ltest_lib -ltestlib
    trylib.f:
     MAIN:

The -llibrary and -Lpath options are used with libraries installed in a commonly accessible directory on the system, like /usr/local/lib, so that other users can reference it. For example, if you left libtestlib.a in /usr/local/lib, other users could be informed to compile with the following command:

    demo% f77 -o myprog myprog.f -L/usr
101. cheb1 was called from two routines, deriv and diffr. The timings on these lines show how much time was spent in cheb1 when it was called from each of these routines.

- Descendant Lines. The lines below the function line indicate the routines called from cheb1: fftb, sin, and cos. The library sine function is called indirectly.
- Flat Profile. Function names appear on the right. The profile is sorted by percentage of total execution time.

Overhead and Other Considerations

Profiling (compiling with the -pg option) may greatly increase the running time of a program. This is due to the extra overhead required to clock program performance and subprogram calls. Profiling tools like gprof attempt to subtract an approximate overhead factor when computing relative runtime percentages. All other timings shown may not be accurate due to UNIX and hardware timekeeping inaccuracies.

Programs with short execution times are the most difficult to profile because the overhead may be a significant fraction of the total execution time. The best practice is to choose input data for the profiling run that will result in a realistic test of the program's performance. If this is not possible, consider enclosing the main computational part of the program within a loop that effectively runs the program N times. Estimate actual performance by dividing the profile results by N.

The Fortran library includes two routine
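The "run it N times and divide by N" advice above can be sketched in a few lines. This is a minimal illustration in C, not a routine from the Sun libraries: seconds_per_run and sample_work are hypothetical names, and the timing comes from the standard C clock() function rather than from a profiler.

```c
#include <time.h>

/* Counts calls to sample_work(), so a caller can confirm the loop ran. */
volatile long work_calls = 0;

/* Stand-in for the main computational part of a program. */
void sample_work(void)
{
    work_calls++;
}

/* Hypothetical helper: run work() n times and return the average CPU
   time per run in seconds, as measured by the standard clock(). */
double seconds_per_run(void (*work)(void), int n)
{
    clock_t start, stop;
    int k;

    if (n <= 0)
        return 0.0;

    start = clock();
    for (k = 0; k < n; k++)
        work();
    stop = clock();

    /* Divide the total by N to estimate a single run. */
    return ((double)(stop - start) / CLOCKS_PER_SEC) / n;
}
```

A larger N spreads the fixed measurement overhead over more runs, which is the point of the technique for short-running programs.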
102. computations.

Bit masks defined in the system include file floatingpoint.h are applied to the value returned by ieee_flags.

Note: Fortran 95 (f95) programs should include the file floatingpoint.h. Fortran 77 (f77) programs should include f77_floatingpoint.h.

In this example, DetExcFlg.F, the include file is introduced using the #include preprocessor directive, which requires us to name the source file with a .F suffix. Underflow is caused by dividing the smallest double-precision number by 2.

Example: Detect an exception using ieee_flags and decode it:

    #include "floatingpoint.h"
          CHARACTER*16 out
          DOUBLE PRECISION d_max_subnormal, x
          INTEGER div, flgs, inv, inx, over, under

          x = d_max_subnormal() / 2.0                 ! Cause underflow

          flgs = ieee_flags('get','exception','',out) ! Which are raised?

          inx   = and(rshift(flgs, fp_inexact)  , 1)  ! Decode
          div   = and(rshift(flgs, fp_division) , 1)  ! the value
          under = and(rshift(flgs, fp_underflow), 1)  ! returned
          over  = and(rshift(flgs, fp_overflow) , 1)  ! by
          inv   = and(rshift(flgs, fp_invalid)  , 1)  ! ieee_flags()

          PRINT *, "Highest priority exception is: ", out
          PRINT *, ' invalid  divide  overflo underflo inexact'
          PRINT '(5i8)', inv, div, over, under, inx
          PRINT *, '(1 = exception is raised; 0 = it is not)'
          i = ieee_flags('clear', 'exception', 'all', out)   ! Clear all
          END

Example: Compile and run the preceding example (DetExcFlg.F):
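The decoding step in the example, and(rshift(flgs, ...), 1), is an ordinary shift-and-mask. The C sketch below shows the same operation in isolation. The bit positions in the enum are assumptions made for illustration; the real values come from the system include file and should be taken from there, and flag_is_raised is a hypothetical helper name.

```c
/* Assumed bit positions, for illustration only. The real values come
   from the system include file and may differ on a given platform. */
enum {
    EXC_INEXACT   = 0,
    EXC_DIVISION  = 1,
    EXC_UNDERFLOW = 2,
    EXC_OVERFLOW  = 3,
    EXC_INVALID   = 4
};

/* Return 1 if the exception at bit position pos is raised in flags,
   otherwise 0. Mirrors and(rshift(flgs, pos), 1) in the example. */
int flag_is_raised(int flags, int pos)
{
    return (flags >> pos) & 1;
}
```

With these assumed positions, a flags value of 5 (binary 00101) would decode as inexact and underflow raised, which is the combination the underflow example is expected to produce.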
103. ction (Continued)

Description:

    Discusses issues relating to input/output, libraries, program analysis, debugging, and performance.
    Provides information on command-line options and how to use the compilers.
    Provides a complete language reference.
    Describes the intrinsic INTERVAL data type supported by the Fortran 95 compiler.
    Describes how to use the Sun WorkShop TeamWare code management tools.
    Describes how to use Visual to create C++ and Java graphical user interfaces.
    Discusses the optimized library of subroutines and functions used to perform computational linear algebra and fast Fourier transforms.
    Describes how to use the Sun-specific features of the Sun Performance Library, which is a collection of subroutines and functions used to solve linear algebra problems.
    Describes issues regarding the numerical accuracy of floating-point computations.
    Provides details on the Standard C++ Library.
    Describes how to use the Standard C++ Library.
    Provides details on the Tools.h++ class library.
    Discusses use of the C++ classes for enhancing the efficiency of your programs.

Document Title:

    Fortran Programming Guide
    Fortran User's Guide
    FORTRAN 77 Language Reference
    Interval Arithmetic Programming Reference
    Sun WorkShop TeamWare 6 User's Guide
    Sun WorkShop Visual User's Guide
    Sun Performance Library Reference
    Sun Performance Library User's Guide
    Numerical Computation Gui
104. ddl p 0 return end

Fortran calls C:

          real ADD1, R, S
          external ADD1
          R = 8.0
          S = ADD1( R )

    float add1_( pf )
    float *pf;
    {
        float f;
        f = *pf;
        f++;
        return ( f );
    }

Returning COMPLEX Data

A Fortran function returning COMPLEX or DOUBLE COMPLEX on SPARC V8 platforms is equivalent to a C function with an additional first argument that points to the return value in memory. The general pattern for the Fortran function and its corresponding C function is:

    Fortran function:   COMPLEX FUNCTION CF( a1, a2, ..., an )
    C function:         cf_( return, a1, a2, ..., an )
                        struct { float r, i; } *return;

TABLE 11-14 Function Returning COMPLEX Data

C calls Fortran:

    struct complex { float r, i; };
    struct complex c1, c2;
    struct complex *u = &c1, *v = &c2;
    extern retfpx_();
    u -> r = 7.0;
    u -> i = 8.0;
    retfpx_( v, u );

          COMPLEX FUNCTION RETFPX(Z)
          COMPLEX Z
          RETFPX = Z + (1.0, 1.0)
          RETURN
          END

Fortran calls C:

          COMPLEX U, V, RETCPX
          EXTERNAL RETCPX
          U = ( 7.0, 8.0 )
          V = RETCPX(U)

    struct complex { float r, i; };
    void retcpx_( temp, w )
    struct complex *temp, *w;
    {
        temp->r = w->r + 1.0;
        temp->i = w->i + 1.0;
        return;
    }

In 64-bit environments, and compiling with -xarch=v9, COMPLEX values are returned in floating-point registers: COMPLEX and DOUBLE COMPLEX in %f0 and %f1, and COMPLEX*32 in %f0, %f1, %f2, and %f3. These registers are not directly
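The SPARC V8 convention described under Returning COMPLEX Data, an extra first argument that points to the result, can be tried out entirely in C. The sketch below is an illustration under that assumption: cplx_conj_ and call_conj are hypothetical names, struct fcomplex is laid out like the struct in TABLE 11-14, and ANSI prototypes are used instead of the older declaration style in the table.

```c
/* Same layout as the struct used in TABLE 11-14 (two floats). */
struct fcomplex { float r, i; };

/* Hypothetical C routine written so that, under the SPARC V8 convention
   described above, Fortran could use it as a COMPLEX function of one
   COMPLEX argument. The hidden first argument receives the result. */
void cplx_conj_(struct fcomplex *result, struct fcomplex *z)
{
    result->r =  z->r;
    result->i = -z->i;
}

/* Calling it from C the same way: pass the result slot first,
   then the actual argument by reference. */
struct fcomplex call_conj(float re, float im)
{
    struct fcomplex in, out;

    in.r = re;
    in.i = im;
    cplx_conj_(&out, &in);
    return out;
}
```

On the Fortran side the routine would be declared as a COMPLEX function and called with a single argument; the compiler supplies the result pointer. As the text notes, this pattern applies to V8 only.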
105. de

    Standard C++ Class Library Reference
    Standard C++ Library User's Guide
    Tools.h++ Class Library Reference
    Tools.h++ User's Guide

TABLE P-3

Document Collection:

    Forte TeamWare 6 / Sun WorkShop TeamWare 6
    Forte Developer 6 / Sun WorkShop Visual 6
    Forte / Sun Performance Library 6
    Numerical Computation Guide
    Standard Library 2
    Tools.h++ 7

TABLE P-4 describes related Solaris documentation available through the docs.sun.com Web site.

TABLE P-4 Related Solaris Documentation

    Document Collection:  Solaris Software Developer

    Document Title:       Linker and Libraries Guide
    Description:          Describes the operations of the Solaris link editor and
                          runtime linker and the objects on which they operate.

    Document Title:       Programming Utilities Guide
    Description:          Provides information for developers about the special
                          built-in programming tools that are available in the
                          Solaris operating environment.

Introduction

The Sun Fortran compilers, f77 and f95, described in this book (and the companion book, Sun Fortran User's Guide) are available under the Solaris operating environment on the various hardware platforms that Solaris supports. The compilers themselves conform to published Fortran language standards, and provide many extended features, including multiprocessor parallelization, sophisticated optimized code compilation, and mixed
106. determine the current working directory:

          CHARACTER*128 FN, FULLNAME
          PRINT*, 'ENTER FILE NAME:'
          READ *, FN
          PRINT *, 'PATH IS: ', FULLNAME( FN )
          END

          CHARACTER*128 FUNCTION FULLNAME( NAME )
          CHARACTER NAME*(*), PREFIX*128
    C     This assumes C shell.
    C     Leave absolute path names unchanged.
    C     If name starts with '~/', replace tilde with home
    C     directory; otherwise prefix relative path name with
    C     path to current directory.
          IF ( NAME(1:1) .EQ. '/' ) THEN
              FULLNAME = NAME
          ELSE IF ( NAME(1:2) .EQ. '~/' ) THEN
              CALL GETENV( 'HOME', PREFIX )
              FULLNAME = PREFIX(:LNBLNK(PREFIX)) //
         &               NAME(2:LNBLNK(NAME))
          ELSE
              CALL GETCWD( PREFIX )
              FULLNAME = PREFIX(:LNBLNK(PREFIX)) //
         &               '/' // NAME(:LNBLNK(NAME))
          ENDIF
          RETURN
          END

Compiling and running GetFilNam.f results in:

    demo% pwd
    /home/users/auser/subdir
    demo% f77 -silent -o getfil GetFilNam.f
    demo% getfil
    anyfile
    /home/users/auser/subdir/anyfile
    demo%

Opening Files Without a Name

The OPEN statement need not specify a name; the runtime system supplies a file name according to several conventions.

Opened as Scratch

Specifying STATUS='SCRATCH' in the OPEN statement opens a file with a name of
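For comparison, the three cases handled by FULLNAME (absolute name, name starting with a tilde, and relative name) look like this in C. This is a sketch rather than a translation of the library calls: full_name is a hypothetical helper, and the home and current directories are passed in by the caller (for example from getenv("HOME") and getcwd()) so that the path logic can be exercised on its own.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build a full path for name into out.
   home and cwd are supplied by the caller. Returns 0 on success,
   or -1 if the result does not fit in out. */
int full_name(const char *name, const char *home, const char *cwd,
              char *out, size_t outsz)
{
    int n;

    if (name[0] == '/')                          /* absolute: unchanged  */
        n = snprintf(out, outsz, "%s", name);
    else if (name[0] == '~' && name[1] == '/')   /* replace the tilde    */
        n = snprintf(out, outsz, "%s%s", home, name + 1);
    else                                         /* relative: prefix cwd */
        n = snprintf(out, outsz, "%s/%s", cwd, name);

    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Unlike the Fortran version, no trailing-blank trimming (LNBLNK) is needed here, because C strings carry their own terminator.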
107. ds, files, and directories; on-screen computer output
What you type, when contrasted with on-screen computer output
Book titles, new words or terms, words to be emphasized
Command-line placeholder text; replace with a real name or value

TABLE P-1

Typeface: AaBbCc123, AaBbCc123, AaBbCc123, AaBbCc123

- The symbol Δ stands for a blank space where a blank is significant:

      ΔΔ36.001

- FORTRAN 77 examples appear in tab format, while Fortran 95 examples appear in free format. Examples common to both Fortran 77 and 95 use tab format except where indicated.
- The FORTRAN 77 standard uses an older convention of spelling the name "FORTRAN" capitalized. Sun documentation uses both FORTRAN and Fortran. The current convention is to use lower case: "Fortran 95".
- References to online man pages appear with the topic name and section number. For example, a reference to GETENV will appear as getenv(3F), implying that the man command to access this page would be: man -s 3F getenv
- System Administrators may install the Sun WorkShop Fortran compilers and supporting material at <install_point>/SUNWspro/ where <install_point> is usually /opt for a standard install. This is the location assumed in this book.

Shell Prompts

TABLE P-2 shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.

TABLE P-2 Shell Prompts

Shell Promp
108. e

    Directive      Compared With Sun Style
    DOALL          different set of qualifiers and scheduling
    TASKCOMMON     same as Sun style
    DOSERIAL       same as Sun style
    DOSERIAL*      same as Sun style

Cray DOALL Qualifiers

For Cray-style DOALL, the PRIVATE qualifier is required. Each variable within the DO loop must be qualified as private or shared, and the DO loop index must always be private. The following table summarizes available Cray-style qualifiers.

TABLE 10-7 DOALL Qualifiers (Cray Style)

    Qualifier              Assertion
    SHARED(v1, v2, ...)    Share the variables v1, v2, ... between iterations.
    PRIVATE(x1, x2, ...)   Do not share the variables x1, x2, ... between
                           iterations. That is, each thread has its own
                           private copy of these variables.
    SAVELAST               Save the last DO iteration values of all private
                           variables in the loop.
    MAXCPUS(n)             Use no more than n threads.

For Cray-style directives, the DOALL directive allows a single scheduling qualifier, for example, !MIC$& CHUNKSIZE(100). TABLE 10-8 shows the Cray-style DOALL directive scheduling qualifiers:

TABLE 10-8 DOALL Cray Scheduling

    Qualifier    Assertion
    GUIDED       Distribute the iterations by use of guided self-scheduling.
                 This distribution minimizes synchronization overhead, with
                 acceptable dynamic load balancing. The default chunk size
                 is 64. GUIDED is equivalent to Sun-style GSS(64).
    SINGLE       Dis
109. e Material common to both FORTRAN 77 and Fortran 95 is presented in examples that use FORTRAN 77.

CHAPTER 11

Compatibility Issues

Most C-Fortran interfaces must agree in all of these aspects:

- Function/subroutine: definition and call
- Data types: compatibility of types
- Arguments: passing by reference or value
- Arguments: order
- Procedure name: uppercase and lowercase and trailing underscore (_)
- Libraries: telling the linker to use Fortran libraries

Some C-Fortran interfaces must also agree on:

- Arrays: indexing and order
- File descriptors and stdio
- File permissions

Function or Subroutine?

The word "function" has different meanings in C and Fortran. Depending on the situation, the choice is important:

- In C, all subprograms are functions; however, some may return a null (void) value.
- In Fortran, a function passes a return value, but a subroutine does not.

When a Fortran routine calls a C function:

- If the called C function returns a value, call it from Fortran as a function.
- If the called C function does not return a value, call it as a subroutine.

When a C function calls a Fortran subprogram:

- If the called Fortran subprogram is a function, call it from C as a function that returns a compatible data type.
- If the called Fortran subprogram is a subroutine, call it from C as a function that returns a value of int (compatible to Fortran INTEGER*4) or void. A value is returned if t
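The function/subroutine distinction above is easiest to see from the C side. The two routines below are a sketch: twice_ and bump_ are hypothetical names, written with the trailing underscore and by-reference arguments listed among the compatibility aspects, and the Fortran statements in the comments show how each would be used.

```c
/* Hypothetical C routines laid out the way a Fortran caller would
   expect them: trailing underscore, arguments passed by reference. */

/* Returns a value, so Fortran would use it as a function:
       REAL TWICE
       Y = TWICE(X)                                          */
float twice_(float *x)
{
    return 2.0f * *x;
}

/* Returns nothing, so Fortran would call it as a subroutine:
       CALL BUMP(N)                                          */
void bump_(int *n)
{
    *n += 1;
}
```

Calling a value-returning C function with CALL, or using a void function in an expression, compiles on the Fortran side but leaves the result undefined, which is why the text asks for the choice to be made deliberately.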
110. e

- ieee_values(3m)
- The floatingpoint.h and f77_floatingpoint.h header files

Exception Handlers and ieee_handler()

Typical concerns about IEEE exceptions are:

- What happens when an exception occurs?
- How do I use ieee_handler() to establish a user function as an exception handler?
- How do I write a function that can be used as an exception handler?
- How do I locate the exception (where did it occur)?

Exception trapping to a user routine begins with the system generating a signal on a floating-point exception. The standard UNIX name for signal: floating-point exception is SIGFPE. The default situation on SPARC and x86 platforms is not to generate a SIGFPE when an exception occurs. For the system to generate a SIGFPE, exception trapping must first be enabled, usually by a call to ieee_handler().

Establishing an Exception Handler Function

To establish a function as an exception handler, pass the name of the function to ieee_handler(), together with the name of the exception to watch for and the action to take. Once you establish a handler, a SIGFPE signal is generated whenever the particular floating-point exception occurs, and the specified function is called.

The form for invoking ieee_handler() is shown in the following table:

TABLE 6-4 Arguments for ieee_handler(action, exception, handler)

    Argument    Type         Possible Values
    action      character    get, set, or clear
    e
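The signal side of this mechanism can be illustrated without ieee_handler(), using only the standard C signal() and raise() functions. The sketch below is not the Sun interface and does not enable floating-point trapping; raise(SIGFPE) simply stands in for a trapped exception so that the handler can be seen to run. The names on_fpe and install_and_trigger are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t fpe_seen = 0;

/* Handler: only records that SIGFPE was delivered. */
static void on_fpe(int sig)
{
    (void)sig;
    fpe_seen = 1;
}

/* Install the handler, deliver SIGFPE with raise(), and report
   whether the handler ran (1), did not run (0), or could not be
   installed or triggered (-1). */
int install_and_trigger(void)
{
    fpe_seen = 0;
    if (signal(SIGFPE, on_fpe) == SIG_ERR)
        return -1;
    if (raise(SIGFPE) != 0)
        return -1;
    return fpe_seen;
}
```

With ieee_handler(), the same kind of user function is registered for a named exception, and the system generates the SIGFPE itself when that exception occurs.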
111. e when compiling with -parallel, the j loop will not be parallelized by the compiler, but the i or k loop may be.

DOSERIAL* Directive

The DOSERIAL* directive disables parallelization of the specified nest of loops. This directive applies to the whole nest of loops immediately following it.

Example: Exclude a whole nest of loops from parallelization:

          do i = 1, n
    C$PAR DOSERIAL*
             do j = 1, n
                do k = 1, n
                   ...
                end do
             end do
          end do

In the example, when compiling with -parallel, the j and k loops will not be parallelized by the compiler, but the i loop may be.

Interaction Between DOSERIAL* and DOALL

If both DOSERIAL* and DOALL are specified for the same loop, the last one prevails.

Example: Specifying both DOSERIAL* and DOALL:

    C$PAR DOSERIAL*
          do i = 1, m
    C$PAR DOALL
             do j = 1, n
                ...
             end do
          end do

In the example, the i loop is not parallelized, but the j loop is.

Also, the scope of the DOSERIAL* directive does not extend beyond the textual loop nest immediately following it. The directive is limited to the same function or subroutine that it appears in.

Example: DOSERIAL* does not extend to a loop in a called subroutine:

          program caller
          common /block/ a(10,10)
    C$PAR DOSERIAL*
          do i = 1, 10
             call callee(i)
          end do
          end

          subroutine callee(k)
          common /block/ a(10,10)
          do j = 1, 10
             a(j,k)
112. e conservative assumptions are made. Adding -fsimple=2 enables the optimizer to make further simplifications with the understanding that this might cause some programs to produce slightly different results due to rounding effects. If -fsimple level 1 or 2 is used, all program units should be similarly compiled to ensure consistent numerical accuracy.

-unroll=n

Unrolling short loops with long iteration counts can be profitable for some routines. However, unrolling can also increase program size and might even degrade performance of other loops. With n=1, the default, no loops are unrolled automatically by the optimizer. With n greater than 1, the optimizer attempts to unroll loops up to a depth of n.

The compiler's code generator makes its decision to unroll loops depending on a number of factors. The compiler might decline to unroll a loop even though this option is specified with n>1.

If a DO loop with a variable loop limit can be unrolled, both an unrolled version and the original loop are compiled. A runtime test on iteration count determines whether or not executing the unrolled loop is appropriate. Loop unrolling, especially with simple one or two statement loops, increases the amount of computation done per iteration and provides the optimizer with better opportunities to schedule registers and simplify operations. The tradeoff between number of iterations, loop complexity, and choice of unro
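To make the transformation concrete, here is a loop unrolled by hand to a depth of four, written in C. This is a sketch of the general technique, not the code the compiler generates; sum_unrolled4 is a hypothetical name. The second loop handles iteration counts that are not a multiple of four, which plays a role similar to the runtime test on iteration count mentioned above.

```c
#include <stddef.h>

/* Sum n doubles with the loop body unrolled four times by hand. */
double sum_unrolled4(const double *a, size_t n)
{
    double s = 0.0;
    size_t i = 0;

    /* Unrolled part: four elements per iteration, so the loop
       test and increment are paid once per four additions. */
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Cleanup loop: the 0 to 3 elements left over. */
    for (; i < n; i++)
        s += a[i];

    return s;
}
```

The unrolled version is longer than the original one-line loop, which is the program-size cost the text warns about. Note also that grouping the additions four at a time can change floating-point rounding slightly compared with a strictly sequential sum.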
113. efly describes the features of the compilers.

Chapter 2, Fortran Input/Output, discusses how to use I/O efficiently.
Chapter 3, Program Development, demonstrates how program management tools like SCCS, make, and Teamware can be helpful.
Chapter 4, Libraries, explains use and creation of software libraries.
Chapter 5, Program Analysis and Debugging, describes use of dbx and other analysis tools.
Chapter 6, Floating-Point Arithmetic, introduces important issues regarding numerical computation accuracy.
Chapter 7, Porting, considers porting programs to Sun compilers.
Chapter 8, Performance Profiling, describes techniques for performance measurement.
Chapter 9, Performance and Optimization, indicates ways to improve execution performance of Fortran programs.
Chapter 10, SPARC: Parallelization, explains the multiprocessing features of the compilers.
Chapter 11, C-Fortran Interface, describes how C and Fortran routines can call each other and pass data.

Typographic Conventions

TABLE P-1 shows the typographic conventions that are used in Sun WorkShop documentation.

Examples:

    Edit your .login file.
    Use ls -a to list all files.
    % You have mail.
    % su
    Password:
    Read Chapter 6 in the User's Guide.
    These are called class options.
    You must be superuser to do this.
    To delete a file, type rm filename.

TABLE P-1 Typographic Conventions

Meaning:

    The names of comman
114. em the maximum number of threads the program can create. The default is 1. In general, set the PARALLEL or OMP_NUM_THREADS variable to the available number of processors on the target platform.

SUNW_MP_THR_IDLE

Use the SUNW_MP_THR_IDLE environment variable to control the end-of-task status of each thread executing the parallel part of a program. You can set the value of this variable to spin, sleep ns, or sleep nms. The default is spin, which means a thread spin-waits when it finishes its share of a parallel task, until a new parallel task arrives. The other choice puts the thread to sleep after spin-waiting for n seconds (ns) or n milliseconds (nms). If a new task arrives before this wait time, the thread stops spinning and starts the new task.

    demo% setenv SUNW_MP_THR_IDLE 50ms
    demo% setenv PARALLEL 4
    demo% myprog

In this example, at most four threads are created by the program. After finishing a parallel task, a thread spins for 50 ms. If within that time a new task has arrived for the thread, it executes it. Otherwise, the thread goes to sleep until a new task arrives.

Debugging Parallelized Programs

Debugging parallelized programs requires some extra effort. The following schemes suggest ways to approach this task.

First Steps at Debugging

There are some steps you can try immediately to determine the cause of errors.

- Turn off parallelization. You can do one of the following:
  - Tur
116. es are discussed.

CHAPTER 4

Understanding Libraries

A software library is usually a set of subprograms that have been previously compiled and organized into a single binary library file. Each member of the set is called a library element or module. The linker searches the library files, loading object modules referenced by the user program while building the executable binary program. See ld(1) and the Solaris Linker and Libraries Guide for details.

There are two basic kinds of software libraries:

- Static library. A library in which modules are bound into the executable file before execution. Static libraries are commonly named libname.a. The .a suffix refers to archive.
- Dynamic library. A library in which modules can be bound into the executable program at runtime. Dynamic libraries are commonly named libname.so. The .so suffix refers to shared object.

Typical system libraries that have both static (.a) and dynamic (.so) versions are:

- Fortran 77 libraries: libF77, libM77
- Fortran 95 libraries: libfsu, libfui, libfai, libfai2, libfsunai, libfprodai, libfminlai, libfmaxlai, libminvai, libmaxvai, libf77compat
- VMS Fortran libraries: libV77
- C libraries: libc

There are two advantages to the use of libraries:

- There is no need to have source code for the library routines that a program calls.
- Only the needed modules are loaded.

Library files provide an easy way for programs to share commonly used s
117. exceptions on program termination. In general, a message results if any one of the invalid, division-by-zero, or overflow exceptions have occurred. Inexact exceptions do not generate messages because they occur so frequently in real programs.

f95 programs do not automatically report on exceptions at program termination. An explicit call to ieee_retrospective(3M) is required.

You can turn off any or all of these messages with ieee_flags() by clearing exception status flags. Do this at the end of your program.

Handling Exceptions

Exception handling according to the IEEE standard is the default on SPARC and x86 processors. However, there is a difference between detecting a floating-point exception and generating a signal for a floating-point exception (SIGFPE).

Following the IEEE standard, two things happen when an untrapped exception occurs during a floating-point operation:

- The system returns a default result. For example, on 0/0 (invalid), the system returns NaN as the result.
- A flag is set to indicate that an exception is raised. For example, on 0/0 (invalid), the system sets the "invalid operation" flag.

Trapping a Floating-Point Exception

f77 and f95 differ significantly in the way they handle floating-point exceptions.

With f77, the default on SPARC and x86 systems is not to automatically generate a signal to interrupt the running program for a floating-point exception. T
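The first of the two effects listed under Handling Exceptions, the default result, is easy to observe. The C sketch below assumes an IEEE 754 environment with trapping left disabled, and it does not inspect the exception flag; zero_over_zero and is_nan are hypothetical helper names.

```c
/* Compute 0.0/0.0 at run time. volatile keeps the compiler from
   folding the division away at compile time. */
double zero_over_zero(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan(double x)
{
    return x != x;
}
```

Under those assumptions the program is not interrupted: the division quietly yields NaN, and execution continues with that value.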
118. executed are marked with ##### -> to the left of the statement.

Here is a simple example:

    demo% f77 -a -o onetwo -silent one.f two.f
    demo% onetwo
    ... output from program
    demo% tcov one.f two.f
    demo% cat one.tcov two.tcov
                       program one
          1 ->         do i=1,10
         10 ->            call two(i)
                       end do
          1 ->         end
                     Top 10 Blocks
                     Line       Count
                        3          10
                        2           1
                        5           1
               3     Basic blocks in this file
               3     Basic blocks executed
          100.00     Percent of the file executed
              12     Total basic block executions
            4.00     Average executions per basic block

                       subroutine two(i)
         10 ->         print*, "two called", i
                       return
                       end
                     Top 10 Blocks
                     Line       Count
                        2          10
               1     Basic blocks in this file
               1     Basic blocks executed
          100.00     Percent of the file executed
              10     Total basic block executions
           10.00     Average executions per basic block
    demo%

"New Style" Enhanced tcov Analysis

To use new style tcov, compile with -xprofile=tcov. When the program is run, coverage data is stored in program.profile/tcovd, where program is the name of the executable file. (If the executable were a.out, a.out.profile/tcovd would be created.)

Run tcov -x dirname source_files to create the coverage analysis merged with each source file. The report is written to file.tcov in the current directory.

Running a simple example:

    demo% f77 -o onetwo -silent -xprofile=tcov one.f two.f
    demo% onetwo
    ... output from program
    demo% tcov
119. from the library any elements whose entry points are referenced within the program it is linking, such as a subprogram, entry name, or COMMON block initialized in a BLOCKDATA subprogram. These extracted elements (routines) are bound permanently into the a.out executable file generated by the linker.

Tradeoffs for Static Libraries

There are three main issues to keep in mind regarding static, as compared to dynamic, libraries and linking:

- Static libraries are more self-contained but less adaptable.

  If you bind an a.out executable file statically, the library routines it needs become part of the executable binary. However, if it becomes necessary to update a static library routine bound into the a.out executable, the entire a.out file must be relinked and regenerated to take advantage of the updated library. With dynamic libraries, the library is not part of the a.out file and linking is done at runtime. To take advantage of an updated dynamic library, all that is required is that the new library be installed on the system.

- The "elements" in a static library are individual compilation units, .o files.

  Since a single compilation unit (a source file) can contain more than one subprogram, these routines when compiled together become a single module in the static library. This means that all the routines in the compilation unit are loaded together into the a.out executable, even though only one of those subprog
120. However, users should be aware that using -dalign (and therefore -fast) may cause problems with some programs that have been coded expecting a specific alignment of data in COMMON blocks. With -dalign, the compiler may add padding to ensure that all double (and quad) precision data (either REAL or COMPLEX) are aligned on double-word boundaries, with the result that:

- COMMON blocks might be larger than expected due to added padding.
- All program units sharing COMMON must be compiled with -dalign if any one of them is compiled with -dalign.

For example, a program that writes data by aliasing an entire COMMON block of mixed data types as a single array might not work properly with -dalign because the block will be larger (due to padding of double and quad precision variables) than the program expects.

-depend

Adding -depend to optimization levels -O3 and higher (on the SPARC platform) extends the compiler's ability to optimize DO loops and loop nests. With this option, the optimizer analyzes inter-iteration loop dependencies to determine whether or not certain transformations of the loop structure can be performed. Only loops without dependencies can be restructured. However, the added analysis might increase compilation time.

-fsimple=2

Unless directed to, the compiler does not attempt to simplify floating-point computations (the default is -fsimple=0). With the -fast option, -fsimple=1 is used and som
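The padding effect described for -dalign has a close analogue in C, where the compiler may pad a struct so that a double member lands on a suitably aligned address. The sketch below is only an analogy: it shows C struct layout, not the layout of a Fortran COMMON block, and struct mixed, padding_bytes, and value_offset are hypothetical names.

```c
#include <stddef.h>

/* A record of mixed types, loosely analogous to a COMMON block that
   holds an INTEGER followed by a DOUBLE PRECISION value. */
struct mixed {
    int    count;
    double value;
};

/* Bytes of padding the C compiler added to the record: total size
   minus the sizes of the members themselves. */
size_t padding_bytes(void)
{
    return sizeof(struct mixed) - (sizeof(int) + sizeof(double));
}

/* Byte offset at which the double actually starts. */
size_t value_offset(void)
{
    return offsetof(struct mixed, value);
}
```

On many platforms where a double is aligned on an 8-byte boundary, value_offset() is 8 and padding_bytes() is 4, although neither value is guaranteed. Code that treats the whole record as a flat array of 4-byte words would then see an extra, unexpected word, which is the kind of surprise the text describes for aliased COMMON blocks.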
121. ght call but that are called from no other routine in the library. Also, any helper routines called from more than one library routine are gathered together into a single source file. This gives a reasonably well-organized set of source and object files.

Assume that the name of each source file is taken from the name of the first routine in the file, which in most cases is one of the principal files in the library:

    demo% cd test_lib
    demo% ls
    total 14
       2 dropx.f     2 evalx.f     2 markx.f
       2 delte.f     2 etc.f       2 linkz.f     2 point.f

The lower-level "helper" routines are gathered together into the file etc.f. The other files can contain one or more subprograms.

First, compile each of the library source files, using the -c option, to generate the corresponding relocatable .o files:

    demo% f77 -c *.f
    delte.f:
            delte:
            q_fixx:
    dropx.f:
            dropx:
    etc.f:
            q_fill:
            q_step:
            q_node:
            q_warn:
    ...etc
    demo% ls
    total 42
       2 dropx.f     4 etc.o       2 linkz.f     4 markx.o
       2 delte.f     4 dropx.o     2 evalx.f     4 linkz.o     2 point.f
       4 delte.o     2 etc.f       4 evalx.o     2 markx.f     4 point.o
    demo%

Now, create the static library testlib.a using ar:

    demo% ar cr testlib.a *.o

To use this library, either include the library file on the compilation command or use the -l and -L compilation options. The example uses the .a file directly:

    demo% cat trylib.f
    C     program to test testlib routines
          x = 21.998
          call evalx(x)
122. h subprogram that contains such a reference. The conventional usage is:
          EXTERNAL ABC, XYZ
    C$PRAGMA C(ABC, XYZ)
    If you use this pragma, the C function does not need an underscore appended to the function name. (Pragma directives are described in the Fortran User's Guide.)
    Argument-Passing by Reference or Value
    In general, Fortran routines pass arguments by reference. In a call, if you enclose an argument with the f77 and f95 nonstandard function %VAL(), the calling routine passes it by value. In general, C passes arguments by value. If you precede an argument by the ampersand operator (&), C passes the argument by reference using a pointer. C always passes arrays and character strings by reference.
    Argument Order
    Except for arguments that are character strings, Fortran and C pass arguments in the same order. However, for every argument of character type, the Fortran routine passes an additional argument giving the length of the string. These are long int quantities in C, passed by value. The order of arguments is:
    - Address for each argument (datum or function)
    - A long int for each character argument (the whole list of string lengths comes after the whole list of other arguments)
    Example:
    This Fortran code fragment:      Is equivalent to this in C:
    CHARACTER*7 S                    char s[7];
    INTEGER B(3)                     int b[3];
    ...                              ...
    CALL SAM( S, B(2) )              sam_( s, &b[1], 7L );
    Arr
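To make the argument order above concrete, here is a minimal sketch of what the C side of `CALL SAM(S, B(2))` might look like. The body of `sam_` is purely hypothetical (the guide shows only the call); the point is the signature: addresses first, then the string length as a trailing long passed by value. It also assumes the usual Fortran convention that character data is blank-padded rather than NUL-terminated.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical C routine callable from Fortran as  CALL SAM(S, B(2)).
 * s     : address of the CHARACTER*7 variable (no terminating NUL)
 * b     : address of the integer argument, here B(2)
 * s_len : length of s, appended by the Fortran caller, passed by value
 */
void sam_(char *s, int *b, long s_len)
{
    assert(s != NULL && b != NULL && s_len >= 0);
    memset(s, ' ', (size_t)s_len);   /* blank-fill; never write s[s_len] */
    if (s_len >= 2)
        memcpy(s, "ok", 2);
    *b = (int)s_len;                 /* stores through the reference to B(2) */
}
```

A C caller imitating the Fortran call passes the same three arguments: `sam_(s, &b[1], 7L);`.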
123. he assumptions are that signals could degrade performance and that most exceptions are not significant as long as expected values are returned. The default with f95 is to automatically trap on division by zero, overflow, and invalid operation. The f77 and f95 command-line option -ftrap can be used to change the default. In terms of -ftrap, the default for f77 is -ftrap=%none. The default for f95 is -ftrap=common. To enable exception trapping, compile the main program with one of the -ftrap options, for example: -ftrap=common.
    SPARC: Nonstandard Arithmetic
    One aspect of standard IEEE arithmetic, called gradual underflow, can be manually disabled. When disabled, the program is considered to be running with nonstandard arithmetic. The IEEE standard for arithmetic specifies a way of handling underflowed results gradually by dynamically adjusting the radix point of the significand. In IEEE floating-point format, the radix point occurs before the significand, and there is an implicit leading bit of 1. Gradual underflow allows the implicit leading bit to be cleared to 0 and shifts the radix point into the significand when the result of a floating-point computation would otherwise underflow. With a SPARC processor this result is not accomplished in hardware but in software. If your program generates many underflows (perhaps a sign of a problem with your algorithm) and you run on a SPARC proce
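Gradual underflow can be observed directly in any IEEE double-precision environment. The C sketch below is an illustration only and assumes the default (standard) arithmetic mode with no flush-to-zero option in effect; `is_subnormal` and `underflow_by_halving` are hypothetical helper names. Halving the smallest normalized value yields a nonzero subnormal result instead of zero, and only after the significand bits are exhausted does the result reach zero.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 if x is a nonzero value smaller in magnitude than DBL_MIN. */
int is_subnormal(double x)
{
    return x != 0.0 && x > -DBL_MIN && x < DBL_MIN;
}

/* Halve the smallest normalized double the given number of times. */
double underflow_by_halving(int steps)
{
    volatile double x = DBL_MIN;   /* volatile: keep the divisions at run time */
    assert(steps >= 0);
    for (int i = 0; i < steps; i++)
        x = x / 2.0;
    return x;
}
```

With abrupt underflow (nonstandard arithmetic) the first halving would already return zero, which is the behavioral difference the text describes.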
124. he Fortran subroutine uses alternate returns, in which case it is the value of the expression on the RETURN statement. If no expression appears on the RETURN statement, and alternate returns are declared on the SUBROUTINE statement, a zero is returned.
    Data Type Compatibility
    The tables below summarize the data sizes and default alignments for FORTRAN 77 and Fortran 95 data types. In both tables, note the following:
    - C data types int, long int, and long are equivalent (4 bytes). In a 64-bit environment and compiling with -xarch=v9 or v9a, long and pointers are 8 bytes. This is referred to as LP64.
    - REAL*16 and COMPLEX*32 (REAL(KIND=16) and COMPLEX(KIND=16)) are available only on SPARC platforms. In a 64-bit environment and compiling with -xarch=v9 or v9a, alignment is on 16-byte boundaries.
    - Alignments marked 4/8 for SPARC indicate that alignment is 8 bytes by default, but on 4-byte boundaries in COMMON blocks. The maximum alignment in COMMON is 4 bytes.
    - The elements and fields of arrays and structures must be compatible.
    - You cannot pass arrays, character strings, or structures by value.
    - You can pass arguments by value from f77 to C, but not from C to f77, since %VAL() is not allowed in a Fortran dummy argument list.
    Data Sizes and Alignments (in Bytes), Pass by Reference (f77 and cc). Default Alignment: SPARC, x86
125. he variables x and i are STOREBACK variables, even though both variables are private to the i loop. The value of i after the loop is n+1, while the value of x is whatever value it had at the end of the last iteration. There are some potential problems for STOREBACK to be aware of. The STOREBACK operation occurs at the last iteration of the explicitly parallelized loop, even if this is not the same iteration that last updates the value of the STOREBACK variable or array.
    Example: STOREBACK variable potentially different from the serial version:
    C$PAR DOALL PRIVATE(x), STOREBACK(x)
          do i = 1, n
             if (...) then
                x = ...
             end if
          end do
          print *, x
    In the preceding example, the value of the STOREBACK variable x that is printed out might not be the same as that printed out by a serial version of the i loop. In the explicitly parallelized case, the processor that processes the last iteration of the i loop (when i = n) and performs the STOREBACK operation for x might not be the same processor that currently contains the last updated value of x. The compiler issues a warning message about these potential problems.
    SAVELAST
    The SAVELAST qualifier specifies that all private scalars and arrays are STOREBACK variables for the DOALL loop.
    Example: Specify SAVELAST:
    C$PAR DOALL PRIVATE(x,y), SAVELAST
          do i = 1, n
             x = ...
             y = ...
          end do
          ... = i
          ... = x
126. help
    Items within [ ] are optional. Items within < > are variable parameters. Bar | indicates choice of literal values. For example: "-someoption[=<yes|no>]" implies -someoption is -someoption=yes
    -a: Collect data for tcov basic block profiling (old format)
    -ansi: Report non-ANSI extensions
    -arg=local: Preserve actual arguments over ENTRY statements
    -autopar: Enable automatic loop parallelization (requires WorkShop license)
    -Bdynamic: Allow dynamic linking
    -Bstatic: Require static linking
    -c: Compile only; produce .o files, suppress linking
    -C: Enable runtime subscript range checking
    -cg89: Generate code for generic SPARC V7 architecture
    -cg92: Generate code for SPARC V8 architecture
    -copyargs: Allow assignment to constant arguments
    Fortran Input/Output
    This chapter discusses the input/output features provided by Sun Fortran compilers.
    Accessing Files From Within Fortran Programs
    Data is transferred between the program and devices or files through a Fortran logical unit. Logical units are identified in an I/O statement by a logical unit number, a nonnegative integer from 0 to the maximum 4-byte integer value (2,147,483,647). The character * can appear as a logical unit identifier. The asterisk stands for standard input file when it appears in a READ statement; it stands for standard output file when it appears in a WRITE o
127. hem, rather than splitting each routine into its own file; this allows the analysis to be global.
    -fast
    This single option selects a number of performance options that, working together, produce object code optimized for execution speed without an excessive increase in compilation time. The options selected by -fast are subject to change from one release to another, and not all are available on each platform:
    - -native generates code optimized for the host architecture.
    - -O5 sets optimization level.
    - -libmil inlines calls to some simple library functions.
    - -fsimple=2 simplifies floating-point code.
    - -dalign uses faster, double-word loads and stores.
    - -xlibmopt uses the optimized libm math library.
    - -fns selects non-standard floating-point mode.
    - -ftrap=%none turns off all trapping (for f77), or -ftrap=common selects common floating-point trapping (for f95).
    - -depend analyzes loops for data dependencies.
    - -pad=common improves cache performance.
    - -xvector=yes invokes vectorized library functions in loops.
    -fast provides a quick way to engage much of the optimizing power of the compilers. Each of the composite options may be specified individually, and each may have side effects to be aware of (discussed in the Fortran User's Guide). Following -fast with additional options adds further optimizations. For example:
    f95 -fast -xarch=v9a ...
    compiles for an UltraSPARC 64-bit enabled Solaris platfor
128. high-quality products you have come to expect from Sun; the only thing that has changed is the name. We believe that the Forte name blends the traditional quality and focus of Sun's core programming tools with the multi-platform, business application deployment focus of the Forte tools, such as Forte Fusion and Forte for Java. The new Forte organization delivers a complete array of tools for end-to-end application development and deployment. For users of the Sun WorkShop tools, the following is a simple mapping of the old product names in WorkShop 5.0 to the new names in Forte Developer 6.
    Old Product Name -> New Product Name
    Sun Visual WorkShop C++ -> Forte C++ Enterprise Edition 6
    Sun Visual WorkShop C++ Personal Edition -> Forte C++ Personal Edition 6
    Sun Performance WorkShop Fortran -> Forte for High Performance Computing 6
    Sun Performance WorkShop Fortran Personal Edition -> Forte Fortran Desktop Edition 6
    Sun WorkShop Professional C -> Forte C 6
    Sun WorkShop University Edition -> Forte Developer University Edition 6
    In addition to the name changes, there have been major changes to two of the products. Forte for High Performance Computing contains all the tools formerly found in Sun Performance WorkShop Fortran and now includes the C++ compiler, so High Performance Computing users need to purchase only one product for all their development needs. Forte Fortran Desktop Edition is ide
129. hmetic
      Introduction
      IEEE Floating-Point Arithmetic
      -ftrap=mode Compiler Options
      Floating-Point Exceptions and Fortran
      Handling Exceptions
      Trapping a Floating-Point Exception
      SPARC: Nonstandard Arithmetic
      IEEE Routines
      Flags and ieee_flags()
      IEEE Extreme Value Functions
      Exception Handlers and ieee_handler()
      Retrospective Summary
      Debugging IEEE Exceptions
      Further Numerical Adventures
      Avoiding Simple Underflow
      Continuing With the Wrong Answer
      SPARC: Excessive Underflow
      Interval Arithmetic
    7. Porting
      Time and Date Functions
      Formats
      Carriage-Control
      Working With Files
      Porting From Scientific Mainframes
      Data Representation
      Hollerith Data
      Nonstandard Coding Practices
      Uninitialized Variables
      Aliasing Across Calls
      Obscure Optimizations
      Troubleshooting
      Results Are Close, but Not Close Enough
      Program Fails Without Warning
    8. Performance Profiling
      Sun WorkShop Performance Analyzer
      The time Command
      Multiprocessor Interpretation of time Output
      The gprof Profiling Command
      Overhead and Other Considerations
      The tcov Profiling Command
      Old Style tcov Coverage Analysis
      New Style Enhanced tcov Analysis
      f77 I/O Profiling
    9. Performance and Optimization
      Choice of Compiler Options
      Performance Option Reference
      Other Performance Strategies
131. ight only further obscure the code and make it even more difficult for the compiler's optimizer to achieve significant performance improvements. Excessive hand-tuning of the source code can hide the original intent of the procedure and could have a significantly detrimental effect on performance for different architectures.
    Using Optimized Libraries
    In most situations, optimized commercial or shareware libraries perform standard computational procedures far more efficiently than you could by coding them by hand. For example, the Sun Performance Library is a suite of highly optimized mathematical subroutines based on the standard LAPACK, BLAS, FFTPACK, VFFTPACK, and LINPACK libraries. Performance improvement using these routines can be significant when compared with hand coding. See the Sun Performance Library User's Guide for details.
    Eliminating Performance Inhibitors
    Use the Sun WorkShop Performance Analyzer to identify the key computational parts of the program. Then, carefully analyze the loop or loop nest to eliminate coding that might either inhibit the optimizer from generating optimal code or otherwise degrade performance. Many of the nonstandard coding practices that make portability difficult might also inhibit optimization by the compiler. Reprogramming techniques that improve performance are dealt with in more detail in some of the reference books listed at the end of the chapter
132. in a few of the loops that are being parallelized. Recompile and execute. Use -loopinfo to see which loops are being parallelized. Continue this process until you start getting the correct results.
    - Use explicit parallelization. Add the C$PAR DOALL directive to a couple of the loops that are being parallelized. Compile with -explicitpar, then execute and verify the results. Use -loopinfo to see which loops are being parallelized. This method permits the addition of I/O statements to the parallelized loop. Repeat this process until you find the loop that causes the wrong results. Note: if you need -explicitpar only (without -autopar), do not compile with -explicitpar and -depend. This method is the same as compiling with -parallel, which, of course, includes -autopar.
    - Run loops backward serially. Replace DO I=1,N with DO I=N,1,-1. Different results point to data dependencies.
    - Avoid using the loop index. Replace:
          DO I=1,N
             ...
             CALL SNUBBER(I)
             ...
          ENDDO
      With:
          DO I1=1,N
             I=I1
             ...
             CALL SNUBBER(I)
             ...
          ENDDO
    Debugging Parallel Code With dbx
    To use dbx on a parallel loop, temporarily rewrite the program as follows:
    - Isolate the body of the loop in a file and subroutine of its own.
    - In the original routine, replace loop body with a call to the new subroutine.
    - Compile the new subroutine with -g and no parallelization options.
    - Compile the changed original rou
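The "run loops backward serially" check can be sketched in a few lines. The C fragment below is an illustration under stated assumptions, not part of the guide: `running_sum`, `doubled`, and `same_both_ways` are hypothetical names, and the two loop bodies are invented examples of a dependent and an independent loop. A loop whose forward and backward runs disagree has an inter-iteration dependency and is not safe to parallelize as written.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64

typedef void (*loop_fn)(double *a, int n, int backward);

/* Carries a dependency: iteration i reads what iteration i-1 wrote. */
void running_sum(double *a, int n, int backward)
{
    if (backward)
        for (int i = n - 1; i >= 1; i--) a[i] += a[i - 1];
    else
        for (int i = 1; i < n; i++) a[i] += a[i - 1];
}

/* No dependency: each iteration touches only its own element. */
void doubled(double *a, int n, int backward)
{
    if (backward)
        for (int i = n - 1; i >= 0; i--) a[i] *= 2.0;
    else
        for (int i = 0; i < n; i++) a[i] *= 2.0;
}

/* Returns 1 when forward and backward runs give identical results. */
int same_both_ways(loop_fn loop, const double *init, int n)
{
    double fwd[MAX_N], bwd[MAX_N];
    assert(n > 0 && n <= MAX_N);
    memcpy(fwd, init, (size_t)n * sizeof *init);
    memcpy(bwd, init, (size_t)n * sizeof *init);
    loop(fwd, n, 0);
    loop(bwd, n, 1);
    return memcmp(fwd, bwd, (size_t)n * sizeof *init) == 0;
}
```

Agreement between the two orders does not prove a loop is dependency-free for every input, so this is a screening test rather than a guarantee.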
133. inally developed. For similar historical reasons, an operating system derived from the UNIX operating system does not have Fortran carriage control, but you can simulate it in two ways:
    - Use the asa filter to transform Fortran carriage-control conventions into the UNIX carriage-control format (see the asa(1) man page) before printing files with the lpr command.
    - f77: For simple jobs, use OPEN(N, FORM='PRINT') to enable single or double spacing, formfeed, and stripping off of column one. It is legal to reopen unit 6 to change the form parameter to PRINT. For example: OPEN(6, FORM='PRINT'). You can use lp(1) to print a file that is opened in this manner.
    Working With Files
    Early Fortran systems did not use named files, but did provide a command-line mechanism to equate actual file names with internal unit numbers. This facility can be emulated in a number of ways, including standard UNIX redirection.
    Example: Redirecting stdin to redir.data (using csh(1)):
    demo% cat redir.data                  (The data file)
     9 9.9
    demo% cat redir.f                     (The source file)
          read(*,*) i, z                  (The program reads standard input)
          print *, i, z
          stop
          end
    demo% f77 -silent -o redir redir.f    (The compilation step)
    demo% redir < redir.data              (Run with redirection reads data file)
     9 9.90000
    demo%
    Porting From Scientific Mainframes
    If the application code was originally developed for 64-bit (or 60-bit) mainframes such as CRAY or CDC, you might
134. ine repeat: At line 2, modified. At line 3, argument. At line 4, argument. At line 5, used. rp1 is a 4-byte real in the routine repeat. At line 2, it is an argument. x is a 4-byte real in the routines subr1 and prnok. In subr1, at line 8 defined; used at lines 9 and 10. In prnok, at line 20 defined; at line 21, used as an argument.
    Suboptions for Global Checking Across Routines
    The basic global cross-checking option is -Xlist with no suboption. It is a combination of suboptions, each of which could have been specified separately. The following sections describe options for producing the listing, errors, and cross-reference table. Multiple suboptions may appear on the command line.
    Suboption Syntax
    Add suboptions according to the following rules:
    - Append the suboption to -Xlist.
    - Put no space between the -Xlist and the suboption.
    - Use only one suboption per -Xlist.
    -Xlist and its Suboptions
    Combine suboptions according to the following rules:
    - The most general option is -Xlist (listing, errors, cross-reference table).
    - Specific features can be combined using -Xlistc, -XlistE, -XlistL, or -XlistX.
    - Other suboptions specify further details.
    Example: Each of these two command lines performs the same task:
    demo% f77 -Xlistc -Xlist any.f
    demo% f77 -Xlistc any.f
    The following table shows the reports generated by
135. Important Note: New Product Names. As part of Sun's new developer product strategy, we have changed the names of our development tools from Sun WorkShop to Forte Developer products. The products, as you can see, are the same
136. ize determined when the file is opened. The records are represented in the same way as records in disk files: formatted records are followed by newlines; unformatted records are preceded and followed by character counts. In general, there is no relation between Fortran records and tape blocks; that is, records can span blocks, which can contain parts of several records. The only exception is that Fortran does not write an unformatted record that spans blocks; thus, the size of the largest unformatted record is eight characters less than the block size.
    The dd Conversion Utility
    An end-of-file record in Fortran maps directly into a tape mark. In this respect, Fortran files are the same as tape system files. But since the representation of Fortran files on tape is the same as that used in the rest of UNIX, naive Fortran programs cannot read 80-column card images on tape. If you have an existing Fortran program and an existing data tape to read with it, translate the tape using the dd(1) utility, which adds newlines and strips trailing blanks.
    Example: Convert a tape on mt0 and pipe that to the executable ftnprg:
    demo% dd if=/dev/rmt0 ibs=20b cbs=80 conv=unblock | ftnprg
    The getc Library Routine
    As an alternative to dd, you can call the getc(3F) library routine to read characters from the tape. You can then combine the characters into a character variable and use internal I/O to transfer forma
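The transformation that dd performs with conv=unblock (add a newline after each fixed-length record and strip its trailing blanks) can be sketched as follows. This C fragment is only a model of the idea, not the dd implementation; `unblock` is a hypothetical function name, and the handling of a short final record is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert fixed-length records of cbs bytes held in `in` into
 * newline-terminated lines in `out`, dropping trailing blanks from
 * each record. `out` needs room for in_len + in_len / cbs + 1 bytes.
 * Returns the number of bytes written.
 */
size_t unblock(const char *in, size_t in_len, size_t cbs, char *out)
{
    size_t written = 0;
    assert(in != NULL && out != NULL && cbs > 0);
    for (size_t start = 0; start < in_len; start += cbs) {
        size_t len = in_len - start < cbs ? in_len - start : cbs;
        while (len > 0 && in[start + len - 1] == ' ')
            len--;                        /* strip trailing blanks */
        memcpy(out + written, in + start, len);
        written += len;
        out[written++] = '\n';            /* terminate the record */
    }
    return written;
}
```

For 80-column card images the call would use cbs = 80, matching the cbs=80 operand in the dd command shown above.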
137. l qualifiers on the DOALL directive. Use of these directives is explained in the section "Explicit Parallelization". Appendix E of the Fortran User's Guide gives a detailed summary of all Fortran directives, including these and Fortran 95 OpenMP.
    Number of Threads
    The PARALLEL (or OMP_NUM_THREADS) environment variable controls the maximum number of threads available to the program. Setting the environment variable tells the runtime system the maximum number of threads the program can use. The default is 1. In general, set the PARALLEL or OMP_NUM_THREADS variable to the available number of processors on the target platform. The following example shows how to set it:
    demo% setenv PARALLEL 4        (C shell)
    -or-
    demo$ PARALLEL=4               (Bourne/Korn shell)
    demo$ export PARALLEL
    In this example, setting PARALLEL to four enables the execution of a program using at most four threads. If the target machine has four processors available, the threads will map to independent processors. If there are fewer than four processors available, some threads could run on the same processor as others, possibly degrading performance. The SunOS operating system command psrinfo(1M) displays a list of the processors available on a system:
    demo% psrinfo
    0 on-line since 03/18/99
    1 on-line since 03/18/99
    2 on-line since 03/18/99
    3 on-line since 03/18/99
    Stacks, Stack Sizes, and Paralleli
138. lanation of the column labels, followed by an index. (The gprof -b option eliminates the explanatory text; see the gprof(1) man page for other options that can be used to limit the amount of output generated.) In the graph profile, each procedure (subprogram, procedure) is presented in a call-tree representation. The line representing a procedure in its call tree is called the function line, and is identified by an index number in the leftmost column, within square brackets; the lines above it are the parent lines; the lines below it, the descendant lines.
         parent line        caller 1
         parent line        caller 2
    [n]  time function line function name
         descendant line    called 1
         descendant line    called 2
    The call graph profile is followed by a flat profile that provides a routine-by-routine overview. An edited example of gprof output follows.
    Note: User-defined functions appear with their Fortran names followed by an underscore. Library routines appear with leading underscores.
    The call graph profile:
    granularity: each sample hit covers 2 byte(s) for 0.08% of 12.78 seconds
                                     called/total    parents
    index  %time   self descendents  called+self     name     index
                                     called/total    children
                   0.00   12.66         1/1          main [1]
    [3]     99.1   0.00   12.66         1            MAIN_ [3]
                   0.92   10.99      1000/1000       diffr_ [4]
                   0.62    0.00      2000/2001       code_ [9]
                   0.11    0.00      1000/1000       shock_ [11]
                   0.02    0.00      1000/1000       bndry_ [14]
139. lates from one to the other, so most Fortran programs do not have to recognize file descriptors. Many C programs use a set of subroutines, called standard I/O (or stdio). Many functions of Fortran I/O use standard I/O, which in turn uses operating system I/O calls. Some of the characteristics of these I/O systems are listed in the following table.
    TABLE 11-3 Comparing I/O Between Fortran and C
    Files Open
      Fortran Units: Opened for reading and writing
      Standard I/O File Pointers: Opened for reading; or opened for writing; or opened for both; or opened for appending. See open(2).
      File Descriptors: Opened for reading; or opened for writing; or opened for both
    Attributes
      Fortran Units: Formatted or unformatted
      Standard I/O File Pointers: Always unformatted, but can be read or written with format-interpreting routines
      File Descriptors: Always unformatted
    Access
      Fortran Units: Direct or sequential
      Standard I/O File Pointers: Direct access if the physical file representation is direct access, but can always be read sequentially
      File Descriptors: Direct access if the physical file representation is direct access, but can always be read sequentially
    Structure
      Fortran Units: Record
      Standard I/O File Pointers: Byte stream
      File Descriptors: Byte stream
    Form
      Fortran Units: Arbitrary nonnegative integers from 0-2147483647
      Standard I/O File Pointers: Pointers to structures in the user's address space
      File Descriptors: Integers from 0-1023
    File Permissions
    C programs typically open input files for reading, and output files for writing or for reading and writing. A f77
140. le of the program. Identify the most significant loops.
    3. Benchmark. Determine that the serial test results are accurate. Use these results and the performance profile as the benchmark.
    4. Parallelize. Use a combination of options and directives to compile and build a parallelized executable.
    5. Verify. Run the parallelized program on a single processor and single thread and check results to find instabilities and programming errors that might have crept in. (Set $PARALLEL or OMP_NUM_THREADS to 1.)
    6. Test. Make various runs on several processors to check results.
    7. Benchmark. Make performance measurements with various numbers of processors on a dedicated system. Measure performance changes with changes in problem size (scalability).
    8. Repeat steps 4 to 7. Make improvements to your parallelization scheme based on performance.
    Data Dependency Issues
    Not all loops are parallelizable. Running a loop in parallel over a number of processors usually results in iterations executing out of order. Moreover, the multiple processors executing the loop in parallel may interfere with each other whenever there are data dependencies in the loop. Situations where data dependency issues arise include recurrence, reduction, indirect addressing, and data-dependent loop iterations.
    Recurrence
    Variables that are set in one iteration of a loop and used in a subsequent iteration introduce cr
141. ler's dependency analysis transforms a DO loop into a parallelizable task. (The compiler may restructure the loop to split out unparallelizable sections that will run serially.) It then distributes the work evenly over the available processors. Each processor executes a different chunk of iterations. For example, with four CPUs and a parallelized loop with 1000 iterations, each thread would execute a chunk of 250 iterations:
    Processor 1 executes iterations 1 through 250
    Processor 2 executes iterations 251 through 500
    Processor 3 executes iterations 501 through 750
    Processor 4 executes iterations 751 through 1000
    Only loops that do not depend on the order in which the computations are performed can be successfully parallelized. The compiler's dependence analysis rejects from parallelization those loops with inherent data dependencies. If it cannot fully determine the data flow in a loop, the compiler acts conservatively and does not parallelize. Also, it may choose not to parallelize a loop if it determines the performance gain does not justify the overhead. Note that the compiler always chooses to parallelize loops using a static loop scheduling: simply dividing the work in the loop into equal blocks of iterations. Other scheduling schemes may be specified using explicit parallelization directives described later in this chapter.
    Arrays, Scalars, and Pure Scalars
    A few definitions, from the point of vie
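The static scheduling arithmetic in the example above (1000 iterations over four processors) can be written out as a small C sketch. The name `static_chunk` and the struct are hypothetical, and the rule for spreading a remainder over the first threads is an assumption made for this illustration; the guide only states that work is divided into equal blocks.

```c
#include <assert.h>

/* First and last iteration (1-based, inclusive) of one thread's chunk. */
struct chunk {
    int first;
    int last;
};

/*
 * Bounds of the block of iterations given to `thread` (0-based) when
 * `iterations` are divided statically among `nthreads` threads.
 * Assumption: any remainder goes one iteration each to the first threads.
 */
struct chunk static_chunk(int iterations, int nthreads, int thread)
{
    struct chunk c;
    int base, extra;
    assert(iterations >= 0 && nthreads > 0);
    assert(thread >= 0 && thread < nthreads);
    base = iterations / nthreads;
    extra = iterations % nthreads;
    c.first = thread * base + (thread < extra ? thread : extra) + 1;
    c.last = c.first + base + (thread < extra ? 1 : 0) - 1;
    return c;
}
```

For 1000 iterations and four threads this gives 1 to 250, 251 to 500, 501 to 750, and 751 to 1000, matching the processor assignments listed above.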
142. ll fdate greeting print Hello print See how long subroutine startclock integer mytime time mytime time return end function wallclock integer wallclock character 24 greeting call startclock
    Running this program produces the following results:
    demo% TimeTest
     Hello, Time Now Is: Mon Feb 12 11:53:54 1996
     See how long 'sleep 4' takes, in seconds
     Elapsed time for sleep 4 was: 5 seconds
     atan(q) 1000 times took: 2.26550E-03 seconds
    demo%
    Formats
    Some f77 and f95 format edit descriptors can behave differently on other systems. Here are some format specifiers that f77 treats differently than some other implementations:
    - A: Alphanumeric conversion. Used with character type data elements. In FORTRAN 66, this specifier worked with any variable type. f77 supports the older usage, up to four characters to a word.
    - $: Suppresses newline character output.
    - R: Sets an arbitrary radix for the I formats that follow in the descriptor.
    - SU: Selects unsigned output for following I formats. For example, you can convert output to either hexadecimal or octal with the following formats, instead of using the Z or O edit descriptors:
    10    FORMAT( SU, 16R, I4 )
    20    FORMAT( SU, 8R, I4 )
    Carriage-Control
    Fortran carriage control grew out of the capabilities of the equipment used when Fortran was orig
143. lling depth is not easy to determine, and some experimentation might be needed. The example that follows shows how a simple loop might be unrolled to a depth of four with -unroll=4 (the source code is not changed with this option):
    Original Loop:
          DO I=1,20000
             X(I) = X(I) + Y(I)*A(I)
          END DO
    Unrolled by 4 compiles as if it were written:
          DO I=1, 19997, 4
             TEMP1 = X(I) + Y(I)*A(I)
             TEMP2 = X(I+1) + Y(I+1)*A(I+1)
             TEMP3 = X(I+2) + Y(I+2)*A(I+2)
             X(I+3) = X(I+3) + Y(I+3)*A(I+3)
             X(I) = TEMP1
             X(I+1) = TEMP2
             X(I+2) = TEMP3
          END DO
    This example shows a simple loop with a fixed loop count. The restructuring is more complex with variable loop counts.
    -xtarget=platform
    The performance of some programs might improve if the compiler has an accurate description of the target computer hardware. When program performance is critical, the proper specification of the target hardware could be very important. This is especially true when running on the newer SPARC processors. However, for most programs and older SPARC processors, the performance gain could be negligible and a generic specification might be sufficient. The Fortran User's Guide lists all the system names recognized by -xtarget=. For any given system name (for example, ultra2, for UltraSPARC II), -xtarget expands into a specific combination of -xarch
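The remark that restructuring is more complex with variable loop counts can be illustrated with a hand-unrolled loop. The C sketch below is not compiler output; `update` and `update_unrolled4` are hypothetical names, and the cleanup loop is one common way (assumed here) to handle a count that is not a multiple of the unrolling depth.

```c
#include <assert.h>

/* Original loop: x[i] = x[i] + y[i] * a[i] for every element. */
void update(double *x, const double *y, const double *a, int n)
{
    for (int i = 0; i < n; i++)
        x[i] += y[i] * a[i];
}

/* Same computation unrolled by hand to a depth of four. */
void update_unrolled4(double *x, const double *y, const double *a, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i + 3 < n; i += 4) {     /* four iterations per pass */
        x[i]     += y[i]     * a[i];
        x[i + 1] += y[i + 1] * a[i + 1];
        x[i + 2] += y[i + 2] * a[i + 2];
        x[i + 3] += y[i + 3] * a[i + 3];
    }
    for (; i < n; i++)                   /* leftover 0 to 3 iterations */
        x[i] += y[i] * a[i];
}
```

With a fixed count such as 20000, which is a multiple of four, the cleanup loop never executes, which is why the fixed-count case in the guide needs no remainder handling.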
144. lt for i 0 i lt count 1 p2arg
    In this example, the C function and calling C routine must accommodate two initial extra arguments (a pointer to the result string and the length of the string) and one additional argument at the end of the list (length of character argument). Note that in the Fortran routine called from C, it is necessary to explicitly add a final null character.
    Labeled COMMON
    Fortran labeled COMMON can be emulated in C by using a global struct.
    TABLE 11-16 Emulating Labeled COMMON
    Fortran COMMON Definition:
          COMMON /BLOCK/ ALPHA, NUM
          ...
    C "COMMON" Definition:
          extern struct block {
              float alpha;
              int num;
          };
          extern struct block block_;
          main()
          {
              ...
              block_.alpha = 32.;
              block_.num += 1;
              ...
          }
    Note that the external name established by the C routine must end in an underscore to link with the block created by the Fortran program. Note also that the C directive #pragma pack may be needed to get the same padding as with Fortran. Both f77 and f95 align data in COMMON blocks to at most 4-byte boundaries.
    Sharing I/O Between Fortran and C
    Mixing Fortran I/O with C I/O (issuing I/O calls from both C and Fortran routines) is not recommended. It is better to do all Fortran I/O or all C I/O, not both. The Fortran I/O library is implemented largely on top of the C standard I/O library. Every open unit in a Fortran program has an associated st
145. ly. The input report also contains a column to indicate whether a unit was memory mapped or not. If mapped, the number of mmap() calls is recorded in parentheses in the second row of the pair. The output report indicates block sizes, formatting, and access type. A file opened for direct access shows its defined record length in parentheses in the second row of the pair.
    Note: Compiling with environment variable LD_LIBRARY_PATH set might disable I/O profiling, which relies on its profiling I/O library being in a standard location.
    Performance and Optimization
    This chapter considers some optimization techniques that may improve the performance of numerically intense Fortran programs. Proper use of algorithms, compiler options, library routines, and coding practices can bring significant performance gains. This discussion does not cover cache, I/O, or system environment tuning. Parallelization issues are treated in the next chapter. Some of the issues considered here are:
    - Compiler options that may improve performance
    - Compiling with feedback from runtime performance profiles
    - Use of optimized library routines for common procedures
    - Coding strategies to improve performance of key loops
    The subject of optimization and performance tuning is much too complex to be treated exhaustively here. However, this discussion should provide the reader with a
146. Note: -fast includes -dalign and -native. These options may have unexpected side effects for some programs.

-On

No compiler optimizations are performed by the compilers unless a -O option is specified explicitly (or implicitly with macro options like -fast). In nearly all cases, specifying an optimization level for compilation improves program execution performance. On the other hand, higher levels of optimization increase compilation time and may significantly increase code size.

For most cases, level -O3 is a good balance between performance gain, code size, and compilation time. Level -O4 adds automatic inlining of calls to routines contained in the same source file as the caller routine, among other things. See the Fortran User's Guide for further information about subprogram call inlining.

Level -O5 adds more aggressive optimization techniques that would not be applied at lower levels. In general, levels above -O3 should be specified only for those routines that make up the most compute-intensive parts of the program and thereby have a high certainty of improving performance. There is no problem linking together parts of a program compiled with different optimization levels.

PRAGMA OPT=n

Use the C$ PRAGMA SUN OPT=n directive to set different optimization levels for individual routines in a source file. This directive will override the -On flag on the compiler command line
148. mode interval expression processing.

To use the Fortran 95 interval-specific features, specify -xia or -xinterval on the f95 command line. For detailed information on interval arithmetic in Fortran 95, see the Interval Arithmetic Programming Reference.

CHAPTER 7

Porting

This chapter discusses the porting of programs from other dialects of Fortran to Sun compilers. VAX VMS Fortran programs compile almost exactly as is with Sun f77; this is discussed further in the chapter on VMS extensions in the FORTRAN 77 Language Reference Manual.

Note: Porting issues bear mostly upon Fortran 77 programs. The Sun WorkShop Fortran 95 compiler, f95, incorporates few nonstandard extensions, and these are described in the Fortran User's Guide.

Time and Date Functions

Library functions that return the time of day or elapsed CPU time vary from system to system. The following time functions are not supported directly in the Sun Fortran libraries, but you can write subroutines to duplicate their functions:

- Time of day in 10h format
- Date in A10 format
- Milliseconds of job CPU time
- Julian date in ASCII

The time functions supported in the Sun Fortran library are listed in the following table (Man Page column): time(3F), date(3F), fdate(3F), idate(3F), itime(3F), ctime(3F), ltime(3F), gmtime(3F), etime(3F), dtime(3F), date_and_
149. ms. A number of good, commercially published books on using make and SCCS are currently available, including Managing Projects with make, by Andrew Oram and Steve Talbott, and Applying RCS and SCCS, by Don Bolinger and Tan Bronson. Both are from O'Reilly & Associates.

Facilitating Program Builds With the make Utility

The make utility applies intelligence to the task of program compilation and linking. Typically, a large application consists of a set of source files and INCLUDE files, requiring linking with a number of libraries. Modifying any one or more of the source files requires recompilation of that part of the program and relinking. You can automate this process by specifying the interdependencies between files that make up the application, along with the commands needed to recompile and relink each piece. With these specifications in a file of directives, make ensures that only the files that need recompiling are recompiled and that relinking uses the options and libraries you need to build the executable. The following discussion provides a simple example of how to use make. For a summary, see make(1).

The Makefile

A file called makefile tells make, in a structured manner, which source and object files depend on other files. It also defines the commands required to compile and link the files.

For example, suppose you have a program of four source files and the makefile:

demo% makefile common
150. n access it where it was last left, preferably at the beginning of a file, or past the end-of-file record. However, if your program terminates prematurely, it may leave the tape positioned anywhere. Use the SunOS operating system command mt(1) to reposition the tape appropriately.

Fortran 95 I/O Considerations

Sun WorkShop 6 Fortran 95 and Fortran 77 are I/O compatible. Executables containing intermixed f77 and f95 compilations can do I/O to the same unit from both the f77 and f95 parts of the program. However, Fortran 95 provides some additional features:

- ADVANCE='NO' enables nonadvancing I/O, as in:

      write(*,'(a)',ADVANCE='NO') 'Enter size= '
      read(*,*) n

- NAMELIST input features:
  - f95 allows the group name to be preceded by $ or & on input. The Fortran 95 standard accepts only &, and this is what a NAMELIST write outputs.
  - f95 accepts $ as the symbol terminating an input group unless the last data item in the group is CHARACTER, in which case the $ is treated as input data.
  - f95 allows NAMELIST input to start in the first column of a record.
- ENCODE and DECODE are recognized and implemented by f95 just as they are by f77.

Program Development

This chapter briefly introduces two powerful program development tools, make and SCCS, that can be used very successfully with Fortran progra
151. n in Fortran specifically to serve as an example for similar user-supplied routines. Retrieve a copy from the following file, a part of the FORTRAN 77 package installation: /opt/SUNWspro/<release>/src/ioinit.f, where <release> varies for each software release. Contact your system administrator for details.

Command-Line I/O Redirection and Piping

Another way to associate a physical file with a program's logical unit number is by redirecting or piping the preconnected standard I/O files. Redirection or piping occurs on the runtime execution command.

In this way, a program that reads standard input (unit 5) and writes to standard output (unit 6) or standard error (unit 0) can, by redirection (using <, >, >>, >&, |, |&, 2>, 2>&1 on the command line), read or write to any other named file.

This is shown in the following table:

TABLE 2-1 csh/sh/ksh Redirection and Piping on the Command Line

Using Bourne or Korn Shell:
      myprog < mydata
      myprog > myoutput
      myprog >> myoutput
      myprog 2> errorfile
      myprog1 | myprog2
      myprog1 2>&1 | myprog2

Using C Shell:
      myprog < mydata
      myprog > myoutput
      myprog >> myoutput
      myprog >& errorfile
      myprog1 | myprog2
      myprog1 |& myprog2

Action:
      Standard input: read from mydata
      Standard output: write (overwrite) myoutput
      Standard output
152. n off the parallelization options. Verify that the program works correctly by compiling with -O3 or -O4, but without any parallelization.

- Set the number of threads to one and compile with parallelization on: run the program with the environment variable PARALLEL set to 1. If the problem disappears, then you can assume it was due to using multiple threads.
- Check also for out-of-bounds array references by compiling with -C.
- Problems using -autopar may indicate that the compiler is parallelizing something it should not.
- Turn off -reduction. If you are using the -reduction option, summation reduction may be occurring and yielding slightly different answers. Try running without this option.
- Use the DOSERIAL directive to selectively disable automatic parallelization of individual loops.
- Use fsplit. If you have many subroutines in your program, use fsplit(1) to break them into separate files. Then compile some files with and without -parallel, and use f77 or f95 to link the .o files. You must specify -parallel on this link step. (See the Fortran User's Guide section on consistent compiling and linking.) Execute the binary and verify results. Repeat this process until the problem is narrowed down to one subroutine.
- Use -loopinfo. Check which loops are being parallelized and which loops are not.
- Use a dummy subroutine. Create a dummy subroutine or function that does nothing. Put calls to this subroutine
153. n the compiler detects a problem while explicitly parallelizing a loop. The compiler may still parallelize the loop.

The following table lists typical parallelization problems detected by the compiler:

TABLE 10-4 Explicit Parallelization Problems

Problem                                                      Parallelized   Warning Message
Loop is nested inside another loop that is parallelized.     No             No
Loop is in a subroutine called within the body of a
parallelized loop.                                           No             No
Jumping out of loop is allowed by a flow control statement.  No             Yes
Index variable of loop is subject to side effects.           Yes            No
Some variable in the loop has a loop-carried dependency.     Yes            Yes
I/O statement in the loop (usually unwise, because the
order of the output is not predictable).                     Yes            No

Example: Nested loops:

C$PAR DOALL
      do 900 i = 1, 1000      ! Parallelized (outer loop)
        do 200 j = 1, 1000    ! Not parallelized, no warning
200     continue
900   continue

Example: A parallelized loop in a subroutine:

      program main
      ...
C$PAR DOALL
      do 100 i = 1, 200
        ...
        call calc (a, x)
        ...
100   continue
      ...

      subroutine calc (b, y)
      ...
C$PAR DOALL
      do 1 m = 1, 1000
        ...
1     continue
      return
      end

Loop 100 runs in parallel. Loop 1 does not run in parallel.

In the example, the loop within the subroutine is not parallelized because the subroutine itself is run in parallel.

Example: Jumping out of a loop:

C$PAR DOALL
      do i = 1, 1000          ! Not parallelized
154. n values of all private variables. Syntax: DOALL SAVELAST

REDUCTION: Treat the variables v1, v2, ... as reduction variables. Syntax: DOALL REDUCTION(v1,v2,...)

SCHEDTYPE: Set the scheduling type to t. Syntax: DOALL SCHEDTYPE(t)

PRIVATE(varlist)

The PRIVATE(varlist) qualifier specifies that all scalars and arrays in the list varlist are private for the DOALL loop. Both arrays and scalars can be specified as private. In the case of an array, each thread of the DOALL loop gets a copy of the entire array. All other scalars and arrays referenced in the DOALL loop, but not contained in the private list, conform to their appropriate default scoping rules. (See page 160.)

Example: Specify array a private in loop i:

C$PAR DOALL PRIVATE(a)
      do i = 1, n
        a(1) = b(i)
        do j = 2, n
          a(j) = a(j-1) + b(j) * c(j)
        end do
        x(i) = f(a)
      end do

SHARED(varlist)

The SHARED(varlist) qualifier specifies that all scalars and arrays in the list varlist are shared for the DOALL loop. Both arrays and scalars can be specified as shared. Shared scalars and arrays can be accessed in all the iterations of a DOALL loop. All other scalars and arrays referenced in the DOALL loop, but not contained in the shared list, conform to their appropriate default scoping rules.

Example: Specify a shared variable:

C$PAR DOALL SHARED
      do i = 1, n
      8
155. nger needed and may degrade the portability of a program. Two common restructurings are strip mining and loop unrolling.

Strip-Mining

Fixed-length vector registers on some architectures led programmers to manually "strip-mine" the array computations in a loop into segments:

      REAL TX(0:63)
      ...
      DO IOUTER = 1, NX, 64
        DO IINNER = 0, 63
          TX(IINNER) = AX(IOUTER+IINNER) * BX(IOUTER+IINNER) / 2.
          QX(IOUTER+IINNER) = TX(IINNER)**2
        END DO
      END DO

Strip-mining is no longer appropriate with modern compilers; the loop can be written much less obscurely as:

      DO IX = 1, N
        TX = AX(IX) * BX(IX) / 2.
        QX(IX) = TX**2
      END DO

Loop Unrolling

Unrolling loops by hand was a typical source-code optimization technique before compilers were available that could perform this restructuring automatically. A loop written as:

      DO K = 1, N-5, 6
        DO J = 1, N
          DO I = 1, N
            A(I,J) = A(I,J) + B(I,K  ) * C(K  ,J)
     *                      + B(I,K+1) * C(K+1,J)
     *                      + B(I,K+2) * C(K+2,J)
     *                      + B(I,K+3) * C(K+3,J)
     *                      + B(I,K+4) * C(K+4,J)
     *                      + B(I,K+5) * C(K+5,J)
          END DO
        END DO
      END DO
      DO KK = K, N
        DO J = 1, N
          DO I = 1, N
            A(I,J) = A(I,J) + B(I,KK) * C(KK,J)
          END DO
        END DO
      END DO

should be rewritten the way it was originally intended:

      DO K = 1, N
        DO J = 1, N
          DO I = 1, N
            A(I,J) = A(I,J) + B(I,K) * C(K,J)
          END DO
        END DO
      END DO

Troublesho
156. nks of iterations may not be distributed uniformly to all available threads.

- If chunksize is not provided, the compiler selects a value.

Example: With 1000 iterations and chunksize of 4, each thread gets 4 iterations at a time, until all iterations are processed.

FACTORING

Use factoring scheduling for this DO loop. With n iterations initially and k threads, all the iterations are divided into groups of chunks of iterations, starting with the first group of k chunks of n/(2k) iterations each; the second group has k chunks of n/(4k) iterations, and so on. The chunksize for each group is the remaining iterations divided by 2k. Because FACTORING is dynamic, there is no guarantee that each thread gets exactly one chunk from each group.

- At least m iterations must be assigned to each thread.
- There can be one final smaller residual chunk.
- If m is not provided, the compiler selects a value.

Example: With 1000 iterations and FACTORING(3), and 4 threads, the first group has 4 chunks of 125 iterations each, the second has 4 chunks of 62 iterations each, the third group has 4 chunks of 31 iterations each, and so on.

GSS

Use guided self-scheduling for this DO loop. With n iterations initially, and k threads, then:

- Assign n/k iterations to the first thread.
- Assign the remaining iterations divided by k to the second thread, and so on until all iterations have been processed.

GSS is dynamic, so there is no guarantee that chunks of iterations are uniformly
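The FACTORING arithmetic described above can be checked with a short sketch. The module and function names (sched_sketch, factoring_chunk) are hypothetical and chosen only for illustration. The code reproduces the stated rule, chunk size = remaining iterations divided by 2k and never below m, and is not the runtime scheduler used by the compiler.

```fortran
! Sketch only: reproduces the chunk-size rule quoted for FACTORING scheduling.
! The names sched_sketch and factoring_chunk are hypothetical.
module sched_sketch
  implicit none
contains
  ! Chunk size used by group g (g = 1, 2, ...) for n iterations, k threads,
  ! and a minimum chunk size m. Returns 0 once no iterations remain.
  integer function factoring_chunk(n, k, m, g) result(chunk)
    integer, intent(in) :: n, k, m, g
    integer :: remaining, i

    remaining = n
    chunk = 0
    do i = 1, g
       if (remaining <= 0) then
          chunk = 0
          return
       end if
       ! remaining iterations divided by 2k, but at least m,
       ! and never more than what is left
       chunk = min(max(remaining / (2*k), m), remaining)
       ! a group hands out up to k chunks of this size
       remaining = remaining - min(k*chunk, remaining)
    end do
  end function factoring_chunk
end module sched_sketch
```

Calling factoring_chunk(1000, 4, 3, g) for g = 1, 2, 3 gives the chunk sizes of the first three groups in the example above. Integer division truncates, which is why the second group gets 62 rather than 62.5.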
157. nt Dependencies

The f77 compiler may automatically eliminate a reference that appears to create a data dependency in the loop. One of the many such transformations makes use of private versions of some of the arrays. Typically, the compiler does this if it can determine that such arrays are used in the original loops only as temporary storage.

Example: Using -autopar, with dependencies eliminated by private arrays:

      parameter (n=1000)
      real a(n), b(n), c(n,n)
      do i = 1, 1000          ! Parallelized
        do k = 1, n
          a(k) = b(k) + 2.0
        end do
        do j = 1, n
          c(i,j) = a(j) + 3
        end do
      end do
      end

In the example, the outer loop is parallelized and run on independent processors. Although the inner loop references to array a appear to result in a data dependency, the compiler generates temporary private copies of the array to make the outer loop iterations independent.

Inhibitors to Automatic Parallelization

Under automatic parallelization, the compilers do not parallelize a loop if:

- The DO loop is nested inside another DO loop that is parallelized.
- Flow control allows jumping out of the DO loop.
- A user-level subprogram is invoked inside the loop.
- An I/O statement is in the loop.
- Calculations within the loop change an aliased scalar variable.

Nested Loops

In a multithreaded, multiprocessor environment, it is most effective to parallelize the outermost loop in a loop nest, rather than the innermost. Beca
158. ntical to the former Sun Performance WorkShop Personal Edition, except that the Fortran compilers in that product no longer support the creation of automatically parallelized or explicit, directive-based parallel code. This capability is still supported in the Fortran compilers in Forte for High Performance Computing.

We appreciate your continued use of our development products and hope that we can continue to fulfill your needs into the future.

Contents

Preface

1. Introduction
   Standards Conformance
   Features of the Fortran Compilers
   Other Fortran Utilities
   Debugging Utilities
   Sun Performance Library
   Interval Arithmetic
   Man Pages
   READMEs
   Command-Line Help

2. Fortran Input/Output
   Accessing Files From Within Fortran Programs
   Accessing Named Files
   Opening Files Without a Name
   Preconnected Units
   Opening Files Without an OPEN Statement
   Passing File Names to Programs
   f77: VAX/VMS Logical File Names
   Direct I/O
   Binary I/O
   Internal Files
   f77: Tape I/O
   Using TOPEN Routines
   Fortran Formatted I/O for Tape
   Fortran Unformatted I/O for Tape
   Tape File Representation
   End-of-File
   Multifile Tapes
   Fortran 95 I/O Considerations

3. Program Development
   Facilitating Program Builds With the make Utility
   The Makefile
   make Command
   Macros
   Overriding of Macro Values
   Suffix Rules in make
   Version Tracking and
159. numerical programs, there are many potential sources for computational error:

- The computational model could be wrong.
- The algorithm used could be numerically unstable.
- The data could be ill-conditioned.
- The hardware could be producing unexpected results.

Finding the source of the errors in a numerical computation that has gone wrong can be extremely difficult. The chance of coding errors can be reduced by using commercially available and tested library packages whenever possible. Choice of algorithms is another critical issue. Using the appropriate computer arithmetic is another.

This chapter makes no attempt to teach or explain numerical error analysis. The material presented here is intended to introduce the IEEE floating-point model as implemented by Sun WorkShop Fortran compilers.

IEEE Floating-Point Arithmetic

IEEE arithmetic is a relatively new way of dealing with arithmetic operations that result in such problems as invalid, division by zero, overflow, underflow, or inexact. The differences are in rounding, handling numbers near zero, and handling numbers near the machine maximum.

The IEEE standard supports user handling of exceptions, rounding, and precision. Consequently, the standard supports interval arithmetic and diagnosis of anomalies. IEEE Standard 754 makes it possible to standardize elementary functions like exp and cos, to create high-precision arithmetic, and to couple numerical and symbolic algeb
160. o

demo% cat intern2.f
      CHARACTER LINE(4)*16
C                        This is our "internal file"
C                  12341234
      DATA LINE(1) / '  81  81' /
      DATA LINE(2) / '  82  82' /
      DATA LINE(3) / '  83  83' /
      DATA LINE(4) / '  84  84' /
      READ ( LINE, '(2I4)' ) I, J, K, L, M, N
      PRINT *, I, J, K, L, M, N
      END
demo% f77 -silent intern2.f
demo% a.out
 81 81 82 82 83 83
demo%

Example: Direct access read from an internal file, one record (f77 only):

demo% cat intern3.f
      CHARACTER LINE(4)*16
C                        This is our "internal file"
C                  12341234
      DATA LINE(1) / '  81  81' /
      DATA LINE(2) / '  82  82' /
      DATA LINE(3) / '  83  83' /
      DATA LINE(4) / '  84  84' /
      READ ( LINE, FMT=20, REC=3 ) M, N
20    FORMAT( I4, I4 )
      PRINT *, M, N
      END
demo% f77 -silent intern3.f
demo% a.out
 83 83
demo%

f77: Tape I/O

Most typical Fortran I/O is done to disk files. However, by associating a logical unit number to a physically mounted tape drive via the OPEN statement, it is possible to do I/O directly to tape.

It could be more efficient to use the TOPEN() routines rather than Fortran I/O statements to do I/O on magnetic tape.

Using TOPEN Routines

With the nonstandard tape I/O package (see topen(3F)) you can transfer blocks between the tape drive and buffers declared as Fortran char
161. o call graph
- No expansion of include files

File Types

The checking process recognizes all the files in the compiler command line that end in .f, .f90, .f95, .for, .F, .F95, or .o. The .o files supply the process with information regarding only global names, such as subroutine and function names.

Analysis Files (.fln Files)

Programs compiled with -Xlist options have their analysis data built into the binary files automatically. This enables global program checking over programs in libraries.

Alternatively, the compiler will save individual source file analysis results into files with a .fln suffix if the -Xlistflndir option is also specified. dir indicates the directory to receive these files.

demo% f77 -Xlistfln/tmp *.f

Some Examples of -Xlist and Global Program Checking

Here is a listing of the Repeat.f source code used in the following examples:

demo% cat Repeat.f
      PROGRAM repeat
        pn1 = REAL( LOC ( rp1 ) )
        CALL subr1 ( pn1 )
        CALL nwfrk ( pn1 )
        PRINT *, pn1
      END ! PROGRAM repeat

      SUBROUTINE subr1 ( x )
        IF ( x .GT. 1.0 ) THEN
          CALL subr1 ( x * 0.5 )
        END IF
      END

      SUBROUTINE nwfrk( ix )
        EXTERNAL fork
        INTEGER prnok, fork
        PRINT *, prnok ( ix ), fork ( )
      END

      INTEGER FUNCTION prnok ( x )
        prnok = INT ( x ) + LOC(x)
      END

      SUBROUTINE unreach_sub()
        CALL sleep(1)
      END
162. o the library routines start_iostats and end_iostats around the parts of the program you wish to measure. A call to end_iostats is required if the program terminates with an END or STOP statement rather than a CALL EXIT.

Note: The I/O statements profiled are READ, WRITE, PRINT, OPEN, CLOSE, INQUIRE, BACKSPACE, ENDFILE, and REWIND. The runtime system opens stdin, stdout, and stderr before the first executable statement of your program, so you must explicitly reopen these units after the call to start_iostats.

Example: Profile stdin, stdout, and stderr:

      EXTERNAL start_iostats
      ...
      CALL start_iostats
      OPEN(5)
      OPEN(6)
      OPEN(0)

If you want to measure only part of the program, call end_iostats to stop the process.

The program must be compiled with the -pg option. When the program terminates, the I/O profile report is produced on the file name.io_stats, where name is the name of the executable file.

[Sample I/O profile report not recoverable from the extraction; surviving column labels: access type (seq/dir), map, cnt, rec len, fmt, dev.]

Here is an example:

demo% f77 -o myprog -pg -silent m
163. of this directory depends on where your software was installed. The path is install_directory/SUNWspro/READMEs. In a normal install, install_directory is /opt.

TABLE 1-1 READMEs of Interest

README File            Describes...
fortran_77             new and changed features, known limitations, documentation errata for this release of the FORTRAN 77 compiler, f77
fortran_95             new and changed features, known limitations, documentation errata for this release of the Fortran 95 compiler, f95
fpp_readme             overview of fpp features and capabilities
interval_arithmetic    overview of the interval arithmetic features in f95
math_libraries         optimized and specialized math libraries available
profiling_tools        using the performance profiling tools prof, gprof, and tcov
runtime_libraries      libraries and executables that can be redistributed under the terms of the End User License
64bit_Compilers        compiling for 64-bit Solaris operating environments
performance_library    overview of the Sun Performance Library

The READMEs for all compilers are easily accessed by the -xhelp=readme command-line option. For example, the command:

      f95 -xhelp=readme

will display the fortran_95 README file directly.

Command-Line Help

You can view very brief descriptions of the f77 and f95 command-line options by invoking the compiler's -help option as shown below:

      f77 -help
or
      f95
164. on to check across routines for consistency of arguments, COMMON blocks, and so on.

Sun WorkShop: Provides a visual debugging environment based on dbx and includes a data visualizer and performance data collector.

Sun Performance Library

The Sun Performance Library is a library of optimized subroutines and functions for computational linear algebra and Fourier transforms. It is based on the standard libraries LAPACK, BLAS, FFTPACK, VFFTPACK, and LINPACK. Each subprogram in the Sun Performance Library performs the same operation and has the same interface as the standard library versions, but is generally much faster and possibly more accurate.

See the performance_library README file and the Sun Performance Library User's Guide for details.

Interval Arithmetic

This release of the Fortran 95 compiler introduces two new compiler flags, -xia and -xinterval, that enable the compiler to recognize new language extensions and generate the appropriate code to implement interval arithmetic computations. See the Interval Arithmetic Programming Reference for details.

Man Pages

On-line manual (man) pages provide immediate documentation about a command, function, subroutine, or collection of such things.

Sun WorkShop man pages are located in install_directory/SUNWspro/man. In a normal install of the Sun WorkShop, install_directory is /opt. Add this path to your MANPATH environment variable to access the
165. onetwo.profile one.f two.f
demo% cat one.f.tcov two.f.tcov
              program one
      1 ->    do i=1,10
     10 ->      call two(i)
              end do
      1 ->    end
.....etc
demo%

Environment variables $SUN_PROFDATA and $SUN_PROFDATA_DIR can be used to specify where the intermediary data collection files are kept. These are the *.d and *.tcovd files created by old and new style tcov, respectively.

Each subsequent run accumulates more coverage data into the tcovd file. Data for each object file is zeroed out the first time the program is executed after the corresponding source file has been recompiled. Data for the entire program is zeroed out by removing the tcovd file.

These environment variables can be used to separate the collected data from different runs. With these variables set, the running program writes execution data to the files in $SUN_PROFDATA_DIR/$SUN_PROFDATA/.

Similarly, the directory that tcov reads is specified by tcov -x $SUN_PROFDATA. If $SUN_PROFDATA_DIR is set, tcov will prepend it, looking for files in $SUN_PROFDATA_DIR/$SUN_PROFDATA/, and not in the working directory.

For the details, see the tcov(1) man page.

f77: I/O Profiling

You can obtain a report about how much data was transferred by your program. For each Fortran unit, the report shows the file name, the number of I/O statements, the number of bytes, and some statistics on these items.

To obtain an I/O profiling report, insert calls t
166. ons, Sun-specific capabilities such as pragmas, the lint tool, parallelization, migration to a 64-bit operating system, and ANSI/ISO-compliant C. (Document title: C User's Guide. Document collection: Forte C 6 / Sun WorkShop 6 Compilers C.)

Document collection: Forte C++ 6 / Sun WorkShop 6 Compilers C++

- C++ Library Reference: Describes the C++ libraries, including C++ Standard Library, Tools.h++ class library, Sun WorkShop Memory Monitor, Iostream, and Complex.
- C++ Migration Guide: Provides guidance on migrating code to this version of the Sun WorkShop C++ compiler.
- C++ Programming Guide: Explains how to use the new features to write more efficient programs and covers templates, exception handling, runtime type identification, cast operations, performance, and multithreaded programs.
- C++ User's Guide: Provides information on command-line options and how to use the compiler.
- Sun WorkShop Memory Monitor User's Manual: Describes how the Sun WorkShop Memory Monitor solves the problems of memory management in C and C++. This manual is only available through your installed product (see /opt/SUNWspro/docs/index.html) and not at the docs.sun.com Web site.

Document collection: Forte for High Performance Computing 6 / Sun WorkShop 6 Compilers Fortran 77/95

- Fortran Library Reference: Provides details about the library routines supplied with the Fortran compiler.

Related Sun WorkShop 6 Documentation by Document Colle
168. ormatted READ and WRITE statements by transferring and converting data from one character object to another data object. No file I/O is performed.

When using internal files:

- The name of the character object receiving the data appears in place of the unit number on a WRITE statement. On a READ statement, the name of the character object source appears in place of the unit number.
- A constant, variable, or substring object constitutes a single record in the file.
- With an array object, each array element corresponds to a record.
- f77: f77 extends direct I/O to internal files. (The ANSI standard includes only sequential formatted I/O on internal files.) This is similar to direct I/O on external files, except that the number of records in the file cannot be changed. In this case, a record is a single element of an array of character strings.
- Each sequential READ or WRITE statement starts at the beginning of an internal file.

Example: Sequential formatted read from an internal file (one record only):

demo% cat intern1.f
      CHARACTER X*80
      READ( *, '(A)' ) X
      READ( X, '(I3,I4)' ) N1, N2   ! This codeline reads the internal file X
      WRITE( *, * ) N1, N2
      END
demo% f77 -silent -o tstintern intern1.f
demo% tstintern
12 99
 12 99
demo%

Example: Sequential formatted read from an internal file (three records):

dem
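The internal-file mechanism described above, with a character object standing in for the unit number, can be sketched as a pair of small helper routines. This is a minimal illustration in free-form Fortran; the names internal_file_sketch, parse_two, and int_to_text are hypothetical and not part of any Sun library.

```fortran
! Sketch only: internal READ and WRITE on character variables.
! All names here are hypothetical, chosen for illustration.
module internal_file_sketch
  implicit none
contains
  ! Read two integer fields (widths 3 and 4) from a character variable.
  ! The variable "line" takes the place of the unit number in the READ.
  subroutine parse_two(line, n1, n2, ios)
    character(len=*), intent(in) :: line
    integer, intent(out) :: n1, n2, ios
    n1 = 0
    n2 = 0
    read (line, '(I3,I4)', iostat=ios) n1, n2
  end subroutine parse_two

  ! Convert an integer to left-justified text by writing to a character
  ! variable, which takes the place of the unit number in the WRITE.
  function int_to_text(n) result(text)
    integer, intent(in) :: n
    character(len=12) :: text
    write (text, '(I12)') n
    text = adjustl(text)
  end function int_to_text
end module internal_file_sketch
```

For instance, call parse_two(' 12  99', n1, n2, ios) leaves 12 in n1 and 99 in n2, and trim(int_to_text(42)) yields the two characters 42. No file is opened in either case; the conversion happens entirely in memory.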
169. ors conform to the IEEE standard in a combination of hardware and software support for different aspects. x86 processors conform to the IEEE standard entirely through hardware support.

The newest SPARC processors contain floating-point units with integer multiply and divide instructions and hardware square root.

Best performance is obtained when the compiled code properly matches the runtime floating-point hardware. The compiler's -xtarget= option permits specification of the runtime hardware. For example, -xtarget=ultra would inform the compiler to generate object code that will perform best on an UltraSPARC processor.

On SPARC platforms: The utility fpversion displays which floating-point hardware is installed and indicates the appropriate -xtarget value to specify. This utility runs on all Sun SPARC architectures. See fpversion(1), the Sun WorkShop Fortran User's Guide (regarding -xtarget), and the Numerical Computation Guide for details.

Flags and ieee_flags()

The ieee_flags function is used to query and clear exception status flags. It is part of the libsunmath library shipped with Sun compilers and performs the following tasks:

- Controls rounding direction and rounding precision
- Checks the status of the exception flags
- Clears exception status flags

The general form of a call to ieee_flags is:

      flags = ieee_flags( action, mode, in, out )

Each of the four arguments is a st
170. oss-iteration dependencies, or recurrences. Recurrence in a loop requires the iterations to be executed in the proper order. For example:

      DO I = 2, N
        A(I) = A(I-1) * B(I) + C(I)
      END DO

requires the value computed for A(I) in the previous iteration to be used (as A(I-1)) in the current iteration. To produce correct results, iteration I must complete before iteration I+1 can execute.

Reduction

Reduction operations reduce the elements of an array into a single value. For example, summing the elements of an array into a single variable involves updating that variable in each iteration:

      DO I = 1, N
        SUM = SUM + A(I) * B(I)
      END DO

If each processor running this loop in parallel takes some subset of the iterations, the processors will interfere with each other, overwriting the value in SUM. For this to work, each processor must execute the summation one at a time, although the order is not significant.

Certain common reduction operations are recognized and handled as special cases by the compiler.

Indirect Addressing

Loop dependencies can result from stores into arrays that are indexed in the loop by subscripts whose values are not known. For example, indirect addressing could be order dependent if there are repeated values in the index array:

      DO L = 1, NW
        A(ID(L)) = A(L) + B(L)
      END DO

In the example, repeated values in ID cause elements in A to be overwri
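One common way to make the summation reduction above independent across processors is to give each one its own partial sum and combine the partial sums at the end. The sketch below shows that idea serially, in free-form Fortran; the names reduction_sketch and partial_sum_reduction are hypothetical, and this is not a description of how the Sun compilers implement reductions internally.

```fortran
! Sketch only: a sum reduction split into independent partial sums.
! Runs serially here; each value of p stands for the work one thread would do.
! The names are hypothetical.
module reduction_sketch
  implicit none
contains
  function partial_sum_reduction(a, b, nparts) result(total)
    real, intent(in) :: a(:), b(:)
    integer, intent(in) :: nparts      ! number of partial sums, must be >= 1
    real :: total
    real :: partial(nparts)
    integer :: p, i, n, lo, hi

    n = min(size(a), size(b))
    partial = 0.0
    do p = 1, nparts
       ! contiguous, non-overlapping block of iterations for part p
       lo = (p - 1) * n / nparts + 1
       hi = p * n / nparts
       do i = lo, hi
          partial(p) = partial(p) + a(i) * b(i)   ! no shared variable updated
       end do
    end do
    total = sum(partial)                          ! combine step
  end function partial_sum_reduction
end module reduction_sketch
```

Because floating-point addition is not associative, the combined result can differ in the last digits from the strictly sequential loop when the data are not exactly representable, even though both are valid sums.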
171. oting

Here are a few suggestions for what to try when programs ported to Sun Fortran do not run as expected.

Results Are Close, but Not Close Enough

Try the following:
  - Pay attention to the size and the engineering units. Numbers very close to zero can appear to be different, but the difference is not significant, especially if this number is the difference between two large numbers, such as the distance across the continent in feet, as calculated on two different computers. For example, 1.9999999e-30 is very near -9.9992112e-33, even though they differ in sign.
  - VAX math is not as good as IEEE math, and even different IEEE processors may differ. This is especially true if the mathematics involves many trigonometric functions. These functions are much more complicated than one might think, and the standard defines only the basic arithmetic functions. There can be subtle differences, even between IEEE machines. Review the Floating-Point Arithmetic chapter in this Guide.
  - Try running with a call nonstandard_arithmetic(). Doing so can also improve performance considerably, and make your Sun workstation behave more like a VAX system. If you have access to a VAX or some other system, run it there also. It is quite common for many numerical applications to produce slightly different results on each floating-point implementation.
  - Check for NaN, Inf, and other signs of probable errors. See the Floating-Point Arithmetic chapter in this Guide
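One way to act on the first suggestion is to compare results against a tolerance scaled to the size of the quantities that produced them, rather than comparing the raw values or their signs. The sketch below assumes the two example values are 1.9999999e-30 and -9.9992112e-33; the variable names, the operand scale, and the tolerance are assumptions chosen only for illustration.

```fortran
! Sketch only: scale and tol are assumed values, not recommendations.
program compare_results
  implicit none
  double precision :: x, y, scale, tol

  x =  1.9999999d-30   ! result from one machine
  y = -9.9992112d-33   ! result from another machine
  scale = 1.0d0        ! assumed magnitude of the operands that were subtracted
  tol   = 1.0d-12      ! assumed acceptable relative error

  ! Compare the difference with the operand scale, not with x or y,
  ! because both values are rounding noise relative to that scale.
  if (abs(x - y) <= tol * scale) then
     print *, 'results agree to within tolerance'
  else
     print *, 'results differ significantly'
  end if
end program compare_results
```

With these assumed settings the difference is about 2e-30, far below the allowed 1e-12, so the two results are treated as equal even though their signs differ.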
172. pilers

Sun WorkShop compilers can detect loops that might be safely and profitably parallelized automatically. However, in most cases, the analysis is necessarily conservative, due to the concern for possible hidden side effects. (A display of which loops were and were not parallelized can be produced by the -loopinfo option.) By inserting source code directives before loops, you can explicitly influence the analysis, controlling how a specific loop is (or is not) to be parallelized. However, it then becomes your responsibility to ensure that such explicit parallelization of a loop does not lead to incorrect results.

Both f77 and f95 support two styles of explicit parallelization directives: Sun style and Cray style. In addition, f95 supports the OpenMP 1.1 directives and runtime library routines. Explicit parallelization in Fortran is described on page 159.

Speedups: What to Expect

If you parallelize a program so that it runs over four processors, can you expect it to take (roughly) one fourth the time that it did with a single processor (a fourfold speedup)? Probably not. It can be shown (by Amdahl's law) that the overall speedup of a program is strictly limited by the fraction of the execution time spent in code running in parallel. This is true no matter how many processors are applied. In fact, if p is the percentage of the total program execution time that runs in parallel mode, the theoretical speedup limit is 100/(100-p); therefore
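The limit 100/(100-p) is easy to tabulate. The sketch below assumes p = 60 purely as an illustration, and it also uses the usual finite-processor form of Amdahl's law, 100/((100-p) + p/N), which is not stated in the text above and ignores all parallelization overhead.

```fortran
! Sketch only: p = 60 is an assumed value; overhead is ignored.
program amdahl_limit
  implicit none
  real    :: p, limit, speedup
  integer :: nproc

  p = 60.0                         ! percent of run time spent in parallel code
  limit = 100.0 / (100.0 - p)      ! limit quoted in the text
  print *, 'limit with unlimited processors:', limit

  ! Speedup for 1, 2, 4, 8, and 16 processors under the same assumption.
  nproc = 1
  do while (nproc <= 16)
     speedup = 100.0 / ((100.0 - p) + p / real(nproc))
     print *, nproc, ' processors: speedup ', speedup
     nproc = nproc * 2
  end do
end program amdahl_limit
```

For p = 60 the limit is 100/40 = 2.5, so under these assumptions no number of processors can make the program run more than two and a half times faster.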
173. port for 64 bit Solaris environments on UltraSPARC platforms Call by value VAL implemented in both 77 and 95 Interoperability between Fortran 77 and Fortran 95 programs and object binaries m Interval Arithmetic expressions in 95 Fortran Programming Guide May 2000 12 Other Fortran Utilities The following utilities provide assistance in the development of software programs in Fortran Sun WorkShop Performance Analyzer In depth performance analysis tool for single threaded and multi threaded applications See analyzer 1 asa This Solaris utility is a Fortran output filter for printing files that have Fortran carriage control characters in column one Use asa to transform files formatted with Fortran carriage control conventions into files formatted according to UNIX line printer conventions See asa 1 fpp A Fortran source code preprocessor See fpp 1 fsplit This utility splits one Fortran file of several routines into several files each with one routine per file Use sp1it FORTRAN 77 or Fortran 95 source files See fsp1it 1 Debugging Utilities The following debugging utilities are available error f77 only A utility to merge compiler error messages with the Fortran 77 source file This utility is included if you do a developer install rather than an end user install of Solaris it is also included if you install the SUNWbtool package Xlist A compiler opti
174. process control dbx 78 processors or threads 151 program analysis 63 to 79 program development tools 35 to 42 make 35 SCCS 39 psrinfo SunOS command 151 pure scalar variable defined 154 R random I O 27 README files 16 READONLY directive qualifier 169 recurrence data dependency 148 redistributable libraries 62 reduction operations data dependency 148 numerical accuracy 157 recognized by the compiler 157 REDUCTION directive qualifier 171 loop scheduling 172 loop scheduling Cray 178 scoping rules 160 scoping variables with Cray directives 176 inhibitors to automatic parallelization 155 to explicit parallelization 161 nested loops 156 OpenMP 159 options summary 149 private and shared variables 160 reduction operations 156 specifying number of processors 151 specifying stack sizes 152 stackvar option 152 steps to 147 what to expect 146 performance optimization 135 to 144 choosing options 135 further reading 144 hand restructurings and portability 115 inhibitors 142 inlining calls 137 libraries 141 loop unrolling 139 On options 137 OPT n directive 138 specifying target hardware 140 with runtime profile 138 profiling 121 to 133 gprof 123 I O 131 overhead 127 tcov 127 time 122 Sun Performance Library 14 performance analyzer 121 performance library 142 pic and PIC options 56 porting 105 to 119 accessing files 110 aliasing 115 carri
175. r PRINT statement. A Fortran logical unit can be associated with a specific, named file through the OPEN statement. Also, certain preconnected units are automatically associated with specific files at the start of program execution.

Accessing Named Files

The OPEN statement's FILE= specifier establishes the association of a logical unit to a named, physical file at runtime. This file can be pre-existing or created by the program. See the Sun FORTRAN 77 Language Reference Manual for a full discussion of the OPEN statement.

The FILE= specifier on an OPEN statement may specify a simple file name (FILE='myfile.out') or a file name preceded by an absolute or relative directory path (FILE='Amber/Qproj/myfile.out'). Also, the specifier may be a character constant, variable, or character expression.

Library routines can be used to bring command-line arguments and environment variables into the program as character variables for use as file names in OPEN statements. (See man page entries for getarg(3F) and getenv(3F) for details; these and other useful library routines are also described in the Fortran Library Reference.)

The following example (GetFilNam.f) shows one way to construct an absolute path file name from a typed-in name. The program uses the library routines GETENV, LNBLNK, and GETCWD to return the value of the HOME environment variable, find the last non-blank in the string, and
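The idea of building a file name from an environment variable and passing it to OPEN can be sketched with standard intrinsics. This is not the GetFilNam.f example from the Guide: it assumes a Fortran 2003 compiler and substitutes GET_ENVIRONMENT_VARIABLE for the GETENV library routine and LEN_TRIM for LNBLNK. The unit number and the file name myfile.out are chosen only for illustration.

```fortran
! Sketch only: standard intrinsics stand in for the library routines
! named in the text; unit 10 and 'myfile.out' are assumptions.
program open_named_file
  implicit none
  character(len=256) :: home, path
  integer :: ios, hlen

  ! Fetch HOME; fall back to the current directory if it is unset
  ! or does not fit in the variable.
  call get_environment_variable('HOME', home, length=hlen, status=ios)
  if (ios /= 0 .or. hlen == 0) home = '.'

  ! LEN_TRIM gives the position of the last non-blank character.
  path = home(1:len_trim(home)) // '/myfile.out'

  ! FILE= accepts any character expression, here a variable.
  open (unit=10, file=trim(path), status='unknown', iostat=ios)
  if (ios /= 0) then
     print *, 'cannot open ', trim(path)
  else
     write (10, *) 'hello'
     ! Delete the file on close so the sketch leaves nothing behind.
     close (10, status='delete')
     print *, 'opened and removed ', trim(path)
  end if
end program open_named_file
```

Checking IOSTAT after OPEN keeps the program from aborting when the directory is not writable, which is a common failure when a path is assembled at run time.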
176. raic computation. IEEE arithmetic offers users greater control over computation than does any other kind of floating-point arithmetic. The standard simplifies the task of writing numerically sophisticated, portable programs.

Many questions about floating-point arithmetic concern elementary operations on numbers. For example:
  - What is the result of an operation when the infinitely precise result is not representable in the computer hardware?
  - Are elementary operations like multiplication and addition commutative?

Another class of questions concerns floating-point exceptions and exception handling. What happens if you:
  - Multiply two very large numbers with the same sign?
  - Divide nonzero by zero?
  - Divide zero by zero?

In older arithmetic models, the first class of questions might not have the expected answers, while the exceptional cases in the second class might all have the same result: the program aborts on the spot or proceeds with garbage results.

The standard ensures that operations yield the mathematically expected results with the expected properties. It also ensures that exceptional cases yield specified results, unless the user specifically makes other choices.

For example, the exceptional values +Inf, -Inf, and NaN are introduced intuitively:

    big*big  = +Inf    Positive infinity
    -big*big = -Inf    Negative infinity
    num/0.0  = +Inf    Where num > 0.0
    num/0.0  = -Inf    Where num < 0.0
    0.0/0.0  = NaN     Not a Number

Fortr
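The exceptional values listed in this entry can be produced directly. The sketch below assumes the program runs with IEEE exception trapping disabled (the nonstop default described by the standard); with trapping enabled, or on a non-IEEE system, it may abort instead. The way Inf and NaN are printed also varies by compiler, so no output is shown.

```fortran
! Sketch only: behavior assumes IEEE arithmetic with trapping disabled.
program ieee_values_demo
  implicit none
  real :: big, zero, num

  ! Values are assigned at run time so the compiler does not reject
  ! the divisions as invalid constant expressions.
  big  = huge(big)
  zero = 0.0
  num  = 1.0

  print *, ' big*big =',  big * big     ! overflow: +Inf
  print *, '-big*big =', -big * big     ! overflow: -Inf
  print *, ' num/0.0 =',  num / zero    ! division by zero: +Inf
  print *, '-num/0.0 =', -num / zero    ! division by zero: -Inf
  print *, ' 0.0/0.0 =',  zero / zero   ! invalid operation: NaN
end program ieee_values_demo
```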
177. rams was actually called. This situation can be improved by optimizing the way library routines are distributed into compilable source files. (Still, only those library modules actually referenced by the program are loaded into the executable.)

Order matters when linking static libraries. The linker processes its input files in the order in which they appear on the command line, left to right. When the linker decides whether or not to load an element from a library, its decision is determined by the library elements that it has already processed. This order is not only dependent on the order of the elements as they appear in the library file, but also on the order in which the libraries are specified on the compile command line.

Example: If the Fortran program is in two files, main.f and crunch.f, and only the latter accesses a library, it is an error to reference that library before crunch.f or crunch.o:

    demo% f77 main.f -lmylibrary crunch.f -o myprog     (Incorrect)
    demo% f77 main.f crunch.f -lmylibrary -o myprog     (Correct)

Creation of a Simple Static Library

Suppose that you can distribute all the routines in a program over a group of source files and that these files are wholly contained in the subdirectory test_lib. Suppose further that the files are organized in such a way that they each contain a single principal subprogram that would be called by the user program, along with any "helper" routines that the subprogram mi
178. raries that they come from Fortran Programming Guide May 2000 44 Example Using m to generate a load map demo setenv LD_OPTIONS demos 77 any f any f MAIN LINK EDITOR MEMORY MAP output input virtual section section address size interp 100d4 11 interp 100d4 11 null hash 100e8 2e8 hash 100e8 268 null dynsym 103d0 650 dynsym 103d0 650 null dynstr 10a20 366 dynstr 10a20 366 null text 10c90 170 text 10c90 00 opt SUNWspro lib crti o text 10c90 f4 opt SUNWspro lib crtl o text 10d84 00 opt SUNWspro lib values xi o text 10d88 d20 sparse o Listing Other Information Additional linker debugging features are available through the linker s Dkeyword option A complete list can be displayed using Dhelp Example List linker debugging aid options using the Dhelp option demo ld Dhelp debug args display input argument processing debug bindings display symbol binding debug detail provide more information debug entry display entrance criteria descriptors demo For example the Dfiles linker option lists all the files and libraries referenced during the link process Chapter 4 Libraries 45 demo setenv LD_OPTIONS Dfiles demo 77 direct f direct f MAIN direct debug file opt SUNWspro lib crti o ET_REL debug file opt SUNWspro lib crtl o ET_REL debug file opt SUNWspro li
179. ring. The input is action, mode, and in. The output is out and flags. ieee_flags is an integer-valued function. Useful information is returned in flags as a set of 1-bit flags. Refer to the man page for ieee_flags(3m) for complete details.

Possible parameter values are shown in the following table:

TABLE 6-1 ieee_flags( action, mode, in, out ) Argument Values

    action     get, set, clear, clearall
    mode       direction, precision, exception
    in, out    nearest, tozero, negative, positive, extended, double, single,
               inexact, division, underflow, overflow, invalid, all, common

The precision mode is available only on x86 platforms. Note that these are literal character strings, and the output parameter out must be at least CHARACTER*9. The meanings of the possible values for in and out depend on the action and mode they are used with. These are summarized in the following table:

TABLE 6-2 ieee_flags Argument Meanings

    Value of in and out                                  Refers to
    nearest, tozero, negative, positive                  Rounding direction
    extended, double, single                             Rounding precision
    inexact, division, underflow, overflow, invalid      Exceptions
    all                                                  All five exceptions
    common                                               Common exceptions: invalid, division, overflow

For example, to determine what is the highest priority exception that has a flag raised, pass the input argument in as the null string
180. rriding of Macro Values The initial values of make macros can be overridden with command line options to make For example FFLAGS u OBJ pattern o computepts o startupcore o pattern S OBJ 77 S FFLAGS OBJ lcore77 1 lsunwindow lpixrect o pattern pattern o pattern f commonblock 77 S FFLAGS c pattern f computepts o 77 S FFLAGS c computepts f Now a simple make command without arguments uses the value of FFLAGS set above However this can be overridden from the command line demo make FFLAGS u 0 Here the definition of the FFLAGS macro on the make command line overrides the makefile initialization and both the O flag and the u flag are passed to 77 Note that FFLAGS can also be used on the command to reset the macro to a null string so that it has no effect Suffix Rules in make To make writing a makefile easier make will use its own default rules depending on the suffix of a target file Recognizing the f suffix make uses the 77 compiler passing as arguments any flags specified by the FFLAGS macro the 6 flag and the name of the source file to be compiled Fortran Programming Guide May 2000 38 The example below demonstrates this rule twice OBJ pattern o computepts o startupcore o FFLAGS u pattern OBJ 77 S OBJ 1002677 1 lsunwindow lpixrect o pattern pattern o pattern f commonblock 77 S FFLAGS c pattern f
181. rs COMP and COMPLEX 32 in 550 f1 2 and accessible to C programs preventing such interoperability between Fortran and C on SPARC V9 platforms for this case Returning a CHARACTER String Passing strings between C and Fortran routines is not encouraged However a Fortran character string valued function is equivalent to a C function with two additional first arguments data address and string length The general pattern for the Fortran function and its corresponding C function is C function void c_ result length a1 an char result long length Chapter 11 C Fortran Interface 201 Fortran function CHARACTER n FUNCTION C al an Here is an example TABLE 11 15 A Function Returning a CHARACTER String C calls Fortran void fstr_ char long char Leng Long char sbf 9 123456789 char p2rslt sbf int rslt_len sizeof sbf char ch int n 4 int ch_len sizeof ch make n copies of ch in sbf fstr_ p2rslt 2816 Len amp ch amp n ch_len FUNCTION FSTR C N CHARACTER FSTR C FSTR DO I 1 N FSTR I I C END DO FSTR N 1 N 1 CHAR 0 END Fortran calls C CHARACTER STRING 16 CSTR 9 STRING STRING 123 CSTR 9 void cstr_ char p2rslt long rslt_len char p2arg long p2n long arg_len return n copies of arg int Gounjt T Cher ep count p2n cp p2rs
182. s and cross references with no page breaks for easier on screen viewing output report file 15 name Rename the 560 11 Use Xlisto to rename the generated report output file A space between o and name is required With Xlisto name the output is to name lst To display directly to the screen use the command Xlisto dev tty Chapter5 Program Analysis and Debugging 3 77 Xlists Suppress unreferenced identifiers Use Xlists to suppress from the cross reference table any identifiers defined in the include files but not referenced in the source files This suboption has no effect if the suboption XlistI is used The default is not to show the occurrences in include or INCLUDE files 77 Xlistvn Set level of checking strictness nis 1 2 3 or 4 The default is 2 Xlistv2 m Xlistvl Shows the cross checked information of all names in summary form only with no line numbers This is the lowest level of checking strictness syntax errors only m Xlistv2 Shows cross checked information with summaries and line numbers This is the default level of checking strictness and includes argument inconsistency errors and variable usage errors m Xlistv3 Shows cross checking with summaries line numbers and common block maps This is a high level of checking strictness and includes errors caused by incorrect usage of data types in common blocks in different subprograms m Xlistv4 Shows
183. s in the margin Although these two versions of tcov are essentially the same as far as the Fortran user is concerned most of the enhancements apply to C programs there will be some performance improvement with the newer style Chapter 8 Performance Profiling 127 Note The code coverage report produced by tcov will be unreliable if the compiler has inlined calls to routines The compiler inlines calls whenever appropriate at optimization levels above 03 and according to the inline option With inlining the compiler replaces a call to a routine with the actual code for the called routine And since there is no call references to those inlined routines will not be reported by tcov Therefore to get an accurate coverage report do not enable compiler inlining Old Style tcov Coverage Analysis Compile the program with the a or xa option This produces the file STCOVDIR file d for each source f file in the compilation If environment variable TCOVDIR is not set at compile time the d files are stored in the current directory Run the program execution must complete normally This produces updated information in the d files To view the coverage analysis merged with the individual source files run tcov on the source files The annotated source files are named TCOVDIR file tcov for each source file The output produced by tcov shows the number of times each statement was actually executed Statements that were not
184. s that return the total time used by the calling process. See the man pages for dtime(3F) and etime(3F).

Also, gprof can give misleading results. A well-known limitation is that gprof cannot differentiate time spent in a function called from more than one caller. For example, it may be that function FU takes much more time when called from routine BAR than from any other routine, and knowing this could suggest to you a significant restructuring of the program and better performance. Unfortunately, the results shown by gprof average the total time spent in FU over all calls, obscuring this valuable bit of information.

The Sun WorkShop Performance Analyzer provides much more detailed and useful information if you intend to do serious performance analysis of a program, and should be used instead.

The tcov Profiling Command

The tcov(1) command, when used with programs compiled with the -a or -xprofile=tcov options, produces a statement-by-statement profile of the source code showing which statements executed and how often. It also gives a summary of information about the basic block structure of the program.

There are two implementations of tcov coverage analysis. The original tcov is invoked by the -a or -xa compiler options. Enhanced statement-level coverage is invoked by the -xprofile=tcov compiler option and the -x tcov option. In either case, the output is a copy of the source files annotated with statement execution count
185. sS amp Sun microsystems Fortran Programming Guide Sun WorkShop 6 Fortran 95 Fortran 77 Sun Microsystems Inc 901 San Antonio Road Palo Alto CA 94303 U S A 650 960 1300 Part No 806 3593 10 May 2000 Revision A Send comments about this document to docfeedback sun com Copyright 2000 Sun Microsystems Inc 901 San Antonio Road Palo Alto CA 94303 4900 USA All rights reserved This product or document is distributed under licenses restricting its use copying distribution and decompilation No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors if any Third party software including font technology is copyrighted and licensed from Sun suppliers Parts of the product may be derived from Berkeley BSD systems licensed from the University of California UNIX is a registered trademark in the U S and other countries exclusively licensed through X Open Company Ltd For Netscape Netscape Navigator and the Netscape Communications Corporation logo the following notice applies Copyright 1995 Netscape Communications Corporation All rights reserved Sun Sun Microsystems the Sun logo docs sun com AnswerBookz2 Solaris SunOS JavaScript SunExpress Sun WorkShop Sun WorkShop Professional Sun Performance Library Sun Performance WorkShop Sun Visual WorkShop and Forte are trademarks registered trademarks or service marks
186. se Sun WorkShop man pages See the Fortran User s Guide for details You can display a man page by running the command demo man topic Fortran Programming Guide May 2000 14 Throughout the Fortran documentation man page references appear with the topic name and man section number 77 1 is accessed with man 77 Other sections denoted by iecee_flags 3M for example are accessed using the s option on the man command demo man s 3M ieee_flags Man pages for the Fortran library routines are found in section 3F The following lists man pages of interest to Fortran user The Fortran compilers command line options Sun WorkShop Performance Analyzer Fortran carriage control print output post processor Command line interactive debugger Fortran source code pre processor C source code pre processor Pre processor splits Fortran 77 routines into single files Examine set or clear floating point exception bits Handle floating point exceptions Math library error handling routine Incremental link editor for object files Link editor for object files Chapter1 Introduction 15 77 1 and 95 1 analyzer 1 asa 1 dbx 1 fpp 1 cpp 1 fsplit 1 ieee_flags 3M ieee_handler 3M matherr 3M ila 1 1a 1 README s The READMEs directory contains files that describe new features software incompatibilities bugs and information that was discovered after the manuals were printed The location
187. sharing does not cause correctness problems. The compiler does not synchronize on updates or accesses to shared variables. If you specify a variable as private in one loop, and its only initialization is within some other loop, the value of that variable may be left undefined in the loop.

Subprogram Call in a Loop

A subprogram call in a loop (or in any subprograms called from within the called routine) may introduce data dependencies that could go unnoticed without a deep analysis of the data and control flow through the chain of calls. While it is best to parallelize outermost loops that do a significant amount of the work, these tend to be the very loops that involve subprogram calls.

Because such an interprocedural analysis is difficult and could greatly increase compilation time, automatic parallelization modes do not attempt it. With explicit parallelization, the compiler generates parallelized code for a loop marked with a DOALL directive even if it contains calls to subprograms. It is still the programmer's responsibility to ensure that no data dependencies exist within the loop and all that the loop encloses, including called subprograms.

Multiple invocations of a routine by different threads can cause problems resulting from references to local static variables that interfere with each other. Making all the local variables in a routine automatic rather than static prevents this. Each invoc
188. situations it may be necessary to explicitly reference each system and user library on the link step with the appropriate Bstatic or Bdynamic as required First use LD_OPTIONS set to Dfiles to obtain a listing of all the libraries needed Then perform the link step with nolib to suppress automatic linking of system libraries and explicit references to the libraries you need For example 77 xarch v9 o cdf nolib cdf o Bstatic 1F77 1M77 lsunmath Bdynamic 1m lc Naming Conventions To conform to the dynamic library naming conventions assumed by the link loader and the compilers assign names to the dynamic libraries that you create with the prefix lib and the suffix so For example 1ibmy favs so could be referenced by the compiler option 1my favs The linker also accepts an optional version number suffix for example libmyfavs so 1 for version one of the library The compiler s hname option records name as the name of the dynamic library being built A Simple Dynamic Library Building a dynamic library requires a compilation of the source files with the pic or PIC option and linker options 6 ztext and hname These linker options are available through the compiler command line Fortran Programming Guide May 2000 58 You can create a dynamic library with the same files used in the static library example Example Compile with pic and other linker options demo 77 o libtestli
189. ssor, you may experience a performance loss.

Gradual underflow can be disabled either by compiling with the -fns option or by calling the library routine nonstandard_arithmetic() from within the program to turn it off. Call standard_arithmetic() to turn gradual underflow back on.

Note: To be effective, the application's main program must be compiled with -fns. See the Fortran User's Guide.

For legacy applications, take note that:
  - The standard_arithmetic() subroutine replaces an earlier routine named gradual_underflow().
  - The nonstandard_arithmetic() subroutine replaces an earlier routine named abrupt_underflow().

Note: The -fns option and the nonstandard_arithmetic() library routine are effective only on some SPARC systems. On x86 platforms, gradual underflow is performed by the hardware.

IEEE Routines

The following interfaces help people use IEEE arithmetic and are described in man pages. These are mostly in the math library libsunmath and in several .h files.
  - ieee_flags(3m): Controls rounding direction and rounding precision; query exception status; clear exception status
  - ieee_handler(3m): Establishes an exception handler routine
  - ieee_functions(3m): Lists name and purpose of each IEEE function
  - ieee_values(3m): Lists functions that return special values
  - Other libm functions described in this section: ieee_retrospective, nonstandard_arithmetic, standard_arithmetic

The SPARC process
190. st significant portability problems The following issues should be noted m Sun adheres to the IEEE Standard 754 for floating point arithmetic Therefore the first four bytes in a REAL 8 are not the same as in a REAL 4 The default sizes for reals integers and logicals are described in the FORTRAN 77 standard except when these default sizes are changed by the xt ypemap option or by 12 dbl or r8 m Character variables can be freely mixed and equivalenced to variables of other types but be careful of potential alignment problems 77 IEEE floating point arithmetic does raise exceptions on overflow or divide by zero but does not signal SIGFPE or trap by default It does deliver IEEE indeterminate forms in cases where exceptions would otherwise be signaled This is explained in the Floating Point Arithmetic chapter of this Guide The extreme finite normalized values can be determined See 1ibm_single 3F and 1ibm_doub1le 3F The indeterminate forms can be written and read using formatted and list directed I O statements Hollerith Data Many dusty deck Fortran applications store Hollerith ASCII data into numerical data objects With the 1977 Fortran standard and Fortran 95 the CHARACTER data type was provided for this purpose and its use is recommended You can still initialize variables with the older Fortran Hollerith nH feature but this is not standard practice The following table indic
191. stinctions on function subprogram names Use one of these two solutions but not both Most examples in this chapter use all lowercase letters for the name in the C function and do not use the 95 77 U compiler option Underscores in Routine Names The Fortran compiler normally appends an underscore _ to the names of subprograms appearing both at entry point definition and in calls This convention differs from C procedures or external variables with the same user assigned name All Fortran library procedure names have double leading underscores to reduce clashes with user assigned subroutine names There are three usual solutions to the underscore problem Inthe C function change the name of the function by appending an underscore to that name m Use the C pragma to tell the Fortran compiler to omit those trailing underscores Use the 77 and 95 ext_names option to compile references to external names without underscores Use only one of these solutions The examples in this chapter could use the 0 compiler pragma to avoid underscores The C pragma directive takes the names of external functions as arguments It specifies that these functions are written in the C language so the Fortran compiler does not append an underscore as it ordinarily does with external names The C directive for a particular function must appear before the first reference to that function It must also appear in eac
192. t C shell oe Bourne shell and Korn shell 5 C shell Bourne shell and Korn shell superuser Related Documentation You can access documentation related to the subject matter of this book in the following ways Through the Internet at the docs sun com Web site You can search for a specific book title or you can browse by subject document collection or product at the following Web site http docs sun com Through the installed Sun WorkShop products on your local system or network Sun WorkShop 6 HTML documents manuals online help man pages component readme files and release notes are available with your installed Sun WorkShop 6 products To access the HTML documentation do one of the following In any Sun WorkShop or Sun WorkShop TeamWare window choose Help lt About Documentation In your Netscape Communicator 4 0 or compatible version browser open the following file opt SUNWspro docs index html Contact your system administrator if your Sun WorkShop software is not installed in the opt directory Your browser displays an index of Sun WorkShop 6 HTML documents To open a document in the index click the document s title TABLE P 3 lists related Sun WorkShop 6 manuals by document collection Related Sun WorkShop 6 Documentation by Document Collection Description Describes the documentation available with this Sun WorkShop release and how to access it Provides
193. t passed by value This is implementation dependent The extra string length arguments appear after the explicit arguments in the call Chapter 11 C Fortran Interface 3 A Fortran call with a character string argument is shown in the next example with C equivalent char s 7 int b 3 cstrng_ s amp b 1 7L its C equivalent TABLE 11 6 Passing a CHARACTER string Fortran call CHARACTER 7 S INTEGER B 3 CALL CSTRNG S B 2 If the length of the string is not needed in the called routine the extra arguments may be ignored However note that Fortran does not automatically terminate strings with the explicit null character that C expects This must be added by the calling program One Dimensional Arrays Array subscripts in C start with 0 TABLE 11 7 Passing a One Dimensional Array C calls Fortran extern void vecref_ int int int i sum int v 9 vecref_ v amp Sum subroutine VecRef v total integer i total v 9 total 0 do 1 9 total total end do Fortran calls C integer i Sum integer a 9 external FixVec call FixVec a Sum int 2 sum 0 for i 0 i lt 8 i sum sum 1 Fortran Programming Guide May 2000 194 195 Two Dimensional Arrays Rows and columns between C and Fortran are switched TABLE 11 8 Passing a Two Dimensional Array C calls Fortran extern void 4
194. time(3F) table.

TABLE 7-1 Fortran Time Functions

    Name             Function
    time             Returns the number of seconds elapsed since January 1, 1970
    date             Returns date as a character string
    fdate            Returns the current time and date as a character string
    idate            Returns the current month, day, and year in an integer array
    itime            Returns the current hour, minute, and second in an integer array
    ctime            Converts the time returned by the time function to a character string
    ltime            Converts the time returned by the time function to the local time
    gmtime           Converts the time returned by the time function to Greenwich time
    etime            Single processor: Returns elapsed user and system time for program execution
                     Multiple processors: Returns the wall clock time
    dtime            Returns the elapsed user and system time since last call to dtime
    date_and_time    Returns date and time in character and numeric form

For details, see Fortran Library Reference Manual or the individual man pages for these functions.

The routines listed in the following table provide compatibility with VMS Fortran system routines idate and time. To use these routines, you must include the -lV77 option on the f77 command line, in which case you also get these VMS versions instead of the standard f77 versions.

TABLE 7-2 Summary: Nonstandard VMS Fortran System Routines

    Name    Definition    Calling Sequence    Arg
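Of the routines in the table, date_and_time is the one defined by the Fortran 90/95 standard itself, so it is the most portable choice when porting. A minimal sketch of a call follows; the variable names are chosen for illustration.

```fortran
! Sketch only: variable names are illustrative.
program show_time
  implicit none
  character(len=8)  :: date     ! ccyymmdd
  character(len=10) :: time     ! hhmmss.sss
  character(len=5)  :: zone     ! +hhmm or -hhmm offset from UTC
  integer :: values(8)          ! numeric form of the same information

  call date_and_time(date, time, zone, values)

  print *, 'date:             ', date
  print *, 'time:             ', time
  print *, 'zone:             ', zone
  print *, 'year, month, day: ', values(1), values(2), values(3)
  print *, 'hour, min, sec:   ', values(5), values(6), values(7)
end program show_time
```

The values array also carries the UTC offset in minutes in element 4 and milliseconds in element 8, which is usually enough to replace calls to the nonstandard idate and itime routines.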
195. time to time. The versions of the applicable standards to which these compilers conform may be revised or replaced, resulting in features in future releases of the Sun Fortran compilers that create incompatibilities with earlier releases.

Features of the Fortran Compilers

Sun Fortran compilers provide the following features or extensions:
■ f77: Global program checking across routines for consistency of arguments, commons, parameters, and the like.
■ (SPARC only) Support for multiprocessor systems, including automatic and explicit loop parallelization, is integrated tightly with optimization.
  Note: Parallelization features of the Fortran compilers require a Sun WorkShop HPC license.
■ f77: Many VAX/VMS Fortran 5.0 extensions, including:
  ■ NAMELIST
  ■ DO WHILE
  ■ Structures, records, unions, maps
  ■ Variable format expressions
  ■ Recursion
  ■ Pointers
  ■ Double-precision complex
  ■ (SPARC) Quadruple-precision real
  ■ (SPARC) Quadruple-precision complex
■ Cray-style parallelization directives, including TASK COMMON, with extensions for f95.
■ OpenMP parallelization directives accepted by f95.
■ Global, peephole, and potential parallelization optimizations produce high-performance applications. Benchmarks show that optimized applications can run significantly faster when compared to unoptimized code.
■ Common calling conventions on Solaris systems permit routines written in C or C++ to be combined with Fortran programs.
■ Sup
196. tine with ieee_handler. With exception trapping enabled, run the program from dbx or the Sun WorkShop, using the dbx catch FPE command to see where the error occurs.

The advantage of recompiling with -ftrap=common is that the source code need not be modified to trap the exceptions. However, by calling ieee_handler() you can be more selective as to which exceptions to look at.

Example: Recompiling with -ftrap=common and using dbx:

      demo% f77 -g -ftrap=common -silent myprogram.f
      demo% dbx a.out
      Reading symbolic information for a.out
      Reading symbolic information for rtld /usr/lib/ld.so.1
      Reading symbolic information for libF77.so.3
      Reading symbolic information for libc.so.1
      Reading symbolic information for ...
      (dbx) catch FPE
      (dbx) run
      Running: a.out
      (process id 19739)
      signal FPE (floating point divide by zero) in MAIN at line 212 in file "myprogram.f"
        212          Z = X/Y
      (dbx) print Y
      y = 0.0
      (dbx)

If you find that the program terminates with overflow and other exceptions, you can locate the first overflow specifically by calling ieee_handler() to trap just overflows. This requires modifying the source code of at least the main program, as shown in the following example.

Example: Locate an overflow when other exceptions occur:

      demo% cat myprog.F
      #include "f77_floatingpoint.h"
            program myprogram
            ier = ieee_h
197. tine with parallelization and no -g.

Example: Manually transform a loop to allow using dbx in parallel:

Original code:

      demo% cat loop.f
      C$PAR DOALL
            DO i = 1,10
              WRITE(0,*) 'Iteration ', i
            END DO
            END

Split into two parts: caller loop and loop body as a subroutine.

      demo% cat loop1.f
      C$PAR DOALL
            DO i = 1,10
              k = i
              CALL loop_body ( k )
            END DO
            END

      demo% cat loop2.f
            SUBROUTINE loop_body ( k )
            WRITE(0,*) 'Iteration ', k
            RETURN
            END

Compile caller loop with parallelization but no debugging:

      demo% f77 -O3 -c -explicitpar loop1.f

Compile the subprogram with debugging but not parallelized:

      demo% f77 -c -g loop2.f

Link together both parts into a.out:

      demo% f77 loop1.o loop2.o -explicitpar

Run a.out under dbx and put a breakpoint into the loop body subroutine:

      demo% dbx a.out                    <-- Various dbx messages not shown
      (dbx) stop in loop_body
      (2) stop in loop_body
      (dbx) run
      Running: a.out
      (process id 28163)

dbx stops at the breakpoint:

      t@1 (l@1) stopped in loop_body at line 2 in file "loop2.f"
          2          write(0,*) 'Iteration ', k

Now show the value of k:

      (dbx) print k
      k = 1                              <-- Various values other than 1 are possible
      (dbx)

C-Fortran Interface

This chapter treats issues regarding Fortran and C interoperability. The discussion is inherently limited to the specifics of the Sun FORTRAN 77, Fortran 95, and C compilers. Not
198. tion

CHAPTER 8 Performance Profiling

This chapter describes how to measure and display program performance. Knowing where a program is spending most of its compute cycles and how efficiently it uses system resources is a prerequisite for performance tuning.

Sun WorkShop Performance Analyzer

Sun WorkShop Performance Analyzer provides a sophisticated pair of tools for collecting and analyzing program performance data:
■ The Sampling Collector collects performance data (statistical profiles of call stacks, thread-synchronization delay events, hardware-counter overflow profiles, address-space data, and summary information for the operating system) and stores it in an experiment file.
■ The Sampling Analyzer displays the data recorded by the Sampling Collector, so you can examine the information. The analyzer processes the data and displays various metrics of performance at function, caller-callee, source-line, disassembly-instruction, and program levels.

The Sampling Analyzer can also help you to fine-tune your application's performance by creating a mapfile you can use to improve the order of function loading in the application address space.

The Collector and Analyzer are designed for use by any software developer, even if performance tuning is not the developer's main responsibility. Command-line equivalents of the Collector and Analyzer are available
199. tions, subroutines, variables, and labels
■ Usage of unset variables
■ Unreachable statements
■ Implicit type variables
■ Inconsistency of the named common block lengths, names, and layouts

How to Invoke Global Program Checking

The -Xlist option on the command line invokes the compiler's global program analyzer. There are a number of suboptions, as described in the sections that follow.

Example: Compile three files for basic global program checking:

      demo% f95 -Xlist any1.f any2.f any3.f

In the preceding example, the compiler:
■ Produces output listings in the file any1.lst
■ Compiles and links the program if there are no errors

Screen Output

Normally, output listings produced by -Xlistx are written to a file. To display directly to the screen, use -Xlisto to write the output file to /dev/tty.

Example: Display to terminal:

      demo% f77 -Xlisto /dev/tty any1.f

Default Output Features

The -Xlist option provides a combination of features available for output. With no other -Xlist options, you get the following by default:
■ The listing file name is taken from the first input source or object file that appears, with the extension replaced by .lst
■ A line-numbered source listing
■ Error messages (embedded in listing) for inconsistencies across routines
■ Cross-reference table of the identifiers
■ Pagination at 66 lines per page and 79 columns per line
N
200. tomized manner before the program starts up. One possible use of these initialization routines is to call setlocale for an internationalized Fortran program. Because setlocale does not work if libc is statically linked, only Fortran programs that are dynamically linked with libc should be internationalized.

The source code for the init routines in the library is:

      void f77_init( int *argc_ptr, char ***argv_ptr, char ***envp_ptr ) {}
      void f90_init( int *argc_ptr, char ***argv_ptr, char ***envp_ptr ) {}

The routine f77_init is called by f77 main programs. The routine f90_init is called by f95 main programs. The arguments are set to the address of argc, the address of argv, and the address of envp.

Passing Data Arguments by Reference

The standard method for passing data between Fortran routines and C procedures is by reference. To a C procedure, a Fortran subroutine or function call looks like a procedure call with all arguments represented by pointers. The only peculiarity is the way Fortran handles character strings and functions as arguments and as the return value from a CHARACTER*n function.

Simple Data Types

For simple data types (not COMPLEX or CHARACTER strings), define or pass each associated argument in the C routine as a pointer:

C calls Fortran:
      int i = 100;
      float r;
      extern void fsim_( int *i, float *r );
      fsim_( &i, &r );

      subroutine FSim( i, r )
      integer i
      real r
      1
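The pass-by-reference rule in this excerpt also applies in the other direction, when Fortran calls C. The routine below is a minimal sketch: the name simref_ is hypothetical, and the lowercase-plus-underscore spelling simply mirrors the fsim_ and vecref_ examples in this guide. Every argument is declared as a pointer, and a store through the pointer changes the caller's variable.

```c
/* Hypothetical C routine, intended to be callable from Fortran as
 *       CALL SIMREF( I, R )
 * with INTEGER I and REAL R. Both arguments arrive as pointers because
 * Fortran passes simple data types by reference. */
void simref_(int *i, float *r)
{
    *r = (float)(*i) * 0.5f;   /* read the INTEGER through its pointer */
    *i = *i + 1;               /* this update is visible to the caller */
}
```

From C, the same routine is called with explicit addresses, for example simref_( &i, &r ), which is exactly what the Fortran compiler arranges on the caller's behalf.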
201. tribute one iteration to each available thread. SINGLE is dynamic, and equivalent to Sun-style SELF(1).

CHUNKSIZE(n)
Distribute n iterations to each available thread. n must be an integer expression. For best performance, n must be an integer constant. CHUNKSIZE(n) is equivalent to Sun-style SELF(n).
Example: With 100 iterations and CHUNKSIZE(4), each thread gets 4 iterations at a time.

NUMCHUNKS(m)
If there are n iterations, distribute n/m iterations to each available thread. There can be one smaller residual chunk. m is an integer expression. For best performance, m must be an integer constant. NUMCHUNKS(m) is equivalent to Sun-style SELF(n/m), where n is the total number of iterations.
Example: With 100 iterations and NUMCHUNKS(4), each thread gets 25 iterations at a time.

For both f77 and f95, the default scheduling type (when no scheduling type is specified on a Cray-style DOALL directive) is the Sun-style STATIC, for which there is no Cray-style equivalent.

Environment Variables

There are three environment variables used with parallelization:
■ PARALLEL
■ SUNW_MP_THR_IDLE
■ OMP_NUM_THREADS

See also the STACKSIZE discussion on page 152.

PARALLEL and OMP_NUM_THREADS

To run a parallelized program in a multithreaded environment, you must set either the PARALLEL or OMP_NUM_THREADS environment variable prior to execution. This tells the runtime syst
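The arithmetic behind the CHUNKSIZE and NUMCHUNKS examples in this excerpt can be sketched as follows. This only illustrates the counting, not the scheduler in the Sun runtime; the function names are hypothetical, and plain integer division is assumed for the n/m of NUMCHUNKS(m).

```c
/* Number of chunks needed to cover 'total' iterations when they are
 * handed out 'c' at a time; the last chunk may be smaller. */
int chunk_count(int total, int c)
{
    return (total + c - 1) / c;
}

/* Length of chunk number k (counting from 0) for chunk size c;
 * returns 0 when k lies past the last chunk. */
int chunk_len(int total, int c, int k)
{
    int start = k * c;

    if (start >= total)
        return 0;
    return (total - start < c) ? total - start : c;
}

/* Chunk size implied by NUMCHUNKS(m): n/m, assumed here to be
 * integer division. */
int numchunks_len(int total, int m)
{
    return total / m;
}
```

With 100 iterations, chunk_count(100, 4) gives the 25 requests implied by CHUNKSIZE(4), and numchunks_len(100, 4) gives the 25 iterations per request of NUMCHUNKS(4).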
202. tructions in this section.

To determine if you need to set your MANPATH environment variable:
Request the workshop man page by typing: man workshop
Review the output, if any. If the workshop(1) man page cannot be found, or if the man page displayed is not for the current version of the software installed, follow the instructions in this section for setting your MANPATH environment variable.

Note: The information in this section assumes that your Sun WorkShop 6 products were installed in the /opt directory. Contact your system administrator if your Sun WorkShop software is not installed in /opt.

The PATH and MANPATH variables should be set in your home .cshrc file if you are using the C shell, or in your home .profile file if you are using the Bourne or Korn shells:
■ To use Sun WorkShop commands, add the following to your PATH variable: /opt/SUNWspro/bin
■ To access Sun WorkShop man pages with the man command, add the following to your MANPATH variable: /opt/SUNWspro/man

For more information about the PATH variable, see the csh(1), sh(1), and ksh(1) man pages. For more information about the MANPATH variable, see the man(1) man page. For more information about setting your PATH and MANPATH variables to access this release, see the Sun WorkShop 6 Installation Guide or your system administrator.

How This Book Is Organized

Chapter 1, Introduction, bri
203. tted data. See also TOPEN(3F).

End-of-File

The end-of-file condition is reached when an end-of-file record is encountered during execution of a READ statement. The standard states that the file is positioned after the end-of-file record. In real life, this means that the tape read head is poised at the beginning of the next file on the tape. Although it seems as if you could read the next file on the tape, this is not strictly true, and is not covered by the ANSI FORTRAN 77 Language Standard.

The standard also says that a BACKSPACE or REWIND statement can be used to reposition the file. Consequently, after reaching end-of-file, you can backspace over the end-of-file record and further manipulate the file, for example, writing more records at the end, rewinding the file, and rereading or rewriting it.

Multifile Tapes

The name used to open the tape file determines certain characteristics of the connection, such as the recording density and whether the tape is automatically rewound when opened and closed.

To access a file on a tape with multiple files, first use the mt(1) utility to position the tape to the needed file. Then open the file as a no-rewind magnetic tape, such as /dev/nrmt0. Referencing the tape with this name prevents it from being repositioned when it is closed. By reading the file until end-of-file and then reopening it, a program can access the next file on the tape. Any program subsequently referencing the same tape ca
204. tten. In the serial case, the last store is the final value. In the parallel case, the order is not determined. The values of A(L) that are used, old or updated, are order dependent.

Data Dependent Loops

You might be able to rewrite a loop to eliminate data dependencies, making it parallelizable. However, extensive restructuring could be needed.

Some general rules are:
■ A loop is data independent only if all iterations write to distinct memory locations.
■ Iterations may read from the same locations as long as no one iteration writes to them.

These are general conditions for parallelization. The compilers' automatic parallelization analysis considers additional criteria when deciding whether to parallelize a loop. However, you can use directives to explicitly force loops to be parallelized, even loops that contain inhibitors and produce incorrect results.

Parallel Options and Directives Summary

The following table shows the Sun WorkShop 6 f77 and f95 compilation options related to parallelization.

TABLE 10-1 Parallelization Options

Option                                   Flag
Automatic (only)                         -autopar
Automatic and Reduction                  -autopar -reduction
Explicit (only)                          -explicitpar
Automatic and Explicit                   -parallel
Automatic and Reduction and Explicit     -parallel -reduction
Show which loops are parallelized        -loopinfo
Show warnings with explicit              -vpara
205. ubroutines. You need only name the library when linking the program, and those library modules that resolve references in the program are linked and merged into the executable file.

Specifying Linker Debugging Options

Summary information about library usage and loading can be obtained by passing additional options to the linker through the LD_OPTIONS environment variable. The compiler calls the linker with these options (and others it requires) when generating object binary files.

Using the compiler to call the linker is always recommended over calling the linker directly, because many compiler options require specific linker options or library references, and linking without these could produce unpredictable results.

Example: Using LD_OPTIONS to create a load map:

      demo% setenv LD_OPTIONS '-m -Dfiles'
      demo% f77 -o myprog myprog.f

Some linker options do have compiler command-line equivalents that can appear directly on the f77 or f95 command. These include -Bx, -dx, -G, -hname, -Rpath, and -ztext. See the f77(1) and f95(1) man pages or the Fortran User's Guide for details.

More detailed examples and explanations of linker options and environment variables can be found in the Solaris Linker and Libraries Guide.

Generating a Load Map

The linker -m option generates a load map that displays library linking information. The routines linked during the building of the executable binary program are listed together with the lib
207. ument Type

idate   Date as day, month, year   call idate( d, m, y )   integer
time    Current time as hhmmss     call time( t )          character*8

Note: The date(3F) routine and the VMS version of idate(3F) cannot be Year 2000 safe because they return 2-digit values for the year. Programs that compute time duration by subtracting dates returned by these routines will compute erroneous results after December 31, 1999. The Fortran 95 routine date_and_time(3F) is available for both FORTRAN 77 and Fortran 95 programs, and should be used instead. See the Fortran Library Reference Manual for details.

The error condition subroutine errsns is not provided, because it is totally specific to the VMS operating system.

Here is a simple example of the use of these time functions (TestTim.f):

      subroutine startclock
      common / myclock / mytime
      integer mytime, time
      mytime = time()
      return
      end

      function wallclock
      integer wallclock
      common / myclock / mytime
      integer mytime, time, newtime
      newtime = time()
      wallclock = newtime - mytime
      mytime = newtime
      return
      end

      integer wallclock, elapsed
      character*24 greeting
      real dtime, timediff, timearray(2)
c     print a heading
      call fdate( greeting )
      print*, ' Hello, Time Now Is: ', greeting
      print*, 'See how long "sleep 4" takes, in seconds'
      call startclock
      call system( 'sleep 4' )
      elapsed = wallclock()
      print*, 'Elapsed time for sleep 4 was: ', elapsed, ' seconds'
c     now test the cpu time for some trivial computing
      timediff = dtime( timearray )
      q = 0.01
      do 30 i = 1, 1000
        q = atan( q )
30    continue
      timediff = dtime( timearray )
      print*, 'atan(q) 1000 times took: ', timediff, ' seconds'
      end
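The Year 2000 note in this excerpt comes down to simple arithmetic, which the sketch below makes concrete. The function names are hypothetical and exist only for illustration. The first function subtracts two-digit years, which is all a program has when it relies on the 2-digit year values described in the note; the second subtracts four-digit years, which is what the numeric year from date_and_time allows.

```c
/* Elapsed years computed from two-digit year values (00..99). */
int years_elapsed_2digit(int yy_start, int yy_end)
{
    return yy_end - yy_start;
}

/* Elapsed years computed from four-digit year values. */
int years_elapsed_4digit(int yyyy_start, int yyyy_end)
{
    return yyyy_end - yyyy_start;
}
```

For an interval that starts in 1999 and ends in 2000, the two-digit version yields 0 - 99 = -99 years, while the four-digit version yields the intended 1 year.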
208. undeclared variable is accompanied by an error message.

Version Checking: -V

The -V option causes the name and version ID of each phase of the compiler to be displayed. This option can be useful in tracking the origin of ambiguous error messages and in reporting compiler failures, and to verify the level of installed compiler patches.

Interactive Debugging With dbx and Sun WorkShop

The Sun WorkShop provides a tightly integrated development environment for building and browsing, as well as debugging, applications written in Fortran, C, and C++.

The Sun WorkShop debugging facility is a window-based interface to dbx, while dbx itself is an interactive, line-oriented, source-level symbolic debugger. Either can be used to determine where a program crashed, to view or trace the values of variables and expressions in a running code, and to set breakpoints.

Sun WorkShop adds a sophisticated graphical environment to the debugging process that is integrated with tools for editing, building, and source code version control. It includes a data visualization capability to display and explore large and complex datasets, simulate results, and interactively steer computations.

For details, see the Sun manual Debugging a Program With Sun WorkShop, and the dbx(1) man pages.

The dbx program provides event management, process control, and data inspection. You can watch what is happening during program exec
209. units where the block is defined. A check at runtime for task common consistency can be enabled by compiling the program with the -xcommonchk=yes flag. Enable the runtime check only during program development, as it can degrade performance.

DOALL Directive

The DOALL directive requests the compiler to generate parallel code for the one DO loop immediately following it (if compiled with the -parallel or -explicitpar options).

Note: Analysis and transformation of reduction operations is not performed within explicitly parallelized loops.

Example: Explicit parallelization of a loop:

      demo% cat t4.f
      C$PAR DOALL
            do i = 1, n
              a(i) = b(i) * c(i)
            end do
            do k = 1, m
              x(k) = x(k) * z(k,k)
            end do
      demo% f95 -explicitpar t4.f

DOALL Qualifiers

All qualifiers on the Sun-style DOALL directive are optional. The following table summarizes them.

TABLE 10-5 DOALL Qualifiers

Qualifier    Assertion                                                   Syntax
PRIVATE      Do not share variables u1, ... between iterations           DOALL PRIVATE(u1,u2,...)
SHARED       Share variables v1, v2, ... between iterations              DOALL SHARED(v1,v2,...)
MAXCPUS      Use no more than n CPUs (threads)                           DOALL MAXCPUS(n)
READONLY     The listed variables are not modified in the DOALL loop     DOALL READONLY(v1,v2,...)
STOREBACK    Save the last DO iteration values of variables v1, ...      DOALL STOREBACK(v1,v2,...)
SAVELAST     Save the last DO iteratio
210. use parallel processing typically involves relatively large loop overhead, parallelizing the outermost loop minimizes the overhead and maximizes the work done for each thread. Under automatic parallelization, the compilers start their loop analysis from the outermost loop in a nest and work inward until a parallelizable loop is found. Once a loop within the nest is parallelized, loops contained within the parallel loop are passed over.

Automatic Parallelization With Reduction Operations

A computation that transforms an array into a scalar is called a reduction operation. Typical reduction operations are the sum or product of the elements of a vector. Reduction operations violate the criterion that calculations within a loop not change a scalar variable in a cumulative way across iterations.

Example: Reduction summation of the elements of a vector.

However, for some operations, if reduction is the only factor that prevents parallelization, it is still possible to parallelize the loop. Common reduction operations occur so frequently that the compilers are capable of recognizing and parallelizing them as special cases.

Recognition of reduction operations is not included in the automatic parallelization analysis unless the -reduction compiler option is specified along with -autopar or -parallel.

If a parallelizable loop contains one of the reduction operations listed in TABLE 10-3, the compiler will parallelize it if -reduction is specified.
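One way to see why a sum reduction can still be parallelized is to split the vector into pieces, sum each piece independently, and combine the partial sums at the end. The sketch below shows that idea only; it is not a description of the code the Sun compilers generate, and the function name chunked_sum is hypothetical. The pieces are processed one after another here, but no piece depends on another, so each could be given to a different thread.

```c
/* Sum v[0..n-1] by splitting the index range into 'nchunks' contiguous
 * pieces, accumulating a private partial sum for each piece, and then
 * combining the partial sums. */
double chunked_sum(const double *v, int n, int nchunks)
{
    double total = 0.0;
    int c;

    if (nchunks < 1)
        nchunks = 1;
    for (c = 0; c < nchunks; c++) {
        int lo = (int)((long)n * c / nchunks);        /* first index of piece */
        int hi = (int)((long)n * (c + 1) / nchunks);  /* one past the last    */
        double partial = 0.0;                         /* private to the piece */
        int i;

        for (i = lo; i < hi; i++)
            partial += v[i];
        total += partial;                             /* combine step */
    }
    return total;
}
```

Because floating-point addition is not associative, combining partial sums can round differently from a strictly serial loop, so results with different chunk counts may differ in the last digits for general data.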
211. ut

CROSS REFERENCE TABLE
Source file: Repeat.f

Legend:
D  Definition/Declaration
U  Simple use
M  Modified occurrence
A  Actual argument
C  Subroutine/Function call
E  Occurrence in EQUIVALENCE
N  Occurrence in NAMELIST

PROGRAM FORM
Program: repeat <repeat>

Functions and Subroutines:
fork          int*4       <nwfrk>
int           intrinsic   <prnok>
loc           intrinsic   <repeat> <prnok>
nwfrk                     <repeat> <nwfrk>
prnok         int*4       <nwfrk> <prnok>
real          intrinsic   <repeat>
sleep                     <unreach_sub>
subr1                     <repeat> <subr1>
unreach_sub               <unreach_sub>

Output from compiling f77 -Xlist Repeat.f (Continued)

Variables and Arrays:
ix    int*4    dummy   <nwfrk>   D 14   A 17
pn1   real*4           <repeat>
rp1   real*4           <repeat>
x     real*4   dummy   <subr1> <prnok>

Date:      Tue Feb 22 13:15:39 1995
Files:     2 (Sources 1; libraries 1)
Lines:     26 (Sources: 26; Library subprograms: 2)
Routines:  5 (MAIN: 1; Subroutines: 3; Functions: 1)
Messages:  5 (Errors: 3; Warnings: 2)
demo%

In the cross-reference table in the preceding example:
■ ix is a 4-byte integer:
  ■ Used as an argument in the routine nwfrk
  ■ At line 14, used as a declaration of argument
  ■ At line 17, used as an actual argument
■ pn1 is a 4-byte real in the rout
212. utation Guide.

Avoiding Simple Underflow

Some applications actually do a lot of computation very near zero. This is common in algorithms computing residuals or differential corrections. For maximum numerically safe performance, perform the key computations in extended precision arithmetic. If the application is a single-precision application, you can perform key computations in double precision.

Example: A simple dot product computation in single precision:

      sum = 0
      DO i = 1, n
        sum = sum + a(i) * b(i)
      END DO

If a(i) and b(i) are very small, many underflows occur. By forcing the computation to double precision, you compute the dot product with greater accuracy and do not suffer underflows:

      DOUBLE PRECISION sum
      DO i = 1, n
        sum = sum + dble(a(i)) * dble(b(i))
      END DO
      result = sum

On SPARC platforms: You can force a SPARC processor to behave like an older system with respect to underflow (Store Zero) by adding a call to the library routine nonstandard_arithmetic() or by compiling the application's main program with the -fns option.

Continuing With the Wrong Answer

You might wonder why you would continue a computation if the answer is clearly wrong. IEEE arithmetic allows you to make distinctions about what kind of wrong answers can be ignored, such as NaN or Inf. Then decisions can be made based on such distinctions.

For an example, consider a circuit simulation. The only variable of interest (for the sake of argument
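The effect described in this excerpt can be reproduced with the C float and double types, which correspond to IEEE single and double precision on the systems discussed here. The sketch below is an illustration under that assumption; the function names are hypothetical. The volatile temporary forces each product to be rounded to single precision, so the underflow is not hidden by wider intermediate registers.

```c
/* Dot product carried out entirely in single precision. */
float dot_single(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < n; i++) {
        volatile float prod = a[i] * b[i];  /* rounded to float; may underflow */
        sum += prod;
    }
    return sum;
}

/* Same data, but each operand is widened first, as dble() does in the
 * Fortran version, and the sum is accumulated in double precision. */
double dot_double(const float *a, const float *b, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += (double)a[i] * (double)b[i];
    return sum;
}
```

With every element equal to 1e-25, each single-precision product is about 1e-50, far below the smallest representable single-precision value, so dot_single returns zero; dot_double keeps the products and returns a value close to n times 1e-50.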
213. ution, and perform the following tasks:
■ Fix one routine, then continue executing without recompiling the others
■ Set watchpoints to stop or trace if a specified item changes
■ Collect data for performance tuning
■ Graphically monitor variables, structures, and arrays
■ Set breakpoints (set places to halt in the program) at lines or in functions
■ Show values (once halted, show or modify variables, arrays, structures)
■ Step through a program, one source or assembly line at a time
■ Trace program flow (show sequence of calls taken)
■ Invoke procedures in the program being debugged
■ Step over or into function calls; step up and out of a function call
■ Run, stop, and continue execution at the next line or at some other line
■ Save and then replay all or part of a debugging run
■ Examine the call stack, or move up and down the call stack
■ Program scripts in the embedded Korn shell
■ Follow programs as they fork(2) and exec(2)

To debug optimized programs, use the dbx fix command to recompile the routines you want to debug:
■ Compile the program with the appropriate -On optimization level.
■ Start the execution under dbx.
■ Use fix -g any.f without optimization on the routine you want to debug.
■ Use continue with that routine compiled.

Some optimizations will be inhibited by the presence of -g on the compilation command. See the dbx documentation for details.

f77
214. vailable with f77.

f77: -Xlistc: Show call graphs and cross-routine errors
Used alone, -Xlistc does not show a listing or cross-reference. It produces the call graph in a tree form, using printable characters. If some subroutines are not called from MAIN, more than one graph is shown. Each BLOCKDATA is printed separately with no connection to MAIN. The default is not to show the call graph.

-XlistE: Show cross-routine errors
Used alone, -XlistE shows only cross-routine errors and does not show a listing or a cross-reference.

-Xlisterr[nnn]: Suppress error nnn
Use -Xlisterr to suppress a numbered error message from the listing or cross-reference. For example, -Xlisterr338 suppresses error message 338. If nnn is not specified, all error messages are suppressed. To suppress additional specific errors, use this option repeatedly.

-Xlistf: Produce faster output
Use -Xlistf to produce source file listings and a cross-checking report, and to verify sources, but without generating object files. The default without this option is to generate object files.

f77: -Xlistflndir: Put .fln files into dir directory
Use -Xlistfln to specify the directory to receive .fln source analysis files. The directory specified (dir) must already exist. The default is to include the source analysis information directly within the object .o files (and not generate .fln files).
215. w of automatic parallelization, are needed:
■ An array is a variable that is declared with at least one dimension.
■ A scalar is a variable that is not an array.
■ A pure scalar is a scalar variable that is not aliased (not referenced in an EQUIVALENCE or POINTER statement).

Example: Array/scalar:

      dimension a(10)
      real m(100,10), s, u, x, z
      equivalence ( u, z )
      pointer ( px, x )
      s = 0.0

Both m and a are array variables; s is pure scalar. The variables u, x, z, and px are scalar variables, but not pure scalars.

Automatic Parallelization Criteria

DO loops that have no cross-iteration data dependencies are automatically parallelized by -autopar or -parallel. The general criteria for automatic parallelization are:
■ Only explicit DO loops and implicit loops, such as IF loops and Fortran 95 array syntax, are parallelization candidates.
■ The values of array variables for each iteration of the loop must not depend on the values of array variables for any other iteration of the loop.
■ Calculations within the loop must not conditionally change any pure scalar variable that is referenced after the loop terminates.
■ Calculations within the loop must not change a scalar variable across iterations. This is called a loop-carried dependence.
■ The amount of work within the body of the loop must outweigh the overhead of parallelization.

f77: Appare
216. warning issued
              if (a(i) .gt. min_threshold) go to 20
            end do
      20    continue

Example: A variable in a loop has a loop-carried dependency:

      C$PAR DOALL
            do 100 i = 1, 200          <-- Parallelized, with warning
              y = y * i                <-- y has a loop-carried dependency
              a(i) = y
      100   continue

I/O With Explicit Parallelization

You can do I/O in a loop that executes in parallel, provided that:
■ It does not matter that the output from different threads is interleaved (program output is nondeterministic).
■ You can ensure the safety of executing the loop in parallel.

Example: I/O statement in loop:

      C$PAR DOALL
            do i = 1, 10               <-- Parallelized with no warning (not advisable)
              k = i
              call show( k )
            end do
            end
            subroutine show( j )
            write(6,1) j
      1     format('Line number ', i3, '.')
            end
      demo% f95 -explicitpar -vpara t13.f
      demo% setenv PARALLEL 2
      demo% a.out

The output displays the numbers 1 through 10, but in a non-deterministic order.

Example: Recursive I/O:

            do i = 1, 10               <-- Parallelized with no warning (unsafe)
              k = i
              print *, list( k )       <-- list is a function that does I/O
            end do
            end
            function list( j )
            write(6,"('Line number ', i3, '.')") j
            list = j
            end
      demo% f95 -mt t14.f
      demo% setenv PARALLEL 2
      demo% a.out

In the example, the program may deadlock in libF77_mt and hang. Press Control-C to regain keyboard control.

There are situations where the programmer
217. -xcache and -xchip that properly matches that system. The optimizer uses these specifications to determine strategies to follow and instructions to generate.

The special setting -xtarget=native enables the optimizer to compile code targeted at the host system (the system doing the compilation). This is obviously useful when compilation and execution are done on the same system. When the execution system is not known, it is desirable to compile for a generic architecture. Therefore, -xtarget=generic is the default, even though it might produce suboptimal performance.

Other Performance Strategies

Assuming that you have experimented with using a variety of optimization options, compiling your program and measuring actual runtime performance, the next step might be to look closely at the Fortran source program to see what further tuning can be tried.

Focusing on just those parts of the program that use most of the compute time, you might consider the following strategies:
■ Replace handwritten procedures with calls to equivalent optimized libraries.
■ Remove I/O, calls, and unnecessary conditional operations from key loops.
■ Eliminate aliasing that might inhibit optimization.
■ Rationalize tangled, spaghetti-like code to use block IF.

These are some of the good programming practices that tend to lead to better performance.

It is possible to go further, hand-tuning the source code for a specific hardware configuration. However, these attempts m
218. xception      character       invalid, division, overflow, underflow, or inexact
handler         Function name   The name of the user handler function, or SIGFPE_DEFAULT, SIGFPE_IGNORE, or SIGFPE_ABORT
Return value    integer         0 = OK

A Fortran 77 routine compiled with f77 that calls ieee_handler() should also declare:
      #include "f77_floatingpoint.h"
For f95 programs, declare:
      #include "floatingpoint.h"

The special arguments SIGFPE_DEFAULT, SIGFPE_IGNORE, and SIGFPE_ABORT are defined in these include files and can be used to change the behavior of the program for a specific exception:

SIGFPE_DEFAULT or SIGFPE_IGNORE   No action taken when the specified exception occurs.
SIGFPE_ABORT                      Program aborts, possibly with dump file, on exception.

Writing User Exception Handler Functions

The actions your exception handler takes are up to you. However, the routine must be an integer function with three arguments specified as shown:

      handler_name( sig, sip, uap )

■ handler_name is the name of the integer function.
■ sig is an integer.
■ sip is a record that has the structure siginfo.
■ uap is not used.

Example: An exception handler function:

      INTEGER FUNCTION hand( sig, sip, uap )
      INTEGER sig, location
      STRUCTURE /fault/
        INTEGER address
        INTEGER trapno
      END STRUCTURE
      STRUCTURE /siginfo/
        INTEGER si_signo
        INTEGER si_code
        INTEGER si_errno
        RECORD /fault/ fault
      END
219. y LOGICALNAMEMAPPING, the file name is used unchanged.

Direct I/O

Direct or random I/O allows you to access a file directly by record number. Record numbers are assigned when a record is written. Unlike sequential I/O, direct I/O records can be read and written in any order. However, in a direct access file, all records must be the same fixed length. Direct access files are declared with the ACCESS='DIRECT' specifier on the OPEN statement for the file.

A logical record in a direct access file is a string of bytes of a length specified by the OPEN statement's RECL= specifier. READ and WRITE statements must not specify logical records larger than the defined record size. (Record sizes are specified in bytes.) Shorter records are allowed. Unformatted direct writes leave the unfilled part of the record undefined. Formatted direct writes cause the unfilled record to be padded with blanks.

Direct access READ and WRITE statements have an extra argument, REC=n, to specify the record number to be read or written.

Example: Direct access, unformatted:

      OPEN( 2, FILE='data.db', ACCESS='DIRECT', RECL=200,
     &      FORM='UNFORMATTED', ERR=90 )
      READ( 2, REC=13, ERR=30 )

This program opens a file for direct access, unformatted I/O, with a fixed record length of 200 bytes, then reads the thirteenth record into
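
The direct-access rules described above can be sketched as a small, self-contained program. This is an illustrative sketch, not code from the guide: the program name, the record count nrec, the buffer buf, and the use of free-form source are all assumptions. It also uses the standard INQUIRE(IOLENGTH=) form to obtain the record length instead of hard-coding 200, because the unit used for RECL can differ between compilers; on a system that counts RECL in bytes, 25 double precision values would give the 200-byte record of the example.

```fortran
program direct_demo
  implicit none
  ! Hypothetical sizes chosen for illustration only.
  integer, parameter :: nrec = 20, n = 25
  double precision :: buf(n)
  integer :: i, rl, ios

  ! Ask the runtime for the length of one record holding buf,
  ! expressed in whatever units this compiler uses for RECL.
  inquire (iolength=rl) buf

  open (unit=2, file='data.db', access='direct', recl=rl, &
        form='unformatted', status='replace', iostat=ios)
  if (ios /= 0) stop 'open failed'

  ! Records can be written in any order; here, last to first.
  do i = nrec, 1, -1
     buf = dble(i)
     write (2, rec=i, iostat=ios) buf
     if (ios /= 0) stop 'write failed'
  end do

  ! Read the thirteenth record directly, without touching records 1 to 12.
  read (2, rec=13, iostat=ios) buf
  if (ios /= 0) stop 'read failed'
  print *, 'first value of record 13:', buf(1)

  ! Remove the scratch file created by this sketch.
  close (2, status='delete')
end program direct_demo
```

Every record here has the same length, as direct access requires, and the REC= specifier selects which record each WRITE or READ touches.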
220. yboard demo Now no error

Special Compiler Options

Some compiler options are useful for debugging. They check subscripts, spot undeclared variables, show stages of the compile-link sequence, display versions of software, and so on. The Solaris linker has additional debugging aids. See ld(1), or run the command ld -Dhelp at a shell prompt to see the online documentation.

Subscript Bounds (-C)

The -C option adds checks for out-of-bounds array subscripts. If you compile with -C, the compiler adds checks at runtime for out-of-bounds references on each array subscript. This action helps catch some situations that cause segmentation faults.

Example: Index out of range:

demo% cat indrange.f
      REAL a(10,10)
      k = 11
      a(k,2) = 1.0
      END
demo% f77 -C -silent indrange.f
demo% a.out
Subscript out of range on file indrange.f, line 3, procedure MAIN.
Subscript number 1 has value 11 in array a.
Abort (core dumped)
demo%

Undeclared Variable Types (-u)

The -u option checks for any undeclared variables. The -u option causes all variables to be initially identified as undeclared, so that all variables that are not explicitly declared by type statements, or by an IMPLICIT statement, are flagged with an error. The -u flag is useful for discovering mistyped variables. If -u is set, all variables are treated as undeclared until explicitly declared. Use of an
221. yprog.f
demo% myprog
(output from program)
demo% cat myprog.io_stats
[I/O statistics listing: an INPUT REPORT and an OUTPUT REPORT, each giving per-unit counts, totals, averages, and standard deviations for units 0 (stderr), 5 (stdin), 6 (stdout), and 19 to 22 (fort.19 to fort.22). The column layout did not survive extraction.]

Each pair of lines in the report displays information about an I/O unit. One section shows input operations and another shows output. The first line of a pair displays statistics on the number of data elements transferred before the unit was closed. The second row of statistics is based on the number of I/O statements processed.

In the example, there were 6 calls to write a total of 26 data elements to standard output. A total of 248 bytes was transferred. The display also shows the average and standard deviation in bytes transferred per data element (9.538 and 1.63, respectively), and the average and standard deviation per I/O statement call (41.33 and 3.266, respective
222. zation

The executing program maintains a main memory stack for the initial thread executing the program, as well as distinct stacks for each helper thread. Stacks are temporary memory address spaces used to hold arguments and AUTOMATIC variables over subprogram invocations.

The default size of the main stack is about 8 megabytes. The Fortran compilers normally allocate local variables and arrays as STATIC (not on the stack). However, the -stackvar option forces the allocation of all local variables and arrays on the stack (as if they were AUTOMATIC variables). Use of -stackvar is recommended with parallelization because it improves the optimizer's ability to parallelize subprogram calls in loops. -stackvar is required with explicitly parallelized loops containing subprogram calls. (See the discussion of -stackvar in the Fortran User's Guide.)

Using the C shell (csh), the limit command displays the current main stack size as well as sets it:

demo% limit                      C shell example
cputime         unlimited
filesize        unlimited
datasize        2097148 kbytes
stacksize       8192 kbytes      < current main stack size
coredumpsize    0 kbytes
descriptors     64
memorysize      unlimited
demo% limit stacksize 65536      < set main stack to 64Mb
demo% limit stacksize
stacksize       65536 kbytes

With Bourne or Korn shells, the corresponding command is ulimit:

demo$ ulimit -a                  Korn Shell example
time(seconds)   unlimited
file(blocks)    unlimited
data(kbytes)    2097148
stack(kbytes)

