NS3D v2.14: user's manual

1.
    ##############################################################
    #                       NS3D Makefile                        #
    ##############################################################
    # This Makefile uses the GNU Fortran compiler 4.10 and the JMFFT 8.0 library
    # This Makefile should work on most Linux systems

    # ----- Start of system configuration section -----

    # command to call the Fortran 90/95 compiler
    F90C = gfortran

    # options of the compiler
    #  - optimisation: -O3 (Intel, GNU)
    #  - real to double automatic conversion: -r8 (Intel),
    #    -fdefault-real-8 -fdefault-double-8 (GNU)
    #  - extended addressable memory space: -mcmodel=medium (Intel, GNU)
    F90C_FLAGS = -O3 -fdefault-real-8 -fdefault-double-8

    # pre-processor flags
    #  - flag to call the preprocessor: -fpp (Intel), -cpp (GNU)
    #  - definition of macros (choice of FFT, testing for NaN, etc.)
    PREPROC_FLAGS = -cpp -DJMFFT

    # if necessary, linking flags for the Fast Fourier Transform (FFT) library that is used
    FFT_LIB =

    # ----- End of system configuration section -----

    FILES = MPI_Times.F90 timing.F90 fft.F90 data_parser.F90 global_vars.F90 \
            subfunctions.F90 input.F90 output.F90 gen_velocity.F90 time_scheme.F90 main.F90

    ns3d: extended_config.h config.h $(FILES)
    	$(F90C) $(F90C_FLAGS) $(PREPROC_FLAGS) $(FILES) -o $@ $(FFT_LIB)

    clean:
    	rm -f *.o *.mod ns3d

The options to edit in this Makefile are the following:
2. 3 Input and output binary files
    3.1 General remarks on binary files in FORTRAN
        3.1.1 Headers and trailers of records
        3.1.2 Array storage order
        3.1.3 Endian order
    3.2 Output velocity state files velo_rho_vort.t=xxxx.xxx
    3.3 Initial velocity state: velo_type
        3.3.1 Reading velocity.init
        3.3.2 Internal subroutine
        3.3.3 White noise
    3.4 Base state: base2D_type
        3.4.1 Reading base2D.init
        3.4.2 Internal subroutine
4 MPI parallel run
    4.1 Running a MPI run: quick start
    4.2 Details of the MPI implementation
        4.2.1 Data distribution and transposed FFT
        4.2.2 FORTRAN array indexes
5 Performances
    5.1 Memory
        5.1.1 Memory usage
        5.1.2 Memory location
    5.2 Speed
        5.2.1 Time consuming steps
        5.2.2 FFT libraries performances
        5.2.3 MPI speed-up
6 Frequently asked questions (FAQ)
Bibliography

Introduction

NS3D is a direct numerical simulation (DNS) code that integrates the incompressible Navier-Stokes Three-Dimensional (NS3D) equations. Its main specifications
3. [...] vby, vbz, wbx, wby, wbz

    ! to avoid any compilation warning stating that the variable work is not used
    work(0,0,0,0) = 0
    end if

• gen_velocity.F90: subroutines gen_velo_tanh_yz, etc. You must create a specific yz version of the required subroutines. For instance:

    !*******************************************************
    subroutine gen_velo_tanh_yz(vx, vy, vz, wx, wy, wz)
    !*******************************************************
      implicit none
      double precision, dimension(0:dy-1, 0:dz-1), intent(out) :: vx, vy, vz, wx, wy, wz
      integer :: iy, iz
      [...]
      do iy = 0, ny-1
        vy(iy, [...]) = tanh(zz(iz)/2.D0)
        wx(iy, [...]) = [...] 1.D0 - tanh(zz(iz)/2.D0)**2 [...]
      end do
      [...]
    end subroutine

Bibliography

Billant, P. & Chomaz, J.-M. 2000 Three-dimensional stability of a vertical columnar vortex pair in a stratified fluid. J. Fluid Mech. 419, 65-91.

Deloncle, A., Billant, P. & Chomaz, J.-M. 2008 Nonlinear evolution of the zigzag instability in stratified fluids: a shortcut on the route to dissipation. J. Fluid Mech. 599, 229-239.

Deloncle, A., Chomaz, J.-M. & Billant, P. 2007 Three-dimensional stability of a horizontally sheared flow in a stably stratified fluid. J. Fluid Mech. 570, 297-305.

Frigo, M. & Johnson, S. G. 2005 The design and implementation of FFTW3. Proceedings of the IEEE 93 (2), 216-231, special
4. • F90C: command to call the FORTRAN 90/95 compiler. Classical options are gfortran (GNU), ifort (Intel), xlf90 (IBM XL Fortran 90), etc.

• F90C_FLAGS: compiler flags. Optimization flags are strongly advised to speed up the execution of the code. A classical option valid on most compilers is -O3 (GNU, Intel, etc.). It is also advised to set a flag forcing REAL (4 bytes) variables to be converted into DOUBLE PRECISION (8 bytes) variables. This is not mandatory, as NS3D already uses only DOUBLE PRECISION variables and constants. However, if the source code is modified carelessly, this flag guards against loss of numerical accuracy. The corresponding flags are for instance -fdefault-real-8 -fdefault-double-8 (GNU) or -r8 (Intel). For large simulations, it may also be necessary to extend the addressable data memory. The corresponding flag is for instance -mcmodel=medium (Intel). See §5.1.2 for more details.

• PREPROC_FLAGS: preprocessor flags. The flag to call the preprocessor must be defined if the preprocessor is not called automatically, for instance -cpp (GNU) or -fpp (Intel). The FFT library to use (see §2.3.1) should also be set here: -DJMFFT, -DFFTW, etc.

• FFT_LIB: if necessary, the flags to link the third-party FFT library should be defined here. The JMFFT library is directly compiled from its embedded source code, so that no linking is necessary in this example. For instance, the classical flag to link FFT
5.
    it = 2000    time = 1000.000
    cpu time in sec (elapsed / remaining): 78 / 0
    mean quadratic velocity: 4.0724780826732005E+139
    growthrate: 0.18957510829664204

    name                      calls     cpu  elapsed    %cpu  %elapsed
    main                          1   98.93    99.06   100.0     100.0
      initialization              1    0.03     0.03     0.0       0.0
      time_stepping            2000   98.89    99.02   100.0     100.0
        time_scheme            2000   97.86    98.02    99.0      99.0
          nonlin_term          8000   91.06    91.14    93.0      93.0
            curl               8000    1.97     2.01     2.2       2.2
            fft               72000   83.20    83.31    91.4      91.4
            vect_prod          8000    3.44     3.43     3.8       3.8
            projection         8000    2.40     2.36     2.6       2.6
            others                     0.06     0.03     0.1       0.0
          others                       6.81     6.88     7.0       7.0
        projection             2000    0.60     0.59     0.6       0.6
        de-aliasing            2000    0.39     0.38     0.4       0.4
        output1                   5    0.00     0.00     0.0       0.0
        output2                   2    0.02     0.02     0.0       0.0
        others                         0.01     0.00     0.0       0.0
      others                           0.01     0.01     0.0       0.0

    TIMING REPORT - FLAT (in sec)
    Time spent in the subroutine but NOT in the nested subroutines

    name                      calls     cpu  elapsed    %cpu  %elapsed
    fft                       72000   83.20    83.31    84.1      84.1
    time_scheme                2000    6.81     6.88     6.9       6.9
    vect_prod                  8000    3.44     3.43     3.5       3.5
    projection                10000    3.00     2.96     3.0       3.0
    curl                       8000    1.97     2.01     2.0       2.0
    time_stepping              2000    1.03     1.00     1.0       1.0
    de-aliasing                2000    0.39     0.38     0.4       0.4
    nonlin_term                8000    0.06     0.03     0.1       0.0
    initialization                1    0.03     0.03     0.0       0.0
    output2                       2    0.02     0.02     0.0       0.0
    main                          1    0.01     0.01     0.0       0.0
    output1                       5    0.00     0.00     0.0       0.0

Chapter 3 Input and output binary files

In this chapter, we present the format of the binary files used in NS3D:

• output velocity state files velo
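The two timing reports above are related: a flat entry is the inclusive time of a subroutine minus the inclusive times of the subroutines nested inside it. The following minimal Python sketch checks this on the CPU column. The dictionary layout and the function name `flat_time` are hypothetical, chosen only for illustration; the numbers are the ones printed in the reports, so the results agree with the flat report only up to the rounding of the printed values.

```python
# Inclusive CPU times (seconds) copied from the hierarchical report above,
# each paired with the names of the subroutines timed inside it.
# This data structure is an illustration, not part of NS3D.
REPORT = {
    "time_scheme": (97.86, ["nonlin_term"]),
    "nonlin_term": (91.06, ["curl", "fft", "vect_prod", "projection"]),
    "curl": (1.97, []),
    "fft": (83.20, []),
    "vect_prod": (3.44, []),
    "projection": (2.40, []),
}

def flat_time(name):
    """Time spent in `name` itself, excluding its nested subroutines."""
    inclusive, children = REPORT[name]
    return inclusive - sum(REPORT[child][0] for child in children)

time_scheme_flat = flat_time("time_scheme")   # about 6.80 (flat report: 6.81)
nonlin_term_flat = flat_time("nonlin_term")   # about 0.05 (flat report: 0.06)

# Share of the total run (main = 98.93 s of CPU time) spent in the FFTs.
fft_share = round(100.0 * REPORT["fft"][0] / 98.93, 1)
print(time_scheme_flat, nonlin_term_flat, fft_share)
```

The small differences with the flat report (6.80 against 6.81, 0.05 against 0.06) come from subtracting values that were already rounded to two decimals.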
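6. 1.1.3 Spectral form of the governing equations

We apply three-dimensional Fourier transforms to the terms of the equations (1.1), for example:

$$\hat{\boldsymbol{u}}(k_x, k_y, k_z, t) = \frac{1}{L_x L_y L_z} \int_0^{L_x}\!\!\int_0^{L_y}\!\!\int_0^{L_z} \boldsymbol{u}(x, y, z, t)\, e^{-i(k_x x + k_y y + k_z z)}\, dx\, dy\, dz \tag{1.7}$$

where the hat denotes the Fourier transform, $i$ the imaginary unit, and $k_x$, $k_y$ and $k_z$ are the components of the total wavenumber $\boldsymbol{k} = (k_x, k_y, k_z)$. In spectral space, the governing equations (1.1) are replaced by:

$$\frac{\partial \hat{\boldsymbol{u}}}{\partial t} = \mathbf{P}(\boldsymbol{k}) \left[ \widehat{\boldsymbol{u} \times \boldsymbol{\omega}} - 2\Omega\, \boldsymbol{e}_z \times \hat{\boldsymbol{u}} + \hat{b}\, \boldsymbol{e}_z \right] - \nu k^2 \hat{\boldsymbol{u}} \tag{1.8a}$$

$$\frac{\partial \hat{b}}{\partial t} = -i \boldsymbol{k} \cdot \widehat{b \boldsymbol{u}} - N^2\, \hat{\boldsymbol{u}} \cdot \boldsymbol{e}_z - \frac{\nu}{Sc} k^2 \hat{b} \tag{1.8b}$$

The tensor $\mathbf{P}(\boldsymbol{k})$, with Cartesian components $P_{ij} = \delta_{ij} - k_i k_j / k^2$, designates the projection operator on the space of solenoidal fields, so as to enforce the incompressibility condition $\boldsymbol{k} \cdot \hat{\boldsymbol{u}} = 0$. The viscous and diffusive terms are integrated exactly. This leads to the equations actually integrated in time by NS3D:

$$\frac{\partial}{\partial t} \left( \hat{\boldsymbol{u}}\, e^{\nu k^2 t} \right) = \mathbf{P}(\boldsymbol{k}) \left[ \widehat{\boldsymbol{u} \times \boldsymbol{\omega}} - 2\Omega\, \boldsymbol{e}_z \times \hat{\boldsymbol{u}} + \hat{b}\, \boldsymbol{e}_z \right] e^{\nu k^2 t} \tag{1.9a}$$

$$\frac{\partial}{\partial t} \left( \hat{b}\, e^{\nu k^2 t / Sc} \right) = \left[ -i \boldsymbol{k} \cdot \widehat{b \boldsymbol{u}} - N^2\, \hat{\boldsymbol{u}} \cdot \boldsymbol{e}_z \right] e^{\nu k^2 t / Sc} \tag{1.9b}$$

Note that the pressure field $p$ does not appear in the spectral form (1.9) of the governing equations. The pressure field is not solved by NS3D and should be deduced from $(\boldsymbol{u}, b)$ if necessary. The generalisation to the perturbative cases is straightforward and will not be detailed here.

1.2 Numerical method

1.2.1 Spatial discretisati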
7. in §1.2.3. We indicate the operation for each step. The timing was done for a simulation of size N = 256 × 256 × 256 with the FFT library FFTW 3.2, run on a single 3.6 GHz Intel Xeon processor. The elapsed times were determined with the Fortran function system_clock.

[Figure 5.2: Comparison of the performances of different FFT libraries for one Real-to-Complex FFT of size N = 256 × 256 × 256 performed on a single 3.6 GHz Intel Xeon processor. The performance is given in pseudo-Mflop/s, defined as [...]. Higher is better.]

5.2.3 MPI speed-up

We present the MPI parallelisation speed-up obtained in 2007 on two different parallel computers:

• Tournesol: SGI Altix 450 cluster based at LadHyX, 16 × 1.6 GHz dual-core Intel Itanium processors (32 cores), 128 GB of shared memory.

• Zahir: IBM Regatta cluster based at IDRIS, 1024 × 1.3-1.7 GHz IBM Power4 processors, 3136 GB of distributed shared memory.

Figure 5.3 shows the speed-up S obtained with the MPI parallelisation of NS3D on Tournesol and Zahir for a number of processors from 1 to 256. The speed-up remains excellent even for a large number of processors: we obtain S = 18.04 for 28 processors on Tournesol and S = 176.41 for 256 processors on Zahir. Therefore, MPI parallelism makes it possible to run large simulations wit
8. operations, whereas all the other steps involved in the pseudo-spectral algorithm need only O(N) operations. This explains why the FFTs of a pseudo-spectral code are the most time-consuming steps and become increasingly critical for large simulations. Consequently, most of the optimisation effort should focus on the FFT implementation.

5.2.2 FFT libraries performances

Figure 5.2 shows the speed of a few FFT libraries for an FFT of size N = 256 × 256 × 256 performed on a single 3.6 GHz Intel Xeon processor. The libraries have extremely different speeds: on this example, the highly optimized FFTW 3.2 library is 13 times faster than the naive Numerical Recipes library (1082 Mflops compared to 82 Mflops).

As already emphasized, the choice of the FFT library is critical for the overall performance of the pseudo-spectral algorithm. It is thus important to determine the fastest (or at least a reasonably fast) library on each computer for a given dimension N. After benchmarking several libraries (FFTW 3.2, JMFFT 8.0, Temperton, Numerical Recipes F77, ESSL, MathKeisan, Intel MKL), we found that FFTW 3.2 was always the fastest library, or nearly so, on scalar processors (x86, IBM Power), and that the MathKeisan library was the fastest on the NEC SX vector processors.

[Figure 5.1: Percentage of time spent at each step of the pseudo-spectral algorithm described
9. precision: velocity values $\boldsymbol{u}(x_i, y_j, z_k)$ on the collocation points.

• b (xi, yj, zk): double precision: buoyancy values $b(x_i, y_j, z_k)$ on the collocation points.

• wx, wy, wz (xi, yj, zk): double precision: vorticity values $\boldsymbol{\omega}(x_i, y_j, z_k)$ on the collocation points.

• $u_x^b$, $u_y^b$, $u_z^b$ (xi, yj): double precision: velocity base state $\boldsymbol{u}_b(x_i, y_j)$ on the collocation points.

• $\omega_x^b$, $\omega_y^b$, $\omega_z^b$ (xi, yj): double precision: vorticity base state $\boldsymbol{\omega}_b(x_i, y_j)$ on the collocation points.

Almost all the simulation parameters are stored in the output file along with the main fields $(\boldsymbol{u}, b, \boldsymbol{\omega})(x_i, y_j, z_k)$. This ensures the traceability of the results and eases post-processing. The format of the output velocity state file is identical to that of the initial velocity state file. Thus, it is possible to use an output state file as an initial state file in order to resume a simulation (see §3.3).

3.3 Initial velocity state: velo_type

A simulation can be initialized with a three-dimensional state flow $(\boldsymbol{u}, b)(x_i, y_j, z_k)$ that is either read from a file or internally generated by a subroutine. Moreover, white noise can be added to the initial state.

3.3.1 Reading velocity.init

The initial state flow can be read from an external file. To select this option, velo_type in data.in must be set to file. The data describing the initial state flow must be stored in a binary data file called velocity.init. T
10.
    rnd_min_distance_____________ [...]
    rnd_mean_gamma_______________ 6.28
    rnd_mean_radius [...]
    rnd_std_radius [...]

2.4.2 Running the executable ns3d

The files data.in and, if necessary, velocity.init and base2D.init must be present in the same folder as the executable ns3d. To run the executable, type the following command line:

    > ns3d

The code will then execute.

2.5 Test case

The parameter files config.h and data.in presented above in §§2.3.2 and 2.4.1 correspond to the linear stability study of a horizontal flow sheared horizontally (the hyperbolic tangent velocity profile) in a homogeneous quasi-inviscid fluid:

$$\boldsymbol{u}(y) = u_x(y)\, \boldsymbol{e}_x = \tanh(y)\, \boldsymbol{e}_x \tag{2.1}$$

For interested readers, a more complete study of the stability of this flow can be found in Michalke (1964) and Deloncle et al. (2007).

This test case is a convenient way to quickly check whether the code was correctly compiled and run, by studying the growth rate of the most unstable mode. We present below the screen output of this simulation: the growth rate converges towards σ = 0.189. The total simulation time was about 100 seconds on an Intel Xeon 2.13 GHz processor.

    PROGRAM NS3D version 2.14

    DIMENSIONS OF THE SIMULATION: 64 x 256 x 3
    [...] 200.00
    [...] 60.000
    [...] 12.566
    [...] 0.50000
11.
    I=16 -DFFTW_MPI

    # if necessary, linking flags for the Fast Fourier Transform (FFT) library that is used
    FFT_LIB = -lfftw3 -lm -lmpi

    # ----- End of system configuration section -----

    FILES = MPI_Times.F90 timing.F90 fft.F90 data_parser.F90 global_vars.F90 \
            subfunctions.F90 input.F90 output.F90 gen_velocity.F90 time_scheme.F90 main.F90

    ns3d: extended_config.h config.h $(FILES)
    	$(F90C) $(F90C_FLAGS) $(PREPROC_FLAGS) $(FILES) -o $@ $(FFT_LIB)

    clean:
    	rm -f *.o *.mod ns3d

4.2 Details of the MPI implementation

4.2.1 Data distribution and transposed FFT

The most important concept to understand when using MPI is the data distribution. In MPI, there is no concept of global address space: each process has its own memory, as shown in figure 4.1.

[Figure 4.1: A distributed memory architecture: each process has its own memory.]

In MPI, the data structure is split up and resides as slices in the local memory of each task. All the tasks work concurrently and exchange data through communications, by sending and receiving messages, as illustrated in figure 4.2. Compared to a global address space, the implementation is more complex because we have to define explicitly the distribution of the whole data among the processes as well as each data communication.

We outline here the implementation of the
12. (Intel) or -fconvert=big-endian (GNU Fortran) in the Makefile at the compilation step.

3.2 Output velocity state files velo_rho_vort.t=xxxx.xxx

In this section, we present the format of the velocity state files generated by the subroutine output2 of NS3D. This subroutine is automatically called at the beginning and at the end of a run in order to save the initial and final states respectively. It is also possible to generate intermediate output state files at time steps specified in the section Output of data.in (see §2.4.1):

    ** Output **
    output1_period___ 400
    output2_period___ 1000
    [...] 0

These output files have a name of the form velo_rho_vort.t=xxxx.xxx. Their content is the following, in this order, where h denotes the header and trailer framing each record:

    1, Nx, Ny, Nz, Lx, Ly, Lz, dt, dealiasing, trunc_type,
    rtrunc_x, rtrunc_y, rtrunc_z, nu, stratified, xns, schmidt,
    omega2, perturbative, linear, time

    h  ux(x0, y0, z0)  ...  ux(xNx-1, yNy-1, zNz-1)  h
    h  uy(x0, y0, z0)  ...  uy(xNx-1, yNy-1, zNz-1)  h
    h  uz(x0, y0, z0)  ...  uz(xNx-1, yNy-1, zNz-1)  h
    h  b(x0, y0, z0)   ...  b(xNx-1, yNy-1, zNz-1)   h
    h  wx(x0, y0, z0)  ...  wx(xNx-1, yNy-1, zNz-1)  h
    h  wy(x0, y0, z0)  ...  wy(xNx-1, yNy-1, zNz-1)  h
    h  wz(x0, y0, z0)  ...  wz(xNx-1, yNy-1, zNz-1)  h
    h  ubx(x0, y0)  ...  ubx(xNx-1, yNy-1)  h
    h  uby(x0, y0)  ...  [...]
13. MPI parallelisation of the NS3D code for a simulation of total size $N = N_x \times N_y \times N_z$. In the following, we denote by $N_x^F = N_x/2 + 1$, $N_y^F = N_y$ and $N_z^F = N_z$ the total number of spectral modes that are effectively stored in NS3D. We recall that only half of the $k_x$ modes are stored (see §1.2.1).

[Figure 4.2: The Message Passing Interface (MPI) paradigm: all the tasks run concurrently and exchange data through different types of communications.]

If we consider a simulation running on a computer with p processes, the data in physical space are stored in the natural order (x, y, z*), where the star indicates that the z direction is distributed among the p processes. This means that each process holds a data slice of size $N_x \times N_y \times N_z/p$.

We recall here that FORTRAN stores data in column-major order, meaning that contiguous elements in memory correspond to the first dimension of an array. Because of caching, accessing array elements that are contiguous in memory is much faster than accessing elements that are not. This is important when implementing the FFT algorithm in parallel. We briefly describe below the main steps of a forward three-dimensional Real-to-Complex FFT:

1. Each process performs a sequence of $N_y \times N_z/p$ one-dimensional Real-to-Complex FFTs of size $N_x$ along the local x direction. At this step, the array ends up in the order (kx, y, z*), with kx indicating that the first dimension has been switched into spectral sp
14. NS3D v2.14: user's manual

LadHyX, Ecole Polytechnique, Palaiseau, France

Manual v1.01, 10/06/2014

Author: Axel DELONCLE, axel.deloncle@ladhyx.polytechnique.fr

Cover illustration: direct numerical simulation of the zigzag instability of a pair of counter-rotating vertical vortices in a stratified fluid, by Deloncle et al. (2008). This simulation of size 1440 × 1440 × 192 was performed with NS3D on a parallel computer.

Contents

1 Governing equations and numerical method
    1.1 Governing equations
        1.1.1 Non-perturbative case
        1.1.2 Perturbative cases
        1.1.3 Spectral form of the governing equations
    1.2 Numerical method
        1.2.1 Spatial discretisation
        1.2.2 Time discretisation
        1.2.3 Pseudo-spectral evaluation of the nonlinear term
        1.2.4 Dealiasing
2 Compilation and execution
    2.1 Overview
    2.2 Directory content
    2.3 Compilation step
        2.3.1 Fast Fourier Transform (FFT) libraries
        2.3.2 Preprocessor files config.h and extended_config.h
        2.3.3 Compilation parameter file Makefile
    2.4 Execution step
        2.4.1 Run-time parameter file data.in
        2.4.2 Running the executable ns3d
15. W library is -lm -lfftw3.

To compile the code, move to the NS3D source directory. The Makefile, the FORTRAN source files *.F90 and the preprocessor files config.h and extended_config.h must be located in the compilation folder. Type the following command line to automatically generate the executable ns3d:

    > make

This executable ns3d is specific to the parameters defined in config.h and in the Makefile, in particular the dimensions $N_x \times N_y \times N_z$ of the simulation and the FFT library to use. To avoid any inconsistency, it is advised to generate a new executable ns3d for each new simulation.

2.4 Execution step

2.4.1 Run-time parameter file data.in

The run-time parameter file data.in must be located in the same directory as the executable ns3d and is read at the beginning of each run. Here is an example of data.in:

    *************************************************
    *          NS3D simulation parameters           *
    *************************************************
    This file is read at every simulation start

    ** discretisation variables **
    [...] 200
    [...] 60
    [...] 12.5664
    [...] 0.5
    [...] 2000
    [...]
    squared_1_or_elliptic_2__ 1
    radius_truncation_x 0.66
    radius_truncation_y 0.66
    radius_truncation_z 1.E30

    ** physical variab
16.
    [...]  uby(xNx-1, yNy-1)  h
    h  ubz(x0, y0)  ...  ubz(xNx-1, yNy-1)  h
    h  wbx(x0, y0)  ...  wbx(xNx-1, yNy-1)  h
    h  wby(x0, y0)  ...  wby(xNx-1, yNy-1)  h
    h  wbz(x0, y0)  ...  wbz(xNx-1, yNy-1)  h

with the following notations:

• h: integer: headers and trailers of the records (see §3.1.1).

• 1: integer: fixed flag, useful to check endianness consistency.

• Nx, Ny, Nz: integer: number of collocation points $N_x$, $N_y$ and $N_z$.

• Lx, Ly, Lz: double precision: dimensions $L_x$, $L_y$ and $L_z$ of the computational domain.

• dt: double precision: fixed time step $\delta t$ used by the time scheme.

• dealiasing: logical: indicates whether dealiasing is active (T) or not (F).

• trunc_type: integer: type of dealiasing truncation (1: squared, 2: elliptic).

• rtrunc_x, rtrunc_y, rtrunc_z: double precision: radius of truncation $r_x$, $r_y$ and $r_z$ along each spectral direction.

• nu: double precision: viscosity $\nu$ of the fluid.

• stratified: logical: indicates whether the simulation is in a stratified fluid (T) or a homogeneous fluid (F).

• xns: double precision: Brunt-Väisälä frequency $N$.

• schmidt: double precision: Schmidt number $Sc$.

• omega2: double precision: $2\Omega$, twice the rotational speed of the frame.

• perturbative: logical: non-perturbative (F) or perturbative (T) simulation.

• linear: logical: for a perturbative run, indicates whether the simulation is linear (T) or nonlinear (F).

• time: double precision: time $t$ of the record.

• ux, uy, uz (xi, yj, zk): double
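For post-processing outside FORTRAN, the header/trailer framing and the fixed flag 1 described above can be used to walk through such a file record by record. The sketch below is in Python and rests on several assumptions that the manual does not confirm: record markers are 4-byte integers holding the payload length in bytes (this is compiler dependent), and the flag 1 is stored alone in the first record as a 4-byte integer. The function names and the synthetic demo data are hypothetical; the grouping of the parameters into records in a real NS3D file may differ.

```python
import io
import struct

def detect_byte_order(raw):
    """Return '<' (little endian) or '>' (big endian).

    Assumption: the file starts with a record holding a single 4-byte
    integer equal to 1, framed by 4-byte length markers.
    """
    for order in ("<", ">"):
        header, flag = struct.unpack(order + "ii", raw[:8])
        if header == 4 and flag == 1:
            return order
    raise ValueError("cannot detect byte order from the flag record")

def read_records(stream, order):
    """Yield the payload (bytes) of each record framed by length markers."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) != 4:
            raise ValueError("truncated record header")
        (nbytes,) = struct.unpack(order + "i", head)
        payload = stream.read(nbytes)
        tail = stream.read(4)
        if len(payload) != nbytes or len(tail) != 4:
            raise ValueError("truncated record")
        if struct.unpack(order + "i", tail)[0] != nbytes:
            raise ValueError("header and trailer disagree")
        yield payload

def make_record(order, payload):
    """Frame a payload like a sequential unformatted record (demo helper)."""
    marker = struct.pack(order + "i", len(payload))
    return marker + payload + marker

# Synthetic big-endian demo data (NOT a real NS3D file).
demo = (make_record(">", struct.pack(">i", 1))
        + make_record(">", struct.pack(">3i", 64, 256, 3))
        + make_record(">", struct.pack(">3d", 200.0, 60.0, 12.5664)))

order = detect_byte_order(demo)
records = list(read_records(io.BytesIO(demo), order))
dims = struct.unpack(order + "3i", records[1])
lengths = struct.unpack(order + "3d", records[2])
print(order, dims, lengths)
```

Checking that header and trailer agree on every record is a cheap way to catch a wrong marker size or a wrong byte order early, before any field is interpreted.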
17. ace. The data in the kx direction are stored contiguously, ensuring a fast memory access for the one-dimensional FFT.

2. Each process transposes the data between the first and second local dimensions: (kx, y, z*) → (y, kx, z*).

3. Each process performs a sequence of $N_x^F \times N_z/p$ one-dimensional Complex-to-Complex FFTs of size $N_y$ along the local y direction. Thanks to the transpose of the previous step, the array is in the order (ky, kx, z*), ensuring again a fast memory access.

4. We perform a distributed transpose of the data between the processes. This is done through an MPI communication of type MPI_Alltoall that distributes the data along the ky direction in the order (z, kx, ky*), i.e. each process holds a data slice of size $N_z \times N_x^F \times N_y^F/p$. This distributed transpose between directions ky and z is illustrated schematically in figure 4.3.

5. Each process performs a sequence of $N_x^F \times N_y^F/p$ one-dimensional Complex-to-Complex FFTs of size $N_z$ along the local z direction. At this step, the array is in the order (kz, kx, ky*), ensuring again a fast memory access.

This parallel transposed FFT algorithm gives the Discrete Fourier Transform of the original data, but transposed from (x, y, z*) directions into (kz, kx, ky*). An extra distributed transpose to retrieve the original order has not been implemented because it would have required time-costly extra communications.

All the other st
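The successive array orders and per-process slice sizes of the five steps can be tabulated with a few lines of Python. This is only a bookkeeping sketch of the layouts stated in the text (it performs no FFT and no communication); the function name `transposed_fft_layouts` and the tuple format are hypothetical.

```python
def transposed_fft_layouts(nx, ny, nz, p):
    """List (label, storage order, local shape) after each step.

    The storage order goes from the inner (contiguous) dimension to the
    outer one; a star marks the direction distributed among the p processes.
    """
    if ny % p != 0 or nz % p != 0:
        raise ValueError("Ny and Nz must both be multiples of p")
    nfx = nx // 2 + 1   # only half of the kx modes are stored
    nfy = ny            # all the ky modes are stored
    return [
        ("physical space", ("x", "y", "z*"), (nx, ny, nz // p)),
        ("step 1: FFTs along x", ("kx", "y", "z*"), (nfx, ny, nz // p)),
        ("step 2: local transpose", ("y", "kx", "z*"), (ny, nfx, nz // p)),
        ("step 3: FFTs along y", ("ky", "kx", "z*"), (nfy, nfx, nz // p)),
        ("step 4: distributed transpose", ("z", "kx", "ky*"), (nz, nfx, nfy // p)),
        ("step 5: FFTs along z", ("kz", "kx", "ky*"), (nz, nfx, nfy // p)),
    ]

layouts = transposed_fft_layouts(256, 256, 256, 16)
for label, order, shape in layouts:
    print(f"{label:32s} {order}  {shape}")
```

With 256 × 256 × 256 points on 16 processes, each process starts with a 256 × 256 × 16 slice of real values and ends with a 256 × 129 × 16 slice of complex modes.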
18. are:

• pseudo-spectral numerical method imposing periodic boundary conditions,
• homogeneous fluid or stratified fluid under the Boussinesq approximation,
• possibility to add a frame (background) rotation,
• non-perturbative simulations or perturbative simulations around a base state,
• sequential execution or parallel (MPI) execution,
• written in FORTRAN 90 for a Unix/Linux environment.

It was first written for a homogeneous fluid by Vincent & Meneguzzi (1991) and later adapted to stratified fluids by Billant & Chomaz (2000) and Otheguy et al. (2006). The parallel mode of the code was implemented by Deloncle et al. (2008).

Chapter 1 Governing equations and numerical method

1.1 Governing equations

1.1.1 Non-perturbative case

The code integrates the incompressible Navier-Stokes equations within the Boussinesq approximation, in a frame rotating at angular velocity $\Omega$ about the vertical z axis:

$$\frac{\partial \boldsymbol{u}}{\partial t} = \boldsymbol{u} \times \boldsymbol{\omega} - 2\Omega\, \boldsymbol{e}_z \times \boldsymbol{u} - \nabla \left[ \frac{p}{\rho_0} + \frac{\boldsymbol{u}^2}{2} \right] + b\, \boldsymbol{e}_z + \nu \Delta \boldsymbol{u} \tag{1.1a}$$

$$\nabla \cdot \boldsymbol{u} = 0 \tag{1.1b}$$

$$\frac{\partial b}{\partial t} + \boldsymbol{u} \cdot \nabla b + N^2\, \boldsymbol{u} \cdot \boldsymbol{e}_z = \frac{\nu}{Sc} \Delta b \tag{1.1c}$$

where $\boldsymbol{u} = (u, v, w)$ is the velocity vector in Cartesian coordinates $(x, y, z)$, $\boldsymbol{\omega}$ the vorticity, $\rho_0$ a constant reference density, $p$ the pressure, $b = -g\rho'/\rho_0$ the buoyancy, with $\rho'$ the density perturbation with respect to the base density $\rho_0 + \bar{\rho}(z)$, $g$ the gravity and $\boldsymbol{e}_z$ the unit vector in the upward z direction. $N = \sqrt{-(g/\rho_0)\, d\bar{\rho}/dz}$ is the Brunt-Väisälä frequency, assumed here constant, $\nu$ is the kinematic viscosity and $Sc = \nu/D$ the Schmidt
19. d by your system MPI library must be set, for instance -lmpi. Please refer to the documentation of your system MPI library.

• step 2d: the executable ns3d must be started with the correct shell instructions as required by your system MPI library, for instance:

    > mpirun -np 16 ns3d

where p = 16 is the number of MPI processes. Please refer to your system MPI library documentation for more details.

Note that the formats of the different input (velocity.init, base2D.init) and output (velo_rho_vort.t=xxxx.xxx) files are identical between a sequential and an MPI run. This implies that the same input file can be used for either a sequential or an MPI run. Similarly, the same post-processing tools can be used with the output files.

We present below an example of Makefile configured at step 1c to perform an MPI run with p = 16 processes with the FFTW MPI library:

    ##############################################################
    #                       NS3D Makefile                        #
    ##############################################################
    # This Makefile corresponds to a computer using Intel FORTRAN compiler 10
    # and the FFTW library (MPI parallel mode)

    # ----- Start of system configuration section -----

    # command and arguments of the FORTRAN 90/95 compiler
    F90C = ifort
    F90C_FLAGS = -r8 -O3

    # flags to call the preprocessor and definition of a preprocessor macros
    # to set the FFT library
    PREPROC_FLAGS = -fpp -DMP
20. dicity of the discrete Fourier coefficient as a function of the wavenumber. The aliasing error degrades the accuracy of the high-order modes, especially the highest third of them. To limit aliasing errors, NS3D can truncate high-order spectral modes at each time step of the time scheme. Two dealiasing functions are available:

Squared dealiasing. The following spectral modes are truncated:

$$\hat{\boldsymbol{u}}_{i_k j_k k_k} = 0 \quad \text{if} \quad |k_x| > r_x k_x^{max} \ \text{or} \ |k_y| > r_y k_y^{max} \ \text{or} \ |k_z| > r_z k_z^{max}$$

where $k_x^{max}$, $k_y^{max}$ and $k_z^{max}$ are the maximum positive wavenumbers defined in (1.11) and $r_x$, $r_y$ and $r_z$ are the truncation radii along the three spectral directions, set by the user. For instance, a value $r_z = 0$ implies that all the modes are truncated along the $k_z$ direction, while $r_z > 1$ means that no mode is truncated.

The classical 2/3 rule by Orszag (1971), which removes most of the aliasing effects, is equivalent to $r_x = r_y = r_z = 2/3$. However, this implies truncating a large number of high-order modes and thus decreases the spectral and spatial resolution of the simulation. Depending on the nature of the physical problem, truncating fewer modes may or may not be sufficient.

Elliptic dealiasing. The following spectral modes are truncated:

$$\hat{\boldsymbol{u}}_{i_k j_k k_k} = 0 \quad \text{if} \quad \left( \frac{k_x}{r_x k_x^{max}} \right)^2 + \left( \frac{k_y}{r_y k_y^{max}} \right)^2 + \left( \frac{k_z}{r_z k_z^{max}} \right)^2 > 1$$

Chapter 2 Compilation and execution

2.1 Overview

The mains s
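The difference between the two truncations is easiest to see on a few modes. The Python sketch below assumes the following reading of the two criteria, which is a reconstruction and not confirmed by the source: the squared truncation removes a mode as soon as |k| exceeds r times the maximum wavenumber in any one direction, and the elliptic truncation removes it when the sum of the squared ratios k / (r k_max) exceeds 1. The function names are hypothetical, and strictly positive radii are assumed.

```python
def truncated_squared(k, kmax, r):
    """True if mode k = (kx, ky, kz) is removed by the squared truncation."""
    return any(abs(ki) > ri * kmi for ki, kmi, ri in zip(k, kmax, r))

def truncated_elliptic(k, kmax, r):
    """True if mode k is removed by the elliptic truncation (r > 0 assumed)."""
    total = sum((ki / (ri * kmi)) ** 2 for ki, kmi, ri in zip(k, kmax, r))
    return total > 1.0

kmax = (32, 32, 32)           # illustrative maximum wavenumbers
r = (2 / 3, 2 / 3, 2 / 3)     # the 2/3 rule in every direction

corner = (20, 20, 0)          # inside the square, outside the ellipse
inner = (10, 10, 10)          # kept by both truncations
edge = (22, 0, 0)             # beyond 2/3 of kmax along kx: removed by both

for mode in (corner, inner, edge):
    print(mode, truncated_squared(mode, kmax, r), truncated_elliptic(mode, kmax, r))
```

For equal radii, the elliptic truncation removes every mode that the squared one removes, plus the modes near the corners of the retained box, so it is the more aggressive of the two.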
21. ds to a value $\Omega = 0$.

• perturbative: indicates whether the simulation is non-perturbative (F) or perturbative (T). In perturbative mode, the simulation can be linear (perturbative_linear = T) or nonlinear (perturbative_linear = F). See §1.1 for the meaning of the different options.

• base2D_type: is only used for perturbative simulations. It defines the two-dimensional base flow:
    - null: the base flow is null, $(\boldsymbol{u}_b, \boldsymbol{\omega}_b)(x_i, y_j) = 0$;
    - file: the base flow is read from a binary file base2D.init;
    - tanh: the base flow is internally generated and has a hyperbolic tangent profile;
    - etc.
  See §3.4 for more details on the available options.

• velo_type: selects the type of initial velocity state:
    - null: the initial velocity state flow is null, $(\boldsymbol{u}, b)(x_i, y_j, z_k) = 0$;
    - file: the initial velocity state flow is read from a binary file velocity.init;
    - vortices: the initial velocity state flow is internally generated and is made of vertical vortices;
    - etc.
  See §3.3 for more details on the available options.

• white_noise: defines whether white noise is added to the initial velocity flow. Its value is the amplitude of the added white noise. A value white_noise = 0 corresponds to no noise. See §3.3 for more details.

• Output: defines the number of time steps between two successive calls of the different output subroutines. It is generally advised not to call output subroutine
22.
    e CONFIG_H

    /* Number of collocation points */
    #define DIMX 64
    #define DIMY 256
    #define DIMZ 3

    /* Padding along each direction */
    #define PPADKX 0
    #define PPADKY 0
    #define PPADKZ 0

    /* The following time schemes are available:
       AB2: Adams-Bashforth of order 2
       RK2: Runge and Kutta of order 2
       RK3: Runge and Kutta of order 3
       RK4: Runge and Kutta of order 4 */
    #define RK4

    #endif

This file contains preprocessor variables used at the compilation step. The following variables must be edited by the user:

• DIMX, DIMY, DIMZ: number of collocation points $N_x$, $N_y$ and $N_z$ along each physical direction.

• PPADKX, PPADKY, PPADKZ: number of padding values along each spectral direction $k_x$, $k_y$ and $k_z$. The padding values are extra unused values appended to the arrays storing the main fields such as $\boldsymbol{u}$, $b$ or $\boldsymbol{\omega}$. Padding can improve memory alignment on some systems, and thus speed.

• time scheme: AB2, RK2, RK3 or RK4.

The other preprocessor file, extended_config.h, must not be modified; it is also used during compilation.

2.3.3 Compilation parameter file Makefile

The easiest way of compiling the NS3D code is to use the command make, which relies on a Makefile. Here is an example of Makefile corresponding to a compilation with the GNU Fortran compiler and the JMFFT library. These options will rarely generate the fastest executable, but this Makefile should work by default on most systems.
23. e Fourier Transforms (DFT). For example:

$$u_x(x_i, y_j, z_k) = \sum_{i_k, j_k, k_k} \hat{u}_{i_k j_k k_k}\, e^{i(k_x x_i + k_y y_j + k_z z_k)} \tag{1.12}$$

where $\hat{u}_{i_k j_k k_k}$ is the Discrete Fourier Transform of $u_{ijk}$:

$$\hat{u}_{i_k j_k k_k} = \frac{1}{N_x N_y N_z} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \sum_{k=0}^{N_z-1} u_{ijk}\, e^{-i(k_x x_i + k_y y_j + k_z z_k)} \tag{1.13}$$

This Discrete Fourier Transform can easily be shown to possess the Hermitian symmetry $\hat{u}_{i_k, j_k, k_k} = \overline{\hat{u}_{N_x - i_k,\, N_y - j_k,\, N_z - k_k}}$, where the overline denotes the complex conjugate. As a result of this symmetry, half of the values of $\hat{u}_{i_k j_k k_k}$ are redundant, being the complex conjugates of the other half, and thus are not stored in NS3D. In FORTRAN, we have chosen to store only the first half of the $k_x$ modes, corresponding to positive $k_x$ wavenumbers. More precisely, the Discrete Fourier Transforms are stored in double precision arrays of size $2 \times (N_x/2 + 1) \times N_y \times N_z$. For instance:

    a(1, ik, jk, kk) = Re(û_{ik jk kk})
    a(2, ik, jk, kk) = Im(û_{ik jk kk})
    for (ik, jk, kk) in [0, Nx/2] × [0, Ny-1] × [0, Nz-1]

where a is the FORTRAN array storing the Discrete Fourier Transform of ux(i, j, k), and where Re and Im are the real part and the imaginary part respectively.

1.2.2 Time discretisation

The following time schemes can be chosen, with a constant time step $\delta t$:

• Adams-Bashforth of order two
• Runge-Kutta of order two
• Runge-Kutta of order three
• Runge-Kutta of order four

A higher-order time scheme implies more numerous evaluations of the non linear term
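The Hermitian symmetry, and the reason why storing Nx/2 + 1 modes is enough, can be checked on a small one-dimensional example. The Python sketch below uses a naive O(N²) transform with a 1/N normalisation on the forward transform (an assumption made here for illustration); it is not the FFT used by NS3D, and the sample values are arbitrary.

```python
import cmath

def dft(values):
    """Naive forward Discrete Fourier Transform with 1/N normalisation."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * m * j / n) for j, v in enumerate(values)) / n
        for m in range(n)
    ]

signal = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 1.0]   # real data, Nx = 8
nx = len(signal)
modes = dft(signal)

# Hermitian symmetry: mode m is the complex conjugate of mode Nx - m.
symmetric = all(
    abs(modes[m] - modes[(nx - m) % nx].conjugate()) < 1e-12 for m in range(nx)
)

# Only the first Nx/2 + 1 modes need to be stored ...
stored = modes[: nx // 2 + 1]

# ... because the other modes can be rebuilt by complex conjugation.
rebuilt = [
    stored[m] if m <= nx // 2 else stored[nx - m].conjugate() for m in range(nx)
]
max_error = max(abs(a - b) for a, b in zip(rebuilt, modes))
print(symmetric, len(stored), max_error)
```

For Nx = 8 real samples, 5 complex modes are stored instead of 8, which matches the Nx/2 + 1 count of the first array dimension.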
24. e arrays are located in the data part of the system RAM. Some compilers limit the size of the data memory to a value smaller than the total available RAM. For instance, on most systems the addressable data memory is limited by default to 2 GB, whereas the RAM can be much larger. Trying to compile a code requiring more than the available data memory will cause a relocation memory error at compile/link time. To overcome this issue, a compilation flag is normally available on the compiler. For instance, the flag -mcmodel=medium (GNU Fortran, Intel) will make all the RAM available for the data memory.

5.2 Speed

Calculation time is highly dependent on many factors such as simulation size, number, type and frequency of processors, compiler, compilation options, FFT library, etc. We do not intend in this section to give precise running times, but rather to introduce the key points to understand when optimizing running time.

5.2.1 Time consuming steps

Figure 5.1 shows the percentage of time spent at each step of the algorithm described in §1.2.3 for a typical simulation of dimension N = 256 × 256 × 256 run on a single 3.6 GHz Intel Xeon processor. We see that 42 + 36 = 78% of the time is spent at steps 2 and 4 of the evaluation of the nonlinear terms. These steps correspond to Discrete Fourier Transforms between physical and spectral spaces, which are performed with a Fast Fourier Transform (FFT) algorithm. One FFT requires O(N log N)
25. e2d, which is in the file gen_velocity.F90. It is possible to directly modify the source code of this subroutine to suit one's needs.

Chapter 4 MPI parallel run

When running large simulations requiring much memory or calculation time, a parallel run performed on several processes may become necessary. Parallelism through the Message Passing Interface (MPI) has been implemented in NS3D.

4.1 Running a MPI run: quick start

Let us consider a simulation of dimension $N = N_x \times N_y \times N_z$ to run on p MPI processes. The procedure is identical to that of a sequential run described in §2.1, except for the following changes:

• step 1b: $N_y$ and $N_z$ are not necessarily equal, but both must be multiples of the number of processes p. This condition is mandatory.

• step 1c:
    - the number of MPI processes must be defined in the Makefile with the flag -DMPI=p;
    - the MPI version of an FFT library must be set in the Makefile. Most of the FFT interfaces available in the current version of NS3D have their MPI counterpart: -DJMFFT_MPI, -DFFTW_MPI, -DESSL_MPI and -DMATHKEISAN_MPI. Note that the parallelisation of the three-dimensional FFT is implemented directly by NS3D and relies only on sequential one-dimensional FFTs performed by the third-party library. This means that it is not required to install an MPI parallel FFT library, but only the default sequential one;
    - the compilation options require
26. eans that the spectral directions kx, ky and kz are associated with different FORTRAN array indexes in sequential and MPI runs. To deal with both situations, the source code makes use of two different sets of array indexes:

1. ik1, ik2, ik3 corresponds to the FORTRAN storage order, ik1 and ik3 being the inner and outer dimensions of the array respectively. This holds in both sequential and MPI runs.

[Figure 4.3: Schematic of a distributed transpose with p = 3 processes on a two-dimensional array of total size N_ky × N_z = 9 × 6. The array is initially in the order (ky, z): the data are distributed along the z direction and the ky direction corresponds to contiguous elements in memory. The distributed transpose ends up with an array in the order (z, ky): the data are distributed along the ky direction and the z direction corresponds to contiguous elements in memory.]

2. IKX, IKY, IKZ corresponds to the spectral directions kx, ky and kz respectively. During the compilation step, these variables are replaced by the preprocessor so that they match the correct FORTRAN storage index:
   (a) sequential run: IKX -> ik1, IKY -> ik2, IKZ -> ik3
   (b) MPI run: IKX -> ik2, IKY -> ik3, IKZ -> ik1

The first set ik1, ik2, ik3 is u
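As an illustration of the distributed transpose of Figure 4.3, the following Python sketch mimics the data movement with plain nested lists instead of MPI messages. It is not NS3D code: the function names split_along_z and distributed_transpose and the list layout are assumptions made for this example only.

```python
def split_along_z(a, p):
    """Distribute a[iz][iky] along z: process r receives a contiguous block of z."""
    nz = len(a)
    if nz % p != 0:
        raise ValueError("the number of z values must be a multiple of p")
    nz_loc = nz // p
    return [a[r * nz_loc:(r + 1) * nz_loc] for r in range(p)]


def distributed_transpose(slabs):
    """Mimic the distributed transpose (ky, z) -> (z, ky).

    Input:  slabs[r][iz_local][iky], process r owns a block of z and all ky.
    Output: out[r][iky_local][iz],   process r owns a block of ky and all z.
    """
    p = len(slabs)
    nz_loc = len(slabs[0])
    nky = len(slabs[0][0])
    if nky % p != 0:
        raise ValueError("the number of ky values must be a multiple of p")
    nky_loc = nky // p
    out = []
    for dest in range(p):
        block = []
        for iky in range(dest * nky_loc, (dest + 1) * nky_loc):
            row = []
            for src in range(p):  # values that process src would send to dest
                for izl in range(nz_loc):
                    row.append(slabs[src][izl][iky])
            block.append(row)
        out.append(block)
    return out


# Same sizes as in Figure 4.3: 9 values of ky, 6 values of z, p = 3 processes.
a = [[10 * iz + iky for iky in range(9)] for iz in range(6)]
out = distributed_transpose(split_along_z(a, 3))
```

After the call, process r holds three values of ky and, for each of them, the six values of z stored contiguously. Both array dimensions must be divisible by p, which is the origin of the constraint on Ny and Nz in MPI mode.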
27. eps of the pseudo-spectral algorithm are simply performed point by point in both physical space and spectral space, and make use only of the local data of each process. These portions are easy to implement and will not be detailed further here. The communications between the processes are thus limited to the distributed transpose performed in the FFT. This makes the distributed-memory parallelisation well adapted to the pseudo-spectral algorithm.

It must be noted that, because of the transposed parallel FFT, both Ny and Nz must be multiples of p. Indeed, Ny is split between the p processes in spectral space, which is stored in the order (kz, kx, ky) and distributed along ky, while Nz is split in physical space, which is stored in the order (x, y, z) and distributed along z.

We finally point out that this transposed FFT algorithm is directly implemented within the NS3D code and only makes use of sequential one-dimensional FFTs provided by third-party FFT libraries. We do not rely on MPI three-dimensional FFTs that may already be available in some FFT libraries. We do not use, for instance, the MPI version of the FFTW library that is available in FFTW v3.3 and above.

4.2.2 FORTRAN array indexes

As outlined in §4.2.1, in MPI parallel mode NS3D uses a parallel transposed FFT algorithm, giving the Discrete Fourier Transform of the original data but transposed from the (x, y, z) directions into (kz, kx, ky). This is not the case for a sequential run, where the original order is preserved in spectral space: (kx, ky, kz). As a consequence it m
28. format of the generated output binary files velo_rho_vort t xxx xxxx is precisely described in §3. Examples of Matlab scripts are also provided in the post_processing directory of NS3D 2.14. These scripts include examples of reading the binary data of output files. In particular, we point out that the information contained in the header, such as the size of the simulation (Nx, Ny, Nz), can greatly ease post-processing.

Q4 In the FORTRAN source code, what does the variable rho stand for?
The variable rho is the buoyancy b introduced in the governing equations (1.1). It does not refer to the density ρ. This variable name is used for historical reasons.

Q5 In the FORTRAN source code, why are the array indexes corresponding to variables in spectral space sometimes ik1, ik2, ik3 and sometimes IKX, IKY, IKZ?
Please refer to §4.2.2.

Q6 I need to run a large xy simulation in MPI parallel mode. It is impossible because Nz = 1 and thus it is not a multiple of the number p of processes. Is there any solution?
In MPI mode, NS3D distributes the data along the y and z directions, implying that Ny and Nz must both be multiples of the number p of processes. Changing the dimensions that are distributed in MPI mode is not easy, as it requires rewriting the parallel transposed FFT. A better solution is to consider changing the orientation of the physical problem from xy to
29. h a large number of processors. However, we recall that any speed result is highly dependent on the exact architecture of the system used. This is especially true for distributed-memory parallelism performed on parallel computers.

[Figure 5.3: Speed-up S of the NS3D code parallelized with MPI on (a) Tournesol and (b) Zahir, as a function of the number of processors. The ideal linear speed-up is shown as a dashed line. The size of the simulation is N = 256 × 256 × 256. The FFT library is FFTW 3.2.]

Chapter 6 Frequently asked questions (FAQ)

Q1 The code compiles and runs well for small simulation sizes, but I get compilation/linking errors for larger sizes.
Check that the full memory usage does not exceed the available RAM (see §5.1.1). If so, consider MPI parallel execution. Check also that the data memory limitations of the system or compiler are not exceeded (see §5.1.2).

Q2 What is the use of the shell scripts that are in the jobs directory of NS3D 2.14?
These scripts automate the use of NS3D: copying the source files, compilation, running, saving the results. Although not strictly necessary, such scripts are usually the most convenient way of using NS3D. Please refer directly to the content of job.sh, for instance, for more details.

Q3 How can I read and post-process the output binary files generated by NS3D with Matlab?
The
30. his file is read once at the beginning of the simulation and must be copied in the same directory as the executable ns3d. The initial velocity file has the same expected format as the output velocity files (see §3.2), so that an output state file can be used as an initial state file. However, only part of the data stored in a velocity file is actually used to initialize a run:

• (ux, uy, uz, b)(xi, yj, zk) and time are read from the initial velocity file and are used to initialize the new run.
• Nx, Ny, Nz, Lx, Ly, Lz read in the initial velocity file must match the values defined in config.h and data.in of the new run.
• all the other physical parameters (dt, nu, perturbative, etc.) are not read in the initial velocity file. They are freely defined in data.in of the new run.
• (ωx, ωy, ωz)(xi, yj, zk) is not read in the initial velocity file, as the vorticity is recomputed at each time step.
• the base state velocity u_b(xi, yj) and vorticity ω_b(xi, yj) are not read in the initial velocity file. In perturbative mode, the base flow must be defined independently for the new run (see §3.4).

3.3.2 Internal subroutine

The other way of defining an initial state flow (u, b)(xi, yj, zk, t0) is to use an internal subroutine within the NS3D code. To select this option, velo_type in the data.in file must be set to the name of this internal subroutine (for instance null null field tanh or stuart). This internal subroutine is called at the beginn
31. ing of the run from the subroutine gen_velo, which is in the file gen_velocity.F90. It is possible to directly modify the source code of this subroutine to suit one's needs.

3.3.3 White noise

Finally, it is possible to add white noise to the initial state flow with the variable white_noise in data.in. The noise is added to the initial velocity field u(xi, yj, zk, t0). The added noise follows a uniform distribution with a zero mean value. The value defined in white_noise corresponds to the maximum amplitude of the noise. A value white_noise = 0 means that no noise is added.

This white noise function is especially useful to initialize perturbative simulations when looking for the most unstable mode. In this case, velo_type is set to null and white noise is added.

3.4 Base state: base2D_type

In perturbative mode, the flow is simulated around a steady two-dimensional base state with a null buoyancy. To avoid any numerical approximation, the base state vorticity ω_b must be explicitly provided by the user and is not computed from the velocity u_b. The base state to be defined by the user is thus of the form (u_b, ω_b)(xi, yj). This base state can be initialized either by reading a base flow file or with an internal subroutine.

3.4.1 Reading base2D.init

The base flow can be read from an external file. To select this option, base2D_type in data.in must be set to file. The data describi
32. issue on "Program Generation, Optimization, and Platform Adaptation".

Gottlieb, D. & Orszag, S. A. 1977 Numerical analysis of spectral methods: theory and applications. CBMS-NSF Regional Conference Series in Applied Mathematics 26. Philadelphia: SIAM.

Michalke, A. 1964 On the inviscid instability of the hyperbolic-tangent velocity profile. J. Fluid Mech. 19, 543-556.

Orszag, S. A. 1971 On the elimination of aliasing in finite-difference schemes by filtering high-wavenumber components. Journal of the Atmospheric Sciences 28, 1074. A two-paragraph classic.

Otheguy, P., Chomaz, J.-M. & Billant, P. 2006 Elliptic and zigzag instabilities on co-rotating vertical vortices in a stratified fluid. J. Fluid Mech. 553, 253-272.

Vincent, A. & Meneguzzi, M. 1991 The spatial structure and statistical properties of homogeneous turbulence. J. Fluid Mech. 225, 1-20.
33. les VENIS CO GIG Ye SPES SET a iente C o S S V E brunt vaisala frequency 10 Schmidt number de 20 Chapter 2 Compilation and execution Type of simulation perturbative T INC A TRS mede a a T Base state only perturbative run basc2DEC po tanh Initial velocity velo type null white noise 1E 10 xk Output KK Oya ovens Jereenieyol o e s 400 output2_period_______________ 1000 output3_period_____________ 0 e 1x ly and 1z are the dimensions Ly Ly and L of the computational domain along each physical direction e dt is the fixed time step dt used by the time scheme e begin is the arbitrary numerical value of initial time to of the simulation e itmax is the number of time steps Consequently the initial time is begin and the ending time is begintitmaxxdt e de aliasing indicates whether dealiasing see 81 2 4 is applied T or not F If dealiasing is applied it is possible to choose between a squared 1 or elliptic 2 dealiasing radius truncation x y z are the dealias ing radius r p and r along each spectral direction e viscosity is the viscosity v of the fluid e stratified indicates whether the simulation is in a stratified fluid T or in a homogeneous fluid F In a stratified fluid the Brunt V is l frequency N and the Schmidt number Sc must be defined e 2omega is 20 twice the angular velocity of the rotating frame The non rotating case correspon
34. ng FFT interfaces are already available in the current version of NS3D:

• JMFFT 8.0 (flag -DJMFFT): an FFT library written in FORTRAN by Jean-Marie Teuler that emulates most of the Cray SCILIB library.
• FFTW 3.2 (flag -DFFTW): the FFTW library developed by Frigo & Johnson (2005), http://www.fftw.org. Usually the fastest library on scalar processors (x86, IBM PowerPC).
• ESSL 4.2 (flag -DESSL): a fast library on IBM PowerPC processors. Provided by IBM.
• MathKeisan 1.6.0 (flag -DMATHKEISAN): the fastest library on NEC SX vector processors. Provided by NEC.
• ASL 19.0 (flag -DASL): an older NEC library designed for NEC SX processors.

The FORTRAN source code of the JMFFT 8.0 library is included along with the NS3D source code. This makes it possible to compile and run NS3D without any installed external FFT library. However, the JMFFT library is usually slower than the other options and thus should be avoided if possible. In most situations, the FFTW library is the preferred choice.

2.3.2 Preprocessor files: config.h and extended_config.h

The text file config.h must be in the NS3D directory:

    NS3D compilation parameters
    This file is used by the preprocessor during the compilation step.

    Avoid multiple inclusions of this file. Do not change.
    #ifndef CONFIG_H
    #defin
35. ng the base state flow must be stored in a binary data file called base2D.init. This file is read once at the beginning of the simulation and must be copied in the same directory as the executable ns3d. The exact format of this base state file is:

$$h_1,\ 1,\ N_x,\ N_y,\ L_x,\ L_y,\ h_1$$
$$h_2,\ u_{bx}(x_0,y_0),\ \dots,\ u_{bx}(x_{N_x-1},y_{N_y-1}),\ u_{by}(x_0,y_0),\ \dots,\ u_{by}(x_{N_x-1},y_{N_y-1}),\ u_{bz}(x_0,y_0),\ \dots,\ u_{bz}(x_{N_x-1},y_{N_y-1}),\ h_2$$
$$h_3,\ \omega_{bx}(x_0,y_0),\ \dots,\ \omega_{bx}(x_{N_x-1},y_{N_y-1}),\ \omega_{by}(x_0,y_0),\ \dots,\ \omega_{by}(x_{N_x-1},y_{N_y-1}),\ \omega_{bz}(x_0,y_0),\ \dots,\ \omega_{bz}(x_{N_x-1},y_{N_y-1}),\ h_3$$

with the following notations:

• h_i (integer): headers and trailers of the records (see §3.1.1)
• 1 (integer): fixed flag, useful to check endianness consistency
• Nx, Ny (integer): numbers of collocation points Nx and Ny
• Lx, Ly (double precision): dimensions Lx and Ly of the computational domain
• (u_bx, u_by, u_bz)(xi, yj) (double precision): velocity base state u_b(xi, yj) on the collocation points
• (ω_bx, ω_by, ω_bz)(xi, yj) (double precision): vorticity base state ω_b(xi, yj) on the collocation points

3.4.2 Internal subroutine

The other way of defining a base flow (u_b, ω_b)(xi, yj) is to use an internal subroutine within the NS3D code. To select this option, base2D_type in the data.in file must be set to the name of this internal subroutine (for instance file vortices or stuart). This internal subroutine is called at the beginning of the run from the subroutine gen bas
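To make the three-record layout of base2D.init concrete, here is a minimal Python sketch that writes such a file from an external tool. It is only an illustration under stated assumptions: integers and record markers are taken to be 4 bytes long, reals 8 bytes long, the byte order is little-endian by default, and each field is given as a flat list with the x index varying fastest. The names fortran_record and write_base2d are hypothetical and do not belong to NS3D.

```python
import struct


def fortran_record(payload, order="<"):
    """Wrap raw bytes between the header and trailer of a Fortran record."""
    marker = struct.pack(order + "i", len(payload))  # assumed 4-byte marker
    return marker + payload + marker


def write_base2d(path, nx, ny, lx, ly, velocity, vorticity, order="<"):
    """Write a base state file with the three records described above.

    velocity and vorticity are sequences of three flat lists (x, y and z
    components), each holding nx*ny values with the x index varying fastest.
    """
    def pack_fields(fields):
        if len(fields) != 3 or any(len(c) != nx * ny for c in fields):
            raise ValueError("expected three components of nx*ny values each")
        flat = [value for component in fields for value in component]
        return struct.pack("%s%dd" % (order, len(flat)), *flat)

    with open(path, "wb") as f:
        # record 1: flag 1, Nx, Ny (integers), then Lx, Ly (double precision)
        f.write(fortran_record(struct.pack(order + "3i2d", 1, nx, ny, lx, ly), order))
        # record 2: the three velocity components, one after the other
        f.write(fortran_record(pack_fields(velocity), order))
        # record 3: the three vorticity components, one after the other
        f.write(fortran_record(pack_fields(vorticity), order))
```

With these assumptions, the first record carries 3 × 4 + 2 × 8 = 28 bytes of data, so the file starts with the integer 28. If NS3D was compiled with other integer or marker sizes, the format strings must be adapted accordingly.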
36. nos c PIE 0 0000 AGM EL XC eoe kerosene de aee ere Melee ees 2000 de aliasing en cca neoe T squared 1_ _elliptic 2 1 radius truncation x 0 66000 radius truncation y 0 66000 radius truncation z 2 0000 ASCOSd tV EE 0 50000E 06 Stratilficatlon 2 F brunt_vaisala_frequency 10 000 schmidt number ero esee 1 0000 OME PAD oe ce ee secs 0 0000 perturbative eee r eee ees T ea BIER eee case E seus eee T 24 Chapter 2 Compilation and execution velo types a es E null white Noisen soo ee e eaaa se ec e 0 10000E 09 OUTPUT outputi periodar eree e eisa 400 output2_ period seere 1000 output3 period eoe se eese 0 Initialization of the Fast Fourier Transformation FFT JMFFT 8 0 3D Author Jean Marie Teuler CNRS Base state gt tanh Initial velocity null white noise it 400 time 200 000 cpu time in sec elapsed remaining 0 0 mean quadratic velocity 108398573 85692854 growthrate 0 0000000000000000 it 800 time 400 000 cpu time in sec elapsed remaining 19 59 mean quadratic velocity 7 2431437148986121E 040 growthrate 0 18895533389024657 it 1200 time 600 000 cpu time in sec elapsed remaining 39 39 mean quadratic velocity 5 6960132779538870E 073 growthrate 0 18936254811773309 it 1600 time 800 000 cpu time in sec elapsed remaining 59 19 mean quadratic velocity 4 7565205774962807E 106 growthrate 0 18951264498358636
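The growth rates printed in the log above can be cross-checked by hand. The short Python sketch below assumes that the growth rate is half the logarithmic rate of change of the mean quadratic velocity between two consecutive outputs, and that the exponents garbled as "E 040" and "E 073" read E+040 and E+073. Both points are assumptions: the formula is inferred from the printed numbers, not taken from the NS3D source, and the function name growth_rate is hypothetical.

```python
import math


def growth_rate(e1, e2, dt):
    """Growth rate inferred from two values of the mean quadratic velocity.

    Assumes the energy-like quantity grows as exp(2*sigma*t), so that
    sigma = ln(e2 / e1) / (2 * dt).
    """
    return math.log(e2 / e1) / (2.0 * dt)


# Values printed at time 400 and time 600 (200 time units apart).
sigma = growth_rate(7.2431437148986121e40, 5.6960132779538870e73, 200.0)
print(sigma)
```

Under these assumptions the result agrees with the value 0.18936254811773309 printed at time 600 to better than six decimal places.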
37. number with D the molecular diffusivity of the stratifying agent We consider periodic boundary conditions u u p e Lz yt Ly z L t p x y z t 1 2 b b where Lz Ly and L are the computational domain sizes 6 Chapter 1 Governing equations and numerical method Simulations in a homogenous fluid can also be performed In this case the governing equations become 2 DL NE S EA E prd vAu 1 3a Ot po 2 V u 0 1 3b In both cases stratified or homogenous fluid a non rotating frame can be chosen by setting Q 0 1 1 2 Perturbative cases Linear perturbative case We consider a steady two dimensional base state w p P z y with a null buoyancy b 0 that is solution of the equations 1 1 This base state is sub jected to infinitesimal three dimensional perturbations p b x y z t such that the total flow is of the form b u u p z y z t p A E 1 4 b 0 b The flow decomposition 1 4 is inserted in 1 1 and the equations are linearized around the base state or cu x B a x wh 2e x av u l be vA 1 5a Po V a 0 1 5b db b wi 27 Yaz um i N Ab 1 at Vb Se 1 5c Nonlinear perturbative case The flow decomposition 1 4 is also inserted in 1 1 but contrary to 1 5 the nonlinear terms are conserved ateoquxe qd xeo x V E n4 Po be vA t Se V a 0 Ob b Q V acu Vb u Vb N GMb 1 6a 1 6b
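38. on

Discretisation in physical space. The Cartesian coordinates (x, y, z) are discretised into N = Nx × Ny × Nz collocation points:

$$x_i = \frac{L_x}{N_x}\, i \quad \text{for } i \in [0, N_x - 1], \tag{1.10a}$$
$$y_j = \frac{L_y}{N_y}\, j \quad \text{for } j \in [0, N_y - 1], \tag{1.10b}$$
$$z_k = \frac{L_z}{N_z}\, k \quad \text{for } k \in [0, N_z - 1]. \tag{1.10c}$$

The spatial numerical schemes are based on numerical approximations of the variables (u, b, etc.) on the collocation points. For example, the numerical estimate $U_{i,j,k}$ of the exact solution u is such that $U_{i,j,k} \approx u(x_i, y_j, z_k)$. In FORTRAN, the numerical estimates are stored in double precision arrays of size Nx × Ny × Nz. For instance, ux(i,j,k) = $U_{x;i,j,k}$ for (i, j, k) in [0, Nx-1] × [0, Ny-1] × [0, Nz-1], where ux is the FORTRAN array storing the numerical approximation of the x-velocity $u_x$, and i, j and k are the array indexes.

Discretisation in spectral space. The spectral coordinates (kx, ky, kz) are discretised into N = Nx × Ny × Nz wavenumbers:

$$k_{x,ik} = \begin{cases} \dfrac{2\pi}{L_x}\, ik & \text{for } ik \in [0, N_x/2], \\[4pt] \dfrac{2\pi}{L_x}\,(ik - N_x) & \text{for } ik \in [N_x/2 + 1, N_x - 1], \end{cases} \tag{1.11a}$$
$$k_{y,jk} = \begin{cases} \dfrac{2\pi}{L_y}\, jk & \text{for } jk \in [0, N_y/2], \\[4pt] \dfrac{2\pi}{L_y}\,(jk - N_y) & \text{for } jk \in [N_y/2 + 1, N_y - 1], \end{cases} \tag{1.11b}$$
$$k_{z,kk} = \begin{cases} \dfrac{2\pi}{L_z}\, kk & \text{for } kk \in [0, N_z/2], \\[4pt] \dfrac{2\pi}{L_z}\,(kk - N_z) & \text{for } kk \in [N_z/2 + 1, N_z - 1], \end{cases} \tag{1.11c}$$

where the divisions by 2 are rounded down. The first half of the wavenumbers are positive, while the second half is negative and in backwards order. The Fourier transforms ($\hat{u}$, $\hat{b}$, etc.) are numerically approximated on these discretised wavenumbers by Discret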
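The two discretisations can be checked on a small case with the Python sketch below. It assumes the layout read from (1.10) and (1.11): uniformly spaced collocation points, and wavenumbers that are positive for indexes up to N/2 (rounded down) and negative afterwards. The function names collocation_points and wavenumbers are hypothetical and only serve this illustration.

```python
import math


def collocation_points(n, length):
    """Collocation points x_i = (length / n) * i for i = 0, ..., n-1."""
    return [length / n * i for i in range(n)]


def wavenumbers(n, length):
    """Discrete wavenumbers along one direction, in storage order.

    Indexes 0..n//2 give positive wavenumbers, the remaining ones are
    shifted by -n and therefore negative.
    """
    scale = 2.0 * math.pi / length
    half = n // 2  # division by 2 rounded down
    return [scale * (m if m <= half else m - n) for m in range(n)]


print(collocation_points(4, 2.0))
print(wavenumbers(8, 2.0 * math.pi))
```

For a domain of length 2π the wavenumbers are integers, which makes the storage order easy to read: the non-negative modes come first, followed by the negative ones.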
39. re: 5.25 × Nx Ny Nz × 8 bytes

Time schemes:
  Adams-Bashforth of order 2     8 × Nx Ny Nz × 8 bytes
  Runge-Kutta of order 2         8 × Nx Ny Nz × 8 bytes
  Runge-Kutta of order 3        12 × Nx Ny Nz × 8 bytes
  Runge-Kutta of order 4        12 × Nx Ny Nz × 8 bytes

FFT libraries:
  JMFFT             ≈ 0
  FFTW              ≈ 0
  ESSL              ≈ 0
  Mathkeisan        ≈ 0
  ASL               ≈ 0
  JMFFT_MPI         2 × Nx Ny Nz × 8 bytes
  FFTW_MPI          2 × Nx Ny Nz × 8 bytes
  ESSL_MPI          2 × Nx Ny Nz × 8 bytes
  Mathkeisan_MPI    2 × Nx Ny Nz × 8 bytes
  ASL_MPI           not available

For a given simulation, the full memory usage is the sum of the requirements of the NS3D core, the time scheme used and the FFT library used. We give below examples of typical memory usage for different simulation sizes, in the case of the classical Runge-Kutta time scheme of order four and the FFTW library.

  Size Nx × Ny × Nz       Time scheme   FFT library   Typical full memory usage
  64 × 64 × 64            RK4           FFTW          0.03 GB
  128 × 128 × 128         RK4           FFTW          0.27 GB
  256 × 256 × 256         RK4           FFTW          2.16 GB
  512 × 512 × 512         RK4           FFTW MPI      19.25 GB
  1024 × 1024 × 1024      RK4           FFTW MPI      154 GB
  2048 × 2048 × 2048      RK4           FFTW MPI      1232 GB
  4096 × 4096 × 4096      RK4           FFTW MPI      9856 GB

We recall that with MPI parallelism the full memory usage is divided equally among the p MPI processes, making larger simulations possible.

5.1.2 Memory location

The large three-dimensional arrays used in NS3D are declared in FORTRAN as static arrays whose dimensions are set at compile time. Thes
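The memory figures of the tables above follow from a simple sum, which the Python sketch below reproduces. It assumes that the core counts for 5.25 arrays, the Runge-Kutta schemes for 8 or 12 arrays, an MPI FFT interface for 2 extra arrays, and that the unit of the table corresponds to 2^30 bytes. The function name memory_usage and its arguments are hypothetical, and the small overhead of scalars and low-dimensional arrays is ignored.

```python
BYTES_PER_VALUE = 8          # double precision values
CORE_ARRAYS = 5.25           # NS3D core
TIME_SCHEME_ARRAYS = {"RK2": 8, "RK3": 12, "RK4": 12}
MPI_FFT_ARRAYS = 2           # working arrays of the MPI FFT interfaces


def memory_usage(nx, ny, nz, scheme="RK4", mpi=False):
    """Estimated full memory usage, in units of 2**30 bytes."""
    arrays = CORE_ARRAYS + TIME_SCHEME_ARRAYS[scheme]
    if mpi:
        arrays += MPI_FFT_ARRAYS
    return arrays * nx * ny * nz * BYTES_PER_VALUE / 2 ** 30


for n, mpi in [(256, False), (512, True), (1024, True)]:
    print(n, mpi, memory_usage(n, n, n, "RK4", mpi))
```

For instance, a 256 × 256 × 256 run with RK4 and a sequential FFT gives 17.25 × 0.125 = 2.15625, which is the 2.16 of the table. In an MPI run, dividing the result by p gives the estimate per process.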
40. rho vort t xxxx xxx
• initial velocity state files: velocity.init
• base state files: base2D.init

3.1 General remarks on binary files in FORTRAN

3.1.1 Headers and trailers of records

The different binary files used in NS3D are opened with the OPEN command with the attribute FORM='unformatted'. In FORTRAN, each time the WRITE statement is issued, a record is written: the record consists of an integer header, followed by the data, and finally a trailer that matches the header. The integer header and trailer hold the number of bytes that are written in the data section. So, for example, the source code

    OPEN(60, FILE='filename', FORM='unformatted')
    WRITE(60) nx, ny, nz
    WRITE(60) nx, ny
    CLOSE(60)

writes the following binary file:

    12, nx, ny, nz, 12, 8, nx, ny, 8

where nx, ny, nz are 4-byte integers. Note the extra integers 12 and 8, corresponding to the number of bytes of each record. When reading a binary file, FORTRAN also expects to find similar headers and trailers for each record. It is necessary to take this into account when exchanging binary data files between NS3D and other tools (Scilab, Matlab). The additional headers and trailers are integers and so are usually coded on 4 bytes. More rarely, they can be coded on 8 bytes on some specific 64-bit systems.

3.1.2 Array storage order

The multi-dimensional arrays that are used
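The record structure of §3.1.1 can be handled from an external tool with a few lines of code. The Python sketch below splits a sequential unformatted file into its records and checks that each trailer matches its header. It assumes 4-byte markers and little-endian data by default. The function names read_records and write_record are hypothetical helpers, not part of NS3D.

```python
import struct


def write_record(f, payload, order="<"):
    """Write one record: header, data, trailer (4-byte markers assumed)."""
    marker = struct.pack(order + "i", len(payload))
    f.write(marker + payload + marker)


def read_records(path, order="<", marker_size=4):
    """Return the data section (bytes) of every record of the file."""
    fmt = order + ("i" if marker_size == 4 else "q")
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(marker_size)
            if not header:
                break  # end of file reached between two records
            (nbytes,) = struct.unpack(fmt, header)
            payload = f.read(nbytes)
            (trailer,) = struct.unpack(fmt, f.read(marker_size))
            if trailer != nbytes:
                raise ValueError("record trailer does not match its header")
            records.append(payload)
    return records
```

Applied to the file written by the FORTRAN example of §3.1.1, read_records would return two data sections of 12 and 8 bytes, which can then be decoded with struct.unpack("<3i", ...) and struct.unpack("<2i", ...). On a system using 8-byte markers, marker_size=8 should be passed instead.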
41. s at every time step as they require computational time A value of 0 means that the corresponding subroutine is never called In the current code version three output subroutines are available outputi this subroutine outputs on the terminal screen basic infor mation elapsed and remaining time mean quadratic velocity and velocity growthrate output2 this subroutine writes a velocity state output binary file on the disk see 3 3 output3 this subroutine is empty and can be completed by the user directly in the source code output F90 When the base flow or the initial velocity state file are internally generated by subroutines additional run time parameters can be read from data in We present below an example of extra parameters found at the end of a data in file The meaning of these parameters will not be explained here as they are specific to user defined subroutines that are not part of the body of NS3D They are not required for a standard simulation 22 Chapter 2 Compilation and execution additional variables x Stuart vortices concentration stuart 0 25 File gaussian vortices file nb vortices 2 Vortex 1 DO SINUM OT Fe 3 1415927 DO Sls TIN yn 3 8165927 circulation DE COTERTAIUS NENNEN ERE E 0 2 Vortex 2 positionis o LL 3 1415927 POSANTAKON E LL 2 4665927 circulations ERE 2 core_radius OP Random gaussian vortices rnd_nb_vortices______________ 10
42. s of (1.9) at each time step, but stability is usually achieved for larger time steps. More details about these classical time schemes can be found in textbooks.

1.2.3 Pseudo-spectral evaluation of the nonlinear term

The time schemes require the evaluation of the nonlinear terms in brackets in (1.9). These terms are computed with a pseudo-spectral method from $\hat{u}$ and $\hat{b}$ by performing the following steps:

1. we evaluate the vorticity in spectral space, $\hat{\omega} = i k \times \hat{u}$;
2. we apply backward Fourier transforms to the spectral terms $\hat{u}$, $\hat{\omega}$ and $\hat{b}$ to obtain u, ω and b in physical space;
3. we evaluate the nonlinear terms u × ω and b u in physical space;
4. we apply forward Fourier transforms to the physical terms u × ω and b u to obtain their transforms in spectral space;
5. we evaluate the full nonlinear terms in brackets in (1.9) in spectral space, which combine the projection operator P(k) with the transformed products, the rotation term and the buoyancy term.

This algorithm makes extensive use of Discrete Fourier Transforms at steps 2 and 4. These Discrete Fourier Transforms are performed with a Fast Fourier Transform (FFT) algorithm. As detailed in §5.2.1, the FFTs are the most time-consuming steps of the algorithm, usually requiring 70-95% of the total calculation time.

1.2.4 Dealiasing

The Discrete Fourier Transform of a periodic function introduces the so-called aliasing error (Gottlieb & Orszag 1977), which is partially due to the artificial perio
43. sually used when an operation is done spectral mode by spectral mode and when the mode direction is not important. The second set (IKX, IKY, IKZ) should be used when the specific orientation of a mode is important.

Chapter 5 Performances

5.1 Memory

5.1.1 Memory usage

Random Access Memory (RAM) usage can become an issue for large simulations, which makes it important to be able to evaluate the memory usage of NS3D. Almost all the memory is used by three-dimensional arrays of size Nx × Ny × Nz storing the solution fields u and b, as well as working arrays. All these large arrays are declared in the file global_vars.F90. More precisely, the memory usage of NS3D can be divided into:

• NS3D core arrays: arrays corresponding to the core of the code, mainly the solution fields u and b;
• time scheme working arrays: arrays used in the time schemes, mainly to store intermediate values;
• FFT working arrays: arrays used to perform the FFTs.

We detail below the typical memory usage of each section, supposing that FORTRAN double precision and logical values are stored on 8 bytes and 4 bytes respectively. We neglect all the scalars as well as the one- and two-dimensional arrays. Indeed, for large simulations, the three-dimensional arrays become increasingly dominant in memory usage. However, a precise estimate requires adding a small overhead to the values indicated below.

Typical memory usage: NS3D co
44. teps to use the NS3D code are the following:

1. Compiling the code
   (a) installing a third-party Fast Fourier Transform (FFT) library
   (b) editing the preprocessor file config.h
   (c) compiling the source files with a Makefile to generate the executable ns3d
2. Running the code
   (a) editing the run-time parameter file data.in
   (b) (optional) generating an initial velocity state file velocity.init
   (c) (optional) generating a base state file base2D.init
   (d) running the executable ns3d

These steps are detailed in the following sections in the case of a sequential execution. The specificities of parallel MPI runs are presented in §4.

2.2 Directory content

The NS3D directory structure is the following:

NS3D 2.14
  source
    MPI_Times.F90, timing.F90, fft.F90, data_parser.F90, global_vars.F90, subfunctions.F90, input.F90, output.F90, gen_velocity.F90, time_scheme.F90, main.F90: FORTRAN source files
    config.h, extended_config.h: preprocessor files
    Makefile: compilation parameter file
    data.in: run-time parameter file
    velocity.init: initial velocity state file (optional)
    base2D.init: base state file (optional)
  JMFFT 8.0: source of the third-party FFT library JMFFT (optional)
  examples of Makefiles for various compilers and FFT (optional)
  this documentation (optional)
  post_processing: Matlab scrip
45. throughout NS3D (velocity, buoyancy, vorticity fields) are stored in column-major order by FORTRAN, meaning that the first array index varies most rapidly. This is also how an array is dumped to or read from a file. Let us consider for example a two-dimensional array u(x, y) defined on a Cartesian grid $(x_0, \dots, x_{n-1}) \times (y_0, \dots, y_{p-1})$. NS3D writes and reads this velocity field in the following order:

$$u(x_0, y_0),\ u(x_1, y_0),\ \dots,\ u(x_{n-1}, y_0),\ u(x_0, y_1),\ u(x_1, y_1),\ \dots,\ u(x_{n-1}, y_1),\ \dots,\ u(x_{n-1}, y_{p-1})$$

The generalisation to the three-dimensional arrays used throughout NS3D is straightforward.

3.1.3 Endian order

Endianness is the attribute of a system that indicates whether numbers are represented from left to right or right to left. Endianness comes in two varieties, big-endian and little-endian, and depends on the system processor: PC processors (Intel, AMD) are little-endian, whereas most other processors (PowerPC, NEC, SGI) are big-endian.

An endianness difference can cause problems if a computer unknowingly tries to read binary data written in the opposite format from a file. This can happen if you create a data file with NS3D on one computer and try to use it as an input file on another system that does not share the same endianness. When working with systems of different endianness, we advise forcing all the compilers to work with the same endianness. For instance, big-endianness can be enforced with the flag -convert big_endian
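Both remarks matter when decoding an NS3D file in another tool. The Python sketch below shows one possible way to guess the byte order from the first record header when the size of the first record is known, and to rebuild a two-dimensional array from values stored with the first index varying fastest. The function names detect_byte_order and unflatten_column_major are hypothetical and the 4-byte marker is an assumption.

```python
import struct


def detect_byte_order(first_marker, expected_nbytes):
    """Return '<' (little-endian) or '>' (big-endian).

    first_marker holds the first 4 bytes of the file, i.e. the header of
    the first record, whose data size expected_nbytes is known in advance.
    """
    for order in ("<", ">"):
        if struct.unpack(order + "i", first_marker)[0] == expected_nbytes:
            return order
    raise ValueError("the record header matches neither byte order")


def unflatten_column_major(flat, nx, ny):
    """Rebuild u[i][j] from nx*ny values stored with i varying fastest."""
    if len(flat) != nx * ny:
        raise ValueError("unexpected number of values")
    return [[flat[i + nx * j] for j in range(ny)] for i in range(nx)]


# Values in file order for nx = 3, ny = 2: u(x0,y0), u(x1,y0), u(x2,y0), u(x0,y1), ...
u = unflatten_column_major([0, 1, 2, 3, 4, 5], 3, 2)
print(u)
```

In this small example u[i][j] equals i + 3 j, so the first three stored values form the column j = 0. The byte order found by detect_byte_order can then be used as the prefix of every struct format string applied to the rest of the file.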
46. ts
bash scripts to automate the use of NS3D

The only files that are mandatory to use NS3D are located in the source directory. The other files are optional and are only provided in the hope of being helpful.

2.3 Compilation step

2.3.1 Fast Fourier Transform (FFT) libraries

As a pseudo-spectral code, NS3D makes extensive use of FFTs. The FFT is not embedded in the NS3D source code and must be provided by a third-party FFT library installed on the system. To install a library, please refer to the instructions of the FFT library provider.

The FFT library must be interfaced with NS3D through the interface module fft.F90, which contains the following generic interface subroutines:

• init_fft: this subroutine is called once at the beginning of the NS3D code. It performs all the required initialisation operations before doing a forward or a backward FFT.
• fwd_fft: this subroutine performs a forward three-dimensional Real-to-Complex FFT, i.e. from physical space to spectral space.
• bck_fft: this subroutine performs a backward three-dimensional Complex-to-Real FFT, i.e. from spectral space to physical space.

Interfacing a new FFT library with NS3D only requires writing the corresponding three subroutines above. No additional modification in the body of the NS3D code is necessary.

The FFT interface to use is defined at the compilation step through a preprocessor flag of the Makefile (see §2.3.3). The followi
47. urbative and non linear Cy wbx0 wbx ix iy wby0 wby ix iy tomega2 wbz0 wbz ix iy Co perturbative and linear koae wbxO wbx ix iy wby0 wby ix iy tomega2 wbz0 wbz ix iy b Q9 The two dimensional base flow depends on x and y u w x y I need a yz dependency u5 w y z Is it possible The base flow is hard coded to depend only on xy It is however possible to change it directly in the code The following changes indicated in red must be performed global vars F90 base state double precision dimension 0 dy 1 0 dz 1 save vbx vby vbz wbx wby wbz subfunctions F90 subroutine vect prod 52 Chapter 6 Frequently asked questions FAQ perturbative and non linear else if perturbative and not linear then Che vbx0 vbx iy iz vby0 vby iy iz vbzO vbz iy iz wbxO wbx iy iz wbyO wby iy iz wbz0 wbz iy iz omega2 CCS perturbative and linear else if perturbative and linear then e vbxO vbx iy iz vbyO vby iy iz vbzO vbz iy iz wbxO wbx iy iz wbyO wby iy iz wbz0 wbz iy iz omega2 output F90 subroutine output base double precision dimension 0 dy 1 0 dz 1 intent in vbx vby vbz wbx wby wbz CRR open 77 file base2D init form unformatted write 77 1 gny gnz ly lz we print the base state into the file write 77 vbx O ny 1 0 nz 1 vby O ny 1 0 nz 1 vbz O ny 1 0 n
48. yz. It should require modifying at most:
(a) the orientation of the stratification (see Q7);
(b) the orientation of the frame background rotation Ω (see Q8);
(c) in perturbative mode, the dependency of the base flow (see Q9);
(d) the orientation of the initial velocity state.

Q7 In stratified flow, how can I change the orientation of the stratification?
The stratification is hard-coded to be oriented in the z direction. However, it is relatively easy to modify the source code to change its orientation. For instance, to set the stratification in the y direction, the following changes (indicated in red) must be performed:

subfunctions.F90, subroutine non_linear_term:

    ! save of svy in the field sfrho (trick to save one storage field later)
    if (stratification) then
      sfrho = svy
    end if
    ...
    ! we add the buoyancy term for the velocity equation
    sfy = sfy + srho

Q8 How can I change the orientation of the frame background rotation Ω?
The frame background rotation Ω is hard-coded to be oriented in the z direction. However, it is relatively easy to modify the source code to change its orientation. For instance, to set the rotation in the y direction, the following changes (indicated in red) must be performed:

subfunctions.F90, subroutine vect_prod:

    pbx(ix,iy,iz) = ay0*bz0 - az0*(omega2+by0)
    pby(ix,iy,iz) = az0*bx0 - ax0*bz0
    pbz(ix,iy,iz) = ax0*(omega2+by0) - ay0*bx0
    ...
    pert
49. z 1 write 77 wbx 0 ny 1 0 nz 1 wby 0 ny 1 0 nz 1 wbz 0 ny 1 0 nz 1 gen velocity F90 subroutine read base2D double precision dimension 0 dy 1 0 dz 1 intent out vbx vby vbz wbx wby wbz integer flag nyread nzread double precision lyread lzread open unit 88 file base2D init form unformatted action read read 88 flag nyread nzread lyread lzread we check the file format we check only the dimensions that must not changed between two runs if flag 1 or nyread gny or nzread gnz amp or abs lyread ly gt epsilo or abs lzread 1z gt epsilo then write ERROR the 2D base flow file has bad format Failure of the simulation stop end if Frequently asked questions FAQ 53 read 88 vbx 0 ny 1 0 nz 1 vby 0 ny 1 0 nz 1 vbz 0 ny 1 0 nz 1 read 88 wbx 0 ny 1 0 nz 1 wby 0 ny 1 0 nz 1 wbz 0 ny 1 0 nz 1 gen velocity F90 subroutine gen base2D double precision dimension 0 dx 1 0 dy 1 intent out vbx vby vbz wbx wby wbz m To avoid any dimensions discrepancy it is advised to delete all the calls to the xy subroutines gen velo tanh gen velo stuart etc and create specific yz version of these subroutines when necessary For instance if field name file then call read base2D vbx vby vbz wbx wby wbz else if field name tanh then call gen velo tanh yz vbx vby vbz wbx wby wbz else if field name null then vbx
