Home

User Manual Version 2.0 - P-arch

1. At the bottom it is possible to consider free slip or no slip boundary conditions 5 _ h Aho dae By u v w 0 10 In the present work we suppose that the pressure field is almost hydrostatic i e that the vertical accelerations in the fluid are negligible with respect to the hydrostatic pressure gradient so that equation 3 can be approximated as follows 1 Op poz g 9 11 The assumption 11 is valid only when the vertical accelerations are small i e when the wavelength is much greater than the height of the wave itself so that it is usually referred to these equations as long waves model However when investigating the range of applicability that this assumption allows it is sometimes used as non dimensional reference quantity the ratio between the basin depth and the basin width instead of the amplitude and length of the involved waves This implies an identification between these quantities that should be verified case by case 2 Shallow Water Model The shallow water equations also referred to as de Saint Venant equations are derived integrating the Navier Stokes equations along the vertical under the hypothesis of con stant density In this way only the average velocity is involved and a 2D description is recovered If p is constant equation 11 is immediately integrated as follows p po pglE 2 12 Defining 1 p ualz y t f u x y z t dz 13 ho x y t z e x y z t dz 14 t
2. 32 Figure 4 Red Refinement standard or regular and green of a triangle Moreover when simulating subcritical shallow water flow regions are more suitable to be refined instead of lines see results in section 11 5 So in the present work the computations are started on a mesh as regular as possible then refinements and de refinements are accomplished in such a way to preserve regularity In Figure 4 is shown the subdivision technique known as standard or regular or red refinement 31 The marked elements are first divided in standard way The surrounding triangles have then one two or three edges sub divided The nodes in the midpoint of the edges of the elements not yet refined are called hanging nodes Elements having hanging nodes must be refined in a proper way to ensure consistency of the triangulation Among the possible strategies we have chose the one described in the following in C like form for i 0 1 lt NEL 1 if Err i gt thresh StandRef i for i 0 1 lt NEL 1 if NHang i gt 1 NHang i 1 EType i green StandRef i if StandRef has been called at least one time in the last block then it is re executed for i 0 1 lt NEL 1 if NHang i 1 MakeGreen i Err i is a function evaluating the error on the triangle i with one of the described methods StandRef i refines the triangle i in standard way if the triangle is green it and the sibling are substituted by the father before ref
3. is the elevation over a reference plane h is the total depth of the water v is the horizontal dispersion coefficient formerly va g is the gravity acceleration K is the Strickler coefficient In the Coriolis term Q is the angular velocity of the earth 3 Turbulence Modeling The Shallow Water Equations describe the motion of a turbulent flow in a satisfactory way but in any practical numerical solution the computational grid needed to fully resolve the turbulent motion would be too fine to fit in the memory of any computer Turbulent motion indeed occurs on a great range of length scale The energy is passed from big vortices to smaller vortices in a cascade process and it is eventually dissipated by viscous effect at a very small scale the turbulent scale where all the Fourier modes are dissipated Being impossible to describe the motion of the fluid in such detail we are forced to resort to a modelization of the effects that the turbulent sub grid motion has on the fluid motion which we are willing to compute on our computational mesh A great variety of turbulence models have been proposed through the years Here we are interested in those methods which rely upon the Eddy Viscosity Diffusity concept first introduced by Boussinesq which models turbulent stresses as proportional to the mean velocity field introducing the concept of a turbulent viscosity in addition to the usual physical viscosity The values for this new viscosity
4. SWEET code is Davide Ambrosi currently at Politecnico di Torino To him not only our sincere thank is due but mainly the recognizance that SWEET is and will remain a work of his TOVTO pel Everything flows Heraclitus V sec b c Part I Physical Modeling 1 The 3D Navier Stokes Equations Let us consider the Reynolds averaged incompressible Navier Stokes NS equations for a free surface fluid ue h oa Vav 7 aot fo 1 auido 4 wae v ve 2 n L fi 2 Dipende iv i Van E ne 10 3 pag 4 ee pot uy 2 uot 0 5 where u v w is the velocity vector vp is the horizontal eddy viscosity v is the vertical eddy viscosity p is the density V and V represent the horizontal gradient and the horizontal divergence respectively p is the pressure g is the gravity acceleration T is the temperature x is the horizontal eddy diffusivity and y is the vertical eddy diffusivity The values of vp and v are usually very different due to the fact that the horizontal dimensions of the water body are often much larger than the vertical dimension Here we neglect the internal energy transfer due to viscous effects The fluid domain is vertically bounded by the surfaces satisfying the following equations a E x yt 6 z ho z y 7 The boundary condition on the free surface is that the fluid doesn t cross it i e the fluid moves with velocity equal to that of the surface itself _d oE 0 at de Oy 8
5. depends on the roughness of the walls we have considered hydraulically smooth walls for which 9 n and 7 are the versors normal and tangent to the closed boundary respectively At the open boundaries on the contrary where the discharge is imposed Dirichlet boundary conditions are enforced and elsewhere natural boundary conditions have been imposed for k and e while the boundary conditions for q are not different from the ones used for laminar flow 12 Part II Numerical Algorithms Introduction to the Numerical Discretization The time advancing method adopted for SWEET is of fractional step type The main idea underlying this formulation is the splitting at every time step of the equations of the differential system in order to decouple the physical contributions In particular the wave traveling at speed gh which is the most restrictive with respect to the maximum time step allowed in this kind of problem is treated implicitly with a low computational cost In the discussion of the numerical results it will be shown that this method coupled with a Lagrangian treatment of the convective terms totally avoids the oscillations for the velocity that are known to plague the finite element approximations of the shallow water equations written in primitive form 2 4 The Shallow Water Equations Let s rewrite the SWE of eqs 34 once again dq _ alal as dI V q 0 48 As before we have that q z y t qz qy
6. is the unit width discharge that is q hua qy hva E is the elevation over a reference plane A is the total depth of the water v is the horizontal dispersion coefficient g is the gravity acceleration K is the Strickler coefficient and 2 is the angular velocity of the earth A schematic representation of some of these quantities may be seen in Figure 1 According to the theory of characteristics if v 0 and the flow is subcritical two boundary conditions are to be prescribed at the inflow and one at the outflow However when considering the case v 0 the presence of the diffusion term in system 47 48 requires the imposition of a proper boundary condition for the unit width discharge on the whole boundary and moreover as v is usually very small in the applications it is natural to require that these boundary conditions recall the inviscid case as the viscosity coefficient tends to zero Therefore the boundary conditions applied here are as follows as many Dirichlet conditions as required by the characteristic theory plus Neumann boundary conditions for each component of the unit width discharge where its value is not yet imposed Note that the weak Neumann condition on q arises naturally in the integration by parts of the diffusive term when considering the weak form of 47 13 Figure 1 Elevation and depth 5 The Numerical Scheme for the SWE The main idea behind the adopted time advancing scheme is to split the equa
7. pp 257 268 1985 D E Abbott and J Kline Experimental investigation of subsonic turbulent flow over single and double backward facing step Journal of Basic Engineering Transac tions ASME 84 1962 84
8. Circular Reservoir 11 2 Hydraulic Jump 11 3 Parallel Computation on a Complex Geometry 0 11 4 Abrupt enlargement of a channel the k e model 11 5 Automatic Mesh Adaption 11 6 Automatic Mesh Adaption III User Manual 12 Structure of the Code 12 1 Flow Chart 0 13 List of the Vectors 14 Data structures 15 Sequential Input and Output 15 1 Input 2 0020 15 2 Output 16 The Parallel Setup 16 1 Partitioning the Mesh 17 The Parallel Run 17 1 The Parallel Output 18 Practical Remarks 18 1 Hints 0020 18 2 When Everything Else Fails A PVM Quick Guide A l Starting PVMe A 2 PVM Messages steady state case 2 ee a unsteady state case 35 39 39 38 42 43 53 57 57 60 60 63 65 65 69 72 72 75 75 76 76 78 Abstract SWEET Shallow Water Equations Evolving in Time is a code for the solution of the 2D de Saint Venant equations written in their conservative form The code adopts a Finite Differences scheme to advance in time with a fractional step pro cedure The space discretization is realized through Finite Elements with a linear representation of the water elevation and a quadratic representation of the unit width discharge In this document the physical model and the numerical schemes used for solving the resulting equations are extensively described The accuracy of the scheme is verified in
9. Lagrangian integration at the boundary between sub domains Lu f 73 where u L L H q t 3 At indicates a quasi symmetric linear operator and PRA After being approximated by finite elements relation 73 can be written in the algebraic form Ax b 74 The matrix A is symmetric positive definite sparse and typically very large An effective algorithm to solve the linear problem 74 is the conjugate gradient CG when coupled with a suitable preconditioner We use a parallel implementation of a CG to solve equation 74 on a distributed memory machine An effective parallelization of this iterative method can be be obtained as follows Let us suppose that the computational domain Q is partitioned into N sub domains with an overlap of one element such that Q U Q We assign at every processor the job of performing the computations on the elements of the matrix belonging to a sub domain 2 The following pseudo code focuses the communications needed in the parallel version of the algorithm see for instance 15 ro b Axo Po ro For 1 until convergence 22 a rj r Ap p inter processor communication Tj41 Tj Q Pj ria Tj aj Ap By ry44 r 41 1 r inter processor communication Pitt ja Bip end do A look at the algorithm listed above shows that the iterative method can be parallelized by exchanging information only when global scalar quantities are co
10. Publications 1996 Beguelin A Dongarra J Geist G A Manchek R and Sunderam V A user s guide to PVM Parallel virtual machine Technical report TM 11826 Oak Ridge National Laboratories 1991 Ambrosi D Corti S Pennati V and Saleri F A 2D numerical simulation of the Po river delta flow in Modelling of Flood Propagation Over Initially Dry Areas Molinaro P and Natale L eds 1994 Falconer R A Numerical modeling of tidal circulation in harbors Journal of Wa terway Port Coastal and Ocean Division ASCE 106 n WW1 pp 31 48 1980 Toh K K H The MTV plot data format Visualization Working Group Intel Cor poration 1993 Rodi W Turbulence models and their application in hydraulics a state of art review IAHR Monography Series 1993 W G Habashi M Fortin J Dompierre M G Vallet and Y Bourgault Certifiable CFD through mesh optimization submitted to AIAA Journal pages 1 22 1996 R A Walters and E J Barragy Comparison of Hand P finite element approximations of the shallow water model Int J Numer Meth Fluids 24 61 79 1997 R E Bank and A Weiser Some a posteriori error estimators for elliptic partial differential equations Mathematics of Computation 44 170 283 301 April 1985 R E Bank PLTMG A software package for solving elliptic partial differential equa tions Uesrs guide 7 0 SIAM 1994 R Lohner Finite element methods in CFD grid generation a
11. directory The restart tem restart files are fully compatible with the ones produced by the scalar one The files to monitor the spy nodes have the same format than the sequential ones but with different names while the name in the sequential code are 2001 2020 after a parallel run the files will have names like XXYY where XX indicates which processor has written the file from 1 to 99 and YY is the number of the spy node from 1 to 20 When a spy node is shared between two processors then two identical files are produced with different names The standard output of the sequential run is now directed to a file called proc0 output This file is always written even when the output level iwarn flag in datin file is set to zero When iwarn is greater than one each processor writes an output file named procX output where X is the processor number 18 Practical Remarks We add here some practical remarks about the questions that can arise when using SWEET to simulate real life flows 18 1 Hints Which time step should I choose From strictly physical arguments we can say that the time step dt should not necessarily be larger than 107 107 times the characteristic time of the phenomena we are interested in From a numerical point of view an upper bound can be imposed because of stability reasons In particular if mass matrix lumping is adopted at Step 2 it must be verified that Zaty lt 1 Moreover for the lagrangian
12. do until convergence precondition solving local linear systems on each of the N sub domains by CG ILUT perform one CG iteration on the global domain in parallel 28 end do solve Step 3 for the unknown q in each of the N sub domains exchange boundary conditions among the N sub domains end do 10 Mesh Adaption The use of unstructured grids for the numerical approximation of partial differential equa tions of applied mathematics has two great attractives The one most commonly claimed is the geometrical flexibility that is the capability to handle computational domains with complicated boundaries of problems that would be almost impossible to solve by a struc tured approach However there is a second aspect of unstructured grids that has even more relevance the possibility to refine the computational mesh where needed in order to minimize the computational error in some proper sense Suitable indicators of the accuracy of the solution allow to refine the mesh where the numerical error is large and to coarsen it where the error is small in order to optimize the quality of the computed solution for a given computational effort 22 Mesh adaption techniques have been used since many years ago in several fields of computational fluid dynamics but adaptivity has not yet much explored in the framework of free surface hydrostatic flow At out knowledge only a very recent paper 23 compares and discusses the use of high order p
13. domains In Figure 13 the total speed up of the shallow water code is shown As it can be noticed both the Lagrangian integration and the Step 2 integration which are explicit show a speed up which is very near to the ideal one The speed up of the implicit step while not being ideal is however satisfying Since this step has the major computational cost its behavior reflects largely on the total speed up continuous line The odd behavior of the curve the first derivative does not decrease monotonically as it would be expected is due to the particular setup of sub domains we use the shape of the sub domains for the preconditioning step changes every time the number of processors changes even if their number and size do not Thus since the speed and the preconditioning properties of the Schwarz algorithm can be influenced also by the shape of the sub domains we can need different numbers of iterations for the global CG to reach the given convergence threshold In order to further test the behavior of the Schwarz algorithm we have run the same 40 300 0 5 r r r r Bis No coarse grid correction Coarse grid correction 200 0 Time sec 100 0 0 0 1 1 1 1 1 1 2 40 80 120 160 200 240 Number of subdomains Figure 12 Schwarz preconditioning CPU time required to achieve convergence of the conjugate gradient solver versus number of sub domains N using two processors 8 0 Lagrangian I
14. errors both at the corner of the enlargement and at the left inflow corner The error indicator show a large error at the left inflow corner The global behavior of ez is even better showing in most of the domain an error of 0 5 versus 2 5 given by the other criteria The error in the magnitude of the velocity suggest more or less the same remarks The maximum error induced by the criteria ej is slightly larger than the others but it is strongly confined around the island The other indicators lead to a relevant error also near to the right inflow corner of the secondary channel The numerical tests show that mesh adaption is a very reliable tool for numerical simula tion of shallow water steady flows Any error indicator yield to numerical results that are strongly improved with respect to a uniform mesh with a minor increase in the compu tational effort The mesh refinement coarsening technique is almost decoupled from the numerical integration aspects Last but not least the initial mesh generation does not require any a priori knowledge of the flowfield as any quite regular background grid is automatically refined to an almost optimal one Although the numerical results have been obtained for a test case that has been thought to include more scenarios general conclusions can be hardly drawn However it is possible to say that all error indicators do their own job detecting regions where truncation error is relevant The local mass
15. family of overlapping Schwarz algorithms for non symmetric and in definite elliptic problems in Domain based parallelism and problem decomposition methods in computational science and engineering SIAM 1994 Hendrickson B and Leland R The Chaco user s guide Version 1 0 Sandia Tech nical Report SAND93 2339 1993 Sharp M and Farhat C TOP DOMDEC User s Manual PGSoft and The Re gents of the University of Colorado 1995 Karypis G and Kumar V Metis Unstructured graph partitioning and sparse matrix ordering system University of Minnesota Technical Report 1995 Chan T F and Mathew T P Domain decomposition algorithms Acta Numerica 1994 pag 61 143 Koobus B Lallemand M H and Dervieux A Unstructured Volume agglomeration MG solution of the Poisson equation Rapport de recherche RR 1496 INRIA FR 1993 82 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Lallemand M H Steve H and Dervieux A Unstructured multigridding by volume agglomeration current status Computer and Fluids 21 3 397 433 1992 Saad Y Iterative Methods for Sparse Linear Systems PWS Publishing Company 1996 Hinkelmann R and Zielke W A parallel three dimensional Lagrangian Eulerian model for the shallow water and transport equations Computational methods in water resources XI A A Aldama et al eds Computational Mechanics
16. integration of the convective terms it must be ensured that cflmax lt rk that is the pathline must not require more than rk steps to be reconstructed For steady state computations the time step should be chosen only according to stability reasons Pathline out of boundary what does it mean This message given on the standard error only if warn is greater than one indicates that the lagrangian integration is failed on a given node Thus it can be considered as an inaccuracy indicator The tolerance for this kind of inaccuracy is left to the user in the sense that it is the user himself that has to judge on how many nodes it is possible to ignore the advection contribution It is possible to augment the accuracy by increasing the rk value that is the number of steps in which the lagrangian integration is performed and or by decreasing the dt 76 value Which eddy viscosity and Strickler coefficient should I choose In a 2D approach the stress on the bottom can be only modeled and not actually simu lated The Strickler coefficient depends on the nature of the bottom and standard tables yield values experimentally deduced typically ranging between 30 and 100 Usually the eddy viscosity coefficient has a secondary importance in the equations not being neces sary to use its stabilizing numerical effect in the SWEET approach typical values for it are 0 1 2 for simulations in river few hundreds for simulations in large areas co
17. is a proprietary version of PVM made by IBM for its hardware systems It is nearly fully compatible with PVM and on the parallel systems SPx it has been designed so to use the High Performance Switch to enhance its performance Differently from PVM which runs on the IBM systems as well PVMe runs on the SPx nodes in an exclusive way only a single PVMe daemon can run on a node Thus only a user at a time can access the specified node s For this reason the PVMe daemon must be stopped when one is not going to use it for some time since otherwise the nodes on which the daemon runs will remain unaccessible to the other users A 1 Starting PVMe The first thing to do is to select a pool of nodes and create a file named as example hostfile containing the names of the chosen nodes one per row without any blank row Example isplnl sp2 cris enel it ispln4 sp2 cris enel it ispln5 sp2 cris enel it At this point run pvme hostfile You should have an output like the following PVMD Assuming default as control workstation PVMD Trying to get 3 nodes PVMD Using ispini with dx usr lpp pvme bin pvmd3e ep u sweet pvm3 bin RS6K PVMD Using ispin4 with dx usr lpp pvme bin pvmd3e ep u sweet pvm3 bin RS6K PVMD Using ispin5 with dx usr lpp pvme bin pvmd3e ep u sweet pvm3 bin RS6K PVMD Ready for lt 3 gt hosts 79 The name usr lpp pvme bin pvmd3e indicates the full path of the PVMe daemon while u sweet pvm3 bi
18. is use case without correction present the same curve since the inversion of the coarse system does id correction the coarse gr bj Here number of sub domains Ns not take a significant amount of time around 2 for the case with 240 sub domains 39 original Schwarz algorithm degrades when the number of sub domains increases However the algebraic coarse grid correction halts this degeneration completely when more than ten sub domains are used By joining the results shown in Figures 10 and 11 we obtain the curve for the total CPU time needed to solve the linear system versus the number of sub domains for a fixed number of processors Figure 12 The best performance for the present test case has been obtained with 200 sub domains This result demonstrates the usefulness of choosing the number of sub domains independently from the number of processors actually used in the calculation 300 200 100 Number of global CG iterations Sena No coarse grid correction Coarse grid correction 0 1 1 1 1 1 2 40 80 120 160 200 240 Number of subdomains Figure 11 Schwarz preconditioning number of global CG iterations required to reach conver gence versus the number of sub domains N We now examine the parallel performance of the code by varying the number of processors and keeping fixed the number of sub domains for the Schwarz preconditioner in the present case we have used the partition of 200 sub
19. maximum deviation for the module of the discharge of the solutions fine and fine are respectively 0 12 and 0 06 This seems indicate with good certainty that with the progressive uniform re finement of the mesh the solution is converging and that fine8 represents a quite accurate solution For this reason from now on with the sentence estimated true error we will refer to the difference between a generic and the fine solution evaluated on the nodes of the background mesh To verify the performance of the three error estimators proposed we have calculated the spatial correlation between these and the corresponding estimated true error for the solution fine0 The correlation is practically zero for the elevation due to the fact that the errors of the solution for this variable are negligible see Fig 28 The temporal evolution of the correlation for the module of the discharge appears to be more interesting this value generally being equal to about 0 8 but periodically falling to very low values In the same figure we show the temporal evolution of the imposed elevation periods of low correlation are in correspondence with periods of stagnation of the tide d dt 0 Near to these instants velocity of the water already very small is practically zero For this reason the fact of having obtained a low correlation should not be surprising nor of concern when the correlation is low all the estimated errors has a minimum see als
20. of 1074 can be confidently used What difference in the results is expected to be found when using mass matrix lump ing The numerical results computed using mass matrix lumping are less accurate than when using consistent mass matrix This is because mass matrix lumping corresponds to using a lower order quadrature rule in evaluating integrals Moreover mass lumping involves ZAT lt 1 must be observed It is suggested to perform preliminary calculations by lumped approach in explicit discretization of the diffusive term so that the limitations such a way to tune appropriately the parameters and then to use the consistent mass matrix for final computations Which kind of parallel preconditioner should I use for the solution of the linear system Since there is not an universal choice the code asks for specification about the parallel TT preconditioner Two possibilities are implemented in the parallel SWEET code a diago nal preconditioner and an additive Schwarz preconditioner The Schwarz preconditioner is always the most effective algorithm in reducing the number of global CG iterations However its computational cost is high and it can be slower than the diagonal pre conditioner This latter situation occurs when the matrix of the linear system is not enough ill conditioned to represents a difficult task for the global CG Low number of unknowns smooth bathymetry small time steps are all indices of relatively easy p
21. of the code can be strongly influenced by the number of subdomains used in the Schwarz preconditioning and by the quality of the partition Sometimes a bad partition of the original mesh can lead even to non convergence of the global CG in dependence of the chosen preconditioner If a problem is too big to fit on the selected pool of processors other processors must be added to the pool when possible The code is capable to run about 25000 nodes per processor in double precision considering 128 MB of RAM per node i e the standard IBM SP2 configuration Some error messages are redirected to the standard error some other are signaled in the output files check both the devices to monitor the run When the run is completed always wait for the PVMe new epoch signal to access the output files 17 1 The Parallel Output Warning this section only contains the differences between the sequential and parallel output Please read first the Section 15 2 to understand the output informations of the SWEET code There are some differences between the output files produced by the sequential and the parallel version of SWEET This is primarily due to the fact that every processor produces distinct output files However when some global informations have to be collected into a 75 file like the restart tem files the parallel program will produce unique files just like the sequential program All the output files are written in the working
22. ordered list containing all the elements that are adjacent to a node or to a given element In this way the search for the element in which the pathline foot falls is restricted to clusters of elements To avoid that the foot of the pathline reconstructed falls outside of the domain the rigid boundary of the domain is always assumed to be a streamline It is worthwhile to remark that the quadratic representation of velocities that has been adopted for compatibility reasons fully satisfies the accuracy requirements recommended for the reconstruction of the pathline 6 5 2 Imposition of Boundary Conditions Particular conditions on the unknowns must be posed on the boundary of the integration domain To further investigate which are the different possible conditions we distin guish between open boundaries across which we can have a net flux of water and closed boundaries i e solid walls 5 2 1 Open Boundaries In these regions we can impose conditions on the discharge unknown or on the elevation unknown We can have Dirichlet b c on the discharge for example at an inflow region to impose a particular flux of water on that part of the domain possibly changing in time In this case we will not have conditions on the elevation Alternatively we can impose Dirichlet b c on the elevation for example to simulate sea tides In this case we usually impose natural b c on the discharge that is the requirement that the discharge must be nor
23. permitted during the adaption procedure as a multiple ot the elements of the background mesh 1 lt refmax procs ignored by the sequential code Number of processors for the parallel run subd ignored by the sequential code Number of subdomains for the Schwarz preconditioner if precond 1 coarse ignored by the sequential code If 1 coarse correction for the Schwarz preconditioner if precond 1 ilut ignored by the sequential code If 1 ILUT acceleration for the Schwarz preconditioner if precond 1 mesh It is an ASCII file where the informations about the mesh are contained The data structure is the standard one position of the nodes and list of the connections among the nodes The informations contained in the mesh file are ne number of elements nv2 total number of nodes nvi number of P1 nodes x k j k th coordinate of the node j inod j type of the node lel k i list of the nodes belonging to element i These informations are given in the mesh file in the following order ne nv2 nvi x 1 1 x 2 1 inod 1 x 1 2 x 2 2 inod 2 x 1 nv2 x 2 nv2 inod nv2 lel 1 1 lel 2 1 lel 6 1 lel 1 2 lel 2 2 lel 6 2 67 lel 1 ne lel 2 ne lel 6 ne In the list of the nodes the P1 are grouped in the upper part of the list in the connection list the first three nodes are the P1 nodes listed in counterclockwise sense eo po o Figure 33 Order of the numbering of the nodes in a triangle bati Thi
24. there is only one sub domain where a given node is considered as interior This sub domain will be termed as the parent sub domain for that node The domain decomposition approach can be easily applied to the explicit parts of algo rithms since the operations are mainly local The nodes at the boundary between sub domains must be updated at the end of each step by receiving the values from their parent sub domains Thus lists of sending and receiving nodes are defined for every sub domain together with a communication pattern able to guarantee no blocking communications During the Lagrangian step Step 1 the situation is complicated by the fact that for a border node it is possible that the pathline falls outside the sub domain Thanks to the overlap it is possible to perform the Lagrangian integration in the neighboring sub domain However the list of the border nodes for which the pathline falls outside the sub domain can change at each temporal iteration depending on the velocity direction A dynamic mechanism has then been devised for the definition of the sending and receiving nodes The integration of the implicit equation for the elevation at Step 3 poses different problems from the point of view of the parallel implementation of the algorithm The CG is parallelized in a genuine domain decomposition way the matrix the right hand side and the unknown vector are distributed among the processors and the matrix times vector and vec
25. these last case is distinguished storing indices of the new elements with negative sign etype The major drawback of this procedure is that when re building the mesh different indices in the old and new mesh can be given to the same triangle To avoid this problem we conceived a procedure of identification of the elements that on the basis of an integer permits us to locate the position and the relatives with the other elements In fact since the background grid is never modified all the NELI elements of the initial grid can be indexed in a unique way At the end of the first step of refinement since a single triangle can generate only four two or zero new triangles the mesh can have at maximum 4 NELI elements It is possible then to identify in a unique way every triangle created first level triangles dividing a triangle of the background grid zero level triangles assigning to it a number between NELI 1 and 5 NELI Similar considerations 64 can be applied to triangles ot the second level obtained from a triangle of the first level These triangles will be identified using an integer between 5 NELI 1 e 21 NELI and so on for the other levels of refinement Every time a new P1 node is created all the physical fields are interpolated in the new position and the type of the node is determined When the refinement procedure ends with the eventual creation of green elements the P2 nodes are created and they are queued to P1 nodes Every
26. water elevation and velocity computed on the finest mesh and the values computed on the adaptively refined meshes The figs 26 27 show plots of the error defined as n ct eee 89 where 0 is the reference solution computed on the finest grid The absolute value of the velocity error is considered not to overestimate the contribution due to the stagnation 47 Figure 22 Velocity field computed on the background mesh Figure 23 Refined mesh obtained using the error indicator er 48 Figure 24 Refined mesh obtained using the error indicator ez Figure 25 Refined mesh obtained using the error indicator e3 49 points Generally speaking in all the simulations the bigger error is located around the island that is the region mostly refined in all the the adapted meshes Moreover the results plotted in figs 26 27 show that both globally and locally the error indicator e3 performs better in computing the water elevation and gives better global results in computing the velocity field error indicator v m s v direction rad Table 1 Average error of the computed solution The maximum relative error in the water elevation is 5 and is located behind the island this value is lower than 8 and 10 obtained by the other criteria In particular the error indicator e show large
27. we impose a Dirichlet condition on 7 assigning it a given value usually zero At the outlet we impose the value resulting from the integration of the advective term 8 Parallelization Strategy In the numerical scheme described above two computational kernels can be recognized For equations 50 52 since all terms but q t are evaluated at previous time levels the finite element formulation of equation 50 gives Mart g 70 where g is a known vector and M is the finite element mass matrix i e Mi dibd 71 being the set of nodal shape functions which provide a basis of piecewise linear polynomials By adopting a mass lumping technique 3 the solution of the Step 2 is explicit and therefore the main computational effort reduces to the following tasks 1 Lagrangian integration of the convective term as defined by equations 55 56 2 Solution of the elliptic problem defined by equation 53 8 1 Mesh Partitioning The strategy devised for the parallelization of the above listed computational kernels is based on domain decomposition This technique exploits the topology of the problem partitioning the computational domain into subregions The fact that we are dealing with unstructured meshes poses some additional problems for the implementation of the parallel algorithms both for defining properly the decomposition into sub domains and for the definition of an efficient communication scheme However
28. SWEET User Manual Version 2 0 S Corti M Marrocu L Paglieri L Trotta ENEL Polo Idraulico e Strutturale Milano CRS4 Cagliari October 1997 Contents I Physical Modeling 1 The 3D Navier Stokes Equations 2 Shallow Water Model 2 1 Continuity Equation oaoa 2 2 Momentum Equation 0 ee 3 Turbulence Modeling 3 1 The k e Turbulence Model for Shallow Water Equations II Numerical Algorithms 4 The Shallow Water Equations 5 The Numerical Scheme for the SWE 5 1 Lagrangian Scheme for the Convective Terms 0 0 5 2 Imposition of Boundary Conditions 020004 5 2 1 Open Boundaries 0 0 0 0202000000 ee ee 5 2 2 Close Boundaries 1 2 0 0 ee ee 6 Numerical Scheme for the k e Model 7 Transport of a Passive Tracer 8 Parallelization Strategy 8 1 Mesh Partitioning 2 2 8 2 Lagrangian Integration of the Convective Termo 8 3 Parallel Solution of the Linear System 8 4 Additive Schwarz Preconditioning for the Elliptic Problem 8 5 Coarse Grid Correction oa ooa ee 8 6 Parallelization of the k model 02000 4 8 7 Parallelization of the Transport of the Scalar Tracer 0 9 Parallelization Implementation Details 10 Mesh Adaption 10 1 Error estimate and error indicators L22222 10 2 The mesh refinement technique 220 10 2 1 Pre refinement and mesh enhancement 11 Test Cases 11 1 Jet in a
29. agnitude of the velocity and the local mass conservation in a specific sense Every error indicator has 29 a mathematical basis or is suggested by numerical or physical reasons The performance of these error estimators is discussed together with the numerical results in the last section of the paper 10 1 Error estimate and error indicators The mesh adaption technique requires some a posteriori estimate of the error of the numerical solution based on the computed solution itself it is necessary to state locally how much the numerical solution differs in a proper sense from the exact solution of the differential problem In this section we propose a few ways to determine where the computational mesh should be refined or coarsened 1 For linear elliptic problems it is possible to estimate rigorously the numerical error in terms of the second derivative of the exact solution Let u be the exact solution of the elliptic problem Vu v that is in weak form Vu Vv 6 0 80 for all v belonging to a suitable space and where indicates the usual internal product in L Then given a triangulation with maximum side length A it can be shown 27 that the distance between the exact and the computed solution linear u in H is bounded as follows V u ur V u un V u un lt h max 81 u J i j As the computational kernel of the numerical scheme adopted in section 4 is an elliptic equation for
30. artitioned in several regions All the explicit integrations act on the first level of sub domains The Schwarz algorithm acts locally on the second level of sub domains The global CG uses the same data sets of the Schwarz algorithm In dark grey the overlap regions for the first level sub domains in light grey the overlap regions for the second level sub domains 27 Using a large number of sub domains for the Schwarz algorithm may produce a con siderable reduction of computational time In fact the solution of many small linear systems can be faster than the solution of few linear systems of larger dimension The distribution of the matrix and vectors for the global CG is made at the second level of sub domains in this way the same data structures are used for carrying out both the global CG and the Schwarz preconditioning thus optimizing the memory requirements This choice greatly complicates the managing of the communications for the global CG To gain the maximum efficiency we have set up a scheme that uses implicit communications via common blocks when the two communicating sub domains reside on the same processor and explicit message passing instructions when the communication involves two different processors Moreover to minimize the latency time all the messages from sub domains residing on one processor that are to be delivered to sub domains all residing on another processor are first collected in a buffer and then sent with a un
31. astal regions The eddy viscosity is usually tuned by comparison of the numerical results with experimental measures For numerical reasons it is necessary that the eddy viscosity is not zero when mass matrix lumping is adopted Remember that if the amp e model is used turbo 1 then the value for the eddy viscosity is changed through the calculation according to the model How do the results of the simulation depend on initial conditions The initial conditions are often unknown in real life simulations When one is interested in the steady state behavior any initial condition can be used unless it is not so extreme to yield to instabilities in the computation When considering unsteady flow for instance because the boundary conditions change in time usually the only reasonably initial con dition is to start from the steady state solution related to the boundary conditions at time zero Which is the minimum tolerance of the residual to get acceptable results The residual compared with tol is the maximum value of the absolute value of the residual As an error of 1 can be usually accepted in this kind of simulations that involve uncertainty for a lot of physical parameters and assumptions in the chosen model it is suggested to imposed a value of tol which is one thousandth of the typical value of the unknown For instance if the elevation of the water is expected to change in the computational domain for few centimeters a tolerance
32. at the condition number of the matrix MTA is bounded as cond M7 A lt CL 1 B7 76 where C is a value independent from H and In 2D problems H is proportional to the number of sub domains and therefore this estimate reveals a deterioration ot the quality of the algorithm with the increase of the number of sub domains This inconvenience can be removed by introducing a coarse grid operator Let s Ay be the matrix arising from the discretization of the elliptic problem on a coarse grid whose element size is of the same order of magnitude of the sub domains Then we can replace M by M defined as 24 My RIA Ra M7 77 Here R is the prolongation map from coarse to fine grid given for example by a piecewise linear interpolant from coarse grid nodes It can be shown that cond M7 A lt C 1 87 78 where again C is independent from H and 6 Thus the preconditioning property of the operator Me does not depend on the number of sub domains but only on the amount of the overlap between them The coarsening of an unstructured grid can be a non trivial task Therefore we have investigated a different procedure for the construction of the coarse grid operator Ay by resorting to an agglomeration technique similar to that introduced in 13 14 in the context of multigrid procedures We consider Ry so that Ay Ry ARL 79 where t fjEeQuU Q Rr 0 otherwise The construction of Ay is thus a completel
33. been executed For the other programs the output and error go to the shell from which the PVMe console has been started Note that the two shell can be the actually the same this is the normal situation so all the output will be mixed together Usually PVMe puts an indication of the node that gives a particular output by writing the name of the node before the message 81 References 1 10 11 12 13 Agoshkov V I Ambrosi D Pennati V Quarteroni A and Saleri F Mathemat ical and Numerical Modelling of Shallow Water Flow Computational Mechanics 11 1993 Ambrosi D Corti S Pennati V and Saleri F Numerical simulation of the unsteady flow at the Po river delta J of Hydraulic Engineering ASCE to appear Quarteroni A and Valli A Numerical approximation of partial differential equa tions Springer Verlag 1994 Walters A Numerically induced oscillations in the finite element approrimations to the shallow water equations Int J Numer Meth Fluids 3 1993 Zienkiewicz O C and Taylor R L The finite element method Mc Graw Hill 1989 Benque J P Cunge J A Feuillet J Hauguel A and Holly F M New method for tidal current computation J of Waterway Port Coastal and Ocean Engineering 108 1982 Pironneau O On the Transport Diffusion Algorithm and its Application to the Navier Stokes Equations Numerische Matematik 38 1982 Cai X C A
34. can be obtained through algebraic models or through the solution of one or two equations which describe the temporal and spatial evolution of some quantities related to the turbulent viscosity We have chosen the most classical between the two equations models the so called k e model to be implemented in the SWEET code This model determine the turbulent viscosity through the evaluation of two quantities the turbulent kinetic energy k and its rate of dissipation e This is accomplished through the solution of two coupled advection diffusion equations Appropriate conditions for k and e on closed and open boundary have to be tuned in a suitable way as explained in the following paragraph 3 1 Thek e Turbulence Model for Shallow Water Equations We will not derived the formulation of the k e model for the SWE an excellent intro duction being easily found in 21 The Reynolds averaged shallow water equations in conservative differential form read DI Lv gg h Y v Va Va AV 96 2h Ot gal 38 Imp 20 xq og 1 1V q 0 39 a t Va 39 where q x y dx qy is the unit width discharge is the elevation over a reference plane h is the total depth of the water v is the kinematic viscosity about 1075 m s7t for water v is the turbulent viscosity computed as k2 Vi Cy E g is the gravity acceleration Q is the angular velocity of the earth K is the Strickler coefficient 10 Equatio
35. ce in few seconds all the input files needed by the program without any other external intervention all rhs dat graph partition dat_mtrx_m01_p001 tmp pa_mesh rhs dat mesh matrix graph mesh before partition graph 73 mesh __T datin ba BEFORE MATRIX raph dat METIS rc run_vars artition Fi a CA AFTER SETUP dat_ tmp pa_mesh datin bati elev time spy nodes SWEET Figure 34 Relations between the input files and the pre processing programs for the SWEET code Programs are represented in boxes plain names refers to input output files 74 chaco dat_mtrx_m01_p001 tmp partition rc run_vars setup pamesh partition after 17 The Parallel Run The SWEET code use the PVM message passing library so the PVM procedure for running a PVM code must be followed See Appendix A for a PVM resume It is not possible to run the parallel code on a single processor The minimum partition is composed by 2 first level subdomains and 2 second level subdomains running on two processors It must be kept in mind that the number of first level subdomains must be equal to the number of processors and that the number of second level subdomains must be equal or greater than the number of processors Before running a long simulation the behaviour of the selected combination of para meters both physical as well numerical must be checked The speed
36. conservation criteria performs better in the considered test case yielding always to lower error in a global sense The reason of this behavior is probably 50 700 600 500 400 300 200 100 0 700 600 500 400 300 200 100 0 700 600 500 400 300 200 100 0 ade Nel ao Figure 26 Relative error in the water elevation value obtained by using the refined grids of figures 23 25 1000 1000 1000 51 2000 2000 2000 10 9 8 71 6 5 4 3 2 1 0 10 9 8 71 6 5 4 3 2 1 0 10 9 8 71 6 5 4 3 2 1 0 700 600 0 63 m s 0 56 m s 500 0 49 m s 0 42 m s 400 0 35 m s 300 0 28 m s 0 21 m s 200 0 14 m s 100 0 07 m s 0 00 m s k 1000 2000 me 600 0 63 m s 0 56 m s 500 0 49 m s 0 42 m s 400 0 35 m s 300 0 28 m s 0 21 m s 200 0 14 m s 100 0 07 m s 0 00 m s a 1000 2000 io 600 0 63 m s 0 56 m s 500 0 49 m s 0 42 m s 400 0 35 m s 300 0 28 m s 0 21 m s 200 0 14 m s 100 0 07 m s 0 00 m s n 1000 2000 Figure 27 Absolute error in the magnitude of the water velocity obtained by using the refined grids of figures 23 25 52 that only mass defect check involves both the unknowns velocity and elevation of the problem 11 6 Automatic Mesh Adaption unsteady state case To further investigate the performance of the error indicators and of the adaption proce dure in the case of unsteady flow we chose a case of circulat
37. d P2 is the set of piecewise quadratic functions on triangles The choice of these interpolation spaces first suggested in 4 eliminates the spurious oscillations that arise in the elevation field when a P1 P1 representation is used To knowledge no theoretical explanation of incompatibility between spaces of representation of the unknowns has yet been stated theoretically for the SWE The main advantage of this fractional step procedure is that the wave traveling at speed Vgh is decoupled in the equations and treated implicitly Therefore the CFL condition due to the celerity is cheaply circumvented Moreover as the Lagrangian integration is unconditionally stable and all the terms appearing in eq 50 are discretized implicitly the resulting scheme is unconditionally stable A drawback of a fractional step scheme as the one adopted here is that this scheme is only first order accurate in time However this is not an actual disadvantage as the model deals with tidal phenomena that vary slowly in time From a mathematical point of view in this fractional step framework one requires a priori that the boundary conditions to be satisfied by the collection of fractional steps coincide with the boundary conditions to be satisfied by the original differential system as described in Section 4 Unfortunately at Step 3 the solution of the elliptic equation 53 requires the imposition of proper boundary conditions for the elevation on the whole boundar
38. daptivity and paral lelization In Unstructured grids methods for advection dominated flows AGARD R 787 1992 C Johnson Numerical solution of partial differential equations by the finite element method Cambridge University Press 1987 83 28 29 30 31 32 33 R Sacco and F Saleri Mixed Finite Volume Methods for Semiconductor Device Simulation Numerical Methods for Partial Differential Equations 3 13 215 236 1997 T Utnes Finite element modelling of quasi three dimensional nearly horizontal flow Int J Numer Meth Fluids 12 559 576 1991 V Casulli and Cheng R T Semi implicit finite difference methods for Three dimen sional shallow water flow Int J Numer Meth Fluids 15 629 648 1992 V Rudiger A Review of A Posteriori Error Estimation and Adaptive Mesh Re finemnt Techniques Wiley Teubner 1996 L Formaggia and F Rapetti MeSh2D Unstructured Mesh Generator in 2D version 1 0 Algorithm overview and description CRS4 Cagliari February 1996 A Balzano R Pastres and C Rossi In Coupling hydrodynamic and biochemical mathematical models for lagoon ecosystem XI Int Conf on Comp Meth in Water Resources Cancun Mexico 1996 Various Handbook of Grid Generation CRC Press 1997 R S Chapman and C Y Kuo Application of the two dimensional k e turbulence model to a two dimensional steady free surface flow problem with separation Int J Numer Meth Fluids 5
39. different test cases The sequential algorithm has been ported in the parallel computing framework by using the domain decomposition approach The Schwarz algorithm has been added to the scheme for preconditioning the iterative solution of the elliptic equation modeling the dynamics of the elevation of the water level The performance of the parallel code are evaluated on a large size computational test case The structure of the code is explained by a description of the role of each sub routine and by a flowchart of the program The input and output files are described in detail as they constitute the user interface of the code Both input and output files have a simple structure and any effort has been made to simplify the procedure of the input setup for the parallel code and to manage the output results The PVM message passing library has been used to perform the communications in the parallel version of SWEET A short introduction to PVM is added at the end of the present report The SWEET package is the results of a joint work between CRS4 and Enel Polo Idraulico e Strutturale The authors of this document kindly acknowledge the valuable contributions of Vincenzo Pennati from Enel Polo Idraulico e Strutturale and of Luca Formaggia Alfio Quarteroni and Alan Scheinine from CRS4 This manual is an extension and revision of the SWEET User Manual Version 1 0 1996 The author of the former document as well as of the largest part of the
40. ds to the mesh plotted in fig 23 It can be seen that the shear layer at the left corner of the inflow of the smaller channel is not devised by the error indicator as a zone to be refined The channel enlargement the right corner between channels and the bottom jump are slightly refined but most of the new triangles are posed around the island mainly behind When using the error indicator e2 the secondary channel inflow the bathymetry slope 46 and the region around the stagnation point are not detected as zones to be refined The refinement is deserved for regions around detachment points that is zones with larger velocity shear The error indicator 3 based on local mass conservation refines all the regions that the other error indicators detect one by one The global mass error performed by the coarse initial mesh defined as the difference between the inflowing and the outflowing water was about 3 By using the grid refined as driven by the e 4 indicator the mass defect is reduced to 1 Figure 21 Background mesh To get a quantitative evaluation of the quality of the adaptively refined meshes the shallow water code has been run on a grid obtained refining uniformly three times the background grid up to a final number of elements which is 64 times the initial one The numerical solution computed on the finest grid is then taken as the reference one the truncation error is evaluated comparing the
41. e discrete case insures positive value of k and e For physical and mathematical reasons it is essential that the system of PDE yields positive values for k and e The final scheme for the equations 3 and 4 is then kot 1 ar AtV w pV Rt k X ALP ALP 67 ent 1 Ate MV 4 ve eX ALE pn AtP 68 kn Cu Cu k with n kn Ve Cy an 7 Transport of a Passive Tracer A passive tracer is a quantity that is transported by the velocity field of the fluid but that does not affect the fluid motion itself It can represent a concentration of a pollutant or a thermal field where the effects due to the variation of density are neglected The equation describing the evolution of the tracer is thus of the form advection diffusion with the possible presence of source terms a u VT iv hy VT Sr 69 where T is the vertically integrated value of the tracer u u v is the velocity field of the fluid A is the water depth vr is the diffusion coefficient of the tracer and Sy is the source term 19 This equation is solved in SWEET through the usual algorithms used for the other equations that is a lagrangian integration for the advective term and an implicit formu lation for the diffusive term giving rise to a linear system solved through a conjugate gradient algorithm Boundary conditions are imposed only on open boundaries where the water enters or leaves the domain At the inlet
42. e of many vectors of the code SWEET is explained in the code However for sake of simplicity we list here a short description of the principal vectors that are introduced in the main of the code The names of the vectors follow the standard Fortran rule 60 INITIALIZATION DO 1 niter LAGRAN lump1 1 lumpl 0 i turbo 4 Keps_impl j STEPS STEP2L Y STEP3XI lump2 0 lump2 1 STEP3Q STEP3QL Scalar_impl ali I REWRITE END Figure 32 Flowchart of the code SWEET 61 the names beginning with i j k l m n are integer 4 the others are real 8 All the physical quantities are measured in SI unit system x k j slmax j slmin j n0 3 lne k j nen j lee k i nel i area i dphi1 k j i dphi2 k j 1 i reint i j phi01 i j amasi j k i amas2 j k i aloc j k vn k j alump j h j xi j xiold j xipo j v k j qx j ayj k th coordinate of the node j maximum side length of the triangles surrounding the node j minimum side length of the triangles surrounding the node j depth of the bottom in node j list of the elements surrounding the node j number of the triangles surrounding the node j list of the elements surrounding the triangle i number of the elements surrounding the triangle i area of the element i derivati
43. eNy yne bey byns 66 In the code the first condition is imposed on the qs variable and the second on the q variable if nz gt ny and the opposite is done if n gt ne This procedure ensures positivity of the diagonal terms of the matrix Please note that condition 65 imposes a null normal velocity component while equation 66 solves the tangential component of the velocity 18 6 Numerical Scheme for the k Model The numerical approximation of the shallow water equations for turbulent flow is almost the same that was originally developed for laminar flow and described in Section 5 We have adopted a fractional step method where for turbulent flow it has been decided to adopt only implicit discretization because by definition turbulence modeling is used for flows where diffusive effects play a relevant role The convection of turbulent quantities is carried out by lagrangian integration in analogy to what is done for momentum transport The source terms are discretized implicitly or explicitly depending on their sign This procedure known as semi implicit scheme 1 reduce the cost of the fully implicit scheme the idea is to split the terms of order zero into their positive part and negative part and treat implicitly the positive terms and explicitly the other ones Then all the terms on the left hand side are positive and so are all the terms on the right hand side The maximum principle for PDE in th
44. econditioning gives better performance with respect to the sequential ILUT algorithm when more than three processors are used 11 4 Abrupt enlargement of a channel the k e model To validate the k e model described above it has been chosen the test case discussed in 35 corresponding to the flow in an abrupt enlargement of a prismatic channel The geometry of the problem may be seen in fig 1 it has been supposed that the channel is 4 meters deep and the incoming flow has unit width discharge of 3 square meters per second The flow is expected to separate at the edge of the enlargement with consequent recirculation The most interesting aspect to be compared with experiments is the length of the re circulation zone the reattachment length for a geometry as the one showed in in fig 15 should be slightly less than six times the width of the enlargement if the ratio ot the width of the former part of the channel over the depth is nearly one 36 It is worthwhile 42 to remark that such an estimate assumes no dependence of the reattachment length on the roughness of the channel and on the inflowing velocity The velocity field plotted in fig 17 shows that the computed solution has a reattachment length which is smaller then the expected one This is accordance with the results showed in 35 The use of wall functions at least when using a mesh not stretched at the boundary yields to a negligible production of turbulent quantities a
45. eld 36 wea Hire 2 ra a where H and y are constant values it can be easily verified that the equation 86 has the following exact solution h w H yr q e Q 88 This test case allows for the verification of the accuracy of the scheme when a strong gradient is present in the bathymetry as often occurs in rivers The computation for this test has been carried out using an inflow depth H 4 an inflow unit width discharge Q 4 and a bottom slope y 0 06 Although this test is essentially one dimensional an analogous case for the 2D code has been run on a channel 300 m long and 4 m wide The computational mesh is almost regular it is composed by 580 elements and a detail of it may be seen in Figure 7 EDRIC Figure 7 A detail of a regular mesh used in test case 2 For the boundary conditions the value of Q has been imposed at the inflow whilst the value of has been imposed at the outflow Figure 8 contains two graphs a plot of the exact elevation versus the computed elevation and a plot of the computed unit width discharge The former plot evidences the good accuracy of the scheme for the elevation as well as the fulfillment of the mass conservation property with a small loss of 0 5 for this mesh It should be mentioned that a very large number of time steps were required to reach the steady state solution since no mechanism to dissipate the spurious components of the initial conditions is present in th
46. ep 2 the Step 2 of the fractional step procedure in the case of lumped mass matrix see eq 50 computes the right hand side in the equation for discharge at Step 2 in the case of lumped mass matrix the Step 3 of the fractional step procedure for the elevation unknown see eq 53 computes the local stiffness matrix for elevation at Step 3 computes the right hand side term in the equation for elevation inserts the Dirichlet boundary conditions for elevation in the linear system the Step 3 of the fractional step procedure for the unit width discharge unknown when mass consistency is adopted see eq 51 computes the local stiffness matrix for discharge at Step 3 computes the right hand side in the equation for discharge at Step 3 introduces the no slip boundary conditions at the wall in the linear system introduces the free slip boundary conditions at the wall in the linear system inserts the Dirichlet boundary conditions for discharge in the linear system 58 Step3 lumping STEP3QL Passive scalar SCALAR_IMPL SC_LOCSTIF SC_RHS SC_DIRBOUND Common routines CGPREC BICGPREC GLOSTIFF Output SPYSTORY BOUNDISCH REWRITE Mesh Adaption INIVARS BUILDSTRUCTURES ERRESTIMATE ADAPTION MARKGRID STOREOLDXYAND FIELDS ADAPT REFINE DEREFINE UREFINE the Step 3 of the fractional step procedure the unit width discharge unknown when mass lumping is adopted see eq 51 the passive scalar model local st
47. ere A R ARP 75 t the parallel conjugate gradient algorithm preconditioned with the Additive Schwarz method then reads 1 To b Axo zo M To Po Zo 23 a rj 2 Ap pi inter process communication Tjt1 Lj Oj riga Tj Oj Ap zia MU ra By 1412541 1 2 inter process communication Pisa zia Bip end do The Schwarz preconditioner is an attractive choice for parallel computations because of its locality it does not require any exchange of information between sub domains Moreover because of the definition of the restriction operator R the elements of the matrix M are identical to the ones that are in the matrix A so that no specific storage is required for the preconditioner Finally it may be noted that the local subproblems to be solved at the preconditioning level are always well posed because they can be seen at a functional level as the discretization ot a Poisson problem with homogeneous Dirichlet boundary conditions An open question refers to of the possibility of solving the local problems i e the Schwarz preconditioning in an approximate way We will discuss further this subject in Section 9 8 5 Coarse Grid Correction Some theoretical results are available from the analysis of the Schwarz preconditioner When using a regular grid of spatial step Ax partitioned into sub domains of linear length H with overlap size 6 BH it may be shown 8 12 th
48. formly refining three times the previous one fine3 solution These grids are the same we described in the previous section Using fine3 as reference solution we calculated the maximum relative deviation be tween the two solutions as b t t ene max a s fine3 fine0 91 ryt maxzyabs fine3 t where fine can be either or q The more interesting result is that the maximum deviation for the elevation is very small 1074 but that of the module of the discharge is not negligible 0 33 The fact that the deviation of is so small makes very difficult if not useless the location of 53 the elevation errors in this particular case The deviation of the module of the discharge is instead consistent but as already remarked absolute values of discharge in this test are very small These observations should highlight the capability of the chosen test case to assess the goodness of the error estimators and of the whole adaption procedure If with the refinement of the background mesh the numerical solution is converging toward the exact one and fine can be considered a good estimate of the exact solution then el can be thought as an estimate of the maximum relative error of the fine0 solu tion A qualitative proof of the convergence of the numerical solution has been obtained comparing the solutions fine0 and fine3 and those obtained refining the background mesh uniformly one and two times fine and fine solutions The
49. g the new mesh one containing the new bathymetry and a restart file are created The names of this files are mesh iter bati iter and restart tem iter where iter is the time step index The content of the last two files has already been described In the first file are contained all the informations of a usual mesh file and in addition informations about how to rebuild the final mesh starting from the background mesh The additional informations in the mesh iter file are given in brackets as follows ne nv2 nvi x 1 1 x 2 1 inod 1 x 1 2 x 2 2 inod 2 mc nv2 x 2 nv2 inod nv2 lel 1 1 1e1 2 1 1e1 6 1 ETYPE 1 E_ID 1 lel 1 2 lel 2 2 1el 6 2 LETYPE 2 E_ID 2 lel 1 ne lel 2 ne lel 6 ne ETYPE ne EID ne DO AAAI AR ASK oo IC 1 4 1 24 21 1 21 1 24 21 21 o aK kk kkk END OF STANDARD MESH FILE WHAT FOLLOWS IS THE HISTORY FILE LEO COO I kkk kk KKH NELI NVI NELF NVF LAST_LEVEL PHISTORY O 0 0 THIS IS THE 1 TH LEVEL OF REFINEMENT phistory e1 e2 e3 e4 ei_id e2_id e3_id e4_id o 0 phistory THIS IS THE 2 TH LEVEL OF REFINEMENT phistory e1 e2 e3 e4 e1_id e2_id e3_id e4_id 71 where the ETYPE i is the type of the i th element and E_ID i its identifier NELI and NVI are number of elements and of P1 nodes in the background mesh NELF and NVF are number of elements and of P1 nodes in the new mesh PHISTORY is the pointer to the last element contained in the history tree phistory is t
50. gence indication for the Schwarz Additive preconditioner are added in every proc output file 4 As with warn 2 for a sequential run Detailed and heavy indications for the Schwarz Additive preconditioner are added in every proc output file Output on File The output of SWEET is condensed in a unique unformatted file This file will be written every rewrite iterations with the name restart tem lt iteration_number gt This file contains 70 the physical time at which the values refer to nv2 values of the x component of the discharge qx nv2 values of the y component of the discharge qy nvl values of the elevation xi If tracer 1 nvl values of the passive tracer concentration If turbo 1 nvl values of the k quantity If turbo 1 nvl values of the e quantity In order to visualize all these quantities and other of interest an interactive interface program has been written named sweetmtv As its name suggests this program writes the data of the restart tem file in the MTV format and it visualizes them through the PlotMTV 20 program In the case of unsteady computations SWEET also produces some output relative to the values of the solution as computed in some reference nodes that have been listed in the spy nodes file This output is contained in files that are numerated progressively starting from 1001 and on 1002 1003 When the adaption is used tre f gt 0 every time the mesh is changed a file containin
51. generates at the bound aries of a refined area elements of poor quality When pre refining the initial mesh such drawback can be avoided improving the overall mesh quality by means of an algorithm known as Laplacian smoothing Given a generic node P ot the mesh not in the boundary we will call patch around P the polygon formed by all the elements which have this point in common The information about elements which constitute the patch is contained in the structure VVER see section 14 Smoothing of Laplace consists in moving every internal point of the triangulation to the baricentrum of the patch around this point This can be done without problems when the patch is convex When instead the patch is concave the algorithm must be modified in order to avoid that the node movement would produce an inconsistent triangulation The modified Laplace Smoothing used in the pre refine module is described in detail in 32 Refining and coarsening When designing a practical strategy of mesh refinement coarsening it would be very useful to state first an acceptable numerical error and then use it as a yardstick coarsen the mesh where the error indicator is lower than the reference one refine when larger Unfortunately the error indicators described above only give at best estimates ot the numerical error or hints about where the error is larger they do not ensure any absolute evaluation of its magnitude When computing steady flows this diff
52. good partitions are created by the Spectral Bisection and by the Kernighan Lin algorithms The Chaco program works using informations on the graph of the mesh To build the input file for Chaco we must first run a program called before which creates a file called graph in the Chaco input format The program before reads as input the datin and the mesh input files of the scalar code The Chaco package creates a file whose name must be partition The same file is read by the program after which creates the input file for the parallel SWEET code analogous to the mesh file for the sequential version of the code The file is thus called pa mesh with clear significance The pa_mesh file contains indication only on the first level of subdomains the one on which act all the explicit steps To create the second level of partition and the relative informations we must use different preprocessing codes First of all we must use the code matrix which creates the files to be used by the subsequent program setup The program setup reads the number of subdomains for the second level partition in the file datin 72 The program setup creates a series of input files for the different processors involved in the calculations The total number of these files depends on the number of processors and their size depends on the total number of sub domains When a large number of sub domains is requested the time needed by the setup code can be quite long up to severa
53. ground mesh STANDARDREFINE standard sub division of an element GREENREFINE green subdivision of an element GREENREREFINE a green element is replaced with one standard refined MAKEGREEN a standard element is green refined midpoint of an edge is joined with the vertex opposite to it STANDARDEREFINE central triangle of a standard refined element is deleted and replaced by the father GREENDELETE a green pair of triangles is replaced by the father GREENDEREFINE replacement of a standard refined element with the pair of green triangles which have generated it In addition to the routines listed above the parallel version of the code has several new routines Mainly they are parallel versions of corresponding sequential routines in this case their name is simply pa_ or _pa where stands for the name of the sequential routine The parallel code has also some new routines related to the Schwarz preconditioner for the conjugate gradient at Step 3 These routines will not be explained in detail PA_MAKELIST build the communication lists COMM ensemble of communication routines PARINIT initializes the parallel environment MULT_TEST_RUN access the parallel CG code substituting PA_CGPREC 12 1 Flow Chart In Figure 32 it is shown a schematic flow chart of SWEET sketching the structure of the main program The figure refers to the sequential code being the flow chart of the parallel code very similar 13 List of the Vectors The rol
54. h l d 3 6 Ve Ckp TE an Cep 9 037 kp VG p ols H c is the coefficient of friction that we have chosen to deduce from the Strickler formula o g AT Risk and U is the friction velocity at the bottom equal to U c u v 11 The form ot the model which has been presented above is valid only for fully turbulent flows Close to solid walls there are inevitably regions where the local Reynolds number of turbulence measured with yt is so small that viscous effects predominate over turbulent ones A special treatment is required in order to obtain realistic numerical predictions In SWEET the simple but efficient effective viscosity wall function approach has been taken Considering the existence of a local turbulence equilibrium at the solid boundary such that production by shear stress on the boundary and on the bottom is equal to dissipation the value of k and e at a distance from the solid wall are given by _ l x k ou aa U K 44 3 60 Cy l P 45 E ou Ir 45 where u v avr is the component parallel to the wall of the shear velocity These wall boundary conditions are valid at a distance 6 from the wall such that the local Reynolds number defined as u yt V is such that yt 20 100 Following the Hinze s hypothesis the condition for the parallel component of the velocity at the wall is vir u 1 lost v 46 where y 0 41 is the von Karman constant and
55. h discharge qymax maximum value of the magnitude of y component of the unit width discharge cflmax maximum value of the velocity CFL number frmax maximum value of the Froude number In the last row of the text displayed on the video the discharges referring to the open boundaries of the domain are listed Namely they are ordered in such a way that the discharges relevant to the 20003 20004 20005 20006 20007 20008 type bound aries are listed in the first row the discharges relevant to the 10003 10004 10005 10006 10007 10008 type boundaries are listed in the second row The value on the last row refers to the sum of the all discharges Negative values indicates flow entering the computational domain It is possible to have additional informations on some numerical and computational features of the code this is done by setting the warn flag value in the datin file according to the following table OUTPUT All the indication written above plus averaged execution times Parallel run will give also confirmation of the correct starting of all the processors on standard error unit 1 As with warn 0 plus an indication at the end of every temporal iteration on standard error unit 2 As with warn 1 plus execution times for every temporal iteration Warning on any pathline out of boundary are also given on standard error In a parallel run every processor will produce an output file 3 As with warn 2 for a sequential run Conver
56. h element and s is the mass defect in the e th element 10 2 The mesh refinement technique Any error estimator among the ones described in the section above allows to identify a set of elements of the mesh to be refined or coarsened Several techniques can be used 31 to this aim 26 1 Repositioning of the mesh r methods local refinement of the mesh is obtained moving nodes in order to minimize the distribution function of the error Of course this local refinement generates de refinement in the remaining part of the domain Since topology of the mesh does not change during repositioning this strategy is easy and cheap to implement the connectivity of the grid is unchanged Nevertheless it is a strategy not often used because of the constrain of using a fixed number of elements 2 Enrichment of the mesh h methods the triangles of the grid are divided in elements of lower average side length h As the error of the numerical solution behaves like ch with c and A constants these methods if used in a proper way ensure power convergence For this reason h methods are most popular although their implementation is more complex because at every subdivision the topology of the mesh changes The splitting of the elements can be made essentially in two ways 1 edge bisection midpoint of an edge marked to be refined is joined with the vertex opposite to this edge 2 standard refinement a marked triangle father is
57. h node and at each integration step the computation of the area of some triangles for each mesh node This formulation for the Lagrangian integration of the convective term does not require any matrix inversion It is then a local operation which needs for each node some information about the old solution in a cluster of elements around it The Lagrangian integration of the convective terms can naturally be performed in parallel in a domain decomposition framework Each processor carries out the integra tion in the nodes belonging to the sub domain assigned to the processor itself As the pathline can exit the sub domain care must be used when dealing with the nodes posed in proximity of its boundary The Lagrangian integration of the nodes belonging to the overlap region must be computed by the processor which succeeds in the reconstruction of the pathline With regard to the nodes in the proximity of the overlap a limit on the time step is dictated by the requirement that their pathline does not exit the sub domain to which they belong This means in practice that the CFL velocity number should be smaller than 2 v maxg At lt 2 72 ok ar lt 72 Such a condition is not restrictive in practical applications where the wave celerity is usually much larger than the fluid speed 8 3 Parallel Solution of the Linear System Equation 53 can be seen as a particular case of elliptic differential problem of the form 21 Figure 2
58. he horizontal velocities can be written as u x y 2 t ua x y t u x Y 2 1 15 v x y z t valz y t v y z t 16 where E o u x y 2 t dz 0 17 ho E J v x y z t dz 0 18 ho We recall the Leibnitz differentiation rule that will be useful in the next rel al Of x y Oa x Ob x Ox tro f e y dy he dx y P 2 a e dx Ma 2a Ox 19 2 1 Continuity Equation Integration of the continuity equation along the vertical yields du dv wle wl J Da E J ay ag ae Olho p oE alho z udz uez ulho Da Dili ode lea v ho Dy 20 Using the boundary equations 8 9 and simplifying we find OE hu 0d hv lt gt 21 ot Ox Oy 0 21 Note that only the assumption of constant density has been used to derive expression 21 2 2 Momentum Equation We integrate the horizontal momentum NS equations along the vertical considering each term separately we get Ou Lae il e ules 22 E Pl dz Da dra uu dz uu uu Ato 23 E na dz E uv dz uo uv ho A 24 ma dz uw e ww ng 25 Summing up and using the boundary conditions 8 9 these terms reduce to r 0 E o re ah ttg tga 26 Introducing the unknowns defined in 15 16 we get AA uet f tata de f ud gt f undt f uv dz 27 To close the problem we do the following additional hypothesis the sum of the terms involving u v p
59. he j th element estory i average of the error estimate at time step i In the parallel version of the code the two components of the discharge vectors are collected in a unique vector called qxy Nodes belonging to overlapping elements have negative inod values 14 Data structures When implementing algorithms of grid refinement and de refinement an important role in determining computational efficiency is played by the data structures used In particular when implementing the algorithm described in Section 10 2 one has to use efficient structures to determine which elements are adjacent to a given one which edges delimit it and which side has given vertices This information can be obtained using complex structures such as lists trees tables etc 34 Here we chose to use structures see Table at the end of this section as simple as possible taking into proper account the computational efficiency In the discussion which follows a general survey complete but deliberately simplified of the whole adaption module is given Information about which nodes must be refined is contained in a vector of integers named IERR NEL If IERR i gt 0 element i will be refined in standard way creating a new node at midpoint of each of the edges of the element in hand Coordinates of these new nodes are queued to the vector XV and the number of nodes is incremented As one or more of these nodes could be already created when refining an adjacent element
60. he pointer to the current element in the history tree e1 e2 e3 e4 are indices of the four elements obtained from a standard sub division of an element and ei_id e2_id e3_id e4_id are the corresponding identifiers For a more detailed description of these variables see sect 14 16 The Parallel Setup In this Section we illustrate how to setup the input for a parallel run starting from the input for the sequential version of SWEET The parallel code needs much more informations than the sequential code so the input system is more elaborated However it has been followed the strategy to maintain some crucial input and output files like the restart tem restart files so to have the maximum compatibility with the sequential input 16 1 Partitioning the Mesh In order to partition the mesh we make use of a public domain software call Chaco 9 Other packages are available for example Metis 11 and TopDomDec 10 the user can freely chose the software he prefers It must be noted that Metis uses the same input and output format used by Chaco and so the two programs are freely interchangeable while TopDomDec adopts a different data format We will refer in the following solely to the Chaco package Any partitioning algorithm contained in Chaco can be used It is particularly hard to say if an algorithm works better than the others the behaviour depending much on the shape of the original integration domain As a tendency we have noted that
61. iculty is overcome stating first the computational resource that can be addressed that is the maximum number of nodes to be used in the numerical simulation Then starting with a quite coarse mesh it is refined until that the desired number of nodes is reached When dealing with unsteady flow also coarsening is useful In this paper we are addressing smooth flow that is subcritical shallow water flow or at most locally transcritical flow In such regime there are no discontinuities in the physical variables and typically the 34 time dependency is due to the change of boundary conditions which is smooth in time and has a period of 12 24 hours As time goes by the flowfield changes and a proper mesh adaption strategy should modify the mesh refining it in a optimal way with respect to the adopted error indicator In this framework instead of keeping constant the number of nodes as was the case of steady flow it is more significative to keep constant the maximum error indicator of the grid at any time This ensure a constant control of the error all along the simulation 11 Test Cases To validate the numerical scheme and its sequential and parallel implementation we consider different examples The first two test problems have been specifically designed to test the discretization of the nonlinear terms in the equations and the mass conservation property of the numerical scheme In addition they are used to test for the presence of spur
62. iffness matrix for the tracer rhs for the tracer boundary conditions for the tracer conjugate gradient iterative solver with diagonal preconditioning biconjugate gradient iterative solver with diagonal preconditioning inserts the local contributions into the global matrix writes the value of the unknown in the spy nodes if steady 0 computes the discharge through the open boundaries writes the restart file some variables used in the adaption module are initialized It must be called once before any other call build all data structures see section 14 one of the error indicators is used to estimate the error of the numerical solution is the main module that calls all the others an integer ErrorFlag based on the error estimate is assigned to each triangle of the mesh all fields and nodes coordinates are stored to be used to interpolate fields on the new mesh de refinement of triangles where ErrorFlag lt 0 refinement of triangles where ErrorFlag gt 0 refinement of triangles where ErrorFlag 0 de refinement of triangles where ErrorFlagF 0 all the triangles of the mesh are standard refined 59 INTERPOLATE All physical fields in the new P1 nodes are obtained using FIELDSON values on the old P2 nodes If a new P1 node was P2NODES not an old P2 node then the value is obtained with the proper interpolation ADDALLP2NODES all P2 nodes are queued to vel DEREFINEALL the mesh is completely de refined The final mesh will be the back
63. inement When executing first for block only elements which have an error greater than a fixed error threshold thresh are refined NHang i and EType i are functions which respectively return number of hanging nodes and type standard or green of the triangle i In the second for block triangles which have more then one or green ones which have at least one hanging node are standard refined If some element has been refined maybe some other hanging node has been created and for this reason this last block is re executed till possible In the last block MakeGreen green refines triangles which have an hanging node to ensure consistency 33 of the mesh This algorithm converges in a finite number of iterations at most all the elements of the initial grid will be standard refined The grid refined with the algorithm listed above has a minimum number of green elements which are the elements deteriorating the quality of the initial mesh 10 2 1 Pre refinement and mesh enhancement If some of the characteristics of the solution of a given problem is known a priori it is possible in principle to decide some sort of pre refinement ot the mesh For this reason a package named prerefine has been prepared which read the initial mesh in SWEET format and a vector of integers containing flags to the elements to be refined and re write a file in the same format with the new mesh pre refined As already mentioned the refinement technique here adopted
64. ion in a closed basin induced by a tidal current This kind of problem commonly studied using SWEs have a great environmental interest An example is the coupling of a SW code with a transport model of nutrients as nitrogen and phosphorus enable to recognize conditions leading to eu trophication and then to anoxic crisis in basins characterized by a long time of retention 33 The domain used is the same of the steady state case see Fig 21 this time the two inflowing branches to the left are closed and the elevation measured in meters and imposed on the right open boundary is given by the following expression 1 2rt 2rt E t g cos DI cos DI 90 This BC simulates the ideal case of a tide of about one meter of amplitude typical of the Mediterranean area with a period of 48h due to the superimposition of a period of 12h and one of 24h see Fig 28 Some general considerations can be made about the solutions of such a problem First of all the values of the elevation field will be very high almost all the time because they are imposed in all the domain by the amplitude of the tide m On the contrary the values of the velocity and therefore of discharge will be very small because the tidal current varies very slowly 12h To have an idea of the amplitude of the true error of the numerical solution we performed a run using the background mesh of Fig 21 fine0 solution and then another run using the grid obtained by uni
65. ious oscillations arising due to the boundary conditions The third example is a demonstration of the parallel performance of the code 11 1 Jet in a Circular Reservoir The first problem is the simulation of a steady jet in a circular reservoir the details of this classical test case as well as the experimental results can be found in 19 The geometry of the boundary and the computational grid are shown in Figure 5 We use an eddy coefficient v 2 5 1074m s and a time step of 2 s The computed velocity field is shown in Figure 6 The solution does not differ much from the one shown in 6 and 19 the location of the gyre centers are sufficiently well described but the maximum computed velocities in the gyres are underestimated mainly in the region near the inflow Such a discrepancy is due to three dimensional effects and not to the staircase boundary that characterizes the finite difference contours used in the cited references 11 2 Hydraulic Jump The second test case that we consider is the steady 1D flow in a prismatic channel without diffusion and bottom friction effects Under these hypotheses the SWE reduce to q Q 0 h he 85 d 2 dh dho T 5 HT I Te 86 where Q is the constant unit width discharge If the bottom field is defined as 39 Figure 5 Jet circulation in a reservoir computational mesh gt 0 05 m s Figure 6 Jet circulation in a reservoir velocity fi
66. ique send instruction The linear systems local to the sub domains resulting from the Schwarz algorithm are governed by the matrix A which is positive definite Therefore we have decided to use again a CG procedure preconditioned with an incomplete LU decomposition ILUT 15 for their solution On the contrary the coarse grid system is always solved exactly by inverting the matrix Ag We will refer to the CG iterations at sub domain level used to solve the Schwarz preconditioning system as the local CG Since the local CG has to work just as a preconditioner the solution of the subsystems would preferably be carried out with a low degree of accuracy This would correspond to use an approximation of the preconditioning system In practice however we have noted that the convergence rate of the global CG is heavily dependent on the accuracy of the local solutions We thus have coupled the convergence thresholds of the local and the global CG s so that the values of the local residuals are always lower than the current value of the global CG residual In this way we have obtained a good overall convergence rate The pseudo code of the parallel shallow water model thus reads as follows do i 1 number of time steps solve Lagrangian transport in each of the N sub domains exchange boundary conditions among the N sub domains solve Step 2 in each of the N sub domains exchange boundary conditions among the N sub domains
67. is computation 37 0 040 T T T 4 040 0 030 4 020 0 020 a N E A 5 T 2 g 40 gt oO 3 5 oO 0 010 5 aa 3 980 0 000 Peee0000000000000000001000000I 00000 e computed 0 010 3 960 50 0 100 0 150 0 200 0 250 0 0 0 100 0 200 0 300 0 x m x m Figure 8 Water elevation and unit width discharge for the 1D test case 11 3 Parallel Computation on a Complex Geometry The parallel code has been used for the simulation of the marine circulation in the Bocche di Bonifacio the strait separating the Corsica and Sardinia islands in the Mediterranean sea In this model it is assumed that no stratification occurs and that the water circulation is essentially driven by the tidal boundary conditions the wind stress and the Coriolis force This hydrodynamic problem is characterized by a substantial complexity of the geometrical data We have used a mesh composed of 75053 elements and 38303 vertices The total unknowns of the problem are 38303 elevation nodal values and 151661 discharge nodal values for every component The size of this problem does not fit into the memory of a single processor with 128 MB of RAM and a minimum of two processors of an IBM SP2 must be used We first analyze the numerical behavior of the Schwarz algorithm with respect to the number of sub domains In Figure 10 is shown the CPU time required by one global CG iteration versus the number of sub domains N for a fixed nu
68. is file is shown in the subsequent tabular The format of the file can be deduced from the source of the code of from the enclosed example 65 dt niter turbo visc0 Strickl tracer viscT bati rest steady WX WY Coriolis ho x10 qx0 qy0 x1X qX rk itmax tol precond tolan teta outbc lumpi lump2 slip warn video rewrite time step number of time steps if 1 k e turbulent model is used In this case lump1 is switched to 0 implicit diffusion eddy viscosity If turbo 1 it is an initial value Strickler coefficient if 1 passive tracer model is activated if tracer 1 passive tracer diffusion coefficient if 1 reads bathymetry from the bati file 1 else assumes the uniform h0 value if 1 reads the initial conditions from the file restart tem else uses xi0 qx0 and qy0 initial conditions if 1 steady state calculation else unsteady files elev time disch time spy nodes must then be provided wind velocity components Coriolis parameter constant bottom depth when bati 0 constant elevation of the water when rest 0 initial value of the discharge when rest 0 elevation to be imposed at the X th boundary when steady 1 discharge to be imposed at the X th boundary when steady 1 number of steps to be used to determine the pathline in the lagrangian integration of momentum maximum number of iterations that can be performed to solve the linear systems maximum value of residua
69. it is very convenient to dispose of structures that contain elements adjacent to a given one and containing the edges delimiting it To build these structures in an efficient way we start with an array that is used in SWEET too to solve the Lagrangian part of time step here named VVER MAXAD NV This structure contains for every vertex of the mesh all the elements which have this vertex in common MAXAD is the maximum number of elements 63 adjacent to a node Starting from VVER we built the array VSID 2 NSID which contain for every side of the mesh indices of its ending points NSID is the number of sides in the grid To make an efficient search of an edge given the indices of the ending points sides in VSID are specified in such a way that VSID 1 i lt VSID 2 1i and VSID 1 i i 1 NV is or dered in increasing order A simple hash table can be realized using a vector of pointers PVSID NV to VSID that contains for every vertex the pointer to the first side to which this vertex belong Search of a side s of given ending point v and v can be made or dering index vj and v2 in such a way that v lt v2 and search of v2 is started from the position VSID 2 PVSID wv Disposing of VSID and PVSID SEL 3 NEL which contains for every triangle indices of its three sides can be easily calculated Lastly EEL 3 NEL which gives for every element indices of adjacent elements is calculated With these structure at our disposal the refinement
70. l minutes This time depends also on the total number of nodes of the global mesh Now we have all the input files for a run of our code let s briefly summarize what the program needs e datin file e bati file if needed bati 1 in datin e pa_mesh file created by after e dat tmp files created by setup e clev time disch time and spy nodes if needed steady 0 in datin e restart tem if requested rest 1 in datin this file can be created both by the sequential and the parallel version ot SWEET Every time that one wants to change the initial conditions of the problem or the output level of the code or the choice of the preconditioning then the file datin must be modified by hand Every time that one wants to change the number of the subdomains for the second level partition he must modify the rc run_vars file and rerun the setup code Every time that one wants to change the number of the subdomains for the first level partition he must rerun the chaco code and then the after and the setup codes Every time that one wants to change the mesh then he must recreate the whole input system starting from the matrix code A sketch of the dependency relations of the input files is given in Figure 34 To deal with the managing of the input files the Unix make command is of great help with a simple makefile file like the one shown in the following it will be necessary only to launch make and the entire cascade of programs will produ
71. l to stop the linear solver ignored by the sequential code For the parallel code if 1 Schwarz preconditioning is used else diagonal preconditioning is used in the CG for the elevation maximum tolerance to recover the pathline in the lagrangian step in percentage with respect to the average area of the triangles degree of implicitness of the scheme it must have a value between 0 5 and 1 usually equal to 1 type of boundary conditions to be imposed at the outflow if 1 stiffness matrix is lumped at Step 2 else consistent matrix is used This flag is automatically set to 0 whenever turbo 1 if 1 stiffness matrix is lumped at Step 3 else consistent matrix is used if 1 free slip boundary conditions are applied on the wall else no slip conditions are used four verbosity levels for the output Values range from 0 to 3 frequency of output on the standard output frequency of the restart file A restart file is always written at the end of the calculation 66 iref time interval in time steps between two calls to the adaption procedure if 0 the adaption is skipped adtype adaption type if 1 uniform refine if 0 refine if 1 derefine if 2 adaption eetype error estimator if 1 second derivative of if 2 second derivative of v if 3 mass defect see sect 10 1 reftol error tolerance fraction respect to the maximum error permitted during the adaption procedure 0 lt reftol lt 1 refmax maximum number of elements
72. lus the vertical average of the horizontal diffusion terms in 1 2 are supposed to depend on the average velocity as follows fe oa g du o 0 al uu det f On a2 dz da r tua 28 fe E o u o 0 d dz a7 a h a 2 Oy owe f Oy r a Oy se 29 where vk is a parameter which should account both for turbulence and vertical dishomo geneities The integration of the vertical diffusion term 1 2 gives E o u u lo Oz 2 dz v Bp lho Tle T 30 The right hand side terms are the wind stress and the bottom stress which are usually modeled as follows Cu WW 31 gual tig va K2h113 rle 32 T ho Repeating the same derivation for the y component and collecting the all contributions the shallow water equations finally have the following form Pihu Oria mil V v Vhia nZ _ fva Cu WW ptt 33 ee Mate sl V vV hva T fun Ca WW oe 34 OE Ata Olea _ 0 35 at Ox Oy Let s use from now on a different notation to adhere more strictly to what can be found in the source code of SWEET Let q z y t dr qy be the unit width discharge that is qs hua qy hva and w the wind stress tensor that is w CuW W and wy Cu W Wy with W W 2 W2 Then the Shallow Water Equations SWE from now on read q _ ala gt V qq h V vVq ghVE IF IB 20 xqtw 36 d _ Ot V q 0 37 Clearly
73. mal to the profile of the boundary This is done projecting the momentum equation on the normal direction 17 5 2 2 Close Boundaries On solid walls we can model the flow in two different ways we can reproduce the physical situation in which we find the water at rest or we can ignore the friction effect of the wall simply imposing a zero flux across the wall The two different models are usually referred as no slip and free slip boundary conditions No slip boundary conditions We impose a null velocity field all along the closed boundary thus q 0 on the boundary that is we impose a Dirichlet boundary condition on the unit width discharge Free slip boundary conditions We ask for a null flux across the wall by imposing a null normal component of the velocity and thus of the discharge along the close boundary q n 0 where n is the unit outward normal direction As we treat the convective terms with a lagrangian procedure we must pose this condition in a strong way It is impossible to obtain the normal derivative in a weak form The discretized momentum equations for the two components of the discharge give rise to two linear systems Adz dy 61 Aqy by 62 63 where A is the differential operator Instead of the two systems above we consider a unique system of the form Jas 64 where q qr qy and b b by We now couple the equations solving on the solid boundary qna ny 0 65 and A
74. mber of processors The Schwarz algorithm runs faster when a large number of sub domains is used until the communication overhead for the global CG becomes relevant eventually preventing further time savings To evaluate the overall code performance this result has to be considered together with the result shown in in Figure 11 where we illustrate how the number of global CG iterations varies with the number of sub domains The preconditioning capability of the 38 BARRA EOR ORBEA AREEEEE BOSS CRESS EO CASPER OS Ea PELATI EEESITIAG ISISITA SSSR ES OOO DY ISIS LR ES SERN SOL FE A TAa x Se ava a Na LETTA EEN NS TS OEE COOOL Ya YAYATA TATA KK ELSE KAYA YAYA YAYAJ ASS RENEE OCEAN Wea a RELA S ZLTO PRAIA VITALE AIA a Vava a SSRIS BEE ECR AOSE ERE AS SEARLE ORES ESRREKIAN A RA SRE RAIN SERIA N lt 9 o oc Q O SARDINIA for the Bocche di Bonifacio test case On the a detail of the computational mesh for the zone indicated in the left figure The entire mesh has 75053 elements the computational domain On the left Figure 9 4 0 0 0 99S IUL ht rig 120 160 200 240 80 Number of subdomains CPUt two processors 40 ired by one global CG iteration versus the Ime requ ioning t using Schwarz precondi Figure 10 the d
75. mputed this occurs twice in each iteration An efficient implementation of a standard preconditioner is not straightforward In fact neglecting the trivial case of diagonal preconditioning effective techniques such as incomplete LU decomposition are intrinsically sequential We can indeed say that the main effort in finding an efficient parallel iterative solver has to be spent in devising the appropriate preconditioner 8 4 Additive Schwarz Preconditioning for the Elliptic Problem In what follows we briefly outline the additive Schwarz preconditioner more details on the theory being available as example in Ref 12 The method of Schwarz has been originally proposed as a solver The underlying idea is to solve the elliptic problem separately on some portions of the original integration domain exchange information at the borders of the different portions and then iterate the procedure till convergence obtaining from the union of the solutions on sub domains the global exact solution of the original problem In recent years this view of the Schwarz method as a solver has been practically abandoned whereas its attractive features as a preconditioner have been exploited We define R as the restriction operator relative to the sub domain Q and A R AR Note that from a functional point of view A is the local stiffness matrix in Q as arises when imposing homogeneous Dirichlet boundary conditions on 00 If we consider M R AR wh
76. n 38 is slightly different from the momentum equation 36 for laminar flow All the differences are in the stress tensor accounting for turbulent diffusion It is not constant in space has diagonal part 2k involves the operator Vq coupling the two components of the momentum equation The Reynolds stress models only turbulent diffusion and does not account for momentum dispersion due to vertical non homogeneity of the horizontal velocities Such aspect is not addressed here and only turbulence modeling is discussed The vertically averaged turbulent kinetic energy k and the rate of dissipation of tur bulent kinetic energy e obey to the following equations Ot LWY Y w u Vh P Phe 40 de Ce o cy e Z tove 9 45 ve gt ET 41 where the constants c 0 09 c1 0 126 c 1 92 ce 0 07 are based on classical test cases and P is the production term due to horizontal gradient of velocity which expression is 2 dv dvi dv Pony So wy So jel dr dx Vv Vv Equations 40 41 are different from the ones usually referred as k e model The difference is in the presence of two source terms Py and P that were first proposed by Rodi et al 21 These terms account for production of kinetic energy and rate of dissipation of kinetic energy due to bottom friction The production terms Pk P are related to the vertically averaged velocity as follows U 3 Pi iL 42 U P ep 43 wit
77. n RS6K indicates the directory containing the PVM executables Now the pvme console is started and you should have the prompt pvme gt this is the PVMe console prompt it is not a shell prompt and only PVM special com mands can be executed from this prompt If in the starting operation any error message appears type immediately from the PVMe prompt the command halt and restart the PVMe If the message persists then consult your system administrator Let s suppose that you have successfully started the PV Me console and that now you have the PVMe prompt Try to type help this will show you all the PVMe special commands together with a brief explanation The main commands that you will probably encounter are e quit exits the PVMe console returning to the unix prompt This operation leaves the PVMe daemon pvmd3e running in batch mode on all the nodes contained in the hostfile file You will have the message pvmd still running indicating that everything is ok PVMe is a software which normally runs in batch mode so this is the standard operational mode From ANY unix prompt on ANY node of the partition you can reactivate the PVMe console and thus the PVMe prompt by simply typing pvme After this you must have the message pvmd already running indicating that the PVMe daemon is already alive If you do not receive this mes sage then for some reasons the daemon has been killed If this is the case type from the PVMe prompt the c
78. ng the mesh Once more in the case of tidal currents this is not of concern because relative errors of the elevation are very small to verify whether this could be a real problem for the adaption procedure an unsteady test case in which the mean value of the elevation is smaller should be conceived Therefore although the mean global error of the adapt solution is also smaller than that of fine Tab 2 we cannot say in general that the elevation field of the first solution is more accurate than the one of the second solution 56 Part III User Manual 12 Structure of the Code SWEET is partly written in Fortran 90 The use of the new Fortran standard allows to exploit some relevant features at the programming level However in order to maintain as much as possible the structure of the code only the dynamic memory allocation is currently used in SWEET SWEET is composed by a main routine plus nearly sixty subroutines some of them are optional or mutually exclusive their execution depending from the flag set in the datin file The role of each subroutine is described in the following list Initialization DATAREAD MESHREAD REREAD READBATI NORMBOUND INISPYSTORY INITVAR INITFE INIGLOSTIFF LISTBOUNDNODE LISTNEAREL SPERELEVBOUND ELEVBOUND Step 1 LAGRAN EVALUATE Subroutines of SWEET reads the datin input file reads the mesh input file reads the restart file reads the bati input file for bathymetry if ba
79. nt turbulent kinetic energy epsilon Y Axis gt S Co SES fier Erbe X Axis Figure 19 Prismatic channel with an abrupt enlargement rate of dissipation of turbulent kinetic energy 45 turbulent Y Axis X Axis Figure 20 Prismatic channel with an abrupt enlargement turbulent viscosity of typical situations that occur in river flow The main channel is 2 kilometers long and 100 meters large it has an abrupt enlargement that doubles its width a smaller inflowing branch a square island in the middle In the initial part the bottom of the channel has a constant depth of 3 meters in the final part the depth rapidly becomes 6 meters The inflow boundary conditions of the main channel and of the secondary channel are 300 and 100 cubic meters per second respectively At the outflow a constant water elevation is imposed The initial computational mesh is intentionally quite coarse being composed by 120 nodes only In fig 22 it is shown the velocity field computed on the background grid The simulation performed on the initial grid does not reveal recirculations neither behind the abrupt enlargement nor behind the island It has been chosen to refine the initial grid adaptively up to three levels of refinement the final mesh being composed by about 370 nodes In figg 23 25 are shown the refined meshes obtained by using different error indicators The use of the error indicator based on water elevation derivatives lea
80. ntegration Step 2 Step 3 6 0 Total II 5 F 40 Q a n 2 0 0 0 2 4 8 12 16 Number of processors Figure 13 Speed up of the whole shallow water code for the mesh composed by 75053 elements Due to memory requirements this problem runs on a minimum of two processors so the speed up are normalized by the two processors value the ideal speed up for 16 processors is thus 8 4 problem on a coarser mesh to have the possibility of also performing the simulation on a single processor The sequential code has been run using an efficient preconditioner that is an Incomplete LU decomposition ILUT The reduced mesh is composed by 39170 elements and 20169 vertices We have done both serial and parallel runs using a RISC processor of the system IBM SP2 with 128 MB of RAM In Figure 14 we report the timing 8 0 r r Sequential ILUT prec Schwarz coarse grid prec 6 0 T vo E i 4 0 2 0 2 4 6 8 Number of processors Figure 14 Implicit step time for the solution of the linear system on a mesh of 39170 elements The horizontal line refers to a sequential run using an ILUT preconditioner The other curve refers to parallel runs on 2 4 6 and 8 processors with the Schwarz preconditioner for the solution of the linear system at the implicit step The horizontal line represents the time needed by the scalar run for the case with ILUT preconditioning We can see that the Schwarz pr
81. ntered sense the mass variation inside a triangle during a time step is not exactly equal to the flux through the edges of the triangle itself The reason of this is twofold On one hand the use of quadratic polynomials to approx imate the discharge q yields to larger computational stencil than when using a linear representation Mass conservation checking must be done an a stencil consistent with the stencil of the scheme The mass conservation triangle by triangle can be properly advocated only for finite elements of mixed type when a special mass lumping is used In fact finite elements of mixed type RTO just recover the cell centered finite volume technique 28 Secondly we observe that the substitution of the momentum equation into the continuity equation that leads to eq 53 involves a new spatial derivation This is a usual approach in the finite elements context 29 but does ensure local mass conservation Conversely in the finite difference finite volumes framework such a substitution is usually carried out at an algebraic level equations 51 52 are first discretized in space and then the substitution is carried out without further derivatives This ensures local triangle by triangle mass conservation 30 The considerations above suggest to use the check of local mass balance as an error indicator for the present scheme We define then 1 1 e m Emid f dT 84 ee E lee ant fa dr 81 Here I is the contour of the e t
82. o Figs 29 30 Spatial correlation for the other two estimators not shown give similar results and for this reason we will use in the following tests only the estimator ez The criteria adopted during the adaption procedure is that described in section 10 2 1 with a call every 13 minutes In addition we added the constraint of a maximum number of P1 nodes about 360 Temporal evolution of the mean value of the estimated true error of the module of the discharge is showed in figure 28 In the same figure we also show the errors of the solutions fine0 fine and fine2 The average errors for the adapt case are very near to those of the solution fine see also Table 2 Moreover after about the first six hours of integration during which there is a slow relaxation of the initial condition to the imposed boundary conditions errors of the adapt solutions are between those of the solution fine and the solution fine2 In Table 2 the errors averaged spatially and temporally excluding the first six hours of integration are shown As can be seen although mean number of nodes used in the adapt run is lower than that of the fine run and about one fifth of that in the fine run the global errors 54 Figure 28 Spatial correlation between the e3 error estimators and the estimated true error as a function of the integration time Dotted line is the imposed elevation expressed in meters Figure 29 Average e
83. olynomial basis p adaption for discretization of the shallow water equations versus local mesh refinement where the the order of the polynomial approximation is kept unchanged h adaption The mesh adaption technique adopted see section 10 2 is based on the use of a background grid see for example 24 25 The numerical simulation starts on a grid possibly composed by few nodes which is the coarsest mesh used in the computation its nodes are neither moved nor suppressed Successive levels of refinement and coarsening lie on the background grid This technique has been adopted because of the very complicated geometry of the boundary that characterizes environmental applications in river coastal areas and so on By keeping a background grid the information about the position of the boundaries and bathymetry can be preserved and is not lost by further interpolations due to node movements Remarkably a relevant by product of the mesh adaption is that poor care has to be used in defining the initial grid with adaption any initial grid will be transformed into a near optimal discretization 26 A peculiar aspect of the mesh adaption for shallow water flow is the necessity to devise proper indicators of the numerical error that should drive the grid refinement and de refinement In the present paper three possibilities are proposed and investigated the second order derivative of the elevation field the second order derivative of the m
84. ommand halt and re run the command pvme hostfile to start again the daemon on every node of the partition e halt stops the PVMe daemon on all the nodes of the partition and then stops the PVMe console It stops also every PVM process running on the nodes After this you must rerun the PVMe daemon from the start e reset kills all the PVMe jobs running on all the nodes of the partition leaving untouched the PVMe daemon Use this command every time that your PVM job 80 aborts for any reason You should have the signal New epoch lt number gt indicating that everything is ready for a new PVM job When the PVMe daemon is properly running on the partition it is possible to run the application Consider that if you don t give an absolute path for the executable then PVM assumes that the executable is in the HOME pvm3 bin RS6K directory A 2 PVM Messages Both PVM and the specific program that use PVM can give some messages during the execution of the run A PVM job will run on every node of the partition we thus have to think to have several programs running together but completely independent one from the another except that for communications When one runs the application a single program on a given node is started This program then spawn the other programs one for each of the remaining nodes of the partition The standard output and the standard error for the father program are the shell from which it has
85. procedure does not present great difficulties because as already mentioned new nodes are added to the queue of the array coordinates XV and indices of new triangles to the queue of the VEL array The procedure of de refinement is a little more complicated because generally when vertices and triangles are eliminated holes are created in the structures which describe the triangulation To avoid such a problem we have adopted a de refinement procedure that every time it is executed re builds the mesh starting from the background mesh excluding those elements marked to be de refined In this way the same structures and procedures used for the refinement can be used also in the de refinement part of the code In order to be able to re build the mesh one has to know exactly its history that is for every element which does not belong to the background mesh one has to know of which element it is son and eventually of which element it is the father Data structures more suitable to store such information is tree In our code this structure called HISTORY is a vector of integers where the beginning of every refinement step is labeled by a zero Refinement operations are listed sequentially in HISTORY storing indices of the elements created The end of the re refinement step that is eventually the beginning of another is labeled by a zero Since a given element can be divided either in a standard way four elements or in a green way two elements
86. rob lems nevertheless the only mean to determine the best choice is always to test both the preconditioners on the given problem The choice of the preconditioner can be influenced also by the particular parallel hardware on which the code runs In a very schematic way we can say that the Schwarz preconditioner is particularly suitable when the parallel system has poor communication performances relatively to its computational speed since the Schwarz algorithm greatly reduces the amount of communication requested by the CG algorithm The diagonal preconditioning is in principle a better choice when the proces sors are connected through a very efficient network so that communication overhead is small 18 2 When Everything Else Fails please contact Luca Paglieri e mail luca crs4 it Phone 39 70 2796316 Fax 39 70 2796302 CRS4 Centre for Research Development and Advanced Studies in Sardinia via N Sauro 10 09123 Cagliari Italy 78 A PVM Quick Guide This is a quick guide to PVM 17 it deals only with the case of PVMe 2 1 on a IBM SP2 system with 4 1 Operative System Level PVM is a system composed by a message passing library and an underlying software running on every machine of the parallel system In order to be able to use PVM the software must be started on the chosen set of processors and the library must be linked to the other object files to form the executable PVM is a public software PVMe
87. s file is read only if bati 1 in the datin file It contains the bathymetry of each node when the bottom is not flat The file is composed by a number of rows equal to the number of nodes and in each row it is written the water depth of the respective node In case of constant bathymetry the depth value is specified in the datin file with the variable ho elev time This file is read only if steady 0 in the datin file It contains the Fourier components of the elevation value to be imposed on the boundary disch time This file is read only if steady 0 in the datin file It contains the Fourier components of the discharge value to be imposed on the boundary spy nodes This file is read only if steady 0 in the datin file It contains the list of the spy nodes that is the nodes for which the value of the transient solution is printed tracer data This file is read only if tracer 1 in the datin file It contains the list of nodes in which the scalar is not zero together with the corresponding value and the list of nodes in which the sources are located together with the value of the sources 68 15 2 Output Output on Video The output on the standard output of SWEET is printed every video iterations The value of video is set in the datin file The standard output of SWEET is the screen for the sequential code and the file proc0 output for the parallel code A typical output can be as the following TIME 20 sec STEP NUMBER 1 S
88. stimated true error of the discharge module as a function of the integration time solutions fine0 finel fine2 adapt Figure 30 Average estimated true error of the elevation as a function of the integration time solutions fine and adapt 59 Figure 31 Number of P1 nodes as a function of integration time in the adapt case ron name CDI ID CIO BA Table 2 Average maximum and average estimated true error of elevation module of discharge q and of velocity v NV is the number of P1 nodes in each of the meshes used in the adaption case NV is the average number during all the integration Only solutions after the first six hours are used in the temporal averages of the adapt solutions are between those of the other two Another indication that the adaption procedure is working correctly can be deduced from Fig 31 the number of P1 nodes as a function of the integration time is proportional to the error see also Figs 29 30 The estimated true error of the elevation has instead a more complicated course every time the mesh is changed and the interpolation procedure is executed a high frequency component of unphysical noise is introduced in the solution This has been also verified performing a run during which at fixed time intervals of 13 minutes we interpolate the discharge over P2 nodes without changi
89. subdivided into four similar triangles sons joining midpoints of the edges of the father The number of edges and triangles increases for a factor four and the local length of triangles is halved 3 Re meshing m methods to produce a highest quality triangulation creation de struction and repositioning of the nodes is allowed This is a more general procedure but also heavier from a computational point of view The choice of the more suitable mesh adaption procedure depends on the problem at the hand For example regularity of the element size length and shape can be a requirement or not In compressible flow simulations very stretched elements are useful because where shocks are located the flow direction is strongly biased Adapted grids for these problems are in general very irregular both in side length and in shape 22 The more appropriate strategy for these kind of problems seems to be the h method 2 1 In the present paper we chosed to use the refinement technique 2 2 of the h method Main reason for this choice is that the h methods keep the information on the original grid unchanged When a new node is added in the middle of an edge bathymetry velocity and water elevation in it are obtained by interpolating the values in the element In this way the original values of the bathymetry and of the physical unknowns in the initial mesh are never abandoned and the degradation of the solution due to the interpolation is minimized
90. t the wall The turbulent kinetic energy is essentially produced at the enlargement of the channel because of the transversal stresses in the fluid In such region is the maximum of turbulent kinetic energy which is about 0 03 square meters per square second The poor performance showed by the k model is in accordance with the well known inability of the method to simulate separated flows he N N ENNIO VA A A SAVVY PAVIA AIAIAIAININININININININA A AVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVAVANINNINISIN IN UNNI UNIS NO VV VANISSA TARRA RARI SR ASB ROKK LALA VAIO RE Sar p X Axis A By gt Z AZ O 4 N Vv Y Axis Figure 15 Prismatic channel with an abrupt enlargement computational mesh 11 5 Automatic Mesh Adaption steady state case To investigate numerically the performance of the mesh adaption strategy outlined above it has been chosen a test case involving several characteristic features ot shallow water flow In fig 21 may be seen the geometry of the channel and the initial computational mesh used for the present calculations The test is designed to collect in a sketchy fashion a number 43 elevation X Axis Figure 16 Prismatic channel with an abrupt enlargement water elevation velocity Figure 17 Prismatic channel with an abrupt enlargement velocity field 44 kappa Y Axis o 5 C X Axis Figure 18 Prismatic channel with an abrupt enlargeme
91. tep 2 residual 478D 06 iterations 62 Step 3 residual 958D 06 iterations 138 residual 760D 06 iterations 103 Error Max 0 543D 03 Min 0 271D 03 Average 0 520D 03 xires 7783D 02 qxres 8256D 01 qyres 5788D 01 vxmax 2823D 01 vymax 2107D 01 qxmax 2264D 00 qymax 1689D 00 cflmax 6588D 01 frmax 3184D 02 18 787 154 0 000 0 000 0 000 0 000 0 000 0 000 0 000 0 000 0 000 0 000 18 9417 The meaning of the displayed control parameters is as follows residual is the maximum of the absolute values of the residual it is always lower than the desired tol value iterations is the number of iterations of the linear solver which have been necessary to reach the desired residual Error Max Min Average are the maximum the minimum and the mean value obtained from the chosen error estimator xires maximum value of the magnitude of the difference between the new elevation and the one computed at the previous time step qxres maximum value of the magnitude of the difference between the new x component of the unit width discharge and the one computed at the previous time step qyres maximum value of the magnitude of the difference between the new y component of the unit width discharge and the one computed at the previous time step 69 vxmax maximum value of the magnitude of x component of the velocity vymax maximum value of the magnitude of y component of the velocity qxmax maximum value of the magnitude of x component of the unit widt
92. the elevation of the water in the refining coarsening stage we can use the estimate 81 where the right hand side has to be calculated using the computed solution un The error estimate 81 suggests to define the following non dimensional error indicator br dn ag Cm g max Dae o dQ 82 where Q is the computational domain and y is the k th linear basis function 2 From a more physical point of view considering the whole shallow water equations system it can be foreseen that there are typical behaviors of the flowfield that the error indicator e could not be able to detect properly For instance shear layers between 30 parallel velocities as may happen at inflow of branches into a channel could not be detected by the indicator 82 To this aim it would be more useful to use an indicator of the gradients of the solution depending on the velocity magnitude Extending in a heuristic way the definition 82 to the velocity field we propose di by Ci Tj do 83 2 m MAX v am max f where v is the magnitude of the velocity v 3 An important feature that a numerical scheme for shallow water flow should possess is mass conservation This property is accomplished by the scheme described above as the discrete equations are obtained from the continuity equations and are then consistent with it However the finite element scheme illustrated above does not ensure mass conservation in a finite volume cell ce
93. the flexibility ensured by unstructured grids is a remarkable advantage of the finite element technique when dealing with complex geometries as it is often the case in environmental flows and it makes the effort worthwhile The first step for a domain decomposition approach is the partition of the computa tional domain into a given number of sub domains The specifications characterizing such partitioning should be 20 1 Minimization of the number of neighbors for each sub domain 2 Minimization of the number of nodal values at interfaces between sub domains 3 Balancing the size i e the number of nodes of each sub domain this will result in a balancing of the computational load between the different processors To perform the partitioning we have tested three software packages Metis 11 Chaco 9 and TopDomDec 10 which implement several algorithms The interested reader may consult the given bibliography for details Our parallel procedure requires that the sub domains partially overlap each other This feature has been obtained by developing an ad hoc software 8 2 Lagrangian Integration of the Convective Term The Lagrangian integration of the convective term requires the calculation of the pathlines and the evaluation of the velocity where the pathline falls On an unstructured grid the major computational cost consists in recognizing which elements are crossed by the pathline In practice this operation requires for eac
94. ti 1 computes the versors normal to the open boundary initializes the output files for time dependent problems initializes the unknowns initializes the finite element machinery initializes the stiffness matrix builds the list of the nodes that belong to regions of the open boundary characterized by different values of the index inod builds the topological lists useful to recover the pathline when performing the lagrangian integration reads the input Fourier amplitudes and frequencies of the elevation at the open boundary computes the elevation to be imposed as boundary condition computes the momentum transport by the lagrangian method evaluates the value of the velocity in a given point of the element 57 Turbulence Model KEPS_IMPL KEPS_WALL KLOCSTIF K_RHS KEPS_DIRBOUND EPSLOCSTIF EPS_RHS Step 2 STEP2 QILOCSTIFF QKNOWN Step 2 lumping STEP2L QKNOWNL Step 3 elevation STEP3XI XILOCSTIFF XIKNOWNTERM XIDIRBOUND Step3 discharge STEP3Q Q2LOCSTIFF ELEVGRAD WALLNOSLIP WALLSLIP QDIRB the k e turbulence model module boundary wall function for the k local stiffness matrix for the k rhs for k boundary condition for k and local stiffness matrix for r h s for the Step 2 of the fractional step procedure in the case of consistent mass matrix see eq 50 computes the local stiffness matrix for discharge at Step 2 computes the right hand side in the equation for discharge at St
95. time a new node is created an efficient search of the coordinates of this node is made between the coordinates of the nodes of the old mesh If the new node was already in the old mesh then the old values of the fields are used otherwise the proper interpolation is executed quadratic interpolation for velocity and linear for the other fields Follows a summary of the principal data structures used in the adaption module VVER MAXAD j elements adjacent to vertex j VSID 2 j ending points of side j VSID 1 j lt VSID 2 j PVSID j pointer to the the first side on VSID which has vertex j as first extreme SEL 3 sides of the element j EEL 3 elements adjacent to the element j HISTORY tree where history of the grid is stored E_ID j identifier of the element j ETYPE j level of refinement of the element j if ETYPE j lt 0 the element is green 15 Sequential Input and Output In the SWEET package it has been added an example application that is thought to be useful for starting up in using the code In this Section we comment on which are the input and output files that are necessary to run SWEET and which is their format We suggest to read this Section comparing to what is written in the given example and then running the example itself 15 1 Input The input files read by the code SWEET are datin It is an ASCII file where are stored the informations concerning the different options that are to be run in the code The content of th
96. tions at every time step in order to decouple the physical contributions The discretization in time of the system 47 48 leads to the following equations to be solved Step 1 v q h vati yoX 49 Step 2 qtte hry t 3 n 2 3 n 1 3 n 2 3 q lq _ qn 1 3 n 1 3 n 1 3 Step 3 n 2 3 gig Agghrve nr rd ad 51 Phar Maye 2 52 The symbol v oX indicates the value of the velocity obtained by a Lagrangian integration using the method discussed in Section 5 1 At the third step the equations 51 and 52 are decoupled by subtracting the divergence of 51 from 52 One then solves the following Helmholtz type equation n 2 3 ent At V gP Vert ALV 2 ent g 53 q7 3 At qh HAV e 54 14 This new elevation is then used to solve equation 51 The spatial discretization of equations 50 52 is based on the Galerkin finite element method the basic theory of the Galerkin approach may be found for example in 1 3 and in reference 5 which treats the SWE The weak formulation of equations 49 52 is accomplished in a standard way and is not shown here An important aspect of the spatial discretization of equations 50 53 is that two different spaces of representation have been used for the unknowns the elevation is interpolated by P1 functions whilst the unit width discharge is interpolated by P2 functions As usual P1 is the set of piecewise linear functions on triangles an
97. tor times vector operations are performed in parallel on the distributed set of data Again the communication scheme has been designed to avoid blocking patterns The interprocessor communications doesn t make any use of pre defined collective message passing instructions but only the base instructions send and recv of the PVM 17 message passing library In this way the code portability is assured and its performance is reproducible As discussed in the previous Section the CG is preconditioned by an additive Schwarz algorithm To exploit all its capabilities we decided to have the number of sub domains for the Schwarz algorithm be independent from the number of processors We have thus introduced a second level of subdivision into domains so that several sub domains can be assigned to a single processor We first partition the domain into a number of portions N equal to the number of processors then each portion is further subdivided into the final sub domain pattern A schematic representation of this two levels partition is presented in Figure 3 26 SNIVWOddNS HO THAHT LSU BI OVERLAP REGIONS SNIVINOCANAS HO THAHT UNODHS communicatio communications Figure 3 The two level of partitions used in the code we imagine a run over eight processors and thus a first level partition of the global domain in eight sub domains The detail on the two first sub domains shows how every sub domain is further p
98. usually so small that the diffusive term can be treated explicitly without resulting in any additional unphysical constraints On the other hand when considering equation 51 for given 1 the equation is explicit The computational effort required by this scheme for the solution of algebraic systems therefore consists of the inversion of one symmetric matrix with size coinciding with the number of P1 nodes 5 1 Lagrangian Scheme for the Convective Terms At Step 1 the advective part of the momentum equation is integrated by a Lagrangian scheme 6 7 Rewriting the convective terms of equation 47 in Lagrangian form results in the solution of two coupled ordinary differential equations dv X t 1 lt 0 55 dX 7 VOR t 56 The curve X t is the characteristic line and its slope is the velocity itself so that at this stage it coincides with the pathline The velocity field plays a double role it is the unknown to be determined as well as the slope of the characteristic curve As we are interested in computing the solution at the nodes of the mesh let us consider the node with coordinates y The initial condition associated with equation 56 must be X N y 57 To integrate equation 56 we need to know the slope of the characteristic curve at y at time which i unfortunately is the unknown velocity itself Therefore the slope of the characteristic line has to be approximated in some way for instance by a
99. ve along the coordinate j of the k th P1 basis function in the element i derivative along the j th coordinate of the k th basis function of the element i evaluated in the quadrature node integral of the product of the phi01 i phi02 j shape functions divided by the area P1 basis function i on the reference triangle evaluated in the quadrature node j P1 mass matrix contribution of the element i in column j and row k P2 mass matrix contribution of the element i in column j and row k local mass matrix component k of the versors normal to the boundary node j lumped mass matrix in node j total depth of the water in node j elevation of the water in node j elevation of the water in node j at the previous time step elevation to be imposed on the j th part of the boundary component k of the velocity in the node jJ components of the unit width discharge in the node j 62 qxold j qyold j components of the unit width discharge in the node j at the previous time step qxhalf j qyhalf j components of the unit width discharge in the node jJ at Step 2 of the previous time step he i average bottom depth in element i inod j type of boundary of the node j the convention is as follows 0 internal node 10001 wall 10002 discharge imposed 2000x x 3 4 elevation imposed 3000x x 3 4 corners between wall and elevation imposed boundary velocity is zero but elevation is assigned e j error estimate of the numerical solution in t
100. y algebraic procedure and it does not require to build a coarse triangulation There are no theoretical results concerning this operator We have therefore resorted to numerical investigations to test its effectiveness in Section 11 3 some computational results are shown and the performance of this algebraic coarse grid operator is discussed After these details it clear that the choice of this Schwarz preconditioner has been guided by its intrinsic parallelism and its ability to ensure when the coarse grid correction is used a behavior independent on the number of sub domains and thus on the number of processors used in the calculation 8 6 Parallelization of the k model As explained in Section 6 the equations for the k and quantities are solved using the same numerical methods for the integration of the advective and diffusive terms of the equation for the momentum that is a lagrangian integration for the advective terms and a conjugate gradient to solve the linear system arising from a semi implicit formulation of the diffusive term Thus no new algorithm are introduced in the code and the parallelization of the k e model follows the same strategy stated above 25 8 7 Parallelization of the Transport of the Scalar Tracer Exactly the same considerations given above hold for the advection diffusion equation for the passive tracer 9 Parallelization Implementation Details Even if the sub domains overlap each other
101. y and in the practical applications this may not be the case To overcome this difficulty we relax the original requirement and at this step we impose a Neumann condition on the part of the boundary where the value of the elevation is not originally given In the test cases one can observe that this procedure works well in practice In the integration of the weak formulation of eqs 50 and 51 the lumping technique has been adopted for the mass matrices of q By the term mass lumping we intend the use of a low order quadrature formula for the evaluation of the integrals involving the non differential terms yielding a diagonal stiffness mass matrix It is well known that for P2 elements a nontrivial diagonalization has to be performed as may be the case of P1 otherwise a singular matrix is recovered see appendix 8 of reference 5 This difficulty has been overcome in the following way each triangle of the mesh is divided into four parts by connecting the midpoints of the sides it is then possible to use the three vertex points rule on each subtriangle The total integral is then the sum of the subintegrals and automatically leads to a diagonal mass matrix A possible objection to this approach is that the mass lumping technique is known 15 to produce large phase errors for unsteady problems which are precisely the ones we are interested in However at Step 2 no wave type phenomena are involved and the dissipation coefficient is
102. zero order extrapolation in time Assuming the use of a second order Runge Kutta scheme to integrate equation 56 the algorithm is as follows At X y Vv 58 X t y Atv X 59 and equations 55 immediately give vy v XE v X t 60 16 As the Lagrangian integration requires the primitive form of the equations the fourth term that appears in the left hand side of 51 has been added to ensure consistency with equation 47 which is written in conservative form We note that apart from this term the discrete counterpart of equations 49 52 requires the inversion of symmetric matrices only However this consistency term is of minor relevance in all the flows in which the typical time scale is much larger than the time step as is the case of tides Therefore the usual Conjugate Gradient CG algorithm can be confidently used in this kind of simulation The Lagrangian discretization of the transport terms has many attractive features it avoids spurious oscillations arising due to the centered treatment without the inclusion of any unphysical viscosity coefficient and it eliminates any restriction on the time step However when using unstructured grids the pathline reconstruction which requires the knowledge of the element in which the foot of the pathline falls consists of a greater algorithmic effort than that on structured grids In practice this difficulty has been overcome in the code by defining an

User Manual Version 2.0 - P-arch

Contents

Download Pdf Manuals

Related Search

Related Contents