Home

User's Guide - Centro Científico Tecnológico CONICET Santa Fe

1. Element in processor 0 Element in processor 1 O Unknown in processor 0 Unknown in processor 1 Figure 26 IISD deccomposition by subdomains is associated to each node and no Dirichlet boundary conditions are imposed so that each node corresponds to one unknown We split the nodes unknowns in three disjoint subsets Li 2 and I such that the nodes in L are not connected to those in La i e they not share an element and then the FEM matrix elements A j and Aj with L and j La are null The matrix is split in blocks as follows E Ai Ar Ago 0 A dj 0 An 157 Arr Aor As _ Azo AIL As Now consider the system of equations Ax b 158 139 which is split as Ar XL ALrx1 bz 159 Ary xy Ar xy by Now consider eliminating xz from the first equation and replacing in the second so that we have an equation for x Arr Ary AT App Xr by Art AZ bL 160 Ax b We consider solving this system of equations by an iterative method such as GMRES for instance For such an iterative method we have only to specify how to compute the modified right hand side b and also how to compute the matrix vector product y Ax Computing the matrix product involves the following steps 1 Compute y Ay x 2 Compute w Ay x 3 Solve Az z w for z 4 Compute v Ay z 5 Addy y v involving three matrix products with matrices A
2. upgrading the pointer void pack char amp buff const Extracts the object from the buffer upgading the buffer 153 void unpack const char amp buff Print the object void print Then the user can load objects into the buffer and finally call the flushO method to dump the actual content of the buffer to the output The flush is equivalent to e sending all objects to the server deleting them from the original node e sorting them according to the operator lt defined and e calling print on all of them on the server The underlying container is a list so that you can manipulate it with the standard list ac cessors but the ideal is to push elements with push_back KeyedObject obj or pushing a clean object with push_back and then access the elements with back Note that you must implement the copy constructor for the KeyedObject class Objects with the same key are not overwritten i e if several elements with the same key are loaded on the same or differente processors then all of them are printed to the output As the sorting algorithm used is stable objects with the same key loaded in the same processor remain in the same order as were entered Typical usage is as follows include lt src syncbuff h gt class KeyedObject define methods as declared above Es SYNC_BUFFER_FUNCTIONS KeyedObject int main SyncBuffer lt KeyedObject gt sb Insert objects sb push_back obj
3. Name of file where to read the nodes for the print some feature found in file ns cpp int RENORM_flag default 0 Flag for launching RENORM process found in file ns cpp int report_option_access default 1 Print after execution a report of the times a given option was accessed Useful for detecting if an option was used or not found in file ns cpp int report_total_liquid_volume default 0 Print total liquid volume for Level Set Method found in file ns cpp int reuse_mat default 0 Use fractional step or TET algorithm found in file ns cpp string save_file default outvector out The name of the file to save the state vector found in file ns cpp 67 string save_file_pattern default outvector d out The pattern to generate the file name to save in for the rotary save mechanism found in file ns cpp string save_file_some default outvsome out Name of file where to save node values for the print some feature found in file ns cpp int save_file_some_append default 1 Access mode to the some file If 0 rewind file If 1 append to previous results found in file ns cpp int solve_system default 1 Solve system before print_linear_system_and_stop found in file ns cpp string solver default petsc Type of solver May be iisd or petsc found in file ns cpp string solver_mom default petsc Type of solver for the projection and momentum steps fr
4. The aquifer element set is 2D linear triangle or quadrangle elements A per node property n represents the height of the aquifer bottom to a given datum The corresponding unknown for each node is the piezometric height or the level of the freatic surface at that point The equation for the aquifer integrated in the vertical direction is o Ot where Qag is the aquifer domain S is the storativity K is the hydraulic conductivity and Ga is a source term due to rain losses from streams or other aquifers S o n V K gt n V y Gas on Dag x 0 tl 22 6 10 Surface Flow 6 10 1 2D Saint Venant Model The stream element set represents a 2D or 1D stream of water It has its own nodes sep arated from the aquifer nodes whose coordinates must coincide with some corresponding node in the aquifer A constant per node field represents the stream bottom height hy with reference to the datum That is why normally we have two coordinates and the stream bottom height for each node The equations for the 2D Saint Venant open channel flow are the well known mass and momentum conservation equations integrated in the vertical direction If we write this equations in the conservation matrix form we have OU F U OE e On G U i 1 3 on Qs x 0 4 23 where Qst is the stream domain U h hw hv is the state vector and the advective flux functions in eq 23 are h2 F U hw hw g hwv 2 24 h
5. int row_indx col_indx Elementary matrix size nen nen double Ae I aas define row_indx col_indx and fill Ae for int j 0 j lt nen j for int k 0 k lt nen k ierr MatSetValue A row_int j col_indx k Ae j nen k ADD_VALUES BLOCK UPLOADING Ii ens ierr same stuff as before MatSetValues A nen row_int j nen col_indx k Ae ADD_VALUES In PETSc FEM the computed elemental matrices can be uploaded in the global matri ces with both methods as selected with the block_uploading global option set to 1 by 143 default i e use block uploading However in some cases block uploading can be actually slower due to the use of masks A mask is a matrix of the same size as the elemental matrix with 0 s or 1 s indicating whether some coefficients are structurally i e not for a particular state null Moreover the mask do not depends on the particular element but it is rather a property of the terms being evaluated in the Jacobian For instance for the Navier Stokes equations the Galerkin term only has non null coef ficients for the velocity unknowns in the continuity equation while the pressure gradient term only has coefficients for the pressure unknowns in the momentum equations Both terms together have a mask as shown in figure 29 When the application writer codes such a term he defines the mask At the moment of uploading the elements if block_uploading is in
6. Azz and Az and to solve the system with Azz As the matrix Azz has no elements connecting unknowns in different processors the solution system may be computed very efficiently in parallel 16 2 1 Interface preconditioning To improve convergence of the interface problem 160 some preconditioning can be in troduced As the interface matrix is never built true Jacobi preconditioning i e using the diagonal part of the Schur complement P diag A where P is the preconditioning matrix can not be used However we can use the diagonal part of the interface matrix P diag Azz matrix or even the whole interface matrix P Azz Using the diagonal preconditining helps to reduce bad conditioning due to refinement and inter equation bad scaling If the whole interface matrix is used P Ay7 then the linear system A w x has to be solved at each iteration This can be done with a direct solver or iteratively In the 2D case the connectivity of the interface matrix is 1D or 1D like so that the di rect option is possible However in 3D the connectiviy is 2D like and the direct solver is much more expensive In addition the interface matrix is scattered among all processors The iterative solution is much more appealing since the fact that the matrix is scattered among processors is not a problem In addition the interface matrix is usually diagonally dominant even when the whole matrix is not For instance for the Laplace equation in 2D the st
7. F2 U hv hwv hv Iz where h is the height of the water in the channel with respect to the channel bottom u w v is the velocity vector and g is the acceleration due to gravity As in eq 22 G represents the gain or loss of the river the source term is G U Gs gh Sox Sfx fehu Cyoalo gh Soy Sty fehw Croll 25 where So is the bottom slope and Sj is the slope friction 1 1 Sta Gr ola a Gan Ch ezy model m 2 2 Sfo ola S fy vgl Manning model where C and n the Manning roughness are model constants Generally the effect of coriolis force related to the coriolis factor fe must be taken in account in the case of great 46 lakes wide rivers and estuaries The coriolis factor is given by fe 2wsinw where w is the rotational rate of the earth and y is the latitude of the area under study The free surface stresses in eq 25 are expressed as the product between a friction coefficient and a quadratic form of the wind velocity w w wy and Ce ts Pair 27 p where Cg 1 25x10 014 if w lt 1 m s Cm 0 5x10 801 if1 m s lt w lt 15 m s 28 Cy 2 6x10 if w gt 15 m s 6 10 2 1D Saint Venant Model When velocity variations on the channel cross section are neglected the flow can be treated as one dimensional The equations of mass and momentum conservation on a variable cross sectional stream in conservation form are OA s t 4 OQ A s t
8. The following options apply to all the modules 4 4 1 Read mesh options This options are read in the read_mesh routine e int additional_iprops default 0 int additional properties used by the element routine found in file readmesh cpp e int additional_props default 0 Additional properties used by the element routine found in file readmesh cpp e int check_dofmap_id default 0 Checks that the idmap has been correctly generated found in file readmesh cpp e int debug_element_partitioning default 0 Prints element partitioning found in file readmesh cpp e int local_store default 0 Defines a locker for each element found in file readmesh cpp e int max_partgraph_vertices default INF Maximum number of vertices admissible while computing the partitioning graph found in file readmesh cpp 22 e int ncore default 0 For multi core architectures this represent the number of cores per node found in file readmesh cpp e string partitioning_method default metis Set partitioning method May be set to metis hitchhiking nearest_neighbor or random found in file readmesh cpp e int print_dofmap_id default 0 Prints the dofmap idmap object found in file readmesh cpp e int print_hostnames default 0 Print hostnames for nodes participating in this run found in file readmesh cpp e int print_partitioning statistics default 0 Print graph statistics fo
9. gt __END_HASH__ 1342 lt more connectivities here gt __END__ELEMSET__ elemset nsi_tet 4 name volume_elemset_2 _table_include volume_elemset_1 includes properties from the previous elemset __END_HASH__ 5 4 76 45 lt more connectivities here gt __END__ELEMSET__ 30 5 3 The global table If one table is called global_options then all other hash tables inherit their properties without needing to explicitly include it For instance in this case table global_options viscosity 0 023 tau_fac 0 5 __END_HASH__ elemset nsi_tet 4 name elemset_1 __END_HASH__ 5 4 76 45 lt more connectivities here gt __END__ELEMSET__ the elemset elemset_1 gets a value for viscosity from the global hash table of 0 023 The global_options table may include other tables as well Note In previous versions of the code the table keyword may be omitted for the global_options table For instance the previous example can be entered as global_options viscosity 0 023 tau_fac 0 5 __END_HASH__ This usage is obsolete and will be deprecated 5 4 Reading strings directly from the hash table Once inside the element routine values can be retrieved with routines from the TextHashTable class typically get_entry for instance char geom thash gt get_entry geometry geom this returns a pointer to the internal value string geom this is documented with the TextHashTable class You can then read values from it with ro
10. lt lt EOT feature and then inserting in the appropriate places for instance lt common_options lt lt EOT option1 valuel option2 value 2 EOT gt elemset typel 4 props 18 lt common_options gt option3 value3 __END_ELEMSET__ elemset type2 3 props lt common_options gt option4 value4 __END_ELEMSET__ that expands to elemset typel 4 props option1 valuel option2 value 2 option3 value3 __END_ELEMSET__ elemset type2 3 props option1 valuel option2 value 2 option4 value4 __END_ELEMSET__ Note the use of the underscore just before the gt terminator in the first ePerl block This tells to ePerl not to include the semicolon terminator see the ePerl documentation for further details The terminator EOT stands for End Of Text and may be replaced by any similar string It must appear in a line by itself at the end of the text to be included 19 4 3 4 Conditional processing ePerl allows conditional processing as with the C preprocessor with if else endif constructs as in lt fmethod iterative gt if method eq iterative maxits 100 tolerance le 2 else maxits 1 tolerance 1le 10 endif expands to maxits 100 tolerance 1le 2 Also lines starting with c are dicarded as comments Conditional preprocessing and comments are enabled with the P flag so that make sure you have this flag enabled when preprocessing the ep1 file probably
11. op_prev_2 branch point ar ce op_0 1 op_1_1 op_e_1 y y y op_0 2 op_1_2 op_e 2 we AAA Vers 5 Loni y _ 1 f 1 1 I pl 1 AE Loma Ta 1 l l I i y y Y op 0 N op_1_N op_e_N eee leave branch lt y op_pos_1 op_pos_2 AA AAN op_pos_N Figure 10 Cache list produced when branch 0 is chosen op_1_1 op_1_2 conditional block for branch 1 op_1_N else FastMat2 choose lt N gt op_e_1 op_e_2 conditional block for the else branch op_e_N FastMat2 leave op_prev_1 operations posterior to the branch op_prev_2 op_prev_k The branch call tells the library that several branchs will start from there and a branch point is created Then each conditional block code must start with choose lt j gt 89 op_prev_1 y op_prev_2 op_pos_N Figure 11 Cache list produced when branch 0 is chosen where lt j gt is a number that must be unique among all other branches Finally when leav ing all the branches we must call leave in order to tell the library that the mainstream of the cache list must be retaken Branches can be nested at any level The call to branching is not needed if the execution path is the same for all executions of the loop
12. In this way the storage needed is two integers per coefficient with an overhead of two integers per row for the terminator and there is only one dynamic array growing at the same time To insert a new coefficient say i j we traverse the i th row checking whether the j coefficient is already present or not and if the terminator is found j is inserted in its place pointinf to the new terminator which is placed at the end of the dynamic array 16 The PFMat class The PFMat class is a matrix class that acts either as a wrapper to the PETSc Mat or to other representations of matrix solvers Currently there is the basic PETSc class named PETScMat and a class called IISDMat for Interface Iterative Sub domain Di rect method that has a special solver that solves the linear system by solving iteratively over the interface nodes while solving with a direct method in the sub domain interiors this is commonly referred as the Domain Decomposition Method 16 1 The PFMat abstract interface The create member should create the matrix from the matrix profile da and the dofmap For the PETScMat matrix class it calls the old compute_prof routine calculating the d_nnz and o_nnz arrays and calling the MatCreate routine For the IISD matrix it has to determine which dofs are in the local blocks and create the appropriate numbering The set_value member is equivalent to the MatSet Values routine and allows to enter values
13. Ot Os G 4 1 Q 1 Q aaa AGO AGH oe Oh Co 8 4 egy a O E on Oa x 0 1 29 where A is the cross sectional area Q is the discharge Gs s t represents the gain or loss of the stream i e the lateral inflow per unit length of channel s is the arc length along the channel v Q A the average velocity in s direction v the velocity component in s direction of lateral flow from tributaries the Boussinesq coefficient 8 f u2dA u the flow velocity at a point and a the wind direction measured from a positive line tangent to s in flow direction The bottom shear stresses are approximated by using the Chezy or Manning equations P h S f x at Che zy model 30 h 48 S ny y le Manning model a AB h y where P is the wetted perimeter of the channel and a is a conversion factor a 1 for metric units 6 10 3 Kinematic Wave Model When friction and gravity effects dominate over inertia and pressure forces and if we neglect the stress due to wind blowing and the coriolis term the momentum equation becomes S Sf 31 47 and eq 29 0A h _ IQA Ot Os where A depends through the geometry of the channel on the channel water height h The flow rate Q under the KWM model is only a function of A through the friction law Q yA 33 where y Cp S1 2 P 1 and m 3 for the Ch zy friction model and y an7 s1 2 p 2 3 and m 5 3 for the Manning model S dhy ds is the slope of the s
14. These routines allow to convert matrices from or to arrays of doubles and Newmat matrices e FastMat2 set const Matrix amp A Copies to argument from Newmat matrix e FastMat2 set const double a Copy from array of doubles e const FastMat2 export double a const exports to a double vector e FastMat2 export double a exports to a double vector e const FastMat2 export Matrix amp A const Exports to a Newmat matrix e const FastMat2 export Matrix amp A const Exports to a Newmat matrix 9 4 6 Static cache operations These routines control the use of the cache list e static void activate_cache FastMatCacheList cache_list_ NULL Acti vates use of the cache e static void Deactivates use of the cache e static void reset_cache void Resets the cache e static void void_cache void Voids the cache e static void branch void Creates a branch point e static void choose const int j Follows a branch e static void leave void Leaves the current branch e static double operation_count void Computes the total number of operations in the cache list e static void print_count_statistics Print statistics about the number of operations of each type in the current cache list 10 Hooks Hooks are functions that you pass to the program and then are executed at particular points called hook launching points in the execution of the program The particular hook launching points may depend on the application but in order t
15. gt assures the translation when needed 4 3 7 ePerlini library Some useful constants and functions are found in the file eperlini pl This may be included in the user data file with the following line lt require eperlini pl gt Initializes ePerl It includes a definition for PI r trigonometric and hyperbolic functions and others A common mistake when using preprocessing packages like ePerl is to edit manually the preprocessed file dep1 instead to edit the epl file In order to avoid this we write protect the dep1 file for instance the section in the Makefile is replaced by depl epl if e 0 then chmod w 0 rm 0 fi eperl P lt gt 0 chmod w 0 In addition inclusion of the eperlini pl library inserts the following comment in the included file 21 DON T EDIT MANUALLY THIS FILE This files automatically generated by ePerl from the corresponding epl file 4 3 8 Errors in ePerl processing If the preprocessing stage with ePerl gives some error on STDERR preprocessing is stopped no ePerl output is given For instance if you include a directive like lt atanh 2 gt the output looks like ePerl Error Perl runtime error interpreter rc 255 Contents of STDERR channel atanh argument x must be x lt 1 In such a case you have to fix the ePerl commands prior to any further debugging of the PETSc FEM run 4 4 General options
16. one can be used to make some initialization normally it is called only once The last one clear_fun is called after all calls to eval_fun For simple functions you may not need the init and clear functions for instance if you want a linear ramp from 0 at t 0to 1 at t 1 then it can be done with a simple file funO cpp this File fun0 cpp include lt math h gt include lt src ampli h gt INIT_FUN EVAL_FUN if t lt 0 return 0 else if t gt 1 return 1 else return t CLEAR_FUN You then have to do a 132 make fun0 efn that will create the fun0 efn shared object file Dynamically loaded functions can be used using the fixa_amplitude dl_generic clause and then giving the name of the file with the ext_filename option For instance lt previous lines here gt fixa_amplitude dl_generic ext_filename fun0 efn __END_HASH__ 1 1 1 lt more node dof val combinations here gt __END_FIXA__ You can also have dynamically loaded functions that use parameters loaded via the table of options at run time For this you are passed the TextHashTable object entered by the user to the init function You then can read parameters from these but in order that the init function do anything useful it has to be able to pass data to the eval function Normally you define a struct or class to hold the data create it with new or malloc put the data in it and then pass the
17. we set u wA sin wt The analytic solution can be found easily The equation for v is Ov y with boundary conditions v 0 Aw sin wt and v L 0 The solution can be found by searching for solutions of the form y 107 Replacing in 106 we arrive at the characteristic equation iw v 108 whose solutions are A EE Na yw v E 109 Setting the boundary conditions the solution is P e 2 L _ e7A u L v x Imx e PE NINE ANT 110 In the example we set y 100 w 20007 6283 We choose to have 16 time steps in one period so that Dt 6 2x 1079 The resulting profile velocity is compared with the analytical one in figure 8 4s moving plate gt _ gt plate at rest gt u t l l l l l l l l l l J y SS Figure 7 Oscillating plates 75 Figure 8 Velocity profile for the oscilating plate problem 8 4 Linear advection diffusion in a rectangle File sine ep1 This is an example for testing the advdif module The governing equa tions are 0 0 tu 0 in0 lt 1 lt Lo ly lt o0 t gt 0 A cos wt cos ky atx 0 t gt 0 A ar ba at x Lz t gt 0 0 at 0 lt x lt Lz y lt t 0 As the problem is perdiodic in the y direction we can restrict the analysis to one quart wave length i e if Ay 27 k Ly A 4 then the above problem is equivalent to o o L 4 uc DAG in 0 lt g lt Lo 0 lt y lt Lyt gt O o A
18. 3 5 amplitude 2 3 etc Recall that each node field pair may depend on some free degrees of freedom those in U in 123 and some others fixed those in U For those fixed there is an array containing both an amplitude i e a double and a pointer to an Amplitude object If the condition don t depends on time then the pointer is null 14 8 3 How to add a new temporal function If none of the built in time dependent functions fit your needs then you can add you own temporal function Suppose you want a function of the form f 2 eee 155 ay 54 gt 0 Follow these steps 1 Give a name to it ending in _function We will use my_own_function in the following 2 Declare it with a line like AmplitudeFunction my_own_function In the core PETSc FEM this is done in dofmap h 3 Write the definition You can find typical definitions in file tempfun cpp You can use the macro SGETOPTDEF lt type gt lt name gt lt default_value gt for retrieving values from the hash table defined by the user in the data file 4 Register it in the temporal function table with a call such as the following prefer ably after the call to Amplitude initialize_function_table in the main For instance Amplitude initialize_function_table Amplitude add_entry smooth_ramp amp smooth_ramp_function lt add this line In the core PETSc FEM this is done inside the Amplitude initialize_function_table 5 Use
19. 9 1 2 Current matrix views a k a masks 2 OLA Set Operan sacas a ae SY Eade ee haw a 0 1 4 Dimension matching sa se s sorata a ds pa ON aana Ae ew 9 1 5 Automatic dimensioning a sasssa csere ec reren os 9 1 6 Concatenation of operations e e e e 9 2 Caching the adresses used in the operations oaoa a e 9 2 1 The FastMat2 operation cache concept 9 2 2 Branching is not always needed a 9 2 3 Cache mismatch sa 62 eee A ee a eaa ee es 9 2 4 When a cache mismatch is produced 20 O25 Bianchi Arias ica ke Ge aoe aa AR a a 9 2 6 Debugging tools 1 oaa a 92 7 Multithreading reantrancy ecos ee ee ee es O23 Debugging FastMatl code na oi a ee Be es O20 Pasties tip occiso a ee a y A 9 3 An older version of cache structure 2 2 2 0 0002022 eee 94 Synopsis Of OPerablons ss sro som b gese we ee a we es 9 4 1 One to one operations 0000 ee ee ee O42 Ineplace operations sa siria eek wR ER Ee aa a a a 9 4 3 Generic sum operations sum over indices 9 4 4 Sum operations over all indices 0 0005 e 9 4 5 Export Import operations s ss gb espn reaa penea 9 4 6 Static cache operations lt s sa sasora cmane u eee eee 10 Hooks 10 1 Launching hooks The hook list i ia semsa ee eH eae wR eR eS 10 2 Dynamically loaded hooks gt sa s aa aoras a ah ma ap aana Ka ee eee Da Shell Hook copar ek ae ek ee
20. End 2005 Key CLUSTER CHILE Code C 13680 4 Nro 23 Title C lculo paralelo en problemas de mec nica computacional a trav s del uso de una red de computadores personales Director Dres Marcela Cruchaga Norberto Nigro Financing agency Programa de Colaboracion Cientifico Acad mica entre Argentina Brasil y Chile 2000 2001 C 13680 4 Fundaci n Andes Begin 2000 End 2001 157 15 14 13 12 11 10 Key PIP PAR Code PIP 02552 2000 Title Generaci n de recursos de c lculo paralelo para mec nica computacional Director V E Sonzogni Fi nancing agency CONICET Begin 2000 End 2002 Key CAI D Code CAI D 2000 43 Title Desarrollo de algoritmos para c lculo paralelo Director Victorio Sonzogni Financing agency UNL Begin 2000 End 2002 Key PROA Code PICT 6973 99 Title Desarrollos en Mec nica Com putacional utilizando t cnicas de PRogramaci n Avanzada Director Sergio Idelsohn Financing agency ANPCyT Begin 2000 End 2003 Key MELT Code PID 99 76 Title MELT Modelado de Emulsificaci n de metales en estado L quido y sus efectos Termomec nicos Director A Cardona Financing agency ANPCyT FONCyT Begin 2000 End 2002 Key FLAGS Code PID 99 74 Title FLAGS Simulaci n num rica en gran escala de la interrelaci n entre el FLujo de Aguas Superficiales y el FLujo de AGuas Subterr neas Director S Idelsohn Financing agency ANPCyT FONCyT Begin 2001 End 2004
21. If you want to issue more complex commands then perhaps it s a better idea to bundle them in a script and then execute the script from the hook 98 hook_list shell_hook hello hello my_script_hook where you have previously written a my_script_hook script with something like bin bash This is my_script_hook file echo Hello world inside Probably you want to perform some actions depending on which stage you are so that you can pass the stage name to the command by including a s output conversion token in the command For instance hello echo Hello world stage s Moreover you can have also the time step currently executing and the current simulation time by including a d and a f output conversions for instance hello echo Hello world stage s step 4d time tf The order is important That is the first argument is the step a C string the second the time step an integer and the last the simulation time a double In fact basically what PETSc FEM does is to build string with sprintf and then execute it with system like sprintf command your_command stage step time system command see the Glibc manual for more info about sprintf and system If you need for some reason to switch the order then use a parameter number like hello echo Hello world time 3 f step 2 d stage 1 s If you want to do some things depending on the stage then perhaps you can write something like this h
22. This usually happens when the condition refer to some global option that is uniform over all elements For instance if branch 0 corresponds to include turbulence model A and branch 0 to model B then the same branch is executed for all the elements and there is no need to call the static functions Loops executed a non constant number of times Another special case is when there are loops inside the body of the outer loop Note that no special branching is needed in general if the loop is executed a fixed number of times since the sequence of operations is not altered from one execution to another For instance consider the following piece of code Case A Inner code executed a fixed number of times for int k 0 k lt N k N very large Outer loop block_before for int 11 0 11 lt 3 11 inner_block Operations that act on the 90 same matrices block_after Then the cache list generated in the first execution of the loop will be block_before inner_block inner_block inner_block block_after and this is OK since the number of times inner_block is executed is always the same If the operations that are performed inside the loop are the the same for all executions of the loop but are executed an irregular number of times then we can use a sequence as follows Case B Inner code executed a variable number of times FastMatCachePosition cpl for int k 0 k lt N k N very large
23. X 1 compute if options amp COMP_SOURCE The source vector 6 6 1 Options General options 39 double Courant default 0 6 The Courant number found in file adv cpp double Dt default 0 Time step found in file adv cpp double atol default 1e 6 Absolute tolerance when solving a consistent matrix found in file adv cpp int auto_time_step default 1 Chooses automatically the time step from the selected Courant number found in file adv cpp int consistent_supg_matrix default 0 Uses consistent SUPG matrix for the temporal term or not found in file adv cpp double dtol default 1e 3 Divergence tolerance when solving a consistent matrix found in file adv cpp int local_time_step default 1 Chooses a time step that varies locally Only makes sense when looking for steady state solutions found in file adv cpp int maxits default 150 Maximum iterations when solving a consistent matrix found in file adv cpp int measure_performance default 0 Measure performance of the comp_mat_res jobinfo found in file adv cpp int nfile default 1 Sets the number of files in the rotary save mechanism see 7 2 found in file adv cpp int nrec default 1000000 Sets the number of states saved in a given file in the rotary save mechanism see 7 2 found in file adv cpp int nsave default 10 Save state vector frequency in steps found in file adv cpp int n
24. a rs CC FastMat2 leave c prod a b A OK a has not mask Efficiency As we mentioned before When caching is enabled there is a gain in speed of ten to one hundreth and the library is very performant Of course the first execution of loop is not cached and represents and overhead that has to be amortized by executing the loop in cached mode many times The average speed increases when the number of executions of the loop is increased The cut point i e the number of executions of the loop for which the excution speed falls to one half the speed obtained for very large number o execution is currently between 10 and 30 so that for loops larger than 200 the overhead time spent in building the caches is negligible Another issue is the memory required by the caches First there is some space required by the caches themselves and then there is a copy of the addresses of the elements involved For instance in a a set b operation with a and b of size n x m say we have to store 2mn addresses Usually this overhead in memory requirement is negligible since the amount of variables and operations needed in the element routines are very small as compared with the size of the problem itself However some care must be taken when caching large inner loops For instance in code A section 9 3 if the inner loop is executed a constant but very large number M of times then the amount of the cache required is proportional to
25. amp A const int m 0 Maximum over all selected indices e FastMat2 min_abs const FastMat2 amp A const int m 0 Min of absolute value over all selected indices e FastMat2 max_abs const FastMat2 amp A const int m 0 Max of absolute value over all selected indices 9 4 4 Sum operations over all indices When the sum is over all indices the resulting matrix has zero dimensions so that it is a scalar You can get this scalar by creating an auxiliar matrix with zero dimensions casting with operator double as in FastMat2 A 2 3 3 Z assign elements to A double a double Z sum A 1 1 or using the get function double a Z sum A 1 1 getQ without arguments which returns a double In addition there is for each of the previous mentioned generic sum function a companion function that sums over all indices The name of this function is obtained by appending _a11 to the generic function double a A sum_square_all The list of these functions is e double sum_al1 const Sum over all indices e double sum_square_all const Sum of squares over all indices e double sum_abs_al1 const Sum of absolute values over all indices e double min_all const Minimum over all indices e double max_all const Maximum over all indices e double min_abs_all const Minimum absolute value over all indices e double max_abs_al1 const Maximum absolute value over all indices 95 9 4 5 Export Import operations
26. y_wall_plus 71 164
27. 0002 eee eee 50 6 12 1 Linear absorbing boundary conditions 51 6 12 2 Riemann based absorbing boundary conditions 52 6 12 3 Absorbing boundary conditions based on last state 53 6 12 4 Finite element setup e 53 6 12 5 Extrapolation from interiors o e e mos 54 6 12 6 Avoiding extrapolation sa o aacra cresp sad tareuas 55 6 12 7 Flux functions with enthalpy ooo a 56 6 12 8 Absorbing boundary conditions available 57 6 12 9 Related Options gt cn rie be ea a aoe eb a a eb Re ES 59 6 12 10 Absorbing wall boundary conditions o 60 0 12 11 Related OpONS s s s aos sa Bok aa a a ea eR Re Go 61 The Navier Stokes module 62 71 LES implementation o o gt e aa noreste a ew a 62 GLI The wall elemet oc coa ec sed neado paai sar 62 7 1 2 The mixed type boundary condition o sosoo e e 62 7 13 The van Driest damping factor Programming notes 64 T AE aa si leas de Ae be a Bo ad ee O ee 65 Ta Mesh movement uk aok kor a a ta a ee cla 72 Tests and examples 74 8 1 Flow in the anular region between to cylinders 74 8 2 Flow in a square with periodic boundary conditions 74 8 3 The oscilating plate problem 0 020000 ee eee 74 8 4 Linear advection diffusion in a rectangle 2 204 76 9 The FastMat2 matrix class 91 e o A a a ee a a E a 9 L 1 Exampl e 2 2 au sopd a ade a ee eo ae ee
28. 2 nu 3 ndof 1 The unknown for these nodes is the height u of the stream free water surface with reference to the stream bottom The channel shape and friction model and coefficients are entered via properties described below If the stream level is above the freatic aquifer level hyt u gt then the stream losses water to the aquifer and vice versa The equation for the aquifer integrated in the verical direction is o z S 6 m 6 V K o n V4 Y Ga 16 where S is the storativity and G is a source term due to rain losses from streams or other aquifers The equation for the stream is according to the Kinematic Wave Model KWM ap proach OAU QAH e 1 Ot Os E ng 43 Where u is the unknown field that represents the height of the water in the channel with respect to the channel bottom as a function of time and a linear arc coordinate along the stream A is the transverse cross section of the stream and depends through the geometry of the channel on the channel water height u Q is the flow rate and under the KWM model is a function only of A through the friction law Q 7A 18 where y Cy S1 2 P 1 and m 3 for the Ch zy friction model and y an 91 2 p 2 3 and m 5 3 for the Manning model where S dhp ds is the slope of the stream bottom P is the wetted perimeter and Ch a and n are model constants G represent the gain or loss of the stream and the main component is the loss to the aqu
29. AR A i a ia 10 4 Shell hooks with make osasse corape cumga Poad a pe es 11 Gatherers and embedded gatherers 11 1 Dimensioning the values vector aooo e eee eee 11 2 Embedded gather rs 2 6 2 6 bi seaga a kam ES ao ae ka 11 3 Automatic computation of layer connectivities o 11 4 Passing element contributions as per element properties 11 5 Parallel aspetta o s nc ee koae a Rk RARA 11 6 Creating a gatherer 2 e 12 Generic load elemsets 12 1 Linear generic load elemset o o e eee eee 12 2 Functional extensions of the elemset 12 3 The flow reversal elemset e 12 4 Examples of use of flow reversal elemset 0 o 17 77 77 78 79 79 79 79 79 79 81 81 82 82 84 84 84 85 85 93 93 94 94 95 96 96 96 97 97 98 100 101 102 103 105 106 107 107 13 Visualization with DX 13 1 Asynchronous synchronous communication s ooo 13 2 Building and loading the ExtProglmport module 13 3 Inputs outputs of the ExtProglmport module 13 4 DX hook options 14 The idmap class 14 1 Permutation matrices 14 2 Permutation matrices in the FEM context s lt a na ssu csc Sariwa 14 3 A small example 14 4 Inversion of the map 14 5 Design and efficiency restrictions e eee 14 6 Implementation 14 7 Block matrices 14 7 1
30. Absorbing boundary conditions The basic concept about these b c s is to impose the in going components from outside reference values while leaving the outgoing com ponents free If w u v p is the state vector of the fluid at some node j on the outlet boundary see figure 16 then Vw u are the eigen components where V 119 is the change of basis matrix The absorbing b c may be written as u V 1 127 wow N w w w Where w is the in going component taken from the reference value wWye and the other components we are free i e they go in U Non linear Dirichlet boundary conditions In some cases Dirichlet boundary conditions are not expressed in terms of the state variables used in the computations but on a non linear combination of them instead For instance consider the transport of moisture and heat through a porous media and choose temperature T and moisture content H as the state variables On an external boundary we impose that the partial pressure of water should be equal to its external value Py Pw atm As the partial pressure of water which may be related to relative humidity is a complex non linear function of T and H through the sorption isotherms of the porous media and the saturation pressure of water it results in a non linear link of the form P T H Pwatm By the moment we consider only a linear relation since the non linear case doesn t fit in the representation 123 The non l
31. Example 14 8 Temporal dependent boundary conditions 14 8 1 Built in temporal functions 2 00 14 8 2 Implementation details 0 o o 14 8 3 How to add a new temporal function 14 8 4 Dynamically loaded amplitude functions 14 8 5 Use of prefixes 14 8 6 Time like proble nce se ia k eR eh ee ee oa Se 15 The compute_prof package 15 1 MPI matrices in PETSc 15 2 Profile determination 16 The PFMat class 16 1 The PPMat abstract interlace so c r coia c m o kea e 16 2 IISD solver 16 2 1 Interface preconditioning e 16 3 Implementation details of the IISD solver 16 4 Efficiency considerations 17 The DistMap class 17 1 Abstract interface 17 2 Implementation details 17 3 Mesh refinement 17 3 1 Symmetry group generator 2 2 4 66 ee a be aa ee 1142 Canonical ordering cer cr A a 17 4 Permutation tree 17 5 Canonical ordering 17 6 Object hashing 18 Synchronized buffer 18 1 A more specialized class 112 113 114 114 115 116 116 116 120 123 123 123 124 124 125 126 131 131 132 134 136 136 136 137 138 138 138 140 141 143 144 145 145 147 148 150 150 151 151 153 19 Authors 156 20 Grants received 157 21 Symbols and Acronyms 159 Sil ALTOS ca A a A eS 159 22 Symbols 159 1 LICENSE The PETSc
32. FILE KeyedLine output field of the KeyedLine class Also the static int KeyedLine print_keys flag controls whether the key number is printed at the beginning of the line or not For instance the following code sends output to the output dat file without key numbers 155 FILE out fopen output dat w KeyedLine output out KeyedLine print_keys 0 kbuff flush fclose out Also the member int KeyedOutputBuffer sort_by_key default 1 controls whether to sort by keys prior to printing then you can set kbuff sort_by_key 0 if you want to disable sorting Some notes regarding usage of this class are e You can have several SyncBuffer lt gt s or Keyed0utputBuffer s at the same time and you can flush them independently e Memory usage All items sent to the buffer with push are kept in memory in a temporary buffer When flush O is called all objects are sent to the master printed and al buffers are cleared So that you must guarantee space enough in memory for all this operations e Implementation details Data is sent from the nodes to the master with point to point MPI operations which is far more efficient than writing all nodes to a file via NFS Sorting of the objects by key in the master is done using the sort algorithm of the list lt gt STL container which is O N log N operations 19 Authors Mario A Storti CIMEC PETSc FEM kernel NS and AdvDif modules Fluid Structure
33. Groundwater flow In the previous section the equation that governs subsurface flow was established In order to obtain a well posed PDE problem initial and boundary con ditions must be superimposed on the flow domain and on its limits The initial condition for the groundwater problem is a constant hydraulic head in the whole region that obeys levels observed in the basin history Now consider a simply connected region Q bounded by a closed curve OQ such that 48 Ng U OQ U Ngo ON If the stream is partially penetrating and connected in a Hydraulic sense to the aquifer we set p o on Ng x 0 t K 9 ne 00 on 00 x 0 t 37 Klo m C 9 h on Ngo X 0 t where dy is a given water head oo is a given flux normal to the flux boundary 00 and C the conductance at the river stream interface If a fully penetrating stream is considered o Klo n C h on Msg x 0 4 38 Finally for a perched stream o Kl m Clhy h on Dgo x 0 1 39 Surface Flow Fluid Boundary We recall that the type of a flow in a stream or in an open channel depends on the value of the Froud number F u c where c gh is the wave celerity a flow is said e fluvial for u lt c e torrential for u gt c Saint Venant equations Considering a Cauchy problem for the time like variable m where the solution is given in the subspace 4M t 0 as U U x t 0 and is determined at subsequent values of
34. J 2 2 2 FastMatCacheList cache_list FastMat2 activate_cache amp cache_list Compute area of elements chrono start for int ie 0 ie lt nelem ie 4 FastMat2 reset_cache for int k 1 k lt 3 k 4 int node ICONE ie k 1 x ir 1 k set amp XNOD node 1 0 rsQ rsQ set x ir 1 2 rest x ir 1 1 9 pp YY o set x ir 1 3 b rest x ir 1 1 J ir 1i 1 set a ir 1 2 set b qa double area J rs det 2 total_area area 85 if ie 0 4 minarea area maxarea area if area gt maxarea maxarea area if area lt minarea minarea area printf total_area g min area g max area g ratio g n total_area minarea maxarea maxarea minarea printf Total area OK s n fabs total_area 1 lt 1e 8 YES NOT double cpu chrono elapsed FastMat2 print_count_statistics printf CPU g number of elements d n rate g sec Mel g Mflops n cpu nelem cpuxte6 nelen nelem FastMat2 operation_count cpu 1e6 FastMat2 void_cache FastMat2 deactivate_cache The typical use can be seen in the example shown in section 89 1 1 First we have to create a FastMatCacheList object as shown in line 3 and activate it with as in line 4 The outer loop here is the loop over elements For the second and following executions of the body of the loop you must rewind the cache list this is done in line 8 with the FastMat2 reset_c
35. Key GERMEN CFD Code PIP 0198 98 Title Germen CFD GEneraci n de Recursos b sicos para la aplicaci n de los M todos Num ricos en din mica de fluidos computacional Director M Storti Financing agency CONICET Begin 1999 End 2001 Key PEI CFD Code PEI 231 97 Title PEI 231 CONICET Dise o mec cnico asistido por CFD Director N Nigro Financing agency CONICET Begin 1998 End 1999 Key PEI NAVAL Code PEI 232 97 Title PEI Nro 232 CONICET M todos Num ricos en Hidrodin mica Naval y Costera Director M Storti Financing agency CONICET Begin 1998 End 1999 Key SINUS PIM B Title Proyecto Alpha de la Comisi n Europea Sinus Pim B Modelisation et Simulation Numeriques en Ingenierie Mecanique Director S R Idelsohn y V Ruas Univ Paris VI Laboratoire de modelisation en Mecanique Financing agency CONICET CEE Begin 1997 End 1999 Key CMES Title Proyecto Alpha de la Comisi n Europea CMES Computer Methods in Engineering Science Director S R Idelsohn y G Beer Institut fiir Baustatik Graz Austria Financing agency CONICET CEE Begin 1997 End 1999 Key TUCANO Title Proyecto Alpha de la Comisi n Europea TU CANO Transatlantic University Industry Cooperation Director S R Idelsohn y S Mac Neill Univ de Birmingham Financing agency CONICET CEE Begin 1997 End 1999 Key CONICET FNRS Title Convenio CONICET Fonds National de la Recherch
36. Outer loop block_before FastMat2 get_cache_position cp1 int n irand 1 5 for int 11 0 11 lt nm 11 4 FastMat2 jump_to cp1 inner_block Operations that act on the same matrices FastMat2 resync_was_cached block_after Here the number of times the inner block is executed may vary randomly from 1 to 5 irand m n returns an integer number randomly distributed between m and n The FastMatCachePosition class objects store the position of the actual computation in the cache list So that the call to jump_to at the start of the loop restarts the position in the cache to the desired one After leaving the loop we call to resync_was_cached in order to resync the cache list This is OK if the inner loop is executed at least once the in the first execution of the outer loop If it happens that in the first execution of the loop the inner loop is not entered then the cache list will contain block_before block_after and when the inner block will be entered in subsequent executions of the loop and error will arise since there will be missimng caches To fix this we have to combine this with branching as here FastMatCachePosition cpl for int k 0 k lt N k N very large Outer loop 91 pasi Previous block FastMat2 branch Allows conditional execution FastMat2 choose 0 FastMat2 get_cache_position cp1 n irand 0 5 if k 0 n 0 This is the critical case n 0 the first execution
37. can be done automatically see 11 3 e The size of the elements in the normal direction are not required to be equispaced i e the distances 1 6 and 6 11 are not required to be equal or similar This is important because normally the layers of nodes are refined towards the surface as shown in the figure e The lines of nodes are not required to be normal to the surface e However it is required that each row of nodes for instance the row 1 6 10 must lay on a smooth curve This is required since in the process of computing high normal derivatives a Taylor expansion is computed in terms of this curves so the error depends on the higher derivatives e g the curvature of the line The typical invocation is as follows elemset nsi_tet_les_full 4 geometry cartesian2d __END_HASH__ 127 6 6 7 12 11 2387 __END__ELEMSET__ elemset visc_force_integrator 6 geometry line2quad __END_HASH__ 126711 12 2 3 7 8 12 13 3489 13 14 __END__ELEMSET__ Note that the geometry is special line2quad means that the surface geometry are lines and the corresponding volume elemset is composed of quads The other two possibilities implemented so far are tri2prism and quad2hexa see figure 13 Typical invocation is as follows The typical invocation is as follows 104 tri2prism quad2hexa Figure 13 Embedded gatherer elements in 3D elemset visc_force_integrator 9 geometry tri2prism __END_HASH__ 1234567809 __END__ELEMSET__ and el
38. can be stored in another integer array iident n so that both Q and Q7 can be stored at the cost of n integer values for each Also both multiplication and inversion operations as y Qx y QT x x Q y and x Q 7 y can be done with n addressing operations 14 2 Permutation matrices in the FEM context Permutation matrices are very common in FEM applications for describing the relation between node field pairs and degrees of freedom abbrev dof s For instance consider a flow NS or Euler application in a mesh like that shown in figure 16 At each node we have a set of unknown fields both components of velocity and pressure u v and p In a first description we may arrange the vector of unknowns as ul U1 Pl u Gee is 122 UNnod UNnod PNnod 116 Figure 16 Node field to dof map Example for NS or Euler in a duct The length of this vector is Nnoq X Nao and this may be called the node field this accounts for the NF subindex description of the vector of unknowns However we can not take this vector as the vector of unknowns for actual computations due to a series of facts e Not all of them are true unknowns since some of them may be imposed to a given value by a Dirichlet boundary condition e There may be some constraints between them for instance in structural mechanics two material points linked by a rigid bar or a node that is constrained to move on a surface or line In CF
39. derivatives of the eigenvalues with respect to the node coor dinates which are the most expensive part can be computed analytically In this way we can compute the derivatives of the functional with the solution of only one eigenvalue problem The second derivatives can be computed similarly as 101 2 2 2 OF 0 F OA E OF OX 102 OFpityj OAgAr OL pi OL yg OAg OL pitos The first and second derivatives of F with respect to the eigenvalues are still computed numerically whilst the second derivatives of the eigenvalues can be computed by differen tiating numerically the first derivatives This amounts to O nen eigenvalue problems for computing the first and second derivatives of the distortion functional the residual and Jacobian This cost may be further reduced by noting that the eigenvalues are invariant under rotations and translations and simply scaled by a dilatation or contraction so that from the nen Ndim Ndim 1 displacements only Ndim Ndim 1 2 1 should be really computed but this is not implemented yet For tetras in 3D this implies a reduction from 12 distortion functional computation to only 5 We will show now how the derivatives of the eigenvalues are computed analytically It can be shown that zuk id Ed ane 103 73 Note that this is as differentiating 98 but only keeping in the right hand side the change in the matrix G and discarding the rate of change of the eigenvectors It can
40. effect then PETSc FEM computes the envelope of the mask i e the rectangular mask that contains the mask in order to make just one call to MatSetValues In this case the envelope is just a matrix filled with 1 s so that block uploading pays the benefit of us ing the faster MatSetValues routine with the cost of loading much more coefficients than the original mask In addition the PETSc matrix will be bigger with the corresponding increase in RAM demand and CPU time in computing factorizations IISD solver and matrix vector products PETSc solver In a future such a combination of terms will be loaded more efficiently with two calls to MatSetValues The conclusion is that if the terms to be loaded have a very sparse structure but a dense envelope then may be block uploading is slower The worst case is a diagonal like mask Note that also it s not sufficient to have a sparse structure of the elemental matrix but also the application writer has to compute and return the mask Finally note that you can always check whether block uploading is faster or slower by activating the time statistics for the elemset the report_consumed_time and run a large example with both kinds of uploading u u u p momentum eq pressure gradient ter x y z y SS o0 0 0 1 momentum A _ 0 0 0 1 momentum e mask i i 0 0 0 1 momentum 1___1___17 0 continuity us cont equation Galerkin term Figure 29 Example of mask fo
41. fact that all the indices 0 to 3 are present in the first column of the table above If we chose node 2 as the first index then we can chose any of the reamining as the second node 0 1 3 Once we choose the second index say for instance 1 there is only one possibility for the reamining two the numbering 3 1 0 2 The remaining possibility 3 1 2 0 would gnerate an inverted triangle Part of the tree is shown in 39 Every possible permutation is a path in the tree from the first node at level 0 to the last node which is a leave 150 root 1 s TN 41 a Pin ae a 1 G N O u N Figure 39 Tree representing all the permutations for the ordered tetra geometry Now given two possible node orderings for a given geometrical object and the tree that describes the permutations for its shape we simply have to follow the path that makes one of them to fit in the other If we arrive to an internal node without possibility to follow then the geometrical objects are distinct If we reach a leave then the objects are the same 17 5 Canonical ordering Another possibility to determine whether two node oderings are congruent is to have a uniquely determined ordering that can be computed from the node sequence itself We we call this ordering the canonical ordering for the geometrical object If this is possible then given two orderings we can bring both of them to the canonical ordering and then compare them
42. h C i that is lt Ctr1 h gt lt Ctrl i gt this is the key binding for info lookup symbol You will get in the minibuffer something like Describe symbol default print_internal_loop_conv If you press RET then you jump to the corresponding page in the info file From there you can browse the whole manual Pressing s Info search allows to search for text in the whole manual When done you can press q Info exit or x my Info bury and ki11 defined in tools petscfem el1 27 If you don t know exactly what option you are looking for then you can search in the manual or launch info lookup symbol with the start of the command and then use completion to finish writing the option If you want to copy some option from the info manual to the data file then you can use the usual keyboard or mouse copy and paste methods of Emacs Also pressing c my Info lookup copy keyword in the info manual copies the name of the currently visited option to the kill ring Then in the data file buffer press as usual C y yank to paste the last killed option If you paste several options from the manual then you can navigate between them by pasting the last with C y and then going back and forth in the kill ring with M y yank forw usually M stands for pressing the lt Alt gt or lt Escape gt key For more information see the Emacs manual specially the documentation for the info and info lookup modes 5 Text hash tables In many cases
43. h T Too e A nonlinear Newtonian cooling term q f T T 108 Figure 15 Generic load element double layer 12 1 Linear generic load elemset In the simplest case the load is of the form q hU q single layer 117 q h Ucut U q double layer ae This is implemented by the lin_gen_load elemset This elemset may be single layer or double layer see figures 14 and 15 Double layer elements can represent a lumped thermal resistance for instance a very thin layer of air inside a material of higher con ductivity In the double layer case the number of nodes is twice than in the single layer case Double layer is activated either by including the double_layer option or either if the number of nodes is twice that one specified by the geometry The options for the lin_gen_load elemset are e int double_layer default 0 Whether there is a double or single layer of nodes found in file genload cpp e string geometry default cartesian1d Type of element geometry to define Gauss Point data found in file genload cpp e double var_len hfilm_coeff default no default Defines coeffcients for the film flux function May be var_len 0 no AT driven load or var_len ndof ndof a full matrix of relating the flux with AU found in file linhff cpp e double var_len hfilm_source default no default Defines constant source term for the generic load on surfaces May be of length 0 null load or nd
44. implemented as mixed type e The van Driest damping factor introduces non localities in the sense that the tur bulent viscosity at a volume element depends on the state of the fluid at a wall 7 1 1 The wall elemset Wall boundary conditions have been implemented via a wall elemset This is a surface element that computes given the velocities at the nodes the tractions corresponding to this velocities for a given law of wall Also this shear velocities as stored internally in the element so that the volume elements can get them and compute the van Driest damping factor This requires to find for each volume element the nearest wall element This is done before the time loop with the ANN Approximate Nearest Neighbor library 7 1 2 The mixed type boundary condition The contribution to the momentum equations from the wall element is Rgs tyN dy 78 where Rip is the contribution to the residual of the p th momentum equation of the node i N is the shape function of node i and tp are the tractions on the surface of the element Xe The wall law is in general of the form f y 79 where u is the tangent velocity u the shear velocity u Tw p and y yu v the non dimensional distance to the wall We have several possibilities regarding the positioning of the computational boundary We first discuss the simplest that is to set the computational boundary at a fixed yt position Note that this means that the real
45. in the Makefile file Note that PETSc FEM comments those starting with numeral may collide with the ePerl preprocessing directives so that when commenting out lines in PETSc FEM input files it is safer to leave a space between the character and the commented text commented text OK commented tex but dangerous 4 3 5 File inclusion In addition to the inclusion allowed in the internal preprocessor via the command ePerl has his own inclusion directive for instance some text include home mstorti PETSC petscfem doc options txt another text and provided file options txt contains 20 __INCLUDE__ File options txt opioni valuel option2 value2 then the previous block expands to some text File options txt opioni valuel option2 value2 another text Including with the internal preprocessing directive __INCLUDE__ has the advantage of not creating an intermediate file On the other hand including with the ePerl directive allows recursive ePerl preprocessing and more versatility in defining the inclusion path with the INC list see Perl documentation 4 3 6 Use of ePerl in makefiles Usually user data files have extension dat When preprocessing with ePerl the convention is to use epl suffix for the file written by the user with ePerl commands i e the files to be preprocessed and suffix depl1 for the preprocessed file A line in the Makefile of the form depl epl eperl P lt
46. in the matrix For the IISDMat class it sets the value in the appropriate block Arz Art Apr or Arr PETSc matrices In addition for the Azz local local block it has to buffer those values that are not in this processor this can happen when dealing with periodic boundary conditions or bad partitionings for instance an element that is connected to all nodes that belong to other processor This last case is not the most common but it can happen Once you have filled the entries in the matrix you have to call the assembly_begin and assembly_end members as in PETSc The solve member solves a linear ssytem associated to the given operator The zero_entries is also the counterpart of the corresponding PETSc routine The build_sles member creates internally the SLES needed by the solution included the preconditioner It takes as an argument a TextHashTable from where it takes a series of options The destroy_sles member has to be called afterwards in order to destroy it and free space The monitor member allows the user to redefine the convergence monitoring routine The view member prints operator information to the output 16 2 IISD solver Let s consider a mesh as in figure 26 partitioned such that a certain number of elements and nodes belong to processor 0 and others to processor 1 We assume that one unknown 138
47. of strings to the hook_list namely the type pf hook and the name of the hook This last one is a unique identifier that makes that hook unique hook_list lt hook type 1 gt lt hook name 1 gt lt hook type 2 gt lt hook name 2 gt For instance hook_list shell_hook compress dx_hook my_dx_hook dl1_hook coupling_hook Here we added a shell hook that probably will compress some files during execution the DX hook in order to visualize and a dynamically linked hook that will couple the run with another program Each hook will after take their own options from special options in the table The hook_list entry must be unique so that you have to group all your hooks in a single hook_list entry This is not limiting because you can add as much hooks as you want but it is rather syntactically cumbersome because you end up with a long string Also it becomes difficult to comment out some hooks while keeping others The hooks are executed in the order as you entered them in the hook list so that in the previous case you will have at init time the init part of the compress hook to be executed before the init part of the my_dx_hook and finally the init part of the coupling_hook 10 2 Dynamically loaded hooks The easiest way to code a dynamically loadable hook is with a class You need to in clude the corresponding headers hook h and dlhook h and the class may define the hook 97 functions for all some or none of the hook launchi
48. of the loop for int 11 0 1l lt n 11 4 FastMat2 jump_to cp2 dd inner_block FastMat2 resync_was_cached FastMat2 leave sabi posterior block Off course if the number of times the inner loop is executed is very large and the most time consuming part is the execution of this loop then it may be convenient to choose this loop as the outer one Masks can t traverse branches Another restriction is that if branching is used the mask that is active at a certain FastMat2 cached operation must be the same independently of the path that the code have followed for instance consider the following code FastMat2 a b c resize and set a b c for int j 0 j lt N j FastMat2 branch if condition 4 FastMat2 choose 0 5 UNO EL a O E B operate on masked a FastMat2 leave c prod a b A Wrong a may have or not the mask set J When the code reaches the prod method at line A it can have executed or not the block inside the if so that the mask set in line B may or may not be active at line A This is clearly an error and to avoid it the safest way is to always reset the masks at the outlet of a branched block like in line C as follows FastMat2 a b c resize and set a b c 92 for int j 0 j lt N j FastMat2 branch if condition 4 FastMat2 choose 0 a is ir B operate on masked a
49. position od the boundary y changes during iteration In this case 79 can be rewritten as Tw glu tt 80 where or 62 The traction on the wall element is assumed to be parallel to the wall and in opposite direction to the velocity vector that is tp g U Up 83 Replacing in 78 the residual term is Rip f g u upN dE 84 The Jacobian of the residual with respect to the state variables needed for the Newton Raphson algorithm is ip j N d amp Jip ja a u up A ow 3 deg 2 ma Oujq PO but l that A dup OUulp l bujq bujq Y y p Ni is l pg Nj Similarly U Up Up Up Ni UmpNm 88 and m e 89 Oujq P Ousq so that r u Up 2N 90 bujq u 90 Replacing in 85 Jipj f 90 Sng upu N N d 91 63 7 1 3 The van Driest damping factor Programming notes This is a non standard issue since the computation of one volume element requires in formation of other wall elements First we compute the wall element that is associated to each volume element assemble is called with jobinfo build_nneighbor_tree This jobinfo is acknowledged only by the wall elemsets which compute their geometrical center and put them in the data_pts STL array Then this is passed to the ANN package which computes the octree All this is cached in the constructor of a WallData class After this a call to assemble with jobinfo get_nearest_wall_element is acknowledged by all t
50. sent through the output_field_list output tab Basically a field is constructed for each possible combination of positions connections and data objects This may seem a huge amount of fields but in fact as the arrays are passed internally by pointers in DX the additional memory requirements are not large At the time of writing this this is a set of nn ne ne fields since nn 1 A name is generated automatically for each field Some information is send to the DX Message Window Also it s very useful to put Print modules downstream of the ExtProgImport module in order to see which arrays and fields have been created The communication between DX and PETSc FEM is done through a socket PETSc FEM acts as a server whereas DX acts as a client PETSc FEM opens a port option dx_port and DX connects to that port the port input tab in the ExtProgImport module Currently the standard port for the DX PETSc FEM communication is 5314 DX can communicate with PETSc FEM running in the background ad even on other machine the serverhost input tab At each time step DX sends a request to PETSc FEM which answers sending back arrays and fields 13 1 Asynchronous synchronous communication e In synchronous mode steps gt 0 PETSc FEM waits each steps time steps a con nection from DX Once DX connects to the port PETSc FEM transmits the required data and resumes computation This is the appropriate wa
51. t If the subspace t 0 is bounded by a surface r x then additional conditions have to be imposed on that surface at all values of t This defines an initial boundary value problem A solution for the system of the first order PDE s can be written as a superposition of wave like solutions of the type corresponding k to the n eigenvalues of the matrix A n oF n k 1 dim being n the outward unit normal to the boundary edge n Us Y Wye M8 on Mg ex 40 a 1 where summation extends over all eigenvalues Aq As the problem is hyperbolic n initial conditions for the Cauchy problem have to be given to determine the solution That is equal number of conditions as unknowns must be imposed at t 0 For initial boundary value problem the n boundary conditions have to be distributed along the limits at all values of t according to the direction of the propagation of the corresponding waves If the wave phase velocity the a eigenvalue of AF n i e k wave projected in the interior normal direction n is positive the information is propagated inside the domain Hence the number of conditions to be imposed for the initial boundary value problem at a given point of OI is equal to the 49 number of positive eigenvalues of A n at that point The total number of conditions remains equal to the total number of eigenvalues i e the order of the system For the treatment of the boundary conditions we will use the one dimensiona
52. that the eigenvalues are decreasingly ordered ie Aj gt Ag if j lt k So that the positive eigenvalues are the first n ones The boundary conditions for the incoming waves 46 can be written as 1 U Uo 0 j 1 n 51 where 1 is a row of S i e a left eigenvalue of Ay If U is close to Up we can write 51 as U dU 0 j 1 n 52 If this differential forms were exact differentials i e if 1 U dU dw U for all 1 nqor 53 for some functions w U then we could impose as absorbing boundary conditions w U w Uo fj lin 54 Let s call to these w functions invariants As there are naof invariants we can define as a new representation of the internal state the w variables Note that as Qw OU S7 and S is non singular due to the hyperbolicity of the system the correspondence between w and U is one to one Assume that the value at the boundary reaches a steady limit value of U i e U 0 t gt Ux for t 00 55 If all the waves were incoming nt ndor then the set of boundary conditions 54 would be a set of ngo non linear equations on the value U As the correspondence U w is one to one the boundary conditions would mean w U w Uo and then Us Up But if the number of incoming waves is nt lt naof then it could happen that Us Uo In fact for a given Up the limit value Us would belong to a curved n dimensional curvilinear manifold Even if the lim
53. the same local variables without branching or looping Special operations have to be executed if there are branching conditions if sentences that alter the order of execution of the operations in the loop For instance an if sequence like this op_prev_1 operations previous to the branch 87 op_prev_2 op_prev_k if lt condition 0 gt 4 op_0_1 op_0_2 conditional block for branch 0 op_0_N else if lt condition 1 gt 1 op_1_1 op_1_2 conditional block for branch 1 op_1_N else op_1_1 op_1_2 conditional block for the else branch op_1_N op_pos_1 operations posterior to the branch op_pos_2 op_pos_k If branch 0 is followed in the first execution of the block then the cache will look like that shown in figure 11 When in a subsequent execution of the loops another branch is chosen say branch 1 then when reading trying to execute operation op_1_1 the library will find in the cache list a cache corresponding to operation op_0_1 This is solved by creating branch points in the cache list and choosing the appropriate branch as shown in the following code see figure 10 op_prev_1 operations previous to the branch op_prev_2 op_prev_k FastMat2 branch if lt condition 0 gt FastMat2 choose 0 op_0_1 op_0_2 conditional block for branch 0 op_0_N else if lt condition 1 gt 4 FastMat2 choose 1 88 op_prev_1 y
54. they are not taken into account for the partitioning Another reasons is very bad partitioning that may arise in some not so common situations Consider for instance figure 28 Due to bad partitioning a rather isolated element e belongs to processor 0 while being surrounded by elements in processor 1 Now as nodes are assigned to the highest numbered processor of the elements connected to the node nodes p g and r are assigned to processor 1 But then nodes q and r will belong to the local subset of processor 1 but will receive contributions from element e in processor 0 However the solution is not to define this matrices as PETSc because so far PETSc doesn t support for a distributed LU factorization The solution we devised is to store those Ary contributions that belong to other processors in a temporary buffer and after to send those contributions to the correct processors directly with MPI messages This is performed with the DistMatrix object A_LL_other 16 4 Efficiency considerations The uploading time of elements in PETSc matrices can be significantly reduced by using block uploading i e uploading an array of values corresponding to a rectangular sub matrix not necessarily being contiguous indices instead of uploading each element at a time The following code snippets show the both types of uploading ELEMENT BY ELEMENT UPLOADING The global matrix Mat A row and column indices both of length nen
55. through the eigenvalues A then we can compute the new coordinates x as cad 0 99 OX pj A possible distortion functional is F Ag Hyg 7 4 Y Ag Ar 100 qr This functional has several nice features Is minimal whenever all the eigenvalues are equal the distortion is minimal It is non dimensional so that an isotropic dilatation or contraction doesn t produce a change in the functional The non linear problem 99 can be solved by a Newton strategy by computing the first and second derivatives of F with respect to the node displacements the residual and Jacobian of the system of equations by finite difference approximations However this turns to be too costly in CPU time for the second derivatives since we should compute for each second derivative three evaluations of the functional and there are Nen Nen 1 2 where nen Ndof Ne is the number of unknowns per element ne is the number of nodes per element and Nq4of the number of unknowns per node here ndof Ndim second derivatives to compute Each evaluation of the functional amounts to the computation of the tensor metrics G and to solve the associated eigenvaue problem so that an analytical expression to at least the first derivatives is desired The derivatives of the distortion functional can be computed as OF OF AX Ox o Aq Ob and then the derivatives with respect to the eigenvalues can be computed still numerically while we will show that the
56. total and are generated by a rotation and an inversion e Oriented triangles are described by numbering their nodes in a specified direction Their symmetries are 3 en total generated by a rotation e Unoriented quadrangles are described by numbering their nodes in clockwise or counterclockwise rotation Symmetry generators are 1 2 3 0 rotation and 0 3 2 1 inversion there is a total of 8 permutations 148 2 2 O O Figure 35 Generators for quadrangle Oriented quadrangles are identical to unoriented quadrangles but do not include the inversion 4 rotations G G inv D Figure 36 Generators for tetras Unoriented tetrahedra are described by numbering one of the faces and then the oposite node Symmetries are generated by permutation of any two of the faces for instance 1 2 0 3 and 3 0 2 1 and inversion 1 0 2 3 24 permutations in total Oriented tetrahedra is identical to unoriented tetrahedra without inversion 24 per mutations in total Figure 37 Generators for hexas Unoriented hexas are described by numbering one of the faces as a quad and then the opposite face in correspondence with the first face Symmetries are generated by 90 rotations for any two of the faces not opposite for instance 1 2 3 0 5 6 7 0 and 1 5 6 2 0 4 7 3 and inversion 1 0 2 3 232 permutations in total Oriented hexahedra is identical to unoriented hexahedra without inversion 112 per mutations in tot
57. us spent in performing multiplications and additions and negligible CPU time is spent in auxiliary operations 9 2 1 The FastMat2 operation cache concept The idea with caches is that they are objects class FastMatCache that store the adresses needed for the current operation In the first pass through the body of the loop a cache object is created for each of the operations and stored in a list This list is basically an STL list of cache objects When the body of the loop is executed the second 79 time and the following the adresses of the matrix elements are not needed to be recom puted but they are read from the cache instead The use of the cache is rather automatic and requires some intervention by the user but in some cases the position in the cache list can get out of sync with respect to the execution of the operations and severe errors may occur The basic use of caching is to create the cache structure FastMat2 CacheCtx2 ctx and keep the position in the cache structure synchronized with the position of the code The process is very simple when the code consists in a linear sequence of FastMat2 operations that are executed always in the same order In this case the CacheCtx2 object stores a list of the cache objects one for each FastMat2 operation As the operations are executed the internal FastMat2 code is incharge of advancing the cache position in the cache list automatically A linear sequence of cache operations that are e
58. your rights with two steps e copyright the software and e offer you this license which gives you legal permission to copy distribute and or modify the software Also for each author s protection and ours we want to make certain that everyone understands that there is no warranty for this free software If the software is modified by someone else and passed on we want its recipients to know that what they have is not the original so that any problems introduced by others will not reflect on the original authors reputations Finally any free program is threatened constantly by software patents We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses in effect making the program proprietary To prevent this we have made it clear that any patent must be licensed for everyone s free use or not licensed at all The precise terms and conditions for copying distribution and modification follow 2 2 GNU General Public License Terms and Conditions for Copying Distribution and Modification 0 This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License The Program below refers to any such program or work and a work based on the Program means either the Program or any derivative work under copyright law that is to say a work containing the Program or a
59. yr _ vr k Al k de A k 1 T k 1 0 k gt 1 where Vm are the Lagrange multipliers for imposing the new conditions Note that if 7 is an incoming wave A gt 0 then the equation is of the form Ujo Vrefo 0 yrel yr yoti g jo JO Sl JO A At Aj ra Vj im 0 69 1 1 Uk mil y ikt o o b gt At E 2h q AS Note that due to the m Lagrange multiplier we can solve for the vj values from the first last rows the value of the multiplier 1m adjusts itself in order to relax the equations in the second row On the other hand for the outgoing waves A lt 0 we have Vj lm 9 1 1 Vio Vo o Y Vo 0 At aa 70 n 1 n n 1 n a Ms E ay aT At 2h 55 So that the solution coincides with the unmodified orginal FEM equation and vj im 0 Coming back to the U basis we have I User Uo _ Une EN IT Uret Uim 0 Ugh UG UT US a up ur U eUl A 0 k gt 1 At de 2h And finally coming back to the FEM equations 59 TI Uer Uo _ Urner F IT Uet Uim 0 Fo Ugt UT 11 Uim RET FLUJO Up Ug Rp n 1 n 1 n ly _ n 1 F U 1 Uk Uki R In conclusion in this setup we do not need to make extrapolations to the variables and then there is no need to have a structured line of nodes near the boundary It s only required to have an additional fictitious node at the boundary in order to hold the Lagrange multiplier unknowns Um and to add the
60. 1 EP Sa sb push_back obj1 flush the buffer sb flush The macro SYNC_BUFFER_FUNCTIONS takes as argument the name of the basic object class and generates a series of wrapper functions This should be done in the templates themself but due to current limitations in template specialization it has to be done through macros 154 18 1 A more specialized class A more simple class derived from SyncBuffer lt gt has been written This class is based on SyncBuffer lt gt using as KeyedObject the class KeyedLine which has simply an integer and a C string Typical usage is include lt src syncbuff h gt include lt src syncbuff2 h gt KeyedOutputBuffer kbuff AutoString s for E s clear Load string s with cat_sprintf functions TP as Load in buffer kbuff push k s kbuff flush Also you can directly use a printf like semantics in the following way include lt src syncbuff h gt include lt src syncbuff2 h gt KeyedOutputBuffer kbuff int key for E vel Qed Load internal string with printf and cat_printf functions kbuff printf one line d d d fWn i j k a kbuff cat_printf other line d d dWn m n p Push in buffer with key q push also clears the internal string kbuff push q kbuff flush If you want to dump the buffer on another stream like a file for instance then you have to set the static
61. 1 2 in a Also we can use x set y with y a Newmat matrix Matrix y or an array of doubles double xy 9 1 4 Dimension matching The x set y operation requires that x and y have the same viewed dimensions As the ir 1 2 operation restricts index to the value of 2 x ir 1 2 is seen as a row vector of size 2 and then can be copied to a If the viewed dimensions don t fit then an error is issued 9 1 5 Automatic dimensioning In the example a has been dimensioned at line 3 but most operations perform the di mensioning if the matrix has not been already dimensioned For instance if at line 3 we would declared FastMat2 a only without specifying dimensions then at line 12 the matrix is created and dimensioned taking the dimensions from the argument The same applies to set Matrix amp but not to set double since in this last case the argument double don t posses information about his dimensions Other operations that define dimensions are products and contraction operations 9 1 6 Concatenation of operations Many operations return a reference to the matrix return value FastMat2 amp so that operations may be concatenated as in A ir 1 k ir 2 j 9 2 Caching the adresses used in the operations If caching is not used the performance of the library is poor typically one to two orders of magnitude slower than the cached version and the cached version is very fast in the sense that almost all the CPU time
62. 6 00000 _END_FIXA__ Ooo0oorFRroo 00000 NNNNNNNI 14 8 1 Built in temporal functions The following temporal functions are currently available in PETSc FEM ramp Double ramp function See 22 Q ci b t 4 de cda 138 dits t t ift lt t lt t 126 where the slope s is _ 2 01 The parameters are start_time t default 0 The starting time of the ramp start_value default 0 The starting value of the ramp slope s default 0 The slope in the ramp region end_time t2 default start_time The end time of the ramp end_value 2 default start_value The end value of the ramp Only one of slope and end_value must be specified If the end values are not defined then they are assumed to be equal to the starting ones If the end time is equal to the starting time then the end time is taken as oo i e a single ramp starting at start_time A Q to ty Figure 22 Ramp function smooth_ramp Smooth double ramp function hyperbolic tangent See fig ure 23 d tpo 0100 2 T 2 a TN 5 140 This function is somewhat analogous to ramp in the sense that it goes from a starting constant value to another final one but smoothly Beware that the start and end values are reached only for t gt oo switch_time t default 0 The time at which passes by the mean value between o and 1 time_scale 7 default none This somewhat
63. 777 freatic surface Figure 3 Aquifer stream system Transverse 2D view This module solves the problem of subsurface flow in a free aquifer coupled with a surface net of 1D streams To model such system three elemsets must be used an aquifer 42 aquifer node Figure 4 Aquifer stream system Discretization system representing the subsurface aquifer a stream elemset representing the 1D stream and a stream_loss elemset representing the losses from the stream to the aquifer or vice versa see figures 2 and 3 The aquifer elemset is a 2D elemset with triangle or quadrangle elements see figure 4 A per element property eta represents the height of the aquifer bottom to a given datum The corresponding unknown for each node is the piezometric height or the level of the freatic surface at that point On the other hand the stream elemset represents a 1D stream of water It has its own nodes separate from the aquifer nodes whose coordinates must coincide with some corresponding node in the aquifer For instance the triangular aquifer element in the figure is connected to nodes n1 n2 and n3 while the stream element is connected to nodes n4 and n5 nl and n5 have the same coordinates but different unknowns and also n2 and n4 A node constant field so called H fields represents the stream bottom height hp with reference to the datum So that normally we have for each node two coordinates and the stream bottom height ndim
64. 95 2004 Title Integraci n de procesos del complejo suelo agua planta para una mejor planificaci n h drica en la cuenca inferior del R o Salado Director Dr Leticia Rodr guez Financing agency FONCyT Begin 2006 End 2007 Key PIP 2005 Code PIP 5271 Title Mec nica Computacional en Prob lemas de Multif sica Director Dr M A Storti Financing agency CONICET Begin 2005 End 2008 Key CAI D 05 Code CAI D 2005 10 64 Title M todos Num ricos para Resoluci n de Problemas Multif sica Director Dr S R Idelsohn Financing agency Universidad Nacional del Litoral UN Litoral Begin 2005 End 2007 Key PROTIC Code PAV 127 Subproy 4 Title PROTIC Red para la Promoci n de las Tecnolog as de la Informaci n y las Comunicaciones Subproyecto 4 Centro Virtual de Computaci n de Alto Rendimiento Di rector Dres Alejandro Cecatto Guillermo Marshall Financing agency ANPCyT FONCyT Begin 2005 End 2006 Key LAMBDA Code PICT 12 14573 2003 Title LAMBDA Laboratorio virtual para el An lisis y simulaci n computacional de problemas Mul tif sicos Basados en ecuaciones Diferenciales Acopladas Director Dr S Idelsohn Financing agency ANPCyT FONCyT Begin 2005 End 2007 Key PME CLUSTER Code PME 209 Title Cluster del Litoral Red de laboratorios para la resoluci n de problemas de la f sico matem tica aplicados a la ingenier a Director Dr S Idelsohn Financing agency ANPCyT FONCyT Begin 2004
65. Approximate Nearest Neighbor problem Also refers to the library developped by David Mount and Sunil Arya http www cs umd edu mount ANN 22 Symbols e u v Streamwise and normal components of velocity 159 References 1 M Mallet T J R Hughes A new finite element method for cfd Iv a discontinuity capturing operator for multidimensional advective diffusive systems Comp Meth in Applied Mechanics and Engineering 58 1986 160 Index a_bar 44 A_van_Driest 65 69 A_LL_other 142 absorbing boundary conditions 119 activate_debug 65 activate_debug_memory_usage 65 activate_debug_print 65 activate_turn_wall 61 additional_iprops 22 additional_props 22 additional_tau_pspg 69 ALE_flag 59 61 69 alpha 65 ANN 64 asm_lblocks 25 asm_overlap 25 asm_sub_ksp_type 25 asm_sub_preco_type 25 asm_type 25 atol 25 40 auto_time_step 40 axisymmetric 69 B1 44 beta_supg 41 block uploading 143 block_uploading 143 boundary conditions bsorbing 119 eriodic 118 on linear Dirichlet 120 BufferPack 145 C_smag 69 C_volume 65 cache_grad_div_u 69 Ch 44 characteristic component 51 check_dofmap_id 22 chunk_size 23 COMP_MAT_PROF 137 compact_profile_graph_chunk_size 25 compute_prof 137 161 compute_prof package 136 Conditional processing 20 connected dof s 138 conservative variables 34 consistent formulation 36 consistent_supg_matrix 40 Courant 39 d_nnz 138
66. D these constraints arise when periodic boundary conditions are considered e Some reordering may be necessary either for reducing band width if a direct solver is used or either due to mesh partitioning if the problem is solved in parallel e Also non linear constraints may arise for instance when considering absorbing boundary conditions or non linear restrictions So that we assume that we have a relation Urr QU QU 123 where U is the reduced array of unknowns U representing the externally fixed values and Q Q appropriated matrices Follows some common cases Dirichlet boundary conditions If the velocity components of node 1 are fixed then we have Nadia 124 VU UL 117 so that the corresponding file in Q is null and the file in Q has a 1 in the entry corresponding to the barred values a is Figure 17 Slip boundary condition Slip boundary conditions For the Euler equations if node j is on a slip wall and ni is the corresponding normal then the normal component of velocity is null and the tangential component is free In 3D there are two free tangential components and one prescribed normal component The normal component may be prescribed to some non null value if transpirations fluxes are to be imposed The corresponding equations may look like this Uj Ujtte Ujnn j jttx k 125 Uj Ujtty UjnNy where uj is the free tangential component and tj the prescribed normal comp
67. E CONNECTIVITY FILE gt s__ END HASH _ Note that dof 4 corresponds to the temperature field lt INLET TEMPERATURE gt is the temperature that the user wants to be imposed at the boundary if the flow is re verted lt PENALIZATION COEFFICIENT gt should be large enough so that the temperature approaches lt INLET TEMPERATURE gt However a large value has the disadvantage of in creasing the condition number of the linear system In 3D with a surface of triangular panels the input file section would be 1elemset flow_reversal 3 2npg 4 3 geometry triangles 4 dofs 5 5 coefs lt PENALIZATION COEFFICIENT gt 6 refvals lt INLET TEMPERATURE gt 7 data lt SURFACE CONNECTIVITY FILE gt s__ END HASH _ This term can be used also with the dofs of the velocity itself so that a lumped Darcy term that tends to prevent the reversal of the flow is added In this case the input file section should be 1elemset flow_reversal 3 2npg 4 3 geometry triangles E Penalize temperature AND velocity lumped Darcy term s dofs 123 5 6 coefs COEFV COEFV COEFV COEFT 7refvals 0 0 0 lt TEMPERATURA DE ENTRADA gt s data lt ARCHIVO DE CONECTIVIDADES DE SUPERFICIE gt s_ END HASH _ where COEFV is a penalization coefficient for velocity and COEFT is a penalization coefficient for temperature The reference value for velocity are null i e if the flow is reverted then the penalization term tends make it as small as possible 111 13 Visualizati
68. EM distribution To built it you have to compile first the petscfem library and then cd to the dx directory and make make This should build the dx epimport file which is a dynamically loadable module for DX This one altogether with the dx epimport mdf which is a plain text file describing the inputs outputs of the module and other things are the files needed by DX in order to run the module To load this module in DX you can do this either at the moment of launching DX with something like dx mdf path to epimport mdf or well from the dxexec window menu File Load Module Descriptions 13 3 Inputs outputs of the ExtProglmport module input steps type integer default 0 Description Visualize each steps time steps 0 means asynchronously e input serverhost type string default localhost Description Name of host where the external program is running May be also set in dot notation e g 200 9 223 34 or myhost mydomain edu e input port type integer default 5314 Description Port number e input options type string default NULL Description Options to be passed to the external program e input step type integer default 1 Description Input for sequencer This value is passed to the dx_hook hook but currently is ignored by it It could be used in a future in order to synchronize the DX internal step number with the PETSc FEM one e input state_file type string default NULL An ASCII fi
69. FEM package is a library and application suite oriented to the Finite Ele ment Method based on PETSc Copyright C 1999 2008 Mario Alberto Storti Norberto Marcelo Nigro Rodrigo R Paz Lisandro Dalcin Ezequiel Lopez Laura Battaglia Gus tavo Rios Rodriguez Centro Internacional de Metodos Numericos en Ingenieria CIMEC Argentina Universidad Nacional del Litoral UNL Argentina Consejo Nacional de In vestigaciones Cientificas y Tecnicas UNL Argentina This program is free software you can redistribute it and or modify it under the terms of the GNU General Public License as published by the Free Software Foundation either version 2 of the License or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU General Public License for more details You should have received a copy of the GNU General Public License along with this program if not write to the Free Software Foundation Inc 59 Temple Place Suite 330 Boston MA 02111 1307 USA 2 GNU GENERAL PUBLIC LICENSE Version 2 June 1991 Copyright C 1989 1991 Free Software Foundation Inc 675 Mass Ave Cambridge MA 02139 USA Everyone is permitted to copy and distribute verbatim copies of this license document but changing it is not allowed 2 1 Preamble The licenses for most software are designed to
70. Interaction Linear Solvers and Preconditioners Norberto M Nigro CIMEC PETSc FEM kernel NS and AdvDif modules Multi phase flow Rodrigo R Paz CIMEC AdvDif module Hydrology module Absorbent B C Com pressible Flow ALE Fluid Structure Interaction Preconditioners Lisandro D Dalcin CIMEC PETSc FEM Kernel Python Extension Language Project Linear Algebra Preconditioners Ezequiel Lopez CIMEC Mesh Relocation Algorithms mesh move Laura Battaglia CIMEC Free Surface Algorithms Gustavo Rios Rodr CIMEC Adaptive Refinement Also many people from the following institutions has contributed directly or indirectly to the generation of this code 156 CIMEC International Center for Computational Methods in Engineering Santa Fe Ar gentina http www cimec org ar INTEC Instituto de Desarrollo Tecnol gico para la Industria Quimica http www intec unl edu ar CONICET Consejo Nacional de Investigaciones Cientificas y T cnicas http www conicet gov ar UNL Universidad Nacional del Litoral http www unl edu ar 20 Grants received The development of this code has been developed under financement from the following grants 23 22 21 20 19 18 17 16 Key PICT 2006 VES Code PICT 1506 2006 Title C lculo distribuido en mec nica y multif sica computacional Director Mag Victorio Sonzogni Fi nancing agency FONCyT Begin 2008 End 2010 Key PICTO 2004 Code PICTO 232
71. M Then even if as discussed before no operations like those used in code B are required it may be adviceable to spend some time in insert these calls in order to reduce memory cache and overhead time Again in the limit of M very large it will be more convenient to choose this loop as the outer one 9 4 Synopsis of operations 9 4 1 One to one operations These are operations that take one FastMat2 argument as in FastMat2 amp add const FastMat2 amp A The operations are from one element of A to the corresponding element in this The one to one operations implemented so far are e FastMat2 set const FastMat2 A Copy matrix e FastMat2 add const FastMat2 amp A Add matrix e FastMat2 rest const FastMat2 amp A Substract a matrix 93 e FastMat2 mult const FastMat2 amp A Multiply element by element like Matlab e FastMat2 div const FastMat2 amp A Divide matrix element by element like Matlab e FastMat2 axpy const FastMat2 amp A const double alpha Axpy operation element by element this alpha A 9 4 2 In place operations These operations perform an action on all the elements of a matrix e FastMat2 set const double val 0 Sets all the element of a matrix to a constant value e FastMat2 scale const double val Scale by a constant value e FastMat2 add const double val Adds constant val e FastMat2 fun scalar_fun_t function Apply a function to all elements e FastMat2 f
72. PETSc FEM A General Purpose Parallel Multi Physics FEM Program User s Guide by Mario Storti Norberto Nigro Rodrigo Paz Lisandro Dalc n Ezequiel L pez Laura Battaglia Gustavo Rios Rodriguez Centro Internacional de M todos Computacionales en Ingenier a CIMEC Santa Fe Argentina http www cimec org ar petscfem version mstorti v33 start 7 gedae2aa clean date Mon Aug 16 17 13 59 2010 0300 processed date Tue Aug 24 19 17 54 2010 0300 August 24 2010 This is the documentation for PETSc FEM current version mstorti v33 start 7 gedae2aa clean a general purpose parallel multi physics FEM program for CFD applications based on PETSc PETSc FEM comprises both a library that allows the user to develop FEM or FEM like i e non structured mesh oriented programs and a suite of application programs It is written in the C language with an OOP Object Oriented Programming philosophy but always keeping in mind the scope of efficiency Contents LICENSE GNU GENERAL PUBLIC LICENSE 2i Preamble o ce ac A Aa a AA A 2 2 GNU General Public License Terms and Conditions for Copying Distri bution and Modification oo waapa d Caah aea 2 3 Appendix How to Apply These Terms to Your New Programs The PETSc FEM philosophy 3 1 The three levels of interaction with PETSc FEM 3 2 The elemset concept o General layout of the user input data file 4 1 Preprocessing the u
73. PG stabilization term found in file nsitetlesfm2 cpp double tau_pspg_fac default 1 Scales the PSPG stabilization term found in file nsitetlesfm2 cpp double tau_supg_fac default 1 Scales the SUPG stabilization term found in file nsitetlesfm2 cpp double temporal_stability_factor default 0 Adjust the stability parameters taking into account the time step If the steady option is in effect which is equivalent to At 00 then temporal_stability_factor is set to 0 found in file nsitetlesfm2 cpp int weak_form default 1 Use a weak form for the gradient of pressure term found in file nsitetlesfm2 cpp Elemset bcconv_ns_fm2 e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file bccnsfm2 cpp e int weak_form default 1 Use a weak form for the gradient of pressure term found in file bccnsfm2 cpp Elemset wall 70 e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file wall cpp e double rho default 1 Density found in file wall cpp e double y_wall_plus default 25 The y coordinate of the computational boundary found in file wall cpp Elemset ns_sup_g This element imposes a linearized free surface boundary condition e double free_surface_damp default 0 Cif free_surface_set_level_factor tries to keep the free surface level constant by add
74. RRANTY for details type show vw This is free software and you are welcome to redistribute it under certain conditions type show c for details The hypothetical commands show w and show c should show the appropriate parts of the General Public License Of course the commands you use may be called something other than show w and show c they could even be mouse clicks or menu items whatever suits your program You should also get your employer if you work as a programmer or your school if any to sign a copyright disclaimer for the program if necessary Here is a sample alter the names Yoyodyne Inc hereby disclaims all copyright interest in the program Gnomovision which makes passes at compilers written by James Hacker lt signature of Ty Coon gt 1 April 1989 Ty Coon President of Vice This General Public License does not permit incorporating your program into propri etary programs If your program is a subroutine library you may consider it more useful to permit linking proprietary applications with the library If this is what you want to do use the GNU Library General Public License instead of this License 12 3 The PETSc FEM philosophy 3 1 The three levels of interaction with PETSc FEM As stated in the PETSc FEM description it is both a library and an application suite That means that some applications as Navier Stokes Euler inviscid flow shallow water ge
75. This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer in accord with Subsection b above The source code for a work means the preferred form of the work for making mod ifications to it For an executable work complete source code means all the source code for all modules it contains plus any associated interface definition files plus the scripts used to control compilation and installation of the executable However as a special exception the source code distributed need not include anything that is normally distributed in either source or binary form with the major components compiler kernel and so on of the operating system on which the executable runs unless that component itself accompanies the executable If distribution of executable or object code is made by offering access to copy from a designated place then offering equivalent access to copy the source code from the same place counts as distribution of the source code even though third parties are not compelled to copy the source along with the object code 4 You may not copy modify sublicense or distribute the Program except as expressly provided under this License Any attempt otherwise to copy modify sublicense or distribute the Program is void and will automatically terminate your rights under this License However parties who have received copies or rights from y
76. _state_from_file default 0 Read states from file instead of computing them Normally this is done to analyze a previous run If 1 the file is ASCII if 2 then it is a binary file In both cases the order of the elements must be u 1 1 u 1 2 u 1 3 u 1 ndof u 2 1 u nnod ndof where u i j is the value of field j at node i found in file dxhook cpp string dx_split_state default Generates DX fields by combination of the input fields found in file dxhook cpp int dx_state_all_fields default dx_split_state_flag Generates a DX field with the whole state all ndof fields found in file dxhook cpp int dx_steps default 1 Initial value for the steps parameter found in file dxhook cpp 115 e int dx_stop_on_bad_file default 0 If dx_read_state_from_file is activated and can t read from the given file then stop Otherwise continue padding with zeros found in file dxhook cpp 14 The idmap class 14 1 Permutation matrices This class represents sparse matrices that are close to a permutation Assume that the n x n matrix Q is a permutation matrix so that if y x are vectors of dimension n and y Qx 119 then Y TPG oe where P is the permutation associated with Q For instance if P 1 2 3 4 2 4 3 1 then y Qx 121 This matrix can be efficiently stored as an array of integers of dimension ident n such that ident j 1 P j Also the inverse permutation
77. able geometry cartesian2d ndim 2 npg 4 physical properties Dt 1 time step viscosity 1 next lines flags end of the properties hash table __END_HASH__ element connectivities physical properties per element 1342 1 1 2 3 0 7 3564 1 2 2 1 0 8 5786 1 3 2 2 0 9 lt more element connectivities and physical props here gt 193 195 196 194 1 5 2 0 0 8 195 197 198 196 1 6 2 1 0 7 197 199 200 198 1 7 2 2 0 6 199 201 202 200 1 8 2 3 0 5 next lines flags end of elemset connectivities __END_ELEMSET__ Here we define that properties cond cp and emiss are to be defined per element We add to each element connectivity line the three properties There are two ways to access these properties The corresponding values are stored in an array elemprops of length Nprops X Nelem Where Nprops is the number of per element properties 3 in this example and nejem is the number of elements in the mesh Also there is a macro ELEMPROPS k iprop that allows treating it as a matrix So you can access this values with 32 cond ELEMPROPS k 0 conductivity of element k cp ELEMPROPS k 1 specific heat of element k emiss ELEMPROPS k 2 emissivity of element k 5 7 Taking values transparently from hash table or per element table With the tools described so far you can access constant properties on one hand the same for the whole elemset and per element properties Now given a property
78. absorbing boundary condition is almost evident Assuming that we want to solve the equation on the semiplane x gt so that x 0 is a boundary then the corresponding absorbing boundary condition is A 0 45 ee 0 if A gt 0 ingoing boundary 46 vjextrapolated from interior otherwise outgoing boundary This can be summarized as TiVo 0 47 where II diag 1 sign A 2 48 is the projection matrix onto the space of incoming waves As the II is a diagonal matrix with diagonal elements 1 or 0 it trivially satisfies the projection condition 1111 rr Coming back to the U basis we obtain the following first order linear absorbing bound ary condition I Uo0 U 0 Up 0 49 where I SI1 7s 50 This condition is perfectly absorbing for small amplitude waves around the state Up The main problem with it is that as the limit state at the boundary U is not known a priori we have to use some a priori chosen state Up 4 U and then the projection matrix used IT U will not be equal to U and then not fully absorbing We call Up the reference state for the absorbing boundary condition It can even happen that the eigenvalues for the actual state at the boundary change sign with respect to the reference state 5l 6 12 2 Riemann based absorbing boundary conditions Let n n be the number of incoming outgoing waves i e the number of positive negative eigenvalues of Ap and assume
79. absorbing boundary equation first row of 72 for these nodes 6 12 7 Flux functions with enthalpy When the flux function has an enthalpy term that is not the identity then the expressions for the change of basis are somewhat modified and also the projectors An advective diffusive system with a generalized enthalpy function H U is an extension of the form 2 and can be written as OF U ay AU es 73 The heat equation can be naturally put in this way Also the gas dynamics equations for compressible flow can be put in this form if we put the equations in conservative form but use the primitive variables U p u p as the main main variables for the code This has the advantage of using a conservative form of the equations and at the same time allows an easy imposition of Dirichlet boundary conditions that are normally set in terms of the primitive variables In this case U are the primitive variables and the generalized enthalpy H U is the vector of conservative variables We call the generalized heat content matrix Cp as OH U e 4 and 2 can be put in quasi linear form as OU OU iae oa 2 56 Note that this can be brought to the quasi linear form 42 i e without the Cp if we multiply the equation at left by C 1 and define new flux Jacobians as C Ay 76 So that basically the extension to systems with generalized enthalpy is to replace the Jacobians by the modified Jac
80. ache operation see figure 9 The deactivate_cache call at line 43 causes that subsequent operations after this line will not be cached This call is required if subsequent calls to FastMat2 cached operations will be executed Otherwise there will be an error since those posterior calls will not have a corresponding cache in the cache list The void_cache call deletes the cache claiming the space used by it It s not required but it is a good idea in order to save memory space It may be required if the loop will be called with other dimensions For instance a FEM loop may be called first for triangles and then for quadrangles and in this two calls the dimensions of the involved matrices are different The general layout of a code section which uses caching is like this Initialization Not cached operations FastMatCacheList cache_list Cache list object FastMat2 activate_cache kcache_list activates the cache for j 0 j lt N j Outer loop N very large number FastMat2 reset_cache Rewinds the cache list op_1 86 op_2 op_3 op_N Cached operations End of outer loop FastMat2 void_cache free memory FastMat2 deactivate_cache deactivates cache reset_cache Figure 9 Cache of operations for linear segment of code Branching Caching is straightforward if the sequence of operations are executed lin early with
81. actional step May be iisd or petsc found in file ns cpp double start_comp_time default 0 Time to start computations found in file ns cpp string stdout_file default none If set redirect output to this file found in file ns cpp int steady default 0 Flag if steady solution or not uses Dt inf If steady is set to 1 then the computations are as if At oo The value of Dt is used for printing etc If Dt is not set and steady is set then Dt is set to one found in file ns cpp int stop_mom default 0 After computing the linear system for the predictor momentum step print right hand side and solution vector and stops found in file ns cpp int stop_on_step default 1 After computing the linear system for the predictor momentum step print right hand side and solution vector and stops found in file ns cpp int stop_poi default 0 After computing the linear system for the poisson step print right hand side and solution vector and stops found in file ns cpp int stop_prj default 0 After computing the linear system for the projection step print right hand side and solution vector and stops found in file ns cpp double tol_newton default 1e 8 Tolerance to solve the non linear system global Newton found in file ns cpp int update_jacobian_iters default 1 Update jacobian only until n th Newton subiteration Don t update if null found in file ns cpp 68 e int upd
82. address via the fun_data pointer Later in subsequent calls to eval_fun it is passed the same pointer so that you can recover the information previous a static cast Of course in order to avoid memory leak you have to free the allocated memory somewhere this is done in the clear function For instance the following file defines a ramp function where youcan set the four values to fo t1 fi include lt math h gt include lt src fem h gt include lt src getprop h gt include lt src ampli h gt struct MyFunData double f0 f1 tO t1 slope F INIT_FUN 4 int ierr Read values from option table TGETOPTDEF thash double t0 0 TGETOPTDEF thash double t1 1 TGETOPTDEF thash double f0 0 TGETOPTDEF thash double f1 1 create struct data and set values MyFunData d new MyFunData fun_data d d gt f0 0 d gt f1 f1 133 d gt t0 t0 d gt t1 t1 d gt slope f 1 f0 t1 t0 EVAL_FUN 4 MyFunData d MyFunData fun_data define ramp function if t lt d gt t0 return d gt f0 else if t gt d gt t1 return d gt f1 else return d gt f0 d gt slope t d gt t0 CLEAR_FUN 4 clear allocated memory MyFunData d MyFunData fun_data delete d fun_data NULL Another possibility perhaps more simple would be to use a global MyFunData object but if several fixa_amplitude entries that use the same function are used then the data crea
83. agorinsky constant found in file nsitetlesfm2 cpp e int cache_grad_div_u default 0 Cache grad_div_u matrix found in file nsitetlesfm2 cpp e int fractional_step default 0 Assert fractional tep isnotused foundin file nsitetlesfm2 cpp double ndim G_body default null vector Vectorof gravityacceleration mustbeconstant foundin file nsitetlesfm2 cpp e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file nsitetlesfm2 cpp 69 int indx_ALE_xold default 1 Pointer to old coordinates in nodedata array excluding the first ndim values found in file nsitetlesfm2 cpp double jacobian_factor default 1 Scale the jacobian term found in file nsitetlesfm2 cpp int LES default 0 Add LES for this particular elemset found in file nsitetlesfm2 cpp int npg default none Number of Gauss points found in file nsitetlesfm2 cpp double pressure_control_coef default 0 Add pressure controlling term found in file nsitetlesfm2 cpp int print_van_Driest default 0 print Van Driest factor found in file nsitetlesfm2 cpp double residual_factor default 1 Scale the residual term found in file nsitetlesfm2 cpp double rho default 1 Density found in file nsitetlesfm2 cpp double shock_capturing_factor default 0 Add shock capturing term found in file nsitetlesfm2 cpp double tau_fac default 1 Scale the SUPG and PS
84. al Unoriented prisms are described by numbering one of the triangular faces as a tri angle and then the opposite face in correspondence with the first face Symmetries are generated by 180 rotations of any two of the quad faces 4 35 1 0 2 for instance a 120 rotation of any of the triangular face 1 2 0 4 5 3 for instance and an inversion 1 0 2 4 3 5 for instance There are 12 permutations in total 149 Figure 38 Generators for prisms e Oriented hexahedra is identical to unoriented hexahedra without inversion 6 per mutations in total 17 3 2 Canonical ordering One of the most common operations when manipulating these geometrical objects is given two geometrical objects of the same type to determine whether they represent the same object The brute force solution is to apply all the permutations to one of them and check if one of the permuted indices coincide with the other node sequence 17 4 Permutation tree In order to make it more efficient we can store all the permutations for a given shape in a tree like fashion Consider for instance the oriented tetrahedral shape Their permu tations are the following 0 1 2 3 0 2 3 4 0 3 1 2 1 0 3 2 1 2 0 3 1 3 2 0 2 0 1 3 2 130 2 3 0 1 3 0 2 1 3 1 0 2 3210 We can describe the generation of a new node numbering in the following way First we can take any of the nodes as the first node of the new numbering These is seen from the
85. aring and reuse of software generally 11 WARRANTY BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE THERE IS NO WARRANTY FOR THE PROGRAM TO THE EXTENT PER MITTED BY APPLICABLE LAW EXCEPT WHEN OTHERWISE STATED IN 10 WRITING THE COPYRIGHT HOLDERS AND OR OTHER PARTIES PRO VIDE THE PROGRAM AS IS WITHOUT WARRANTY OF ANY KIND EL THER EXPRESSED OR IMPLIED INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU SHOULD THE PRO GRAM PROVE DEFECTIVE YOU ASSUME THE COST OF ALL NECESSARY SERVICING REPAIR OR CORRECTION 12 INNO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY MODIFY AND OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE BE LIABLE TO YOU FOR DAMAGES INCLUDING ANY GENERAL SPECIAL INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBIL ITY OF SUCH DAMAGES END OF TERMS AND CONDITIONS 2 3 Appendix How to Apply These Terms to Your New Programs If you develop a new program and you want it to be of the greatest possible use to the public
86. as plain sequences One possibility is to take the caonical order as that one that gives the lower node sequence in lexicographical order This has the advantage that one the canonical orderign is known for both objects the comparison is very cheap Also it can be used as a comparison operator for sorting geometrical objects i e the order between two objects is the lexicographical order between the two node sequences in its canonical form The canonical order can be computed efficientlyusing the permutation tree described above 17 6 Object hashing Another useful technique for comparing objects is by comparing first some scalar function of the node indices For instance we can compute a hash sum for the object go as n 1 H go S h nodegoj 162 j 0 where node go j is the j node of object go H is the hash sum for the geometrical object go whereas h is a scalar hashing function A very simple possibility is to take h as the identity h a x In that case the hash of the object is simply the sum of their node indices The idea is that even for this simple hashing function we can compare first the hash sums of two objects for determining if they are equal If there is a high probability of the objects being unequal then in most cases this simple check will suffice If the hash sum are equal then the full comparision procedure as described above must be performed Note that the hash sum is and must be independen
87. at matrices 1 print inmedi ately 2 gather events non immediate printing found in file iisdmat cpp int print_internal_loop_conv default 0 Prints convergence in the solution of the GMRES iteration found in file iisdmat cpp double rtol default 1e 3 Relative tolerance to solve the monolithic linear system Newton linear subiteration found in file iisdmat cpp int use_compact_profile default LINK_GRAPH Choice representation of the profile graph Possible values are 0 Adjacency graph classes based on STL map set demands too much memory CPU time OK 1 Based on dynamic vector of pair of indices with resorting demands too much CPU time RAM is OK 2 For each vertex wee keep a linked list of cells containing the adjacent nodes Each insertion is O m where m is the average number of adjacent vertices This seems to be optimal for FEM connectivities found in file iisdmat cpp 26 4 5 Emacs tools and tips for editing data files GNU Emacs http www gnu org software emacs is a powerful text editor that can colorize and indent text files according to the syntax of the language you are editing Emacs has modes for most languages C C Pascal Fortran Lisp Perl We have written a basic mode for PETSc FEM data files that is distributed with PETSc FEM in the tools petscfem el Emacs Lisp file Another mode that can serve for colorization is the shell script mode In order to associate the epl
88. ate_jacobian_start_iters default INF Update jacobian each n th Newton iteration found in file ns cpp e int update_jacobian_start_steps default INF Update jacobian each n th time step found in file ns cpp e int update_jacobian_steps default 0 Update jacobian each n th time step found in file ns cpp e int update_wall_data default 0 If 0 compute wall_data info only once Otherwise refresh each update_wall_data steps found in file ns cpp e int use_iisd default 0 Use IISD Interface Iterative Subdomain Direct or not found in file ns cpp e int verify_jacobian_with_numerical_one default 0 Computes jacobian of residuals and prints to a file May serve to debug computation of the analytic jacobians found in file ns cpp e int volume_control_flag default 0 Flag for VOLUME CONTROL in level set found in file ns cpp Elemset nsitetlesfm2 e double A_van_Driest default 0 van Driest constant for the damping law found in file nsitetlesfm2 cpp e double additional_tau_pspg default 0 Add to the tau_pspg term so that you can stabilize with a term independently of h Mainly for debugging purposes found in file nsitetlesfm2 cpp e int ALE_flag default 0 Flag to turn on ALE computation found in file nsitetlesfm2 cpp e string axisymmetric default none Add axisymmetric version for this particular elemset found in file nsitetlesfm2 cpp e double C_smag default 0 18 Sm
89. be shown that the conttribution of the other two terms is an antisymmetric matrix so that the contibution to the rate of change of the eigenvalue is null Now from 94 Gij ON ON Ji 104 OT i 0 amp dj oti so that i oe d ae Fat og Ju FE vas 105 which is the expression used 8 Tests and examples In this section we describe some examples that come with PETSc FEM They are inten tionally small and sometimes almost trivial but they are very light weight and can be run in a short time at most some minutes They are put in the test directory and can be run with make tests command The purpose of this tests is to check the corect behaviour of PETSc FEM but also they can serve people to understand the program 8 1 Flow in the anular region between to cylinders File sector dat This example tests periodic boundary conditions The mesh is a sector of circular strip 2 72 lt r lt 4 48 0 lt 0 lt 7 4 and we impose The governing equation is Au 0 where u is a velocity vector of two components On the internal radius we impose u 0 and u t where t is the tangent vector On the radii 9 0 and 0 7 4 we impose periodic boundary conditions so that it is close to flow in the region between two cylinders with the internal cylinder fixed and the external one rotating with velocity 1 But the operator is not the Stokes operator However in this case the flow for this operator is divergence fre
90. c FEM once you write the flux function for a particular advective diffusive problem you get absorbing boundary conditions with none or little extra work There are basically two types of absorbing boundary conditions e Linear based on the Jacobian of the flux function assuming small perturbations about a reference value e Based on Riemann invariants require the writer of the flux function to provide the Riemann invariants for the flux function Needs the user to write the Riemann invariants and 50 6 12 1 Linear absorbing boundary conditions Starting with the conservation form of an advective system 2 and assuming small per turbations about a mean fluid state U x t Up U x t and no source term then we obtain the linearized form aul uy dle iy Ye eel 42 A 42 where we assume further that the flow only depends on x the direction normal to the boundary Let S be the matrix of right eigenvectors of Ag so that AoS SA 43 where A diag A1 Ana are the eigenvalues of Ap Assuming that the system is hyperbolic then such a diagonal decomposition is possible for any state Up with real eigenvalues and a non singular matrix S Multiplying 42 at left by S7 and defining V S71U we obtain a decoupled system of equations OV OV BE A 0 44 Now the equation for each characteristic component v of V is a simple linear transport equation with constant transport velocity A Ov Ot so that the
91. ch equations that is added For instance when imposing conditions on primitive variables it is usual to discard the energy equation if pressure is imposed to discard the continuity equation if density is imposed and to discard the j th component of the momentum equation if the j th component of velocity is imposed On solid walls the energy equation is discarded if temperature is imposed Some of these pairings equation unknown are more clear than others 53 So that when imposing absorbing boundary conditions we have not only to specify the new conditions but also which equations are discarded Note that this is not necessary if all the variables are imposed at the boundary for instance in a supersonic outlet This suggests to generate appropriate values even for the outgoing waves for instance by extrapolation from the interior values 6 12 5 Extrapolation from interiors For a linear system in characteristic variables 45 we could replace all the first row of equations by 0 n oa A i 60 ee Cpp J 1 Naot which can be put in matricial form as yn tl _ IV 0 m dal 61 My V7 Vp 0 p 0 where the c s are appropriate coefficients that provide an extrapolation to the value vati from the values at time yhti Note that these represents 2ngof equations but as Ty have rank n there are in total naof linearly independent equations As the nt 1 lt j lt naor rows in the first row of equations c
92. ch of the matrix arguments 9 2 5 Branch arrays If some cases you need branching to a certain number of branches and doing it by hand is cumbersome You should need a branch object for each branch so you can either define a plain STL vector lt gt of branches or use the FastMat2 CacheCtx2 Branchv class Consider the following code FastMat2 CacheCtx2 ctx FastMat2 CacheCtx2 Branch bi FastMat2 CacheCtx2 Branchv b2v 5 FastMat2 x amp ctx 2 5 3 int N 10000 ctx use_cache 1 double sum 0 for int j 0 j lt N j ctx jump b1 82 x fun rnd int k rand 5 ctx jump b2v k x ir 1 k 1 sum x norm_p_all x rs ctx use_cache 0 At each iteration of the loop you randomly generate a matrix x of 5x3 Then you randomly pick a row of it take its norm and accumulate on variable sum If you omit in the code the ctx jump b2v k line then there is only the main b1 branch and when the norm_p_all instruction is executed it can be reached with a different value of k so that it will not give an error but it will compute with the k stored in the cache As mentioned above you can either define a plain STL vector lt gt of branches or use the FastMat2 CacheCtx2 Branchv class as shown above The branch array can be given dimensions either at the constructor as above or with the init method FastMat2 CacheCtx2 Branchv b2v later b2v init 5 Multi dimensional arrays can b
93. conductivity is found after calling load_props in propel conductivity_indx define COND propel conductivity_indx Other properties DEFPROP propa define PROPA propel propa_indx DEFPROP propb define PROPB propel propb_indx DEFPROP propc define PROPC propel propc_indx DEFPROP propp 33 define PROPP propel propp_indx DEFPROP propq define PROPQ propel propq_indx Total number of properties loaded int nprops iprop Set error if maximum number of properties exceeded assert nprops lt MAXPROP code loop over elements for int k el_start k lt el_last k 4 check if this element is to be processed if compute_this_elem k this myrank iter_mode continue Load properties either from properties hash table or from per element properties table load_props propel elprpsindx nprops amp ELEMPROPS k 0 more code use physical element property double veccontr wpgdet COND dshapex t dshapex e First we allocate for 100 entries in elprpsindx and propel arrays and set the counter prop to 0 Then we call DEFPROP for properties conductivity and propa thru propq After this we check that the maximum number of properties to be defined is not exceeded and enter the element loop After checking as usual if the element needs processing we call Load_props in order to
94. copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries so that distribution is permitted only in or among coun tries not thus excluded In such case this License incorporates the limitation as if written in the body of this License 9 The Free Software Foundation may publish revised and or new versions of the General Public License from time to time Such new versions will be similar in spirit to the present version but may differ in detail to address new problems or concerns Each version is given a distinguishing version number If the Program specifies a version number of this License which applies to it and any later version you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation If the Program does not specify a version number of this License you may choose any version ever published by the Free Software Foundation 10 If you wish to incorporate parts of the Program into other free programs whose dis tribution conditions are different write to the author to ask for permission For software which is copyrighted by the Free Software Foundation write to the Free Software Foundation we sometimes make exceptions for this Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sh
95. cos wt cos ky at x 0 t gt 0 do p 0 an 0 at x eb 112 0 in0 lt x lt Lr 0 lt y lt Ly t gt 0 0 at y Ly do 0 aty 0 On ee We can find the solution proposing a solution of the form bx eft iwt Replacing in 112 we arrive to a characteristic equation in k of the form iw pu k 8 113 This is a quadratic equation in 8 and has two roots say 12 The general solution is px Y t Re cre coe a 76 9 The FastMat2 matrix class 9 1 Introduction Finite element codes usually have two levels of programming In the outer level a large vector describes the state of the physical system Usually this vector has as many entries as the number of nodes times the number of fields minus the number of fixations i e Dirichlet boundary conditions This vector can be computed at once by assembling the right hand side and the stiffness matrix in a linear problem iterated in a non linear problem or updated at each time step through solution of a linear or non linear system The point is that at this outer level you perform global assemble operations that build this vector and matrices At the inner level you perform a loop over all the elements in the mesh compute the vector and matrix contributions of each element and assemble them in the global vector matrix From one application to another the strategy at the outer level linear non linear steady temporal dependent etc and the physics o
96. cpp e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file advecfm2 cpp e int lumped_mass default 1 Use lumped mass found in file advecfm2 cpp e int weak_form default 1 Use the weak form for the Galerkin part of the advective term found in file advecfm2 cpp Flux function ffeulerfm2 Euler eqs e double gamma default 1 4 The specific heat ratio found in file ffeulerfm2 cpp e int shock_capturing default 0 Add shock capturing term found in file ffeulerfm2 cpp e double shock_capturing_threshold default 0 1 Add shock capturing term if relative variation of variables inside the element exceeds this found in file ffeulerfm2 cpp e double tau_fac default 1 Scale the SUPG upwind term found in file ffeulerfm2 cpp Flux function ffswfm2 Shallow water eqs e double gravity default 1 Acceleration of gravity found in file ffswfm2 cpp Al e int shock_capturing default 0 Add shock capturing term found in file ffswfm2 cpp e double shock_capturing_threshold default 0 1 Add shock capturing term if relative variation of variables inside the element exceeds this found in file ffswfm2 cpp e double tau_fac default 1 Scale the SUPG upwind term found in file ffswfm2 cpp 6 7 The hydrology module aquifer bottom datum free surface of freatic aquifer Figure 2 Aquifer stream system FE E 7
97. de code BOPT g_c and debug the code This is the simplest situation you should debug here until you reach a working code that gives the results you expect Note currently PETSc FEM does not use multi threading so that we refer here to deactivating multithreading for other code that may be using FastMat2 as for instance the trceprtf code for particle tracking 84 e Enable caching and activate check_labels Debug until no errors are found and check that the results are the same as with the non cached code re checking e Go to optimized mode and re check e Go to multithread mode and re check 9 2 9 FastMat2 tips e Do not leave caching activated in the whole code After the loops that are most time consuming set ctx use_cache 0 e Some initialization code or other isolated operations are better to be left uncached e You can access the raw storage an array of doubles inside the matrix with double aptr a storage_begin This can be used for complex operations that are not easy to code with the FastMat2 methods 9 3 An older version of cache structure NOTE This version of cache structure called CacheCtx1 will be probably de clared obsolete in a future We encourage users to use the new CacheCtx2 version The functions for manipulating the cache structure are different for FastMat2 CacheCtx1 than for FastMat2 CacheCtx2 The example above is rewritten as follows Chrono chrono FastMat2 x 2 3 2 a 1 2 b 1 2
98. debug_compute_prof 23 debug_element_partitioning 22 degree of freedom 116 delta_time 65 diameter 44 displ_factor 65 DistMap 144 DistMatrix 142 dof 116 domain decomposition 138 double_layer 109 Dt 40 65 dtol 25 40 dx_auto_combine 115 dx_cache_coords 115 dx_coords_scale_factor 115 dx_coords_scale_factoro 115 dx_do_make_command 115 dx_node_coordinates 115 dx_port 115 dx_read_state_from_file 115 dx_split_state 115 dx_state_all_fields 115 dx_steps 115 dx_stop_on_bad_file 115 efficiency 143 element properties er element 32 element_weight 23 elemset 13 elemset properties 28 embedded Perl 17 envelope mask 144 ePerl 17 ePerl n makefiles 21 ePerlini 21 epsilon_fdj 23 Euler equations 34 File inclusion 20 finite difference method 36 flux jacobians 35 fractional_step 65 69 fractional_step_solver_combo 65 fractional_step_use_petsc_symm 65 free_surface_damp 71 free_surface_set_level_factor 71 friction_law 45 fs_eq_factor 71 gamma 41 gas dynamic equations 34 gather_file 65 geometry 41 69 71 109 get_double 31 get_int 31 gmres_orthogonalization 25 gravity 41 idmap 116 iisd_subpart 24 iisd_subpart_auto 24 TISDMat 138 iisdmat_print_statistics 24 impermeable 45 indx_ALE_xold 69 interface preconditioning 140 interface_full_preco_fill 24 interface_full_preco_maxits 24 interface_full_preco_pc 24 interface_full_preco_relax_fact
99. ds number may be obtained by continuation in Re and then we can use it as the external tome like parameter In general we can perform continuation in a set of parameters so that the time_data variable should be an array of scalars Multiple right hand sides Here the time like variable may be an integer the number of right hand side case 15 The compute_prof package 15 1 MPI matrices in PETSc When defining a matrix in PETSc with MatCreateMPIAIJ you tell PETSc see the PETSc documentation e How many lines of the matrix live in the actual processoor m 136 e For each line of this processor how many non null elements it has in the diagonal block d_nnz j e For each line of this processor how many non null elements it has in the off diagonal block o_nnz j In PETSc FEM this quantities are computed in two steps e Call assemble with ijob COMP_MAT_PROF or ijob COMP_FDJ_PROF and appro priated jobinfo in order to tell the element routine which matrix are you com puting This returns a Libretto dynamic array da containing the sparse structure of the given matrix e function compute_prof takes da as argument and fills d_nnz o_nnz 15 2 Profile determination Let us describe first how the sparse profile is determined in the sequential one processor only case We have to determine for each row index i all the column indices j that have a non null matrix en
100. e Scientifique FNRS entre el Grupo de Tecnolog a Mec nica del INTEC y el Laboratoire de T chniques A ronautiques et Spatiales Universidad de Lieja B lgica Investigaci n en Mec nica Computacional Director A Cardona y M G radin Financing agency CONICET FNRS B lgica Begin 1996 End 1997 Key CAI D 94 Code CAI D 94 004 024 Title M todos Num ricos en Mec nica de S lidos y Fluidos Director S R Idelsohn y A Cardona Financing 158 agency Universidad Nacional del Litoral UNLit Begin 1994 End 1995 2 Key GERMEN Code PICT 51 Title FONCyT PICT 51 GERMEN GEneraci n de Recursos b sicos para la aplicaci n de los MEtodos Num ricos Director Dr Sergio Idelsohn Financing agency FONCyT Begin 1998 End 2001 1 Key COLA Code PID 026 Title Simulaci n Num rica de Procesos de Colada Continua Director S R Idelsohn Financing agency SECYT BID Be gin 1996 End 1999 21 Symbols and Acronyms 21 1 Acronyms CFD Computational Fluid Dynamics DX IBM Data Explorer ePerl embedded Perl FDM Finite Difference Method FEM Finite Element Method HISD Interface Iterative Sub domain Direct method KWM Kinematic Wave Model see 6 7 LES Large Eddy Simulation MPI Message Passing Interface OOP Object Oriented Programming Perl Practical Extraction and Report Language PETSc Portable Extensible Toolkit for Scientific computations SUPG Streamline Upwind Petrov Galerkin ANN
101. e Up used for the Riemann based absorbing boundary conditions but on the internal state That means for in stance that if the internal computational scheme tends to produce some error due to non conservativity or rounding errors the internal state would drift constantly A good compromise may be to use Riemann based 54 or linear absorbing boundary conditions 49 at inlet and absorbing boundary conditions based on last state 58 at outlet As the error tends to propagate more intensely towards the outlet boundary it is preferably to use strongly absorbing boundary conditions there whereas the linearly absorbing or Riemann invariant boundary conditions upstream avoid the drift 6 12 4 Finite element setup Assume that the problem is 1D with a constant mesh size h and time step At so that nodes are located at positions 1 kh k 0 00 Let U be the state at node j time step n FEM discretisation of the system of equations with natural boundary conditions leads to a system of the form Fo Ug Up Rot F Uj Up Uy RIH 59 n 1 n 1 n ly __ n 1 F U Uy Uki R where the F are non linear functions and the Ry possibly depends on the previous values Uy Imposing boundary conditions means to replace some of the equations in the first row k 0 for other equations Recall that in order to balance the number of equations and unknowns we must specify which of the equations are discarded for ea
102. e and the gradient of pressure has no component in the angular direction so that the solution do coincide with the Stokes solution In the output of the test sector sal we check that the solution is aligned with the angular direction uy uy at the outlet section 0 7 4 8 2 Flow in a square with periodic boundary conditions File lap_per dat This is similar to example sector dat but in a region that is the square 1 lt x y lt 1 u is imposed on all sides u 1 and in the tangential direction in the counter clockwise sense i e u 0 1 in z 1 u 41 0 in y 1 By symmetry we can solve it in 1 th of the domain i e in the square x y gt 0 We could solve it also in 14th of the region the triangle y gt 0 y lt x z lt 1 8 3 The oscilating plate problem File oscplate dat This is the flow produced between two infinite plates when one is at rest and the other oscillating with amplitude A and frequency w see figure 7 This serves 74 to test a problem with temporal dependent boundary conditions The problem is one dimensional and the resulting field is u 0 p cnst and v v x We set parameters lenght between plates L 1 viscosity y 1 and we model the problem with a strip 0 lt x lt 1 0 lt yS lt 0 l and set periodic boundary conditions between y 0 1 and y 0 At the plate at rest x L we set u p 0 and at x 0 and at the moving plate x 0
103. e boundary The second node is the node with Lagrange multipliers and the third node is used to set the reference value The data would look like this see figure 6 elemset gasflow_abso2 3 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt __END_ELEMSET__ 58 absorbing element K lt n lagrange multipliers O n 1 10O n y reference state outgoing wave Figure 6 Absorbing element end_elemsets fixa lt n4 gt 1 lt rho ref gt lt n4 gt 2 lt u_ref gt lt n4 gt 3 lt vref gt lt n4 gt 4 lt p ref gt __END_FIXA__ e As before the normal property is used for computing the direction normal to the boundary e If the use_old_state_as_ref flag is set to true 1 then the reference state is taken as the state o the state of the boundary node at the previous time step In this case the state of the third node is ignored On the other hand if it is set to false 0 then the state of the third node is used as the reference state e This type of boundary condition doesn t need the implementation of the Riemman invariants methd but it needs the methods get_CpO and get_Ajac 6 12 9 Related Options e int ALE_flag default 0 Do correction for wave characteristic computation do to mesh velocity ALE found in file advabso cpp e int ndim default 0 Dimension of the space found in file advabso cpp 59 double array normal default none Defines the normal
104. e created just give more integer arguments Currently up to 4 dimensions are allowed Please note that in the case above if just the same columns is visited always then the branching is not needed for instance int eh following code int k 3 for int j 0 j lt N j ctx jump b1 x fun rnd x ir 1 k sum x norm_p_all x rs the second branching is not neededm because we use the mask always with the third column Moreover in the following case for int j 0 j lt N j ctx jump b1 x fun rnd for int k 1 k lt 5 k x ir 1 k sum x sum_square_all x rs 83 you don t need the branching either because all the columns are visited always in the same order i e the sequence of operations that are seen by the cache system are fun rnd sum_square_al1 with x masked to column 1 sum_square_all with x masked to column 2 sum_square_all with x masked to column 3 Sum_square_all with x masked to column 4 sum_square_all with x masked to column 5 x x x x x x so that the code above with only one branch will work OK 9 2 6 Debugging tools If you receive segmentation violation errors and you suspect that it is due to the caching system then you can enable a self check by issuing the ctx check_labels instruction In this way a check string identifying the operations is generated and stored in the cache Then each time the cache is reused the string s
105. e is greater than one then the state vector are continued to be stored in file file1 out and so on When the number of files nfile is reached the saving continues in file 0 66 More precisely the saving mechanism is described by the following pseudo code Read state vector from initial_state file into x 0 n 0 for i 0 i lt nstep i advance x n to x n 1 if n nsaverot 0 j lt n nsaverot k lt j nrec l1 lt j nrec if k 0 rewind file 1 append state vector to file l found in file ns cpp int nsome default 10000 Sets the save frequency in iterations for the print some mechanism The print some mechanism allows the user to store the variables of some set of nodes with some frequency The nodes are entered in a separate file whose name is given by a print_some_file entry in the general options one node per line The entry nsome indicates the frequency in steps at which the data is saved and save_file_some the name of the file to save in found in file ns cpp int nstep default 10000 The number of time steps found in file ns cpp int print_linear_system_and_stop default 0 After computing the linear system solves it and prints Jacobian right hand side and solution vector and stops found in file ns cpp int print_residual default 0 Print the residual each nsave steps found in file ns cpp string print_some_file default lt none gt
106. e mesh for sub partitioning the dof graph in the IISD matrix found in file iisdcr cpp double pc_lu_fill default 5 PETSc parameter related to the efficiency in growing the factored profile found in file iisdcr cpp 24 int print_interface_full_preco_conv default 0 Flags whether or not print the convergence when solving the preconditioning for the interface problem when using use_interface_full_preco found in file iisdcr cpp int print_Schur_matrix default 0 Print the Schur matrix don t try this for big problems found in file iisdcr cpp int use_interface_full_preco default 0 Chooses the preconditioning operator found in file iisdcr cpp int use_interface_full_preco_nlay default 1 Number of layers in the preconditioning band should be nlay gt 1 found in file iisdcr cpp int asm_lblocks default 1 Chooses the number of local blocks in ASM found in file iisdmat cpp int asm_overlap default 1 Chooses the overlap of blocks in ASM found in file iisdmat cpp string asm_sub_ksp_type default preonly Chooses the preconditioning for block problems in ASM method found in file iisdmat cpp string asm_sub_preco_type default ilu Chooses the preconditioning for block problems in ASM method found in file iisdmat cpp string asm_type default restrict Chooses the restriction extension type in ASM found in file iisdmat cpp double atol default 1e 6 Absolute
107. e petscfem_step 3 petscfem_time 0 3 hello_post NN NN AHA HA 100 make petscfem_step 100 petscfem_time 10 hello_pre make petscfem_step 100 petscfem_time 10 hello_post make petscfem_step 2 petscfem_time 0 hello_close Inside the Makefile you can use the make variables petscfem_step and petscfem_time For instance you can do the Hello world trick by adding the targets In Makefile hello_init echo In init Do more things in init stage hello_pre echo In pre Do more things in pre stage HH hello_post echo In post Do more things in post stage HH hello_close echo In close Do more things in close stage For instance I love to gzip my state files with a command like this In the PETSc FEM data file hook_list shell_hook compress In the Makefile compress_init compress_pre compress_post for f in state tmp do echo gzipping f gzip f f done compress_close 11 Gatherers and embedded gatherers Typically gatherers are reduction operators over the element sets i e instead of assem bling a matrix or vector the gather operations produce some kind of global data as for instance e The volume surface length of an elemset 101 e The integral of some quantity fields coordinates or functions of the them over the element set for instance total concentration total energy total linear or angular momen
108. e we have to branches 80 branch b1 operations x fun and x norm_p_all 0 branch b2 operation x scale so that we define two branch objects b1 b2 and do the corresponding jumps 9 2 2 Branching is not always needed However branching is needed only if the instruction sequence changes during the same execution of the code For instance if you have a code like this FastMat2 CacheCtx2 ctx ctx use_cache 1 for int j 0 j lt N j ctx jump b1 Some FastMat2 code if method 1 Some FastMat2 code for method 1 else if method 2 Some FastMat2 code for method 2 More FastMat2 code If the method flag is determined at the moment of reading the data and then is left unchanged for the whole execution of the code then it is not necessary to protect do branching since the instruction sequence will be always the same Moreover if you perform a hevy loop N large with some value for method and then you change method then branching is still not needed provided that the cache structure ctx is rebuilt as in the code above However if the ctx is a global variable and method changed you only have to clear the cache structure ctx is some global variable if method last_used_method ctx clear ctx use_cache 1 for int j 0 j lt N j LD shes 9 2 3 Cache mismatch The cache process may fail if a cache mismatch is produced For instance consider the following variation of t
109. ed gt Radius of the channel found in file stream h e double Rf default 1 Resistivity including perimeter of the stream to loss to the aquifer found in file stream h e double roughness default 1 Roughness coefficient for the Manning formula a k a n found in file stream cpp e string shape default string undefined Choose channel section shape may be circular rectangular or triangular found in file stream cpp e double wall_angle default lt required gt Width and height of the channel found in file stream h e double wall_angle default lt required gt Aperture angle of channel found in file stream h e double width default lt required gt Width of the channel found in file stream h e double width_bottom default lt required gt Width of bottom of channel found in file stream h 6 8 The Hydrological Model cont The implemented code solves the problem of subsurface flow in a free aquifer coupled with a surface net of 2D or 1D streams 2D Saint Venant Model 2DSVM 1D Saint Venant Model 1DSVM and Kinematic Wave model KWM To model such system three element sets must be used an aquifer system representing the subsurface aquifer a 2D or 1D depending on the chosen model stream element set representing the stream and a 2D or 1D stream loss element set representing the losses from the stream to the aquifer or vice versa 45 6 9 Subsurface Flow
110. effectively load element properties in propel and after this we can use them as propel conductivity_indx1 and so on Macro shortcuts COND PROPA are handy for this 6 The general advective elemset 6 1 Introduction to advective systems of equations Advective system of equations are of the form U OF U Ot Ox where U is the state vector of the fluid in conservative variables Examples of these are the inviscid fluid equations the Euler or gas dynamic equations the shallow G 2 34 water equations and scalar advective systems that represents the transport of a scalar property like temperature or concentration of a component by a moving fluid The conservative variables for the Euler equations are U pu 3 pe for the Euler equations In general U is a vector of ngof components F U is the flux vector t can be thought as a matrix of Ndof X Ndim components The row index corresponds to a field value whereas the column index is a spatial dimension G is a source vector For the 2D or 3D Euler equations it is null but if we consider 1D flow in a tube of varying section then it has a source term in the momentum equation due to the reaction on the wall Also for the shallow water equations there is a reaction term in the momentum balance equations if there is a varying bathymetry The relation of the flux vector with the state vector is the heart of the advecti
111. element e 6 116 e dump_props_file is the name of the file where the per element values are written If it contains a d format sequence then it is replaced by the time step number using printf and friends e dump_props_freq is the frequency at which the values are dumped to file 11 5 Parallel aspects Of course PETSc FEM is in charge of adding the contributions on different processors The resulting sum of contributions over all elements in all processors The sum is available not only in the master rank 0 but in all processors as with an MPI_Allreduce call This is important since this global sum can be used also in hooks in order to perform computations For instance a gatherer can be used to compute the force on a body and this force can be passed to a hook to compute the movement of the body 11 6 Creating a gatherer Typically a gatherer is created by deriving from the virtual class gatherer and imple menting the set_pg_values method which is in charge of computing the integrands 107 class gatherer Y Ll public E perform several checks and initialization void init add Gauss point contributions void set_pg_values vector lt double gt amp pg_values FastMat2 amp u FastMat2 kuold FastMat2 amp xpg FastMat2 amp Jaco double wpgdet double time e The init function may be used for initialization The TGETOPTDEF thash macro can be used for extracting options of the elemse
112. emset visc_force_integrator 12 geometry quad2hexa __END_HASH__ 123456789 10 11 12 __END__ELEMSET__ 11 3 Automatic computation of layer connectivities Sometimes it is somewhat cumbersome to compute the connectivities of the embed ded gatherer connectivities since it involves the finding of layers of nodes inside the adjacent volume element This can be done automatically by PETSc FEM if the identify_volume_elements is activated and the name of the adjacent volume element is passed through the option volume_elemset for instance elemset nsi_tet_les_full 4 geometry cartesian2d name viscous_fluid 105 __END_HASH__ 1276 6 712 11 2387 __END__ELEMSET__ elemset visc_force_integrator 6 volume_elemset viscous_fluid identify_volume_elements geometry line2quad __END_HASH__ 121111 231111 341111 __END__ELEMSET__ Note that the nodes in the inner layers of nodes are replaced by 1 s With this setting the code will inspect the connectivity of the viscous_fluid elemset and find the nodes corresponding to the inner layers and replace the 1 s by the correct n ode number 11 4 Passing element contributions as per element properties For some applications it is desirable to have the individual element contributions instead of having their sum For instance in a fluid structure application involving a fluid and a deformable solid it does not suffice only with the integral of the forces to compute the evolution of the solid bu
113. encil of the interface matrix on a planar interface in a homogeneous grid of step h with bilinear quad elements is 18 1 3h Such matrix has a condition number which is independent of h and is asymptotically k A71 5 3 In the case of using tri angular elements by the way this is equivalent to the finite difference case the stencil 140 is 14 1 2h whose condition number is Arr 3 This low condition number also favors the use of an iterative method However a disdvantage for the iterative solution is that non stationary iterative solvers i e those where the next iteration 2 doesn t depend only on the previous one f x like CG or GMRES can not be nested inside other non stationary method unless the inner preconditioning loop is iterated to a very low error This is because the conjugate gradient directions will lose orthogonality But using a very low error bound for the preconditioning problem may be expensive so that non stationary iterative methods are discarded Then the use of Richardson iteration is suggested Sometimes this can diverge and some under relaxation must be used Options controlling iteration for the preconditioning problem can be found in section 4 4 3 16 3 Implementation details of the IISD solver e Currently unknowns and elements are partitioned in the same way as for the PETSc solver The best partitioning criteria could be different for this solver than for the PETSc iterative so
114. epends on the gradient of the depth H and in the 1D Euler equations on the area of the tube section A z This is taken into account by PETSc FEM by assuming that the user enters in the nodedata section nu ngim ny quantities per node where the first ngim quantities are the node coordinates and the rest are assumed that are node data that has to be passed to the flux function routine together with its gradient in order to compute the source term e Matrix amp grad_H input size Naim X Ny the gradient of the quantities in H see previous entry e Matrix flux output size Ndof X Ndim Each column is the vector F of fluxes for each of the governing equations 38 e vector lt Matrix gt A_jac output size a vector Of ngim pointers to matrices of Ndof X Naor Each matrix is the jacobian matrix as defined by 5 To access the jd jacobian matrix you may write A_jac jd 1 The macro AJAC jd defined in advective h expands to this e Matrix amp A_grad_U output size nao X 1 compute if options amp COMP_UPWIND This is the accumulation term A 9U 0x ma e Matrix amp grad_U input size Ndim X Naot The gradient of the state vector e Matrix amp tau_supg output size either 1 x 1 or naof X Naof compute if options amp COMP_UPWIND This is the 7 intrinsic time scale it may be either a scalar or a matrix Beware that even in the case where it is a scalar it must be returned as a Newmat Matrix object of d
115. erences in the Finite Difference Method It is well known that the Galerkin formulation leads to oscillations for advective systems and this is solved by adding a stabilizing term to the discretized equations 6 3 SUPG stabilization In the SUPG for Streamline Upwind Petrov Galerkin formulation of Hughes et al The stabilized formulation is MU F U G gt gt f Puro G Ft 0 0 11 Li where the whole expression corresponds to the j th block of size ngo in the global equa tions Note that as the added term is a weighted residual form of the residual the term in parentheses then the continuum solution is solution of these discrete equations we say that this is a consistent formulation Psupa is a matrix of ngor X Ndor the SUPG perturbation function usually defined as ON Psupa da 12 where T are the characteristic or intrinsic time of the element defined as he T 13 LA where h is the size of the element and and A represents some norm of the vector of jacobians There is a variety of possibilities for computing both quantities For instance h may be computed as the largest side of the element or as the radius of the circle with the same area On the other hand A may be computed as the maximum eigenvalue of all the linear combinations of the form n A with nj a unit vector i e the maximum propagation velocity possible in the fluid that i
116. ets the number of files in the rotary save mechanism see 7 2 found in file ns cpp e int ngather default 0 Number of gathered quantities found in file ns cpp e int nnwt default 1 Number of inner iterations for the global non linear Newton problem found in file ns cpp e int nrec default 1 Sets the number of states saved in a given file in the rotary save mechanism see 7 2 found in file ns cpp e int nsave default 10 Sets the save frequency in iterations found in file ns cpp e int nsaverot default 100 Sets the frequency save for the rotary save mechanism Sometimes it is interesting to save the state vector with a certain frequency in a append manner i e appending the state vector at the end of the file However this posses the danger of storing too much amount of data if the user performs a very long run The rotary save mechanism allows writing only a certain amount of the recent states The mechanism basically saves the state vector each nsaverot steps appending to the a file The name of the file is contructed from a pattern set by the user via the save_file_pattern entry by replacing d by 0 a la printf O For instance if save_file_pattern is set to file d out then the state vectors are appended to file0 out When the number of written states reach the nrec count the file is reset to 0 and the saving continues from the start of the file However if nfil
117. evious example Key name Value My Mesh Key ndim Value 2 1 Key viscosity Value 1 and so on This hash table is read in an object of type TextHashTable without checking whether this properties apply to the particular elemset or not The values are stored strictly as string so that no check is performed on the syntax of entering double values or integers 29 5 2 Text hash table inclusion Often we have some physical or numerical parameter that is common to a set of elemsets for instance gravity in shallow water or viscosity in Navier Stokes In this case these common properties can be put in a separate table with a table header and included in the elemsets with _table_include directives The table sections not associated to a particular elemset have to be put in preference before the calling elemset for instance table steel_properties density 13 2 viscosity 0 001 lt more steel properties here gt __END_HASH__ elemset nsi_tet 4 _table_include steel_properties npg 8 weak_form 1 lt more properties for this elemset here gt __END_HASH__ 1342 lt more element connectivities here gt Text hash tables may be recursively included to any depth The text hash table for an elemset may be included in other elemset referencing it by its name entry for instance elemset nsi_tet 4 name volume_elemset_1 geometry cartesian2d npg 4 viscosity 0 23 lt more properties here
118. f the problem that defines the FEM matrices and vectors may vary The FastMat2 matrix class has been designed in order to perform matrix computations at the element level It is assumed that we have an outer loop usually the loop over elements that is executed many times and at each execution of the loop a series of operations are performed with a rather reduced set of local vectors and matrices There are many matrix libraries but often the performance is degraded for small dimensions Sometimes performance is good for operations on the whole matrices but it is degraded for operations on a subset of the elment of the matrices like columns rows or individual elements This is due to the fact that acessing a given element in the matrix implies a certain number of arithmetic operations Otherwise we can copy the row or column in an intermediate object but then there is an overhead due to the copy operations The particularity of FastMat2 is that at the first execution of the loop the address of the elements used in the operation are cached in an internal object so that in the second ans subsequent executions of the loop the addresses are retrieved from the cache 9 1 1 Example Consider the following simple example We are given a 2D finite element composed of triangles i e an array xnod of 2 x Nyoq doubles with the node coordinates and an array icone with 3 x nelem elements with node connectivities For each element 0 lt 7 lt Nelem its n
119. for instant chunk_size sets the quantity of elements that are processed by the elemset routine at each time In section 5 1 some specific properties of the elemset properties text hash table are described 5 1 The elemset hash table of properties A very common problem in FEM codes is how to pass element physical or numerical properties conductivities viscosities etc to the element routine In PETSc FEM 28 you can pass arbitrary per elemset quantities via an elemset properties hash table This comes after the element header as in the following example elemset nsi_tet 4 fat 4 elemset header props per element properties to be explained later Elemset properties hash table name My Mesh geometry cartesian2d ndim 2 npg 4 couple_velocity 0 weak_form 1 physical properties Dt 1 time step viscosity 1 next lines flags end of the properties hash table __END_HASH__ element connectivities 1342 3564 5786 lt more element connectivities here gt 193 195 196 194 195 197 198 196 197 199 200 198 199 201 202 200 next lines flags end of elemset connectivities __END_ELEMSET__ In this example we define several properties containing doubles viscosity and Dt integers ndim npg etc or strings name This hash table is stored in the form of an associative array hash with a text key the first non blank characters and a value composed of the rest of the line In the pr
120. from t to this instant with period tn t spline Spline interpolated function This is similar to piecewise but data is smoothly interpolated using splines e npoints integer The number n of points to be entered t n doubles The instants t f n doubles The function values final_time double The function is extended from t to this instant with period tn t spline_periodic Spline interpolated periodic function Spline interpolation with piecewise may give poor results specially if you try to match smoothly the beginning and end of the period spline_periodic may give better results at the cost of being restricted to enter the data function in an equally spaced grid tj44 t At cnst e npoints integer The number n of points to be entered period double The period T of the function e start_time double default 0 The first time instant t The remaining instants are then defined as t t 1 n 1 T f n doubles The function values at times t Example The lines fixa_amplitude spline_periodic period 0 2 npoints 5 start_time 0 333 ampl_vals 1 0 1 O 1 Defines a function with period T 0 2 that takes the values 0 333 nT 1 0 333 n Ya 0 333 n Yp 0 333 n 34 T T 0 oa 144 0 129 Actually the resulting interpolated function is simply t 0 333 p t cos 2 a 145 T Implementation detai
121. he previous code FastMat2 CacheCtx2 ctx FastMat2 CacheCtx2 Branch bi b2 b3 Il for int j 0 j lt N j ctx jump b1 x fun rnd 81 double len x if len lt 1 0 in ctx jump b2 x scale 1 0 len else if len gt 1 1 ctx jump b3 x set 0 0 norm_p_all There is an additional block in the conditional if the length of the vector is greater than 1 1 then the vector is set to the null vector We have now three branches The code shown works OK but if let s say we forgot to add a third branch and replace the ctx jump b3 for a ctx jump b2 then when reaching the x set 0 0 operation the corresponding cache would be the cache corresponding to the x scale operation and most probably and error or incorrect computation will occur Every time that the retrieved cache doesn t exist or doesn t match with the operation that will be computed we say that there is a cache mismatch 9 2 4 When a cache mismatch is produced Basically the information stored in the cache and then retrieved from the objects that were passed in the moment of creating the cache must be the same needed for performing the current FastMat2 operation that is e The FastMat2 matrices involved must be the same i e their pointers to the matrices must be the same e The indices passed in the operation for instance for the prod ctr sum opera tions e The masks see 9 1 2 applied to ea
122. he volume elemsets that compute for each volume element the neares wall element This is stored as an integer per element property in the volume elemsets In order to reduce memory requirements only an index in the data_pts array is stored As several wall elemsets may exists an array of pair lt int elemset gt is used to store pointers to the data_pts array in order to know to which wall elemset the given index in data_pts belongs vector lt double gt data_pts_ new vector lt double gt vector lt ElemToPtr gt elemset_pointer new vector lt ElemToPtr gt WallData wall_data if LES Y VOID_IT arg1 argl arg_add data_pts_ USER_DATA argl arg_add elemset_pointer USER_DATA Elemset elemset NULL argl arg_add elemset USER_DATA ierr assemble mesh argl dofmap build_nneighbor_tree amp time CHKERRA ierr PetscPrintf PETSC_COMM_WORLD After nearest neighbor tree n wall_data new WallData data_pts_ elemset_pointer ndim lt gt lt gt lt gt lt gt lt gt lt gt Find nearest neighbor for each volume element VOID_IT argl argl arg_add wall_data USER_DATA ierr assemble mesh argl dofmap get_nearest_wall_element amp time CHKERRA ierr In the jobinfo build_nneighbor_tree call to assemble a loop over all the el ements in the elemset ignoring to what processor it belongs must be performed Oth erwise each process
123. hrough this option e store_in_property_name is a string that identifies the per element property that must be filled with the computed values Of course the user must define such property with the appropriate size The values in the connectivity table are irrelevant for instance null values and will be overwritten by the gatherer e compute_densities Ifcompute_densities 0 then the value set in the per element property is the integral of the integrand over the element That is if the integrand is p the computed value for the element is value for element e o 115 Qe Conversely if compute_densities 1 the value set is the density of the mean value of the integrand i e o l e Re Jo For instance if the integrand is the heat flow through the surface then ifcompute_densities 0 then the value set in the per element property is the total heat flow through the element which has units of energy per unit time whereas ifcompute_densities 1 then the value set is the mean heat flow density which has units of energy per unit time and unit surface For the traction on a surface the passed value is the total force on the element in one case units of force and the mean skin friction in the other units of force per unit area Of course this option has no effect on the values passed via the global gather_values vector e dump_props_to_file activates the mechanism of storing per element computed val ues in a file value for
124. hysical property is shared by all the elements for instance viscosity or specific heat then it is defined once for all the elemset If the quantity varies continuously in the region but it is known a priori then it can be passed as a per element property see 85 6 but that means storage of a double or the amount of memory needed for that property for each element If it is not the same on all the mesh but is constant over large regions then it may be convenient to divide the mesh in several elemsets where the given property has the same value over the elements of each elemset 4 General layout of the user input data file Input to application packages like ns advective or laplace is feed via input data files usually with extension dat This file contains global options node coordinates element connectivities physical properties fixations and boundary conditions etc Even if the precise format of the file may be changing it is worth to describe the general layout The file is divided in sections the table sections the nodedata section several elemset sections the fixa section and constraint section Each section starts with the keyword identifying the section in a separate line followed by several parameters in the same line Follows several lines that makes the section ending with a terminator of the form __END_ lt some identifier gt __ for instance __END_HASH__ Note that these terminators start and end with double
125. idea to fix this is to somewhat rigidize the elastic constants of the fictituos material in order to minimize the distortion of the elements Designing this nonlinear material behaviour should guarantee a unique solution and a relatively easy way to compute the Jacobian A possibility is to have an hyperelastic material i e to define an energy functional F e j as function of the strain tensor One should guarantee that this functional is convex and one should have an easy way to compute its derivatives and second derivatives A related approach implemented in PETSc FEM is to compute for simplices the transformation from a regular master element to the actual element and to define the energy functional to be minimized as a function of the associated matrix tensor If Jj Ox 0 where x are the spatial coordinates and the master coordinates then the metric tensor is Ox OLE T OG DE The Jacobian of the transformation may be computed by differentiating the interpolated displacements 94 al Y Miden 95 H with respect to the master coordinates so that Ox ON Jiz t A 96 We want to compute the distortion function to be minimized as a function of the eigen values of the metric tensor Gij The eigenvalues v4 of G are such that GijUgj Aqvaj 97 so that Ag Ugi Gij Ugj 98 72 no sum over q since the v are orthogonal Now if the functional F is a function of the element node coordinates
126. ifer Gs P Ry b h u 19 where Ry is the resistivity factor per unit arc length of the perimeter The corresponding gain to the aquifer is Ga Gs or 20 where I represents the planar curve of the stream and dp is a Dirac s delta distribution with a unit intensity per unit length i e L f x p dB f f x s ds 21 The stream_loss elemset represents this loss and a typical discretization is shown in figure 4 The stream loss element is connected to two nodes on the stream and two on the aquifer and must be entered in that order in the element connectivity table for instance elemset stream_loss 4 lt elemset properties gt __END__HASH__ lt n5 gt lt n4 gt lt nl gt lt n2 gt __END__ELEMSET__ 6 7 1 Related Options e double a_bar default 1 Unit conversion factor for Manning friction law found in file stream cpp e double Bi default lt required gt Width of the channel found in file stream h e double Ch default lt required gt Chezy roughness coefficient found in file stream h 44 e double diameter default lt required gt geometry of the channel found in file stream h e string friction_law default string undefined Choose friction law may be manning or chezy found in file stream cpp e int impermeable default 0 Flag whether the element is impermeable Rf 00 or not found in file stream h e double radius default lt requir
127. ified files to carry prominent notices stating that you changed the files and the date of any change b You must cause any work that you distribute or publish that in whole or in part contains or is derived from the Program or any part thereof to be licensed as a whole at no charge to all third parties under the terms of this License c If the modified program normally reads commands interactively when run you must cause it when started running for such interactive use in the most ordinary way to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty or else saying that you provide a warranty and that users may redistribute the program under these conditions and telling the user how to view a copy of this License Exception if the Program itself is interactive but does not normally print such an announcement your work based on the Program is not required to print an announcement These requirements apply to the modified work as a whole If identifiable sections of that work are not derived from the Program and can be reasonably considered independent and separate works in themselves then this License and its terms do not apply to those sections when you distribute them as separate works But when you distribute the same sections as part of a whole which is a work based on the Program the distribution of the whole must be on the terms of this License whose permissions for o
128. imensions 1 x 1 e double amp delta_sc output double compute if options amp COMP_UPWIND This is the shock capturing parameter as described in 6 4 Set to 0 if no shock capturing is added e double amp lam_max output double compute if options amp COMP_UPWIND The maximum propagation speed The expression is 14 but normally it may be computed directly from the state vector This is used to compute upwind and also the automatic and local time step e TextHashTable thash input This is the TextHashTable of the elemset Physical and numerical parameters can be passed from the user input data file to the flux function routine through this specific heat specific heat ratio gravity for instance Beware that the flux function routine is called at each Gauss point and decoding of the table may be time expensive so that if the properties are constant over all the mesh you can decode them once and leave the decoded data in static variables e double propel input double array compute if options amp COMP_UPWIND This is the table of per element properties Physical and numerical parameters that are not constant for all the elemsets can be passed from the user input data file to the flux function routine through this e void user_data input Arbitrary information may be passed from the main routine to the flux function through this pointer e int options input integer e Matrix amp G_source output size Nao
129. inear case will be considered later on 14 3 A small example Figure 20 A small example showing boundary conditions Consider the region shown in figure 20 composed of 8 triangle elements and 9 nodes We have 9 x 3 27 node field values but periodic boundary conditions on side 1 4 7 to side 3 6 9 eliminates the 9 unknowns on these last three nodes In addition at the outlet boundary nodes 1 2 3 there is only one in going component so that the unknowns here 120 are only wi w w2 and w3 On the other hand on the inlet boundary u and v are imposed so the vector of unknowns is U wy Us w U wi U w3 Us us UG Va 128 U7 pa Ug us Ug v5 U10 Ps Ui p7 Ui2 ps while the prescribed values are A Ur i Us Wy a 129 U4 v7 Us ug Us vs 121 And the relation defining matrices Q and Q are uy Yi Ur Vo 01 Va Us v Va Ui VA U1 Vox Us p Voy Uy Vay U1 Vox U2 uz V Oz V U V Ua va VA U2 VA U3 VA U4 po VA Da VZ U VA U4 uz cosa Vi Uy Vib U1 Vh U2 sin a Va Oy VA U1 VA U2 v3 sina Vi O VE Ui Ys U2 cos a Va Oy V U1 Va U2 p3 Va O VA U1 VA Us u4 Us Va Us pa U7 us Ug vs Uy 130 ps U10 ue cos a U5 sina Ug ve sin a Us cos a Ue pe U7 u7 U3 v7 U4 p7 Ui ug Us vg Us ps U12 ug Uz vg Us Pg Ui As we can see the boundary condit
130. ines The preprocessing capabilities supported in this class are the minimal ones needed in order to treat very large files efficiently More advanced preprocessing capabilities will be added in a higher level layer written in Perl or similar may be ePerl 4 2 1 Syntax The syntax of comments and continuation lines is borrowed from Unix shells Comments From the first appearance of to the end of the lines is considered a comment Continuation lines If a line ends in NX i e if backslash is the last character in the line before newline J then the next line is appended to the previous one to form a logical single line File inclusion The directive __INCLUDE__ path to file inserts the contents of file in directory path to to this file in the actual point Directories may be absolute or relative we use fopen from stdio File inclusion may be recursive to the limit of the quantity of files that can be kept open simultaneously by the system Echo control The directives __ECHO_ON __ _ECHO_OFF__ controls whether the lines read from input should be copied to the output Usually one may be interested in copying some part of the input to the output in order to remember the parameters of the run As implemented so far this feature is recursive so that if included files with the internal preprocessing i e with the __INCLUDE__ directive will be also copied to the output unless you enclose
131. ing a term 7 to the free surface level see doc for free_surface_damp The equation of the free surface is dn dt w 92 where 7 elevation and w is the velocity component normal to the free surface We modify this as follows d Cog F Cin gt Caamp An Ww 93 Where 7 is the average value of eta on the free surface and Caamp free_surface_damp smoothes the free surface adding a Laplacian filter Note that if only the temporal derivative and the Laplace term are present in 93 then the equation is a heat equation A null value which is the default means no filtering A high value means high filtering Warning A high value may result in unstability Cdamp has dimensions of L T like a diffusivity One possibility is to scale with mesh parameters like h At other is to scale with ht g05 Currently we are using Cdamp C4 h15g05 with Cae 2 found in file nssupg cpp amp e double free_surface_set_level_factor default 0 This adds a Cif term in the free surface equation in order to have the total meniscus volume constant found in file nssupg cpp e double fs_eq_factor default 1 Cey fs eq factor see doc for free surface damp option is a factor that scales the free surface rigidity Cog 1 which is the default means no scaling a zero value means infinitely rigid as for an infinite gravity found in file nssupg cpp e string geometry default cartesian2d Type of element geometry to defi
132. ion pq i e the permutation resulting of applying first q and then p also belong to the group as well as the product qp In general this is a finite group As a consequence if we have two elements p and q of a group G then pq qp pq pqp pq all belong to G Given a set of permutations S C G the set pS formed by the left product of all the elements of S with p belongs to G The same applies to Sp gS and Sq So that starting with the set So 1 where 1 is the identity permutation we can generate recursively a larger set of symmetries by forming S So pSo Sop qSo Soq As the total number of permutations in the group is finite at most n where n is the number of nodes in the object applying recursively the relation above will end with a set of permutations H that is itself a group included in G We will call it the group generated by p an q This can be applied to any number of generators p q T S For instance the symmetries for the oriented triangle can be generated with the per mutation p 1 2 0 since 2 0 1 can be generated as p The symmetries for the unoriented triangle can be derived from generators 1 2 0 rotation and 0 2 1 in version The most relevant geometrical objects and their symmetries are 0 0 2 2 Figure 34 Generators for triangle e Unoriented triangles are described by numbering their nodes in any way Their symmetries are all the permutations 6 2 en
133. ions result in sparse Q and Q matrices Moreover in real large problems most of the rows correspond to interior nodes such as node 5 here so that Q is very close to a permutation matrix If we think at accessing the elements of Q by row then could store Q as a sparse matrix but in this case we need an average of two integers for pointers and a double for the value per unknown whereas a permutation matrix can be stored with only one integer per unknown One possibility is to think at these matrices as permutations followed by a sequence of operations depending on each kind of boundary conditions but it may happen also that several kind of b c s superposes as int the case of node 3 periodic absorbing and node 9 periodic Dirichlet 122 14 4 Inversion of the map It is necessary sometimes to invert relation 123 i e given Unp and U to find U for instance when initializing a temporal problem Of course this may not be possible for arbitrary Unp and U since Q is in general rectangular but we assume that QTQ is non singular and solve 123 in a least square sense After we may verify if the given data Unr and U is compatible by evaluating the residual of the equation This operation should be performed in O N operations 14 5 Design and efficiency restrictions The class of matrices representing Q and Q should have the following characteristics e Be capable of storing arbitrary matrices e Should be efficient for pe
134. it in your data files as follows 131 amplitude_function my_own tau 3 0 __END_HASH__ 123 14 8 4 Dynamically loaded amplitude functions Another possibility to add new amplitude functions is through loading functions at run time This avoids re linkage and modification of the sources Note You need to have the module compiled with the appropriate flag either the compilation flag DUSE_DLEF or the Makefile variable USE_DYNAMICALLY_LOADED_EXTENDED_FUNCTIONS yes either in the Makefile base or the Makefile defs file These amplitude functions are written in C in source files say fun cpp The com piled files have extension efn from extension function for instance fun efn in this example The Makefile base file provides the appropriate rules to generate the compiled efn functions from the source For each amplitude you want to define you have to write three functions extern C void lt prefix gt init_fun TextHashTable thash void amp fun_data extern C double lt prefix gt eval_fun double t void fun_data extern C void lt prefix gt clear_fun void fun_data this is optional Prefix may be an empty string or a non empty string ending in underscore Use of non empty prefix is explained below The macros INIT_FUN EVAL_FUN and CLEAR_FUN expand to these definitions The second function eval_fun is that one that defines the relation between the time and the corresponding amplitude The first init_fun
135. it state U Up we can proved to be perfectly absorbing since as U U at the boundary we can expand each of the conditions around U and it would result in an equation similar to 52 but centered about Ugo LU Ue 0 j 1 n7 56 The problem is that in general the differentials are not exact Riemann invariants are functions that satisfy 53 under some restrictions on the flow For instance Riemann in variants can be computed for compressible gas flow if we assume that the flow is isentropic They are wi s log p p A 4 entropy 2 w23 u T 23 u a acoustic waves 57 Y w45 u t12 A45 4 vorticity waves Boundary conditions based on Riemann invariants are then absorbing in some particular cases 52 6 12 3 Absorbing boundary conditions based on last state Another possibility is to linearize around the last state U i e IL U 0 t UG U 0 t 0 58 This equation is always perfectly absorbing in the limit because we are always linearizing about a state that in the limit will tend to Us and doesn t need the computation of Riemann invariants which could be unknown for certain problems Also this boundary condition is fully absorbing even in the case of inversion of the sense of propagation of waves The drawback is that we have no external control on the internal states i e the limit internal state does not depend on some external value as th
136. l projected system and consider the sign of the eigenvalues of A un c and un c We remark that if n is the outward unit normal to the boundary edge an inflow boundary corresponds to u n lt 0 and an outflow one to u n gt 0 Fluvial Boundary e inflow boundary u specified and the depth h is extrapolated from interior points or vice versa e outflow boundary depth h specified and velocity field extrapolated from interior points or vice versa Torrential Boundary e inflow boundary u and the depth h are specified e outflow boundary all variables are extrapolated from interior points Solid Wall Condition We prescribe the simple slip condition over P sip C Ist u n 0 41 Kinematic Wave Model The applicability of the kinematic wave as an approximation to dynamic wave was discussed in Rodriguez 1995 and according to Lighthill and Whitham subcritical flow conditions favor the kinematic approach Since one characteristic is propagated inside the domain only we can specify the water head the channel section or the discharge at inflow boundaries see eq 33 6 12 Absorbing boundary conditions Absorbing boundary conditions are a very useful feature for the solution of advective diffusion problems They allow the user to put artificial boundaries closer to the interest region and also accelerate convergence to steady solutions since provide the highest rate of energy dissipation through the boundaries In PETS
137. le 5 7 Taking values transparently from hash table or per element table The general advective elemset 6 1 Introduction to advective systems of equations 13 13 13 14 15 16 16 17 17 17 18 18 20 20 21 21 22 22 22 23 24 27 27 27 28 28 30 31 31 31 32 33 34 6 2 Discretization of advective systems o e e ee 35 bd SUPC stapiizati i cos ose ea he Ee a a a ee ae a A 36 BA Shock Gapturihg so oa pamete a a ca a 37 6 5 Creating a new advective system 2 e o so 37 6 6 Flux function routine arguments e 38 o E RO we A a a a ek a oe 39 6 7 The hydrology module 22k ee Re ee 42 Oil Related Options c coss ke ek ee ee RR A ee 8 44 6 8 The Hydrological Model cont lt lt 45 6 0 Subsurtace Plow ieo senas nagie Ee a tered do 46 6 1 Suelace Plow ocasiona ra A a 46 6 10 1 2D Saint Venant Model o e o 46 6 10 2 1D Saint Venant Model o o e e 47 6 10 3 Kinematic Wave Model o o e 47 6 11 Boundary Conditions lt ss seras A ee a 48 6 11 1 Boundary Conditions to simulate River Aquifer Interac unas Coupling Termik sas Re ee Ee e he we 48 6 11 2 Initial Conditions First Second and Third Kind Boundary Condi tions Absorbent Boundary Condition 48 6 12 Absorbing boundary conditions 0
138. le storing a state where to read the state to be visualized e output output_array_list type field Description Group object of imported arrays e output output_field_list type field Description Group object of imported fields 114 13 4 DX hook options int dx_auto_combine default 0 Auto generate states by combining elemsets with fields found in file dxhook cpp int dx_cache_coords default 0 Uses DX cache if possible in order to avoid sending the coordinates each time step or frame Use only if coordinates are not changing in your problem found in file dxhook cpp double dx_coords_scale_factor default 1 Coefficient affecting the new displacements read New coordinates are cO x0 c u where cO dx_coords_scale_factor0 cl dx_coords_scale_factor x0 are the initial coordinates and u are the coordinates read from the given file found in file dxhook cpp double dx_coords_scale_factor0 default 1 Coefficient affecting the original coordinates See doc for dx_coords_scale_factor found in file dxhook cpp int dx_do_make_command default 0 If true then issue a make dx_step lt step gt dx_make_command found in file dxhook cpp string dx_node_coordinates default lt none gt Mesh where updated coordinates must be read found in file dxhook cpp int dx_port default 5314 TCP IP port for communicating with DX 5000 lt dx_port lt 65536 found in file dxhook cpp int dx_read
139. little probability of having collisions between them adn the probability of collision is almost by random i e m M Then the number of collisions is approximately m2 br of collisi M nbr of collisions Ni x 7 j 0 163 We consider the following sequence hasshing functions e Hasher SVID rand48 functions If we have some kind of pseudo random gener ator in the form y rand s where s is the seed then we can do a hashing function with the following seudocode int hash int x int n Y int v 0 for int j 0 j lt n j v rand f v x jl return V where f v x is some binary functions that combines the values of the current state v and the incoming sequence element x j The rand48 hasher is based on the random function based on the SVID library coming with the GNU C library version 5 3 12 at the moment of writing this e FastHasher This is based in a simple pseudo random function of the form int rand int v int x a v X int y v x 2 y y MAX y y m 152 int hash int w int n Y int v C for int j 0 j lt n j v rand v x where MAX 2 and c 0x238e1f29 m 0x6b8b4567 e MD5Hasher This is based on the MD5 routines from RSA This is an elaborated algorithm that creates a 16 byte hash value from a string of characters We take as hash value the first 4 bytes from this digest 18 Synchronized buffer One difficult task in parallel programming is p
140. locity field must be computed in order to compute the stress on the skin But if the velocity field is known only on the surface think for simplicity in the case a of plane surface the normal component of the gradient can not be determined In order to solve this a special class of gatherers have been developed namely the embedded_gatherer class Typically such an elemset is composed is a surface elemset associated with a volume one For instance see figure 12 the user can have a fluid problem with a volume elemset in 2D composed of cartesian2d and wants to compute the viscous traction on the solid surface AB For this she adds an elemset visc_force_integrator composed of six nodes elements composed of three layers of segments parallel to the surface 103 as for instance the element 3 4 8 9 14 marked with a dashed line in the figure This special elements can be seen as layers of surface elements parallel to the skin With the velocity values computed by the Navier Stokes solver at these nodes the gatherer elemset can compute high precision approximations to the normal derivatives of velocity at the surface e Note that this requires that the mesh must be somewhat structured near the surface However it is usual to add structured layer on the body skin in order to correctly capture the boundary layer e Also it requires the construction of the connectivities of these layers what may be cumbersome but we will see later that this
141. ls If we define the phase 0 as t t 0 2r 146 then is periodic with period 27 We can decompose it in two even functions p7 0 as 0 0 P 0 2 147 a _ 0 9 p 0 0 NETOS so that p 0 H7 0 sin y 0 148 As are even function they may be put in terms of x 1 cos 2 So that x and pF are computed and by construction only one half of the values say j 1 to j m n 1 2 1 are relevant The values and f have to be computed specially since sin 0 for them If we take the limit p0 o 9 m 2sin 0 149 then if linear interpolation is assumed in each interval it can be shown that PO P0 _ da n1 5 ae 6 for 0 lt A 150 Then 6 6 san SO 1 8 d2 bn E 0 0 2 sin 0 2A0 We could take then this value for 9 however this introduces some noise For instance if 0 sin then p 0 and j 1 for j 1 m and 151 gives 0 p 0 _ sin A9 ji ao 2sin A0 ee We introduce then a correction factor hu so that we define h2 n 1 e 1 bo 2sin A0 158 Analogously we define Pm 1 7 Pm 1 Pm 2sin A0 es 130 14 8 2 Implementation details Temporal boundary conditions are stored in an object of class Amplitude This objects contains a key string identifying the function and a text hash table containing the pa rameters for that function for instance omega
142. lver Element in processor 0 Element in processor 1 O dof in processor 0 S dof in processor 1 dof s in processor 0 connected to dof s in processor 1 Figure 27 IISD deccomposition by subdomains Actual decomposition e Selecting interface and local dof s One strategy could be to mark all dof s that are connected to a dof in other processor as interface However this could lead to 141 an interface dof set twice larger in the average than the minimum needed As the number of nodes in the interface set determines the size of the interface problem 160 it is clear that we should try to choose an interface set as small as possible In read_mesh partitioning is done on the dual graph i e on the elements Nodes are then partitioned in the following way A node that is connected to elements in different processors is assigned to the highest numbered processor Referring to the mesh in figure 26 and with the same element partitioning all nodes in the interface would belong to processor 1 as shown in figure 27 Now if a dof is connected to a dof j on other processor we mark as interface that dof that belongs to the highest numbered processor So in the mesh of figure 27 all dof s in the interface between element sub domains are marked to bel
143. m via an abstract interface A first approach to writing a Finite Element program with OOP philosophy is to define each element or node as an object However this is both time and memory consuming since accessing each element or each node is performed by passing through the whole code layer of the element or node class As one of the primary objectives of PETSc FEM is the efficiency we solved this by defining the basic objects as a whole set of elements of nodes that share almost all the properties aside element connectivities or node coordinates This is very common in CFD where for each problem almost all the elements share the same physical properties viscosity density etc and options number of integration Gauss points parameters for the FEM formulation etc Thus for each problem the user defines a nodedata object and one or several elemset objects Each elemset groups elements of the same type i e for which residuals and matrices are to be computed by the same routine Usually in CFD all the elements are processed by the same routine so that one may ask for what it may serve to have several elem sets First some boundary conditions constant flux or mixed boundary conditions for heat conduction absorbing boundary conditions for advective systems may be imposed more conveniently through an elemset abstraction Also several elemsets may be used for reducing the space required for storing varying physical properties If some p
144. n lt and gt delimiters for instance lt pi 2 atan2 1 0 theta sin pi 4 gt some text here theta lt print theta gt beta lt print 4 theta gt results after being processed by ePerl in some text here theta 0 707106781186547 beta 2 82842712474619 17 The basic rules are e Variables start with following by a C like identifier alphanumeric plus underscore case sensitive for instance alpha or my_variable e Statements end in semicolon e The text inserted in place of the ePerl block is the output of the commands inside the block This output is done in Perl with the print O function but in ePerl there is a shortcut of the form lt expression gt e Mathematical functions sin cos tan exp atan2 y x are available as in C powers x are expressed as x y i e Fortran like 4 3 2 Variables You can define variables to use them after in different places and also in mathematical expressions lt Reynolds 1000 L 2 rho 1 345 velocity 3 54 Grashof 6 e4 gt mu lt rho x velocity L Reynolds gt Nusselt lt Reynolds Grashof 0 25 gt which results in mu 0 0095226 Nusselt 88 0111736793393 4 3 3 Text expansion It is common to have several lines of text that have to be repeated several times For instance some options that hae to be applied to several elemsets The trick is to assign the text to a variable via the here in document
145. n section 4 2 Off course this internal preprocessing may be combined with any previous preprocessing package such as m4 or ePerl In section 34 2 we describe internal preprocessing while in section 4 3 we describe preprocessing with ePerl The reason to have this two mechanisms of preprocessing is the following Preprocessing with ePerl or m4 or any other packages is very powerful and well supported however the mechanism is to create an intermediate file and this file may be very large for large prob lems so that some internal preprocessing including at least the file inclusion is needed On the other hand including all the preprocessing capabilities of preprocessing as in ePerl is beyond the scope of PETSc FEM so that we preferred to keep with this two levels of preprocessing The idea is to have a small user data file where the strong capabilities of an external preprocessing package may be used while performing file inclusion of very large files containing node coordinates element connectivities and so on as the file is reading avoiding the creation of large intermediate files In addition this allows the user to choose the external preprocessing package 15 4 2 Internal preprocessing The FileStack class This class allows reading from a set of linked files the PETSc FEM data file including node coordinates mesh connectivities etc with some preprocessing capabilities Supports file inclusion comments and continuation l
146. n3 cpp include lt src ampli h gt class linramp public double f0 f1 t0 t1 slope void init TextHashTable thash double eval double F void linramp init TextHashTable thash int ierr TGETOPTDEF_ND thash double t0 0 TGETOPTDEF_ND thash double t1 1 TGETOPTDEF_ND thash double f0 0 TGETOPTDEF_ND thash double f1 1 slope f1 f 0 t1 t0 double linramp eval double t if t lt tO return f0 else if t gt t1 return f1 else return f0 slope t t0 DEFINE_EXTENDED_AMPLITUDE_FUNCTION linramp 135 SS NS A lt gt lt gt class tanh_ramp public double A t0 delta base void init TextHashTable thash double eval double Je void tanh_ramp init TextHashTable thash Y int ierr TGETOPTDEF_ND thash double base 0 TGETOPTDEF_ND thash double A 1 TGETOPTDEF_ND thash double t0 0 TGETOPTDEF_ND thash double delta 1 assert delta gt 0 double tanh_ramp eval double t if t lt t0 return base else return base A tanh t t0 delta DEFINE_EXTENDED_AMPLITUDE_FUNCTION tanh_ramp 14 8 6 Time like problems The mechanism of passing time to boundary conditions and elemsets may be used to pass any other kind of data since it is passed as a generic pointer void time data This may be used to treat other problems where an external parameter exists for instance Continuation problems For instance a NS solution at high Reynol
147. ndary conditions Temporal dependent boundary conditions are treated in PETSc FEM assuming that the boundary condition on a set of nodes J j1 j2 j3 jn is of the form bj t a t pja t a26 t 135 bjy t an G t where the az are spatial amplitudes and the function t is the temporal part of the dependency For instance consider the solution of a problem of heat conduction coupledt with diffusion of a component in a rectangular region as depicted in figure 21 The boundary condition on side AD is of the form T x t 100 C 4a L x L sin 5t 136 where L is the distance AD The nodes on side AD are 1 7 13 19 25 31 37 and we assume that concentration and temperature are fields 1 and 3 respectively Here we can take aj 4xj L x L for j in J p t 100 C sin 5t 137 Boundary conditions depending on time as in this example are entered in PETSc FEM with a fixa_amplitude section The general form is fixa_amplitude lt function gt lt paraml gt lt vall gt lt param2 gt lt val2 gt 125 Figure 21 Example with time dependent boundary conditions __END_HASH__ lt nodel gt lt dofl gt lt vall gt lt node2 gt lt dof2 gt lt val2 gt lt nodeN gt lt dofN gt lt valN gt __END_FIXA__ For instance for the example above it may look like this fixa_amplitude sine omega 5 amplitude 1 __END_HASH O 55556 88889 00000 88889 5555
148. ne Gauss Point data found in file nssupg cpp e int LES default 0 Add LES for this particular elemset found in file nssupg cpp e int npg default none Number of Gauss points found in file nssupg cpp 71 7 3 Mesh movement Sometimes one has a mesh connectivity and node coordinates and wants a mesh for a slightly modified domain If u is the displacement of the boundaries then we can pose the problem of finding a mesh topologically equal to the original one but with a slight displacement of the nodes so that the boundaries are in the new position This is termed mesh relocation One way to do this is to solve an elasticity problem where the displacements at the boundaries are the prescribed displacement of the boundary This problem has been included by convenience in the Navier Stokes module even if at this stage it has little to do with the Navier Stokes eqs Once the displacement is computed by a standard finite element computation we can compute the new mesh node coordinates by adding simply the computed displacement to the original node coordinate The elastic constants can be chosen arbitrarily If an isotropic material is considered then the unique relevant non dimensional parameter is the Poisson ratio controlling the incompressibility of the fictituos material However if the distortion is too large this linear simple strategy can break down when some element collapses and has a negative Jacobian A simple
149. neral advective linear systems and the Laplace Poisson equation come bundled with it whereas the library is an abstract interface that allows people to write other applications So that we distinguish between the user for which the interaction with PETSc FEM is limited to writing data files for the bundled applications from the application writers that is people that uses the library to develop new applications Usually application writers write a main routine that use routine calls to the PETSc FEM library in order to assemble PETSc vectors and matrices and perform algebraic operations among them via calls to PETSc routines In addition they also have to code element routines that compute vectors and matrices at the element level PETSc FEM is the code layer that is in charge of assembling the individual contributions in the global vectors or matrices taking into account fixations etc Finally there is the PETSc FEM programmers that is people that write code for the core library written by the user data_file Application Code main l l l N oe S 28 PETSc FEM library cS gs Q i Element routines program output Figure 1 Typical structure of a PETSc FEM application 3 2 The elemset concept PETSc FEM is written in the C language and sticks to the Object Oriented Program ming OOP philosophy In OOP data is stored in objects and the programmer access the
150. ng points Consider for instance the following Hello world hook that prints the message at the corresponding points in the program tinclude lt src hook h gt tinclude lt src dlhook h gt class hello_world_hook 4 public void init Mesh amp mesh_a Dofmap amp dofmap TextHashTableFilter options const char name printf Hello world 1I m in the init hook n void time_step_pre double time int step printf Hello world I m in the time_step_pre hook n void time_step_post double time int step const vector lt double gt amp gather_values printf Hello world I m in the time_step_post hook n void close printf Hello world I m in the close hook n DL_GENERIC_HOOK hello_world_hook You can use almost any conceivable C C library within your hooks Take into account that the program may be called in a parallel environment so for instance if you will compress a certain file then you should take care of doing that only at the master process by e g enclosing the code with aif MY_RANK construct 10 3 Shell hook The shell hook allows the user to execute a certain action at the hook launching points by simply writing shell commands hook_list shell_hook lt name gt lt name gt lt shell command gt For instance hook_list shell_hook hello hello echo Hello world In this case PETSc FEM will issue the echo command at each of the launching points
151. o nent Figure 18 Symmetrical arrangement of foils with specular symmetry Periodic boundary conditions These kind of b c s are useful when treating geometries with repetitive components as is very common for rotating machinery In some 118 Figure 19 Non symmetrical arrangement of foils cases this can be done as slip boundary conditions see figure 18 Here the foils are symmetric about their centerline and then the whole geometry not only posses symmetry of rotation about angles multiple of 27 Nfoiis but in addition possess re flection symmetry about the centerline of a foil like BC and the centerline between two consecutive foils like AB In consequence it suffices the domain ADEC with inlet outlet b c s in DE and AC slip boundary conditions on those parts of AD and EFGHC On the foil boundary FGH pressure should be free while on AD EF and HC we can impose 0p 0n 0 For Navier Stokes we may impose solid wall b c s on the foil normal null velocity component on AD EF and HC null tangential shear stress 9u 0n 0 and Op On 0 However if the foils are non symmetric or they are not disposed symmetrically about a line passing by the rotation axis see figure 19 then there are no predefined stream lines but given two corresponding point like j and j that are obtained through rotation of a 27 Nfoils then we can impose Us COS Q Uj SIN Q Vj vj sinau Cos QU 126 Pj Pj
152. o fix ideas for the Navier Stokes module and the Advective Diffusive module the standard hooks are e init To be executed once at the start of the program 96 e time step pre To be executed before the time step calculation e time step post To be executed after the time step calculation e close To be executed once at the end of the program There are built in hooks included in the modules for instance the DX hook that is in charge of communicating with the DX visualizations program or the shell hook that allows you to execute shell commands but you can also define your own hooks that are defined in a C piece of code compiled and dynamically loaded at run time You can do almost anything with your hooks for instance you can compress the result files perform file manipulation launch visualization with other software like GMV Also hooks are useful for communicating between different instances of PETSc FEM For instance if you want to couple a inviscid external flow with an internal viscous flow then you can run a PETSc FEM instance for each region and perform the communication between the different regions with hooks The DX hook is explained in the DX section see 813 and we will explain here how to write and use dynamically loaded hooks and the shell hook The hook concept has been borrowed from GNU Emacs 10 1 Launching hooks The hook list In order to activate a hook you first have to add a for each hook a pair
153. obians 76 The modified expression for the projectors is I Cp SIS 77 6 12 8 Absorbing boundary conditions available At the moment of writing this we have three possible combinations of boundary conditions Using extrapolation from the interior These is the elemset lt system gt abso The number of nodes per element ne must be not lower than 4 The first ne 2 nodes are used for a second order extrapolation of the outgoing wave The ne 1 th node is the node with Lagrange multipliers and the ne th node is used to set the reference value For instance for an absorbing elment of ne 5 nodes we would have 3 internal nodes and the data would loke like this see figure 5 P n lagrange multipliers ome n reference state outgoing wave Figure 5 Absorbing element elemset gasflow_abso 5 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt lt n4 gt lt n5 gt __END_ELEMSET__ end_elemsets 57 fixa lt n4 gt 1 lt rho_ref gt lt n4 gt 2 lt u_ref gt lt n4 gt 3 lt v_ref gt lt n4 gt 4 lt p_ref gt __END_FIXA__ Each element has 5 nodes first three are real nodes i e not fictitious numbered from the boundary to the interior Fourth node is reserved for Lagrange multipliers and the fifth node is set to the reference value The normal option is used to define the normal to the boundary It can be set as a constant vector per elemset usually f
154. odes are stored at icone 3 j k for O lt k lt 2 We are required to compute the maximum and minimum value of the area of the triangles This is a computation which is similar to those found in FEM analysis For each element we have to load the node coordinates in local vectors X1 xg and x3 compute the vectors along the sides of the elements a x2 xX and b x3 xy The area of the element is then the determinant of the 2 x 2 matrix J formed by putting a and b as rows The FastMat2 code for the computations goes like this FastMat2 CacheCtx2 ctx FastMat2 CacheCtx2 Branch bi FastMat2 x amp ctx 2 3 2 a amp ctx 1 2 b amp ctx 1 2 J amp ctx 2 2 2 chrono start 77 for int ie 0 ie lt nelem ie ctx jump b1 for int k 1 k lt 3 k 4 int node icone e ie k 1 x ir 1 k set amp xnod e node 1 0 rsQ rsQ set x ir 1 2 rest x ir 1 1 P p h Y o set x ir 1 3 b rest x ir 1 1 J ir 1 1 set a ir 1 2 set b qa double area J rs det 2 total_area area if ie 0 minarea area maxarea area if area gt maxarea maxarea area if area lt minarea minarea area printf total_area g min area g max area jg ratio g n total_area minarea maxarea maxarea minarea printf Total area OK s n fabs total_area 1 lt 1e 8 YES NOT double cpu chrono elapsed Calls to the FastMat2 CacheCtx2 ctx object are related wi
155. of values with global option ngather Then for each gatherer elemset the user must set the options that select a continuous range in this vector namely gather_pos and gather_length The selected range is gather_pos gather_post gather_length Then for instance if the hypothetical gatherer elemset momentum_integrator is sup posed to compute the integral of the momentum a 3 vector in 3D then we could use as global_options ngather 3 __END_HASH__ elemset momentum_integrator 4 gather_pos 0 gather_length 3 data connectivity dat __END_HASH 102 With this setup the program will compute for each time step the integral of the momen tum at will print these three values on standard output If a string is passed to the global option gather_file for instance ngather 3 gather_file momentum out then instead of reporting the gathered values on standard output they are printed on the corresponding file The file is opened and closed at each time step 11 2 Embedded gatherers 12 embedded gatherer element Figure 12 Embedded gatherer element If the elemset has a dimension lower than the embedding space for instance a surface embedded in 3D space then computing the gradients of the variables on the surface can not be done with the information on the surface only This is not an uncommon situation for instance it happens when computing viscous forces of a Newtonian fluid on the skin of a solid body The gradient of the ve
156. of which represents a geven load per field found in file 1inhff cpp 109 e int ndimel default ndim 1 The dimension of the element found in file genload cpp 12 2 Functional extensions of the elemset The generic elemset is GenLoad There is an instantiation for the Generic load elemsets can be extended by the user by deriving the base class GenLoad and the most impor tant task is to implement the methods q u flux jac for the single layer case and q u_in u_out flux_in flux_out jac in the double layer case 12 3 The flow reversal elemset This instantiation of GenLoad is useful for avoiding instabilities caused by inversion of the flow at an outlet boundary Assume that the Navier Stokes module computes a velocity field v and some scalar is being advected with this velocity field On an outlet boundary one usually leaves the temperature free that is no Dirichlet condition is imposed on But if the flow is reversed that is v lt 0 is the external normal to the boundary at some instant then this boundary condition becomes ill posed and the simulation may diverge One solution is to switch from Neumann to Dirichlet boundary conditions depending on the sign of v i e ifv n gt a a Se 118 ho ifv n gt 0 where A is a film coefficient and phi is the value to be imposed at the boundary when the flow is reversed Note that this amounts to switch to a Dirichlet condition by penalization For h large enough
157. ol_map of the form map lt int col_t gt col_map 14 7 Block matrices When solving a linear 123 for U itis a key point to take into account that most part of Q are independent blocks so that the inversion may be done with a minimum computational effort We say that a given row and column indices i j we say that they are directly connected if the element Qj is non null We can then define that two row indices i 7 are indirectly connected if they are connected by a path i gt j gt i1 gt j2 gt gt jn gt 1 where the arrow means directly connected The definition applies also to pairs of column indices and row column column row pairs The indirect connection defines an equivalence class that splits the rows in disjoint sets 14 7 1 Example Consider matrix eee sel colli so so a s a o EX DE 3 xo Ro 4 Q 5 1 k ok Kone 4 133 10 2 ok 7 where the asterisks means a non null element Starting with row index 1 we see that its corresponding connecting set is rows 1 3 10 and columns 8 9 So that the linear transfor 124 mation y Qx can be represent as ya yo 0 is Q Pe Y10 e 134 si ya Q 219 y Q ij 5 6 pes So that Q decomposes in three blocks of 3 x 2 1x 1 and 2x3 two void rows 2 9 and two row columns 2 3 One basic operation of class idmap is to compute the row connected to a given column index 14 8 Temporal dependent bou
158. omposed of three Array objects called positions node coordinates connections element connectivities and data computed field values At each time step ExtProgImport exports two Group objects e a Group of Arrays named output_array_list and e a Group of Fields objects named output_fields_list This objects are generated as follows e For each Nodedata PETSc FEM object a positions array is constructed this results in say nn array objects Currently PETSc FEM has only one Nodedata object member name nodes so that NN 1 e A data array is constructed in basis to the current state vector member name data 112 e For each elemset a connections array is constructed You can disable construction of some particular elemsets through the dx elemset option and also which nodes and DX interpolation geometry is used This results in other ne connection arrays The member name of each array is based in the name option of the elemset If this has not been set the name is set to the type of the elemset If collision is produced a suffix of the form _0 _1 is appended in order to make it unique Options controlling how the connection array is constructed can be consulted in 4 4 2 The resulting nn ne 1 arrays are grouped in a Group object and sent through the output_array_list output tab You can extract the individual components with the Select DX module and build field objects Also a set of field objects is created automati cally and
159. on with DX Data Explorer http www opendx org is a system of tools and user interfaces for visualizing scientific data Originally an IBM commercial product has been released now under the IBM Open Source License and maintained by a group of volunteers see the URL above Besides their impressive visualization capabilities DX has many features that make it an ideal visualization tool for PETSc FEM e DX is Open Source with a License very close to the GPL not completely compatible though e It has a Visual Program Editor which makes it very configurable for different modules e It has been linked to PETSc FEM through sockets which makes it possible to visu alize a running job even in background e It has a scripting language If you want to visualize your results with DX you have first to download it from the URL above and install it Then you can pass your results to DX simply by editing the needed dx files or well by using the ExtProgImport DX module In order to use this last option you have to e Compile PETSc FEM with the USE_SSL flag enabled disabled by default Also if you want to DX be able to communicate asynchronously with PETSc FEM you have to compile with the USE_PTHREADS flag enabled disabled by default e Load the dx_hook hook in PETSc FEM and pass it some options e Build the dynamically loadable module file dx epimport and load it in DX DX basic visualization units are Field objects which are c
160. ong to processor 1 The nodes in the shadowed strip belong to processor 0 and are connected to nodes in processor 1 but they are not marked as interface since they belong to the lowest numbered processor Note that this strategy leads to an interface set of 4 nodes whereas the simpler strategy mentioned first would lead to an interface set of 4 i e including the nodes in the shadowed strip which is two times larger Element in processor 0 Element in processor 1 O dof in processor 0 S dof in processor 1 YW dof s in processor 0 connected to dof s in processor 1 Figure 28 Non local element contribution due to bad partitioning e The IISDMat matrix object contains three MPI PETSc matrices for the Ar Arz 142 and A blocks and a sequential PETSc matrix in each processor for the local part of the Azz block The Azz block must be defined as sequential because otherwise we couldn t factorize it with the LU solver of PETSc However this constrains that MatSetValues has to be called in each processor for the matrix that belongs to its block i e elements in a given processor shouldn t contribute to Arz elements in other processors Normally this is so but for some reasons this condition may be violated One is periodic boundary conditions and constraints in general
161. ook_list shell_hook hello hello my_script_hook and bin bash File my_script_hook if 1 init then echo in init Do more things in init stage Ho 99 elif 1 pre then echo in pre Do more things in pre stage Ho elif 1 post then echo in post Do more things in post stage Ho elif 1 close then echo in close Do more things in close stage Ho else Catch all Should not enter here echo Don t know how to handle stage 1 fi At the init and close hook launching points the step number passed is 1 and 2 respec tively so that you can detect whether you are in a pre post stage or init close by checking this too The time passed is in both cases 0 10 4 Shell hooks with make If no command is given i e if you write hook_list shell_hook hello but don t add the hello lt command gt line then PETSc FEM uses a standard command line like this make petscfem_step 2 d petscfem_time 3 f hello_ 1 s so that it will execute make commands with targets hello_init hello_pre hello_post and hello_close like make petscfem_step 1 petscfem_time 0 hello_init make petscfem_step 1 petscfem_time 0 1 hello_pre make petscfem_step 1 petscfem_time 0 1 hello_post make petscfem_step 2 petscfem_time 0 2 hello_pre make petscfem_step 2 petscfem_time 0 2 hello_post make petscfem_step 3 petscfem_time 0 3 hello_pre mak
162. operties The difference is that for each boundary node and at each time step the b c may switch from absorbing to wall i e v 0orv n 0 This is activated by implmenting the virtual function 1int AdvectiveAbsowWall 2turn wall fun int elem int node FastMat2 amp x double t At run time this function is called for each boundary element and if the function returns a true value 1 then the wall boundary condition is enforced through Lagrange multipliers or penalization currently only with Lagrange multipliers If the result returned is a false value 0 then the boundary condition is absorbing The arguments passed to the routine can used by the user to compute the desired value elem the element index 1 base node the element index 1 base x the coordinates of the boundary node an ndim vector t the current time 60 For instance if the computational domain is the 0 lt x y lt 1 square and the outlet boundary is the segment x 1 0 lt y lt 1 then the following code makes the upper half of the boundary to be wall and the half lower to be absorbing 1int AdvectiveAbsowWall 2 turn_wall_fun int elem int node 3 FastMat2 amp x double t 4 return x get 2 gt 0 5 5 See example petscfem cases gasflow dyna abso wal13 The node and elem indices may be used to perform more elaborate tasks as loading tables from files or seek per element properties 6 12 11 Related Options int activate_
163. options are passed from the user data file to the program in the form of small databases which are called text hash tables This consist of several lines each one starting with a keyword and followed by one or several values When reading this data in the read_mesh routine the program doesn t know anything about neither the keywords nor the appropriate values for these keywords Just when the element is processed by the element module via the assemble function call for instance assembling residuals or matrices the element routine written by the application writer reads values from this database and apply it to the elemset The text hash table is stored internally as a correspondence between the keys and the values both in the form of strings The key is the first space delimited token and the value is the rest of the string from the end of the space separator to the end of the line Usually the values are strings like a C identifier chunk_size for instance As the values are stored in the form of strings almost any kind of values may be stored their for instance non_linear_method Newton string value max_iter 10 integer value viscosity 1 2e 4 double value or lists and combinations of them It s up to the application writer to decode this values at the element routine level Several routines in the getprop package help in getting this see 85 4 Some of this options are used for internal use of the PETSc FEM code
164. or 24 intrinsic time 36 inviscid 34 jacobian_factor 70 Krylov_dim 26 KSP_method 26 ldfilename 61 162 LES 65 70 71 local_solver 24 local_store 22 local_time_step 40 lumped_mass 41 masks 143 MatSetValue 143 MatSetValues 143 max_partgraph_vertices 22 max_partgraph_vertices_proc 24 maxits 26 40 measure_performance 40 65 ncore 22 ndim 59 61 66 110 ndimel 109 nfile 40 66 ngather 66 nnwt 66 Non linear Dirichlet boundary conditions 120 npg 70 71 nrec 40 66 nsave 40 66 nsaverot 40 66 nsome 67 nstep 40 67 nstep_cpu_stat 40 o_nnz 138 octree 64 partitioning 141 partitioning method 23 pe_lu_fill 24 per element properties table 32 periodic boundary conditions 118 Perl 17 permutation matrices 116 perturbation function 36 PETSc FEM pplication writers 13 rogrammers 13 sers 13 PFMat 138 preco_side 26 preco_type 26 precoflag 60 Preprocessing 15 preprocessing nternal 16 xternal 17 pressure_control_coef 70 print_dofmap_id 23 print_fsm_transition_info 26 print_hostnames 23 print_interface_full_preco_conv 24 print_internal_loop_conv 26 40 print_linear_system_and_stop 40 67 print_local_chunk_size 23 print_partitioning_statistics 23 print_residual 67 print_Schur_matrix 25 print_some_file 67 print_van_Driest 70 profile determination 137 radius 45 RENORM_f lag 67 report_assembly_time 23 report_consumed_time 23 144
165. or depl extensions to this mode add this to your emacs file setq auto mode alist cons C C Id Jepl shell script mode auto mode alist 4 5 1 Installing PETSc FEM mode In order to use the mode you have to copy the file tools petscfem el to somewhere accessible for Emacs usr share emacs site lisp is a good candidate There is also a file tools petscfem init el that contains some basic configuration you can also copy it or insert directly its contents into your emacs 35 Load basic PETSc FEM mode load file path to petscfem tools petscfem el Load petscfem init or insert directly its contents 33 below and configure load file path to petscfem tools petscfem init el 4 5 2 Copying and pasting PETSc FEM options PETSc FEM modules have a lot of options and is mandatory to have fast access to all of them and to their documentation The HTML documentation has a list of all of them at the end of the user s guide For easier access there is also an info file doc options info that has a page for each of the options You can browse it with the standalone GNU Info program or within Emacs with the info commands In the last case you have the additional advantage that you can very easily find the documentation for a given option with a couple of keystrokes and paste options from the manual into your PETSc FEM data file For jumping to the documentation for an option put the cursor on the option and press C
166. or loads in data_pts only the coordinates of the elements that be long to him A possible solution is after the loop to exchange the information among the processors but the simplest solution id to simply bypass the element selection with compute_this_elem with a call like for int k el_start k lt el_last k 64 if build_nneighbor_tree comp_shear_vel compute_this_elem k this myrank iter_mode continue That means that for jobinfo build_nneighbor_tree and comp_shear_vel the nor mal element selection is bypassed 7 2 Options General options int A_van_Driest default 0 If A van Driest 0 then the van Driest damping factor is not used found in file ns cpp int activate_debug default 0 Activate debugging found in file ns cpp int activate_debug_memory_usage default 0 Activate report of memory usage found in file ns cpp int activate_debug_print default 0 Activate printing in debugging found in file ns cpp double alpha default 1 Trapezoidal method parameter alpha 1 Backward Euler alpha 0 Forward Euler alpha 0 5 Crank Nicholson found in file ns cpp double C_volume default 0 Coefficient for volume control found in file ns cpp double delta_time default 0 Time step for volume control found in file ns cpp double displ_factor default 0 1 Scales displacement for ALE like mesh relocation found in file ns cpp double Dt default 0 The time step f
167. or plane boundaries as in the example above or as a per element value In this last case we would have something like this elemset gasflow_abso 5 props normal 2 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt lt n4 gt lt nd gt lt nx gt lt ny gt __END_ELEMSET__ The normal need not be entered with a high precision If the vector entered is not exactly normal to the boundary then the condition will be absorbing for waves whose group velocity vector is parallel with this vector Note the the fixa section for the values of the reference node In this case gas dynamics elemset gasflow we set the four fields ngim 2 to the reference values The reference values can be made time dependent in an explicit way by using a fixa_amplitude section instead of a fixa section Using this absorbing boundary condition requires the flow to have implemented the Riemman_Inv method If this is not the case then the program will stop with a message like For the gasflow elemset If the option linear_abso is set to false 0 then the Riemman invariants for gas dynamics are used and the state reference value is used for computing the reference Riemman invariants If linear_abso is set to true 1 then the linear absorbing boundary conditions are imposed Not using extrapolation from the interior These is the elemset lt system gt _abso2 The number of nodes per element ne is 3 The first node is the node at th
168. orresponding to incoming waves are null and vice versa for the outgoing waves we can add both blocks of equations to add a single set of Ndof equations m TIP Vet Ty Vt Y ep VE 0 62 p 0 and coming back to the U basis Ug Up Wg Ug Y ep Up 0 63 p 0 The modified version of the FEM system of equations 59 that incoroporates the absorbing boundary conditions is then Ug Uo Wg UGt Y ep Ur 0 p 0 F Ugt U we ne 64 FU UA UR REM 54 6 12 6 Avoiding extrapolation For linear systems equation 59 is of the form n 1 n n 1 n a Ea O uel E ue n 65 k K k 1 k 1 A 0 k gt 1 At 2h We have made a lot of simplifications here no source or upwind terms and a simple discretization based on centered finite differences Alternatively it can be thought as a pure Galerkin FEM discretization with mass lumping In the base of the characteristic variables V this could be written as VE VE VE Via A A 7 Q k 7 Vk k 1 k 1 A 0 k 1 At i h 7 For the linear absorbing boundary conditions 49 we should impose IT Vier Vo Vref 0 67 while discarding the equations corresponding to the incoming waves in the first rows of 66 Here Urer Vrer is the state about which we make the linearization This can be done via Lagrange multipliers in the following way TF Ver Vo Vref IT Vref Vim 0 vat VE VIV F pN vr vn
169. ou under this License will not have their licenses terminated so long as such parties remain in full compliance 5 You are not required to accept this License since you have not signed it However nothing else grants you permission to modify or distribute the Program or its deriva tive works These actions are prohibited by law if you do not accept this License Therefore by modifying or distributing the Program or any work based on the Pro gram you indicate your acceptance of this License to do so and all its terms and conditions for copying distributing or modifying the Program or works based on it 6 Each time you redistribute the Program or any work based on the Program the recipient automatically receives a license from the original licensor to copy distribute or modify the Program subject to these terms and conditions You may not impose any further restrictions on the recipients exercise of the rights granted herein You are not responsible for enforcing compliance by third parties to this License 7 If as a consequence of a court judgment or allegation of patent infringement or for any other reason not limited to patent issues conditions are imposed on you whether by court order agreement or otherwise that contradict the conditions of this License they do not excuse you from the conditions of this License If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pe
170. ound in file ns cpp int fractional_step default 0 Use fractional step or TET algorithm found in file ns cpp string fractional_step_solver_combo default iisd Solver combination for the fractional step method May be iisd lu global_gmres found in file ns cpp int fractional_step_use_petsc_symm default 1 Fractional step uses symmetric matrices only CG iterative KSP found in file ns cpp string gather_file default gather out Print values in this file found in file ns cpp int LES default 0 Use the LES Smagorinsky turbulence model found in file ns cpp 65 e int measure_performance default 0 Measure performance of the comp_mat_res jobinfo found in file ns cpp e int ndim default 3 Dimension of the problem found in file ns cpp e vector lt double gt newton_relaxation_factor default none Relaxation parameter for Newton iteration Several values may be entered in the form newton_relaxation_factor wi n1 w2 n2 wn that means Take relaxation factor w1 for the first n1 steps w2 for the following n2 steps and so on until w_ n 1 wn is taken for all subsequent steps Normally one takes a conservative said 0 5 relaxation factor for the first steps and then let full Newton i e w 1 for the rest For instance the line newton_relaxation_factor 0 5 3 1 means take w 0 5 for the first 3 steps and then use w 1 found in file ns cpp e int nfile default 1 S
171. part default 1 Number of subpartitions inside each processor found in file iisdcr cpp int iisd_subpart_auto default 0 Choose automatically the number of subdomains so as to have approximately this number of unknowns per subdomain found in file iisdcr cpp int iisdmat_print_statistics default 0 Print dof statistics number of dofs local and interface in each processor found in file iisdcr cpp double interface_full_preco_fi11 default 1 The ILU fill to be used for the A_II problem if the ILU preconditioning is used found in file iisdcr cpp int interface_full_preco_maxits default 5 Number of iters in solving the preconditioning for the interface problem when using use_interface_full_preco found in file iisdcr cpp string interface_full_preco_pc default jacobi Defines the preconditioning to be used for the solution of the diagonal interface problem not the Schur problem found in file iisdcr cpp double interface_full_preco_relax_factor default 1 The problem on the interface is solved with Richardson method with few iterations normally 5 Richardon iteration may not converge in some cases and then we can help convergence using a relaxation factor 1 found in file iisdcr cpp string local_solver default PETSc Chooses the local solver may be PETSc or SuperLU found in file iisdcr cpp int max_partgraph_vertices_proc default INF The maximum number of vertices in the coars
172. pe edge as unordered sequences of two node indices while the type ordered edge is associated with ordered sequences of two nodes We say that the set of permutations that leave invariant edge objects is 0 1 1 0 whereas for the oriented edge is only the identity permutation 0 1 For larger objects the set of permutations that leave invariant the node sequence that defines the object is more complex than that it doesn t reduce to the special case of ordered and unordered 147 sequences as for edges Consider for instance the case of triangles For an oriented triangle the set of nodes that leave invariant the triangle is 0 1 2 1 2 0 2 0 1 ie shifts clockwise and counter clockwise of the node sequence whereas for unordered objects the sequences are the same 0 1 2 120 201 0 2 1 1 0 2 2 1 0 In the last case the permutations that leave invariant the object are the whole set of 6 permutations for 3 objects so that in this case we can say that unordered triangles can be represented as unordered sets of three nodes But for the unordered triangle edge a b c isthe same that b c a so that associating unordered triangles with unordered sequences does not take into account the rotational symmetry 17 3 1 Symmetry group generator The set of permutations perm SHAPE that leave the geometrical object SHAPE invariant are a group that means that if two permutations p q belong to perm SHAPE then the composit
173. portion of it either verbatim or with modifications and or translated into another language Hereinafter translation is included without limitation in the term modification Each licensee is addressed as you Activities other than copying distribution and modification are not covered by this License they are outside its scope The act of running the Program is not restricted and the output from the Program is covered only if its contents constitute a work based on the Program independent of having been made by running the Program Whether that is true depends on what the Program does 1 You may copy and distribute verbatim copies of the Program s source code as you receive it in any medium provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty keep intact all the notices that refer to this License and to the absence of any warranty and give any other recipients of the Program a copy of this License along with the Program You may charge a fee for the physical act of transferring a copy and you may at your option offer warranty protection in exchange for a fee 2 You may modify your copy or copies of the Program or any portion of it thus form ing a work based on the Program and copy and distribute such modifications or work under the terms of Section 1 above provided that you also meet all of these conditions a You must cause the mod
174. r the continuity equation Galerkin term in the Navier Stokes equations 17 The DistMap class This class is an STL map lt Key Val gt template container where each process can insert values and at a certain point a scatter call sends contributions to the corresponding processor 144 17 1 Abstract interface To insert or access values one uses the standard insert member or the operator of the basic class The processor iterator k member to be defined by the user of the template should return the process rank to which the pair key val pointed by k should belong After calling the scatter member by all processes all entries are sent to their processors In order to send the data across processors the user has to define the size_of_pack and pack routines The pack and unpack functions in the BufferPack namespace can help to do that for basic types i e those for which the sizeof operator and the libc memcpy routine work Finally the combine member defines how new entries that have been sent from other processors have to be merged in the local object 17 2 Implementation details When calling the scatter member each entry in the basic container is scanned and if it doesn t belong to the processor it is buffered using the pack member to a buffer to be sent to the other processor Once all the data to be sent is buffered the entries are scanned again and the entries that have been buffered are deleted The b
175. report_consumed_time_stat 23 report_option_access 67 report_total_liquid_volume 67 residual_factor 70 reuse_mat 67 rho 70 71 roughness 45 rtol 26 41 save_file 41 67 save_file_pattern 41 67 save_file_some 68 save_file_some_append 68 Schur matrix 140 shallow water equations 34 shape 45 shear velocity 64 shell hook 98 shock capturing 39 shock_capturing 41 shock_capturing_factor 70 163 shock_capturing_threshold 41 42 solve_system 68 solver 68 solver_mom 68 stabilizing term 36 start_comp_time 68 start_time 41 stdout_file 68 steady 68 STL 144 stop_mom 68 stop_on_step 68 stop_poi 68 stop_prj 68 Streamline Upwind Petrov Galerkin 36 SUPG 36 switch_to_ref_on_incoming 60 61 tau_fac 41 42 70 tau_pspg_fac 70 tau_supg_fac 70 temporal_stability_factor 70 text hash tables 28 thermal_convection 110 tol_mass 41 tol_newton 68 twf_class_name 61 update_jacobian_iters 68 update_jacobian_start_iters 68 update_jacobian_start_steps 69 update_jacobian_steps 69 update_wall_data 69 uploading 143 use_compact_profile 26 use_iisd 69 use_interface_full_preco 25 use_interface_full_preco_nlay 25 use_old_state_as_ref 60 61 user input data file 14 van Driest 63 vel_indx 61 110 verify_jacobian_with_numerical_one 69 volume_control_flag 69 wall element 64 wall_angle 45 weak_form 41 70 weighted residual 36 width 45 width_bottom 45
176. represents the duration of the change from the starting value to the end value During the interval from ts T to ts 7 i e a total duration of 27 goes from o 0 12A to po 0 88A start_value o default 0 The limit valu for t gt oo end_value default 1 The limit valu for t 00 127 Figure 24 Sine function sin Trigonometric sine function olt do A sin wt p 141 The parameters are mean_val o default 0 amplitude A default 1 omega w default none frequency w 27 default none period T 271 w default none phase y default 0 Only one of omega frequency or period must be specified cos Trigonometric cosine function g t do A cos wt p 142 The parameters are the same as for sin see 14 8 1 piecewise Piecewise linear function This defines a piecewise linear interpolation for a given series of time instants t ordered such that tj lt t 41 and amplitudes 4 128 1 lt j lt n entered by the user The interpolation is 0 if t lt to ort gt tn w i 143 pr t EEE tp ifte lt t lt tey 143 If final_time is defined then th function is extended periodically with period tn t1 The parameters are e npoints integer The number n of points to be entered t n doubles The instants t f n doubles The function values final_time double The function is extended
177. rinting from the slave nodes In MPI in general it is not guaranteed that printing from the nodes is possible and in the MPICH implementation output from the nodes get all mixed and scrambled PETSc provides a functions in order to facilitate this task The user can call PetscSynchronizedPrintf as many times as he wants en each nodes The output is concatenated in each node to a buffer and then a collective call to PetscSynchronizedFlush flushes all the buffers in order to the standard output There is a similar function for files PetscSynchronizedFPrintf but it turns out that the flushing of standard output and files is mixed In addition even in the case of writing only to standard output the output is not sorted properly The objective of the SyncBuffer lt T gt template class and the KeyedOutputBuffer class is to have a synchronized output device that sorts the lines written by the nodes The idea is that one defines a class say KeyedObject that must support the following member functions class KeyedObject 4 public Default constructor KeyedObject Copy Ctor KeyedObject const KeyedObject amp ko Dtor KeyedObject Used for sorting the lines friend int operator lt const KeyedObject amp left const KeyedObject amp right Call back for the distributed container Return size of the buffer needed to store this element int size_of_pack const Effectively packs the object into the buffer
178. rix gt A_jac Matrix amp A_grad_U Matrix amp grad_U Matrix amp G_source Matrix amp tau_supg double amp delta_sc double amp lam_max TextHashTable thash double propel void user_data int options This may eventually change in any case if you are interested in adding a new advective system then see the actual description in the advective h file in the distribution The meaning of these arguments are listed below When the size is specified it means that the argument is a Newmat or FastMat matrix In some situations the flux function routine must compute only some of the required values For instance when computing the contribution of the absorbing boundary ele ments there is no need to compute the parameters regarding stabilizing terms This is controlled with the parameter options which can take the values DEFAULT COMP_UPWIND and COMP_SOURCE In the list below it is indicated under which conditions the specific quantity must be computed e const RowVector amp U input size ngo This is the state vector you must return the flux jacobians and other quantities for this state vector e int ndim input The dimension of the space e const Matrix amp iJaco input size ngim X Ndim The jacobian of the master to element coordinates in the actual gauss points This may be used in order to calculate the characteristic size of the element e Matrix amp H input size 1 x nz In the shallow water the source term G d
179. rmutation like matrices i e require as storage order of 2 integers per unknown and constant time access by row and column 14 6 Implementation An arbitrary permutation P of N objects can be stored as two integer vectors ident N and iident N such that if P j k then ident j 1 k 131 iident k 1 j Pan We will consider a slight modification to this A row is void If all its elements are null normal If one element is one and the rest null like in a permutation matrix special otherwise Then we set 0 if row j is void ident j 1 4 k if row j is normal and the non null element is in position k 1 if row j is special 132 We need now how to access the coefficients of the special rows Rows are stored as a map in the STL language from integer values to doubles i e typedef map lt int double gt row_t and we keep in class idmap a map from row indices to pointers to rows of the form map lt int row_t gt row_map 123 So that for a given j we can get the corresponding row by checking the value of inode j 1 If the row is void or normal we return the corresponding row and if it is special then we look for the pointer in row_map and this returns the desired row Similar information is stored by columns but in this case it is not necessary to store the values for the special rows so that the columns are of the form typedef set lt int gt col_t and class idmap contains a private member c
180. rtinent obligations then as a consequence you may not distribute the Program at all For example if a patent license would not permit royalty free redistribution of the Program by all those who receive copies directly or indirectly through you then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program If any portion of this section is held invalid or unenforceable under any particular circumstance the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices Many people have made generous con tributions to the wide range of software distributed through that system in reliance on consistent application of that system it is up to the author donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice This section is intended to make thoroughly clear what is believed to be a conse quence of the rest of this License 8 If the distribution and or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces the original
181. s 1 Create the flux function in a file by itself in the applications advective directory The better is to start copying one of the existing advective sys tems for instance ffeuler cpp or ffshallw cpp The arguments to flux function routines is described in section 6 6 The name of the function has to be of the form flux_fun_ lt system gt where lt system gt identifies the new sys tem We assume that you write the flux function flux_fun_new_adv_sys in file applications advective ffnadvs cpp 2 Add the file in the MYOBJS variable in the Makefile for instance MYOBJS advective o adv o absorb o ffeuler o ffshallw o ffadvec o 3 Define the new derived classes volume_new_adv_sys and absorb_new_adv_sys by adding a line at the end of the file applications advective advective h as in the following example Add here declarations for further advective elemsets Euler equations for inviscid Gas dynamics eqs ADVECTIVE_ELEMSET euler Shallow water equations ADVECTIVE_ELEMSET shallow Advection of multiple scalar fields with a velocity field ADVECTIVE_ELEMSET advec My new advective system ADVECTIVE_ELEMSET new_adv_sys lt Add this line 37 4 Recompile 6 6 Flux function routine arguments Currently the interface is the following typedef int FluxFunction const RowVector amp U int ndim const Matrix amp iJaco Matrix amp H Matrix amp grad_H Matrix amp flux vector lt Mat
182. s J e E CL ao where A is the k th eigenvalue of jacobian A For the Euler equations it turns out to be that this corresponds to pressure waves propagating in the direction of the fluid and is c u where c is the speed of sound and u the absolute value of velocity For the shallow water equations its value is u gh where g is gravity acceleration and h the local water elevation with respect to bottom 36 6 4 Shock capturing For problems with strong shocks shock waves in Euler or hydraulic jumps in shallow water the standard SUPG stabilizing term may not be sufficient Then an additional stabilizing term is added so that the stabilized equations are now of the form MU F U G gt Psupa j E Ped t Ox o ole ON WU D fh 8 Ger aa Note that in contrast with the SUPG term the new so called shock capturing term is no more consistent ds is a scalar the so called shock capturing parameter Often when shock capturing is added we diminish the amount of stabilization in the SUPG term in order to compensate and not to have an over diffusive scheme We will not enter in the details of this computations refer to 1 for further details 15 6 5 Creating a new advective system New advective systems may be added to PETSc FEM only by defining their flux function jacobians and other quantities This means that you don t need to code details of the numerical discretization Follow these step
183. saverot default 100 Save state vector frequency with the rotary save mechanism see 7 2 found in file adv cpp int nstep default 10000 The number of time steps found in file adv cpp int nstep_cpu_stat default 10 Output CPU time statistics for frequency in time steps found in file adv cpp int print_internal_loop_conv default 0 Prints the convergence history when solving a consistent matrix found in file adv cpp 40 e int print_linear_system_and_stop default 0 After computing the linear system prints Jacobian and right hand side and stops found in file adv cpp e double rtol default 1e 3 Relative tolerance when solving a consistent matrix found in file adv cpp e string save_file default outvector out Filename for saving the state vector found in file adv cpp e string save_file_pattern default outvector d out The pattern to generate the file name to save in for the rotary save mechanism found in file adv cpp e double start_time default 0 Counts time from here found in file adv cpp e double tol_mass default 1e 3 Tolerance when solving with the mass matrix found in file adv cpp Generic elemset advecfm2 e double beta_supg default 0 8 Parameter to control the amount of SUPG perturbation added to the mass matrix to be consistent SUPG beta_supg 0 implies consistent Galerkin and beta_supg 1 implies full consistent SUPG found in file advecfm2
184. ser input data file 4 2 Internal preprocessing The FileStack class ALL ONAE casa he hee MAGEE SH RO ER ERR e i 422 Class inbernale occiso hae eb baw te ee eee ee ee 4 3 Preprocessing with ePerl e e 43 1 BasisotePerl 2 22245 44025 9 e da a wa ee oe ae ee als Boe o ok a gee bee ee AO ee ae doa TOE CSAS gt sca aoe ke mob a BOR ee Bg ee ee ee h a 4 3 4 Conditional processing 0 00002 eee eee dao File EAS a oa ee Re a ek eo ae Re ete Pes 430 Weeot ePerl m malls o o oomega eee Pk Ae ee Es Aa ePerini library o a ecs fk ke a A a ee a 4 3 8 Errors in ePerl processing oa eaa oe e e AA General options i so s sud aa e wosia g aa a a a a e 44 1 Read mesh options s s s soa sasae awa kaa dan daea aa das Elemset ophion so oead 25 La ee we a KO PR Ke eR raea 443 PFMat TISDMat class options lt eo o io reses 4 5 Emacs tools and tips for editing data files 4 5 1 Installing PETSc FEM mode aoaaa aaae 4 5 2 Copying and pasting PETSc FEM options aaaea aaa Text hash tables 5 1 The elemset hash table of properties aaoo e 5 2 Text hash table inclusion ios eos a praos aie a e aTa hu R aaa e a oe The global table 22 0 45 9 ee ew be bee ee Eee e 5 4 Reading strings directly from the hash table 5 5 Reading with get_int and get_double 5 6 Per element properties tab
185. strategy consists in a multilevel scheduling algorithm Assume first that Nproc is even and divide processors in subsets S 0 1 Nproc 2 1 S2 Nproc 2 Nproc 2 1 proc 1 We first exchange all information between processors in S with those in S2 in Nproc 2 stages In stage O see figure 31 processor 0 exchanges data with n 2 1 with nproc 2 1 and n 2 1 with nproc i e processor 0 calls exchange n 2 and processor n 2 calls exchange 0 the code of procedure exchange is shown in the procedure below In the stage 1 processor 0 exchanges with nproc 2 2 1 with Nproc 2 3 Nproc 2 2 with Nproc and Nproc 2 1 with Nproc 2 i e processor j exchanges with n 2 j 1 Mproc 2 In general in stage k processor j exchanges with processor n 2 j k nproc 2 Note that in each stage all communications are paired and can be performed simultaneously so that each stage can be performed in 2T sending plus receiving This nproc 2 stages take then a total of 2 Nproc 2 T NprocT secs Now all communication between processors in S and S2 has been performed it only remains to perform the communication between processors in S and processors in S2 But then we can apply this idea recursively and divide the processors in S in two subsets S11 and S12 with Nproc 4 each let s assume that the number of processors is a power of 2 Nproc 2 with a required time of Nproc 2 T Applying the idea recursi
186. t For instance TGETOPTDEF thash double Young_modulus 0 The set_pg_values is the main method Here the user computes the values to be integrated Its signature is void set_pg_values vector lt double gt amp pg_values FastMat2 amp u FastMat2 amp uold FastMat2 amp xpg FastMat2 amp n double wpgdet double time The argument pg_values are the values to be computed pg stands for Gauss point since usually this function is called by the gatherer class at the Gauss points of integration u and uold are the states at times t and t xpg are the coordinates of the Gauss point n is the normal to the surface only relevant if the dimension of the element ndimel is equal to ndim 1 wpgdet is the area of the Gauss point i e the Gauss point weight times the Jacobian of the transformation to the master element void element_hook int k is called before the Gauss point hook and then can be used in order to pre compute some stuff for all the Gauss points k is the element number e void clean can be used after processing all the elements and doing some cleanup 12 Generic load elemsets Generic load elemsets account for surface contributions which represent constant terms in the governing equations or either a function of the state at the surface Typical terms that can be represented in this way are e External heat loads like a constant radiation load in thermal problems q q e A linear Newtonian cooling term q
187. t also it is needed the whole distribution of forces In this case what is needed is a list of surface elements and the total force for each element There are two flavors of this feature The simplest one is activated with dump_props_to_file and simply stores the data in a file The other possibility is ac tivated with the pass_values_as_props option and stores the computed per element values in the per element properties table Subsequent parts of the code can grab this information using the usual mechanism to query the per element properties table All three mechanism can be activated or deactivated independently of the others In summary e Computation of global values is activated with gather_length gt 0 e Storing the per element is activated with the dump_props_to_file option e Passing the per element computed values to other sections of the code via the per element table is activated with the pass_values_as_props option In the last two cases the number of values to be computed by the gatherer can be set via the store_values_length option The summary of relevant options is e pass_values_as_props activates the mechanism of passing per element computed values as per element properties 106 e store_values_length is an integer indicating how many values are computed by the gatherer It defaults to gather_length However if the standard mechanism of the gather_values is deactivated then the number of computed values must be passed t
188. t of the ordering 151 In order to reduce the probability that unequal objects give the same hash sum we can device better hashing functions Two kind of hashing functions will be discussed functions that hashes a sequence of numbers hash sequence functions taking in account the ordering of the sequences and functions that hash the sequences in a way infependent of the ordering hash sum functions Consider first the hash sequence functions If we consider sequences of integers then the hash functions should return a scalar value for each sequence of integers in such a way that if two sequences are distinct i e they have distinct numbers or the same numbers in differente order they should have different hashing functions We restrict us to 32 bit integers unsigned integers are bounded by 2 and there are 2 distinct sequences of n integers so that it is impossible to have such un hashing function If for two distinct sequences we have the same hash values then we said thar there is a collision in the hashing process We then try to have a hashing functions that minimizes the number of collisions Suppose we consider the set of number sequences 32 bit integers of length n We have N 2 2 sequences and M 2 hash values and in the best case we would have N M 2 2 sequences for each hash value If we generate m distinct random sequences and m lt N then if the hashing funtion is good there is very
189. take away your freedom to share and change it By contrast the GNU General Public License is intended to guarantee your freedom to share and change free software to make sure the software is free for all its users This General Public License applies to most of the Free Software Foundation s software and to any other program whose authors commit to using it Some other Free Software Foundation software is covered by the GNU Library General Public License instead You can apply it to your programs too When we speak of free software we are referring to freedom not price Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software and charge for this service if you wish that you receive source code or can get it if you want it that you can change the software or use pieces of it in new free programs and that you know you can do these things To protect your rights we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights These restrictions translate to certain responsibilities for you if you distribute copies of the software or if you modify it For example if you distribute copies of such a program whether gratis or for a fee you must give the recipients all the rights that you have You must make sure that they too receive or can get the source code And you must show them these terms so they know their rights We protect
190. ted by the first entry would be overwritten by the second entry 14 8 5 Use of prefixes Several functions can be written in the same cpp file using a prefix ending in underscore for instance if you want to define a gaussian function then you define functions with lt prefix gt gaussian_ for instance extern C void gaussian_init_fun TextHashTable thash void amp fun_data extern C double gaussian_eval_fun double t void fun_data extern C void gaussian_clear_fun void fun_data this is optional The macros INIT_FUN1 name EVAL_FUN1 name and CLEAR_FUN1 name do it for you where name is the prefix with the underscore removed for instance lt headers gt struct MyGaussFunData lt your data here gt os INIT_FUN1 gaussian lt read data gt MyGaussFunData d new MyGaussFunData 134 fun_data d lt set data in d gt EVAL_FUN gaussian 1 MyGaussFunData d MyGaussFunData fun_data lt use data in d and define amplitude gt CLEAR_FUN gaussian clear allocated memory MyGaussFunData d MyGaussFunData fun_data delete d fun_data NULL Finally yet another approach is to have a wrapper class with methods init and eval The macro DEFINE_EXTENDED_AMPLITUDE_FUNCTION class_name is in charge of creating and destroying the object For instance the following file fun3 cpp defines two functions linramp and tanh_ramp File fu
191. th the caching manipula tion and will be discussed later Matrix are dimensioned in line 3 the first argument is the number of dimensions and then follow the dimensions for instance FastMat2 x 2 3 2 defines a matrix which 2 indices ranging from 1 to 3 and 1 to 2 respectively The rows of this matrix will store the coordinates of the local nodes to the element FastMat2 matrices may have any number of indices Actually the library is compiled for a maximum number of indices 10 by default This limit may be modified by redefining variable max_arg in script readlist eperl and recompile Also they can have zero dimensions which stands for scalars 9 1 2 Current matrix views a k a masks In lines 10 to 13 the coordinates of the nodes are loaded in matrix x The underlying philosophy in FastMat2 is that you can set views a k a masks of the matrix without actually made any copies of the underlying values For instance the operation x ir 1 k 78 for index restriction sets a view of x so that index 1 is restricted to take the value k reducing in one the number of dimensions of the matrix As x has two indices the operation x ir 1 k gives a matrix of dimension one consisting in the k th row of x A call without arguments like in x ir cancels the restriction Also the function rs for reset cancels the actual view 9 1 3 Set operations The operation a set x ir 1 2 copies the contents of the argument x ir
192. the __INCLUDE__ line itself with a __ECHO_OFF__ __ECHO_ON__ pair For instance __ECHO_ON__ global_options ett more options here nsaverot 50 viscosity 13 3333333333333 weak_form 0 re and here __END_HASH__ nodes 2 2 3 do not echo coordinates 16 __ECHO_OFF__ __INCLUDE__ stabi nod tmp __ECHO_ON__ __END_NODES__ 4 2 2 Class internals As its names suggests the class is based in a stack of files When a get_line is issued a new line is read comments are skipped and continuation lines are resolved if the line is an __INCLUDE__ directive then the current line is pushed in the stack and the new file is open and is the current file 4 3 Preprocessing with ePerl ePerl for embedded Perl is a package that allows inclusion of Perl commands within text files Perl is the Practical Extraction and Report Language a scripting lan guage optimized for scanning arbitrary text files extracting information from those text files and printing reports based on that information For more information about ePerl see http www engelschall com sw eperl while for more information on Perl see http www perl com Describing the possibilities of preprocessing with ePerl are far beyond the scope of this manual We will describe some basic techniques of use in the context of writing PETSc FEM user data files 4 3 1 Basics of ePerl ePerl allow you to embed Perl commands within text enclosing them withi
193. the best way to achieve this is to make it free software which everyone can redistribute and change under these terms To do so attach the following notices to the program It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty and each file should have at least the copyright line and a pointer to where the full notice is found lt one line to give the program s name and a brief idea of what it does gt Copyright C 19yy lt name of author gt This program is free software you can redistribute it and or modify it under the terms of the GNU General Public License as published by the Free Software Foundation either version 2 of the License or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU General Public License for more details You should have received a copy of the GNU General Public License along with this program if not write to the Free Software 11 Foundation Inc 675 Mass Ave Cambridge MA 02139 USA Also add information on how to contact you by electronic and paper mail If the program is interactive make it output a short notice like this when it starts in an interactive mode Gnomovision version 69 Copyright C 19yy name of author Gnomovision comes with ABSOLUTELY NO WA
194. the value of will converge to However if h is too large this can affect the conditioning of the linear system The options for this elemset are e vector lt double gt coefs default empty vector Penalization coefficients ar tificial film coefficients The length of coefs and dofs must be the same found in file flowrev cpp e vector lt int gt dofs default empty vector Field indexes base 1 for which the flow reversal will be applied found in file flowrev cpp e int ndim default 0 Dimension of the problem found in file flowrev cpp e vector lt double gt refvals default empty vector The reference values for the unknown The length of coefs and dofs must be the same found in file flowrev cpp e int thermal_convection default 0 If this is set then the code assumes that a thermal NS problem is being run i e vel_index 1 and the penalization term is added only on the temperature field i e dofs ndim 2 Also the refval found in file flowrev cpp 110 e int vel_indx default 1 Field index where the components of the velocity index start 1 base found in file flowrev cpp 12 4 Examples of use of flow reversal elemset Typical use in a thermal convection problem associated with an element like nsi_les_ther is as follows 1elemset flow_reversal 2 2npg 2 3 geometry cartesianld 4 dofs 4 5s coefs lt PENALIZATION COEFFICIENT gt 6 refvals lt INLET TEMPERATURE gt 7 data lt SURFAC
195. ther licensees extend to the entire whole and thus to each and every part regardless of who wrote it Thus it is not the intent of this section to claim rights or contest your rights to work written entirely by you rather the intent is to exercise the right to control the distribution of derivative or collective works based on the Program In addition mere aggregation of another work not based on the Program with the Program or with a work based on the Program on a volume of a storage or distri bution medium does not bring the other work under the scope of this License 3 You may copy and distribute the Program or a work based on it under Section 2 in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following a Accompany it with the complete corresponding machine readable source code which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange or b Accompany it with a written offer valid for at least three years to give any third party for a charge no more than your cost of physically performing source distribution a complete machine readable copy of the corresponding source code to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange or c Accompany it with the information you received as to the offer to distribute corresponding source code
196. to the boundary It can be set as a constant vector per elemset usually for plane boundaries as in the example above or as a per element value found in file advabso cpp int precoflag default 0 Flags whether we are solving a precondioned system with the dual time strategy found in file advabso cpp int switch_to_ref_on_incoming default 0 Use special combination for choosing reference state If velocity is outgo ing uref n gt 0 n is exterior normal u is the flow velocity then the use_old_state_as_ref strategy is used otherwise the state reference is used found in file advabso cpp int use_old_state_as_ref default 0 Flags whether to use the old state ad the boundary as reference for the linear ab sorbing boundary condition found in file advabso cpp double array vmesh default none Defines the mesh velocity when doing ALE computations For simple problems it is entered by the user For more complex problems it may be calculated automatically by the mvskin hook activating the compute_mesh_vel option in the elemset found in file advabso cpp 6 12 10 Absorbing wall boundary conditions In some cases we want to have a boundary condition that switches between absorbing and wall This is useful for instance for simulating a sliding hole in a boundary This kind of b c is implemented with the gasflow_abso_wall elemset It has properties similar to that elemset in particular the normal and vmesh pr
197. tolerance to solve the monolithic linear system Newton linear subitera tion found in file iisdmat cpp int compact_profile_graph_chunk_size default 0 Size of chunk for the dynamic vector used in computing the mstrix profile found in file iisdmat cpp double dtol default 1e 3 Divergence tolerance to solve the monolithic linear system Newton linear subitera tion found in file iisdmat cpp 25 string gmres_orthogonalization default modified_gram_schmidt Orthogonalization method used in conjunction with GMRES May be unmodified_gram_schmidt modified_gram_schmidt or ir_orthog default Iterative refinement See PETSc documentation found in file iisdmat cpp int Krylov_dim default 50 Krylov space dimension in solving the monolithic linear system Newton linear subit eration by GMRES found in file iisdmat cpp string KSP_method default fgmres Defines the KSP method found in file iisdmat cpp int maxits default Krylov_dim Maximum iteration number in solving the monolithic linear system Newton linear subiteration found in file iisdmat cpp string preco_side default lt ksp dependent gt Uses right or left preconditioning Default is right for GMRES found in file iisdmat cpp string preco_type default jacobi Chooses the preconditioning operator found in file iisdmat cpp int print_fsm_transition_info default 0 Print Finite State Machine transitions for PFPETScM
198. tored in the cache is checked against the label for the operation to be performed and if differences are found the system stops execution Currently the string stores only the type of operation and the pointers to the matrices involved not the masks this will be done in a future So that an error as discussed in 89 2 5 will not be detected Activating this feature is very expensive so that it is only expected to be used in the debugging process Production code should not have this feature enabled The feature can be selectively activated in some parts of the code and deactivated in others To deactivate the check use ctx check_labels 0 9 2 7 Multithreading reentrancy If caching is not enabled FastMat2 is thread safe If caching is enabled then it is thread safe in the following sense a ctx must be created for each thread and the matrices used in each thread must be associated with the context of that thread If creating the cache structures each time is too bad for efficiency then the context and the matrices may be used in a parallel region stored in variables and reused in a subsequent parallel region 9 2 8 Debugging FastMat2 code The following tips can help in debugging FastMat2 code e As with the debugging of any other code it is better to first switch to the sim plest situation and debug then restore each feature at a time For this deactivate caching ctx use_cache 0 run in single thread mode and go to debug mo
199. tream bottom Gs on Qs x 0 t 32 6 11 Boundary Conditions 6 11 1 Boundary Conditions to simulate River Aquifer Interactions Coupling Term The stream aquifer interaction process occurs between a stream and its adjacent flood plain aquifer The coupling term is not explicitly included in eq 22 but it is treated as a boundary flux integral At a nodal point we can write the coupling Es P Ri h h 34 where Gs represents the gain or loss of the stream and the main component is the loss to the aquifer and Ry is the resistivity factor per unit arc length of the perimeter The corresponding gain to the aquifer is Ga E T 35 where I represents the planar curve of the stream and r is a Dirac s delta distribution with a unit intensity per unit length i e L f OLIE f E 8 ds 36 The stream loss element set represents this loss and a typical discretization is shown in fig 4 The stream loss element is connected to two nodes on the stream and two on the aquifer If the stream level is over the freatic aquifer level hy h gt then the stream losses water to the aquifer and vice versa Contrary to standard approaches the coupling term is incorporated through a boundary flux integral that arises naturally in the weak form of the governing equations rather than through a source term 6 11 2 Initial Conditions First Second and Third Kind Boundary Conditions Absorbent Boundary Condition
200. try A The amount of storage needed is a mattern oof concern here It is allmost sure that we will need at least one integer per each non null entry A first attempt is to have a dynamic array for each index and store their all the connected j indices In typical applications we have O 10 to O 10 nodes per processor and the number of connected 7 indices ranges from 10 to several hundred In order to avoid this large number of small dynamic arrays growing we store all the indices in a big array behaving as a sinly linked list Each entry in the array is composed of two integers struct Node one pointing to the next entry for this row and other with the value of the row Consider for instance a matrix with the following sparse structure A x 156 Ro K Matrix coefficients i j are introduced in the following order 1 1 2 2 1 2 3 1 1 3 3 3 The dynamic array for this mesh results in row 3 starts row 2 starts row 1 starts Pos in dyn array 0 1 2 3 4 5 6 7 8 Contents of dyn array 1 3 2 4 1 6 2 5 0 1 3 7 3 8 0 1 0 1 Fl v v 38 33 32 a 382 za Na w 2 Figure 25 Structure of darray 137 For all i the i th row starts in position 1 The first component of the pair shows the value of 7 and the position of the next connected j index The sequence ends with a terminator 0 1
201. tum or angular momentum e Same as above but depending also on gradients of the state variables for instance total elastic energy This involves some e Same as above but integrals of quantities on a manifold of a dimension lower than the embedding space In this case the integrand may involve also the normal to the surface for instance total flow of heat or volume rate through a surface Gatherers are implemented as elemsets the only difference is that they typically only process a special jobinfo named gather This jobinfo is processed after the time step i e only after the Newton loop on the converged values This jobinfo task does not assemble vectors or matrices but a series global values stored in a global C vector called vector lt int gt gather_values A typical call is as follows taken from ns cpp arglf clear arglf arg_add amp state IN_VECTOR USE_TIME_DATA arglf arg_add amp state_old IN_VECTOR USE_TIME_DATA arglf arg_add amp gather_values VECTOR_ADD ierr assemble mesh arglf dofmap gather amp time_star CHKERRA ierr Note that three arguments are passed to the gather assemble task the old and new states and the vector of gathered values Many gatherer elemsets have the suffix integrator appended to their names for in stance visc_force_integrator or volume_integrator 11 1 Dimensioning the values vector In order to use the gatherers the user must before dimension appropriately the ar ray
202. turn_wall default 1 Activate the transition from absorbing boundary condition to wall boundary con dition found in file advabsow cpp int ALE_flag default 0 Do correction for wave characteristic computation do to mesh velocity ALE found in file advabsow cpp string ldfilename default lt none gt Name of shared file where the turn wall functions is defined found in file advabsow cpp int ndim default 0 Dimension of the space found in file advabsow cpp int switch_to_ref_on_incoming default 0 Use special combination for choosing reference state If velocity is outgo ing uref n gt 0 n is exterior normal u is the flow velocity then the use_old_state_as_ref strategy is used otherwise the state reference is used found in file advabsow cpp string twf_class_name default lt none gt Name of turn wall function class found in file advabsow cpp int use_old_state_as_ref default 0 Flags whether to use the old state ad the boundary as reference for the linear ab sorbing boundary condition found in file advabsow cpp int vel_indx default 1 Turn absorbing boundary condition to wall boundary condition found in file advabsow cpp 61 7 The Navier Stokes module 7 1 LES implementation The Smagorisky LES model for the Navier Stokes module follows The implementation under PETSc FEM has presents the following particularities e Wall boundary conditions are
203. uffers are sent to the other processors following a strategy preventing deadlock in Nproc 1 stages where Nproc is the number of processors In the k th stage data is sent from processor j to processor j k nproc Where is the remainder operator as in the C language In each stage all processors other than the server do first a MPI_RecvO and after a MPI_Send while the server does in the other sense i e first a MPI_Send and after a MPI_Recv initiating the process Processor 0 1 2 3 4 5 AR Po Pai E S oa Qi S R a aoe R S gt gt i IR S 1 te of oo R ye PERES time S s s R d sy FR S A o Sf t da E a A MN I I l E S E ee ee M I S send R receive Figure 30 Simple scheduling algorithm for transferring data among processors This is a rather inefficient strategy because at each stage all sending is tied to the previous one making the whole process Nproc T where T is the time needed to send a typical individual buffer from one process to other 145 Processor 0 1 2 3 4 5 6 7 O Qe pa Seay es oe gt Q x E 3 E E p E e ee S E O lt 0 3 E EC lt q aa a SN time EQ lt 0 E Eos E a E9 E EE E9S 0 E gt Fo lt 0 E i C Eo lt Fox p E JE E exchange Figure 31 Improved scheduling algorithm for transferring data among processors An improved
204. un scalar_fun_with_args_t function void user_args Apply a function with optional arguments to all elements 9 4 3 Generic sum operations sum over indices These operations perform some operation an all the indices of a given dimension resulting in a matrix which has less number of indices It s a generalization of the sum max min operations in Matlab that returns the specified operation per columns resulting in a row vector result one element per column Here you specify a number of integer arguments in such a way that e if the j th integer argument is positive it represents the position of the index in the resulting matrix otherwise e if the j th argument is 1 then we perform the specified operation sum max min etc over all this index For instance if we declare FastMat2 A 4 2 2 3 3 then B sum A 1 2 1 1 means Bij y Agia for 1 3 j 1 2 114 k 1 2 1 1 3 These operation can be extended to any binary associative operation So far we have implemented the following e FastMat2 sum const FastMat2 amp A const int m 0 Sum over all selected indices e FastMat2 sum_square const FastMat2 amp A const int m 0 Sum of squares over all selected indices 94 e FastMat2 sum_abs const FastMat2 amp A const int m 0 Sum of absolute values all selected indices e FastMat2 amp min const FastMat2 amp A const int m 0 Minimum over all selected indices e FastMat2 max const FastMat2
205. und in file readmesh cpp 4 4 2 Elemset options This options are used in the Elemset class e int chunk_size default ELEM_CHUNK_SIZE Chunk size for the elemset found in file elemset cpp e int debug_compute_prof default 0 Debug the process of building the matrix profile found in file elemset cpp e int element_weight default 1 Element weight for the processor found in file elemset cpp e double epsilon_fdj default EPSILON_FDJ The increment in the variables in order to compute the finite difference approxima tion to the Jacobian Should be order epsilon sqrt precision typical magnitude of the variable Normally precision 1e 15 so that epsilon 1e 7 typical magnitude of the variable found in file elemset cpp e int print_local_chunk_size default 0 Print the local chunk size used for each elemset in each processor for each chunk found in file elemset cpp e int report_assembly_time default 0 Debug the process of building the matrix profile found in file elemset cpp e int report_consumed_time default 0 Report consumed time for the elemset Useful for building the table of weights per processor found in file elemset cpp 23 int report_consumed_time_stat default 0 Print statistics about time spent in communication and residual evaluation found in file elemset cpp 4 4 3 PFMat IISDMat class options This options are used in the PFMat class int iisd_sub
206. underscores __ while single underscores are used to separate words inside the keyword For instance an elemset section is of the form elemset volume_euler 4 geometry cartesian2d ndim 2 npg 4 14 chunk_size 1000 lumped_mass 1 shock_capturing 1 gamma 1 4 lt Other element options go here gt __END_HASH__ 1 2 81 80 2 3 82 81 3 4 83 82 4 5 84 83 5 6 85 84 6 7 86 865 7 8 87 86 8 9 88 87 lt more element connectivities follow here gt __END_ELEMSET__ In this example the keyword elemset is followed by the parameters volume_euler that is the elemset type and 4 that is the number of nodes connected to each element The line starting with props describes some per element quantities more on this later see 85 4 Follows the assignation of some values to parameters for the actual elemset for instance the value 4 is assigned to the npg parameter that is the number of Gauss points The assignation of parameters ends with the __END_HASH__ terminator Follows the element connectivities one per line ending with the terminator __END_ELEMSET__ 4 1 Preprocessing the user input data file It s very handy to have some preprocessing capabilitites when reading input files for instance including files if else constructs macro definitions inline arithmetics etc Some degree of preprocessing is performed inside PETSc FEM this includes file inclusion skipping comments and continuation lines and is described i
207. utines from stdio sscanf and friends You should not try to modify this strings for instance with the strtok function In that case copy the string in a new fresh string remember to delete it after to avoid memory leak In this way you can pass arbitrary information strings integer doubles to the element routine 5 5 Reading with get_int and get_double As most of the time element properties are either of integer or double type two specific routines are provided get_int and get_double for instance 31 ierr get_int thash npg npg where the integer value is directly returned in npg You can specify also a default value and read several values at once 5 6 Per element properties table Many applications need a mechanism for storing values per element for instance when dealing with physical properties variable varying spatially in a continuous way If the properties is piecewise constant then you can define an elemset for each constant patch but if it varies continuously then you should need an elemset for each element which conspires with efficiency We provide a mechanism to pass per element values to the element routine At the moment this is only possible for doubles These properties are specified in the same line of the connectivities for instance elemset nsi_tet 4 elemset header props cond cp emiss name of properties to be defined per element Elemset properties hash t
208. ve system In fact the discretization of advective systems may be put in a completely abstract setting where the unique thing that varies from one system to another is the definition of the flux function itself The discretization of advective systems in PETSc FEM has been done in this way so that it is easy to add other advective systems by only adding the new flux function Applying the chain rule and noting that the fluxes only depend on position through their dependence on the state vector we arrive to OF OU A 4 Ox Ox 4 where JF A SU 5 are the jacobians of the advective fluxes 6 2 Discretization of advective systems Using the Finite Element Method with weight functions W x and interpolation functions N 5 x results in MU F U G 6 where Ur U US OS 7 UN noa so that U has Nao f Ndof X Npoa components M of dimension Na o f X Na o t is the mass matrix If we look at it as an Nnoa X Nnoa block matrix with blocks of size Nnaof X Naor then the i j block is M xo N x dx 8 35 the i block of the global flux and source vector F and G are OF F Nod a 0 SE ax G f N G x dx If the flux vector term is integrated by parts then we have the weak form Ni F f 2 Fedx mirlo dx 10 Q Ory T where T is the boundary of 2 This formulation is the Galerkin or centered one It is equivalent to approximate first derivatives by centered diff
209. vely we arrive to an estimation of a total time of T n ines F Nproc 2 eee 117 2Nproc DT 161 O 2NprocT Then it is significantly better than the previous algorithm Scheduling algorithm for exchanging data between processors void exchange j if myrank gt j 146 send data to processor j receive data from processor j else send data to processor j receive data from processor j 17 3 Mesh refinement Warning This is a work in progress FEM mesh Figure 32 FEM mesh and graph representation Figure 33 Geomtrical objets 5 nodes and an edge We conceive a mesh as a graph connecting nodes and other higher dimension entities based on these nodes Consider for example five nodes labeled from 0 to 40 and an edge connecting two of these nodes as in figure 33 In total there are 6 geometrical objects 5 nodes and the edge in the figure In order to identify higher order objects like the edge we could add a new index for it say index 5 but instead we can associate the edge with the connected pair of node indices edge 2 3 Note that if we consider that the edge has no orientation then the sequence must be considered as a set i e edge 2 3 edge 3 2 Lisp like S exp expressions will be used throughout in order to describe objects whereas if oritentation matters then edge 2 3 edge 3 2 We could then define geometrical objects of ty
210. xecuted always in the same order is called a branch Looking at the previous code we have one branch starting at the x ir 1 k set line through the J rs detO line This sequence is repeated many times one for each element so that it is interesting to reuse the cache list For this we create a branch object bi class FastMat2 CacheCtx2 Branch object and jump to this branch each time we enter the loop The first time we enter the loop the cache list is created and stored in the first position of the cache structure In the next and subsequent executions of the loop the cache is resued avoiding recomputation of many administrative work related with the matrices The problem is when the sequence of operations is not always the same In that case we must issue several jump commands each one to the start of a sequence of FastMat2 operations Consider for instance the following code FastMat2 CacheCtx2 ctx FastMat2 CacheCtx2 Branch bi b2 FastMat2 x ctx 1 3 ctx use_cache 1 int N 10000 in 0 out 0 for int j 0 j lt N j ctx jump b1 x fun rnd double len x if len lt 1 0 in norm_p_all ctx jump b2 x scale 1 0 len printf total d in d n N in double in N A vector x of size 3 is randomly generated in a loop the line x fun rnd Then its length is computed and if is shorter than 1 0 it is scaled by 1 0 len so that its final length is one In this cas
211. y of communication when generating a sequence of frames for a video with a DX sequencer for instance Note that if you don t use a sequencer then you have to arrange in someway to make ExtProgImport awake and connect to PETSc FEM otherwise the update in the visualization is not performed and the PETSc FEM job is stopped waiting for the connection e In asynchronous mode steps 0 in contrast PETSc FEM monitors each port after computing a time step If a DX client is trying to connect it answers the request and resumes computing otherwise it resumes computing immediately This is ideal for monitor a job that is running in background for instance Note that in this case the interference with the PETSc FEM job is minimal since once PETSc FEM answers the request resumes processing automatically until a new connection is requested The steps state variable is internal to PETSc FEM It can be set initially with a dx_steps options line 1 by default After that it can be changed by changing the steps 113 input tab However note that the change doesn t take effect until the next connection of DX to PETSc FEM If you don t want to change the internal state of the steps variable then you can set it to NULL or 1 13 2 Building and loading the ExtProglmport module This module allows PETSc FEM to exchange data with DX through a socket using a predefined protocol The module is in the PETSCFEM_DIR dx directory of the PETSc F
212. you should decide whether it should be assumed to be constant for all the elemset or whether it can be taken per element The second is the more general case but to take all the possible properties as per element may be too much core memory consuming There is then a mechanism to allow the application writer to get physical properties without bothering of whether the user has set them in the properties hash table or in the per element properties table First the application writer reserves an array of doubles large enough to contain all the needed properties This doesn t scale with mesh size so you can be gen erous here or either use dynamic memory allocation Before entering the element loop the macro DEFPROP prop_name determines whether the property has been passed by one or the other of the mechanisms This information is stored in an in teger vector elprpsindx MAXPROP Also it assigns a position in array propel so that propel prop_name_indx contains the given property Then once inside the element loop a call to the function load_props loads the appropriate values on propel MAXPROP independently how thay have been defined A typical call sequence is as follows Maximum number of properties to be loaded via load_prop define MAXPROP 100 int iprop 0 elprpsindx MAXPROP double propel MAXPROP determine which mechanism passes conductivity DEFPROP conductivity

User's Guide - Centro Científico Tecnológico CONICET Santa Fe

Contents

Download Pdf Manuals

Related Search

Related Contents

User's Guide - Centro Cient&iacute;fico Tecnol&oacute;gico CONICET Santa Fe

Contents

Download Pdf Manuals

Related Search

Related Contents

User's Guide - Centro Científico Tecnológico CONICET Santa Fe