Fast Marching Methods - Parallel Implementation and Analysis
At this point we can expect u to be some kind of solution of our initial value problem (1.5), but we only know that u is continuous and we do not have any information on the derivatives of u. As ε → 0 we do not have a uniform bound on Δu_ε which would allow us to show that Du_ε → Du in some sense. To prove that, we introduce a smooth test function v ∈ C^∞(Ω̄), v|_Γ = 0, and suppose that u − v has a strict local maximum at x₀ ∈ Ω:

\[ u(x_0) - v(x_0) > u(x) - v(x) \quad \text{in some neighborhood of } x_0, \ x \neq x_0. \tag{1.8} \]

Recalling (1.7), we claim that for each ε > 0 sufficiently small there exists a point x_ε such that

\[ u_{\varepsilon_j} - v \ \text{has a local maximum at } x_{\varepsilon_j}, \quad \text{and} \quad x_{\varepsilon_j} \to x_0 \ \text{as } j \to \infty. \tag{1.9} \]

To confirm this, note that for each sufficiently small r > 0, equation (1.8) implies

\[ \max_{\partial B(x_0, r)} (u - v) < (u - v)(x_0), \]

where B(x₀, r) is the closed ball in Ω ⊂ R^n with center x₀ and radius r. Applying the Arzelà–Ascoli compactness criterion, we claim that for ε small enough u_ε → u uniformly on B(x₀, r) and max over ∂B(x₀, r) of (u_ε − v) is smaller than (u_ε − v)(x₀). Consequently, u_ε − v attains a local maximum at some point in the interior of B(x₀, r). Considering now a sequence of radii r_j such that r_j → 0, we obtain (1.9). Using the second derivative test and relation (1.9), we can relate the derivatives of u_ε and v:

\[ Du_\varepsilon(x_\varepsilon) = Dv(x_\varepsilon), \qquad \Delta u_\varepsilon(x_\varepsilon) \le \Delta v(x_\varepsilon). \tag{1.10} \]

We now show that H(x₀, Du(x₀)) ≤ 0, starting from H(x_ε, Du_ε(x_ε)):
\[ \max\big(D^{-x}_{ij}T,\, -D^{+x}_{ij}T,\, 0\big)^2 + \max\big(D^{-y}_{ij}T,\, -D^{+y}_{ij}T,\, 0\big)^2 = \frac{1}{F_{ij}^2}. \]

Using the finite difference operators D^{-x}_{ij}T and D^{-y}_{ij}T, we get

\[ \Big(\frac{T_{ij} - T_{i-1,j}}{\Delta x}\Big)^2 + \Big(\frac{T_{ij} - T_{i,j-1}}{\Delta y}\Big)^2 = \frac{1}{F_{ij}^2}. \tag{2.16} \]

Solving the quadratic equation (2.16) in T_ij (with Δx = Δy), we get

\[ T_{ij} = \frac{T_{i-1,j} + T_{i,j-1} \pm \sqrt{\, 2\Delta x^2 / F_{ij}^2 - (T_{i-1,j} - T_{i,j-1})^2 \,}}{2}. \tag{2.17} \]

Since X_{i−1,j}, X_{i,j−1} ∈ A(Ω), we have T_ij > T_{i−1,j} and T_ij > T_{i,j−1}; therefore we consider only the solution of (2.17) involving the + sign:

\[ T_{ij} = \frac{T_{i-1,j} + T_{i,j-1} + \sqrt{\, 2\Delta x^2 / F_{ij}^2 - (T_{i-1,j} - T_{i,j-1})^2 \,}}{2}. \tag{2.18} \]

Since T_ij is the solution of equation (2.16), we can rewrite it as

\[ \Big(\frac{T_{ij} - T_{i-1,j}}{\Delta x}\Big)^2 = \frac{1}{F_{ij}^2} - \Big(\frac{T_{ij} - T_{i,j-1}}{\Delta y}\Big)^2 \le \frac{1}{F_{ij}^2}. \]

Since both sides of the above inequality are positive, we can take the square root and get

\[ T_{ij} - T_{i-1,j} \le \frac{\Delta x}{F_{ij}}. \tag{2.19} \]

Now we can conclude that T_ij ≤ T_{i−1,j} + Δx/F_ij. In order to complete the proof for this case, we need to show that the expression that appears under the square root in the calculation of T_ij as a function of two of its neighbors in equation (2.18) can never be negative. Since node X_{i−1,j} was accepted before node X_{i,j−1}, we know that T_{i−1,j} ≤ T_{i,j−1}. In this setting T_ij was first computed based on X_{i−1,j}, and following the arguments from Case 1 we can say that

\[ T_{i-1,j} \le T_{ij} \le T_{i-1,j} + \frac{\Delta x}{F_{ij}}. \]

Since X_{i,j−1} becomes accepted before X_ij, we have T_{i,j−1} ≤ T_ij. From all of these we deduce that the quantity under the square root in (2.18) is non-negative.
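To make the role of the discriminant explicit, the update (2.18) can be written in the following worked form, assuming Δx = Δy = h and abbreviating a = T_{i−1,j}, b = T_{i,j−1} for the two upwind neighbor values:

\[ (T_{ij} - a)^2 + (T_{ij} - b)^2 = \frac{h^2}{F_{ij}^2} \;\Longrightarrow\; T_{ij} = \frac{a + b}{2} + \frac{1}{2}\sqrt{\frac{2h^2}{F_{ij}^2} - (a - b)^2}, \]

which is meaningful (real and upwind) when |a − b| ≤ h/F_ij; when |a − b| > h/F_ij the smaller neighbor alone determines the update, T_ij = min(a, b) + h/F_ij.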
2. Fast Sweeping strategy: perform a Fast Sweeping [28] at the interfaces and update the overlapping regions of the sub-domains.

The advantage of these strategies is that they do not keep a distributed list for the narrow band, which would require much more communication time.

The thesis is organized as follows. In Chapter 1, I present the theoretical background necessary to deal with the problem of moving interfaces. I begin by introducing the Hamilton–Jacobi equations and viscosity solutions, emphasizing the Eikonal equation and its applications. In the second part of this chapter, I explain the numerical approximation scheme and the numerical methods used to solve the Eikonal equation over the past years. Specifically, I discuss the main features of the Rouy–Tourin algorithm [21] and the Fast Sweeping algorithm [28].

Chapter 2 is devoted to the sequential Fast Marching Methods, which are the optimal way to solve Hamilton–Jacobi equations [23]. I begin by explaining the idea of classifying the nodes as accepted, narrow band and far away. I follow with the description of the algorithm, pointing out how the upwind nature of the numerical scheme is preserved. I end by presenting how the numerical solution built by the Fast Marching Methods converges to the viscosity solution of the Eikonal equation. To prove this, I first show how the solution of the discrete Eikonal equation converges to the viscosity solution [3]. Secondly, I show that the s…
} GeomStruct;

where Nx and Ny denote the global dimensions and, for each processor, we have the local dimension limits xs, xm and ys, ym, respectively. By ghostx, ghosty, ghostxw, ghostyw we denote the dimensions of the ghost zone rows and columns. Also, for each processor, we need to define its possible neighbors (left, right, top and bottom), as in the structure Neighbor.

FIGURE 4.16. Domain structure with the points and ghost points limits.

As presented in Algorithm 4, the ghost point synchronization procedure is the part where we can apply different strategies in order to restart the Fast Marching algorithm. The purpose of such strategies is to assure the monotonicity and upwind nature of the algorithm even when we have inter-domain communications. The difference between these strategies is in the way the nodes on the boundary are computed using the ghost points. We developed several strategies to accomplish this task, among which the following are the most viable:

1. Ordered Overlap strategy: group the overlapping nodes of each sub-domain in a sorted list and use this list to recompute the boundary values;
2. Fast Sweeping strategy: perform a Fast Sweeping [28] at the interfaces and update the overlapping regions of the sub-domains.

The advantage of these strategies is that they do not keep a distributed list for the narrow band, which would require much more communication time.
if T[i][c] < min then
    Fl[i][c] ← 0    (accepted)
else if T[i][c] > max then
    Fl[i][c] ← 2    (far away)
else    (min ≤ T[i][c] ≤ max)
    Fl[i][c] ← 1    (narrow band)
end
return Fl

4.4.4 Stopping Criteria
Modeling the problem numerically, we introduce numerical error. The numerical error is caused by the finite precision of computations involving floating point values (round-off error) and by the difference between the exact mathematical solution and the approximate solution obtained through discretization (truncation error). We introduce the notion of numerical stability: an algorithm is numerically stable if an error, once it is generated, does not grow too much during the calculation. This is only possible if the problem is well conditioned, meaning that the solution changes only by a small amount if the problem data is modified by a small amount. Indeed, if a problem is ill conditioned, then any error in the data will grow substantially.

In iterative methods we start from an initial guess and form successive approximations that converge to the exact solution only in the limit. Let T^n and T^{n+1} denote the solutions computed by the parallel Fast Marching algorithm at iterations n and n+1. Then the global error after each iteration is

\[ e_n = \|T^{n+1} - T^n\|_\infty = \max_{i,j} |T^{n+1}_{ij} - T^n_{ij}|. \]

Naively, we would iterate until e_n = 0, but this approach can bring us to an infinite number of iterations. Therefore a convergence criterion needs to be specified in order to decide when a sufficiently accurate solution has been reached.
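A minimal C sketch of this stopping test follows. It is not the thesis code, and the names (T_new, T_old, nlocal, tol) are illustrative: each process computes the max-norm change of T over its own nodes, and MPI_Allreduce forms the global maximum e_n.

#include <math.h>
#include <mpi.h>

/* Return 1 when the global error e_n = max |T^{n+1} - T^n| drops below tol. */
int converged(const double *T_new, const double *T_old, int nlocal,
              double tol, MPI_Comm comm)
{
    double local_err = 0.0, global_err;
    for (int k = 0; k < nlocal; k++) {
        double d = fabs(T_new[k] - T_old[k]);
        if (d > local_err) local_err = d;      /* local max-norm change */
    }
    MPI_Allreduce(&local_err, &global_err, 1,  /* global maximum over   */
                  MPI_DOUBLE, MPI_MAX, comm);  /* all sub-domains       */
    return global_err <= tol;
}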
3.1 General Ideas
3.2 Notations
3.3 Parallel Fast Marching Algorithm
3.3.1 Ghost Points
3.4 Convergence

Chapter 4. Implementations and Numerical Experiments
4.1 Sequential Implementation
4.2 Sequential Numerical Experiments
4.2.1 Complexity of the Algorithm
4.2.2 Approximation Error
4.2.3 Numerical Results
4.3 Background on Parallel Computing
4.4 Parallel Implementation
4.4.1 Ordered Overlap Strategy
4.4.2 Fast Sweeping Strategy
4.4.3 Flags Reconstruction
4.4.4 Stopping Criteria
4.5 Parallel Numerical Experiments
4.5.1 Weak Scalability
4.5.2 Strong Scalability

References
Appendix A. Modules Hierarchy
Appendix B. PETSc Features
Vita

List of Figures
Now we have all the tools to prove Lemma 2.5 for our numerical scheme. In our case, the numerical approximation scheme satisfies

\[ \begin{cases} g_{ij}\big(D^{-x}_{ij}T,\, D^{+x}_{ij}T,\, D^{-y}_{ij}T,\, D^{+y}_{ij}T\big) = 0, & x \in \Omega \setminus \Gamma, \\ T_{ij} = 0, & x \in \Gamma, \end{cases} \tag{2.12} \]

where g_ij(D^{-x}_{ij}T, D^{+x}_{ij}T, D^{-y}_{ij}T, D^{+y}_{ij}T) is computed as

\[ g_{ij} = \sqrt{\max\big(D^{-x}_{ij}T,\, -D^{+x}_{ij}T,\, 0\big)^2 + \max\big(D^{-y}_{ij}T,\, -D^{+y}_{ij}T,\, 0\big)^2} - \frac{1}{F_{ij}}. \]

The discretization ρ represents a pair (Δx, Δy) of space discretization steps, u_ρ = T is a function defined on Δ_ρ = {(x_i, y_j), i = 0,…,N, j = 0,…,M} ⊂ Ω̄, and for all x ∈ Ω̄ we define u_* and u* as

\[ u_*(x) = \liminf_{\substack{y \to x \\ \rho \to 0}} u_\rho(y) = \sup_{\eta > 0} \inf\{\, u_\rho(y) : y \in B(x, \eta) \cap \Delta_\rho,\ 0 < \rho < \eta \,\} \]

and

\[ u^*(x) = \limsup_{\substack{y \to x \\ \rho \to 0}} u_\rho(y) = \inf_{\eta > 0} \sup\{\, u_\rho(y) : y \in B(x, \eta) \cap \Delta_\rho,\ 0 < \rho < \eta \,\}. \]

Our scheme is monotone, i.e., if U ≥ V then for all i, j we have

\[ g_{ij}\big(D^{-x}_{ij}U, D^{+x}_{ij}U, D^{-y}_{ij}U, D^{+y}_{ij}U\big) \le g_{ij}\big(D^{-x}_{ij}V, D^{+x}_{ij}V, D^{-y}_{ij}V, D^{+y}_{ij}V\big). \]

The scheme is stable, since u_ρ has its minima in Ω̄ and is bounded below by a constant independent of ρ, which is the minimum value on Ω̄. The scheme is consistent with the Hamilton–Jacobi equation (2.6), as for each i, j,

\[ g_{ij}(a, a, b, b) = \sqrt{a^2 + b^2} - \frac{1}{F_{ij}} \quad \text{for all } a, b \in \mathbb{R}. \]

We can also prove, as in [3], that u* and u_* are the sub-solution and the super-solution of equation (2.6), respectively, and that u* ≤ u_* on ∂Ω. Since our approximation scheme satisfies all the assumptions made in the hypothesis of Theorem 2.12, we can apply Theorem 2.12 to prove that the solution of the discrete Eikonal equation converges toward the viscosity solution as the mesh sizes go to zero.
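For illustration, the residual g_ij defined above can be evaluated at a node as in the following C sketch. This is a hypothetical helper with our own argument names, not the thesis's code: Tc is the center value, Tw, Te, Ts, Tn the west/east/south/north neighbor values of T.

#include <math.h>

static double max3(double a, double b, double c)
{ double m = a > b ? a : b; return m > c ? m : c; }

/* Upwind residual g_{ij} of scheme (2.12), Rouy-Tourin style. */
double g_ij(double Tc, double Tw, double Te, double Ts, double Tn,
            double dx, double dy, double F)
{
    double Dmx = (Tc - Tw) / dx;   /* D^{-x} T_{ij} */
    double Dpx = (Te - Tc) / dx;   /* D^{+x} T_{ij} */
    double Dmy = (Tc - Ts) / dy;   /* D^{-y} T_{ij} */
    double Dpy = (Tn - Tc) / dy;   /* D^{+y} T_{ij} */
    double a = max3(Dmx, -Dpx, 0.0);
    double b = max3(Dmy, -Dpy, 0.0);
    return sqrt(a * a + b * b) - 1.0 / F;
}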
we define the neighbors set as V(X_ij) = D(X_ij) ∪ G(X_ij).

FIGURE 3.3. Neighbors of X_ij ((c) node on a column boundary; (d) node on a corner boundary).

In Figure 3.3, the sub-figures illustrate the difference between the neighbors and the ghost neighbors in typical situations. Thus, the sets of neighbors and ghost neighbors of X_ij are:

- figure 3.3(a): D(X_ij) = {X_{i−1,j}, X_{i+1,j}, X_{i,j−1}, X_{i,j+1}} and G(X_ij) = ∅;
- figure 3.3(b): D(X_ij) = {X_{i−1,j}, X_{i+1,j}, X_{i,j+1}} and G(X_ij) = ∅;
- figure 3.3(c): D(X_ij) = {X_{i−1,j}, X_{i+1,j}, X_{i,j−1}} and G(X_ij) = {X_{i,j+1}};
- figure 3.3(d): D(X_ij) = {X_{i+1,j}, X_{i,j−1}} and G(X_ij) = {X_{i−1,j}, X_{i,j+1}}.

Remark 3.5. Again, we commit an abuse of notation, since special care has to be taken when X_ij is at the boundary of Ω_k. We let the reader distinguish between the indices of a ghost node and a node in the domain: X_{0,0} ∈ ∂Ω_k and X_{−1,0} ∈ G(X_{0,0}), but 0 ≤ i ≤ N.

Definition 3.6. By A(Ω_k), NB(Ω_k) and FA(Ω_k) we denote the sets of accepted, narrow band and far away nodes over the sub-domain Ω_k. We also define:
- the set of accepted nodes over the boundary of domain Ω_k as A(∂Ω_k) = A(Ω_k) ∩ ∂Ω_k;
- the set of narrow band nodes over the boundary of domain Ω_k as NB(∂Ω_k) = NB(Ω_k) ∩ ∂Ω_k;
- the set of far away nodes over the boundary of domain Ω_k as FA(∂Ω_k) = FA(Ω_k) ∩ ∂Ω_k.

3.3 Parallel Fast Marching Algorithm
The main idea of the parallel algorithm is to perform Fast Marching on the sub-domains, update the boundary…
1.1 Motivation
Let us consider the one-dimensional Eikonal equation with homogeneous Dirichlet boundary conditions:

\[ |u'(x)| = 1 \ \text{in} \ (-1, 1), \qquad u(-1) = u(1) = 0. \tag{1.3} \]

The general solution of the differential equation is u(x) = ±x + c. We cannot choose a sign for ±x and a constant of integration c to satisfy both boundary conditions, but there are weak solutions that satisfy the differential equation almost everywhere. The function u(x) = 1 − |x| satisfies the boundary conditions and satisfies the differential equation everywhere except x = 0. This solution gives the distance to the boundary of the domain, but it is not unique. As shown in Figure 1.3, there exist infinitely many weak solutions (continuous functions with slope ±1) satisfying the equation almost everywhere and the boundary conditions.

FIGURE 1.3. Weak solutions of |u′| = 1, u(−1) = u(1) = 0.

By adding a small viscosity term to Equation (1.3), one can obtain a second-order equation for u_ε:

\[ -\varepsilon\, u_\varepsilon''(x) + |u_\varepsilon'(x)| = 1 \ \text{in} \ (-1, 1), \qquad u_\varepsilon(-1) = u_\varepsilon(1) = 0, \quad \varepsilon > 0. \tag{1.4} \]

It is well known that equation (1.4) has a unique solution of the form

\[ u_\varepsilon(x) = 1 - |x| + \varepsilon\big(e^{-1/\varepsilon} - e^{-|x|/\varepsilon}\big). \]

In Figure 1.4 we graph the solution of equation (1.4) for ε = 1/5, 1/10, 1/20, 1/40, 1/100. For small ε, the viscosity term smooths out part of the solution u(x); in fact, it smooths out the corner to make the solution C². In this example we choose ε > 0 to smooth the solution at its relative maximum.
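One can check directly that the stated u_ε solves (1.4); for x ∈ (0, 1) (the case x < 0 is symmetric):

\[ u_\varepsilon'(x) = -1 + e^{-x/\varepsilon}, \qquad u_\varepsilon''(x) = -\tfrac{1}{\varepsilon}\, e^{-x/\varepsilon}, \]
\[ -\varepsilon\, u_\varepsilon''(x) + |u_\varepsilon'(x)| = e^{-x/\varepsilon} + \big(1 - e^{-x/\varepsilon}\big) = 1, \qquad u_\varepsilon(\pm 1) = 0, \]

and u_ε → 1 − |x| uniformly as ε → 0.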
\[ H(x_\varepsilon, Dv(x_\varepsilon)) = H(x_\varepsilon, Du_\varepsilon(x_\varepsilon)) \quad \text{(using (1.10))} \]
\[ = \varepsilon\, \Delta u_\varepsilon(x_\varepsilon) \quad \text{(using (1.6))} \]
\[ \le \varepsilon\, \Delta v(x_\varepsilon) \quad \text{(using (1.10))}. \]

Since v is smooth and H is continuous, x_ε → x₀ as j → ∞, and letting ε → 0, the previous inequality becomes

\[ H(x_0, Du(x_0)) \le 0. \tag{1.11} \]

Similarly, we deduce the reverse inequality

\[ H(x_0, Du(x_0)) \ge 0, \tag{1.12} \]

provided that

u − v has a local minimum at x₀.  (1.13)

Equations (1.8), (1.11), (1.13) and (1.12) define the concept of weak solution of (1.5) as:

Definition 1.1 (version I). A bounded, uniformly continuous function u is a viscosity solution of the initial value problem (1.5) for the Hamilton–Jacobi equation, provided that the function satisfies the boundary condition u = φ on ∂Ω and, for all v ∈ C^∞(Ω̄):
- if u − v has a local maximum at x₀, then H(x₀, Dv(x₀)) ≤ 0 (1.14); u is called a viscosity sub-solution;
- if u − v has a local minimum at x₀, then H(x₀, Dv(x₀)) ≥ 0 (1.15); u is called a viscosity super-solution.

Remark 1.2. In the light of Definition 1.1, u defined in (1.7) as the limit of solutions of (1.6) converges to the viscosity solution of (1.5). To verify that a given function u is a viscosity solution of the Hamilton–Jacobi equation H(x, Du) = 0, equations (1.14) and (1.15) must hold for all smooth functions v. Evans noted in [8] that if we used the vanishing viscosity method to construct u, then it would indeed be a viscosity solution. Viscosity solutions can be seen as the limit functions…
Sweeping strategy, we present the time complexity versus N, the number of points in the leading dimension, in Figure 4.23. As we can see, for smaller images the Ordered Overlap strategy performs better (Figure 4.23(a)), but for large images the Fast Sweeping strategy achieves better times (Figure 4.23(b)). The time complexity of the Fast Sweeping strategy is closer to O(N²) in the worst-case scenario. Due to this fact, we can consider the Fast Sweeping strategy better than the Ordered Overlap strategy.

L error. In order to compute the L error, we relate the gradient with N, the number of points in the leading dimension. In Figure 4.24, the blue line represents the gradient as a function of the leading dimension of the domain, and the red line is the linear least-squares approximation of those simulations. We specify the value of the slope for each graph. We can see that the gradient is proportional to N^a, where the slope a < 0. For the worst-case scenario a = −1.2661, and for the normal-case scenario a = −1.0094. For the Fast Sweeping strategy, the results are presented in Figure 4.25; in this case also the gradient is proportional to N^a, where a < 0 is the slope. For the worst-case scenario a = −0.4186, and for the normal-case scenario a = −0.5745.

Fast Marching execution time with respect to the number of processors. To study the execution time of the Fast Marching algorithm with respect to the number of CPUs, we consider the cases where…
T(X) = 0 on the initial front and T(X) = ∞ otherwise. For exemplification purposes, Figure 1.8 shows the stages of the algorithm for a particular case. At each iteration of the algorithm we solve the local Eikonal equation at all the points of the domain. Note how T changes only at a few grid points. In Figure 1.8 these points are shown in green, the points with exactly computed T are represented by red cells, and the points where T = ∞ are the white cells. At each iteration, the area of the nodes where T is computed exactly extends, starting from the corners, until convergence.

FIGURE 1.8. Rouy–Tourin iterations (iterations 0 through 6).

1.2.3.2 Fast Sweeping Method
The Fast Sweeping Method is an efficient iterative method which uses upwind differences for discretization to solve the Eikonal equation [28]. The idea behind Fast Sweeping is to sweep through the grid in certain directions, computing the distance value for each grid point. The sweeping ordering follows a family of characteristics of the corresponding Eikonal equation in a certain direction simultaneously. For example, in a two-dimensional domain, T at a grid point depends on its neighbors in one of the following four ways (a C sketch of the corresponding sweep loops follows the list):
- left and bottom neighbors;
- left and top neighbors;
- right and bottom neighbors;
- right and top neighbors.
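The following minimal C sketch shows the four alternating sweep orderings; the function update is assumed to perform the local Eikonal solve at a node, and the names are ours, not the thesis's.

/* One full Fast Sweeping pass: four Gauss-Seidel sweeps in alternating
 * directions, so every characteristic direction is covered once. */
void fast_sweeping_pass(int N, int M, void (*update)(int i, int j))
{
    int i, j;
    for (i = 0; i < N; i++)              /* sweep 1: i up,   j up   */
        for (j = 0; j < M; j++) update(i, j);
    for (i = 0; i < N; i++)              /* sweep 2: i up,   j down */
        for (j = M - 1; j >= 0; j--) update(i, j);
    for (i = N - 1; i >= 0; i--)         /* sweep 3: i down, j up   */
        for (j = 0; j < M; j++) update(i, j);
    for (i = N - 1; i >= 0; i--)         /* sweep 4: i down, j down */
        for (j = M - 1; j >= 0; j--) update(i, j);
}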
Z becomes accepted, node X can be in one of the following cases:

Case 1: only one of the neighbors of X is accepted;
Case 2: two of the neighbors of X are accepted;
Case 3: more than two neighbors of X are accepted.

Consider the particular setting where sub-domain Ω_{k−1} is to the left of sub-domain Ω_k, which is to the left of Ω_{k+1}.

Case 1. Only one of the neighbors of X is accepted, and this node is Z. Up to a rotation in the domain, we can assume without loss of generality that node Z ∈ Ω_k or that Z ∈ Ω_{k+1}.

1.1. Let node Z ∈ Ω_k and Z ∈ D(X) be the only accepted neighbor of X (see Figure 3.6). Since Z ∈ A(Ω_k), by part 1 of Remark 3.8, X ∈ NB(Ω_k) and all the other neighbors of X are narrow band or far away. Hence there is no accepted ghost node to influence the recomputation, and T̃^{n+1/3}_k(X_ij) = T^n_k(X_ij).

1.2. Let node Z ∈ G(X) and Z ∈ Ω_{k+1} (see Figure 3.7).

FIGURE 3.6. Node Z is in the same sub-domain as X_ij.

FIGURE 3.7. Node Z is a ghost node of X_ij.

Since Z ∈ A(Ω_{k+1}) and Z ∈ G(X_ij), we have X_ij ∈ NB(Ω_k) and T^n_k(X_ij) > T^n_{k+1}(Z). By Proposition 3.9, the recomputed value T̃^{n+1/3}_k(X_ij) can differ from T^n_k(X_ij) only through the influence of Z. For computing T̃^{n+1/3}_k(X_ij) we used its accepted ghost neighbor Z, and the relation T^n_{k+1}(Z) < T̃^{n+1/3}_k(X_ij) holds true.

Case 2. Two of the neighbors of X are accepted. Assume that node A was accepted…
a better illustration of the solution's shape: in both cases we construct the elevated surface of the solution and plot it on the discretization grid (see Figure 4.8(b) and Figure 4.9(b)). The second case is relevant for the intersection of gradient directions of T and for how the solution is computed in this situation (see Figure 4.9(b)).

FIGURE 4.9. Solution's shape for one processor, two starting nodes.

4.3 Background on Parallel Computing
To solve a problem numerically, we construct an algorithm as a discrete series of instructions. Usually an algorithm is implemented to be sequential, that is, instructions are executed one at a time, in order of appearance, on a single Central Processing Unit (CPU). For the parallel implementation, we use multiple processing units simultaneously. Our problem is divided into sub-problems, each sub-problem being assigned to a different processing unit. Furthermore, each sub-problem is divided into a series of instructions or tasks; a task is a program or program-like set of instructions that is executed by a processor. Thus each CPU can execute its part of the algorithm concurrently, in parallel with the others. Resources that we can use as processing elements include a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above [20]. Parallel computing…
FIGURE 4.25. L Error, Fast Sweeping Strategy: (b) normal case (slope −0.57454).

FIGURE 4.26. FM execution time with respect to the number of CPUs, Ordered Overlap Strategy: (a) worst case (slope −1.3789); (b) normal case (slope −1.63).

FIGURE 4.27. FM execution time with respect to the number of CPUs, Fast Sweeping Strategy: (a) worst case (slope −1.429); (b) normal case (slope −1.6219).

FIGURE 4.28. Best case scenario, FM execution time versus number of CPUs: (a) Ordered Overlap approach (slope −1.4824); (b) Fast Sweeping approach (slope −1.6828).
all the nodes become accepted.

2.2 Fast Marching Algorithm Description
Let Ω be the rectangular domain (0,1) × (0,1) of R². Given the discretization steps Δx = 1/N, Δy = 1/M > 0, we denote by T_ij the value of our numerical approximation of the solution at (x_i, y_j) = (iΔx, jΔy), i = 0,…,N, j = 0,…,M. Similarly, F_ij represents the value of F at node (x_i, y_j). We define the neighbors of a grid point (x_i, y_j):

Definition 2.1. The set of neighboring nodes of a grid point X_ij = (x_i, y_j) is

V(X_ij) = {(x_{i−1}, y_j), (x_{i+1}, y_j), (x_i, y_{j−1}), (x_i, y_{j+1})}

for 1 ≤ i ≤ N−1 and 1 ≤ j ≤ M−1.

Remark 2.2. The definition of V(X_ij) has to be adapted for i = 0 or j = 0 and i = N or j = M by eliminating the appropriate points.

These are the nodes appearing in the stencil of the finite difference discretization. The Fast Marching algorithm is presented in Algorithm 3 (a self-contained C sketch follows it).

Algorithm 3: Fast Marching algorithm
Initialization:
    Set T = 0 for accepted nodes.
    Tag all neighbors of accepted nodes that are not accepted as narrow band.
    Compute T for all narrow band nodes.
    Set T = ∞ for the rest of the nodes.
Main cycle:
repeat
    Let A be the node with the smallest T among all narrow band nodes.
    Add node A to accepted; remove A from narrow band.
    Tag as narrow band all neighbors of A that are not accepted.
    for all y ∈ V(A) do
        if y is in far away then remove y from far away and add it to the narrow band set
        if y is in narrow band then update T_y by solving the local Eikonal equation
    end
until all the nodes become accepted
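To make Algorithm 3 concrete, here is a minimal, self-contained C sketch on a uniform N × N grid with speed F ≡ 1. It is not the thesis's implementation: for clarity it finds the narrow-band minimum by a linear scan, whereas the implementation described in Chapter 4 keeps the narrow band in a sorted double-chained list, and all names here are illustrative.

#include <math.h>
#include <stdio.h>

#define N 64
#define INF 1.0e30
enum { ACCEPTED = 0, NARROW = 1, FARAWAY = 2 };

static double T[N][N];
static int    flag[N][N];

/* Local Eikonal solve at (i,j) using only accepted neighbors (F = 1). */
static double local_solve(int i, int j, double h)
{
    double a = INF, b = INF;
    if (i > 0     && flag[i-1][j] == ACCEPTED && T[i-1][j] < a) a = T[i-1][j];
    if (i < N - 1 && flag[i+1][j] == ACCEPTED && T[i+1][j] < a) a = T[i+1][j];
    if (j > 0     && flag[i][j-1] == ACCEPTED && T[i][j-1] < b) b = T[i][j-1];
    if (j < N - 1 && flag[i][j+1] == ACCEPTED && T[i][j+1] < b) b = T[i][j+1];
    if (fabs(a - b) >= h)                       /* one-sided update */
        return (a < b ? a : b) + h;
    return 0.5 * (a + b + sqrt(2.0 * h * h - (a - b) * (a - b)));
}

int main(void)
{
    const double h = 1.0 / (N - 1);
    const int di[4] = { -1, 1, 0, 0 }, dj[4] = { 0, 0, -1, 1 };
    int i, j, n, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) { T[i][j] = INF; flag[i][j] = FARAWAY; }
    T[N/2][N/2] = 0.0;                          /* seed: T = 0 on the front */
    flag[N/2][N/2] = NARROW;

    for (n = 0; n < N * N; n++) {
        int ai = -1, aj = -1;
        double tmin = INF;
        for (i = 0; i < N; i++)                 /* smallest narrow-band T    */
            for (j = 0; j < N; j++)             /* (the thesis uses a sorted */
                if (flag[i][j] == NARROW && T[i][j] < tmin) { /* list here)  */
                    tmin = T[i][j]; ai = i; aj = j;
                }
        if (ai < 0) break;                      /* narrow band empty: done   */
        flag[ai][aj] = ACCEPTED;                /* accept the minimum        */
        for (k = 0; k < 4; k++) {               /* update its four neighbors */
            int ni = ai + di[k], nj = aj + dj[k];
            if (ni < 0 || ni >= N || nj < 0 || nj >= N) continue;
            if (flag[ni][nj] == ACCEPTED) continue;
            double t = local_solve(ni, nj, h);
            if (t < T[ni][nj]) T[ni][nj] = t;   /* monotone update */
            flag[ni][nj] = NARROW;
        }
    }
    printf("T(0,0) = %f, exact distance = %f\n", T[0][0], (N/2) * h * sqrt(2.0));
    return 0;
}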
also that H = H(x, p) is convex with respect to the variable p on R^n for each x ∈ Ω̄ and satisfies (1.17) and the following conditions:

\[ \exists\, \phi \in C(\bar\Omega) \cap C^1(\Omega) \ \text{such that} \ \phi \le u_1 \ \text{in} \ \Omega, \ \text{and} \ \sup_{x \in K} H(x, \nabla\phi(x)) < 0 \ \text{for all compact } K \subset \Omega. \tag{1.18} \]

Then u₁ ≤ u₂ in Ω̄.

This theorem can be applied to the Eikonal equation whenever F(x) is Lipschitz in Ω̄ and strictly positive [6]. Conditions (1.18) are satisfied by taking φ constant with φ ≤ min over ∂Ω of u₁, since then ∇φ = 0 and H(x, ∇φ(x)) = −1 < 0.

1.1.3 Application to the Eikonal Equation

Proposition 1.15. Let Ω be a bounded open subset of R^n. Set u(x) = distance(x, ∂Ω), x ∈ Ω̄. Then u is Lipschitz continuous and is a solution of the Eikonal equation

\[ |Du| = 1 \ \text{in} \ \Omega, \qquad u = 0 \ \text{on} \ \partial\Omega. \]

This means that for each v ∈ C¹(Ω), if u − v has a maximum (minimum) at a point x₀ ∈ Ω, then |Dv(x₀)| ≤ 1 (|Dv(x₀)| ≥ 1).

In the initial model problem (1.5), we can obtain the Eikonal equation by considering the Hamiltonian H(x, Du) = F(x)|Du| − 1. For a general set Ω we have the following corollary:

Corollary 1.16. If F(x) > 0 for all x ∈ Ω ⊂ R^n, then the equation

\[ \begin{cases} F(x)\, |\nabla T(x)| = 1, & x \in \Omega \subset \mathbb{R}^n, \\ T = 0, & x \in \Gamma, \end{cases} \tag{1.19} \]

admits a unique viscosity solution T.

Remark 1.17. In [9] it is proved that if F is Lipschitz and bounded in R^n, then the unique viscosity solution T is locally Lipschitz in Ω.

For the front propagation problems, we want to have an Eulerian formulation for the motion of the initial curve γ ⊂ Ω under the influence of the normal velocity…
between modules are presented in Figure 4.29. We start by defining the structure and the double-chained list, and after that we continue with the flags assignment rules. Next in the hierarchical pyramid is the algorithms module, the part responsible for all the Fast Marching related procedures. The last module is the display module, which manages the reading/writing of the image in .pgm format, the writing of the result files for graphical analysis, and the display of information in between iterations. There is also the string conversion module, responsible for creating the names of the simulation files. The main function assembles all the modules together and establishes their order of execution.

FIGURE 4.29. Modules Hierarchy.

In the following algorithms we present the headers of all the modules considered in the numerical implementation.

Algorithm 22: Main Function

#include "mpi.h"
extern "C" {
#include <pgm.h>
}
#include "Stringconversion.h"
#include "DisplayMat1.h"

static char help[] = "Read the pgm file and do the FM algorithm";
int stages = 19;
int main(int argc, char **argv);

Algorithm 23: Structure Module Header

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "petsc.h"
#include "petscvec.h"
#include "petscmat.h"
#include "petscda.h"
#include "petscsys.h"

typedef struct {
    int Nx, Ny, rank;
    int xs, ys, xm, ym, ghostx, ghosty,…
computation [19]. One can scatter the local patches into the distributed vector with the command

DALocalToGlobal(DA da, Vec l, InsertMode mode, Vec g);

Note that this function is not subdivided into beginning and ending phases, since it is purely local. A third type of distributed array scatter is from a local vector (including ghost points that contain irrelevant values) to a local vector with correct ghost point values. This scatter may be done by the commands

DALocalToLocalBegin(DA da, Vec l1, InsertMode iora, Vec l2);
DALocalToLocalEnd(DA da, Vec l1, InsertMode iora, Vec l2);

In our applications there are cases where we need to use the local (ghosted) vectors. PETSc provides a way to obtain these work vectors and return them when they are no longer needed. This is done with the routines

DAGetLocalVector(DA da, Vec *l);
/* use the local vector l */
DARestoreLocalVector(DA da, Vec *l);

Also, we can set values into the DA vectors and access them using the natural grid indexing by calling the routines

DAVecGetArray(DA da, Vec l, void *array);
/* use the array, indexing it with 1, 2 or 3 dimensions
   depending on the dimension of the DA */
DAVecRestoreArray(DA da, Vec l, void *array);

where array is a multidimensional C array with the same dimension as the distributed array. The vector l can be either a global vector or a local vector. The array is accessed using the usual global indexing on the entire grid, but the user may only refer to the local and ghost entries of this array.
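The typical usage pattern of these routines is sketched below in C (PETSc 2.x-era API, as quoted above; DAGetCorners is the standard companion routine returning the owned index ranges, and error checking is omitted). This is an illustrative sketch, not the thesis's code.

#include "petscda.h"

/* Refresh the ghost points of the value array before recomputing the
 * sub-domain boundary: scatter global -> local, then access with global
 * (j, i) indexing. */
void RefreshGhosts(DA da, Vec gT)
{
    Vec          lT;
    PetscScalar  **t;
    PetscInt     i, j, xs, ys, xm, ym;

    DAGetLocalVector(da, &lT);                        /* work vector w/ ghosts */
    DAGlobalToLocalBegin(da, gT, INSERT_VALUES, lT);  /* start the scatter     */
    DAGlobalToLocalEnd(da, gT, INSERT_VALUES, lT);    /* finish the scatter    */

    DAVecGetArray(da, lT, &t);
    DAGetCorners(da, &xs, &ys, PETSC_NULL, &xm, &ym, PETSC_NULL);
    for (j = ys; j < ys + ym; j++)
        for (i = xs; i < xs + xm; i++) {
            /* t[j][i] and its four neighbors, including ghost points,
             * may be read here to recompute the boundary values */
        }
    DAVecRestoreArray(da, lT, &t);
    DARestoreLocalVector(da, &lT);
}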
geom, Vec L, Vec L_F, int xs, int ys, int xm, int ym);
void UpdateNarrowBand(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail);

Algorithm 26: Algorithms Module Header

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "petsc.h"
#include "petscvec.h"
#include "petscmat.h"
#include "petscda.h"
#include "petscsys.h"
#include "TestFlag_Functions.h"

PetscReal mini(PetscReal a, PetscReal b);
PetscReal maxi(PetscReal a, PetscReal b);
PetscReal pitagora(PetscReal a, PetscReal b);
PetscReal distance(int i, int j, PetscReal val, GeomStruct geom);
PetscReal distanceNB(int i, int j, PetscReal val, GeomStruct geom);
void UpdateNBDist(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail);
void UpdateDistNbr(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail, ListNode crt);
void FastMarching(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail, int numprocs);
PetscReal GhostPtsH(GeomStruct geom, PetscReal gh, Vec L, Vec L_F, int a, int b, int d);
PetscReal GhostPtsV(GeomStruct geom, PetscReal gv, Vec L, Vec L_F, int a, int b, int d);
PetscReal UpdateGhostPtsH1(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail, PetscReal gh, int a, int b, int c, int d);
PetscReal UpdateGhostPtsH2(GeomStruct geom, Vec L, Vec L_F, ListNode head, ListNode tail, PetscReal gh, int a, int b, int c, int d);
    ghostxw, ghostyw;
    struct neigh { int left, right, top, bottom; } Neighbor;
    DA Val_DA;
    DA Flg_DA;
} GeomStruct;

void Init_Geom(GeomStruct geom);
void Do_Structure(GeomStruct geom);

Algorithm 24: List Module Header

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "petsc.h"

#define S 1000.00

typedef struct ListNode {
    struct position { int row; int col; } entry;
    struct ListNode *next;
    struct ListNode *previous;
} ListNode;

struct ListNode *MakeListNode(int j, int i);
void RemoveNode(ListNode current);
void InsertAfter(ListNode newnode, ListNode current);
void InsertBefore(ListNode newnode, ListNode current);
struct ListNode *FindPointer(int i, int j, ListNode head, ListNode tail);
void UpdateList(ListNode p, PetscReal val, ListNode head, ListNode tail);
void Print_List(ListNode first, ListNode last);
void Print_List_val(PetscReal val, ListNode first, ListNode last, PetscReal F);
struct ListNode *FirstNarrowBand(ListNode first, ListNode last, PetscReal F, PetscReal val);
void Do_List(int x, int y, int xw, int yw, PetscReal val, ListNode head, ListNode tail);

Algorithm 25: Flag Module Header

#include <stdio.h>
#include "petsc.h"
#include "TestStruct_Functions.h"
#include "TestList_Functions1.h"

void countNB(GeomStruct geom, Vec L_F, Vec L, ListNode head, ListNode tail, ListNode crt);
void UpdateFlag(GeomStruct…
have any node added to the narrow band, then we set the minimum of the recomputed T on the narrow band to S. Similarly, the maximum of the recomputed T on the narrow band is set to −1.0.

Algorithm 19: Boundary recomputation based on ghost points: UpdatedGhostH(geom, c, T, Fl, source, head, tail)
Input: geom, c, T, Fl, source, head, tail.
Output: recomputed narrow band values.
if Fl[i][c] ≠ 0 then    (the node is not accepted)
    if source[i] ≤ T[i][c] then    (the ghost value can improve T[i][c])
        if Fl[i][c] = 2 then
            Fl[i][c] ← 1    (far away becomes narrow band)
        prt ← FindPointer(i, c, head, tail)
        T[i][c] ← distanceNB(i, c, T, geom)
        UpdateList(prt, T[i][c], head, tail)
if no narrow band point then minmax ← −1 else minmax ← T[i][c]
return minmax

4.4.2 Fast Sweeping Strategy
We want to apply the Fast Sweeping algorithm to update only the boundary of each sub-domain. Following Remark 4.2, part 1, we can update the boundary at each edge separately. Since the boundary is just a segment (i.e., a row or a column), the 2D domain is reduced to a 1D domain. Thus only two sweeps are necessary to compute the right values. In general we need to do:

- 1st sweep: for i = 0 to n, compute T at the boundary using the ghost nodes and save the minimum between the current value T^n_k(X_ij) and the recomputed value T̃(X_ij);
- 2nd sweep: for i = n to 0, compute T at the boundary using the ghost nodes and save the minimum between T^n_k(X_ij) and T̃(X_ij);
- save the minimum between the T obtained in the 1st and 2nd sweeps.

The pseudo-code of the algorithm…
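A minimal C sketch of the two sweeps described above, applied to one boundary column of length n with speed F = 1: T[i] is the current boundary value, ghost[i] the value of the ghost neighbor of boundary node i, and h the mesh size. The names are illustrative, not the thesis's identifiers.

#include <math.h>

static double quad2(double a, double b, double h)
{
    if (fabs(a - b) >= h)            /* values too far apart: one-sided */
        return fmin(a, b) + h;
    return 0.5 * (a + b + sqrt(2.0 * h * h - (a - b) * (a - b)));
}

/* Two 1-D sweeps over a boundary segment, keeping the minimum. */
void SweepBoundary(double *T, const double *ghost, int n, double h)
{
    int i;
    for (i = 1; i < n; i++)          /* 1st sweep: i increasing */
        T[i] = fmin(T[i], quad2(T[i - 1], ghost[i], h));
    for (i = n - 2; i >= 0; i--)     /* 2nd sweep: i decreasing */
        T[i] = fmin(T[i], quad2(T[i + 1], ghost[i], h));
}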
is determined only by those neighboring nodes of smaller T.

2.3 Convergence

Theorem 2.4. As the mesh size goes to zero, i.e., Δx → 0 and Δy → 0, the numerical solution built by the Fast Marching Methods converges to the viscosity solution of (1.1).

Proof. The proof is performed in two steps, through the following lemmas:

Lemma 2.5. The solution of the discrete Eikonal equation converges toward the viscosity solution as the mesh sizes go to zero [3].

Lemma 2.6. The sequence built by Fast Marching Methods converges to the solution of the discrete Eikonal equation.

2.3.1 Proof of Lemma 2.5
The proof is borrowed from Barles and Souganidis [3]. Consider an approximation scheme of the form

\[ S(\rho, x, u_\rho) = 0 \quad \text{in } \bar\Omega, \tag{2.1} \]

where S: R₊ × Ω̄ × L^∞(Ω̄) → R is locally bounded, L^∞(Ω̄) is the space of bounded functions defined on Ω̄, and u_ρ is the solution of (2.1). Let u be the viscosity solution of the Hamilton–Jacobi equation H(x, Du) = 0 on Ω with u = φ on ∂Ω. If this scheme is monotone, stable and consistent with the Hamilton–Jacobi equations, it converges to the solution of (1.1). For this we need to recall some results from [3].

Definition 2.7 (Monotonicity). The scheme S is monotone if and only if it satisfies

\[ S(\rho, x, u) \le S(\rho, x, v) \quad \text{if } u \ge v, \qquad \forall \rho \ge 0,\ x \in \bar\Omega,\ u, v \in L^\infty(\bar\Omega). \tag{2.2} \]

The purpose of the scheme S is to approximate the Hamilton–Jacobi equations, and thus the monotonicity condition for S is equivalent…
FIGURE 1.5. Upwind discretization, 2D case.

1.2.2 Approximations of the Eikonal Equation
Crandall and Lions [5] proved that consistent, monotone schemes converge to the correct viscosity solution. Starting from equation (1.1), in order to construct a numerical scheme which guarantees a correct upwind direction and a correct approximation of the viscosity solution, it is necessary to have a good approximation of the derivative. Extending the ideas of upwind approximations for the gradient to multiple dimensions, we have the following schemes:

- Godunov's scheme [16]:
\[ \max\big(D^{-x}_{ij}T,\, -D^{+x}_{ij}T,\, 0\big)^2 + \max\big(D^{-y}_{ij}T,\, -D^{+y}_{ij}T,\, 0\big)^2 = \frac{1}{F_{ij}^2}; \]

- Osher and Sethian's scheme [18]:
\[ \max\big(D^{-x}_{ij}T, 0\big)^2 + \min\big(D^{+x}_{ij}T, 0\big)^2 + \max\big(D^{-y}_{ij}T, 0\big)^2 + \min\big(D^{+y}_{ij}T, 0\big)^2 = \frac{1}{F_{ij}^2}, \]

where the forward and backward operators D^{±x}_{ij}, D^{±y}_{ij} are those defined earlier in equation (1.20) for the x and y directions.

Remark 1.20. Rewriting Godunov's scheme, we obtain the so-called Rouy–Tourin scheme [21]:

\[ \max\big(\max(D^{-x}_{ij}T, 0),\, -\min(D^{+x}_{ij}T, 0)\big)^2 + \max\big(\max(D^{-y}_{ij}T, 0),\, -\min(D^{+y}_{ij}T, 0)\big)^2 = \frac{1}{F_{ij}^2}. \tag{1.22} \]

Recall that the boundary value problem F|∇T| = 1 can be written in the form of a general Hamilton–Jacobi equation H(x, y, D^xT, D^yT) = 0, where the Hamiltonian is given by

\[ H(x, y, D^x T, D^y T) = F\, \sqrt{(D^x T)^2 + (D^y T)^2} - 1. \tag{1.23} \]

To solve numerically the Eikonal equation, we need a correct approximation of the viscosity solution [18, 22, 24]. A smart way to…
maximum values of the boundary nodes. We have to prove that the minimum T over the narrow band cannot be decreased if we do not have ghost node influence.

Proposition 3.9. For all sub-domains Ω_k and Ω_l, consider a node X_ij ∈ ∂Ω_k with value T^n_k(X_ij) and a node X_ab ∈ ∂Ω_l with value T^n_l(X_ab), such that X_ab ∈ G(X_ij). Let T̃^{n+1/3}_k(X_ij) be the solution of the local Eikonal equation at X_ij in Ω̃_k. Then T̃^{n+1/3}_k(X_ij) can be smaller than T^n_k(X_ij) only if it is computed using the accepted ghost node X_ab; otherwise T̃^{n+1/3}_k(X_ij) ≥ T^n_k(X_ij).

Proof. As we saw in the previous sections, in all the computations a lower T for a node can be obtained only by using a stencil that contains nodes already accepted in the previous iterations.

FIGURE 3.5. Ghost zone influence on sub-domains (ghost zone parts 1 and 2).

For illustration purposes, suppose that the sub-domain Ω_l is on the right of Ω_k, i.e., Ω_l = Ω_{k+1}, as drawn in Figure 3.5. Up to a symmetry or rotation in the domain, all possible configurations of sub-domains can be reduced to this case. Depending on the position of X_ij on the grid, we have the cases:

- if X_ij is a corner node, then it has two ghost nodes and one local Eikonal equation to solve in sub-domain Ω̃_k;
- if X_ij is located at the edge of Ω_k but it is not a corner node, then it has one ghost node and two local Eikonal equations to solve in sub-domain Ω̃_k.

We need to solve the local Eikonal equations on the extended domain Ω̃_k and take the minimum over all of them…
    d1 ← 1 + min(min(T[i−1][j], T[i+1][j]), min(T[i][j−1], T[i][j+1]))
    d2 ← Quadratic(T[i][j−1], T[i−1][j])
    d3 ← Quadratic(T[i−1][j], T[i][j+1])
    d4 ← Quadratic(T[i][j+1], T[i+1][j])
    d5 ← Quadratic(T[i+1][j], T[i][j−1])
    T[i][j] ← min(val[i][j], min(d1, min(min(d2, d3), min(d4, d5))))
return T[i][j]

Flag Assignment
To assign the flags, we check if there is any accepted node as neighbor. If there is, then our node is transformed from a far away to a narrow band node. Otherwise, we leave it as a far away node. The function that assigns the flags to the nodes is called UpdateFlag, and it is presented in Algorithm 8 for a generic grid point X_ij.

Algorithm 8: Flag update: UpdateFlag(N, M, Fl)
Input: N, M, Fl.
Output: updated flag list Fl.
for i = 0 to N do
    for j = 0 to M do
        if none of the neighbors of point (i, j) is accepted then
            Fl[i][j] ← 2
        else if Fl[i][j] ≠ 0 then    (point is not accepted)
            Fl[i][j] ← 1
        end
    end
end
return Fl

Sorting Algorithm
When node X_ij becomes narrow band or accepted, its T is computed based on its neighbors. Therefore, we need to delete from the list the element with the old T(X_ij) and insert at the right position the element with the new T(X_ij). The algorithm for removing a node from the double-chained list is given in pseudo-code in Algorithm 9, and it is illustrated graphically in Figure 4.2. When removing node current from the list, the list connectivity needs to be preserved: therefore, first we remove the forward link and then the backward link.
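In C, the two list operations reduce to the pointer surgery below. This is a simplified sketch: the actual List module nodes also store the (row, col) position, the value and a flag, and HEAD/TAIL sentinels are assumed so that the previous/next pointers of an interior node are never NULL.

typedef struct ListNode {
    double value;
    struct ListNode *next, *previous;
} ListNode;

/* Algorithm 9: unlink `current' while preserving connectivity,
 * first the forward link, then the backward link. */
void RemoveNode(ListNode *current)
{
    current->previous->next = current->next;
    current->next->previous = current->previous;
}

/* Algorithm 11: insert `newnode' immediately before `current'. */
void InsertBefore(ListNode *newnode, ListNode *current)
{
    newnode->previous = current->previous;
    newnode->next = current;
    current->previous->next = newnode;
    current->previous = newnode;
}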
nodes along the x and y axes for each cell; the dimension of lx is m and the dimension of ly is n. Rather than defining each argument, PETSC_NULL may be passed to the function, allowing PETSc to pick the appropriate values by itself. Two types of distributed array communication data structures can be created, as specified by st: DA_STENCIL_STAR and DA_STENCIL_BOX. The type of stencil used is DA_STENCIL_STAR, which is the standard 5-point stencil with width 1. We notice that this type of stencil will save us communication time, since some of the ghost nodes are ignored.

We emphasize that a distributed array provides the information needed to communicate the ghost value information between processes. At certain stages of the application, there is a need to work on a local portion of the vector, including the ghost points. For more information on how to make the connection between local and global vectors, check Appendix B.

In the parallel Fast Marching algorithm we consider two distributed arrays: one for the values T_ij and one for the flags Fl_ij, for all 0 ≤ i ≤ N and 0 ≤ j ≤ M. The structure used in the parallel implementation has the following format (for clarity, consider the graphical representation in Figure 4.16):

typedef struct {
    int Nx, Ny;
    int xs, ys, xm, ym, ghostx, ghosty, ghostxw, ghostyw;
    int rank;
    struct neigh { int left, right, top, bottom; } Neighbor;
    DA Val_DA;
    DA Flg_DA;
} GeomStruct;
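A sketch of the corresponding DA creation call follows (PETSc 2.x-era signature, as used at the time of the thesis): a DA_STENCIL_STAR stencil of width 1 over an Nx × Ny grid, with PETSC_DECIDE and PETSC_NULL letting PETSc choose the process layout and the per-process sizes. The function name is ours.

#include "petscda.h"

DA CreateGrid(PetscInt Nx, PetscInt Ny)
{
    DA da;
    DACreate2d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
               Nx, Ny,                       /* global grid dimensions  */
               PETSC_DECIDE, PETSC_DECIDE,   /* processes per direction */
               1,                            /* dof: one value per node */
               1,                            /* stencil width s = 1     */
               PETSC_NULL, PETSC_NULL,       /* lx, ly: chosen by PETSc */
               &da);
    return da;
}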
1.1 Curve propagation with speed F in normal direction
1.2 Curve propagating with …
1.3 Weak solutions of |u′| = 1, u(−1) = u(1) = 0
1.4 Solution of Equation (1.4) for ε = 1/5, 1/10, 1/20, 1/40, 1/100
1.5 Upwind discretization, 2D case
1.6 Discretization for the 1-D case
1.7 Grid points for Rouy–Tourin algorithm
1.8 Rouy–Tourin iterations
1.9 Sweeping in 2 directions
2.1 Dijkstra algorithm
2.2 Far away, narrow band and accepted nodes
2.3 Update procedure for Fast Marching Method
2.4 Matrix of neighboring nodes of X_ij
2.5 Node A has only one accepted neighbor
2.6 Node X has the neighbors X_{i−1} and X_{i+1} accepted
2.7 Node X has the neighbors X_{i−1} and X_{i,j−1} accepted
3.1 Domain decomposition in sub-domains
3.2 Ghost points for a particular process
3.3 Neighbors of X_ij
3.4 Ghost points update
3.5 Ghost zone influence on sub-domains
3.6 Node Z is in the same sub-domain as X_ij
3.7 Node Z is a ghost node of X_ij
3.8 Node Z is in the same sub-domain as X_ij
the uniqueness result does not hold. It can be proved that in this case many viscosity solutions, or even classical solutions, may exist.

Theorem 1.11. Assume u₁, u₂ ∈ C(Ω̄), where Ω ⊂ R^n is open and bounded, are a viscosity sub-solution and a super-solution of H(x, ∇u(x)) = 0, x ∈ Ω, and u₁ ≤ u₂ on ∂Ω. Assume also that H satisfies the conditions of Lipschitz continuity

\[ |H(x, p) - H(x, q)| \le C |p - q| \quad \text{and} \quad |H(x, p) - H(y, p)| \le C |x - y| (1 + |p|) \tag{1.17} \]

for x, y ∈ Ω, p, q ∈ R^n and some constant C > 0. Then u₁ ≤ u₂ in Ω̄.

Remark 1.12. Assuming that the Hamiltonian H satisfies the conditions of Lipschitz continuity, Evans proves that there exists at most one viscosity solution of (1.5) [8, Theorem 1, pg. 547]. We need to assume that H is convex with respect to the variable p in order to apply this theorem to more general cases. This assumption is the key point in many theoretical results.

Proposition 1.13 (Stability Property). Let (u_n)_{n∈N} ⊂ C(Ω̄) be a sequence of functions such that u_n is the viscosity solution of the problem

\[ H_n(x, \nabla u_n(x)) = 0, \quad x \in \Omega, \ n \in \mathbb{N}. \]

Assume that u_n converges locally uniformly to u and H_n converges locally uniformly to H. Then u is the viscosity solution of

\[ H(x, \nabla u) = 0, \quad x \in \Omega. \]

Theorem 1.14. Let Ω be a bounded and open subset of R^n. Assume that u₁, u₂ ∈ C(Ω̄) are the viscosity sub-solution and super-solution of equation (1.5), with u₁ ≤ u₂ on ∂Ω. Assume…
this equation I use the Fast Marching Method, a computational technique for tracking moving interfaces and modeling the evolution of boundaries. Fast Marching Methods are typically sequential and hence not straightforward to parallelize [18, 23]. The Fast Marching algorithm is a necessary step in Level Set Methods, which in today's scientific computing world are widely used for simulating front-motion-related processes. Therefore, since they have a great impact on many applications, it is imperative to parallelize Fast Marching Methods to try to improve even more the execution time and solution accuracy.

The idea of the parallel Fast Marching algorithm is to perform Fast Marching on sub-domains, update the boundary values at the interfaces, and restart the algorithm until convergence is achieved. To update the boundary values at the interfaces, we need to preserve the upwind structure of the numerical scheme and to synchronize the ghost zones at each iteration. Therefore, the efficiency of the parallel Fast Marching implementation depends on the required amount of communication between sub-domains and on the algorithm's ability to preserve the upwind structure of the numerical scheme during execution. To address these problems, I develop several strategies, among which the following are the most viable:

1. Ordered Overlap strategy: group the overlapping nodes of each sub-domain in a sorted list and use this list to recompute the boundary values;
2. Fast Sweeping strategy: perform a Fast Sweeping [28] at the interfaces and update the overlapping regions of the sub-domains.
values at the interfaces, and restart the algorithm until convergence is achieved. To update the boundary values at the interfaces, we need to preserve the upwind structure of the numerical scheme and to synchronize the ghost zones at each iteration. The parallel Fast Marching algorithm performs iteratively the steps presented in Algorithm 4 until convergence is achieved.

Algorithm 4: Parallel Fast Marching algorithm
repeat
    Compute T^n_k(X_ij), ∀X_ij ∈ Ω_k, k = 0,…,N_s, using Fast Marching on the sub-domains (local FM).
    Ghost points synchronization: exchange and update the boundary data at the interfaces.
until convergence conditions are satisfied

3.3.1 Ghost Points
The ghost zones synchronization is not trivial, since it should preserve the upwind nature of the discretization scheme. The steps of the ghost zones synchronization procedure are presented in Algorithm 5.

Algorithm 5: Ghost zones synchronization procedure
begin
    1. For all X_ij ∈ ∂Ω_k, compute T̃^{n+1/3}_k(X_ij) as the solution of the local Eikonal equation at X_ij ∈ Ω̃_k.
    2. Reassign the flags.
    3. Save the minimum of all T̃^{n+1/3}_k(X_ij) for X_ij ∈ ∂Ω_k and use it to update the flags in Ω_k.
end

Remark 3.7. To avoid confusion, in the parallel implementation, when we refer to the value of a certain point we need to specify its sub-domain and also the iteration during which it was computed. Therefore, let T^n_k(X_ij) be the T at node X_ij obtained during the n-th iteration on sub-domain Ω_k.
4.24 L Error, Ordered Overlap Strategy
4.25 L Error, Fast Sweeping Strategy
4.26 FM execution time, Ordered Overlap Strategy
4.27 FM execution time, Fast Sweeping Strategy
4.28 Best case scenario Fast Marching execution time
4.29 Modules Hierarchy

List of Algorithms
1 The Rouy–Tourin algorithm
2 Fast Sweeping Algorithm
3 Fast Marching algorithm
4 Parallel Fast Marching algorithm
5 Ghost zones synchronization procedure
6 Solution of the quadratic equation
7 Solution of the local Eikonal equation at X_ij
8 Flag update
9 Delete current node from the list
10 Insert node X after current node
11 Insert node before current node
12 UpdateList
13 Basic MPI example
14 …
15 Deadlock example
16 Deadlock solution, version 1
17 Deadlock solution, version 2
18 Ordered Overlap Algorithm
19 Boundary recomputation based on ghost points
3.9 Node Z is a ghost node of X_ij
4.1 Double chained list structure
4.2 Deletion from the double chained list
4.3 Insertion after the current node in a double chained list
4.4 Insertion before the current node in a double chained list
4.5 …
4.6 Sequential Fast Marching Algorithm Complexity
4.7 …
4.8 Solution's shape for one processor, one starting node
4.9 Solution's shape for one processor, two starting nodes
4.10 Shared memory architecture
4.11 Distributed memory architecture
4.12 Hybrid shared-distributed memory architecture
4.13 MPI communication between 2 processors
4.14 Global and Local Vector
4.15 Natural and PETSc ordering for a Distributed Array
4.16 Domain structure with the points and ghost points limits
4.17 Possible situation in ordered overlap strategy
4.18 Back propagation of the error
4.19 Solution's shape for multiple processors case
4.20 Test cases
4.21 Time complexity for 50 CPUs, Ordered Overlap Strategy
4.22 Time Complexity for 50 CPUs, Fast Sweeping Strategy
4.23 Time Complexity for both strategies
FAST MARCHING METHODS: PARALLEL IMPLEMENTATION AND ANALYSIS

A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Mathematics

by Maria Cristina Tugurlan
B.Sc., Technical University of Iasi, 1998
M.Sc., Louisiana State University, 2004
December 2008

Acknowledgments
I would like to express my sincere gratitude to my research advisor, Professor Blaise Bourdin, for the suggestion of the topic. This dissertation would not have been possible without his continuous support, help, guidance and endless patience. It is a pleasure to give thanks to all my professors at Louisiana State University, and especially to my committee members, Dr. Jimmie Lawson, Dr. Robert Lipton, Dr. Robert Perlis, Dr. Ambar Sengupta, Dr. Padmanabhan Sundar and Dr. Jing Wang, for all the help and support provided during my graduate studies. I am grateful to the Department of Mathematics at Louisiana State University for the great moral, professional and financial support. I would also like to thank Dr. Robert Perlis for the financial support through the Louisiana Education Quality Support Fund (LEQSF, 2005), which allowed me to enjoy valuable research visits to France and Denmark. Special thanks go to Dr. Grégoire Allaire (École Polytechnique, France), Dr. Antonin Chambolle (École Polytechnique…
Fast Marching Methods converges to the viscosity solution of (1.1) as the mesh size goes to zero, i.e., Δx → 0 and Δy → 0 (see Theorem 2.4). Now we need to prove that the solution computed by the parallel Fast Marching is the same as the solution computed by the sequential Fast Marching.

Theorem 3.11. Let S_ij, i = 0,…,N, j = 0,…,M, be the solution computed using the sequential Fast Marching algorithm, and let P_ij, i = 0,…,N, j = 0,…,M, be the solution computed using the parallel Fast Marching algorithm. Then, for all i and j, P_ij = S_ij.

Proof. Similarly to the sequential case, we need to prove stability, consistency and monotonicity of the numerical scheme. Since both algorithms solve the problem based on the same scheme, i.e., (2.12), and since the parallel Fast Marching is based on the sequential Fast Marching, the consistency and stability properties are inherited from the sequential algorithm. The only thing that we need to prove is monotonicity. Precisely, we need to show that we preserve the upwind nature of the numerical scheme in the parallel version of the algorithm. This has already been proved in Theorem 3.10. □

Convergence to the same solution for both sequential and parallel Fast Marching is illustrated in Chapter 4.

Chapter 4. Implementations and Numerical Experiments

4.1 Sequential Implementation
Since the solution of the Fast Marching is constructed by stepping away from the boundary…
Iasi, Romania. From 1998 to 2002 she worked as an instructor at the Gh. Asachi Technical University of Iasi, Department of Automatic Control, Romania. In August 2002 she came to Louisiana State University to pursue graduate studies in mathematics. She earned a Master of Science degree in mathematics from Louisiana State University in May 2004. She is currently a candidate for the degree of Doctor of Philosophy in mathematics, which will be awarded in December 2008.
[9] …, J. Appl. Math., Vol. 54, No. 5 (1994), pp. 1335–1354.
[10] M. Falcone, The Minimum Time Problem and Its Applications to Front Propagation, in A. Visintin and G. Buttazzo (eds.), Motion by Mean Curvature and Related Topics, De Gruyter Verlag, Berlin, 1994, pp. 70–88.
[11] M. Falcone, R. Ferretti, Discrete time high-order schemes for viscosity solutions of Hamilton–Jacobi–Bellman equations, Numer. Math. 67 (1994), pp. 315–344.
[12] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd edition, Morgan Kaufmann, 2002, p. 713.
[13] H. Ishii, Hamilton–Jacobi Equations with Discontinuous Hamiltonians on Arbitrary Open Subsets, Bulletin of Faculty of Science and Engineering, Chuo Univ. 28 (1985), pp. 33–77.
38. PGM FILE file PetscReal val int xs int xm int ys int ym void Scale_for_PGM PetscReal val int xs int xm int ys int ym PetscReal vmaxi void SaveFlag FILE file PetscReal F PetscReal val int xs int xm int ys int ym void SaveList FILE file PetscReal val PetscReal F ListNode head ListN ode tail int Write Geom Struct_ASCII int Bounds char F_Namel int Write_2D_Scal_Struct_ASCIH PetscReal v_in char F_Name int NbRec int Write 2D_Vect_Struct_ASCIT PetscReal v_in char F_Namel int NbRec int WriteCase char fname char geoname char resname char vector int j Algorithm 28 String Conversion Module Header include lt stdio h gt include lt stdlib h gt include lt stdarg h gt include lt string h gt include petsc h include petscsys h char itoa int value void conversion int NumProcs int MyRank char filename 128 char fileprefix 4 int i void conversion_pgm int NumProcs int MyRank char filename 128 char filepre fix 4 117 Appendix B PETSc Features At certain stages of the applications there is a need to work on a local portion of the vector including the ghost points This may be done by scattering a global vector into its local parts by using the two stage commands DAGlobalToLocalBegin DA da Vec g InsertMode iora Vec 1 DAGlobalToLocalEnd DA da Vec g InsertMode iora Vec 1 which allow the overlap of communication and
39. PetscReal UpdateGhostPtsV1 GeomStruct geom Vec L Vec L_F ListNode head ListNode tail PetscReal gv int a int b int c int d PetscReal UpdateGhostPtsV2 GeomStruct geom Vec L Vec L_F ListNode head ListNode tail PetscReal gv int a int b int c int d void UpdateGhostPtsFlg GeomStruct geom Vec L Vec G_F Vec L_F ListNode head ListNode tail PetscReal UpdatedPtsMaxV GeomStruct geom Vec L Vec L_F PetscReal max int a int b int c PetscReal UpdatedPtsMaxH GeomStruct geom Vec L Vec L_F PetscReal max int a int b int c PetscReal UpdatedPtsMinV GeomStruct geom Vec L Vec L_F PetscReal min int a int b int c PetscReal UpdatedPtsMinH GeomStruct geom Vec L Vec L_F PetscReal min int a int b int c void UpdatedPtsFlg GeomStruct geom Vec L Vec L_F ListNode head ListN ode tail PetscReal min PetscReal max int a int b int c int d 116 Algorithm 27 Display Module Header include lt stdio h gt include petsc h include TestAlg_Functions1 h Vec CreateNaturalOrdering Vec G GeomStruct geom void OutputPGM FILE file Vec G GeomStruct geom void OutputPGM_f FILE file Vec G Vec G_F GeomStruct geom PetscReal Gradient_error FILE file Vec G GeomStruct geom void DisplayMat PetscReal val int xs int xm int ys int ym PetscReal F void SaveMat FILE fname PetscReal val int xs int xm int ys int ym PetscReal F void Save
40. RE 3 2 Ghost points for a particular process In the following sections we address the above issues and the way that we chose to implement the parallel Fast Marching algorithm in more details 3 2 Notations We partition our initial domain Q in N sub domains In order to keep the no tation simple we commit the following abuse of notation by 2 we denote both 51 the domain and the discretized domain i e 0 1 x 0 1 and 0 1 N x A MY We use k 1 lt k lt N to denote the sub domain index and by 4 we refer to the sub domain k Definition 3 1 For a sub domain Qy we denote its boundary by 00 The sub domain Ny together with its ghost points form the so called extended sub domain denoted by Qe Any point in the domain is denoted by X where 0 lt i lt N andO lt j lt M are the row and column indexes When decomposing the domain in sub domains it is necessary to distinguish between neighbors in the sub domain and neighbors that are ghost points Definition 3 2 Consider a sub domain Q and a node Xij Qg we define e the set of neighbors of Xij in y as DAs Y Aisa gs Pi gs Aegan Keg NO e the set of ghost neighbors of X as G Xi 5 Y Xij Xizi Ago Xi Definition 3 3 The ghost zone of a sub domain contains all the ghost nodes of that sub domain and it is denoted G Q k 1 N where N is the number of sub domains Definition 3 4 For any node Xi of the grid
41. Software Tools in Scientic Computing Birkhauser Press 1997 pp 163 202 P V Roy S Haridi Concepts Techniques and Models of Computer Program ming MIT Press 2004 pg 1 111 345 405 707 749 E Rouy A Tourin A Viscosity Solutions Approach to Shape from Shading SIAM J Numer Analysis 29 1992 pp 867 884 J A Sethian A Fast Marching Level Set Method for Monotonically Advancing Fronts Proc Natl Acad Sci Vol 93 1996 pp 1591 1595 J A Sethian Level Set Methods and Fast Marching Methods Evolving In terfaces in Computational Geometry Fluid Mechanics Computer Vision and Material Science Cambridge University Press Cambridge UK 1998 J A Sethian Fast Marching Methods SIAM Review Vol 41 No 2 1999 pp 199 235 M Snir S Otto S H Lederman D Walker J Dongarra MPI The Complete Reference Second Edition MIT Press Cambridge UK 1998 J Van Trier and W W Symes Upwind Finite Difference Calculation of Travel Times Geophysics 56 1991 pp 812 821 J Tsitsiklis Efficient Algorithms for Globally Optimal Trajectories IEEE Trans on Automatic Control Vol 40 No 9 1995 pp 1528 1538 H K Zhao Fast Sweeping Method for Eikonal Equations Math Comp 74 2005 pp 603 627 H K Zhao Parallel implementations of the Fast Sweeping Method Journal of Computatonal Mathematics Vol 25 No 4 2007 pp 421 429 112 Appendix A Modules Hierarchy The hierarchy and dependences
…PETSc-related topics in the following sections.

4.4 Parallel Implementation
In many applications we need both global and local representations of a vector. The difference between global and local vectors is illustrated graphically in Figure 4.14:

- in the global vector, each processor stores a local set of vertices, and each vertex is owned by exactly one processor (see Figure 4.14(a));
- in the local vector, each processor stores a local set of vertices as well as ghost nodes from neighboring processors (Figure 4.14(b)).

FIGURE 4.14. Global and local vectors: (a) global vector; (b) local vector.

PETSc provides routines to help map indices from a local numbering scheme to the PETSc global numbering scheme. We need this kind of mapping to read the initial mesh and write the final solution image. In the PETSc User Manual, the orderings for a two-dimensional distributed array divided among four processors are presented as in Figure 4.15 [19].

Distributed arrays (DAs) are intended for use with logically regular rectangular grids, when communication of nonlocal data is needed before certain local computations can occur [19]. The PETSc DA object manages the parallel communication required while working with data stored in regular arrays. The actual data is stored in appropriately sized vector objects; the DA object only contains the parallel data layout information and communication information, however…
…and ghost entries of this array, as all other entries are undefined. Another important feature of PETSc is the way that it allows us to perform operations with vectors and distributed arrays. For instance, for the stopping criteria implementation, we check if there are any changes in the boundary of each sub-domain by calling the PETSc function

VecAXPY(Vec destination, PetscScalar a, Vec source);

which performs the mathematical operation destination = destination + a × source. Then we compute the norm of the error at each edge by calling

VecNorm(Vec source, NORM_INFINITY, &error_t);

We can also use the MPI functions to go from the local to the global representation of a vector and vice versa. For instance, for each processor i we compute a total error, which represents the maximum of the left, right, top and bottom errors, and using the MPI_Allreduce function we can find the total error:

errori = maxi(maxi(error_b, error_t), maxi(error_l, error_r));
MPI_Allreduce(&errori, &error, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

In this way we use the MPI function to make the global error available on every process.

Vita
Maria Cristina Tugurlan was born in August 1974 in Iasi, Romania. She finished her undergraduate studies…
FIGURE 4.21. Time complexity for 50 CPUs, Ordered Overlap Strategy: (a) worst case; (b) normal case (Fast Marching execution time versus N, with O(N) and O(N²) reference lines).

For the Fast Sweeping strategy, we repeat the simulations on the same grid sizes for 50 CPUs, and the results are presented in Figure 4.22. For the worst case scenario (see Figure 4.22(a)), the time complexity of the Fast Marching algorithm in the Fast Sweeping strategy is almost O(N²). For the best case scenario, it is between O(N) and O(N²) (see Figure 4.22(b)).

FIGURE 4.22. Time complexity for 50 CPUs, Fast Sweeping Strategy: (a) worst case; (b) normal case.

Comparing the Ordered Overlap strategy and Fast…
By choosing ε < 0, one would obtain an approximation of u(x) = |x| − 1, which smooths the solution at its relative minimum point. As ε → 0, u_ε converges to the viscosity solution of (1.3), which is introduced in more detail in the following section.

FIGURE 1.4. Solution of Equation (1.4) for ε = 1/5, 1/10, 1/20, 1/40, 1/100.

1.1.2 The General Case
Let Ω be an open bounded domain in R^n, Γ ⊂ ∂Ω, and consider the general static Hamilton–Jacobi equation

\[ \begin{cases} H(x, Du) = 0 & \text{on } \Omega, \\ u = \phi & \text{on } \Gamma \subset \partial\Omega, \end{cases} \tag{1.5} \]

where H is convex and u, φ are continuous. As Crandall, Evans and Lions pointed out in [4] and [5], problem (1.5) does not have, in general, a C¹ solution; it admits generalized (weak) solutions. Hence we can approximate the problem by adding a viscosity term to (1.5):

\[ \begin{cases} H(x, Du_\varepsilon) - \varepsilon\, \Delta u_\varepsilon = 0 & \text{on } \Omega, \\ u_\varepsilon = \phi & \text{on } \Gamma \subset \partial\Omega, \end{cases} \tag{1.6} \]

for ε > 0. Problem (1.6) admits a smooth solution u_ε [4, 5, 8]. We want to prove that as ε → 0 the solution u_ε(x) of (1.6) converges locally uniformly to a weak solution u of (1.5). In order to do that, we follow Evans's arguments presented in [8]. This method is known as the method of vanishing viscosity. When ε → 0, we can find a family of functions {u_ε}_{ε>0} which is uniformly bounded and equicontinuous on compact subsets of Ω ⊂ R^n. Applying the Arzelà–Ascoli compactness criterion, there exists a subsequence {u_{ε_j}} ⊂ {u_ε} such that

\[ u_{\varepsilon_j} \to u \quad \text{locally uniformly in } \Omega \subset \mathbb{R}^n. \tag{1.7} \]
46. band. We use $T^{n+1/3}_{\min}$ to update the flags of the whole sub-domain, i.e., for all $X_{ij} \in \Omega_k$:

- if $T^{n+1/3}(X_{ij}) < T^{n+1/3}_{\min}$ then $X_{ij} \in A(\Omega_k)$ (accepted);
- if $T^{n+1/3}(X_{ij}) = T^{n+1/3}_{\min}$ then $X_{ij} \in NB(\Omega_k)$ (narrow band);
- if $T^{n+1/3}(X_{ij}) > T^{n+1/3}_{\min}$ then $X_{ij} \in FA(\Omega_k)$ (far away).

In Figure 3.4(b) the minimum is attained at $T(X)$ for the left sub-domain and at $T(Y)$ for the right sub-domain. The red region represents the accepted nodes, the yellow region represents the narrow band, and the white one the far-away nodes. As presented in Algorithm 4, the ghost-point synchronization procedure is the part where we can apply different strategies in order to restart the Fast Marching algorithm. The purpose of such strategies is to ensure the monotonicity and upwind nature of the scheme even when we have inter-domain communications. The difference between strategies lies in the way the boundary is computed using the ghost points; we describe these strategies in detail in Chapter 4.

3.4 Convergence. First let us remark that

Remark 3.8. For any iteration n the following hold true:
- for any two nodes $X_{ij} \in NB(\Omega_k)$ and $X_{ab} \in A(\Omega_k)$ we have $T(X_{ij}) \ge T(X_{ab})$, for $0 \le i, a \le M$ and $0 \le j, b \le N$;
- if $X_{ij} \in NB(\Omega_k)$ with $T^n(X_{ij}) = \min_{NB(\Omega_k)} T^n$, then at the next iteration $X_{ij} \notin NB(\Omega_k)$ and $X_{ij} \in A(\Omega_k)$: it will be the next node labeled as accepted.

In the ghost-zone synchronization procedure we reconstruct the narrow band of each sub-domain as a function of minimum
47. bors. An accepted node labels an adjacent node by changing its status to narrow band, setting its flag to one (Fl = 1). An accepted node does not change the status of another accepted node.

[Figure: diagram of the doubly chained list; each element stores previous/next pointers, the grid position (i, j), the value T(i, j), and the flag Fl(i, j), between the HEAD and TAIL sentinels.] FIGURE 4.1. Double-chained list structure.

Solving the Local Eikonal Equation. The way to solve the local Eikonal equation was presented during the proof of Theorem 2.13.

Algorithm 6 (Solution of the quadratic equation), Quadratic(a, b). Input: real numbers a and b (the relevant neighbor values). Output: d, the solution of the quadratic equation. Compute the discriminant of the quadratic from the proof of Theorem 2.13, $\Delta = 2(\Delta x / F_{ij})^2 - (a - b)^2$; if $\Delta \ge 0$ then $d = (a + b + \sqrt{\Delta})/2$, else $d = S$ (the large default value). Return d.

The implementation of the first-order adjacent difference scheme for the Eikonal equation is presented in Algorithm 7, which uses Algorithm 6 to solve the quadratic equation.

Algorithm 7 (Solution of the local Eikonal equation at $X_{ij}$), Distance(i, j, T[i][j]). Input: i, j, T[i][j]. Output: the distance T. If i and j index an interior node (i < rows and j < cols) then $d_1 = \min(\min(T[i-1][j], T[i+1][j]),$
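A compact sketch of Algorithms 6 and 7 in C, assuming unit speed F = 1 and grid step h; the exact constants inside the thesis' Quadratic() may differ, and the function and parameter names here are illustrative.

    #include <math.h>

    double quadratic(double a, double b, double h, double S) {
        double delta = 2.0 * h * h - (a - b) * (a - b);   /* discriminant */
        if (delta >= 0.0)
            return (a + b + sqrt(delta)) / 2.0;           /* two-neighbor update */
        return S;                                         /* no real root: keep default */
    }

    double distance_update(double Txm, double Txp, double Tym, double Typ,
                           double h, double S) {
        double a = fmin(Txm, Txp);    /* smallest x-neighbor value */
        double b = fmin(Tym, Typ);    /* smallest y-neighbor value */
        if (fabs(a - b) >= h)         /* only one direction contributes */
            return fmin(a, b) + h;    /* one-neighbor (one-sided) update */
        return quadratic(a, b, h, S);
    }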
48. ce it is the easiest one to implement.

1.2.3 Numerical Methods. There have been many attempts to solve the Eikonal equation directly, starting from upwinding schemes [26], Jacobi iterations [21], semi-Lagrangian schemes [11], fast-marching-type methods [18, 24], fast sweeping methods [28], and many other methods. In the following subsections we briefly present the Rouy-Tourin and Fast Sweeping algorithms.

1.2.3.1 Rouy-Tourin Algorithm. An iterative algorithm for computing the solution of the Eikonal equation was introduced by Rouy and Tourin in [21]. The idea of this algorithm is to solve the quadratic equation (1.22) at each point of the grid and iterate until convergence [23]. In Figure 1.7 we present the stencil structure for the Rouy-Tourin algorithm, and the method itself is given in Algorithm 1. [Figure: the five-point stencil $X_{i\pm1,j}$, $X_{i,j\pm1}$ around $X_{ij}$.] FIGURE 1.7. Grid points for the Rouy-Tourin algorithm.

Algorithm 1 (The Rouy-Tourin algorithm). Consider $S > \operatorname{diam}(\Omega)$. Initialization: $T_{ij} = 0$ on $\Gamma$, $T_{ij} = S$ otherwise. Iteration: repeat, for $i = 0$ to $n$ and $j = 0$ to $n$, compute $R_{ij}$, the solution of the local Eikonal equation at $X_{ij}$, and set $T^{n+1}_{ij} = \min(T^n_{ij}, R_{ij})$, until $\|T^{n+1} - T^n\| < \varepsilon$. (A C sketch of this loop follows.)

For any node $(i, j)$ in the domain we solve the equation

    $\max\big(\max(D^{-x}_{ij}T, 0), -\min(D^{+x}_{ij}T, 0)\big)^2 + \max\big(\max(D^{-y}_{ij}T, 0), -\min(D^{+y}_{ij}T, 0)\big)^2 = \dfrac{1}{F_{ij}^2}$    (1.24)

in order to compute its $T_{ij}$. Let us consider the case presented in the previous section in Figure 1.5(a), whe
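A minimal sketch of Algorithm 1 in C, assuming F = 1 and an n-by-n array T pre-initialized to 0 on the initial curve and S elsewhere; distance_update() is the local solver sketched earlier, and the names and the stopping tolerance are illustrative.

    #include <math.h>

    void rouy_tourin(double *T, int n, double h, double S, double tol) {
        double change;
        do {
            change = 0.0;
            /* boundary rows/columns keep their initialized values */
            for (int i = 1; i < n - 1; i++) {
                for (int j = 1; j < n - 1; j++) {
                    double R = distance_update(T[(i-1)*n + j], T[(i+1)*n + j],
                                               T[i*n + j - 1], T[i*n + j + 1], h, S);
                    double Tnew = fmin(T[i*n + j], R);     /* monotone update */
                    change = fmax(change, T[i*n + j] - Tnew);
                    T[i*n + j] = Tnew;
                }
            }
        } while (change > tol);   /* iterate until convergence */
    }

Note how every node is recomputed at every pass, which is exactly the inefficiency that Remark 1.22 later points out and that Fast Marching avoids.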
49. condition in a downwind direction, we need to preserve the upwind nature of the numerical scheme. In order to arrange the nodes in increasing order we use a double-chained (doubly linked) sorted list. We consider the boundary conditions as the starting values for our list and update the list one entry at a time. At each iteration we recompute T at some of the nodes and insert those T at the right position in the sorted list. Since the list is double-chained, we can traverse it forward or backward and insert T at the right place. The structure of the double-chained list is presented in Figure 4.1. Each element of the list stores the following information: pointers to the previous (graphically represented by the red link) and next (graphically represented by the blue link) elements of the list; the position in the grid structure, i.e., row i and column j; the value of T; and the flag Fl of the grid node, indicating its status. The first element in the list is called head and the last element is called tail.

Initial Tagging. During the initialization phase of the algorithm, all nodes that approximate our initial curve are tagged as accepted, with zero distance (T = 0) and zero flag (Fl = 0). All other nodes are tagged as far-away nodes, with T = S, where S is a very large number (S = 5 N M), and flag two (Fl = 2). The first accepted nodes represent the starting points of our algorithm; we use them to label all the adjacent neigh
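The list element of Figure 4.1 could be declared in C as below; the field names are illustrative, not the thesis' exact identifiers.

    typedef struct ListNode {
        struct ListNode *previous;   /* toward smaller T (red link)  */
        struct ListNode *next;       /* toward larger  T (blue link) */
        int row, col;                /* position (i, j) in the grid  */
        double T;                    /* current arrival-time value   */
        int flag;                    /* 0 accepted, 1 narrow band, 2 far away */
    } ListNode;

Keeping the flag inside the node avoids a separate lookup when a neighbor's status has to be tested during the update step.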
50. d before node Z. Let $A \in D(X_{ij})$, $A \in A(\Omega_k)$; since A was the first accepted neighbor of $X_{ij}$, we have $T_F(X_{ij}) \ge T_F(A)$. When Z becomes accepted, $T_F(X_{ij})$ needs to be recomputed. Up to a rotation in the domain we can assume, without loss of generality, that node $Z \in \Omega_k$ or that Z is a ghost node of $\Omega_k$.

2.1. Let node $Z \in D(X_{ij})$ and $Z \in \Omega_k$ (see Figure 3.8). [Figure 3.8: node Z lies in the same sub-domain $\Omega_k$ as $X_{ij}$.] Since $A \in A(\Omega_k)$ during the n-th iteration, when Z becomes accepted we have $X_{ij} \in NB(\Omega_k)$ with $T_F(X_{ij}) \ge T_F(Z) \ge T_F(A)$; all other neighbors of $X_{ij}$ are narrow-band or far-away nodes. Solving the local Eikonal equation at $X_{ij}$ in $\Omega_k$ we have no ghost-point influence, and applying Proposition 3.9 we get $T^{n+1}_F(X_{ij}) \le T^n_F(X_{ij})$. Thus the relation $T_F(Z) \le T_F(X_{ij})$ holds true.

2.2. Let node $Z \in G(X_{ij})$ and $Z \notin \Omega_k$ (see Figure 3.9). If $A \in A(\Omega_k)$, then $X_{ij}$ becomes a narrow-band node and $T_F(A) \le T_F(X_{ij})$. When Z becomes accepted, $T_F(Z) \le T_F(X_{ij})$, and computing T on the extended domain $\overline{\Omega}_k$ we have ghost-node influence. Therefore, applying Proposition 3.9, we get $T_F(Z) \le T^{n+1}_F(X_{ij}) \le T^n_F(X_{ij})$. [Figure 3.9: node Z is a ghost node of $X_{ij}$.]

Case 3: more than two neighbors of $X_{ij}$ were accepted before Z was accepted. The proof for this case follows the arguments of the previous cases.

In Chapter 2 we proved that the numerical solution built by the sequential
51. e [6]:

    $\dfrac{\Delta x \, L_F}{F_{\min}} \le \sqrt{2} - 1$.    (2.13)

Then we have the upwind condition satisfied:

    $T_{i-1,j} \le T_{ij} \le T_{i-1,j} + \dfrac{\Delta x}{F_{ij}}$.    (2.14)

Remark 2.14. The above theorem shows that the value of the smallest node in the narrow band can be computed exactly at the next iteration. An approximate value is considered to be exact, within the consistency error of the scheme, if at the following iterations of the algorithm we cannot obtain a lower value [6].

Remark 2.15. We can make the assumptions $T_{i-1,j} \le T_{i+1,j}$ and $T_{i,j-1} \le T_{i,j+1}$ without loss of generality, up to two mirror symmetries of the domain.

Proof (Theorem 2.13). To compute T at $X_{ij}$ we distinguish the following cases [22, 23, 6]:
Case 1: one of $X_{ij}$'s neighbors is accepted.
Case 2: two of $X_{ij}$'s neighbors are accepted.
Case 3: three of $X_{ij}$'s neighbors are accepted.
Case 4: all of $X_{ij}$'s neighbors are accepted.

Case 1. Suppose that only one of $X_{ij}$'s neighbors is accepted. Up to a rotation we can assume, without loss of generality, that this node is $X_{i-1,j}$; all the other possible situations are treated in a similar way. This case is graphically illustrated in Figure 2.5. [Figure 2.5: $X_{i-1,j}$ accepted; $X_{i+1,j}$ and $X_{i,j\pm1}$ narrow band or far away.] FIGURE 2.5. Node $X_{ij}$ has only one accepted neighbor. Note also that $T_{ij} \le T_{i+1,j}$, $T_{ij} \le T_{i,j-1}$, and $T_{ij} \le T_{i,j+1}$, since these three neighbors are either far-away or narrow-band nodes and $X_{ij}$ is the narrow ba
52. e curve passes through itself, developing the so-called swallowtail. Analyzing the swallowtail from the geometric point of view, at a time t we have a multivalued solution. Since the solution should consist only of the set of all points located a distance t from the initial curve [22], we need to remove the tail from the swallowtail. In [23] Sethian described this situation as follows: if the front is viewed as a burning flame, then once a particle burns it stays burnt. Therefore, from the family of solutions we pick the physically reasonable solution, with the shape presented in Figure 1.2(b). Equation (1.1) is a particular form of a first-order Hamilton-Jacobi equation. The Dirichlet problem for a static Hamilton-Jacobi equation can be written in the form

    $H(x, Du) = 0$ on $\mathbb{R}^n \times (0, \infty)$,    $u = \varphi$ on $\mathbb{R}^n \times \{t = 0\}$,

where the Hamiltonian $H = H(x, Du)$ is a continuous real-valued function on $\mathbb{R}^n \times \mathbb{R}^n$ [8, p. 539] and $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a given initial function. In general this equation does not have classical solutions, i.e., solutions which are $C^1$. The problem does have generalized solutions, which are continuous and satisfy the partial differential equation almost everywhere. Using the notion of viscosity solution, introduced by Crandall and Lions [4, 5] for first-order problems, one can choose the correct physical solution from the multitude of solutions.

1.1 Viscosity Solutions
53. e execution is stopped and all the information is saved in specific files. To read the initial image and write the final computations in an image format, we developed modules that perform the reading/writing in PGM format and also generate the files for visualization with EnSight. For a global view of the module dependencies and more implementation details, see Appendix A.

4.5 Parallel Numerical Experiments. All the simulations for the distributed-memory implementations are performed on clusters:
- SCHUR: 64 CPUs (dual 3.06 GHz Pentium IV Xeon processors), Intel C Compiler for 32-bit applications, Version 10.1, Gb interconnect;
- LONI machines, mostly ERIC: 128 nodes, each node with two 2.33 GHz dual-core Intel Xeon 64-bit processors, 4 GB RAM per node, 4.77 TFlops peak performance, 10 Gb/sec InfiniBand network interface, Red Hat Enterprise Linux 4.

The implementations are MPI based (MPICH2) and PETSc based (PETSc version 2.3.3). The visualizations and post-processing are realized using the EnSight visualization tool.

To make the parallel algorithms efficient, we mainly focus on minimizing the time necessary to reach the solution in our numerical implementation. Besides the algorithm-specific factors (the upwind nature of the numerical scheme, the synchronization of the ghost nodes), this minimization also depends on additional factors such as processor memory, communication and workload balancing, and input/output time.
54. e local Eikonal equation (1.22) at $X_{ij}$; end; end; until all nodes are accepted.

In Figure 2.3 the Fast Marching algorithm is graphically illustrated. In order to get a better understanding of the algorithm, we consider the case of an isolated accepted node in the domain, and we replace the network-grid representation with a table format. [Figure 2.3: (a) start with an accepted point; (b) update the neighbors' values; (c) choose the smallest value, i.e., A; (d) freeze the value of A and update its neighbors; (e) choose the smallest value, i.e., D; (f) freeze the value of D and update its neighbors.] FIGURE 2.3. Update procedure for the Fast Marching Method. The algorithm steps, summarized in code after this list, are:

- start from the accepted node X, shown as a red cell in Figure 2.3(a);
- march downwind from X: solve the local Eikonal equation (1.22) at all neighbors $Y \in V(X)$; the values T(Y), for all $Y \in V(X)$, represented by the yellow-orange cells in Figure 2.3(b), are narrow-band points;
- consider the neighbor A of X with the smallest T and tag it as accepted (see Figures 2.3(c) and 2.3(d)); due to the upwinding property of the difference operator used in equation (1.22), T(A) is now correct up to the discretization error;
- tag the neighbors of A as narrow-band points (yellow-orange cells in Figure 2.3(d));
- choose the narrow-band point with the smallest T, i.e., D (see Figures 2.3(e) and 2.3(f));
- repeat until all the nodes become accepted.

Remark 2.3. T
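A compact sketch of this main loop in C, assuming the ListNode structure and the distance_update() helper sketched elsewhere in this chapter; the neighbor bookkeeping is abridged into comments, so this is an outline of the control flow rather than the thesis' full implementation.

    void fast_marching(ListNode *head, ListNode *tail) {
        while (head->next != tail) {            /* narrow band not yet empty   */
            ListNode *a = head->next;           /* smallest-T narrow-band node */
            a->flag = 0;                        /* freeze it: tag as accepted  */
            head->next = a->next;               /* unlink from the sorted list */
            a->next->previous = head;
            /* For each of the four grid neighbors of (a->row, a->col) that is
               not accepted: recompute its T with distance_update(), set its
               flag to 1 (narrow band), and (re)insert it into the list at the
               position that keeps the list sorted in increasing order of T.   */
        }
    }

Because the head of the sorted list is always the smallest narrow-band value, selecting the next node to freeze costs O(1); the price is paid at insertion time, which is what Algorithm 12 (UpdateList) handles.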
55. e most used are those that give us the number of processes and the rank of each process; see Algorithm 14.

Algorithm 14 (MPI functions):
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Finalize();

When programming in MPI we should be careful with the communication pattern, to avoid getting into a deadlock. A deadlock is a situation where the dependencies between processes are cyclic: a process cannot proceed because it is waiting on another process, which is itself waiting on the first one. Since MPI does not have timeouts, such a job is killed only when our time in the queue runs out. For example, let us consider two processes that are trying to communicate through MPI, as shown in Figure 4.13 (reproduced from [20]). [Figure: machine A and machine B exchanging send/receive operations over the network.] FIGURE 4.13. MPI communication between two processes.

We can illustrate the deadlock with the pseudo-code example in Algorithm 15.

Algorithm 15 (Deadlock example):
    if rank == 0 then
        Recv(recvbuf, 1); Send(sendbuf, 1)
    else  /* rank == 1 */
        Recv(recvbuf, 0); Send(sendbuf, 0)

For instance, deadlocks happen when each process gets stuck in its receive operation while waiting for the corresponding send operation on the other process to complete. One possible solution to this problem is to match each send operation to a corr
56. ed list for the narrow band, which would require much more communication time.

4.4.1 Ordered Overlap Strategy. The key observation for the implementation of this strategy is that, after the local Fast Marching, the algorithm satisfies the upwinding property on the whole sub-domain. Taking advantage of this observation, we decided to process the edges one after the other, in order to simplify our implementation:

- go through the ghost nodes in increasing order of T, and recompute T at each boundary node in $\Omega_k$ (solve the local Eikonal equations);
- reconstruct the sets of accepted, narrow-band, and far-away points.

Let us consider the case presented in Figure 4.17, where we have a top-row ghost zone, and focus on the recomputation of $T(X_{ij})$. Up to a symmetry or rotation of the sub-domain, all the other cases can be treated in the same way. There is no need to solve the local Eikonal equations in the cases when internal sub-domain nodes are involved, because the solutions of the local Eikonal equations involving internal nodes are the same as the ones computed at the end of the local Fast Marching algorithm. In order to avoid these unnecessary re-computations, we use a special distance function, called distanceNB, which solves the local Eikonal equations based only on the ghost nodes and the boundary of the sub-domain, ignoring the internal nodes. For example, in Figure 4.17, for node X we consider only the local Eikonal equations in the first and second quadrants, ba
57. en O(N^2 log N) and O(N^3); thus our theoretical estimate is confirmed experimentally. [Figure: log-log plot of the sequential Fast Marching execution time versus N, with a linear least-squares fit.] FIGURE 4.6. Sequential Fast Marching algorithm complexity.

4.2.2 Approximation Error. Based on the grid sizes listed above, we run numerical experiments approximating the solution of a model problem for which we know the exact solution. Let $\bar T$ denote the exact solution and T the solution computed by the Fast Marching Method. We can express the error on the field as

    $E_\infty = \max_{i,j} |T_{ij} - \bar T_{ij}|$,    $E_1 = \Delta x^2 \sum_{i,j} |T_{ij} - \bar T_{ij}|$,    (4.1)

and the gradient error as

    $G_\infty = \max_{i,j} \big|\, |\nabla T_{ij}| - 1 \,\big|$,    $G_1 = \Delta x^2 \sum_{i,j} \big|\, |\nabla T_{ij}| - 1 \,\big|$.    (4.2)

In Figure 4.7 we present the $\| |\nabla T| - 1 \|$ error. The graph plots the gradient error versus the leading dimension N of the grid size; we can see that the gradient error decreases with N, following an approximate power law. [Figure: log-log plot of the gradient error versus N.] FIGURE 4.7. Gradient error.

4.2.3 Numerical Results. Consider an image of size 100 x 100 and run the Fast Marching algorithm on one processor. Figures 4.8 and 4.9 represent the numerical solutions of the Eikonal equation with initial conditions imposed at one corner node and at two opposite corner nodes, respectively. In Figures 4.8(a) and 4.9(a) we can follow the evolution of the level lines from the starting node(s) toward the boundary of the domain. For
58. end.

1. The algorithm is equivalent to Fast Marching on an N x 2 domain, where N is the number of sources. In this setting, the ghost nodes represent the accepted nodes and the sub-domain boundary represents the narrow-band nodes.

2. In light of all the above observations, by applying the algorithm edge by edge we do not need to reconstruct the narrow band for the whole boundary, nor to order the sources of the whole boundary (four edges); the source order is inherited from the previous steps of the algorithm. Thus we save computational time.

Basically, $T^{n+2/3}(X_{ij})$ is a function of $T^{n+1/3}(X_{i-1,j})$, $T^{n+1/3}(X_{i+1,j})$, and $T^{n+1/3}$ at the ghost neighbors $G(X_{ij})$. Using the upwind condition, translated into the order of the sources, we need to compute $T(X_{ij})$ so that it solves the local Eikonal equations in the N x 2 domain. This way the upwind condition is satisfied inside the sub-domains and also between sub-domains. Since we apply the algorithm edge by edge, the type of overlap (row or column) and the order of the sources are what make each edge's processing different. Therefore we create a function for vertical processing (neighbor sub-domains located at the left or right of the sub-domain) and a function for horizontal processing (neighbor sub-domains located at the top or bottom of the sub-domain). In Algorithm 19 we present the steps for a row boundary. If, at the end of the Ordered Overlap algorithm, during the reconstruction procedure, we do not
59. equence built by the Fast Marching Methods converges to the solution of the discrete Eikonal equation.

Chapter 3 deals with the parallel Fast Marching Methods. First of all, I focus on formulating the problem in terms of parallel programming; I define the necessary notions such as sub-domains, neighbors, ghost nodes, and ghost zones. After that, I present the convergence results for the parallel Fast Marching algorithm.

Chapter 4 focuses on the aspects of implementation and on numerical experiments. The chapter starts with a brief presentation of the sequential implementation and elaborates the main problems through numerical experiments. Then it gives a short background on parallel computing before focusing on the actual parallel implementation of the algorithm. I bring to the reader's attention parallel programming concepts such as parallel architectures and programming models, and the issues that differentiate parallel from sequential programming. I also focus on the obstacles encountered during parallel implementation and on how to overcome them. I illustrate the strengths of the parallel approach with a series of benchmarks, which include the study of convergence and error estimates. In addition, I show that the algorithm is monotone and stable.

Chapter 1: Background. Let us consider a boundary, a closed curve in two dimensions or a surface in three dimensions, separating two regions. Let this curve or surface evolve with a known normal velocity F, as shown in Figu
60. esponding receive operation before the other send operation starts, as shown in Algorithm 16.

Algorithm 16 (Deadlock solution, version 1):
    if rank == 0 then
        Send(sendbuf, 1); Recv(recvbuf, 1)
    else
        Recv(recvbuf, 0); Send(sendbuf, 0)

Another solution consists in using non-blocking versions of the common send and receive operations. These are known in the MPI core as ISend and IRecv (MPI_Isend and MPI_Irecv). The ISend/IRecv pair does not block, but we must check for the completion of the communications before using the buffers, as shown in Algorithm 17.

Algorithm 17 (Deadlock solution, version 2):
    if rank == 0 then
        ISend(sendbuf, 1); IRecv(recvbuf, 1)
    else
        IRecv(recvbuf, 0); ISend(sendbuf, 0)
    WaitAll

In combination with MPI, to create the parallel implementation of the Fast Marching algorithm we use PETSc. PETSc (Portable, Extensible Toolkit for Scientific Computation) provides sets of tools for the parallel (as well as serial) numerical solution of partial differential equations that require solving large-scale sparse nonlinear systems of equations. PETSc provides many of the mechanisms needed within parallel application codes, such as simple parallel matrix and vector assembly routines that allow the overlap of communication and computation. In addition, PETSc includes support for parallel distributed arrays, useful for finite-difference methods [19]. We discuss the parallel distributed arrays and all other PE
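The non-blocking variant of Algorithm 17, written out in actual MPI calls. This assumes a two-process job where rank, sendbuf, recvbuf, and the message length COUNT are already set up; the tag value 0 is arbitrary.

    MPI_Request reqs[2];
    int other = 1 - rank;   /* partner rank in a 2-process job */

    /* post the receive first, then the send; neither call blocks */
    MPI_Irecv(recvbuf, COUNT, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, COUNT, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* wait on both requests; only after this may the buffers be reused */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

Since both processes post their receives before waiting, the cyclic dependency of Algorithm 15 cannot occur regardless of message size or the MPI implementation's buffering policy.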
61. exactly from $u_n$; $u_{n-2}$ can be computed exactly from $u_{n-1}$; ...; $u_1$ can be computed exactly from $u_2$; and $u_0$ can be computed exactly from $u_1$. This is an upwind scheme: we compute derivatives using points upwind, i.e., toward the boundary condition (each ordinary differential equation is solved away from the boundary condition). Modeling the Eikonal equation numerically, we consider that the information propagates like waves, with certain speeds, along the gradient directions. Upwind discretization methods compute the values of the variables using the direction(s) from which the information should be coming. More precisely, the discretization of the partial differential equations uses a finite-difference stencil biased toward the direction determined by the sign of the gradient [16]. The upwind scheme uses a backward-difference scheme if the velocity is in the positive x direction, and a forward-difference scheme for negative velocities. Thus, in a one-dimensional domain, for any point i we have only two directions: left (i - 1) and right (i + 1). If the velocity is positive, the left side of the axis is called the upwind side and the right side is called the downwind side, and vice versa for negative velocities. Let $u_i^n$ be the computed solution at the point i at iteration n. We have the following schemes:
1. Forward scheme: $u_i^{n+1} = u_i^n + \Delta x \, D^{+x} u_i^n$.
2. Backward scheme: $u_i^{n+1} = u_i^n + \Delta x \, D^{-x} u_i^n$.

Similarly to the one-dimensional case, usin
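To make the direction-selection rule concrete, here is a standard 1-D advection example of upwinding in C. This is an illustration of the general principle described above, not the thesis' Eikonal scheme: for a transport speed c, the spatial difference is taken from the side the information comes from.

    /* One explicit upwind step for u_t + c u_x = 0 at an interior point.
       um1, u, up1 are u_{i-1}^n, u_i^n, u_{i+1}^n; dt, dx are step sizes. */
    double upwind_step(double um1, double u, double up1,
                       double c, double dt, double dx) {
        if (c > 0.0)
            return u - c * dt * (u - um1) / dx;   /* backward difference: left is upwind  */
        else
            return u - c * dt * (up1 - u) / dx;   /* forward difference: right is upwind  */
    }

Choosing the wrong side makes the scheme unconditionally unstable, which is why the Eikonal discretizations below always test the sign of each one-sided difference before using it.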
62. g forward, backward, or centered Taylor series expansions in x and y for the value of u around the point (x, y), with $x_i = i\Delta x$, $y_j = j\Delta y$, and $u_{ij} = u(x_i, y_j)$, we can define four differentiation operators for the two-dimensional case:

    $D^{+x}_{ij}u = \dfrac{u_{i+1,j} - u_{ij}}{\Delta x}$,  $D^{-x}_{ij}u = \dfrac{u_{ij} - u_{i-1,j}}{\Delta x}$,  $D^{+y}_{ij}u = \dfrac{u_{i,j+1} - u_{ij}}{\Delta y}$,  $D^{-y}_{ij}u = \dfrac{u_{ij} - u_{i,j-1}}{\Delta y}$,    (1.20)

where:
- $D^{+x}$ computes the new value at (i, j) using information at i and i+1; thus information for the solution propagates from right to left;
- $D^{-x}$ computes the new value at (i, j) using information at i-1 and i; thus information for the solution propagates from left to right;
- $D^{+y}$ computes the new value at (i, j) using information at j and j+1; thus information for the solution propagates from top to bottom;
- $D^{-y}$ computes the new value at (i, j) using information at j-1 and j; thus information for the solution propagates from bottom to top.

For two-dimensional domains, the upwind method uses the gradient direction in order to select which differentiation operator to use. In Figure 1.5 we illustrate only two possible situations, all the other cases being similar up to a rotation of the system of coordinates. In case 1.5(a) we use the backward-difference scheme in x and y and define the third quadrant as the upwind side. In case 1.5(b) we use backward differences in x and forward differences in y, and the upwind side is the second quadrant. [Figure 1.5: the two stencil configurations and the corresponding gradient directions.]
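The four operators of (1.20) written as C helpers, assuming u is stored row-major with ncols columns and that (i, j) is an interior point; names are illustrative.

    double Dpx(const double *u, int ncols, int i, int j, double dx)
    { return (u[(i+1)*ncols + j] - u[i*ncols + j]) / dx; }   /* D^{+x}: uses i, i+1 */

    double Dmx(const double *u, int ncols, int i, int j, double dx)
    { return (u[i*ncols + j] - u[(i-1)*ncols + j]) / dx; }   /* D^{-x}: uses i-1, i */

    double Dpy(const double *u, int ncols, int i, int j, double dy)
    { return (u[i*ncols + j + 1] - u[i*ncols + j]) / dy; }   /* D^{+y}: uses j, j+1 */

    double Dmy(const double *u, int ncols, int i, int j, double dy)
    { return (u[i*ncols + j] - u[i*ncols + j - 1]) / dy; }   /* D^{-y}: uses j-1, j */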
63. hbors;
- right and bottom neighbors;
- right and top neighbors.

Thus, for any point X in the domain, we compute the solution as the minimum among the solutions of the local discrete Eikonal equations based on its neighbors $X_{i\pm1,j}$, $X_{i,j\pm1}$. Repeatedly, we sweep the whole domain along the diagonal and anti-diagonal directions, from top to bottom and from bottom to top. This sweeping idea is illustrated by solving the Eikonal equation in 1-D:

    $\left|\dfrac{dT}{dx}\right| = 1$ on $(0, 1)$,    $T(0) = T(1) = 0$.    (1.25)

From this point on, by a sweep we mean a computation in a certain direction, and by an iteration we mean a completed set of sweeps, i.e., four directional sweeps in 2-D problems [27]. Let $T_i = T(x_i)$ denote the values at the grid points $x_0 < x_1 < \dots < x_{n+1}$. We solve (1.25) in different directions using the discretized nonlinear system based on the Godunov scheme (1.21):

    $\max(D^{-x}_i T, -D^{+x}_i T, 0)^2 = 1$,    $T_0 = T_{n+1} = 0$.    (1.26)

[Figure 1.9: sweeping in the two 1-D directions; panels (a) left to right, (b) right to left, (c) the resulting solution.] First we have a sweep from left to right, enforcing the boundary condition at $x_0$; this is equivalent to following the directions emanating from $x_0$, as shown in Figure 1.9(a), and we obtain a solution $T^{(1)}(x)$. Secondly, we sweep from right to left, which means following the directions emanating from $x_{n+1}$, as shown in Figure 1.9(b), and we obtain a solution $T^{(2)}(x)$. Since in 1-D there are only two directions of sweeping, i.e., left to right and right to left, two sweeps a
64. he greatest barriers to getting good performance. All of these are aspects that I encountered while preparing my thesis. In order to put this work into perspective, let me first describe the front-propagation problem that I want to solve employing parallel computing. It is known that propagating interfaces occur in many settings, including ocean waves, material boundaries, optimal path planning, the construction of geodesic paths on surfaces, iso-intensity contours in images, computer vision, and many more. I consider the case of a curve propagating in a domain with a normal velocity F > 0, the tangential direction of the movement being ignored, and I want to track the motion of this interface as it evolves. To describe the interface motion, I formulate the boundary-value problem and the partial differential equation with initial values known as the Eikonal equation:

    $F(x)\,|\nabla T(x)| = 1$,  $x \in \Omega$;    $T(x) = 0$,  $x \in \Gamma$,

where $\Omega$ is a domain in $\mathbb{R}^n$, $\Gamma$ is the initial position of a curve evolving with normal velocity F, $\nabla$ denotes the gradient, and $|\cdot|$ is the Euclidean norm. The solution T of the Eikonal equation can be viewed as the time of first arrival of the curve $\Gamma$, or as the distance function to the curve $\Gamma$ if F = 1. All the mathematical details on how to find the solution of the Eikonal equation are presented in Chapter 1. The primary goal of my thesis is to solve the Eikonal equation employing parallelized computational methods and algorithms. To solve
65. he previous sections we saw that Fast Marching Methods are inherently sequential, and hence not straightforward to parallelize and implement on today's supercomputers. Parallelizing Fast Marching Methods is a step forward for employing the Level Set Methods on parallel supercomputers.

3.1 General Idea. The idea behind the distributed implementation of the Fast Marching Methods is to divide the whole computational grid between processors, giving each processor access only to its own sub-domain, and to use a message-passing strategy to communicate between different processors (more details about parallel computing can be found in Section 4.3). This approach has its own drawbacks:
- when the grid is simply split between processors, in some cases there is no longer enough data to update all the points in each sub-domain;
- we cannot preserve the upwind nature of the numerical scheme for the whole domain in just one pass;
- communication through message passing is expensive, since it requires a lot of time for data transmission.

The efficiency of the parallel algorithms depends on the required amount of communication between sub-domains (i.e., how often boundary nodes change status) and on our ability to preserve the upwind structure during the algorithm's execution.
66. icated grid points which are added to each processor; they contain the data necessary to advance to the next iteration. Hence we expand each sub-domain by adding one layer of ghost nodes. All the ghost nodes of a sub-domain form the ghost zone (or overlap) of that sub-domain. Figure 3.1 illustrates a particular decomposition of a two-dimensional 10 x 10 grid into four sub-domains. [Figure: decomposition into sub-domains for CPU 0 through CPU 3, with arrows indicating inter-sub-domain communication; the dark blue cells are the ghost cells, and the dashed line delimits the ghost zone for the top-right sub-domain.] FIGURE 3.1. Domain decomposition into sub-domains.

The dashed line shows how the ghost zone is formed for the top-right sub-domain; in the same manner we form the ghost zones for the other sub-domains. The second part of the picture presents the sub-domain decomposition and indicates in dark blue the ghost zone of each sub-domain. After decomposition, each sub-domain has dimension 6 x 6, because there is an extra row and an extra column of ghost points to make sub-domain interaction possible. In two dimensions a sub-domain can have at most four neighboring sub-domains, and also at most four ghost regions, depending on its position in the global grid. Figure 3.2, reproduced from the PETSc User Manual [19], illustrates the ghost points for process six of a two-dimensional regular parallel grid: each box represents a process, and the ghost points for the sixth process are shown in gray. [Figure 3.2: ghost points for process six of a two-dimensional regular grid.] FIGU
67. ique, France, Dr. Martin P. Bendsoe (Technical University of Denmark, Lyngby), and Dr. Mathias Stolpe (Technical University of Denmark, Lyngby) for their hospitality, valuable support, and interesting discussions. Finally, I would like to express my deepest gratitude to my family for their support, encouragement, and understanding over all these years. Also, I want to thank Laurentiu for his patience, love, and support.

Table of Contents: Acknowledgments; List of Figures; List of Algorithms; Abstract; Introduction; Chapter 1: Background (1.1 Viscosity Solutions: 1.1.1 Motivation, 1.1.2 The General Case, 1.1.3 Application to the Eikonal Equation; 1.2 Numerical Approximations: 1.2.1 Motivation, 1.2.2 Approximations of the Eikonal Equation, 1.2.3 Numerical Methods); Chapter 2: Sequential Fast Marching Methods (2.1 General Idea; 2.2 Fast Marching Algorithm Description; 2.3 Convergence: 2.3.1 Proof of Lemma 2.5, 2.3.2 Proof of Lemma 2.6); Chapter 3: Parallel Fast Marching Methods
68. ires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation. From the programming perspective, message-passing implementations commonly comprise a library of subroutines embedded in the source code, and the programmer is responsible for determining all the parallelism. The Message Passing Interface (MPI) was created in 1994 out of the necessity of a standard interface for message-passing implementations. MPI is an industry standard developed by a consortium of corporations, government labs, and universities. MPI is a message-passing library: a collection of routines for facilitating communication (exchange of data and synchronization of tasks) among the processors in a distributed-memory parallel program [14]. Every MPI program must contain the preprocessor directive #include "mpi.h" and the initialization and termination of the MPI execution environment. We present all of this in a simple C example (Algorithm 13).

Algorithm 13 (Basic MPI example):
    #include "mpi.h"
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        printf("Hello World\n");
        MPI_Finalize();
        return 0;
    }

MPI programs are made up of communicating processes. Each process has its own address space, containing its own attributes such as rank and size (and argc, argv, and so on). The default communicator is MPI_COMM_WORLD. MPI provides functions to interact with it, and th
69. ll such solutions, i.e.,

    $T^{n+1/3}(X_{ij}) = \min_{\overline\Omega_k} \{ T + h \}$,

taken over the extended domain $\overline\Omega_k$ (the sub-domain together with its ghost nodes). If $T^n(X_{ij}) < T^n(X_{ab})$ for $X_{ij} \in \partial\Omega_k$ and $X_{ab} \in G(X_{ij})$ (see the red region, part 1, of Figure 3.5), then $T(X_{ij})$ is influenced more by the accepted neighbors from its own sub-domain than by its ghost neighbors. Then, for any such $X_{ij} \in \partial\Omega_k$, if we do not have ghost-point influence, we get $T^{n+1/3}(X_{ij}) = T^n(X_{ij})$. If $T^n(X_{ij}) > T^n(X_{ab})$ for $X_{ij} \in \partial\Omega_k$ and $X_{ab} \in G(X_{ij})$, then the computation of $T^{n+1/3}(X_{ij})$ is influenced mostly by its ghost nodes (see the blue region, part 2, of Figure 3.5). Solving the local Eikonal equation at $X_{ij}$ in $\overline\Omega_k$, we get

    $T^{n+1/3}(X_{ij}) = \min\big( \min_{\Omega_k} T, \ \min_{G(X_{ij})} T \big)$.

Therefore we can decrease $T(X_{ij})$, where $X_{ij} \in \partial\Omega_k$, only if we have ghost-point influence.

Theorem 3.10 (Reconstruction of the A/NB/FA sets). Consider $\Omega$ and $\Omega_k$ defined as before. Assume that $X_{ij} \in \Omega_k$, $X_{ij} \in NB(\Omega_k)$, $Z \in \partial\Omega_k$, and $Z \in V(X_{ij})$. During the n-th iteration of the local Fast Marching algorithm, node Z becomes accepted. Then we have $T(X_{ij}) \ge T(Z)$.

Proof. To prove the theorem we use induction on the number of iterations of the algorithm. At the first step the theorem holds true by construction (initialization). At the n-th step of the algorithm, the statements of Remark 3.8 hold true. Since we have an ordering relation between boundary nodes, we can apply Proposition 3.9 to complete the proof. When node
70. m is presented in Algorithm 20.

Algorithm 20 (Fast Sweeping boundary update), FastSweepingUpdate(geom, n, c, T, Fl, ghost, head, tail). Input: geom, n, T, Fl, ghost, head, tail. Output: the recomputed narrow band. For i = 0 to n: k[i] = FastSweep(geom, n, m, c, T, Fl, ghost, head, tail). For i = n down to 0: l[i] = FastSweep(geom, n, m, c, T, Fl, ghost, head, tail); sol[i] = min(k[i], l[i]); store sol[i] into T; ptr = FindPointer(i, c, head, tail); UpdateList(ptr, T, head, tail). Return T.

4.4.3 Flags Reconstruction. In order to restart the Fast Marching algorithm we need to reconstruct the NB, A, and FA sets over the whole sub-domains. Thus, after applying one of the strategies for computing T at the boundary based on the ghost points, we need to:

- save the minimum/maximum of T over the boundaries (all four edges) of the sub-domain; the algorithm for this part simply finds the minimum and maximum of a vector, and if the vector is empty (no narrow-band nodes in that sub-domain) the default value for the maximum is max = -1.0 and for the minimum is min = S;
- use the minimum/maximum of T over the boundaries to update the flags in the whole sub-domain, as described in Algorithm 21 and sketched in C after this list.

Algorithm 21 (Flag update on the sub-domains), UpdatedPtsFlag(n, m, min, max, T, Fl). Input: n, m, min, max, T, Fl. Output: the reconstructed flag list Fl. If min == S and max == -1 then: for i = 0 to n, for j = 0 to m: if Fl[i][j] == 1 then
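A minimal sketch of the flag reclassification in C, ignoring the empty-band special case that Algorithm 21 handles first; the rule used here (accepted below the band minimum, narrow band inside [min, max], far away above) is an assumption consistent with the classification described in Section 3, and the names are illustrative.

    void update_flags(int n, int m, double min_nb, double max_nb,
                      const double *T, int *Fl) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) {
                double t = T[i*m + j];
                if (t < min_nb)        Fl[i*m + j] = 0;   /* accepted    */
                else if (t <= max_nb)  Fl[i*m + j] = 1;   /* narrow band */
                else                   Fl[i*m + j] = 2;   /* far away    */
            }
    }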
71. mesh sizes go to zero.

2.3.2 Proof of Lemma 2.6. To prove convergence (Lemma 2.6) we need to demonstrate that the field computed by the Fast Marching Methods solves the discrete Eikonal equation. Since T is built by marching forward from the smallest value to the largest, we need to show that whenever a narrow-band node X is converted into an accepted one, none of its not-yet-accepted neighbors will ever obtain a T less than T(X). Thus the solution of the local Eikonal equation at node X can be considered exact, and there is no need to go back and readjust previously set values; this way the upwind nature of the algorithm is satisfied. Consider a two-dimensional grid as in Figure 2.4. For simplicity we assume that in our domain N = M and $\Delta x = \Delta y$. [Figure: the node $X_{ij}$ with its accepted neighbor $X_{i-1,j}$ and its other neighbors $X_{i+1,j}$, $X_{i,j\pm1}$.] FIGURE 2.4. Matrix of neighboring nodes of $X_{ij}$.

Theorem 2.13. Assume that the nodes in the domain $\Omega$ are partitioned into accepted, narrow-band, and far-away nodes. By construction we have that, for any two nodes Y and Z, if Y became accepted before Z, then $T(Y) \le T(Z)$. Assume that $X_{ij}$ is the narrow-band point with the smallest T. The algorithm will label $X_{ij}$ as accepted and start to recompute the neighboring nodes that are not accepted. Furthermore, assume that:
- $F \in \operatorname{Lip}(\mathbb{R}^2, \mathbb{R})$ with Lipschitz constant $L_F$;
- $F(x) > 0$ for $x \in \Omega$, and $F_{\min} = \min_\Omega F(x) > 0$;
- the Courant-Friedrichs-Lewy-like (CFL) condition holds tru
72. n the following number of iterations:
- in the worst-case scenario, the number of iterations is less than or equal to n m + 1 for an n x m processor decomposition [Table: CPUs, decomposition, iterations];
- in the normal-case scenario, for a perfect-square number of processors (n = m), the number of iterations is close to n + 1; in some cases the stopping criteria bring us to n iterations.

4.5.1 Weak Scalability. In practice, weak scaling is what we care about most, because the very reason we scale up the cluster size is to solve bigger problems: higher-resolution models, more time steps. To study weak scalability we analyze our two strategies, Ordered Overlap and Fast Sweeping, in terms of time complexity, L error, and Fast Marching execution time with respect to the number of CPUs. We choose to study the worst- and normal-case scenarios for 50 CPUs on the following grid sizes: 30x30, 60x60, 100x100, 120x120, 150x150, 164x164, 180x180, 200x200, 240x240, 256x256, 300x300, 600x600.

Fast Marching algorithm time complexity. For the Ordered Overlap strategy, the numerical results show that the time complexity is between O(N) and O(N^2), where N is the number of points in the leading dimension (i.e., the domain is N x N). In Figure 4.21 we plot, in red on a logarithmic scale, the execution time of the Fast Marching algorithm as a function of N, as obtained through the simulations; the green line represents a linear least-squares fit of our numerical data.
73. nd node with smallest T Therefore we have that Dlg 0 Dis 2 0 DARDO DUO and equation 1 22 becomes max D Ty 0 max 0 0 1 7 1 OG T UJ Using the definition 1 20 of the finite difference operator D T we obtain Tig Tyja 1 2 15 2 15 43 and Ag Tij Tija E Since we assumed that X 1 is the only accepted neighbor of X j we have that Ax Tis gt Liga Therefore D Lia F ij A In the case of only one accepted node we proved that T 1 lt Ti Tij 1 a i j Case 2 Suppose that two neighbors of X are accepted From hypothesis we know that node X 1 is one of the accepted neighbors We have three ways of positioning the other accepted node at the top at the bottom or at the right of X Let us study the case of top node and the case of right node 2 1 Up to a rotation in the domain we can assume without loss of generality that the accepted nodes are Xj41 and X j 1 with Tiy1 lt Tj 1 In this case the other two neighbors of X are narrow band or far away nodes as presented in Figure 2 6 We should remark that Tij lt Ti j41 Taj lt Ti 1 since X is the narrow band node with the smallest T Xi 1 j nb or far Xi FIGURE 2 6 Node X j has the neighbors X 1 and X 41 accepted 44 In this case the finite difference operators are De 20 DRE Dig U DES 0 and equation 1 22 becomes
74. nd of local Fast Marching algorithm in sub domain Qg and T Xa be the T at node X in sub domain Qg obtained after the ghost points synchronization Consider a domain with the starting nodes in opposite corners These nodes are initialized as accepted nodes at the beginning of the Fast Marching algorithm Let us consider the particular decomposition of the domain in two sub domains each including one of the accepted nodes as illustrated in Figure 3 4 In the following 99 for this particular case we describe how the synchronization procedure works step by step for each sub domain min ghost value X I V4 min N JI a After Local FM A Kept A Kept A T lt X b After Ghost Update FIGURE 3 4 Ghost points update Step 1 for all 4 and all X 00 let gets be the solution of the local Eikonal equation at X in If Tip lt Tr X we say that X is a node with ghost point influence in Qy colored in blue in Figure 3 4 a If TA Ty Xi we say that X is an accepted node and its T will not change in Qy colored in red in Figure 3 4 a Step 2 for all Q and for any node X OQ we do a comparison so that we can reassign the flags and reconstruct the accepted and narrow band sets 1 n 3 if Ti Xiz lt IFA then tag Xij NB 0 otherwise tag X A Q n 3 n wis Step 3 let a e min TY be the minimum T on the narrow k ml
75. ng has its own advantages:
- we can solve large-scale problems, since we can overcome memory constraints (single computers have finite memory resources);
- we can save time by executing multiple things at the same time;
- we can save money by using available computing resources on a network, or over the Internet, instead of buying more machines.

Nevertheless, some disadvantages are also worth mentioning:
- parallel programs are more difficult to write than sequential codes;
- communication and synchronization between the sub-problems is typically one of the greatest barriers to getting good performance.

From the architectural point of view, parallel computers are classified as shared-memory, distributed-memory, and hybrid distributed/shared-memory parallel computers. In a shared-memory architecture, multiple processors can operate independently but share the same memory resources, as shown in Figure 4.10 (picture reproduced from [20]); if one of the processors changes a location in memory, the change is visible to all the other processors. In a distributed-memory architecture, each processor has its own memory; to access the memory of another processor, a communication network is required, as presented in Figure 4.11 (reproduced from [20]). [Figure 4.10: several CPUs attached to a common memory. Figure 4.11: CPUs with private memories connected through a network.] FIGURE 4.10. Shared-memory architecture. FIGURE 4.11. Distributed-memory architecture. The hybrid distributed/shared-memory architecture is a combination of the previously prese
76. nt gt previous node To organize the list elements in increasing order of T we have to use a sorting algorithm To insert new nodes into the list an insertion sort algorithm could be used To perform only the reordering of the elements in the list a comparison sort algorithm could be employed Our particular sorting algorithm is called UpdateList and combines both of them Since our list is double chained the algorithm has a forward and a backward sorting procedure In Figure 4 5 we illustrate on a numerical example how the sorting procedure works We respect the color code rail Algorithm 12 UpdateList UpdateList node T head tail Input node T head tail Output updated list of T values v T node row node col crt node if crt previous head then first element of the list Uprey big number else Uprev T node previous row node previous col if crt nert tail then last element of the list Unext big number else Uneat T node next row node nezt col if Uprev lt V amp US Vnert then do nothing else if Uprev gt U amp Unext lt v then List messed up else if Uprev gt v then Going backward crt node previous repeat if crt previous previous NULL then RemoveNode node InsertBefore node crt else crt crt previous until v lt T crt row crt col if node head then RemoveNode node InsertBefore node crt else Remo
77. nted approaches and is illustrated in Figure 4 12 reproduced from 20 Each processor has its own local memory and there is also a memory space shared by all processors Accesses to local memory are typically faster than accesses to the shared memory 12 network FIGURE 4 12 Hybrid shared distributed memory architecture Parallel programming models were created to program parallel computers They represent an abstraction of real life hardware and memory architectures They can generally be divided into shared memory threads message passing 79 data parallel hybrid Models are machine architecture independent each of them can be implemented on any hardware given appropriate operating system support In the following paragraphs we focus on two of the models shared memory and message passing The shared memory programming model inherited the properties of shared memory architecture where tasks share a common address space which they read and write asynchronously From the programmer standpoint in this model it is not necessary to take care of the communication between tasks but it is hard to understand and manage data locality 20 At the language level we find shared variables semaphores for synchronization mutual exclusion and monitors locks The message passing model is defined as a set of processes having only lo cal memory Processes exchange data by sending and receiving messages Data transfer usually requ
78. nts to be sorted.

4.2 Sequential Numerical Experiments. We want to study the behavior of the Fast Marching algorithm on the following test grid sizes: 6x6, 10x10, 30x30, 60x60, 100x100, 120x120, 150x150, 164x164, 180x180, 200x200, 240x240, 256x256, 300x300, 600x600, with only one starting node in one of the corners.

4.2.1 Complexity of the Algorithm. Let N be the leading dimension of our grid size; i.e., for a grid of 300x300, N is 300. An efficient version of Fast Marching depends on how quickly we manage to find the node with the smallest T in the narrow band. The main loop of the Fast Marching algorithm is characterized by:
- O(N^2) complexity for computing T at all the grid points;
- complexity between O(N^2 log N) and O(N^3) for the sorting algorithm, due to the growth of the narrow-band size with O(N).

Therefore the Fast Marching algorithm has a computational cost between O(N^2) and O(N^3), or more precisely between O(N^2 log N) and O(N^3). This is comparable with the complexity achieved by Sethian in [23] using a heap priority queue based on a tree. To verify this assumption, let us analyze the running time of the Fast Marching algorithm in terms of the image size and graph this dependency. In Figure 4.6, the computing time of the Fast Marching algorithm versus the leading dimension of the grid size is drawn in red, while the green line represents a linear least-squares fit. It is obvious that the approximation line is in betwe
79. o do this is to consider only numerical schemes which satisfy the upwind condition. These schemes ensure that, out of all possible T such that $F\,|\nabla T| = 1$ almost everywhere, only the correct physical solution from the family of solutions of (1.22) is picked. Since T solves equation (1.22), T cannot be locally convex. In Figure 1.6 we consider a stencil with $\Delta x = 1$ and $F(x_i) = 1$, $1 \le i \le n$, and show how the scheme selects only concave solutions. [Figure 1.6: the four 1-D configurations (a)-(d) of the stencil values $T_{i-1}, T_i, T_{i+1}$.] FIGURE 1.6. Discretization for the 1-D case.

Case (a): if $T_i = T_{i-1} + 1$ and $T_{i+1} = T_i + 1$ (increasing values), then $D^{-x}_i T = 1$ and $D^{+x}_i T = 1$, and equation (1.22) becomes $\max(\max(D^{-x}_iT, 0), -\min(D^{+x}_iT, 0))^2 = \max(1, 0)^2 = 1$.

Case (b): if $T_{i-1} = T_i + 1$ and $T_i = T_{i+1} + 1$ (decreasing values), then $D^{-x}_i T = -1$ and $D^{+x}_i T = -1$, and equation (1.22) becomes $\max(0, 1)^2 = 1$.

Case (c): if $T_i = T_{i-1} + 1$ and $T_i = T_{i+1} + 1$ (a local maximum), then $D^{-x}_i T = 1$ and $D^{+x}_i T = -1$, and equation (1.22) becomes $\max(1, 1)^2 = 1$.

Case (d): if $T_{i-1} = T_i + 1$ and $T_{i+1} = T_i + 1$ (a local minimum), then $D^{-x}_i T = -1$ and $D^{+x}_i T = 1$, and equation (1.22) becomes $\max(0, 0)^2 = 0 \ne 1$. Thus this case is not feasible, and it is clear that the scheme selects only the concave solution.

Remark 1.21. At this point we need to choose which discretization scheme to use in the subsequent steps of the numerical implementation. We settled on the Rouy-Tourin discretization scheme (1.22), sin
80. ocity F > 0. We interpret the solution T(x, y) of the Eikonal equation as the time needed by the curve $\Gamma$ to evolve and reach the point (x, y) for the first time. We consider the t-level set of T(x, y) as the zero level set of the viscosity solution of the Eikonal equation at time t:

    $\Gamma_t = \{ (x, y) : T(x, y) = t \}$.

Remark 1.18. When F = 1, T is also interpreted as the distance to $\Gamma_0$; the gradient flow of T gives the trajectories of each point.

Property 1.19 (Optimality property). Let curve $\Gamma$ propagate in the domain $\Omega$ with normal velocity F > 0, and let T(x, y) be the viscosity solution of (1.19). Let $\Gamma_t = \{ (x, y) : T(x, y) = t \}$ be the t-level set of the solution T. The trajectory of a point (x, y) of the front coincides with the trajectory starting at $(x_0, y_0) \in \Gamma_0$ and reaching (x, y) in time t.

This way of interpreting T, in light of the last property, is very important for the construction of all numerical schemes. Since T is the viscosity solution, the trajectory of a point (x, y) is the physically correct trajectory among all the possible ones starting from $(x_0, y_0)$.

1.2 Numerical Approximations. In this section we present a numerical scheme to compute the viscosity solution of the Eikonal equation (1.1) under the assumption that F(x) > 0 on $\Omega$. It is shown in [26] that the first-arrival travel-time field is a viscosity solution of the Eikonal equation. Numerical methods utilizing upwind finite-difference schemes manage
81. olution has been found. In this respect, we consider our solution approximation to be accurate when it changes within some imposed tolerance after two consecutive iterations, i.e.,

    error $= \max_{i,j} |T^{n+1}_{ij} - T^n_{ij}| < \varepsilon$.

We define the tolerance $\varepsilon$ as a function of the number of rows and columns, i.e., of N and M, and more precisely as a function of the discretization step h. If error $> \varepsilon$ then we keep iterating, trying to decrease the error; otherwise we stop the algorithm, then save and display the results. In practice we observe that even when T at all the nodes has been computed, the algorithm keeps iterating, trying to minimize the global error as much as possible. This means that the condition that the error be smaller than a certain tolerance is not enough on its own. Let us apply the algorithm with 9 CPUs on a 6x6 image with two starting points at the top-edge corners and analyze what happens to the error after a certain iteration. The algorithm reaches convergence after 4 iterations, but the iterative process continues for 3 more iterations until the error is zero. As Figure 4.18 shows, after the fourth iteration the error starts oscillating, and in fact the information starts propagating from the boundaries back into the domain; we call this process the back propagation of the error. For a better understanding of the error back propagation, let us consider a bigger image (see Figure 4.19). In Figure 4.19(a) we can see how the level lines intersect. In Figure 4.19(b), rema
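The stopping test above amounts to a max-norm comparison of two successive iterates, which could be coded as follows. The concrete tolerance eps = h*h is an assumption for illustration; the thesis only states that eps is some function of h.

    #include <math.h>

    /* Return 1 when the largest change between two iterates is below
       the tolerance; npts is the total number of grid points. */
    int converged(const double *Tnew, const double *Told, int npts, double h) {
        double err = 0.0;
        for (int k = 0; k < npts; k++)
            err = fmax(err, fabs(Tnew[k] - Told[k]));
        return err < h * h;   /* assumed tolerance eps(h) = h^2 */
    }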
82. omputation based on ghost points; Fast Sweeping Boundary Update; Flag update on the sub-domains; Main Function; Structure Module Header; List Module Header; Flag Module Header; Algorithms Module Header; Display Module Header; String Conversion Module Header.

Abstract. Fast Marching represents a very efficient technique for solving front-propagation problems, which can be formulated as partial differential equations with Dirichlet boundary conditions, called the Eikonal equation:

    $F(x)\,|\nabla T(x)| = 1$,  $x \in \Omega$;    $T(x) = 0$,  $x \in \Gamma$,

where $\Omega$ is a domain in $\mathbb{R}^n$ and $\Gamma$ is the initial position of a curve evolving with normal velocity F > 0. Fast Marching Methods are a necessary step in Level Set Methods, which are widely used today in scientific computing. The classical Fast Marching Methods, based on finite differences, are typically sequential. Parallelizing Fast Marching Methods is a step forward for employing the Level Set Methods on supercomputers. The efficiency of the parallel Fast Marching implementation depends on the required amount of communication between sub-domains and on the algorithm's ability to preserve the upwind structure of the numerical scheme during execu
83. on $u = \lim_{\varepsilon \to 0} u_\varepsilon$, where $u_\varepsilon \in C^2(\Omega)$ is the classical solution of the perturbed problem (1.6), if $u_\varepsilon$ exists and converges locally uniformly to some continuous function u. If $\varphi$ is bounded and uniformly continuous, then u is the unique viscosity solution of (1.5). One can extend the notion of viscosity solution of (1.5) to discontinuous $\varphi$ using the lower semi-continuous (l.s.c.) and the upper semi-continuous (u.s.c.) envelopes of the solution [2].

Definition 1.3. For any $x \in \overline\Omega$, define the upper semi-continuous envelope of u as

    $u^*(x) = \limsup_{y \to x} u(y)$

and the lower semi-continuous envelope of u as

    $u_*(x) = \liminf_{y \to x} u(y)$.

Therefore we can rewrite the definition of the viscosity solution.

Definition 1.4 (version II). A locally bounded function u is a viscosity sub-solution of the Hamilton-Jacobi equation (1.5) if, for all $v \in C^1(\overline\Omega)$, at each maximum point $x_0$ of $u^* - v$ we have

    $\max\big( H(x_0, Dv(x_0)),\ u^*(x_0) - \varphi(x_0) \big) \le 0$,

and a viscosity super-solution of the Hamilton-Jacobi equation (1.5) if, for all $v \in C^1(\overline\Omega)$, at each minimum point $x_0$ of $u_* - v$ we have

    $\max\big( H(x_0, Dv(x_0)),\ u_*(x_0) - \varphi(x_0) \big) \ge 0$.

Remark 1.5. Then u is a viscosity solution of (1.5) if u is both a sub-solution and a super-solution.

Remark 1.6. Definition 1.4 is an extension of the Crandall-Lions (or Ishii) definition of viscosity solution for continuous Hamiltonians (see [2, pp. 558-577], [5]). Definition 1.1 is a particular case of Definition 1.4 for con
84. optimization problem, he presented a first-order accurate Dijkstra-like algorithm in [27]. Later, Sethian and Osher [18, 22, 23, 24] developed the idea and produced the Fast Marching Methods. For solving equation (1.1), assume that n = 2 and F(x) > 0; all the results obtained in this case can be generalized to the three-dimensional case. The central idea behind the Fast Marching Methods is to systematically construct the solution of (1.1) outward, from the smallest values of T to the largest ones, stepping away from the boundary condition in a downwind direction. The algorithm is initialized by tagging the points of the domain as:
- far-away nodes: $T_{ij} = \infty$ (T has never been calculated);
- accepted nodes: $T_{ij} = 0$ is given;
- narrow-band nodes: these are the neighbors of the accepted points.

As can be seen in Figure 2.2, the yellow-orange region, representing the narrow-band points, separates the accepted nodes from the far-away nodes. [Figure 2.2: the accepted values (upwind side), the narrow-band values (downwind side), and the far-away values.] FIGURE 2.2. Far-away, narrow-band, and accepted nodes. The algorithm mimics the front evolution step by step:
- sweep the front ahead by considering the narrow-band points;
- march this narrow band forward, freezing the values of the existing points and bringing new points into the narrow band.

The key is in selecting which grid point of the narrow band to update. The algorithm will stop when
85. oximation for our data set with a green line. Analyzing the graphs, we can see how the execution time of the Fast Marching algorithm decreases when we increase the number of CPUs. In other words, if we double the number of processors, the execution time drops to almost half of the initial one; this brings us to an almost linear scalability. To compare the Ordered Overlap approach with the Fast Sweeping approach, we compute the slope of the linear least-squares approximation, as shown in Figure 4.28; we conclude that the Fast Sweeping approach is faster than the Ordered Overlap approach. During the numerical experiments with a small number of processors we cannot say clearly which algorithm is faster: factors such as the processor decomposition, the number of CPUs (perfect square or not), the type of the image, and the number of starting points influence the outcome.

[Figure: log-log plots of the L error versus N for the Ordered Overlap strategy; fitted slopes 1.2661 for the worst case (a) and 1.0094 for the normal case (b).] FIGURE 4.24. L error, Ordered Overlap strategy. [Figure: the corresponding L error plots for the Fast Sweeping strategy; fitted slope 0.41863 in the worst case (a), with the normal case (b) following.]
86. rd and backward connections between the current node and its preceding node, and replace them by a forward connection between the previous and next elements of the current node (the dashed blue line in Figure 4.2(a)). Secondly, we do the same thing for the forward and backward connections between the current node and its succeeding node, and replace them by a backward connection (the dashed red line in Figure 4.2(a)) between the following and the preceding nodes of the current node. The resulting list is one node shorter and has the backward and forward pointers correctly connected; see Figure 4.2(b). [Figure 4.2: (a) deleting the current node from the list; (b) the list after deletion.] FIGURE 4.2. Deletion from the double-chained list.

Algorithm 9 (Delete the current node from the list), DeleteList(node). Input: node. Output: the list with node $X_{ij}$ removed.
    current->previous->next = current->next;
    current->next->previous = current->previous;

(A C rendering of this unlink operation follows at the end of this passage.) The insertion operation can be performed in two ways:
- insert after the current node, described graphically in Figure 4.3 and in pseudo-code in Algorithm 10;
- insert before the current node, described graphically in Figure 4.4 and in pseudo-code in Algorithm 11.

Since both procedures are performed similarly, we only explain how the insert-after-current-position works. First we cut off the backward and forward links between the current node and its successor (see Figure 4.3(a)). Secondly, as p
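Algorithm 9 in C, using the ListNode structure sketched earlier; since head and tail are sentinels, both neighbors of an interior node always exist, so no null checks are needed.

    void delete_node(ListNode *current) {
        current->previous->next = current->next;       /* bypass forward   */
        current->next->previous = current->previous;   /* bypass backward  */
        /* the caller may free(current) or re-insert it elsewhere */
    }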
87. re the third quadrant is the upwind side of our domain, and show how the Rouy-Tourin algorithm satisfies the upwinding. All the other possible situations can be reduced [24] to this case, based on rotation and symmetry of the domain. Evaluating

    $D^{-x}_{ij}T = \dfrac{T_{ij} - T_{i-1,j}}{\Delta x} \ge 0$,  $D^{+x}_{ij}T \ge 0$,  $D^{-y}_{ij}T = \dfrac{T_{ij} - T_{i,j-1}}{\Delta y} \ge 0$,  $D^{+y}_{ij}T \ge 0$,

and plugging them into equation (1.24), we obtain

    $\left(\dfrac{T_{ij} - T_{i-1,j}}{\Delta x}\right)^2 + \left(\dfrac{T_{ij} - T_{i,j-1}}{\Delta y}\right)^2 = \dfrac{1}{F_{ij}^2}$,

which is the quadratic equation that we need to solve. During the algorithm's execution this computation is done many times, for each node in the domain, until convergence is achieved. Basically, the value at the grid point $T_{ij}$ can be computed exactly using only T at its upwind neighbors, i.e., $T_{i-1,j}$ and $T_{i,j-1}$. We consider $T_{ij}$ to be computed exactly if and only if the new $T_{ij}$ is smaller than the old one.

Remark 1.22. Since during the algorithm's execution, at each iteration, we recompute T at every point of the domain, we have a lot of useless computations: T changes only at a few grid points, and the algorithm does not take this shortcut into consideration.

To illustrate the Rouy-Tourin algorithm, we consider a rectangular domain with the boundary condition imposed in two opposite corners. In the initialization step, if $X_{ij} \in \Gamma$ (a point where we can approximate the initial curve), then T
88. re 1.1. The function F may depend on factors such as curvature, normal direction, shape and position of the front, or shape-independent properties [23]. Assume that the curve moves outwards from the domain, i.e., F > 0, and that the tangential directions of the motion are ignored. In all that follows we consider that F depends on the position of the front and not on curvature. [Figure: a closed curve in $\Omega$ propagating with speed F in the normal direction.] FIGURE 1.1. Curve propagation with speed F in the normal direction.

In order to characterize the position of the expanding front, we can compute its first arrival time T as it crosses each point (x, y) in the domain. Therefore we define

    $T(x, y) = \inf \{ t : (x, y) \in \Gamma_t \}$.

Formally, in one dimension we have distance = rate x time, so we can write the equation for the arrival function T as

    $T = \int \dfrac{1}{F}\,dx$.

In higher dimensions, the time T of first arrival is the solution of a boundary-value problem known as the Eikonal equation:

    $F(x)\,|\nabla T(x)| = 1$,  $x \in \Omega$;    $T(x) = 0$,  $x \in \Gamma$,    (1.1)

where $\Omega$ is a domain in $\mathbb{R}^n$, $\Gamma$ is the initial position of a curve evolving with normal velocity F, $\nabla$ denotes the gradient, and $|\cdot|$ is the Euclidean norm. We want to show how the time of first arrival differs from the curve evolution. For illustration purposes, consider the propagation of a curve with normal velocity F = 1, as presented in Figure 1.2. [Figure 1.2: (a) the swallowtail developed by the evolving curve; (b) the physically correct evolution.] FIGURE 1.2. Curve propagating with F = 1. In Figure 1.2(a), th
Two sweeps are enough to compute the solution correctly. Thus $T$ at a grid point $X_i$ can be computed exactly from either its left or its right neighbor:

\[ T(X_i) = \min(T_{i-1}, T_{i+1}) + h. \]

Using the Godunov discretization scheme (1.26), we obtain the continuous viscosity solution drawn in Figure 1.9(c). Notice that for $i < \frac{n}{2}$, $T$ is correctly computed based on its left neighbors during the first sweep, while for $i > \frac{n}{2}$, $T$ is correctly computed based on its right neighbors during the second sweep. The Fast Sweeping algorithm is illustrated in Algorithm 2.

Remark 1.23. The first sweep satisfies the upwinding condition for $0 \le i \le \frac{n}{2}$, and the second sweep for $\frac{n}{2} < i \le n$.

The most important point in the algorithm is that the upwind difference scheme used in the discretization enforces that the solution at a grid point be determined by its neighboring values that are smaller [28].

Algorithm 2 Fast Sweeping Algorithm
Consider $S > \operatorname{diam}(\Omega)$.
Initialization: $T = 0$ on Γ; $T = S$ otherwise.
Iteration:
for $i = 1$ to $n$ do
    compute $R_i$, the solution of the local Eikonal equation
end
for $i = n$ down to $1$ do
    compute $P_i$, the solution of the local Eikonal equation
end
$T_i = \min(P_i, R_i)$

Chapter 2
Sequential Fast Marching Methods

2.1 General Idea

The Fast Marching Method is very closely related to Dijkstra's algorithm, a well-known algorithm from the 1950s for computing the shortest path on a network.
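A minimal C sketch of the two sweeps of Algorithm 2 for this one-dimensional example, assuming unit speed ($F \equiv 1$, so the local solution is simply the smaller neighbor plus $h$) and an array T[0..n] initialized as in the algorithm; the function name is an illustrative choice.

    /* One-dimensional Fast Sweeping: two Gauss-Seidel passes in opposite
     * orders; each pass keeps a new value only if it is smaller. */
    void FastSweep1D(double *T, int n, double h)
    {
        int i;
        for (i = 1; i < n; i++)          /* first sweep: left to right  */
            if (T[i - 1] + h < T[i])
                T[i] = T[i - 1] + h;
        for (i = n - 1; i >= 1; i--)     /* second sweep: right to left */
            if (T[i + 1] + h < T[i])
                T[i] = T[i + 1] + h;
    }

Because in one dimension the characteristics only run left or right, the two passes together reproduce $T_i = \min(T_{i-1}, T_{i+1}) + h$ at every interior node, as claimed above.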
Secondly, as presented in Figure 4.3(b), we follow the steps below.

FIGURE 4.3. Insertion after the current node in a double chained list: (a) inserting after the current node; (b) the resulting list.

Algorithm 10 Insert node $X_{i,j}$ after the current node
InsertAfter(node, current)
Input: node, current
Output: insert node $X_{i,j}$ after the current node
    node->previous = current
    node->next = current->next
    current->next->previous = node
    current->next = node

1. Using a backward connection (drawn in red), we link our new node to the current one.
2. Using a forward connection (blue color), we link our new node to the node succeeding the current node. At this step we have reconstructed half of the links, and our new node is connected backward and forward in the list.
3. Using a backward connection (red color), we link the node succeeding the current node to our new node.
4. Using a forward connection (blue color), we link the current node to our new node.

The resulting list is one element longer, and it has all the links correctly established, as shown in Figure 4.3(b).

FIGURE 4.4. Insertion before the current node in a double chained list: (a) inserting before the current node; (b) the resulting list.

Algorithm 11 Insert node $X_{i,j}$ before the current node
InsertBefore(node, current)
Input: node, current
Output: insert node $X_{i,j}$ before the current node
    node->previous = current->previous
    current->previous->next = node
    node->next = current
    current->previous = node
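In C, with the same hypothetical Node structure as in the deletion sketch, the two routines read as follows; as in Algorithms 10 and 11, 'current' is assumed to have a successor (respectively a predecessor), which sentinel head and tail elements guarantee.

    /* Algorithm 10: splice 'node' in between 'current' and its successor.
     * Note the order: current->next is read before it is overwritten. */
    void InsertAfter(Node *node, Node *current)
    {
        node->previous = current;           /* step 1: backward link (red)    */
        node->next = current->next;         /* step 2: forward link (blue)    */
        current->next->previous = node;     /* step 3: successor points back  */
        current->next = node;               /* step 4: current points forward */
    }

    /* Algorithm 11: splice 'node' in between 'current' and its predecessor. */
    void InsertBefore(Node *node, Node *current)
    {
        node->previous = current->previous;
        current->previous->next = node;
        node->next = current;
        current->previous = node;
    }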
Dijkstra's algorithm is widespread, from Internet routing applications to navigation system applications. To explain the connection, consider a network with a cost assigned to each node, as in Figure 2.1.

FIGURE 2.1. Dijkstra's algorithm for finding the shortest path from Start to Finish.

The basic idea of Dijkstra's method [7] is as follows:
1. Put the starting point in a set called Accepted.
2. Call the grid points which are one link away from the Start Neighbors.
3. Compute the cost of reaching each of these Neighbors.
4. The smallest cost among these Neighbors must be the correct cost. Remove that point from Neighbors, call it Accepted, and return to step 2 until all points are labeled Accepted.

The algorithm orders the way in which the points are accepted, from the known costs (the starting point) all the way to the finish. The method is a one-pass method, each point being touched essentially only once. Note that the method is not guaranteed to converge to the optimal solution of the underlying continuous problem. The Fast Marching Methods use upwind difference operators to approximate the gradient, but retain the Dijkstra idea of a one-pass algorithm. Tsitsiklis, addressing a trajectory optimization problem, was the first to develop a Dijkstra-like method for solving the Eikonal equation.
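A compact C sketch of these four steps on a small dense network; the linear scan for the cheapest unaccepted point stands in for the priority queue used in efficient implementations, and the graph representation (a cost matrix with DBL_MAX marking absent links) is an illustrative choice.

    #include <float.h>

    #define NODES 16                        /* illustrative network size */

    /* edge[u][v]: cost of the link u -> v, or DBL_MAX if absent.
     * On return, cost[v] holds the cheapest cost of reaching v from start. */
    void Dijkstra(double edge[NODES][NODES], int start, double cost[NODES])
    {
        int accepted[NODES] = {0};
        int u, v, it;

        for (v = 0; v < NODES; v++)
            cost[v] = DBL_MAX;
        cost[start] = 0.0;                       /* step 1 */

        for (it = 0; it < NODES; it++) {
            u = -1;                              /* step 4: smallest tentative cost */
            for (v = 0; v < NODES; v++)
                if (!accepted[v] && (u < 0 || cost[v] < cost[u]))
                    u = v;
            if (u < 0 || cost[u] == DBL_MAX)
                break;                           /* remaining points unreachable */
            accepted[u] = 1;
            for (v = 0; v < NODES; v++)          /* steps 2-3: update the neighbors */
                if (!accepted[v] && edge[u][v] < DBL_MAX
                                 && cost[u] + edge[u][v] < cost[v])
                    cost[v] = cost[u] + edge[u][v];
        }
    }

Each point is accepted exactly once, which is the one-pass property the Fast Marching Method inherits.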
We remark the formation of the computational oscillations at the intersection of the level lines and observe how they influence the shape of the solution.

FIGURE 4.18. Back-propagation of the error: (a) error versus the number of iterations; (b) zoom on the error.

FIGURE 4.19. The solution's shape for the multiple-processor case: (a) view from above; (b) view from below.

To avoid the process of error back-propagation, we use the information contained in the gradient of the error function for stopping the algorithm, and define the residual error as

\[ r = \sum_{i,j} \Bigl|\, |\nabla T_{ij}| - \frac{1}{F_{ij}} \Bigr|. \]

We observe that after convergence the residual error increases as the error starts back-propagating. In conclusion, convergence is decided based on the norm of the change in the solution between successive iterations being less than some tolerance, or on the decrease of the residual norm, i.e.

\[ \|T_{n+1} - T_n\| < \varepsilon \qquad \text{or} \qquad r_{n+1} < r_n. \]

Once the error is small enough and the boundaries of all the sub-domains have been computed, we start monitoring the residual error and try to avoid its increase. If the residual starts to increase, we stop the algorithm and consider convergence reached. When the parallel algorithm reaches convergence, the
Communication between processors and memories takes time and influences the design of the algorithm: we are interested in an algorithm that minimizes the ratio of communication time to computation time for a given computer architecture. To illustrate the efficiency of the parallel Fast Marching algorithm in the distributed-memory implementations, we should analyze:

• the strong scalability, when the problem size is fixed and the number of processors expands, and our goal is to minimize the time to solution;

• the weak scalability, when the problem size and the number of processors expand together, and our goal is to achieve constant time to solution for larger problems.

We say that an algorithm is scalable if its performance improves, after adding hardware (CPUs), proportionally to the capacity added. For this we need to define the test cases that we use:

• the worst-case scenario: there is only one starting node in the whole domain (see Figure 4.20(a)); in this case all the sub-domain computations depend on the ones from the starting sub-domain;

• the best-case scenario: there is a starting node in each sub-domain, i.e. the number of starting nodes is almost equal to the number of sub-domains (see Figure 4.20(b)); in this case each sub-domain influences and is influenced by the computations of its neighboring sub-domains;

• the normal-case scenario: there is more than one starting node in the whole domain, but zero or one starting node
per sub-domain. For illustration purposes we consider the case with two starting nodes in opposite corners of the domain (see Figure 4.20(c)); all the sub-domain computations depend on the two starting sub-domains.

FIGURE 4.20. Test cases: (a) worst case; (b) best case; (c) normal case.

For the weak scalability we choose to study the worst- and normal-case scenarios for the following grid sizes: 30×30, 60×60, 100×100, 120×120, 150×150, 164×164, 180×180, 200×200, 240×240, 256×256, 300×300, and 600×600. For the strong scalability we consider the best-case scenario for an image size of 256×256.

The number of iterations of the parallel algorithm depends on the partition of CPUs along each direction. We come up with an estimate for the number of iterations.

Conjecture 4.3. Consider that the initial domain is decomposed into $n \times m$ sub-domains (CPUs). Then the number of iterations satisfies

\[ \text{iter} \le n + m. \]

Remark 4.4. The number of iterations does not depend on the image size, but it does depend on the processor decomposition. More precisely, the number of iterations is $n + m + 1$, where $n$ and $m$ represent the number of sub-domains in each direction; the additional iteration allows us to take the stopping criterion into account. For example, a $4 \times 6$ decomposition requires at most $4 + 6 + 1 = 11$ iterations, regardless of the grid size. Of course, the number of iterations can be estimated more precisely if we have a perfect-square number of processors and if we know the type of image. For example, considering the grid sizes presented above, through simulations we obtain
based on $X_{i+1,j}$ and $X_{i,j+1}$, graphically drawn as the two upper-right triangles. Another important observation is that the Fast Marching preserves the upwind structure at all its nodes. Since the ghost-point update procedure is performed at the end of the local Fast Marching, the ghost nodes of each sub-domain already satisfy the upwind condition, and we can optimize the algorithm using the order of the ghost nodes.

FIGURE 4.17. A possible situation in the ordered overlap strategy.

At this point we need to distinguish between the ghost points with and without influence on the neighbors. The ghost nodes at which the gradient of T points toward the sub-domain are called sources. More precisely:

Definition 4.1. Let $X_{ij}$ be a ghost node of the sub-domain $\Omega_k$. If the value of T at $X_{ij}$ is smaller than the value of T at its neighbor inside $\Omega_k$ (i.e. the gradient of T points into the sub-domain), then $X_{ij}$ is called a source node.

Let $N_s$ be the number of sources, and let $X_l$, $l \in \{0, \dots, N_s\}$, be a source node such that for any $b < l$ we have $T(X_b) \le T(X_l)$; that is, the sources are ordered increasingly by their values of T. Since our algorithm works edge by edge, using only the source nodes allows us to reduce the number of local Eikonal equations to solve. The optimized version of the ordered overlap algorithm is presented in Algorithm 18.

Remark 4.2. For this strategy we remark the following.

Algorithm 18 Ordered Overlap Algorithm
begin
    find the source nodes
    order the sources in increasing order of T
    for each source, in this order:
        solve the local Eikonal equations at the boundary nodes in $\Omega_k$
        reconstruct the NB (narrow band), A (accepted) and FA (far away) sets
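The first two steps of Algorithm 18 can be sketched in C as follows; the Ghost record, the stored interior-neighbor value, and the qsort-based ordering are illustrative choices rather than the thesis's exact data structures.

    #include <stdlib.h>

    /* Hypothetical record: a ghost node together with the T value
     * of its neighbor inside the sub-domain. */
    typedef struct {
        int i, j;       /* grid indices of the ghost node        */
        double T;       /* arrival time at the ghost node        */
        double Tin;     /* arrival time at the interior neighbor */
    } Ghost;

    /* Definition 4.1: a ghost node is a source when its T is smaller
     * than the value at its interior neighbor. */
    static int IsSource(const Ghost *g)
    {
        return g->T < g->Tin;
    }

    /* Comparator for increasing order of T. */
    static int CmpByT(const void *a, const void *b)
    {
        double ta = ((const Ghost *)a)->T;
        double tb = ((const Ghost *)b)->T;
        return (ta > tb) - (ta < tb);
    }

    /* Compact the sources to the front of the array and sort them
     * increasingly by T; returns the number of sources N_s. */
    int CollectSources(Ghost *ghosts, int n)
    {
        int k, ns = 0;
        for (k = 0; k < n; k++)
            if (IsSource(&ghosts[k]))
                ghosts[ns++] = ghosts[k];
        qsort(ghosts, (size_t)ns, sizeof(Ghost), CmpByT);
        return ns;
    }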
FIGURE 4.28. Best-case scenario: Fast Marching execution time with respect to the number of CPUs.

References

[1] M. Bardi, I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, 1997.
[2] G. Barles, B. Perthame, Discontinuous Solutions of Deterministic Optimal Stopping Time Problems, Mathematical Modelling and Numerical Analysis, Vol. 21, No. 4, 1987, pp. 557-579.
[3] G. Barles, P. E. Souganidis, Convergence of Approximation Schemes for Fully Nonlinear Second Order Equations, Asymptotic Analysis, 4, 1991, pp. 271-283.
[4] M. G. Crandall, L. C. Evans, P. L. Lions, Some Properties of Viscosity Solutions of Hamilton-Jacobi Equations, Trans. Amer. Math. Soc., 282, 1984, pp. 487-502.
[5] M. G. Crandall, P. L. Lions, Viscosity Solutions of Hamilton-Jacobi Equations, Trans. Amer. Math. Soc., 277, 1983, pp. 1-42.
[6] E. Cristiani, M. Falcone, Fast Semi-Lagrangian Schemes for the Eikonal Equation and Applications, SIAM J. Numer. Anal., Vol. 45, No. 5, 2007, pp. 1979-2011.
[7] E. W. Dijkstra, A Note on Two Problems in Connexion with Graphs, Numerische Mathematik, 1, 1959, pp. 269-271.
[8] L. C. Evans, Partial Differential Equations, Graduate Studies in Mathematics, Volume 19, Amer. Math. Soc., 1998.
[9] M. Falcone, T. Giorgi, P. Loreti, Level Sets of Viscosity Solutions: Some Applications to Fronts and Rendez-vous Problems, SIAM
The number of CPUs is a perfect square, i.e. 4, 9, 16, and the image is 256×256 with one and two starting nodes in the corners. Analyzing Figure 4.26 and Figure 4.27, we observe that the execution time of the Fast Marching algorithm decreases with the number of processors: doubling the number of processors reduces the time almost by half.

FIGURE 4.23. Time complexity for 50 CPUs, Ordered Overlap and Fast Sweeping strategies: (a) worst case; (b) normal case. Both panels show the measured execution times of the two approaches together with their linear least-squares approximations, on logarithmic axes.

4.5.2 Strong Scalability

Consider the best-case scenario presented in Figure 4.20(b) for a grid of size 256×256. We expand the number of processors and analyze the execution time of the Fast Marching algorithm. We run the same simulation for both strategies and present the results in Figure 4.28. In both cases we represent graphically the linear least-squares approximation.
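As a reference point for reading these plots, the standard scaling metrics (textbook definitions, not taken from the thesis) for $p$ processors with execution time $T(p)$ are

\[ S(p) = \frac{T(1)}{T(p)} \quad \text{(speedup)}, \qquad E(p) = \frac{S(p)}{p} \quad \text{(parallel efficiency)}; \]

ideal strong scaling corresponds to $S(p) = p$, i.e. halving the time when the processor count doubles, as observed above, while ideal weak scaling keeps $T(p)$ constant as the problem size grows with $p$.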
continuous Hamiltonians. Viscosity solutions have the following properties: consistency, existence, uniqueness and stability. One can show that the viscosity solution satisfies the partial differential equation (1.5) whenever it is differentiable, that it exists, and furthermore that it is unique and stable.

Theorem 1.7 (Consistency of a viscosity solution). Let u be a viscosity solution of problem (1.5) and suppose u is differentiable at some point $x_0 \in \Omega \subset \mathbb{R}^n$. Then

\[ H(x_0, Du(x_0)) = 0. \]

The proof of this theorem can be found in [8], Section 10.1.2, page 545.

Proposition 1.8. If $u \in C(\bar\Omega)$ solves (1.5), and if u is bounded and uniformly continuous, then u is a viscosity solution.

Proof. If v is smooth and $u - v$ has a local maximum at $x_0$, then $Du(x_0) = Dv(x_0)$. This implies that

\[ H(x_0, Dv(x_0)) = H(x_0, Du(x_0)) = 0, \]

since u solves (1.5). A similar equality holds when $u - v$ has a local minimum at $x_0$. □

Theorem 1.9 (Existence and uniqueness). Let Ω be a bounded open subset of $\mathbb{R}^n$, H convex on Ω with $H > 0$; also let $F : \Omega \to \mathbb{R}$, with $F > 0$, be continuous on $\bar\Omega$. Then there exists a unique viscosity solution u of the problem

\[ F(x)\,H(x, \nabla u) = 0, \quad x \in \Omega, \qquad u(x) = 0, \quad x \in \Gamma \subset \partial\Omega. \tag{1.16} \]

Crandall and Lions proved this theorem in [5], pages 24-25, and pointed out assumptions that can cause the uniqueness of the viscosity solution to fail.

Remark 1.10. If the function F vanishes at at least a single point in Ω, then the uniqueness of the viscosity solution may fail.
To address these problems, I develop several parallel strategies which allow fast convergence. The strengths of these approaches are illustrated on a series of benchmarks, which include the study of the convergence, the error estimates, and the proof of the monotonicity and stability of the algorithms.

Introduction

Scientific computing allows scientists and engineers to gain understanding of real-life problems in diverse areas such as cosmology, climate control, computational fluid dynamics, health care, design, and manufacturing. Scientists and engineers develop computer programs that model the behavior of the studied system or phenomenon. Running these programs with multiple and various sets of input parameters helps them better comprehend past behavioral patterns and possibly predict future actions. Typically these models require massive amounts of calculations and are therefore executed on supercomputers or distributed computing platforms.

Nowadays parallel computing is considered a standard tool for scientific computing. It is formally viewed as the simultaneous use of multiple processors to execute a program. Parallel programming is more intricate than its sequential counterpart and demands extra care: for instance, concurrency between tasks introduces several new classes of potential software bugs and requires revisiting many classical algorithms. Communication and synchronization between processors is typically one of the
to produce viscosity solutions of the Eikonal equation. If we consider the numerical methods used to approximate the partial differential equation, then we distinguish the following: the algorithm introduced by Rouy and Tourin [21], the Fast Sweeping algorithm [28], and the Fast Marching Methods proposed by Osher and Sethian [18]. They all compute an approximation of the same solution.

1.2.1 Motivation

Let $\Omega = (-1, 1)$ and consider the one-dimensional Eikonal equation given by

\[ |u'(x)| = F(x), \qquad u(-1) = u(1) = 0. \]

Given the speed function $F(x) > 0$, we want to construct $u(x)$ away from the boundary, and we observe that the solution is not unique: for instance, if $v(x)$ solves the problem, then so does $-v(x)$. We will deal only with nonnegative solutions u.

Consider the following ordinary differential equations and try to solve each problem separately. In order to do the numerical approximation, we partition the x-axis into a collection of grid points $x_i = i\Delta x$ and define $u_i = u(i\Delta x)$ and $F_i = F(i\Delta x)$, where $\Delta x$ is the discretization step and $i = -n, \dots, n$. Using a Taylor expansion and neglecting the remainder, we obtain the discrete system

\[ u_n = 0, \qquad \frac{u_{i+1} - u_i}{\Delta x} = -F_i, \]
\[ u_{-n} = 0, \qquad \frac{u_i - u_{i-1}}{\Delta x} = F_i. \]

Notice that $u_{n-1}$ can be computed exactly from $u_n$, $u_{n-2}$ can be computed exactly from $u_{n-1}$, ..., $u_1$ can be computed exactly from $u_2$, and $u_0$ can be computed exactly from $u_1$; likewise, $u_{-n+1}$ can be computed exactly from $u_{-n}$.
to the ellipticity condition for H in the Hamilton-Jacobi equation $H(x, \nabla u) = 0$ in Ω; that is, for all M, N, if $H(x, M) \le H(x, N)$ for all $x \in \Omega$, then $M \ge N$. This property assures a maximum-principle-type property for the scheme.

Definition 2.8 (Stability). The scheme S is stable if and only if for all $\rho > 0$ there exists a uniformly bounded solution $u^\rho \in L^\infty(\bar\Omega)$ such that (2.1) holds, with a bound independent of ρ. (2.3)

The scheme is stable in the sense that at all points of the space discretization the solution exists and has a lower and an upper bound. Moreover, independently of the discretization step, the information propagates from the smaller values of T to the larger values.

Definition 2.9 (Consistency). The scheme S defined in (2.1) is consistent, i.e. for all $x \in \bar\Omega$ and $\phi \in C_b^\infty(\bar\Omega)$,

\[ \limsup_{\substack{\rho \to 0 \\ y \to x,\ \xi \to 0}} \frac{S(\rho, y, \phi(y) + \xi, \phi + \xi)}{\rho} \le \limsup_{y \to x} H(y, D\phi(y)) \tag{2.4} \]

and

\[ \liminf_{\substack{\rho \to 0 \\ y \to x,\ \xi \to 0}} \frac{S(\rho, y, \phi(y) + \xi, \phi + \xi)}{\rho} \ge \liminf_{y \to x} H(y, D\phi(y)). \tag{2.5} \]

Definition 2.10 (Strong uniqueness). Consider the solution $u \in L^\infty(\bar\Omega)$ of the Hamilton-Jacobi equation

\[ H(x, \nabla u) = 0 \quad \text{in } \Omega. \tag{2.6} \]

If for any $x \in \bar\Omega$ one has $\bar u(x) = \limsup_{y \to x} u(y)$, then $\bar u$ is the upper semi-continuous envelope of the solution of equation (2.6). Similarly, $v(x) = \liminf_{y \to x} u(y)$ is the lower semi-continuous envelope of the solution of equation (2.6).

Lemma 2.11 (Strong Uniqueness Property). In the setting of the previous definition, if u and v are the upper semi-continuous envelope
We deduce that

\[ T_{i-1,j} \le T_{ij} \le T_{i-1,j} + \frac{\Delta x}{F_{ij}} \le T_{i+1,j} + \frac{\Delta x}{F_{ij}}, \]

which implies that

\[ 0 \le T_{i+1,j} - T_{ij} + \frac{\Delta x}{F_{ij}}. \tag{2.20} \]

Therefore, when we compute T at the current iteration from equation (2.18), the quantity under the square root becomes

\[ \frac{2\,\Delta x^2}{F_{ij}^2} - \bigl(T_{ij} - T_{i+1,j}\bigr)^2 \ge 0, \]

since, by (2.20) and by $T_{i+1,j} \le T_{ij}$, we have $|T_{ij} - T_{i+1,j}| \le \Delta x / F_{ij}$.

Up to a rotation of the domain, we can assume without loss of generality that the accepted nodes are $X_{i-1,j}$ and $X_{i+1,j}$, with $T_{i-1,j} \le T_{i+1,j}$. In this case the other two neighbors of $X_{ij}$ are narrow-band or far-away nodes, as presented in Figure 2.7. We should remark that $T_{ij} \le T_{i,j-1}$ and $T_{ij} \le T_{i,j+1}$, since $X_{ij}$ is the narrow-band node with the smallest T.

FIGURE 2.7. Node $X_{ij}$ has the neighbors $X_{i-1,j}$ and $X_{i+1,j}$ accepted, while $X_{i,j-1}$ and $X_{i,j+1}$ are narrow-band or far-away nodes.

For this situation the finite difference operators satisfy

\[ D^{-x}T_{ij} \ge 0, \quad D^{+x}T_{ij} \le 0, \quad D^{-y}T_{ij} \le 0, \quad D^{+y}T_{ij} \ge 0, \]

and equation (1.22) becomes

\[ \max\bigl(D^{-x}T_{ij},\, -D^{+x}T_{ij},\, 0\bigr)^2 = \frac{1}{F_{ij}^2}. \]

Using the finite difference operators $D^{-x}T_{ij}$ and $D^{+x}T_{ij}$, we get

\[ \left(\frac{T_{ij} - T_{i-1,j}}{\Delta x}\right)^2 = \frac{1}{F_{ij}^2} \qquad \text{and} \qquad \left(\frac{T_{ij} - T_{i+1,j}}{\Delta x}\right)^2 = \frac{1}{F_{ij}^2}. \]

Since $T_{ij} \ge T_{i-1,j}$ and $T_{ij} \ge T_{i+1,j}$, this case is equivalent to solving the local Eikonal equation for T in the first and in the second quadrant and taking the minimum of the solutions:

\[ T_{ij} = \min\left(T_{i-1,j} + \frac{\Delta x}{F_{ij}},\; T_{i+1,j} + \frac{\Delta x}{F_{ij}}\right). \]

Repeating the steps of the first case,
We assume that $x_0$ is the unique maximum point of $\bar u - \phi$ in $B(x_0, r)$ for some $r > 0$, such that $\bar u(x_0) = \phi(x_0)$ inside the ball $B(x_0, r)$ and $\phi \ge 2 \sup \|u^\rho\|$ outside the ball $B(x_0, r)$. In some neighborhood of $x_0$, with $x \ne x_0$, we have

\[ \bar u(x) - \phi(x) < 0 = \bar u(x_0) - \phi(x_0) \quad \text{in } B(x_0, r), \text{ for some } r > 0. \]

By Lemma A.3, pg. 577, of Barles and Perthame in [2], there exist sequences $y_n \in \bar\Omega$ and $\rho_n \in \mathbb{R}_+$ such that, for $n \to \infty$,

\[ y_n \to x_0, \qquad \rho_n \to 0, \qquad u^{\rho_n}(y_n) \to \bar u(x_0), \tag{2.9} \]

and $y_n$ is a global maximum point of $u^{\rho_n} - \phi$. Hence, for $y_n \to x_0$ and all $x \in B(x_0, r)$, we have

\[ u^{\rho_n}(x) - \phi(x) \le u^{\rho_n}(y_n) - \phi(y_n). \tag{2.10} \]

In [5], Crandall and Lions remarked that, by replacing $y_n$ with $y_n + \varepsilon_n$, where $\varepsilon_n \to 0$, there exists a sequence, still denoted $y_n$, of local maximum points of $u^{\rho_n} - \phi$ converging to $x_0$. Thus relation (2.10) becomes

\[ u^{\rho_n}(x) - \phi(x) \le u^{\rho_n}(y_n) - \phi(y_n) + \varepsilon_n \quad \text{for all } x. \]

Applying the monotonicity property (2.2) of the scheme S to $u^{\rho_n} \le \phi + \xi_n$ (where $\xi_n = u^{\rho_n}(y_n) - \phi(y_n) + \varepsilon_n$), one gets

\[ S(\rho_n, y_n, \phi(y_n) + \xi_n, \phi + \xi_n) \le S(\rho_n, y_n, u^{\rho_n}(y_n), u^{\rho_n}). \]

By the definition of $u^\rho$, $S(\rho_n, y_n, u^{\rho_n}(y_n), u^{\rho_n}) = 0$, and so

\[ S(\rho_n, y_n, \phi(y_n) + \xi_n, \phi + \xi_n) \le 0. \tag{2.11} \]

Taking the limit of (2.11) and using the consistency property (2.5) of S, one gets

\[ 0 \ge \liminf_{n \to \infty} \frac{S(\rho_n, y_n, \phi(y_n) + \xi_n, \phi + \xi_n)}{\rho_n} \ge \liminf_{\substack{\rho \to 0 \\ y \to x_0}} \frac{S(\rho, y, \phi(y) + \xi, \phi + \xi)}{\rho} \ge \liminf_{y \to x_0} H(y, D\phi(y)). \]

Since $\liminf_{y \to x_0} H(y, D\phi(y)) \le 0$, it follows that $H(x_0, D\phi(x_0)) \le 0$, and $\bar u$ is the sub-solution of (2.6). □
    RemoveNode(node); InsertAfter(node, crt)
else
    (going forward: $v_{prev} < v$)
    crt = node->next
    repeat
        if crt->next->next = NULL then
            RemoveNode(node); InsertAfter(node, crt)
        else
            crt = crt->next
    until $v \le T$[crt->row][crt->col]
    if node = head then
        RemoveNode(node); InsertAfter(node, crt)
    else
        RemoveNode(node); InsertBefore(node, crt)

We use the colors defined in Chapter 2, in the introduction of the Fast Marching algorithm: red for an accepted node, orange for a narrow-band node, and black for a far-away node. A pseudo-code version of the algorithm is presented in Algorithm 12.

FIGURE 4.5. The sort algorithm.

Our target is a parallel implementation. We know that by decomposing the domain into sub-domains we do not deal with a huge number of nodes per sub-domain; therefore the sorting algorithm will not be compelled to use large lists. For our formulation the algorithm is easy to implement and stable. Since our lists are not large, our sorting algorithm is more efficient than most of the other simple $O(n^2)$ algorithms, such as bubble sort. The number of comparisons that a comparison sort algorithm requires increases as a function of $n \log n$; thus the algorithm time is in between $O(n \log n)$ and $O(n^2)$. By n we denote the number of elements in the list.
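A C sketch of this re-positioning for a node whose value of T has increased, covering only the forward pass of the algorithm and reusing the Node type and the DeleteList routine from the earlier sketches; the head-sentinel assumption (so node->previous always exists) and the function name are illustrative choices.

    /* Move 'node' forward so the list stays sorted in increasing T. */
    void Resort(Node *node)
    {
        Node *crt = node->next;
        if (crt == NULL || node->T <= crt->T)
            return;                          /* already in nondecreasing order */
        while (crt->next != NULL && crt->next->T <= node->T)
            crt = crt->next;                 /* last node with T <= node's T   */
        DeleteList(node);                    /* unlink from the old position   */
        node->previous = crt;                /* splice back in after 'crt',    */
        node->next = crt->next;              /* guarding the tail case where   */
        if (crt->next != NULL)               /* crt->next is NULL              */
            crt->next->previous = node;
        crt->next = node;
    }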
and the lower semi-continuous envelope of the solution of (2.6), respectively, then

\[ u - v \le 0 \quad \text{on } \bar\Omega. \tag{2.7} \]

The following theorem is the Barles and Souganidis convergence result, published in [3], pg. 275-276.

Theorem 2.12. Assume that (2.2), (2.3), (2.4), (2.5) and the hypothesis of Lemma 2.11 hold. Then, as ρ → 0, the solution $u^\rho$ of (2.1) converges locally uniformly to the unique continuous viscosity solution u of (2.6).

Proof. The proof follows the steps and arguments presented in [3] pg. 275-276, [2] pg. 576-577, and [5]. Consider $u^\rho$ satisfying (2.1) and let $\bar u, \underline{u} \in L^\infty(\bar\Omega)$ be defined as

\[ \bar u(x) = \limsup_{\substack{\rho \to 0 \\ y \to x}} u^\rho(y), \qquad \underline{u}(x) = \liminf_{\substack{\rho \to 0 \\ y \to x}} u^\rho(y). \tag{2.8} \]

Assume that $\bar u$, $\underline{u}$ are sub- and super- viscosity solutions of (2.6). Then, based on their definitions, $\bar u \ge \underline{u}$. Since $\bar u$ is the upper semi-continuous (usc) and $\underline{u}$ is the lower semi-continuous (lsc) envelope of the solution of (2.1), Lemma 2.11 implies that $\bar u \le \underline{u}$. Hence we can write $\bar u = \underline{u}$. Since the upper semi-continuous solution $\bar u$ and the lower semi-continuous solution $\underline{u}$ are equal, Lemma 2.11 assures that u is the unique continuous solution of the Hamilton-Jacobi equation (2.6). Using (2.8), one gets that $u^\rho$ converges locally uniformly to u.

Now let us prove that $\bar u$, $\underline{u}$ are the viscosity sub-solution and super-solution of (2.6). We only present the case of $\bar u$ being the sub-solution of (2.6). Let $x_0$ be a local maximum of $\bar u - \phi$ on $\bar\Omega$, for some $\phi \in C_b^\infty(\bar\Omega)$.
However, it may be used to create vectors and matrices with the proper layout. Each DA object defines the layout of two vectors: a distributed global vector and a local vector that includes room for the appropriate ghost points.

FIGURE 4.15. Natural ordering and PETSc ordering for a Distributed Array on four processors (a 5 × 6 grid of 30 points):

(a) Natural ordering:
    Processor 2        Processor 3
    26 27 28 29 30
    21 22 23 24 25
    16 17 18 19 20
    11 12 13 14 15
     6  7  8  9 10
     1  2  3  4  5
    Processor 0        Processor 1

(b) PETSc ordering:
    Processor 2        Processor 3
    22 23 24 | 29 30
    19 20 21 | 27 28
    16 17 18 | 25 26
    ---------+------
     7  8  9 | 14 15
     4  5  6 | 12 13
     1  2  3 | 10 11
    Processor 0        Processor 1

A distributed-array communication data structure in two dimensions can be created using the command

    DACreate2d(MPI_Comm comm, DAPeriodicType wrap, DAStencilType st,
               int M, int N, int m, int n, int dof, int s,
               int lx[], int ly[], DA *da);

where M and N are the global numbers of grid points in each direction, while m and n denote the process partition in each direction, m × n being the number of processes specified in the MPI communicator comm [19]. Instead of specifying the process layout, one may use PETSC_DECIDE for m and n; PETSc will then determine the partition using MPI. The type of periodicity of the array is specified by wrap, dof indicates the number of degrees of freedom at each array point, and s is the stencil width, i.e. the width of the ghost-point region. The optional arrays lx and ly may contain the number of nodes along each direction owned by each process.
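A minimal usage sketch follows, written against the PETSc 2.x-era API whose signature is quoted above (later PETSc versions renamed DA to DMDA); the grid size is chosen arbitrarily and error checking is omitted for brevity.

    #include "petscda.h"

    int main(int argc, char **argv)
    {
        DA  da;
        Vec Tglobal, Tlocal;

        PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

        /* 256x256 grid, partition chosen by PETSc, 1 dof, ghost width 1 */
        DACreate2d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
                   256, 256, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
                   PETSC_NULL, PETSC_NULL, &da);

        DACreateGlobalVector(da, &Tglobal);  /* distributed solution vector */
        DACreateLocalVector(da, &Tlocal);    /* local part plus ghost points */

        /* refresh the ghost region with values from neighboring sub-domains */
        DAGlobalToLocalBegin(da, Tglobal, INSERT_VALUES, Tlocal);
        DAGlobalToLocalEnd(da, Tglobal, INSERT_VALUES, Tlocal);

        VecDestroy(Tlocal);
        VecDestroy(Tglobal);
        DADestroy(da);
        PetscFinalize();
        return 0;
    }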
We conclude that

\[ T_{i-1,j} \le T_{ij} \le T_{i-1,j} + \frac{\Delta x}{F_{ij}} \qquad \text{and} \qquad T_{i+1,j} \le T_{ij} \le T_{i+1,j} + \frac{\Delta x}{F_{ij}}. \]

Cases 3 and 4, where three or all of the neighbors of $X_{ij}$ are accepted, are similar to the previous cases, since the finite differences take the smallest values in each coordinate direction. □

Remark 2.16. We should remark that the proof of Theorem 2.13 provides a constructive solution for the local Eikonal equation at $X_{ij}$. In the implementation, we notice that it is more efficient to solve the four variants of equations (2.15) and (2.16) and take the minimum of these real solutions.

Convergence of Fast Marching. At each step of the algorithm, the size of the set of accepted nodes grows by 1. We can apply Theorem 2.13 iteratively, using induction on the number of iterations of the algorithm.

During the first iteration, if node $X_{i-1,j}$ is accepted, then its value is known and initialized with $T_{i-1,j} = 0$, and we have

\[ T_{ij} \le T_{i-1,j} + \frac{\Delta x}{F_{ij}}, \]

which satisfies relation (2.14), and therefore Theorem 2.13 holds true.

At the n-th step of the algorithm, the induction hypothesis implies that at each iteration the values of T at the nodes in the narrow band are greater than the ones at the nodes labeled as accepted, and thus Theorem 2.13 can be applied. In conclusion, the upwind nature of the numerical scheme is satisfied, since T at the node labeled as accepted at every iteration is exact, i.e. it cannot be improved on the same grid [22].

Chapter 3
Parallel Fast Marching Methods

In this