Home

A multiscale approach to optimal transport

1. Preprint 2011 J Delon J Salomon and A Sobolevski Fast Transport Optimization for Monge Costs on the Circle SLAM Journal on Applied Mathematics 70 2010 no 7 2239 2258 R Fletcher Practical methods of optimization John Wiley amp Sons 1987 Y Lipman and I Daubechies Conformal Wasserstein distances comparing sur faces in polynomial time Advances in Mathematics 2011 G Loeper and F Rapetti Numerical solution of the Monge Amp re equation by a Newton s algorithm Comptes Rendus Mathematique 340 2005 no 4 319 324 T Lachand Robert and E Oudet Minimizing within convex bodies using a con vez hull method SIAM Journal on Optimization 16 2006 no 2 368 379 R J McCann A Convexity Principle for Interacting Gases Advances in Math ematics 128 1997 no 1 153 179 P Mullen F De Goes M Desbrun D Cohen Steiner and P Alliez Signing the Unsigned Robust Surface Reconstruction from Raw Pointsets Computer Graphics Forum 29 2010 no 5 1733 1741 P Mullen F Memari De Goes and M Desbrun Hodge Optimized Triangula tions Proceedings of ACM SIGGRAPH 2011 2011 P Machado Manhaes de Castro J Tournois P Alliez and O Devillers Filtering relocations on a Delaunay triangulation Computer Graphics Forum vol 28 Wiley Online Library 2009 pp 1465 1474 G Monge M moire sur la th orie des d blais et de remblais Histoire de Acad mie Royale des Sciences de Paris avec les M moires de M
2. with equal to half the minimum distance between two points in S ensures that T converges to T on a set F with full Lebesgue measure in 10 Q MERIGOT Q Choose a point x in the intersection of the cell Vorg x and of F and consider the sequence qn T p This sequence converges to p and by definition one has n Pn Alp 5 le Pll lle pall wp w Pn Using the uniform convergence of n to one deduces that w qn converges to w p We now prove by contradiction that if pn converges to p then lim sup Wn pn is at most w p Suppose not taking subsequence if necessary the limit of wn pn is larger than w p by a positive 7 For every point x in Q we use the triangle inequality to get I 11 e Pall wa Pn lt dull walan Tn with rn n Pall 2D Pn qnl wWn Gn Wn Pn and D is the maximum distance between a point in 2 and a point in the ball B 0 L defined by the assumption iii Using the convergence of pn and qn to the same point p and the assumption on the limits of wn pn and wn qn we obtain limp o rn lt 7 Combining this with Eq 11 shows that the cell of pn in the power diagram Vors does not intersect Q for n large enough This contradicts the hypothesis that p is a Dirac with positive mass in the support of vn The proof that lim inf wn pn is larger than w p is very similar but a bit longer as it requires the use
3. It relies on the following result cf Vil09 Corollary 5 23 Fact Let vn be a sequence of measures converging to u and T resp T denotes the optimal optimal transport map between u and vn resp v Then for every positive 10 lim p Ac T Ta 0 n 00 where A T Tn x R4 T x T x gt e Proof of Proposition 2 For almost every x in Q the gradient V a T 2 is included in the support of v hence in the ball B 0 L by iii The same holds for T so that the inequality T T lt 2L holds for almost every x in Q For every p21 EE I 2 Ta x P ade z IT Ty p w de Q Ae T Tn i T x T 2 o e aa A T Tn lt e 2L Pu Ae T Ta Using Eq 10 we obtain the convergence of T to T in the L sense Thanks to the the assumptions i and ii we can apply the Poincar inequality on the domain Q u to the zero mean potentials and to get l n blir f l n leloa lt const Q p Ta Tllte u In other words n converges to in the L u sense Since the support of all target measures is contained in a ball of size L V n lt L and is L Lipschitz Hence gy also converges uniformly to on the support of p Proof of Theorem 3 We begin by proving that there for every point p in S there exists a sequence of points qn E Sn converging to p such that wn qn also converges to w p Applying Eq 10
4. construct an interpolation between o and p Given a time t consider the weight vector w tw and the corresponding Power diagram Vorg w Now we define the interpolant p at time t as the only piecewise constant function pp on Q obtained by spreading the mass of Ap on the inter section of the cell Vors w p with Q ie for every point x in Vorg w p define p x Ap area Vors w p An example of this interpolation is presented in Figure 4 A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 15 FIGURE 4 First and second rows An interpolation between a pic ture of G Monge and photograph of B Riemann with N 625 and 15k respectively The intermediary steps are obtained using McCann s displacement interpolation McC97 of the two corre sponding measures which can be computed from the L optimal transport 6 DISCUSSION In this paper we have presented a simple way to increase the efficiency of the convex optimization algorithm introduced in AHA98 to solve the optimal trans port problem We also discussed how our multiscale approach can be used to obtain fast estimation of Wasserstein distances between images This first step suggests that in order to obtain faster computations of optimal transport one has to better understand the geometry of the function For in stance it is currently not possibl
5. convex optimization algo rithms colors are the same as in Fig 2 a b and c Estimation of Wasserstein distance in red resp blue the lower and upper bounds obtained by the multiscale resp original algorithm as a function of time and in green the correct value This minimization problem is equivalent to a weighted k means problem with k n 1 Since it is hopeless to solve this problem exactly we use the standard Lloyd s iterative algorithm to find a good local minimum We initialize the algorithm using a random sample S of n 1 points drawn independently from vg We then apply Lloyd s descent step to Sj to obtain SP stopping when the points do not move more than a given threshold between two successive steps This procedure provides us with the support S1 of our measure We define zz to be the application which maps a point pin S to its nearest neighbor in Se41 The values of Ap 2 1 pes are defined by Eq 5 Initial quantization of the target measure Often the target measure is not a discrete measure but a measure vz with density Q R such as a grayscale image In this case we apply Lloyd s algorithm to the measure p in order to obtain an initial quantization v gt Apo Of the original measure v with a prescribed number of points N pes 5 RESULTS We will use the following datasets in our experiments We denote by AU the uniform probability measure on the square AS 0 A
6. cost of transporting every bit of u onto v supposing that moving an infinitesimal amount du x from z to y is equal to a y du z Note that our definition above is not symmetric as it requires the source measure to have a density However this restriction can be leveraged using the notion of transport plan instead of transport map leading to a definition of Wasserstein distance between any pair of probability measures Vil09 Ch 6 2 2 Power diagrams Let S be a finite set of points in R and w S gt R be a given weight vector The power diagram or weighted Voronoi diagram of S w is a decomposition of the ambient space in a finite number of cells one for each point in S defined by the property that a point x belongs to Vorg p iff for every q in S one has z p w p lt x q w q Note that if the weights are all zero this coincides with the usual Voronoi diagram Given such a decomposition we will consider the application TY which maps every point x in the power cell Vorg p to its center p This map is well defined everywhere except on the boundary of the power cells but since this set has zero Lebesgue measure this has no consequence for us The pushforward of the source measure u by the map TY is a sum of Dirac masses centered at every point p in the set S whose mass is the y mass of the corresponding power cell T3 y u gt w Vors p dp pes In AHA98 the following theorem was proven
7. depending on whether P intersects p or not thus disallowing a more efficient GPU implementation However since the area of P N pij needs to be computed only for pixels containing edges or vertices of P the algorithm we use remains rather efficient Pixels on edges are dealt with while applying Bresenham s algorithm to raster the polygon The coverage of pixels containing vertices of P is obtained simply by computing the intersection of the polygon P with the square p j Convex optimization We tried several approaches for the actual convex opti mization All of these methods use the following rough scheme to construct the sequence of weight vectors wx i Determine a descent direction dy ii Determine a timestep sg and set Wk 1 Wk Skdk Methods to choose the descent direction d include gradient methods where d is simply V w Newton methods for which dp D w V w and quasi Newton methods In quasi Newton methods D w is not computed ex actly but estimated from previous evaluations of the gradients We chose the widely used low storage version of the BFGS scheme Fle87 implemented in C in libLBFGS The timestep sp is determined by a search along the line starting from w with direction dy Here again the literature is very vast as there is a trade off between finding a good step size the best choice would be to minimize the function s gt wk sd and requiring as few functions evaluat
8. level 4 As before we will consider the weight vectors at level as functions from the support Se of ue to R The function that we optimize at step is given by the same formula as in Eq 2 The multiscale descent algorithm is summarized in Algorithm 1 Note that the algorithm does not depend on the choice of the convex optimization scheme L BFGS which we will discuss later in Section 4 Algorithm 1 Multiscale minimization of p wr i 0 for l L 1 to 0 do set weo p we 1 Telp for every p Se k 0 repeat compute we k 1 from we using L BFGS on set Uk 1 t Vlw k k k 1 until vg q gt set We Wek end for In the stopping condition denotes the usual L4 norm where q gt 1 or q 00 In particular V8 w loo ow p u Vors p VO w Y Ap w Vor p pes The first quantity measures the maximum error that has been made by consider ing the weight vector w instead of the optimal one In particular if V w gt minpes Ap then one is sure that all the cells in the power diagram of S w inter sect u non trivially This is important especially for visualization purpose as the existence of cells with zero p mass lead to black spots in the interpolated pictures see Section 5 3 The choice of 1 plays a different role which we describe in the next paragraph 3 3 Computation of Wasserstein distances Simple lower and upper bounds on Wasserste
9. lt The speedup increases as goes to zero i e as the measure AU becomes more concentrated around the lower left corner of the original square S 5 2 Computation of Wasserstein distances We use the approach described in Section 3 3 to obtain lower and upper bounds on the Wasserstein distance between u and v at every step of the algorithm Figure 3 b and 3 c compare the evolution of these two bounds as a function of the runtime of the original and the multiscale algorithm 5 3 Displacement interpolation of images The concept of displacement in terpolation of two probability measures was introduced in McC97 It uses optimal transport maps as a replacement for the linear interpolation p 1 t tv Displacement interpolation can be a useful tool for the interpolation of grayscale image when the gray value of a pixel can be interpreted as a density of some quan tity e g satellite views of clouds preprocessed so that the gray level ranges from black to white depending on the thickness of the cloud We make use of the trans port map computed using the multiscale algorithm Recall that in order to apply this algorithm to a target measure with density 0 Q gt R we had to compute a first quantization of o v pes Apop using Lloyd s algorithm By construction of v and by definition of the optimal weight vector w one has for every point p in S is o a dz p p x da Vor s p NQ Vors w p NQ This suggests a way to
10. of the zero mean assumption for the convex potentials These two bounds for lim inf sup wn pn conclude the proof 4 IMPLEMENTATION In the first paragraph we give some details our implementation of the convex optimization method proposed in AHA98 for a fixed target measure Then we ex plain how we compute the hierarchical decomposition of the target measure needed for the multiscale algorithm 4 1 For a fixed target measure Solving optimal transport between a proba bility measure u with density p and a discrete measure v Spee Apop amounts to finding the minimum of the convex function given in Theorem 2 iii dw xew J de l viper pes 2 op f with w p x dg w p Vor p Pl We need three ingredients to achieve this goal an efficient and robust implemen tation of power diagram computation robust numerical integration functions and convex optimization software In this paragraph we discuss and motivate our choices regarding these three aspects Power diagrams We use the Regular_triangulation_2 package from CGAL cga It is tempting to try to avoid recomputing the whole power diagram for every evaluation of the function by using the same approach that was used in MMdCTAD09 to maintain the Delaunay triangulation However as shown in Figure 3 a the topology of the power diagram keeps changing until the very last steps of the optimization thus discarding this appro
11. 512 x 0 4512 For A 1 we will simply write U and S By L we denote is the standard grayscale picture of Lena on the square S Given a measure with density D we will denote by Dy a quantization of this measure with N points obtained using Lloyd s algorithm The decomposition of measures we work with are all obtained with the same parameters 14 Q MERIGOT source target original multiscale speedup U U10000 577s 143s 4 0 U U10000 1180s 189s 6 2 gU U10000 1844s 241s 7 6 U L40000 216s 52s 4 1 TABLE 1 Running time of the original and multiscale algorithm to find a weight vector such that V w lt 107 5 levels in the decomposition including the original one and level being made of N 5 Dirac masses 5 1 Comparisons with the original approach In Figure 2 b and 2 c we show the evolution of the distance between the weight vector obtained at a given time and the optimal one w This optimal weight vector had been previ ously obtained by running the algorithm with a target accuracy of V w o lt 107 The advantage of our multiscale method over the original convex optimization is especially important when the source and target measure are far from each other Table 1 compares the running time of the original and multiresolution algorithms to compute a weight vector adapted to the problem of optimally transporting AU to U1000 with a given accuracy V w
12. A MULTISCALE APPROACH TO OPTIMAL TRANSPORT QUENTIN MERIGOT LABORATOIRE JEAN KUNTZMANN UNIVERSITE DE GRENOBLE AND CNRS ABSTRACT In this paper we propose an improvement of an algorithm of Au renhammer Hoffmann and Aronov to find a least square matching between a probability density and finite set of sites with mass constraints in the Eu clidean plane Our algorithm exploits the multiscale nature of this optimal transport problem We iteratively simplify the target using Lloyd s algorithm and use the solution of the simplified problem as a rough initial solution to the more complex one This approach allows for fast estimation of distances be tween measures related to optimal transport known as Earth mover or Wasser stein distances We also discuss the implementation of these algorithms and compare the original one to its multiscale counterpart 1 INTRODUCTION Engineer and mathematician Gaspard Monge proposed the following problem Mon81 what is the cheapest way to transport a pile of sand into a hole with minimum cost knowing that moving an individual particle from a position x to another position y has a cost c z y This problem gave birth to the field of optimal transport which has been very vivid in the past twenty years with applications in geometry probability and PDEs see e g Vil09 However Monge s problem was an engineering problem and it is not very sur prising that various form of optimal transport appeare
13. Note that this result as well as Theorem 2 can be also obtained as a consequence of Brenier theorem Bre91 Theorem 1 For any probability measure u with density and a weighted set of points S w the map Ty is an optimal transport map between the measure u and the pushforward measure T y be Consequently 1 2 Wasso m Teln lke vlPo w az pes Vor p A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 5 This theorem gives examples of optimal transport plans and the discrete prob ability measures supported on S that can be written as the pushforward T y bs where w is a set of weights on S As we will see in the next section Theorem 2 it turns out that every probability measure supported on S can be written in this way Said otherwise the optimal transport maps between and a measure supported on S can always be written as Ts for some weight vector w 2 3 An unconstrained convex optimization problem Adapted weight vec tors Let u be a probability measure with density p and v a discrete probabil ity measure supported on a finite set S with v pes Ap Sp A weight vector w S Ron S is called adapted to the couple of measures u v if for every site pin S one has 1 Ap Vor p f 7 te By Theorem 1 if the weigh vector w is adapted to the couple u v then the optimal transport between u and the discrete measure v is given by the map T s Moreover Theorem 2 below asserts that finding a weight ve
14. ach Numerical integration In our C implementation a measure u with density p is represented by an object which can answer the following two queries Given A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 11 a convex polygon P ao N ao and a function f from P to R the class should provide a way to compute 1 the mass of P i e fp p a da 2 the integral of f over P i e fp f x p a de In practice we only use it the second query for the functions f x gt a xoll We developped two different models of measure with density The first one is the uniform measure on a convex polygon R In this case computing the mass of a polygon P amounts to computing the area of the intersec tion PN R of two convex polygons The integral of the squared distance function x x xol over the polygon PNR is computed by triangulating P and summing the integral over each triangle T The integral on T can be obtained in closed form if one denotes by cov T xo the covariance matrix of T with base point xo then z xo da cov T t9 11 cov T 20 22 T The second model corresponds to the density obtained from a grayscale image We assume that the density p is constant on each square pixel p i i 1 x 7 j 1 equal to the value a We then consider 12 oo 5 a j area P N pij i j 13 10ed Y aig area P opia ij Note that it is not possible to simply replace the area of P N p by zero or one
15. ans that the Wasserstein distance Wass2 u 0 is equal to ce Moreover by the reverse triangle inequality Wass2 u v Wasse u 7 lt Wass2 ve Wass2 vg v One then concludes using the previous proposition 3 4 Convergence of Optimal Transport Maps In this paragraph we discuss the soundness of constructing an initial weight vector for the optimal transport problem p ve from an adapted weight vector for the problem 1 ve 1 The result of this section are summarized in the following theorem Note that the definition of zero mean convex potential is given below and is necessary to define uniquely the adapted weight vector without this adapted weight vectors are defined up to an additive constant Theorem 3 Let v and Vn n gt be discrete probability measures supported on finite sets S and S n gt 1 respectively such that lim Wass2 v Vn 0 Let Wn Sn gt R be adapted to u Vn w S R be adapted to u v Suppose that both weight vectors yield zero mean convex potentials see below and that the assumptions of Proposition 2 are satisfied Then for every sequence of points pn Sn converging to a point pin S one has w p lim wn pn Before proving this theorem we need to introduce a few definitions and auxiliary results Convex potential Let v be a probability measure supported on a finite set S and let w denote the weight vector adapted to the optimal transport problem be
16. ath matique etde Physique pour la m me ann e 1781 pp 666 704 J J Mor and D J Thuente Line search algorithms with guaranteed sufficient decrease ACM Transactions on Mathematical Software TOMS 20 1994 no 3 286 307 V Oliker Mathematical aspects of design of beam shaping surfaces in geometrical optics Trends in Nonlinear Analysis Springer Verlag 2003 p 193 Y Rubner C Tomasi and L J Guibas The earth mover s distance as a metric for image retrieval International Journal of Computer Vision 40 2000 no 2 99 121 C Villani Optimal transport old and new Springer Verlag 2009
17. blem u 7e_1 in a hierarchical way 3 1 Decomposition of the target measure A decomposition of the target measure v bars App is a sequence of discrete probability measures ve e gt o such that vo v and ve is supported on a set Se Ve 5 Ap lOp pESe Moreover for every level we are given a transport map between vg and 141 that is a map me from the support Se of ve to the smaller support Se41 of ve with the additional property that for every point p in Se 1 5 Xpe 1 5 Xa t qEn7 p The decomposition that we will consider in practice are constructed using Lloyd s algorithm as explained in Section 4 2 This means that the transport map 7 maps every point pin S to its nearest neighbor in 41 We remark that having access to such a transport map between ve and 174 allows to bound the Wasserstein distance between these two measures By considering the composition of transport maps it is also possible to bound the distance between e g v and vzr Letting 7 mp_1 0 0 To one has 1 2 6 Wassz v vz lt gt gt Allp m p II pes A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 7 3 2 Algorithm We are given the source probability measure u and a decompo sition ve o lt e lt z with L levels of the target measure The goal is to use the solution of the optimal transport problem from p to vep at level 1 to construct the initial weight vector for the optimal transport problem between u and v at
18. ctor adapted to the couple u v amounts to finding a global minimum of the function below thus turning the very constrained original problem minimization among convex maps into an unconstrained convex optimization problem Note that this theorem follows from the discussion in Section 5 of AHA98 2 w Apw p x s w du z 2 w 2 p fasal l p du Theorem 2 Given a measure u with density p on R and v gt following three statements are equivalent sc Apdp the i the power map T realizes an optimal transport between the measures pu and v ii w is adapted to the couple u v iii w is a global minimizer of the convex function We will recall briefly how the value of the gradient of the function can be com puted at a weight vector w Incidentally the steps necessary to this computation almost sketch a complete proof of the theorem Consider the map U w l ag TP vodno which is related to the our function by the equation w pes Apw p Y w The map W is concave as we will show by writing it as an infimum of linear functions Consider any map T from R to the finite set S By definition of the power cell for every point x in Vorg p one has lz pl w p lt z T z w T 2 As a consequence the function Y can be rewritten as U w inf Wr w where Vr w te T x w T 2 du 2 Since the functions Yy all depend linearly on w the
19. d in many applied fields In computer vision distances defined using optimal transport have been used as a way to compare the color histograms of images in RTGOO under the name of Earth mover distance Optimal transport on the circle has been used for transferring the hue of an image to another DSS10 In combination with Lloyd s algorithm op timal transport seems a interesting tool for optimal quantization BSD09 More recently methods inspired by or relying on optimal transport have been proposed as a tool in several parts of geometry processing surface comparison LD11 surface reconstruction from data corrupted with outliers CCSM10 MDGD110 construc tion of optimized primal dual triangulations MMD11 reconstruction with sharp corners and edges 4GCSAD11 Yet the lack of a practical method to compute optimal transports maps except in 1D has hindered the development of many of these applications Even in the simplest planar situation namely with c x y a y there is a lack of reason ably fast and widely usable method able to efficiently compute optimal transport maps 1 1 L optimal transport In the remaining of the paper we will deal with the L optimal transport i e where the cost for moving a certain amount of mass from a point x to a point y is proportional to square of the Euclidean distance z y This model is well understood theoretically and has several nice prop erties such as uniqueness of t
20. e to obtain complexity estimates for this approach because i nothing is known about the shape and size of the basin around the minimizer where Newton s method has quadratic convergence and ii the stability result Theorem 3 is not quantitative Understanding these two problems could open the way to even more efficient computations of optimal transport maps We also believe that this multiscale approach can be useful in the solution of more geometric problems with a similar structure An example of such a prob lem is Minkowski s problem given a set of normals 7 7iy and a set of areas Ai Awn such that 5 A 7 vanishes find a convex polytope whose facets normals are among the 7 and such that the facet corresponding to has an area of ex actly This problem has a similar multiscale structure as optimal transport and can be also solved by minimizing a convex functional LRO06 and would probably benefit from a multiscale approach A second example is the problem of designing a reflector antenna with prescribed image measure at infinity which can also be formally cast as an optimal transport problem Section 4 2 5 in O1i03 Acknowledgements ANR grant GIGA ANR 09 BLAN 0331 01 and Universit Grenoble I MSTIC grant GEONOR REFERENCES AHA98 F Aurenhammer F Hoffmann and B Aronov Minkowski type theorems and least squares clustering Algorithmica 20 1998 no 1 61 76 AHT03 S Angenent S Haker and A Tanne
21. erpolation between power cells on the left and Voronoi cells on the right obtained by linearly varying the weights from their value on the left picture to zero Their color vary so that the product of the area of a cell multiplied by its gray level remains constant over time countable family of disjoint measurable subsets The total mass of a measure wis mass j2 u R A measure u with unit total mass is called a probability measure The support of a measure u denoted by spt u is the smallest closed set whose complement has zero measure The optimal transport problem involves two probability measures a source mea sure u and a target measure v We will always suppose that the source measure ju has a density i e there exists a non negative function p on R such that for every measurable subset B of R4 p B pla de On the other hand we will assume that the target measure v is discrete supported on a finite set S of R This means that there exists a family of positive coefficients Ap pes such that for every subset B v B X p pESnB The above formula is equivalent to writing v as the sum ae ApOp Where 6 is the unit Dirac mass at p The integral of a continuous function with respect to these two measures is fga x du x fga O a p x da and fga x dv x Dope ApP P In Section 3 1 we will see how to adapt the proposed method to the case where the source and target measures both have density Tran
22. function WV is concave and is convex Moreover it is easy to check that the values of the functions Y and Vw 6 Q MERIGOT coincide for the weight vector w Consequently their gradient coincide to at that point and a simple computation shows Ow 3 Ow p T p d O 4 Ow p ms E A p oe In particular the second equation shows that the gradient V vanishes at a weight vector w if and only if w satisfies Eq 1 3 MULTISCALE APPROACH FOR MINIMIZING The efficiency of an optimization technique relies on two important choices The most important one is the choice of descent algorithm as it is well known that the difference in efficiency between for instance the first order simple gradient descent algorithm and the second order Newton methods can be tremendous Fle87 The second one is the choice of the position from where the optimization is started Its importance shouldn t be disregarded even for convex optimization as the second order convergence in Newton s descent does only happen in a small basin around the global minimizer In this section we introduce our multiscale algorithm for finding a global min imum of We start by building a decomposition of the target measure v i e a sequence of discrete measures vo V V1 Vg that are simpler and simpler as L increases The level of the decomposition is then used to construct a good initial weight vector for the optimal transport pro
23. he solution that follow from strict concavity of the cost Numerical schemes have been proposed by Brenier Benamou BBO00 Loeper LRO5 and Angenent Haker Tannenbaum AHT03 to solve L optimal transport However numerical instabilities makes them difficult to use for general problems 1 2 Q MERIGOT for instance LR05 requires a lower bound on the density of the source measure while the gradient descent algorithm of AHT03 suffers from a drift effect which produces optimal maps that are not transport plans Another possibility to find optimal transport plans is by discretizing the source and or target measure These discrete approach include linear programming the Hungarian method and a vari ant known as Bertsekas auction algorithm Ber88 These methods work for general cost functions and are often unable to take advantage of the geometric simplifica tions that occur when working specifically with the squared Euclidean metric A promising approach both from the practical and the theoretical point of view has been proposed in AHA98 In this approach the source measure has a density p while the target measure is given by a sum of Dirac masses supported on a finite set S The fact that the source measure has a density ensures the existence and uniqueness of the optimal transport map In this case solving the optimal transport problem amounts to finding a weight vector wp pes such that the power diagram of S w has the following p
24. in distance can be obtained at every step of the multiscale algorithm using the fact that V w 1 corresponds to the twice amount mass that has been misaffected This follows from the following proposition Proposition 1 Let w be the a weight vector on S and consider the image measure Di Ts 4p u Then Wass9 v 7 lt D x V w 2 7 where D is the diameter of the support S of v Proof By definition both this measure vy and the target measure v are supported on the same set S Moreover m VO w 1 gt gt pes Ap i p x da Vor p 8 Q MERIGOT corresponds to the amount of mass that has been mistransported The cost of transporting this mass back at the right place in S is at most Vv mD where D diam S As a consequence of this proposition stopping Algorithm 1 at level with weight vector we yields the following estimation of Wass2 j1 v wassa inr f le Torup 7 lt D x V w i Wasse v ve Said otherwise if one wants to compute the Wasserstein distance between u and v up to a certain error it is not necessary to consider the levels of the decomposition below the first level Z such that Wass2 ve v lt Note that this quantity can be estimated thanks to Eq 6 The effectiveness of this approach is discussed in Section 5 2 Proof of Eq 7 By Theorem 1 the map Ts w is an optimal transport between the measure u and y Ts we y H This me
25. ions as possible recall that in our case a function evaluation requires the construction of a complete Power diagram Figure 2 a shows that gradient descent methods are outperformed by quasi Newton ones regardless of the choice of line search It also shows that the choice of line search method is not as important barring the fixed step scheme For all 12 Q MERIGOT 100 X I l 11000 i 10 he 10 s 1 NS 0 1 iN g N lie S ve 0 001 a Gradient descent vs quasi Newton b Single vs multi scale U L3000 U L3000 we w at step k lwr w at time t l l a 1000 IS 100 10 1 ua 0 1 0 01 wg wl at time t c Single vs multi scale U L3000 FIGURE 2 Speed of convergence measured by the L distance be tween the weight vector at a given time step and the optimal one a Comparison of simple convex optimization algorithms gradi ent descent red with fixed step solid or strong Wolfe line search dashed and low storage BFGS algorithm blue with strong Wolfe solid or Mor Thuente line search dashed b and c Comparison between the original algorithm of AHA98 red and the multiscale one blue remaining experiments we use the low storage BFGS method with Mor Thuente line search MT94 4 2 Decomposition of the target measure Suppose for now that the measure v is discrete we will explain in the next paragraph h
26. nbaum Minimizing flows for the monge kantorovich problem SIAM journal on mathematical analysis 35 2003 no 1 61 97 16 BB00 Ber88 Bos10 Bre91 BSD09 CCSM10 cga aGCSAD11 DSS10 Fle87 LD11 LR05 LROO6 McC97 MDGD 10 MMD11 MMdCTAD09 Mon81 MT94 Oli03 RTG00 Vilo9 Q MERIGOT J D Benamou and Y Brenier A computational fluid mechanics solution to the Monge Kantorovich mass transfer problem Numerische Mathematik 84 2000 no 3 375 393 D P Bertsekas The auction algorithm A distributed relaxation method for the assignment problem Annals of Operations Research 14 1988 no 1 105 123 D Bosc Numerical approximation of optimal transport maps Preprint 2010 Y Brenier Polar factorization and monotone rearrangement of vector valued functions Communications on pure and applied mathematics 44 1991 no 4 375 417 Michael Balzer Thomas Schl mer and Oliver Deussen Capacity constrained point distributions a variant of Lloyd s method ACM Trans Graph 28 2009 86 1 86 8 F Chazal D Cohen Steiner and Q M rigot Geometric inference for probability measures Foundation of Computational Mathematics 2010 to appear CcaL Computational Geometry Algorithms Library http www cgal org F de Goes D Cohen Steiner P Alliez and M Desbrun An optimal transport approach to robust reconstruction and simplification of 2D shapes
27. o obtain the first discretization v Note that this idea was independently proposed in Bos10 This procedure provides a significant speedup up to an order of magnitude for computing optimal transport plans Moreover at every step of the algorithm it is possible to obtain a lower and upper bound on the Wasserstein distance also known as Earth mover distances RTG00 between the source measure u and the original target measure v Using this approach one can obtain rough estimates of Wasserstein distances between two images with a speedup of up to two order of magnitude over the simple convex optimization approach 2 BACKGROUND We briefly recap the few concepts of measure theory and optimal transport that we use before explaining the relation between the L optimal transport and power diagrams We also recall how optimal transport can be turned into an unconstrained convex optimization problem 2 1 Measure theory and Optimal transport A non negative measure u on the space R is a map from measurable subsets of R to a non negative numbers which is additive in the sense that u Uien Bi 5 Bi whenever B is a A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 3 FIGURE 1 Interpolation between the standard pictures Photo graph and Peppers obtained with our algorithm see Section 5 3 The target image was quantized with 625 Dirac masses for the first row and 15625 Dirac masses for the second row The cells in these pictures are int
28. ow to convert an image to such a measure From this measure we construct a sequence of discrete probability measures ve with Vg 5 Xp tOp pESe such that v9 v and that the number of points of the support of ve decreases as increases The parameters of our algorithm are the number L of levels in the decomposition and for each level the number of points n in the support of the measure v In practice we found that choosing n n 0 k with k 5 usually provides good results Lloyd s algorithm Theorem 3 suggests that if we want to be able to construct a good initial weight vector for the problem u ve from a weight vector adapted to u ve41 we need to have ve as close as possible to vg in the Wasserstein sense Given the constraints that ve is supported on n l 1 points this means Ver arg min Wass2 D ve spt 7 lt n 1 A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 13 lt 0 100 200 ae 108 205 150 Q 100 ti B B 8 100 2 9 a s 3 h 50 E L al xe 70 K 0 a Number of active sites U L3000 b Bounds on Wasserstein distance U L3000 Os 10s 20s 30s 400 he 200 F ma c Bounds on Wasserstein distance 4U L3000 Wasso j1 v at time t FIGURE 3 a Percentage of points in the support of the tar get measure whose Power cell intersects the support of the source measure during the execution of various
29. roperty for every point s in S the proportion of the mass of p contained in the corresponding power cell should be equal to the mass of the Dirac mass at s see Section 2 2 In the same article the authors also introduced a convex function whose minimum is attained at this optimal weight vector thus transforming the optimal transport problem into an unconstrained convex optimization problem on RY where N is the number of points in the set S 1 2 Contributions We revisit the approach of AHA98 with implementation in mind Our main contribution is a multiscale approach for solving the unconstrained convex minimization problem introduced in AHA98 and thus to solve L optimal transport Let us sketch briefly the main idea In order to solve the optimal transport problem between a measure with density u such as a grayscale image and a discrete measure v we build a sequence vo v Vz of simplifications of v using Lloyd s algorithm We start by solving the much easier transport problem between u and the roughest measure vz using standard convex optimization techniques Then we use the solution of this problem to build an initial guess for the optimal transport between u and vz _ We then proceed to convex optimization starting from this guess to solve the optimal transport between u and vz_ This is repeated until we have obtained a solution to the original problem If the original target measure has a density we use Lloyd s algorithm t
30. sport map The pushforward of a measure u by a map T R Rt is an other measure Ty u defined by the equation Ty B u T 1 B for every subset B of R A map T is called a transport map between u and v if the pushforward of u by T is v We denote by II v the set of transport maps between u and v For instance a map T is a transport map between the source measure u and the target measure v described at the beginning of this paragraph if and only if for 4 Q MERIGOT every point p in the support S of v Ay aT pH Pave on Optimal transport maps The cost of a transport map T between the source measure u with density p and the target measure v is defined by AT f le TOPe a je Te Poleae The problem of optimal transport also called Monge s problem is to find a trans port map Topt whose cost is minimal among all transport maps between jz and v i e Topt argmin c T T H p v The non convexity of both the cost function c and of the set of transport plans makes it difficult in general to prove the existence of a minimum However in this specific case where pu has a density and the cost is the squared Euclidean norm the existence follows from Bre91 Wasserstein distance The Wasserstein distance between two probability mea sures u and v u having a density is the square root of the minimal transport cost We denote it by Wasso u v Intuitively the Wasserstein distance measures the minimum global
31. tween u and v Set 8 98l 5 lel mip lie pl w0 9 max alp 5 w p ll where v w gt gt viw denotes the usual Euclidean scalar product From these two formulations it is easy to see that the function is convex and that its gradient Vo coincides with the transport map Tg We call such a function a conver potential for the optimal transport plan Since adding a constant to the A MULTISCALE APPROACH TO OPTIMAL TRANSPORT 9 weight vector or to does not change the transport plan we consider the zero mean convex potential which is uniquely defined by the extra assumption that Jra x p x da 0 Proposition 2 Let u and v be two probability measures u having density p and v supported on a finite set S Let vn be a sequence of probability measures supported on finite sets S s t lim Wasso v 0 Assume that i the support of p is the closure of a connected open set 2 with regular piecewise C boundary ii there exists a positive constant m such that p gt m on Q iii the support of all the measures vn is contained in a fixed ball B 0 L Denote resp n 5 the zero mean convex potential of the optimal transport between u and v resp Vn Then converges to uniformly on Q as n grows to infinity This proposition is similar to Bos10 Theorem 1 but without the requirement that the source and target measure have to be supported on convex sets

A multiscale approach to optimal transport

Contents

Download Pdf Manuals

Related Search

Related Contents