Home
        1 Introduction - USC - University of Southern California
         Contents
1.   A  Helmy  V  Jacobson  and L  Wei  Protocol Independent Multicast Dense  Mode  PIM DM    Protocol Specification  Proposed Experimental RFC  September 1996     A  J  Ballardie  P  F  Francis  and J  Crowcroft  Core Based Trees  In Proceedings of the ACM SIG   COMM  San Francisco  1993       S  Deering  D  Estrin  D  Farinacci  M  Handley  A  Helmy  V  Jacobson  C  Liu  P  Sharma  D  Thaler     and L  Wei  Protocol Independent Multicast   Sparse Mode  PIM SM   Motivation and Architecture   Experimental RFC  October 1996       S  Deering  D Estrin  D  Farrinacci  V  Jacobson  C  Liu  and L  Wei  An Architecture for Wide area    Multicast Routing  In Proceedings of the ACM SIGCOMM    94  London  1994     S  Deering  D  Estrin  D  Farinacci  V  Jacobson  C  Liu  and L  Wei  The PIM Architecture for  Wide Area Multicast Routing  ACM Transactions on Networks  April 1996       D  Estrin  D  Farinacci  A  Helmy  D  Thaler  S  Deering  M  Handley  V  Jacobson  C  Liu  P  Sharma     and L  Wei  Protocol Independent Multicast   Sparse Mode  PIM SM   Protocol Specification  Experi   mental RFC  December 1996       S  Deering  D  Estrin  D  Farinacci  V  Jacobson  C  Liu  L  Wei  P  Sharma  and A  Helmy  Protocol    Independent Multicast Sparse Mode  PIM SM    Protocol Specification  Internet Draft  December 1995   W  Fenner  Internet Group Management Protocol  Version 2  Internet Draft  January 1997     Liming Wei  Thesis Dissertation   Scalable Multicast Routing  Tree Types And Protoco
2.  We emphasize that it is important that this be true  regardless of the distribution of k  In other words  it must be true under any multicast address  allocation scheme  When the number of groups is small  the coefficient of variation is less important  since the RPs will be comparatively lightly loaded    We ran several simulations to verify that o  does indeed approach 0 in the limit  Figure 5 a   shows the results of varying the number of groups on o  among 10 RPs  In this simulation   both multicast group addresses and addresses of RPs were randomly chosen from the available    15                            0 25 0 2  0 18   ae  ae  2 0 2 Bx Z dig a  fe  5  10 7 f a  T ic Pee   amp  0 15    amp  0 14 s  5    042 a  S 5 ye  5 0 1 5 0 1 t    3 3  8  amp  0 08   0 05 Q  DON 0 06    1 1 Pani 0 04 1 1 1 1 1 1 1  100 1000 10000 100000 0 50 100 150 200 250 300 350 400    Groups   RPs in the Domain   a   b     Figure 5  Group balancing effectiveness    address spaces  Each point represents an average over 500 trials  As can be seen  the groups are  indeed balanced as g  gt  oo  Other simulations using sequential multicast group addresses and RP  addresses produced similar results  This is not surprising  since the randomness inherent in the  function f results in a mapping which is relatively independent of the distribution of its inputs    Figure 5 b  shows the results of varying the number of RPs in the RP Set  with the total  number of groups remaining constant at 5000  Again 
3.  When an RP or BSR changes state  i e   becomes reachable or unreachable  data loss may occur   The duration of time during which loss may occur is a function of the convergence time  much as  it is for unicast routing    Occasional RP failure or unreachability is unavoidable  When this happens  receivers using  the shared RP tree may lose data for groups that map to the failed RP  Meanwhile  the RP timer  associated with this RP  at the BSR  times out  and the RP is removed from the RP Set  Up to  this point  the RP Set remains consistent among all routers  When distribution of the new RP   Set starts  the groups that mapped to the failed RP may temporarily use inconsistent RP Sets  if  some routers receive the new RP Set before others  After the distribution is completed  all routers  converge on the new RP Set  When affected groups re join to the new RP  data will be received  again    In this section we enumerate the various network events that can result in data loss  For each  event we formulate an expression for the average convergence time  We use the following terms in  our analysis     e Convergence time     Conv      The time taken for all routers within a domain to converge  on  and join to  a consistent RP Set of reachable RPs  after network changes  This represents  an upper bound on the time taken for group participants    DRs to re join to a single reachable  RP  This time is measured from the point when the RP Set becomes inconsistent or an RP  within the RP 
4.  bootstrap timer  So long as the timer has not expired  the BSR is considered  active  and only messages from a preferred Candidate BSR  or from the active BSR  are  accepted  If the bootstrap timer has expired  this indicates that the stored BSR is no longer  active  and the next bootstrap message received will be accepted  Appendix A presents a  detailed state transition diagram for this mechanism  as well as the timer actions     When a bootstrap message is accepted by a PIM router  the BSR and RP Set information  within that router are updated  and the bootstrap timer restarted  The bootstrap message is then  forwarded out all interfaces except the receiving interface    To achieve better convergence for newly started PIM routers  a Designated Router may send a  bootstrap message carrying RP Set information to new PIM neighbors  In this case  a newly started  PIM router  having no RP Set information  accepts the first bootstrap message it gets  In general   however  bootstrap messages are only sent periodically  and are not triggered  This minimizes the  overhead associated with the mechanism  for detailed analysis of overhead see section 3 2     When a BSR becomes unreachable and bootstrap messages stop  routers continue to use the  stored RP Set until they get new information  This way  multicast data distribution is not disrupted  as long as the RP for the group is still reachable  For further discussion of detailed failure scenarios  see section 3 1     2 4 RP Set pro
5.  default timer values     E RP AddConv   lt  Upper Bound E SetDist   JoinLat    E RPDelConv   lt  150   Upper Bound E RP AddConv    E PartitionConv   lt  330   Upper Bound E RP AddConv    E HealConv   lt  30   Upper Bound  E RP AddConv         A network administrator could use this analysis to determine the maximum desired domain size  given a tolerable convergence time and observed probability of packet loss per link     3 2 Evaluation of control message overhead    Typically  state and bandwidth overheads are considered  For the bootstrap mechanisms  state  overhead is simply proportional to the number of RPs within the RP Set  Bandwidth overhead is    11    Upper Bound on Avg RPAddConv in sec Upper Bound on Avg RPDelConv in sec      sooj 0 60  614  77 70    Upper Bound on Avg PartitionConv in sec Upper Bound on Avg HealConv in sec          N  Number of Nodes in Domain  P   Probability of Packet Loss Per Link    Figure 3  Summary of convergence time    a function of Candidate RP Advertisement and bootstrap messages  Candidate RP  Advertisement  overhead is proportional to the number of candidate RPs  and the unicast distances between can   didate RPs and the BSR  A more complex issue is the overhead due to distribution of bootstrap  messages  The number of RPs in the domain included in the RP Set affects the size of bootstrap  messages  We assume the number of RPs is constant for the remainder of the comparisons     3 2 1 Bootstrap message overhead in steady state    In ste
6.  each point represents an average over 500  trials  From this  we see that o u grows with the number of RPs  However  since the load on each  RP decreases as the number of RPs grows  this imbalance becomes less important     5 Summary and Future Work    Rendezvous based multicast routing protocols are likely to be an important infrastructure technol   ogy for the evolving Internet and therefore it is important to carefully evaluate their robustness  and efficiency  While previous studies described and justified PIM   s use of explicit join mechanism  and the manner in which it builds and maintains distribution trees  this is the first treatment of  the other critical component of the architecture  the bootstrap mechanisms    In this paper  we have presented architectural and mechanistic details of the bootstrap mech   anisms  as well as systematic analysis and evaluation of the robustness and performance of these  mechanisms  Moreover  these bootstrap mechanisms could be readily used with other rendezvous   based multicast protocols  such as CBT    Currently under study are the extensions of the bootstrap mechanisms to a global inter domain  context and questions of multicast group address allocation and state aggregation     References     1  D  Waitzman S  Deering  C  Partridge  Distance Vector Multicast Routing Protocol  November 1988   RFC1075     16     3         6         9      10      11         12      13      14      15      16         17     D  Estrin  D  Farinacci
7.  expires  and the current state is    ElectedBSR     the router orig   inates a bootstrap message  and restarts the RP Set timer at  Bootstrap Period   No  state transition is incurred    This way  the elected BSR originates periodic bootstrap messages every  Bootstrap   Period      2  If a router is not a Candidate BSR      a  The router operates initially in the    AxptAny    state  In such state  a router accepts the  first bootstrap message from the The Reverse Path Forwarding  RPF  neighbor toward  the included BSR  The RPF neighbor in this case is the next hop router en route to the  included BSR     b  If the bootstrap timer expires  and the current state is    AxptPref       where the router  accepts only preferred bootstrap messages  those that carry BSR priority and address  higher than  or equal to     LclIBSR     from the RPF neighbor toward the included BSR   the router transits into the    AxptAny    state    In this case  if an elected BSR becomes unreachable  the routers start accepting boot   strap messages from another Candidate BSR after the bootstrap timer expires  All PIM  routers within a domain converge on the preferred reachable Candidate BSR     Receiving bootstrap message To avoid loops  an RPF check is performed on the included  BSR address  Upon receiving a bootstrap message from the RPF neighbor toward the included  BSR  the following actions are taken     1  If the router is not a Candidate BSR      a  If the current state is    AxptAny     the r
8.  includes the data source address  the multicast group address   the incoming interface from which packets are accepted  the list of outgoing interfaces to which  packets are sent  and various timers and flags  In this case  the wildcard route entry   s data source  is    any source     the incoming interface points toward the RP and the outgoing interfaces point to  the neighboring downstream routers that have sent join messages toward the RP  This state forms  a shared  RP rooted  distribution tree that reaches all group members    When a data source first sends to a group  its local router  D  unicasts register messages to  the RP with the source   s data packets encapsulated within  If the data rate is high  the RP can  send source specific join messages back towards the source  These will instantiate source specific  route entries in the routers on the path  and the source   s data packets will then follow the resulting  forwarding state and travel unencapsulated to the RP    Data packets reaching the RP are forwarded natively down the RP rooted distribution tree  toward group members  If the data rate warrants it  routers with local receivers can join a source   specific  shortest path  distribution tree  and prune this source   s packets off of the shared RP rooted  tree  However  for low data rate sources  PIM routers need not join a source specific shortest path  tree and data packets can be delivered via the shared RP tree    If a sender or receiver has more than on
9.  message  This bootstrap message  is suppressed when it reaches any PIM router having higher preference or which has received  bootstrap messages from another Candidate BSR with higher preference    Since the bootstrap timers are synchronized by the elected BSR   s bootstrap messages  this  mechanism may incur a lot of overhead  as shown in Figure 49  Y axis values are the total num   ber of bootstrap messages generated or forwarded  during a partition healing  until convergence       However  adding randomization to bootstrap message origination decreases the overhead of boot   strap messages significantly  Further  if the random wait value is a function of the priority and  address of the Candidate BSR  the behavior of bootstrap messages in transient states tends to that  of the steady state  In Figure 4  the dotted line represents this case  The Random Wait value is set  such that the highest addressed Candidate BSR is the first to originate a bootstrap message  The    function used here is       RandomW ait   logoa  1   ElectedBS RAddress     CandidateBSRAddress       Simulations used random graphs generated using RTG 10   As a worst case scenario  all routers were considered    Candidate BSRs   Simulations were performed using PIMSIM 11   a discrete event driven packet level simulator based on the Mary     land Routing Simulator MaRS  12  13   Future simulations will use the Network Simulator  ns   version 2  14        This function could be extended easily to accommodat
10.  time when group partition occurs     SIf no join messages are lost  this value is on the order of the end to end delay of a domain   TA group partitions when mutually reachable receivers and senders map to different RPs  or map to an unreachable    RP  and hence fail to rendezvous     Hence  the convergence time becomes     RP AddConv   SetDist   JoinLat    Deletion of an RP  In the following analyses we use the notation     a b     to denote a time interval  between    a    and    b     If an RP becomes unreachable from the BSR  its associated RP timer at the  BSR expires within the interval   RP Timeout   RPAdvPeriod   RPTimeout   at best  the RP failure  happens right before the RP sends a Candidate RP Advertisement  where the time interval between  the point of failure and RP timer timeout is    RPTimeout RPAdvPeriod     The BSR will send the new  RP Set at the next RP Set cycle  0  BootstrapPeriod   After    SetDist     the PIM domain converges  on the updated RP Set  The last converged router needs time    JoinLat    to join to the new RP   We note that potential data loss starts from the point of RP failure  or unreachability  but only  affects groups that map to the unreachable RP  After all PIM routers converge on the updated  RP Set and affected groups re join to their new RP  the data loss for the affected groups stops   During the RP Set Distribution Time  groups mapping to the unreachable RP are partitioned   The convergence time becomes     RPDelConv     RPTime
11.  uses an algorithmic mapping to  bind the group to one of the RPs in the RP Set  The DR then sends a join message towards  that RP  When the DR receives a data packet from a directly connected sender for such a group   it performs the same algorithmic mapping and sends the data packet encapsulated in a Register    message directly to the RP  All PIM routers within a domain use the same mapping algorithm   which we present in section 4     3 Evaluation metrics of the Bootstrap Mechanisms    One of the major design goals of the bootstrap mechanisms is scalability  In the next two subsec   tions  we show a detailed evaluation of the bootstrap mechanism  with emphasis on two critical  dimensions of scalability    robustness and overhead    For robustness  we study responsiveness of the protocol to router failures and topology changes   In particular  we explain the transient protocol behavior due to various network events  and evaluate  the time taken to converge on  and join to  a consistent RP Set  The study establishes an upper  bound on average convergence time    For overhead  we evaluate the domain resources  state and bandwidth  consumed by the boot   strap mechanisms  The state consumed is the memory required to maintain the BSR and RP Set  information in each router  and is simply proportional to the size of the RP Set  The analysis of the  bandwidth overhead  however  is more involved  because it is affected by the underlying topology     3 1 Evaluation of robustness   
12. A Dynamic Bootstrap Mechanism for Rendezvous based    Multicast Routing    Deborah Estrin  Mark Handley  Ahmed Helmy  Polly Huang  David Thaler     Abstract    Current multicast routing protocols can be classified into three types according to how the  multicast tree is established  broadcast and prune  e g   DVMRP  PIM DM   membership ad   vertisement  e g   MOSPF   and rendezvous based  e g   CBT  PIM SM   Rendezvous based  protocols associate with each logical multicast group address  a physical unicast address  re   ferred to as the    core    or    rendezvous point     RP   Members first join a multicast tree rooted  at this rendezvous point in order to receive data packets sent to the group  Rendezvous mech   anisms are well suited to large wide area networks because they distribute group specific data  and membership information only to those routers that are on the multicast distribution tree   However  rendezvous protocols require a bootstrap mechanism to map each logical multicast  address to its current physical rendezvous point address  The bootstrap mechanism must adapt  to network and router failures but should minimize unnecessary changes in the group to RP  mapping  In addition  the bootstrap mechanism should be transparent to the hosts    This paper describes and analyzes the bootstrap mechanism developed for PIM SM  The  mechanism employs an algorithmic mapping of multicast group to rendezvous point address   based on a set of available RPs distributed throu
13. BSM  4     TExp  OrigBSM  4     CandBSR Elected BSR       PrefBSMRxd  FwdBSM   1  2  3     Initial state  CandBSR  LclIBSR   Local address  LcIRP Set   empty    State transition diagram for a Candidate BSR     a     PrefBSMRxd  FwdBSM   1  2  3     TExp    AxptPref    BSMRxd  FwdBSM   1  2  3     Initial state  AxptAny  LcIBSR   0  LcIRP Set   empty    State transition diagram for a router not configured as C BSR     b     Figure 6  State Diagram for the BSR election and RP Set distribution mechanisms    Each PIM router keeps a bootstrap timer  initialized to  Bootstrap Timeout   in addition to a  local BSR field    ZclBSR     initialized to a local address if Candidate BSR  or to 0 otherwise   and a  local RP Set    ZclRP Set     initially empty   The main stimuli to the state machine are timer events    and arrival of bootstrap messages     19    Initial states and timer events    1  If the router is a Candidate BSR      a  The router operates initially in the    CandBSR    state  where it does not originate any  bootstrap messages     b  If the bootstrap timer expires  and the current state is    CandBSR     the router originates  a bootstrap message carrying the local RP Set and its own BSR priority and address   restarts the bootstrap timer at  Bootstrap Period  seconds  and transits into the    Elect   edBSR    state  Note that the actual sending of the bootstrap message may be delayed  by a random value to reduce transient control overhead     c  If the bootstrap timer
14. Set becomes unreachable    e Candidate RP Advertisement period     RPAdvPeriod      The period at which Candidate   RP Advertisement messages are sent   e g  60 seconds          All default values mentioned here are set as per the PIMv 2 specification 7      e RP timeout     RPTimeout      The time interval after which the BSR times out an RP   Typically  this timer is set to 2 5 times the Candidate RP Advertisement period  e g  150  seconds     e Bootstrap period     BootstrapPeriod      The period at which the elected BSR originates  periodic bootstrap messages  e g  60 seconds   We call the state in which the elected BSR  originates periodic messages the steady state    e Bootstrap timeout     Bootstrap Timeout      The time after which a Candidate BSR originates  a bootstrap message to elect a new BSR  We call the time during which the election is in  process the transient state   e g  150 seconds   2 5 times the bootstrap period     e RP Set distribution time     SetDist      The time to distribute an RP Set throughout the  domain or the time for all routers within the domain to converge on the distributed RP Set    e Join latency     JoinLat      The time taken to establish a new multicast branch leading to a  receiver  by processing hop by hop PIM join messages  The join messages may  in worst case   travel all the way to the RP  if no closer point exists on the multicast tree      e Join period     JoinPeriod      The time interval at which periodic join messages are 
15. ady state  when there is no BSR election  bootstrap messages will be originated periodically  every BootstrapPeriod by the BSR  Each router accepts only those bootstrap messages sent by  the Reverse Path Forwarding neighbor towards the BSR  The accepted bootstrap message is then  forwarded to all interfaces except the receiving interface    This behavior has the following property  If the unicast routing is stable and the link layer does  not cause any duplicate packet transmission  each node will forward bootstrap messages only once  over each interface except the incoming interface    By the above property  it is easy to show that the total number of bootstrap messages sent  within the domain equals     Cesr   gt   Gi 1    Za  AN  T     i CBSR    where C  is the degree of router i  and N is total number of routers in the PIM domain     12                               Bootstrap Message Overhead  Total Amount of Bootstrap Messages on Links   500   450    a 400    o  D     350   3    300             Overhead in Transient w  o  s Randomization  2 20T      E     Overhead in Transient w    200  Randomization      a 150   Q  5  3 100   ee   50   ania eeii  0  10 15 20 25 30 35 40 45 50  Number of Domain Nodes             Figure 4  Distribution of bootstrap overhead    3 2 2 Bootstrap message overhead in transient state    When the bootstrap timer expires in all Candidate BSRs  during transient state after a network  partition for example  each Candidate BSR originates a bootstrap
16. cessing    The bootstrap message indicates liveness of the RP in the RP Set  If an RP is included  then it is  tagged as    up    at the routers  while RPs not included in the message are removed from the set of  RPs over which the group to RP mapping algorithm functions    This mechanism adapts efficiently to network dynamics  including partitions  The BSR main   tains a timer for each RP  called the RP timer  When an RP becomes unreachable  the BSR stops  receiving advertisements from it  Consequently  the BSR   s RP timer for that RP expires  and the  RP is removed from the RP Set  Routers receiving the new RP Set  that does not contain the  failed RP   re map the affected groups to other reachable RPs from the new RP Set    Since the set of RPs are known to be live prior to initiating a multicast group  this scheme  requires no start up phase  and hence is suitable for applications with strict start up delay bounds   The bursty source problem is also eliminated    This RP liveness mechanism requires only a bootstrap message  Candidate RP Advertisement  message and an RP timer at the BSR  simplifying complex reachability mechanisms described in  a previous PIM specification  8   and obviating the need for RP reachability  RP list  and Poll  messages  in addition to various flags and timers     2 5 Group to RP mapping    When a Designated Router  DR  receives an IGMP Host Membership Report  9  from a directly  connected receiver for a group for which it has no state  the DR
17. e local router  one of them is elected the Designated  Router  DR   and only this router participates in this process     Figure 1  How senders rendezvous with receivers    1 2 Bootstrap mechanisms   architectural issues    Rendezvous based protocols require that every router consistently maps an active multicast group  address to the same RP  There are many ways to perform this task     e Application level advertisements to distribute RP information along with the multicast group  to potential participant applications    e A routing protocol to advertise group to RP mappings to routers    e A distributed directory in which routers can look up group to RP mappings on demand    e An algorithmic mapping from multicast address to RP address     Application level advertisements require changing the IP multicast service model  and require  adding application level dependencies on the choice of multicast routing protocol    Routing protocol based advertisements scale badly because each router needs to maintain the  mapping for all potential multicast groups  This may additionally add unpredictable delays between  choosing a multicast address and being able to use it    On demand look up using a distributed directory  such as DNS  scales better  but introduces  a startup delay from when a source starts sending to when the directory responds with the group   to RP mapping  This is a problem for sources wishing to use multicast for resource location  for  example  where only a brief q
18. e set of RPs must be distributed consistently to all  routers within a domain  This must be done robustly and efficiently  If routers have inconsistent  sets of RPs  groups may fail to rendezvous    The distribution of the set of RPs within a domain cannot be supported by rendezvous based  multicast without manual configuration  which is not robust to failures or misconfiguration  The  set of RPs could be distributed using a separate broadcast and prune multicast routing protocol  but this would introduce an unnecessary dependency  A third alternative  adopted here  uses a  simple bridge like flooding mechanism to distribute the set of RPs efficiently within each domain   see section 2 1     Finally  given the choice of an algorithmic mapping  we must use one that will achieve the goal  of mapping groups evenly across RPs in the domain   s RP set  thus reducing traffic concentration  at individual RPs  We also desire characteristics such as minimal group disruption when RP  reachability changes  the ability to map related groups to the same RP  and efficient implementation    The remainder of this paper provides a detailed evaluation of the design  robustness  and perfor   mance of the bootstrap mechanisms  Section 2 describes these mechanisms in detail and section 3  evaluates them in terms of robustness and efficiency  Section 4 details the algorithmic mapping and  its evaluation  Section 5 summarizes our results and proposals for future research  Supplemental  appendices p
19. e various priorities as well     13    4 Group to Rendezvous Point Mapping Algorithm    Having shown that all routers converge on a common RP Set  under all kinds of conditions  we  now consider the group to RP mapping using this RP Set    Ideally  a good mapping of groups to RPs meets the following requirements    e Group balancing under any multicast address allocation scheme  By this  we mean that when  the number of groups is large  no single RP is serving significantly many more groups than any  other RP in the RP Set  While this balances the number of groups as opposed to balancing  the actual load  the actual load will be balanced roughly when the number of groups is large   provided that the mapping is independent of the load represented by a group     e Minimal disruption of groups when there   s a change in the RP Set  By this  we mean that  the number of active multicast groups  and hence  the number of shared RP trees  affected  by a change in the RP Set must be as small as possible     e A set of related groups can map to the same RP  giving the same latency and fate sharing  characteristics  and allowing state aggregation  This is useful when data is split across multiple  groups because of different media types or for hierarchical encoding  e g    15  16  17       e Efficient implementation  This means there must be a fast implementation requiring only  32 bit integer arithmetic     We employ hash function theory to realize the desired algorithm  In the followi
20. elected  as the BSR for the domain  This    dense    distribution mechanism is efficient since all PIM routers  in the domain require the messages  which corresponds to a densely populated group    The elected BSR originates periodic bootstrap messages to capture network dynamics  If a  PIM domain partitions  each area separated from the old BSR will elect its own BSR  which will  distribute an RP Set containing RPs that are reachable within that partition  When the partition  heals  an election will occur with the next periodic bootstrap message and only one of the BSRs will  continue to send out bootstrap messages  As is expected at the time of a partition or healing  some  disruption in packet delivery may occur  This time will be on the order of the region   s end to end  delay and the bootstrap router timeout values  see section 3 1     The BSR election mechanism is integrated with the RP Set distribution mechanism described  in section 2 3  Complete mechanistic details are given in appendix A and  7      2 2 RP Set construction using Candidate RP Advertisements    A set of routers within a PIM domain are configured as candidate RPs  Typically these are the  same routers that are configured as Candidate BSRs   Once a BSR for the domain is elected   candidate RPs periodically unicast Candidate RP Advertisement messages to the elected BSR   These messages include the address of the advertising candidate RP and provide a liveness indication  to the BSR    The BSR chooses a 
21. ghout a multicast domain  The primary eval   uation measures are convergence time  message distribution overhead  balanced assignment of  groups to RPs  and host impact  The mechanism as a whole  and the design lessons in particular     are applicable to other rendezvous based multicast routing protocols as well     1 Introduction    Multicast routing is a critical technology for the evolving Internet  Previous papers have described  techniques that use some form of broadcast to build a multicast distribution tree from active  sources to current group members  DVMRP  1  and PIM DM  2  broadcast initial data packets   which in turn trigger and store prune state in those parts of the network that do not lead to group  members  MOSPF broadcasts membership information so that intermediate nodes can construct  source specific distribution trees on the fly  These multicast routing protocols are efficient when  each group is widely represented or the bandwidth is universally plentiful       Authors listed in alphabetical order     In contrast  Core Based Trees  CBT   3  and Protocol Independent Multicast Sparse Mode   PIM SM  or PIM for short   4   do not use broadcast  Rather they use explicit joining to a desig   nated core or rendezvous point whereby new receivers hear from all group sources and new sources  reach all group members  Such rendezvous based multicast routing protocols are better suited for  wide area multicast  where group members may be sparsely distributed and whe
22. ibution  mechanisms  These mechanisms are described by the following state machine  illustrated in figure 6   The protocol transitions for a Candidate BSR are given in state diagram  a   For routers not  configured as Candidate BSRs  the protocol transitions are given in state diagram  b      State variables   LcIBSR   Local concatenated BSR priority and BSR IP address  LclRP Set   Local RP Set   RxdBSR   Received concatenated BSR priority and BSR IP address  RxdRP Set   Received RP Set   Bootstrap Period   60 seconds   Bootstrap Timeout   2 5 x Bootstrap Period   150 seconds    Predicates    P RxdBSR 2 LclBSR    Incoming events    Name Interface Meaning   PrefBSMRxd RPF nbr toward included BSR Bootstrap msg rcvd satisfying P  BSMRxd RPF nbr toward included BSR Bootstrap message received  TExp Timer provider machinery Bootstrap timer expired    States   Name Meaning   AxptPref Accept only Bootstrap messages from preferred or equal BSR  AxptAny Accept any RP Set messages coming thru the right interface  CandBSR Candidate bootstrap router   ElectedBSR    Elected bootstrap router    Outgoing events    Name Interface Meaning       FwdBSM All interfaces except receiving interface Forward Bootstrap message    OrigBSM All interfaces Originate Bootstrap message    Specific actions     1    Restart Bootstrap timer at Bootstrap Timeout   2     LcIBSR   RxdBSR     3     LcIRP Set   RxdRP Set     4    Restart Bootstrap timer at Bootstrap Period    PrefBSMRxd  FwdBSM   1  2  3  TExp  Orig
23. in to the new RP    In this case  the convergence time is given by     PartitionConv   BootstrapTimeout   RPTimeout     0  BootstrapPeriod    SetDist   JoinLat    When a partition heals  the two previously partitioned regions merge  Each region will originally  have its own BSR and set of RPs  Of the two BSRs  one will be more preferred than the other  and become the BSR of the combined region  At the point where the two regions become mutually  reachable  groups are  by definition  partitioned and hence the convergence time starts from this  point     When the partition heals  the more preferred BSR sends a bootstrap message within the interval   0  BootstrapPeriod   All routers in the other region receive this message after SetDist  It then  takes JoinLat for them to join to a new RP  Thus  the initial convergence time is given by     HealConv    0  BootstrapPeriod    SetDist   JoinLat    Meanwhile  the candidate RPs in the other region start sending their Candidate RP  Advertisement  messages to the more preferred BSR  Then  one BootstrapPeriod after the first bootstrap message   the more preferred BSR may issue an updated RP Set with RPs from both previously disconnected  regions  This can cause another temporary group partition which is equivalent to that experienced  when a new RP is added     3 1 2 Evaluation of RP Set distribution time and join latency    The expressions derived above are all a function of RP Set distribution time and join latency  both  of which depe
24. l Dynamics   December 1995     Liming Wei  The design of the USC PIM SIMulator  PIMSIM   Technical Report USC TR 95 604     Computer Science Department  University of Southern California  Feburary 1995     Cengiz Alaettinoglu  A Udaya Shanker  Klaudia Dussa Zieger  and Ibrahim Matta  Mars  Maryland  Routing Simulator    version 1 0 user   s manual  Technical Report CS TR 2687  Computer Science    Department  University of Mariland  June 1991     Cengiz Alaettinoglu  A Udaya Shanker  Klaudia Dussa Zieger  and Ibrahim Matta  Design and imple   mentation of mars  A routing testbed  Technical Report CS TR 2964  Computer Science Department   University of Mariland  September 1992     Steven McCanne  The UCB LBNL Network Simulator ns   November 1996     N  Shacham  Multipoint communication by hierarchically encoded data  In Proceedings of the IEEE  INFOCOM 792  pages 2107 2114  1996     D  Taubman and A  Zakhor  Multi rate 3 D subband coding of video  IEEE Transactions on Image  Processing  3 5  572 588  September 1994     Steven McCanne  Van Jacobson  and Martin Vetterli  Receiver driven Layered Multicast  In Proceedings  of the ACM SIGCOMM    96  August 1996     17     18  David Thaler and Chinya V  Ravishankar  A Name Based Mapping Scheme for Rendezvous  Technical  Report CSE TR 316 96  University of Michigan  November 1996     18    Appendices    A BSR Election and RP Set Distribution    For simplicity  the bootstrap message is used in both the BSR election and the RP Set distr
25. message is lost anywhere in  the domain  convergence takes at least another BootstrapPeriod  The second message will then  complete the convergence if there is no loss between the BSR and any routers which did not receive  the first message  The number of times the bootstrap message must be sent until it reaches all  routers is then bounded above by a geometric distribution function  The upper bound on the  expected number of trials until success is thus given by     1    Expected trials until success  lt  H   p   Pr no bootstrap loss     Note that this is simply an upper bound  since in fact each node need only get the message  once  Thus  once an entire subtree has received the message  later retransmissions dropped within  the subtree are irrelevant    If we assume for the sake of simplicity that P  is the same for all links  then we get     1  Expected trials until success  lt  0  P   1  E SetDist   lt  C  BootstrapPeriod   aa TPN 1     We can apply the same process to finding the average join latency  Let Cj be the average  transmission  processing and propagation delay experienced from the DR to the RP  In the worst  case  a linear topology   the join message again crosses N     1 links  giving us upper bounds of     1  Expected trials until success  lt  UPNT  1    21    
26. nd upon the characteristics of the domain topology and the number of successive  bootstrap message losses    In the absence of packet loss  the average RP Set distribution time grows linearly with the end   to end delay of the domain  Therefore  we represent the distribution time for a particular topology  in the absence of packet loss as a constant Cy  This constant covers transmission  processing and  propagation delays    Next  we consider the effect of packet loss  In Appendix B  we show that the expected RP Set  distribution time is bounded by     E Set Dist   lt  Cy   BootstrapPeriod    Gs     1   where E SetDist  is the expected value of SetDist  N is the number of nodes in the domain  and  P is the probability of packet loss on a link    PIM join messages are processed hop by hop  and establish multicast state within intermediate  routers  If the first join message reaches the RP successfully  the value of JoinLat is simply the time  to establish the shared multicast tree within intermediate routers    Such time includes transmission   processing times and propagation delays  and is represented by the constant C     As with bootstrap messages  PIM join message loss contributes to overall convergence time   Using similar terminology  we can likewise express the expected join latency     JoinLat     in the  presence of packet loss as being bounded by     1  Given these formulations for the expected RP Set distribution time and join latency  Figure 2  shows combined dela
27. ng subsections  we provide background information on theory and implementation of conventional hash functions   followed by an explanation of the selected Highest Random Weight  HRW  scheme  Then  we  evaluate the scaling properties of the chosen algorithm     4 1 Hash Function Theory and Implementation    A conventional hash function maps a key k to one of n    buckets    by computing a bucket number i  as a function of k  i e   i   h k   Typically  such functions are defined as f k  modulo n  where f is  some function of k  e g   f  k    k when k is an integer   For our purposes  a key k corresponds to  a multicast group address  and a bucket corresponds to an RP from the RP Set  The problem with  using a Modulo n function for Group to RP mapping  however  comes when the RP Set changes    We first define disruption as the fraction of keys which map to a different bucket when the set of  buckets changes  i e   the number of multicast groups which get remapped to a different RP when  the RP Set changes  Since the RP Set may change whenever a candidate RP goes up or down   the number of buckets  n  over which the hash function will operate can change over time  For  example  when a single RP fails and the number of RPs in the RP Set changes from n to n     1   all groups would be remapped except those for which f k  mod n   f k  mod  n   1   When f k   is uniformly distributed  the disruption will thus be na  or almost all the groups  Clearly  this is  insufficient for our pur
28. o the  RP Set  again only   g groups will be affected  but all groups must be rehashed to determine  which of them are affected  This can be optimized by storing the values of i and f k 1 z   in per   group state  Then for each group k  only f k 1 7    where I 7  is the address of a new RP  need be  calculated to determine which groups are affected  This again can be done in O mg  time    Finally  to allow state aggregation  we now modify the definition of the key k to be     k    group address  amp  mask     The mask determines how much state aggregation will exist  i e  how many groups will always map  to the same RP   This mask is specified by the BSR and is included with the RP Set distributed to  all other routers in the domain  For example  if this mask is  hex  FFFFFFFC  the recommended  default   then sets of four consecutive group addresses will map to the same RP  A multicast address  allocation scheme may take advantage of this when allocating multiple multicast addresses for the  same session by allocating a set of addresses which have the longest possible prefix in common     4 2 Evaluation of the Hash Function    To determine how well our hash function scales with respect to group balancing  we use the coef   ficient of variation  defined as the ratio between the standard deviation and the mean  o u   of  the number of active groups mapped to each RP in the RP Set  For group balancing to scale  we  desire that     PETEN    where g is the number of active groups 
29. out     RP AdvPeriod   RPTimeout      0  BootstrapPeriod    SetDist   JoinLat    Network partitions In addition to single BSR change and RP change  we are interested in  events such as network partition and partition healing that may sometimes cause simultaneous  changes in both RP and BSR reachability    When a domain partitions  it is divided into two separate regions  one of which will have the  old BSR and some subset of the RPs in the RP Set  and the other will have the remainder of the  RPs in the RP Set  We will look at each    side    of the partition in turn    For the region containing the original BSR  any groups using RPs in that region will be unaf   fected  and hence their convergence time is    0     Any groups using RPs that are now unreachable  will see convergence times equivalent to those experienced when an RP is deleted    On the other side of the partition  without the original BSR  all groups using RPs which are  still reachable  on the same side  will remain unaffected and hence also have a convergence time  of    0     Groups using now unreachable RPs  however  experience longer convergence times  A new  BSR is elected after    Bootstrap Timeout     After    RPTimeout     the new BSR times out the RP timer   Then  the new BSR updates the RP Set and distributes the new RP Set within the interval  0   BootstrapPeriod   After    SetDist     the PIM domain converges on the updated  smaller  RP Set   The last converged router needs time    JoinLat    to jo
30. outer accepts the bootstrap message  and transits  into the    AxptPref    state     b  If the current state is    AxptPref     and the bootstrap message is preferred  the message  is accepted  No state transition is incurred     2  If the router is a Candidate BSR  and the bootstrap message is preferred  the message is  accepted  Further  if this happens when the current state is    Elected BSR     the router transits  into the    CandBSR    state     When a bootstrap message is accepted  the router restarts the bootstrap timer at  Bootstrap   Timeout   stores the received BSR priority and address in    LclIBSR     and the received RP Set in     EclRP Set     and forwards the bootstrap message out all interfaces except the receiving interface    If a bootstrap message is rejected  no state transitions are triggered     20    B Analysis of RP Set distribution time and join latency    We first define the following terms     N   number of routers in the PIM domain     Cy  average RP Set distribution time without loss of bootstrap messages     P    the probability that a given bootstrap message crossing router is RPF link towards the  BSR is lost    Pr no bootstrap loss    the probability that a given bootstrap message reaches all other  routers in the PIM domain     2       Pr no bootstrap loss     1     P      i 1    If no messages are dropped  the average RP Set distribution time SetDist should be no greater  than the end to end delay  and equal to Cp  If the first bootstrap 
31. poses    A much better scheme  which we employ  is the Highest Random Weight  HRW  algorithm 18    HRW defines the hash function so as to give the bucket with the maximum value of a function f   ie     h  i  f k  I i    gt  f k U 9     O lt ij lt n  where the bucket label l i  is a function of the bucket number  for our purposes  l i  is the unicast  IP address of the i RP in the RP Set  and f is a function of both the key and the bucket    14    label  If f is sufficiently random with respect to k and l  this has the desirable property that the  disruption caused by m RPs changing is just m n  This successfully achieves the minimum bound  on disruption as shown in  18     We now need to find an appropriate function for f that satisfies the above constraint  For this   we use the following function     f  k  U 4      1103515245     1103515245k   12345  XOR I i     12345  mod 23     which is derived from the BSD rand   function  and is shown in  18  to perform well as a hash  function compared with other pseudo random number generators  It also lends itself well to an  efficient 32 bit implementation    When a router detects that one or more  m  RPs have failed  by their absence from a newly  received RP Set   only those groups which previously mapped to those RPs must be rehashed  For  each of these    g groups  h is recomputed in O n  time  for a total processing requirement of  O mg  work    On the other hand  when a router detects that one or more  m  RPs have been added t
32. re bandwidth is a  scarce resource  The PIM architecture  5  6  4  described the rationale and design details by which  the multicast trees are built and maintained  In this paper we describe the bootstrap mechanisms   in particular  the distribution of rendezvous point information and the use of that information to  map specific groups to RPs in a consistent  distributed  and robust manner  We focus on the  robustness and control message overhead of the bootstrap mechanisms because these are the main  determinants of scalability    After a short overview of PIM SM we motivate the design of its bootstrap mechanisms and  outline the remainder of the paper     1 1 Protocol Independent Multicast   Overview    In this section we provide an overview of the architectural components of PIM SM  referred to  hereafter as PIM   6  4  7   The architecture described here operates in each multicast domain   which in this context is a contiguous set of routers that all implement PIM and operate within a  common boundary defined by PIM Multicast Border Routers  7     As shown in Figure 1  when a receiver   s local router  A  discovers it has local receivers  it starts  sending periodic join messages toward a group specific Rendezvous Point  RP   Each router along  the path toward the RP builds a wildcard  any source  route entry for the group and sends the  join messages on toward the RP  A route entry is the forwarding state held in a router to maintain  the distribution tree  Typically it
33. rovide elaborate details of protocol models  mathematical analyses  and simulation  detail     2 Bootstrap mechanisms  Design description and rationale    This section describes how the set of RPs is distributed throughout a domain  We elaborate on  the design rationale of the bootstrap mechanisms  The three critical elements of the design are  the bootstrap router  the candidate RPs  and the RP Set  The following subsections describe how  these elements fit together to realize the bootstrap mechanisms     2 1 BSR election    A BootStrap Router  BSR  is a dynamically elected PIM router within a PIM domain  It  collects the set of potential RPs  and distributes the resulting set of RPs  RP Set  to all the  PIM routers in the PIM domain  Centralizing the RP Set distribution reduces the opportunities  for inconsistency  while dynamic election provides robustness in the face of network changes and  failures    A small set of routers within a PIM domain are configured as candidate bootstrap routers   Candidate BSRs   and each is given a  not necessarily unique  BSR priority  From these can   didates  a single bootstrap router  BSR  is elected for that domain  Should the current BSR fail     or another Candidate BSR with higher priority be added to the network  a new BSR is elected  automatically    The mechanism is based upon a bridge like spanning tree election algorithm  bootstrap messages  originated by the Candidate BSRs travel hop by hop  and the most preferred  candidate is 
34. sent  e g   60 seconds      At the end of this section we present sample convergence times for various network topologies  and loss probabilities     3 1 1 Convergence time due to various network events    When network dynamics affect BSR and RP reachability  data loss may occur  The bootstrap mech   anism described here was designed to adapt to these changes in a timely fashion  The convergence  time varies depending upon the particular network event     Changes in BSR A change in the BSR without a change in the RP Set does not partition  groups     nor cause data loss  RPs continue to support active groups without disruption to data  delivery  so long as the RPs are reachable during the BSR re election     Addition of a new RP When a new candidate RP becomes reachable  it may immediately  obtain the BSR address from its neighbors or wait for the periodic bootstrap messages  The new  candidate RP then sends a Candidate RP Advertisement to the BSR  Provided that the Candidate   RP Advertisement message is not lost  and the BSR includes it in a new RP Set  the new RP Set  will be distributed during the next RP Set cycle  After the RP Set distribution time     SetDist       the domain converges on the updated RP Set  The last converged router needs time    JoinLat    to  join to the new RP    Group partition may occur during the distribution of the new RP Set  as some routers get the  new information before others  There is potential data loss for some group members during the 
35. subset  of  or all  the live candidate RPs to form the RP Set  which is then  distributed in the bootstrap messages     2 3 RP Set distribution    The bootstrap messages originated by the BSR carry the BSR   s address and priority  and the RP   Set  A bootstrap message is multicast out all interfaces with a TTL of 1  to all directly connected  PIM routers  7     Before accepting a bootstrap message  a PIM router performs two checks     1  Only those bootstrap messages that arrive from the Reverse Path Forwarding  RPF  neighbor   towards the BSR are processed  others are discarded  The RPF check prevents looping of  bootstrap messages  Persistent looping would lead to usage of obsolete information     Unicast routing dynamics may cause transient loops but these do not affect the eventual  correctness of the mechanism      The preferred router is the one with highest configured priority or highest addressed if priorities are equal     Any PIM router can be configured as a candidate RP or Candidate BSR  However  to maximize reachability     only well connected stable routers should be configured as candidates   3The BSR attempts to choose the same subset each time if possible      The RPF neighbor for an IP address X is the the next hop router used to forward packets toward X     2  If the RPF check passes  a check is performed against the previously stored active BSR  information  priority and address   Every time a PIM router gets a message from the elected  BSR  it restarts a
36. uery may be sent  and then the sender goes silent again  Such bursty  sources may send for less time than the time taken to acquire the mapping  If the inter burst  period is longer than the routers state timeout period  then all the source   s packets may be lost  deterministically  We refer to this problem as the bursty source problem    All three of these mechanisms do not allow applications to algorithmically generate and use  multicast addresses without announcing them to all potential recipients  a bootstrapping problem  in some circumstances     Moreover  all three of these mechanisms introduce a dependency on application level directory  assistance in some way  if only to insert the mapping into the database   Ideally we would like  multicast to be as low a level service as possible  depending only on unicast routing and routers  themselves     The use of an algorithmic mapping from group to RP address allows us to address these com   peting design issues  This approach entails periodic distribution of a set of reachable  RPs to all  routers in the domain  and use of a common hash function to map any multicast address to one  router from the set of RPs  The algorithmic mapping approach requires routers to store and main   tain liveness information for a relatively small and stable set of RPs per multicast domain  This  approach scales well  is purely a low level routing function  and does not require any changes to  hosts    For such an algorithmic mapping to work  th
37. ys as a function of network topology and packet loss  per link  probabilities   We present examples for the combination of RP Set distribution time and join latency because    together they are the common variables in most of the convergence formulas  These results were  based on several simplifications and assumptions  The constants  Cp and C   were assumed to be      Note that this represents an upper bound on the join latency  JoinLat  for all routers  since in general join    messages only reach the nearest point of the established tree  and need not reach the RP per se     10       Upper Bound of Average SetDist   JoinLat    0 001    Convergence Time  Second   A      Q  oO       0 0001 P  Probability of Packet  0 00     Loss Per Link     N  Number of Domain Nodes  500             Figure 2  Combined RP Set distribution time and join latency    SetDist   JoinLat       negligible as compared to the other terms  and default values from the PIM SM specification  7   were used for the JoinPeriod  BootstrapPeriod  RPTimeout  BootstrapTimeout  and RPAdvPeriod     Values used for the packet loss probability per link were restricted to those ranges considered typical  of current networks     3 1 3 Summary of convergence time    To conclude our discussion of convergence time  Figure 3 shows the results of substituting numbers  in Figure 2 into the convergence formulas for the various network events  The results shown were  obtained using the following simplified equations  based on
    
Download Pdf Manuals
 
 
    
Related Search
    
Related Contents
D6000 - Codonics Disinfection Technology Systems Reference Guide  INSTRUCTIONS MODEL: P6BO  Tabla de Registros iQBio™ Guardian XL  Lenovo IdeaTab A2109 16GB Black, Champagne, Gold  Benutzerhandbuch  顕微鏡組織標準片 取扱説明書  Samsung GT-I8910 คู่มือการใช้งาน  Hobbico DIDC0046  都筑センターだより第 43 号  Philips Ledino    Copyright © All rights reserved. 
   Failed to retrieve file