Exterior Gateway Protocols: EGP, BGP-4, CIDR

Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Overview
• Cores, peers, and the limit of default routes
• Autonomous systems, EGP
• BGP-4
• CIDR: reducing router table sizes
• Refs: Chap 10, 14, 15. Books: “Routing in the Internet” by Huitema, “Interconnections” by Perlman, “BGP4” by Stewart, “Internet Routing Architectures” by Sam Halabi and Danny McPherson
• Reading: Geoff Huston, Commentary on Inter-domain Routing in the Internet
• Reference: BGP-4 Standards Document
• Reading: Norton, Internet Service Providers and Peering
• Reading: Labovitz et al., Delayed Internet Routing Convergence
• Reference: Paxson, End-to-End Routing Behavior in the Internet
• Reading: Interdomain Routing: Additional Notes
• Reference Site: Griffin, Interdomain Routing Links

History: Default Routes: limits
• Default routes => partial information
• Routers/hosts with default routes rely on other routers to complete the picture.
• In general, routing “signposts” should be:
  • Consistent, i.e., if a packet is sent off in one direction, then another direction should not be more optimal.
  • Complete, i.e., should be able to reach all destinations

Core
• A small set of routers that have consistent and complete information about all destinations.
• Outlying routers can have partial information provided they point default routes to the core
  • Partial info allows site administrators to make local routing changes independently.
[Figure: CORE connected to sites S1, S2, ..., Sm]

Peer Backbones
• Initially NSFNET had only one connection to ARPANET (router in Pittsburgh) => only one route between the two.
• Addition of multiple interconnections => multiple possible routes => need for dynamic routing
• Single core replaced by a network of peer backbones => more scalable
  • Today there are over 30 backbones
• Routing protocol at cores/peers: GGP -> EGP -> BGP-4

Exterior Gateway Protocol (EGP)
• A mechanism that allows non-core routers to learn routes (external routes) from core routers so that they can choose optimal backbone routes
• A mechanism for non-core routers to inform core routers about hidden networks (internal routes)
• An Autonomous System (AS) has the responsibility of advertising reachability info to other ASes. One or more routers may be designated per AS.
  • Important that reachability info propagates to core routers

Purpose of EGP
• Share connectivity information across ASes
[Figure: AS1 and AS2 connected by EGP, with routers R1, R2, R3 and network A. Announcement: “you can reach net A via me”; traffic to A follows. Table at R1: dest A, next hop R2. Legend: border router, internal router.]

EGP Operation
• Neighbor Acquisition: reliable two-way handshake
• Neighbor Reachability:
  • Hellos: j out of m hellos OK => neighbor UP
  • k out of n hellos NOT OK => neighbor DOWN
• Updates/Queries:
  • EGP is an incremental protocol. New info => send updates
  • Each router can query neighbors as well
• Reachability advertised; metrics ignored
• Requires a tree topology of ASes to avoid loops (e.g., see next slide)

Why EGP Requires a Tree Structure

EGP Weaknesses
• EGP does not interpret the distance metrics in routing update messages => cannot compute the shorter of two routes
• As a result, it restricts the topology to a tree structure, with the core as the root
  • Rapid growth => many networks may be temporarily unreachable
  • Only one path to destination => no load sharing
• Need new protocol => BGP-4

Today’s Big Picture
• Large number of diverse networks
[Figure: interconnected networks labeled Large ISP (x2), Small ISP, Dial-Up ISP, Access Network, and Stub (x3)]

Internet AS Map: caida.org

Autonomous System (AS)
• Internet is not a single network
  • Collection of networks controlled by different administrations
• An autonomous system is a network under a single administrative control
• An AS owns an IP prefix
• Every AS has a unique AS number
• ASes need to inter-network themselves to form a single virtual global network
  • Need a common protocol for communication

Intra-AS and Inter-AS Routing
• Gateways:
  • perform inter-AS routing amongst themselves
  • perform intra-AS routing with other routers in their AS
[Figure: ASes A, B, and C with gateways A.a, A.c, B.a, C.b; protocol stack in gateway A.c showing inter-AS and intra-AS routing above the network, link, and physical layers]

Who Speaks Inter-AS Routing
• Two types of routers
  • Border router (Edge), Internal router (Core)
• Two border routers of different ASes will have a BGP session
[Figure: AS1 and AS2 joined by BGP; routers R1, R2, R3; legend: border router, internal router]

Intra-AS vs Inter-AS
• An AS is a routing domain
• Within an AS:
  • Can run a link-state routing protocol
  • Trust other routers
  • Scale of network is relatively small
• Between ASes:
  • Lack of information about other AS’s network (link state not
    possible)
  • Crossing trust boundaries
  • Link-state protocol will not scale
  • Routing protocol based on route propagation

Autonomous Systems (ASes)
• An autonomous system is an autonomous routing domain that has been assigned an Autonomous System Number (ASN).
• All parts within an AS remain connected.
• “… the administration of an AS appears to other ASes to have a single coherent interior routing plan and presents a consistent picture of what networks are reachable through it.”
  RFC 1930: Guidelines for creation, selection, and registration of an Autonomous System

IP Address Allocation and Assignment: Internet Registries
• IANA (www.iana.org)
• APNIC (www.apnic.org), ARIN (www.arin.org), RIPE (www.ripe.org)
• Allocate to national and local registries and ISPs
• Addresses assigned to customers by ISPs
• RFC 2050: Internet Registry IP Allocation Guidelines
• RFC 1918: Address Allocation for Private Internets
• RFC 1518: An Architecture for IP Address Allocation with CIDR

AS Numbers (ASNs)
• ASNs are 16-bit values.
• 64512 through 65535 are “private”
• Currently over 11,000 in use.
  • Genuity: 1
  • MIT: 3
  • Harvard: 11
  • UC San Diego: 7377
  • ATT: 7018, 6341, 5074, …
  • UUNET: 701, 702, 284, 12199, …
  • Sprint: 1239, 1240, 6211, 6242, …
  • …
• ASNs represent units of routing policy

Non-transit vs. Transit ASes
• Internet Service Providers (ISPs) have transit networks
• A non-transit AS might be a corporate or campus network. Could be a “content provider”
• Traffic NEVER flows from ISP 1 through NET A to ISP 2
[Figure: NET A (non-transit AS) connected to ISP 1 and ISP 2]

Selective Transit
• NET A provides transit between NET B and NET C and between NET D and NET C
• NET A DOES NOT provide transit between NET D and NET B
• Most transit ASes allow only selective transit: a key impact of commercialization
[Figure: NET A connected to NET B, NET C, and NET D]

Customers and Providers
• Customer pays provider for access to the Internet
[Figure: providers and customers exchanging IP traffic]

Customer-Provider Hierarchy
[Figure: hierarchy of providers and customers carrying IP traffic]

The Peering Relationship
• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers (often) do not exchange […]
[Figure: peer, provider, and customer links, marked “traffic allowed” and “traffic NOT allowed”]

Peering Wars
Peer:
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)
Don’t Peer:
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world. Peering agreements are often confidential.

Requirements for Inter-AS Routing
• Should scale for the size of the global Internet.
  • Focus on reachability, not optimality
  • Use address aggregation techniques to minimize core routing table sizes and associated control traffic
  • At the same time, it should allow flexibility in topological structure (e.g., don’t restrict to trees)
• Allow policy-based routing between autonomous systems
  • Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes)
  • Fully distributed routing (as opposed to a signaled approach) is the only possibility.
  • Extensible to meet the demands for newer policies.

Recall: Distributed Routing Techniques
Link State:
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
Vectoring:
• Each router knows little about network topology
• Only best next hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP

BGP-4
• BGP = Border Gateway Protocol
• Is a policy-based routing protocol
• Is the de facto EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes.
• 1989: BGP-1, RFC 1105 (replacement for EGP, 1984, RFC 904)
• 1990: BGP-2, RFC 1163
• 1991: BGP-3, RFC 1267
• 1995: BGP-4, RFC 1771 (support for Classless Interdomain Routing, CIDR)

BGP Operations (Simplified)
• Establish session on TCP port 179
• Exchange all active routes
• Exchange incremental updates
• While connection is ALIVE, exchange route UPDATE messages
[Figure: BGP session between AS1 and AS2]

Four Types of BGP Messages
• Open: establish a peering session.
• Keep Alive: handshake at regular intervals.
• Notification: shuts down a peering session.
• Update: announcing new routes or withdrawing previously announced routes.
announcement = prefix + attribute values

Border Gateway Protocol (BGP)
• Allows multiple cores and arbitrary topologies of AS interconnection.
  • Uses a path-vector concept which enables loop prevention in complex topologies
• At the AS level, the shortest path may not be preferred for policy, security, or cost reasons.
  • Different routers have different preferences (policy) => as a packet goes through the network it will encounter different policies
  • Bellman-Ford/Dijkstra don’t work
• BGP allows attributes for ASes and paths, which could include policies (policy-based routing).

BGP (Cont’d)
• When a BGP speaker A advertises to its peer B that it has a path to IP prefix C, B can be certain that A is actively using that AS path to reach that destination
• BGP uses TCP between two peers (reliability)
  • Exchange entire BGP table first (50K+ routes)
  • Later exchanges only incremental updates
• Application (BGP)-level keepalive messages
  • Hold timer (at least 3 sec), locally configured
• Interior and exterior peers: need to exchange reachability information among interior peers before updating the intra-AS forwarding table.
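
As a rough illustration of the message types listed under “Four Types of BGP Messages”, the sketch below parses the fixed 19-byte header that precedes every BGP message (16-byte marker, 2-byte length, 1-byte type). The type codes and header layout follow RFC 1771; the function names `parse_bgp_header` and `build_keepalive` are made up for this example, and nothing here is a full BGP implementation.

```python
import struct

# Message type codes as defined in the BGP-4 specification (RFC 1771).
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
HEADER_LEN = 19      # 16-byte marker + 2-byte length + 1-byte type
MAX_MSG_LEN = 4096   # largest message the base specification allows


def parse_bgp_header(data: bytes):
    """Return (message_length, type_name) for the message at the start of data."""
    if len(data) < HEADER_LEN:
        raise ValueError("need at least 19 bytes for a BGP header")
    # Network byte order: 16-byte marker, unsigned 16-bit length, unsigned 8-bit type.
    _marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if not HEADER_LEN <= length <= MAX_MSG_LEN:
        raise ValueError(f"bad message length {length}")
    if msg_type not in BGP_TYPES:
        raise ValueError(f"unknown message type {msg_type}")
    return length, BGP_TYPES[msg_type]


def build_keepalive() -> bytes:
    """A KEEPALIVE is a bare header: all-ones marker, length 19, type 4."""
    return struct.pack("!16sHB", b"\xff" * 16, HEADER_LEN, 4)


if __name__ == "__main__":
    print(parse_bgp_header(build_keepalive()))
```

Because a KEEPALIVE carries no body, it is the smallest message a peer can send to stop the hold timer from expiring; OPEN, UPDATE, and NOTIFICATION messages append their own fields after this header.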

Two Types of BGP Neighbor Relationships
• External Neighbor (eBGP): in a different Autonomous System
• Internal Neighbor (iBGP): in the same Autonomous System
• iBGP is routed (using IGP)
[Figure: AS1 and AS2 with eBGP and iBGP sessions]

IBGP and EBGP
• IGP: Interior Gateway Protocol. Examples: IS-IS, OSPF
[Figure: AS1, AS2, and AS3 with routers R1 to R5; EBGP between ASes, IBGP and IGP inside an AS; prefix B is announced; legend: border router, internal router]

IBGP
• Why is IGP (OSPF, IS-IS) not used?
  • In large ASes the full route table is very large (100K routes)
  • Rate of change of routes is frequent
  • Tremendous amount of control traffic
  • Not to mention Dijkstra computation being invoked for any change…
  • BGP policy information may be lost
• IBGP: within an AS
  • Same protocol/state machines as EBGP
  • But different rules about advertising prefixes
  • Prefix learned from an IBGP neighbor cannot be advertised to another IBGP neighbor to avoid looping => need full IBGP mesh
  • AS_PATH cannot be used internally. Why?

IBGP vs EBGP
• IBGP nodes: typically ABRs, or other nodes where default routes terminate
• IBGP peering sessions between every pair of routers within an AS: full mesh.
[Figure: routers A, B, C, D in AS1; physical links versus IBGP sessions]

iBGP Peers: Fully Meshed
• iBGP is needed to avoid routing loops within an AS
• Full mesh =>
  • Independent of physical connectivity.
  • A single link may see the same update multiple times
• iBGP neighbors do not announce routes received via iBGP to other iBGP neighbors.
[Figure: an eBGP update entering the AS and the resulting iBGP updates]
• Is iBGP an IGP? NO
  • Set of neighbor relationships to transfer BGP info

IBGP Scaling: Route Reflection
• Add hierarchy to IBGP
• Route reflector: a router whose BGP implementation supports the re-advertisement of routes between IBGP neighbors
• Route reflector client: a router which depends on a route reflector to re-advertise its routes to the entire AS and to learn routes from the route reflector

Route Reflection
[Figure: AS1 with route reflectors RR1, RR2, RR3 and clients RRC1 to RRC4 connected by IBGP; external router ER and AS2 connected by EBGP; prefixes 128.23.0.0/16 and 10.0.0.0/24]

AS Confederations
• Divide and conquer: divides a large AS into sub-ASes
[Figure: AS1 split into sub-ASes 10, 11, 12, 13, 14, with routers R1 and R2]

CIDR
• Shortage of class Bs => give out a set of class Cs instead of one class B address
• Problem: every class C network needs a routing entry
• Solution: Classless Interdomain Routing (CIDR).
  • Also called “supernetting”
• Key: allocate addresses such that they can be summarized, i.e., contiguously.
  • Share same higher-order bits (i.e., prefix)
• Routing tables and protocols must be capable of carrying a subnet mask.
• Notation: 128.13.0/23
• When an IP address matches multiple entries (e.g., 194.0.22.1), choose the one with the longest mask (“longest-prefix match”)

RFC 1519: Classless Inter-Domain Routing (CIDR)
• Pre-CIDR: Network ID ended on an 8-, 16-, or 24-bit boundary
• CIDR: Network ID can end at any bit boundary
IP Address: 12.4.0.0    IP Mask: 255.254.0.0
Address   00001100 00000100 00000000 00000000
Mask      11111111 11111110 00000000 00000000
(the one-bits of the mask mark the network prefix; the zero-bits are for hosts)
Usually written as 12.4.0.0/15, a.k.a. “supernetting”

Understanding Prefixes and Masks (Recap)
12.5.9.16 is covered by prefix 12.4.0.0/15
12.5.9.16     00001100 00000101 00001001 00010000
12.4.0.0/15   00001100 00000100 00000000 00000000
(mask)        11111111 11111110 00000000 00000000
12.7.9.16     00001100 00000111 00001001 00010000
12.7.9.16 is not covered by prefix 12.4.0.0/15

Interdomain Routing Without CIDR
[Figure: the service provider announces 204.71.0.0, 204.71.1.0, 204.71.2.0, ..., 204.71.255.0 individually to the global Internet routing mesh]

Interdomain Routing With CIDR
[Figure: the service provider holds 204.71.0.0, 204.71.1.0, 204.71.2.0, ..., 204.71.255.0 and announces only 204.71.0.0/16 to the global Internet routing mesh]

Longest Prefix Match (Classless) Forwarding
IP packet: Destination = 12.5.9.16, payload
IP Forwarding Table:
Prefix        Next Hop      Interface        Match
0.0.0.0/0     10.14.11.33   ATM 5/0/9        OK
12.0.0.0/8    10.14.22.19   ATM 5/0/8        better
12.4.0.0/15   10.1.3.77     Ethernet 0/1/3   even better
12.5.8.0/23   attached      Serial 1/0/7     best

What is Routing Policy?
• Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes)
  • Public description of the relationship between external BGP peers
  • Can also describe internal BGP peer relationship
• E.g.: Who are my BGP peers?
• What routes are
  • Originated by a peer
  • Imported from each peer
  • Exported to each peer
  • Preferred when multiple routes exist
• What to do if no route exists

Routing Policy Example
• AS1 originates prefix “d”
• AS1 exports “d” to AS2, AS2 imports
• AS2 exports “d” to AS3, AS3 imports
• AS3 exports “d” to AS5, AS5 imports

Routing Policy Example (cont)
• AS5 also imports “d” from AS4
• Which route does it prefer?
  • Does it matter?
  • Consider the case where AS3 = Commercial Internet and AS4 = Internet2

Import and Export Policies
• Inbound filtering controls outbound traffic
  • filters route updates received from other peers
  • filtering based on IP prefixes, AS_PATH, community
• Outbound filtering controls inbound traffic
  • forwarding a route means others may choose to reach the prefix through you
  • not forwarding a route means others must use another router to reach the prefix
• Attribute manipulation
  • Import: LOCAL_PREF (manipulate trust)
  • Export: AS_PATH and MEDs

Attributes are Used to Select Best Routes
[Figure: several neighbors each announce 192.0.2.0/24 and say “pick me”]
Given multiple routes to the same prefix, a BGP speaker must pick at most one best route (note: it could reject them all).

BGP Policy Knob: Attributes
Value   Code                     Reference
1       ORIGIN                   RFC1771
2       AS_PATH                  RFC1771
3       NEXT_HOP                 RFC1771
4       MULTI_EXIT_DISC          RFC1771
5       LOCAL_PREF               RFC1771
6       ATOMIC_AGGREGATE         RFC1771
7       AGGREGATOR               RFC1771
8       COMMUNITY                RFC1997
9       ORIGINATOR_ID            RFC2796
10      CLUSTER_LIST             RFC2796
11      DPA                      Chen
12      ADVERTISER               RFC1863
13      RCID_PATH / CLUSTER_ID   RFC1863
14      MP_REACH_NLRI            RFC2283
15      MP_UNREACH_NLRI          RFC2283
16      EXTENDED COMMUNITIES     Rosen
...
255     reserved for development
We will cover a subset of these attributes. Not all attributes need to be present in every announcement.
From IANA: http://www.iana.org/assignments/bgp-parameters

BGP Route Processing
Receive BGP Updates
-> Apply Import Policies (apply policy = filter routes and tweak attributes)
-> Best Route Selection (based on attribute values)
-> Best Route Table
-> Apply Export Policies (apply policy = filter routes and tweak attributes)
-> Transmit BGP Updates
Install forwarding entries for best routes.
(IP Forwarding Table)

Import and Export Policies
• For inbound traffic
  • Filter outbound routes
  • Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection
• For outbound traffic
  • Filter inbound routes
  • Tweak attributes on inbound routes to influence best route selection
• In general, an AS has more control over outbound traffic
[Figure: inbound routes paired with outbound traffic; outbound routes paired with inbound traffic]

Policy Implementation Flow
[Figure: boxes labeled Incoming, Adj-RIB-In, Main BGP RIB, Adj-RIB-Out, Outgoing, Static Info, IGPs, and Main RIB / HW FIB]

Conceptual Model of BGP Operation
• RIB: Routing Information Base
• Adj-RIB-In: prefixes learned from neighbors. As many Adj-RIB-Ins as there are peers
• Loc-RIB: prefixes selected for local use after analyzing the Adj-RIB-Ins. This RIB is advertised internally.
• Adj-RIB-Out: stores prefixes advertised to a particular neighbor. As many Adj-RIB-Outs as there are neighbors

UPDATE Message in BGP
• Primary message between two BGP speakers.
• Used to advertise/withdraw IP prefixes (NLRI)
• Path attributes field: unique to BGP
  • Apply to all prefixes specified in NLRI field
  • Optional vs well-known; transitive vs non-transitive
Message format:
  Withdrawn Routes Length (2 octets)
  Withdrawn Routes (variable length)
  Total Path Attributes Length (2 octets)
  Path Attributes (variable length)
  Network Layer Reachability Info.
  (NLRI: variable length)

Path Attributes: ORIGIN
• ORIGIN: describes how a prefix came to BGP at the origin AS
• Prefixes are learned from a source and “injected” into BGP:
  • Directly connected interfaces, manually configured static routes, dynamic IGP or EGP
• Values:
  • IGP (EGP): prefix learnt from IGP (EGP)
  • INCOMPLETE: static routes

Path Attributes: AS_PATH
• List of ASes through which the prefix announcement has passed. Each AS on the path adds its ASN to AS_PATH
• E.g.: 138.39.0.0/16 originates at AS1 and is advertised to AS3 via AS2.
• E.g.: AS_SEQUENCE: “100 200”
• Used for loop detection and path selection
[Figure: AS1 (100) with 138.39.0.0/16, AS2 (200), AS3 (15)]

Traffic Often Follows AS_PATH
[Figure: AS 1 originates 135.207.0.0/16; AS 4 sees it with AS_PATH = 3 2 1; an IP packet with Dest = 135.207.44.66 travels from AS 4 through AS 3 and AS 2 to AS 1]

… But It Might Not
• AS 2 filters all subnets with masks longer than /24
• From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5
[Figure: AS 1 announces 135.207.0.0/16 (AS_PATH = 1, seen at AS 4 as AS_PATH = 3 2 1); AS 5 announces 135.207.44.0/25 (AS_PATH = 5); IP packet Dest = 135.207.44.66]

Shorter AS_PATH Doesn’t Mean Shorter Hops
• BGP says that path 4 1 is better than path 3 2 1. Duh!
[Figure: AS 4, AS 3, AS 2, AS 1]

Path Attributes: NEXT_HOP
• Next hop: node to which packets must be sent for the IP prefixes. May not be the same as the peer.
• UPDATE for 180.20.0.0, NEXT_HOP = 170.10.20.3
[Figure: BGP speakers and a router that is not a BGP speaker]

Recursive Lookup
• If routes (prefixes) are learnt through iBGP, NEXT_HOP is the iBGP router which originated the route.
• Note: the iBGP peer might be several IP-level hops away, as determined by the IGP
• Hence the BGP NEXT_HOP is not the same as the IP next hop
• BGP therefore checks if the “NEXT_HOP” is reachable through its IGP.
  • If so, it installs the IGP next hop for the prefix
  • This process is known as “recursive lookup”: the lookup is done in the control plane (not the data plane) before populating the forwarding table.
• Example in next slide

Join EGP with IGP For Connectivity
[Figure: AS 1 and AS 2; 135.207.0.0/16 is announced with Next Hop = 192.0.2.1; 192.0.2.0/30 is reached via 10.10.10.10]
Forwarding Table (from IGP):
  destination 192.0.2.0/30, next hop 10.10.10.10
+ EGP:
  destination 135.207.0.0/16, next hop 192.0.2.1
= Forwarding Table:
  destination 135.207.0.0/16, next hop 10.10.10.10
  destination 192.0.2.0/30, next hop 10.10.10.10

Load-Balancing Knobs in BGP
• LOCAL_PREF: outbound traffic, local preference (box-level knob)
• MED: inbound traffic, typically from the same ISP (link-level knob)
[Figure: AS1 and AS2, labeled Local Preference and MED]

Path Attribute: LOCAL_PREF
• Locally configured indication about which path is preferred to exit the AS in order to reach a certain network. Default value = 100. Higher is better.

Attributes: MULTI_EXIT Discriminator
• Also called METRIC or MED attribute. Lower is better
• AS1: multi-homed customer.
• AS2 (provider) includes MED to AS1
• AS1 chooses which link (NEXT_HOP) to use
• E.g.: traffic to AS3 can go through Link1, and AS2 through Link2
[Figure: AS1, AS2, AS3, AS4 with Link A and Link B]

Hot Potato Routing: Closest Egress Point
[Figure: egress 2 and egress 1 toward 192.44.78.0/24; IGP distances 56 and 15]
• This router has two BGP routes to 192.44.78.0/24.
• Hot potato: get traffic off of your network as soon as possible.
  Go for egress 1.

Getting Burned by the Hot Potato
• Many customers want their provider to carry the bits
[Figure: heavy content web farm; high-bandwidth provider backbone and low-bandwidth customer backbone; sites SFF, NYC, San Diego; IGP distances 2865, 17, 56, 15; tiny http request, huge http reply]

Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute)
• Prefer lower MED values
• This means that MEDs must be considered BEFORE IGP distance
• Note 1: some providers will not listen to MEDs
• Note 2: MEDs need not be tied to IGP distance
[Figure: heavy content web farm; 192.44.78.0/24 announced with MED = 56 at one exit and MED = 15 at the other; IGP distances 2865, 17, 56, 15]

MEDs Can Export Internal Instability
[Figure: same topology; an internal distance flaps between 56 and 10, so 192.44.78.0/24 is announced with MED = 56 OR 10 at one exit and MED = 15 at the other; the FLAP propagates outward]

AS_PATH Padding: Shed Inbound Traffic
• Padding will (usually) force inbound traffic from AS 1 to take the primary link
[Figure: customer AS 2 announces 192.0.2.0/24 to provider AS 1 with AS_PATH = 2 on the primary link and AS_PATH = 2 2 2 on the backup link]

Padding May Not Shut Off All Traffic
• AS 3 will send traffic on the “backup” link because it prefers customer routes and local preference is considered before AS_PATH length
• Padding in this way is often used as a form of load balancing
[Figure: customer AS 2 announces 192.0.2.0/24 to provider AS 1 on the primary link (AS_PATH = 2) and to provider AS 3 on the backup link (AS_PATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2)]

Deaggregation + Multihoming
[Figure: customer AS 2 (12.2.0.0/16) multi-homed to provider AS 1 (12.0.0.0/8) and provider AS 3; 12.2.0.0/16 is announced through AS 3]
• If AS 1 does not announce the more specific prefix, then most traffic to AS 2 will go through AS 3 because it is a longer match
• AS 2 is “punching a hole” in the CIDR
  block of AS 1 => subverts CIDR

CIDR at Work, No Load Balancing
Table at ISP3:
Prefix      Next Hop   ORIGIN AS
128.32/11   ISP1       ISP1
140.64/10   ISP2       ISP2
[Figure: AS1 (128.40/16, 140.127/16) attached to ISP1 (128.32/11) and ISP2 (140.64/10), which both connect to ISP3]

CIDR Subverted for Load Balancing
Table at ISP3:
Prefix          Next Hop   ORIGIN AS
128.32/11       ISP1       ISP1
140.64/10       ISP2       ISP2
140.255.20/24   ISP1       AS1
128.42.10/24    ISP2       AS1
[Figure: AS1 (128.40/16, 140.127/16) attached to ISP1 (128.32/11) and ISP2 (140.64/10), which both connect to ISP3]

How Can Routes be Colored? BGP Communities
• A community value is 32 bits
  • By convention, the first 16 bits is the ASN indicating who is giving it an interpretation; the rest is the community number
• Used within and between ASes
• The set of ASes must agree on how to interpret the community value
• Very powerful BECAUSE it has no (predefined) meaning
• Community attribute = a list of community values. (So one route can belong to multiple communities)
• Two reserved communities (RFC 1997, August 1996):
  • no-export = 0xFFFFFF01: don’t export out of AS
  • no-advertise = 0xFFFFFF02: don’t pass to BGP neighbors

Communities Example (AS 1)
Import:
• Customer routes: 1:100
• Peer routes: 1:200
• Provider routes: 1:300
Export:
• To customers: 1:100, 1:200, 1:300
• To peers: 1:100
• To providers: 1:100

BGP Route Selection Process
Series of tie-breaker decisions...
• If NEXT_HOP is inaccessible, do not consider the route.
• Prefer the largest LOCAL_PREF
• If same LOCAL_PREF, prefer the shortest AS_PATH.
• If all paths are external, prefer the lowest ORIGIN code (IGP < EGP < INCOMPLETE).
• If ORIGIN codes are the same, prefer the lowest MED.
• If MED is the same, prefer the min-cost NEXT_HOP
• If routes are learned from EBGP or IBGP, prefer paths learnt from EBGP
• Final tie-break: prefer the route with the lowest BGP ID (IP address)

Route Selection Summary
• Highest Local Preference: enforce relationships
• Shortest AS_PATH, lowest MED, eBGP over iBGP, lowest IGP cost to BGP egress: traffic engineering
• Lowest router ID: throw up hands and break ties

Caveat
• BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to “livelock” protocol oscillations. See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R. Govindan, and D. Estrin. ISI report, 1996
• Corollary: BGP is not guaranteed to recover from network failures.

BGP Table Growth
[Figure: BGP table growth over time] Thanks: Geoff Huston. http://www.telstra.net/ops/bgptable.html

Large BGP Tables Considered Harmful
• Routing tables must store best routes and alternate routes
• Burden can be large for routers with many alternate routes (route reflectors, for example)
• Routers have been known to die
• Increases CPU load, especially during session reset

ASNs Growth
[Figure: growth in the number of ASNs] From: Geoff Huston.
http://www.telstra.net/ops

Dealing with ASN Growth…
• Make ASNs larger than 16 bits
  • How about 32 bits?
  • See Internet Draft: “BGP support for four-octet AS number space” (draft-ietf-idr-as4bytes-03.txt)
  • Requires protocol change and wide deployment
• Change the way ASNs are used
  • Allow multi-homed, non-transit networks to use private ASNs
  • Uses ASE (AS number Substitution on Egress)
  • See Internet Draft: “Autonomous System Number Substitution on Egress” (draft-jhaas-ase-00.txt)
  • Works at edge, requires protocol change (for loop prevention)

Daily Update Count

A Few Bad Apples …
• Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
• Typically, 80% of the updates are for less than 5% of the prefixes.
[Figure: axis: percent of BGP table prefixes] Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC

Squashing Updates
• Rate limiting on sending updates
  • Send batch of updates every MinRouteAdvertisementInterval seconds (+/- random fuzz)
  • Default value is 30 seconds
  • A router can change its mind about best routes many times within this interval without telling neighbors
  • Effective in dampening oscillations inherent in the vectoring approach
• Route Flap Dampening
  • Punish routes for misbehaving
  • Must be turned on with configuration

Route Flap Dampening (RFC 2439)
Routes are given a penalty for changing. If the penalty exceeds the suppress limit, the route is dampened. When the route is not changing, its penalty decays exponentially. If the penalty goes below the reuse limit, then it is announced again.
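
The penalty mechanism just described can be sketched in a few lines. This is only an illustrative model, not the RFC 2439 algorithm in full: the penalty of 1000 per flap comes from the dampening example in this deck, while the suppress limit (2000), reuse limit (750), half-life (900 seconds), and the class name `FlapDamper` are assumptions chosen for the example.

```python
class FlapDamper:
    """Toy per-route flap-dampening state (all default limits are assumed values)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life_s = half_life_s
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves every half_life_s seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = now
        # A suppressed route is announced again once it drops below the reuse limit.
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False

    def flap(self, now):
        """Record one route change at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        return self.suppressed


if __name__ == "__main__":
    d = FlapDamper()
    for t in (0, 60, 120):          # three flaps, one minute apart
        d.flap(t)
    print(d.is_suppressed(120), round(d.penalty))
```

With these assumed limits, two quick flaps stay under the suppress limit, a third pushes the route over it, and the route is reused only after the penalty has decayed for roughly half an hour without further changes.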
• Can dramatically reduce the number of BGP updates
• Requires additional router resources
• Applied on eBGP inbound only

Route Flap Dampening Example
• Penalty for each flap = 1000
• Route dampened for nearly 1 hour

How Long Does BGP Take to Adapt to Changes?
[Figure: cumulative percentage of events (0 to 100) versus seconds until convergence (0 to 160), with curves Tup, Tshort, Tlong, and Tdown. From: Abha Ahuja and Craig Labovitz]

Two Main Factors in Delayed Convergence
• Rate limiting timer slows everything down
• BGP can explore many alternate paths before giving up or arriving at a new path
  • No global knowledge in vectoring protocols

Implementation Does Matter
[Figure: stateless withdraws (widely deployed) versus stateful withdraws (widely deployed)]

What is RPSL? Why?
• Object-oriented language (successor to RIPE-181)
  • Structured objects
• Describes things interesting to routing policy
  • Routes, ASNs, peer relationships, etc.
• Allows consistent configuration between BGP peers
• Expertise encoded in the tools that generate the policy rather than in the engineer configuring the peering session
• Automatic, manageable solution for filter generation
• For more info: RFC 2622, “Routing Policy Specification Language (RPSL)”

Summary
• BGP is a fairly simple protocol …
• … but it is not easy to configure
• BGP is running on more than 100K routers, making it one of the world’s largest and most visible distributed systems
• Global dynamics and scaling principles are still not well understood
• Traffic Engineering hacked in as an afterthought…
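
Worked example: the forwarding table from “Longest Prefix Match (Classless) Forwarding” can be checked with Python's standard `ipaddress` module. The table rows and the destination 12.5.9.16 are copied from that slide; the function name `lookup` and the linear scan are choices made for this sketch (real routers typically use tries or hardware lookup tables instead).

```python
import ipaddress

# (prefix, next hop, interface) rows copied from the forwarding-table slide.
TABLE = [
    ("0.0.0.0/0", "10.14.11.33", "ATM 5/0/9"),
    ("12.0.0.0/8", "10.14.22.19", "ATM 5/0/8"),
    ("12.4.0.0/15", "10.1.3.77", "Ethernet 0/1/3"),
    ("12.5.8.0/23", "attached", "Serial 1/0/7"),
]


def lookup(dest, table=TABLE):
    """Return (prefix, next_hop, interface) of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, next_hop, interface in table:
        net = ipaddress.ip_network(prefix)
        if addr in net:                      # the mask covers this destination
            matches.append((net.prefixlen, prefix, next_hop, interface))
    if not matches:
        return None
    # Longest mask wins; prefix lengths are unique in this table.
    _, prefix, next_hop, interface = max(matches, key=lambda m: m[0])
    return prefix, next_hop, interface


if __name__ == "__main__":
    print(lookup("12.5.9.16"))   # the slide's destination
    print(lookup("12.7.9.16"))   # not covered by 12.4.0.0/15
```

The destination 12.5.9.16 matches all four rows, and the /23 entry is chosen because it has the longest mask; 12.7.9.16 falls outside 12.4.0.0/15 and is forwarded on the 12.0.0.0/8 route instead.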
Document Information
Category:
Presentations
User Name:
Dr.ShivJindal
User Type:
Teacher
Country:
India
Uploaded Date:
19-07-2017