CISCO BGP TROUBLESHOOTING

HOW TO TROUBLESHOOT THE BGP ROUTING PROTOCOL IN A CISCO ENVIRONMENT. USEFUL MATERIAL FOR WORKING PROFESSIONALS IN NETWORK SUPPORT ROLES AT LARGE ORGANIZATIONS AND ISPS.
Troubleshooting BGP
BRKRST3320 14702052008x1 © 2008 Cisco Systems, Inc. All rights reserved. Cisco Public

Overview
• Troubleshooting Peers
• BGP Convergence
• High Utilization
• BGP Routing Problems

Troubleshooting Peers

BGP Speakers Won’t Peer
• This can be difficult to troubleshoot if you can only see one side of the connection
• Start with the simple things: check for common mistakes
    - Is it supposed to be configured for eBGP multihop?
    - Are the AS numbers right?
• Next, try pinging the peering address
    - If the ping fails, there’s likely a connectivity problem

• Try some alternate ping options
• Is the local peering address the actual peering interface?
    - If not, use extended ping to source from the loopback or actual peering address
    - If this fails, there is an underlying routing problem
    - The other router may not know how to reach your peering interface

    Router> enable
    Router# ping
    Protocol [ip]:
    Target IP address: 192.168.40.1
    Repeat count [5]:
    Datagram size [100]:
    Timeout in seconds [2]:
    Extended commands [n]: y
    Source address or interface: 172.16.23.2

BGP Speakers Won’t Peer
• Try extended ping to sweep a range of possible MTUs
    - Note the MTU at which the ping starts to fail
    - Make certain the interface is configured for that MTU size
• If these all fail
    - None of the pings work no matter how you try
    - It’s likely a transport problem
    - Drop back and punt

    Router> enable
    Router# ping
    Protocol [ip]:
    Target IP address: 192.168.40.1
    Repeat count [5]:
    Datagram size [100]:
    Timeout in seconds [2]:
    Extended commands [n]: y
    Source address or interface:
    Type of service [0]:
    Set DF bit in IP header? [no]:
    Validate reply data? [no]:
    Data pattern [0xABCD]:
    Loose, Strict, Record, Timestamp, Verbose[none]:
    Sweep range of sizes [n]: y
    Sweep min size [36]: 100
    Sweep max size [18024]: 2500
    Sweep interval [1]: 100
    ....

• Remember that BGP runs on top of IP, and can be affected by:
    - Rate limiting
    - Traffic shaping
    - Tunneling problems
    - IP reachability problems (the underlying routing isn’t working)
    - TCP problems
    - Etc.

BGP Speakers Won’t Peer: Useful Peer Troubleshooting Commands

    show tcp brief all
    TCB       Local Address    Foreign Address  (state)
    64316F14  1.1.1.1.12345    2.2.2.2.179      ESTAB
    6431BA8C  *.179            2.2.2.2.*        LISTEN
    62FFDEF4  *.*              *.*              LISTEN

    show tcp statistics
    Rcvd: 7005 Total, 10 no port
          0 checksum error, 0 bad offset, 0 too short
          ....
          0 out-of-order packets (0 bytes)
          ....
          4186 ack packets (73521 bytes)
          ....
    Sent: 9150 Total, 0 urgent packets
          4810 control packets (including 127 retransmitted)
          2172 data packets (71504 bytes)
          ....

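The MTU sweep described earlier in this section boils down to finding the largest datagram size that still gets a reply. The following is a minimal Python sketch of that reasoning, assuming the sweep results have already been collected as (size, succeeded) pairs. The function name `largest_passing_size` and the sample data are hypothetical; nothing here parses real IOS output.

```python
def largest_passing_size(results):
    """Return the largest datagram size that succeeded before the first failure.

    results: iterable of (size_in_bytes, succeeded) pairs from a ping sweep.
    Returns None if even the smallest size failed.
    """
    best = None
    for size, ok in sorted(results):
        if not ok:
            break  # this size and everything larger is treated as failing
        best = size
    return best


# Hypothetical sweep from 100 to 2500 bytes in steps of 100 (the values used
# in the extended ping above), on a path that drops anything above 1500 bytes.
sweep = [(size, size <= 1500) for size in range(100, 2501, 100)]
print(largest_passing_size(sweep))
```

The size returned is the value to compare against the MTU configured on the peering interface.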
BGP Speakers Won’t Peer: Useful Peer Troubleshooting Commands
• debug ip tcp transactions
    - This can be very chatty, so be careful with this debug

    R1# sh log | i TCP0:
    TCP0: state was ESTAB -> FINWAIT1 [12345 -> 2.2.2.2(179)]
    TCP0: sending FIN
    TCP0: state was FINWAIT1 -> FINWAIT2 [12345 -> 2.2.2.2(179)]
    TCP0: FIN processed
    TCP0: state was FINWAIT2 -> TIMEWAIT [12345 -> 2.2.2.2(179)]

    TCP0: Connection to 2.2.2.2:179, advertising MSS 1460
    TCP0: state was CLOSED -> SYNSENT [12346 -> 2.2.2.2(179)]
    TCP0: state was SYNSENT -> ESTAB [12346 -> 2.2.2.2(179)]
    TCP0: tcb 6430DCDC connection to 2.2.2.2:179, received MSS 1460, MSS is 1460

BGP Speakers Won’t Peer
• If the connectivity is good, the next step is to check BGP itself
• debug ip bgp
    - Use with caution
    - Configure so the output goes to the log, rather than the console
        logging buffered <size>
        no logging console
    - It’s easier to find the problem points this way
        router# show log | i NOTIFICATION

• show ip bgp neighbor 1.1.1.1 | include last reset
    - This should give you the resets for a peer
    - The same information as is shown through debug ip bgp
• bgp log-neighbor-changes
    - Provides much of the same information as debug ip bgp, as well

BGP Speakers Won’t Peer: Source/Destination Address Matching
• Both sides must agree on source and destination addresses
• R1 and R2 do not agree on what addresses to use
• BGP will tear down the TCP session due to the conflict
    - Points out configuration problems and adds some security

    Topology: R1 (Loopback0 1.1.1.1, link 10.1.1.1) to R2 (link 10.1.1.2, Loopback0 2.2.2.2)

    R1:
    neighbor 2.2.2.2 remote-as 100
    neighbor 2.2.2.2 update-source loopback 0

    R2:
    neighbor 10.1.1.1 remote-as 100
    neighbor 10.1.1.1 update-source loopback 0

• R2 attempts to open a session to R1
    BGP: 10.1.1.1 open active, local address 2.2.2.2
• R1 denies the session because of the address mismatch
• debug ip bgp on R1 shows
    BGP: 2.2.2.2 passive open to 10.1.1.1
    BGP: 2.2.2.2 passive open failed - 10.1.1.1 is not update-source Loopback0's address (1.1.1.1)

BGP Speakers Won’t Peer: Active vs. Passive Peer
• Active Session
    - If the TCP session initiated by R1 is the one used between R1 and R2, then R1 “actively” established the session
• Passive Session
    - For the same scenario, R2 “passively” established the session
• R1 actively opened the session
• R2 passively accepted the session
• Can be configured
    neighbor x.x.x.x transport connection-mode {active | passive}

    R1:
    neighbor 2.2.2.2 remote-as 100
    neighbor 2.2.2.2 transport connection-mode active

    R2:
    neighbor 10.1.1.1 remote-as 100
    neighbor 10.1.1.1 transport connection-mode passive

• Use show ip bgp neighbor to determine if a router actively or passively established a session

    R1# show ip bgp neighbors 2.2.2.2
    BGP neighbor is 2.2.2.2, remote AS 200, external link
      BGP version 4, remote router ID 2.2.2.2
    [snip]
    Local host: 1.1.1.1, Local port: 12343
    Foreign host: 2.2.2.2, Foreign port: 179

• TCP open from R1 to R2’s port 179 established the session
• Tells us that R1 actively established the session

BGP Speakers Won’t Peer: Session Collisions
• Both speakers initiate their sessions at the same time
• The active session established by the peer with the highest router ID is the winner
    - This rarely happens
    - Not an issue if this does occur

BGP Speakers Won’t Peer: Time to Live
• BGP uses a TTL of 1 for eBGP peers
• For eBGP peers that are more than 1 hop away, a larger TTL must be used
• neighbor x.x.x.x ebgp-multihop [2-255]

    [Figure: R1 in AS65000 and R2 in AS65001, comparing the default TTL with a configured TTL]

    R1# show ip bgp neighbors 2.2.2.2 | inc External BGP
    [snip]
    External BGP neighbor may be up to 1 hops away.

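The active versus passive check above comes down to which end of the TCP connection holds port 179. A minimal Python sketch of that rule follows; `session_role` is a hypothetical helper written for illustration, and it takes the local and foreign port numbers as they appear in the `show ip bgp neighbors` output.

```python
BGP_PORT = 179  # well-known TCP port that BGP listens on


def session_role(local_port, foreign_port):
    """Classify how the local router established a BGP session.

    If the far end is on port 179, the local router opened the connection
    (active). If the local end is on port 179, it accepted it (passive).
    """
    if foreign_port == BGP_PORT and local_port != BGP_PORT:
        return "active"
    if local_port == BGP_PORT and foreign_port != BGP_PORT:
        return "passive"
    return "unknown"


# Values from the R1 output above: local port 12343, foreign port 179.
print(session_role(12343, 179))
```

Seen from R2, the same session has the ports swapped, so the helper would classify it as passive.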
BGP Speakers Won’t Peer: Bad Messages

    %BGP-3-NOTIFICATION: sent to neighbor 2.2.2.2 2/2 (peer in wrong AS) 2 bytes 00C8 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 00C8 00B4 0202 0202 1002 0601 0400 0100 0102 0280 0002 0202 00

    unknown subcode: The peer open notification subcode isn’t known
    incompatible BGP version: The version of BGP the peer is running isn’t compatible with the local version of BGP
    peer in wrong AS: The AS this peer is locally configured for doesn’t match the AS the peer is advertising
    BGP identifier wrong: The BGP router ID is the same as the local BGP router ID
    unsupported optional parameter: There is an option in the packet which the local BGP speaker doesn’t recognize
    authentication failure: The MD5 hash on the received packet does not match the correct MD5 hash
    unacceptable hold time: The remote BGP peer has requested a BGP hold time which is not allowed (too low)
    unsupported/disjoint capability: The peer has asked for support for a feature which the local router does not support

BGP Speaker Flap Case Study
• Here we see a message from bgp log-neighbor-changes telling us the hold timer expired
• We can double check this by looking at show ip bgp neighbor x.x.x.x | include last reset

    R2:
    %BGP-5-ADJCHANGE: neighbor 10.1.1.1 Down BGP Notification sent
    %BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired) 0 bytes

    R2# show ip bgp neighbor 10.1.1.1 | include last reset
      Last reset 00:01:02, due to BGP Notification sent, hold time expired

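The `2/2` and `4/0` pairs in the NOTIFICATION messages above are the error code and subcode. As a rough illustration, the Python sketch below pulls the pair out of a log line and names it. The numeric values follow RFC 4271 (with subcode 7 from the capabilities extension); the subcode wording is taken from the list above. The helper `decode_notification` and its regular expression are assumptions made for this example, not part of any Cisco tool.

```python
import re

# NOTIFICATION error codes as numbered in RFC 4271.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Subcodes for error code 2 (OPEN Message Error), worded as in the list above.
OPEN_SUBCODES = {
    1: "incompatible BGP version",
    2: "peer in wrong AS",
    3: "BGP identifier wrong",
    4: "unsupported optional parameter",
    5: "authentication failure",
    6: "unacceptable hold time",
    7: "unsupported/disjoint capability",
}


def decode_notification(log_line):
    """Return (error name, subcode name or None) for a NOTIFICATION log line."""
    match = re.search(r"(\d+)/(\d+) \(", log_line)
    if match is None:
        return None
    code, subcode = int(match.group(1)), int(match.group(2))
    code_name = ERROR_CODES.get(code, "unknown code")
    if code == 2:
        return code_name, OPEN_SUBCODES.get(subcode, "unknown subcode")
    return code_name, None


line = "%BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired) 0 bytes"
print(decode_notification(line))
```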
BGP Speaker Flap Case Study
• There are lots of possibilities here
    - R1 has a problem sending keepalives
    - The keepalives are lost in the cloud
    - R2 has a problem receiving the keepalive

• Did R1 build and transmit a keepalive for R2?
    - debug ip bgp keepalive
    - show ip bgp neighbor
• When did we last send or receive data with the peer?

    R2# show ip bgp neighbors 1.1.1.1
    BGP neighbor is 1.1.1.1, remote AS 100, external link
      BGP version 4, remote router ID 1.1.1.1
      BGP state = Established, up for 00:12:49
      Last read 00:00:45, last write 00:00:44, hold time is 180, keepalive interval is 60 seconds

• If R1 did not build and transmit a KA
    - How is R1 on memory?
    - What is R1’s CPU load?
    - Is R2’s TCP window open?

    R2# show ip bgp summary | begin Neighbor
    Neighbor  V  AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    2.2.2.2   4  2   53       284      10167   0    97    00:02:15  0

    R2# show ip bgp summary | begin Neighbor
    Neighbor  V  AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    2.2.2.2   4  2   53       284      10167   0    98    00:03:04  0

• The two samples are at least one BGP keepalive interval apart (Up/Down)
• The number of packets generated is increasing (OutQ)
• But the number of packets transmitted is not increasing (MsgSent)
• The keepalives aren’t leaving R2

• Go back to square one and check the IP connectivity
    - This is a layer 2 or 3 transport issue, etc.

    R1# ping 10.2.2.2
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 16/21/24 ms

    R1# ping ip
    Target IP address: 10.2.2.2
    Repeat count [5]:
    Datagram size [100]: 1500
    Timeout in seconds [2]:
    Extended commands [n]:
    Sweep range of sizes [n]:
    Type escape sequence to abort.
    Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
    .....
    Success rate is 0 percent (0/5)

BGP Convergence

BGP Slow Convergence
• Hey, who are you calling slow?
    - Slow is a relative term
    - BGP probably won’t ever converge as fast as any of the IGPs
• Two general convergence situations
    - Initial startup between peers
    - Route changes between existing peers

BGP Slow Convergence: Initial Convergence
• Initial convergence is limited by
    - The number of packets required to transfer the entire BGP database
        - The number of routes
        - The ability of BGP to pack routes into a small number of packets
        - The number of peer specific policies
    - TCP transport issues
        - How often does TCP go into slow start?
        - How much can TCP put into one packet?

• BGP starts a packet by building an attribute set
• It then packs as many destinations (NLRIs) as it can into the packet
    - Only destinations with the same attribute set can be placed in the packet
    - Destinations can only be put into the packet until it’s full
• First rule of thumb: to increase convergence speed, decrease unique sets of attributes

[Figure: update packing. Less efficient: each NLRI carries its own attribute set. More efficient: several NLRIs share one attribute set.]

BGP Slow Convergence: Initial Convergence
• The larger the packet BGP can build, the more destinations it can put in the packet
    - The more you can put in a single packet, the less often you have to repeat the same attributes
• Second rule of thumb: allow BGP to use the largest packets possible

• BGP must create packets based on the policies towards each peer
• Third rule of thumb: minimize the number of unique policies towards eBGP peers

[Figure: per-peer update packing. Less efficient: few NLRIs per attribute set, repeated for each peer policy. More efficient: many NLRIs per attribute set.]

• TCP Interactions
    - Each time a TCP packet is dropped, the session goes into slow start
    - It takes a good deal of time for a TCP session to come out of slow start
• Fourth rule of thumb: try to reduce the circumstances under which a TCP segment will be dropped during initial convergence

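The first two rules of thumb can be made concrete by counting how many update packets a table needs. The Python sketch below is a deliberately simplified model: it assumes a fixed number of NLRIs fits in one packet (the hypothetical `nlri_per_packet` parameter), whereas the real capacity depends on message size, attribute length, and prefix length. The function name `count_updates` and the sample routes are illustrative only.

```python
from collections import defaultdict
from math import ceil


def count_updates(routes, nlri_per_packet):
    """Count update packets needed when only routes that share an
    attribute set can be packed into the same packet.

    routes: iterable of (prefix, attribute_set) pairs; attribute_set must be hashable.
    nlri_per_packet: assumed number of NLRIs that fit in one packet.
    """
    per_attribute_set = defaultdict(int)
    for _prefix, attrs in routes:
        per_attribute_set[attrs] += 1
    return sum(ceil(count / nlri_per_packet) for count in per_attribute_set.values())


# 1,000 routes that all share one attribute set.
shared = [(f"10.{i // 256}.{i % 256}.0/24", ("AS path 1",)) for i in range(1000)]
# The same 1,000 routes, each with its own attribute set.
unique = [(prefix, (f"AS path {i}",)) for i, (prefix, _) in enumerate(shared)]

print(count_updates(shared, 100))   # shared attributes, smaller packets
print(count_updates(shared, 250))   # shared attributes, larger packets
print(count_updates(unique, 250))   # unique attributes defeat packing
```

Fewer unique attribute sets and larger packets both shrink the count, which is the point of the first and second rules of thumb.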
BGP Slow Convergence: Initial Convergence
• Bottom Line:
    - Hold down the number of unique attributes per route
        - Don’t send communities if you don’t need to, etc.
    - Hold down the number of policies towards eBGP peers
        - Try to find a small set of common policies, rather than individualizing policies per peer
    - Stop TCP segment drops
        - Increase input queues
        - Increase SPD thresholds
        - Make certain links are clean

• Here we see the results of setting up maximum sized input queues
    - A single router running 12.0(18)S
    - 100 to 500 peers in a single peer group
    - Sending 100,000+ routes to each peer
• Increasing the input queue sizes
    - Reduced the input queue drops by a factor of 10
    - Reduced convergence time by 50%

[Chart: convergence time (minutes, 0 to 20) and input queue drops (0 to 250K) versus peer group members (100 to 500), 12.0(18)S]

• TCP path MTU discovery allows BGP to use the largest packets possible
• Without PMTU discovery, we can support 100 peers with 120,000 routes each
• With PMTU discovery, we can support 175 peers with 120,000 routes each
• Note this is 12.0(18)S; Cisco IOS Software can support more than this now

[Chart: supported peers (0 to 350) versus routes (80K to 120K)]

BGP Slow Convergence: Route Change Convergence
• There are two elements to route change convergence for BGP
    - How long does it take to see the failure?
    - How long does it take to propagate information about the failure?
• For faster peer down detection, there are several tools you can use
    - Fast layer two down detection
    - Fast external fallover for directly connected eBGP peers
    - Faster keepalive and dead interval timers
        - Values as low as 3 and 9 seconds are commonly used today

• Fast Session Deactivation (FSD)
    - The address of each peer is registered with the Address Tracking Filter (ATF) system
    - When the state of the route changes, ATF notifies BGP
    - BGP tears down the peer impacted
    - BGP does not wait on the hold timer to expire

[Figure: eBGP multihop session. The link fails and the IGP converges; ATF notifies BGP; BGP tears down the eBGP session.]

• Very dangerous for iBGP peers
    - IGP may not have a route to a peer for a split second
    - FSD would tear down the BGP session
    - Imagine if you lose your IGP route to your RR (Route Reflector) for just 100 ms
• Off by default
    neighbor x.x.x.x fall-over

• ATF can also be used to track changes in next hops
    - iBGP recurses onto an IGP next hop to find a path through the local AS
    - Changes in the IGP cost or reachability are normally seen only by the BGP scanner
    - Since the scanner runs every 60 seconds by default, iBGP convergence can take up to 60 seconds on an IGP change

BGP Slow Convergence: Route Change Convergence
• BGP Next Hop Tracking (NHT)
    - Enabled by default
        [no] bgp nexthop trigger enable
• BGP registers all nexthops with ATF
    - Hidden command will let you see a list of nexthops
        show ip bgp attr nexthop
• ATF will let BGP know when a route change occurs for a nexthop
• ATF notification will trigger a lightweight “BGP Scanner” run
    - Bestpaths will be calculated
    - None of the other “Full Scan” work will happen

• Once an ATF notification is received, BGP waits 5 seconds before triggering the NHT scan
    bgp nexthop trigger delay <0-100>
    - May lower default value as we gain experience
• Event driven model allows BGP to react quickly to IGP changes
    - No longer need to wait as long as 60 seconds for BGP to scan the table and recalculate bestpaths
    - Tuning your IGP for fast convergence is recommended

• Dampening is used to reduce frequency of triggered scans
• show ip bgp internal
    - Displays data on when the last NHT scan occurred
    - Time until the next NHT may occur (dampening information)
• New commands
    bgp nexthop trigger enable
    bgp nexthop trigger delay <0-100>
    show ip bgp attr nexthop ribfilter
    debug ip bgp events nexthop
    debug ip bgp ribfilter
• Full BGP scan still happens every 60 seconds
    - Full scanner will no longer recalculate bestpaths if NHT is enabled

BGP Slow Convergence: Route Change Convergence
• How is the MRAI (Minimum Route Advertisement Interval) timer enforced for peer X?
    - Timer starts when all routes have been advertised to X
    - For the next MRAI seconds we will not propagate any bestpath changes to peer X
    - Once X’s MRAI timer expires, send it updates and withdraws
    - Restart the timer and the process repeats
• User may see a wave of updates and withdraws to peer X every MRAI
• User will NOT see a delay of MRAI between each individual update and/or withdraw
    - BGP would probably never converge if this was the case

• MRAI timeline for iBGP peer (time axis t0 to t25)
    - Bestpath Change 1 at t7 is TXed immediately
    - MRAI timer starts at t7, will expire at t12
    - Bestpath Change 2 at t10 must wait until t12 for MRAI to expire
    - Bestpath Change 2 is TXed at t12
    - MRAI timer starts at t12, will expire at t17
    - MRAI expires at t17; no updates are pending

• BGP is not a link state protocol
• May take several “rounds/cycles” of exchanging updates and withdraws for the network to converge
• MRAI must expire between each round
• The more fully meshed the network and the more tiers of ASes, the more rounds required for convergence
• Think about
    - How many tiers of ASes there are in the Internet
    - How meshy peering can be in the Internet

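The MRAI timeline above can be replayed with a few lines of code. This Python sketch models a single peer under the behavior described in the slides: a change is sent at once when no timer is running, otherwise it waits for the running timer, and the timer restarts only when something was actually sent at expiry. The function name `mrai_transmit_times` is hypothetical, and the model ignores everything else a router does.

```python
def mrai_transmit_times(change_times, mrai):
    """Return the time at which each bestpath change is sent to one peer."""
    tx_times = []
    expires = None    # expiry time of the running MRAI timer (None = idle)
    pending = False   # True if a change is waiting for that expiry
    for t in sorted(change_times):
        # Roll the timer forward past any expiry that happens before time t.
        while expires is not None and expires <= t:
            if pending:
                expires += mrai   # a batch went out at expiry, so the timer restarts
                pending = False
            else:
                expires = None    # nothing was pending, so the timer stops
        if expires is None:
            tx_times.append(t)        # timer idle: transmit immediately
            expires = t + mrai
        else:
            tx_times.append(expires)  # must wait for the running timer
            pending = True
    return tx_times


# Timeline from the slide: changes at t7 and t10, MRAI of 5 for an iBGP peer.
print(mrai_transmit_times([7, 10], 5))
```

Changes that arrive while the same timer is running go out together at its expiry, which is the "wave" of updates described above.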
BGP Slow Convergence: Route Change Convergence
• Full mesh is the worst case MRAI convergence scenario
• R1 will send a withdraw to all peers for 10.0.0.0/8
• Count the number of rounds of UPDATEs and withdraws until the network has converged
• Note how MRAI slows convergence

[Figure: full mesh of R1, R2, R3, and R4; 10.0.0.0/8 is reachable via R1. Legend: Withdraw, Denied Update, Update. The blue path is the bestpath.]

    Paths to 10.0.0.0/8 before the withdraw:
    R2:  R1    R3,R1    R4,R1
    R3:  R1    R2,R1    R4,R1
    R4:  R1    R2,R1    R3,R1

• R1 withdraws 10.0.0.0/8 to all peers
• R1 starts an MRAI timer for each peer

• R2, R3, R4 recalculate their bestpaths
• R2, R3, R4 send updates based on new bestpaths
• R2, R3, R4 start an MRAI timer for each peer
• End of Round 1

• R2, R3, R4 recalculate their bestpaths
• R2, R3, R4 must wait for their MRAI timers to expire
• R2, R3, R4 send updates and withdraws based on their new bestpaths
• R2, R3, R4 restart the MRAI timer for each peer
• End of Round 2

[Figure: path tables for R2, R3, and R4 after each round]

BGP Slow Convergence: Route Change Convergence
• R3 and R4 recalculate their bestpaths
• R3 and R4 must wait for their MRAI timers to expire
• R3 and R4 send updates and withdraws based on their new bestpaths
• R3 and R4 restart the MRAI timer for each peer
• End of Round 3

• R2, R3, R4 took 3 rounds of messages to converge
• MRAI timers had to expire between 1st/2nd round and between 2nd/3rd round
• Total MRAI convergence delay for this example
    - iBGP mesh: 10 seconds
    - eBGP mesh: 60 seconds

[Figure: path tables for R2, R3, and R4 after the final round. Legend: Withdraw, Denied Update, Update.]

• Internet churn means we are constantly setting and waiting on MRAI timers
    - One flapping prefix slows convergence for all prefixes
    - Internet table sees roughly 6 bestpath changes per second
• For iBGP and PE-CE eBGP peers
    neighbor x.x.x.x advertisement-interval 0
    - Will be the default in 12.0(32)S
• For regular eBGP peers
    - Lowering to 0 may get you dampened
    - OK to lower for eBGP peers if they are not using dampening

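The 10 second and 60 second totals in the full-mesh example follow from a simple count: three rounds means two MRAI waits, one between each pair of consecutive rounds. The Python sketch below reproduces the arithmetic. The per-peer MRAI values of 5 seconds (iBGP) and 30 seconds (eBGP) are inferred from the totals and the timeline in this deck, and `mrai_delay` is a hypothetical helper name.

```python
def mrai_delay(rounds, mrai_seconds):
    """Total time spent waiting on MRAI when the timer must expire
    between each pair of consecutive rounds."""
    return max(rounds - 1, 0) * mrai_seconds


print(mrai_delay(3, 5))    # iBGP mesh in the example
print(mrai_delay(3, 30))   # eBGP mesh in the example
```

With `advertisement-interval 0` the same three rounds add no MRAI wait at all.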
BGP Slow Convergence: Route Change Convergence
• Will an MRAI of 0 eliminate batching?
    - Somewhat, but not much batching happens anyway
    - TCP, the operating system, and BGP code provide some batching
        - Process all messages from peer InQs
        - Calculate bestpaths based on received messages
        - Format UPDATEs to advertise new bestpaths
• What about CPU load from 0 second MRAI?
    - Internet table has 6 bestpath changes per second
    - Easy for a router to handle; 5 seconds of delay is not needed

High Utilization
• High Processor Utilization
• Next Hop Tracking
• High Memory Utilization

High Processor Utilization
• Why?
    - This could be for several reasons
    - High route churn is the most likely

    router# show process cpu
    CPU utilization for five seconds: 100%/0%; one minute: 99%; five minutes: 81%
    ....
    139  6795740  1020252  6660  88.34%  91.63%  74.01%  0  BGP Router

High Processor Utilization
• Check how busy the peers are
    - The Table Version
        - You have 150k routes and see the table version increase by 150k every minute: something is wrong
        - You have 150k routes and see the table version increase by 300 every minute: sounds like normal network churn
    - The InQ
        - Flood of incoming updates or build up of unprocessed updates
    - The OutQ
        - Flood of outgoing updates or build up of untransmitted updates

    router# show ip bgp summary
    Neighbor    V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    10.1.1.1    4  64512  309453   157389   19981   0    253   22:06:44  111633
    172.16.1.1  4  65101  188934   1047     40081   41   0     00:07:51  58430

• If the Table Version is changing quickly
    - Are you in initial convergence with this peer?
    - Is the peer flapping for some reason?
    - Examine the table entries from this peer: why are they changing?
    - If there is a group of routes which are constantly changing, consider route flap dampening
• If the InQ is high
    - You should see the table version changing quickly
    - If it’s not, the peer isn’t acting correctly
    - Consider shutting it down until the peer can be fixed
• If the OutQ is high
    - Lots of updates being generated
    - Check table versions of other peers
    - Check for underlying transport problems

• Check on the BGP Scanner
    - Walks the table looking for changed next hops
    - Checks conditional advertisement
    - Imports from and exports to VPNv4 VRFs

    router# show processes | include BGP Scanner
    172 Lsi 407A1BFC  29144  29130  1000  8384/9000  0  BGP Scanner

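The table version check above compares how fast the version climbs with how many routes are in the table. The Python sketch below expresses that as a ratio per sampling interval. The 0.5 threshold is an arbitrary assumption chosen only to separate the two cases given in the slide; it is not a Cisco recommendation, and the function names are hypothetical.

```python
def churn_ratio(route_count, version_before, version_after):
    """Table version increase over one interval, as a fraction of the table size."""
    return (version_after - version_before) / route_count


def churn_looks_abnormal(route_count, version_before, version_after, threshold=0.5):
    """Flag churn when the version moved by more than `threshold` of the table.

    threshold is an illustrative assumption, not a documented limit.
    """
    return churn_ratio(route_count, version_before, version_after) > threshold


# The two cases from the slide: 150k routes, sampled one minute apart.
print(churn_looks_abnormal(150_000, 0, 150_000))  # version grew by the whole table
print(churn_looks_abnormal(150_000, 0, 300))      # version grew by 300
```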
High Processor Utilization
• To relieve pressure on the BGP Scanner
    - Upgrade to newer code
        - Most of the work of the BGP Scanner has been moved to an event driven model
        - This has reduced the impact of BGP Scanner significantly
    - Reduce route and view count
    - Reduce or eliminate other processes which walk the RIB
        - SNMP routing table walks, for instance
    - Deploy BGP Next Hop Tracking (NHT)

Next Hop Tracking
• ATF is a middle man between the RIB and RIB clients
    - BGP, OSPF, EIGRP, etc. are all clients of the RIB
• A client tells ATF what prefixes it is interested in
• ATF tracks each prefix
    - Notifies the client when the route to a registered prefix changes
    - Client is responsible for taking action based on ATF notification
    - Provides a scalable event driven model for dealing with RIB changes

• BGP tells ATF to let us know about any changes to 10.1.1.3 and 10.1.1.5
• ATF filters out any changes for 10.1.1.1/32, 10.1.1.2/32, and 10.1.1.4/32
• Changes to 10.1.1.3/32 and 10.1.1.5/32 are passed along to BGP

[Figure: BGP (nexthops 10.1.1.3 and 10.1.1.5) above ATF above the RIB (10.1.1.1/32, 10.1.1.2/32, 10.1.1.3/32, 10.1.1.4/32, 10.1.1.5/32)]

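The ATF behavior described above is a register-and-notify filter: clients register the addresses they care about, and only RIB changes covering those addresses are passed on. The Python sketch below illustrates the idea with the addresses from the example. It is a toy model, not how IOS implements ATF; the class and method names are hypothetical.

```python
import ipaddress


class AddressTrackingFilter:
    """Toy model of ATF: pass RIB changes only to clients that registered interest."""

    def __init__(self):
        self._registered = {}  # ip address -> list of client callbacks

    def register(self, address, callback):
        key = ipaddress.ip_address(address)
        self._registered.setdefault(key, []).append(callback)

    def rib_change(self, prefix):
        """Report a changed RIB prefix; return how many notifications were sent."""
        network = ipaddress.ip_network(prefix)
        sent = 0
        for address, callbacks in self._registered.items():
            if address in network:
                for callback in callbacks:
                    callback(str(address), prefix)
                    sent += 1
        return sent


notifications = []
atf = AddressTrackingFilter()
# BGP registers its two next hops, as in the example above.
for nexthop in ("10.1.1.3", "10.1.1.5"):
    atf.register(nexthop, lambda addr, prefix: notifications.append((addr, prefix)))

for changed in ("10.1.1.1/32", "10.1.1.2/32", "10.1.1.3/32", "10.1.1.4/32", "10.1.1.5/32"):
    atf.rib_change(changed)

print(notifications)
```

Only the changes to 10.1.1.3/32 and 10.1.1.5/32 reach the client; the other three are filtered out.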
High Memory Utilization: Views and Routes
• Why is BGP taking up so much memory?
    - A BGP speaker generally receives a number of copies of the same route or set of routes
    - Each of these copies of the same route or routes is called a view
    - A has two views of 10.1.1.0/24

[Figure: routers A, B, C, and D; A receives a first view and a second view of 10.1.1.0/24]

• Multiple views can come from:
    - iBGP peers peering with the same remote AS
    - iBGP peers peering with remote ASes with (generally) the same table
        - This is common in the case of the global Internet
    - eBGP peers peering with the same remote AS
    - eBGP peers peering with remote ASes with (generally) the same table
        - This is common in the case of the global Internet

High Memory Utilization: Views and Routes
• Multiple views exist in IGPs, as well
  But not on the same scale
  Neighbor adjacencies in IGPs are generally on a lower scale
    In the hundreds, not the thousands
  Neighbor adjacencies in IGPs normally pick up different routes, rather than the same route multiple times
• Each view takes up some amount of space
  250,000 routes x 100 views == a lot of memory usage

High Memory Utilization: Views and Routes
• To reduce memory consumption:
  Reduce the number of routes
    This is particularly true in providers supporting L3VPN services
    The route and view count can escalate quickly when supporting many customers' L3VPNs
    Filter aggressively
    Accept partial routing tables, rather than full routing tables
  Reduce the number of views
    Use route reflectors rather than full mesh iBGP peering
    Peer only when needed

High Memory Utilization: Attributes
• BGP implementations build their memory structures around minimizing storage
• Attributes are stored once
  Rather than once per route
  Each route references an attribute set, rather than storing the attribute set
• This is similar to the way BGP updates are formed
[Figure: routes 10.1.1.0/24 through 10.1.4.0/24 referencing shared AS Path 1, AS Path 2, Community Set 1, and Community Set 2]
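The "attributes are stored once" idea can be illustrated with a small interning table. This is a sketch under stated assumptions, not the data structure of any particular BGP implementation: `AttributeStore` and `intern` are hypothetical names, the AS numbers and community values are made up, and for brevity the AS path and community set are combined into one shared entry where the slide shows them stored separately.

```python
class AttributeStore:
    """Keeps one shared copy of each distinct attribute set (hypothetical API)."""

    def __init__(self):
        self._sets = {}  # attribute set -> the single stored copy

    def intern(self, as_path, communities):
        """Return the shared copy for this attribute set, storing it if new."""
        key = (tuple(as_path), frozenset(communities))
        return self._sets.setdefault(key, key)

    def __len__(self):
        return len(self._sets)


store = AttributeStore()

# Four routes, but only two distinct attribute sets (illustrative values).
table = {
    "10.1.1.0/24": store.intern([65200, 65100], {"65200:100"}),
    "10.1.2.0/24": store.intern([65200, 65100], {"65200:100"}),
    "10.1.3.0/24": store.intern([65300, 65100], {"65300:200"}),
    "10.1.4.0/24": store.intern([65300, 65100], {"65300:200"}),
}
print(len(table), "routes,", len(store), "stored attribute sets")

# A route arriving with a new, unique community grows the store,
# even though it replaces an existing route.
table["10.1.4.0/24"] = store.intern([65300, 65100], {"65300:999"})
print(len(table), "routes,", len(store), "stored attribute sets")
```

The second print shows why memory can rise while the route count stays flat: each new unique attribute set costs storage of its own.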
High Memory Utilization: Attributes
• The more unique attribute sets you're receiving, the more unique attribute sets you need to store
  You might have the same number of routes and views over time, but memory utilization can increase
[Figure: routes 10.1.1.0/24 through 10.1.4.0/24 now referencing AS Paths 1 to 3 and Community Sets 1 to 4]

High Memory Utilization: Attributes
• To Conserve Memory
  Strip unneeded attributes on the inbound side of eBGP peering sessions
    Verify you don't really need them, or they aren't useful after the route has transited your AS
    Communities are the biggest/only target
  Use Communities wisely within your network
    A large mishmash of communities can consume memory

High Memory Utilization: Soft Reconfiguration
• B advertises 10.1.1.0/24 to A
• A filters the route locally
• The filters on A are changed to permit 10.1.1.0/24
• But how does A relearn 10.1.1.0/24?
[Figure: B advertises 10.1.1.0/24 to A; the route is blocked by A's filter]

High Memory Utilization: Soft Reconfiguration
• With soft reconfiguration, A saves all the routes it receives from B
• Applies any inbound filters between this saved copy of B's updates and the local BGP table
• If the local filters change, they can be reapplied by simply pulling all the updates from the saved table into the local BGP table
[Figure: B advertises 10.1.1.0/24 to A; A keeps a local copy ahead of its local filter]
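The soft reconfiguration mechanism above can be modelled as a saved, unfiltered copy of the neighbor's updates that is replayed through the filter whenever the filter changes. The sketch below is illustrative only: the `Neighbor` class, its method names, and the AS number in the attributes are assumptions, not an IOS data structure.

```python
class Neighbor:
    """Toy model of A's inbound processing for routes from B (hypothetical API)."""

    def __init__(self, inbound_filter, soft_reconfiguration=False):
        self.inbound_filter = inbound_filter
        self.soft_reconfiguration = soft_reconfiguration
        self.saved_updates = {}  # unfiltered copy of everything B sent
        self.bgp_table = {}      # what survived the inbound filter

    def receive(self, prefix, attrs):
        if self.soft_reconfiguration:
            self.saved_updates[prefix] = attrs
        if self.inbound_filter(prefix):
            self.bgp_table[prefix] = attrs

    def change_filter(self, new_filter):
        self.inbound_filter = new_filter
        if self.soft_reconfiguration:
            # Replay the saved copy through the new filter.
            self.bgp_table = {prefix: attrs
                              for prefix, attrs in self.saved_updates.items()
                              if new_filter(prefix)}


def deny_all(prefix):
    return False


def permit_all(prefix):
    return True


with_copy = Neighbor(deny_all, soft_reconfiguration=True)
without_copy = Neighbor(deny_all)
for neighbor in (with_copy, without_copy):
    neighbor.receive("10.1.1.0/24", {"as_path": [65002]})  # blocked by filter
    neighbor.change_filter(permit_all)

print("with saved copy:   ", sorted(with_copy.bgp_table))
print("without saved copy:", sorted(without_copy.bgp_table))
```

Without the saved copy, the route stays missing until B sends it again; with it, A recovers the route locally, at the price of holding every update twice.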
High Memory Utilization: Soft Reconfiguration
• Keeping this local copy uses a lot of memory
• In general, don't use soft reconfiguration
• BGP now uses the route refresh capability to rebuild the local table if the local filters change
[Figure: A's local copy of B's updates consumes large amounts of memory]

Routing Problems

BGP Routing Problems
• Route Reflector Loops
• Route Reflector Suboptimal Routes
• Inbound Traffic Path Problems

Route Reflector Loops
• Router B
  BGP NextHop: Router A
  Local NextHop: Router A
  Set: NextHopSelf
• Router C
  BGP NextHop: Router B
  Local NextHop: Router D
• Router D
  BGP NextHop: Router E
  Local NextHop: Router C
• Router E
  BGP NextHop: Router A
  Local NextHop: Router A
  Set: NextHopSelf
[Figure: routers A, B, C, D, and E, with eBGP sessions from A to B and to E]

Route Reflector Loops
• This results in a permanent routing loop
• Route reflectors must always follow the topology
• Never peer through a route reflector client to reach a route reflector
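The loop follows directly from the local nexthops listed on the Route Reflector Loops slide: C hands the packet to D, and D hands it back to C. A minimal sketch that walks those local nexthops hop by hop is shown below. The function name `trace` and the use of "A" as a stand-in for any destination reached through Router A are illustrative assumptions.

```python
# Local (IGP) nexthop each router uses for traffic toward Router A,
# taken from the slide: B -> A, C -> D, D -> C, E -> A.
LOCAL_NEXTHOP = {"B": "A", "C": "D", "D": "C", "E": "A"}


def trace(forwarding, start, destination, max_hops=16):
    """Follow local nexthops from start; return (path, loop_detected)."""
    path = [start]
    seen = {start}
    current = start
    while current != destination and len(path) <= max_hops:
        current = forwarding[current]
        path.append(current)
        if current in seen:
            return path, True  # revisited a router: forwarding loop
        seen.add(current)
    return path, False


for router in ("B", "C", "D", "E"):
    path, looped = trace(LOCAL_NEXTHOP, router, "A")
    print(router, "->", " -> ".join(path[1:]), "(loop)" if looped else "")
```

B and E deliver traffic straight to A, while anything entering at C or D bounces between the two indefinitely.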
Route Reflector Suboptimal Routing
• Route reflectors can also cause routing to be different (or suboptimal) compared to full mesh iBGP
• E advertises 10.1.1.0/24 through eBGP to both B and C
• The local preference, MED, AS Path length, and all other attributes are the same for 10.1.1.0/24 at both B and C
[Figure: routers A, B, C, and D joined by links with IGP costs of 5 and 10; E advertises 10.1.1.0/24 over eBGP to B and C]

Route Reflector Suboptimal Routing
• Assume A, B, C, and D are configured for full mesh iBGP
• A chooses B as its exit point because of the IGP cost
• D chooses C as its exit point, because of the IGP cost
[Figure: same topology; A's best path exits through B, D's best path exits through C]

Route Reflector Suboptimal Routing
• Assume B, C, and D are configured as route reflector clients of A
• A chooses B as its best path because of the IGP cost
• A reflects this choice to C, but C chooses its locally learned eBGP route over the internal route through B
• A reflects this choice to D, and D chooses the path through B, even though the path through C is shorter
[Figure: same topology; both A's and D's best paths now exit through B]

Route Reflector Suboptimal Routing
• There is little you can do about this
• Whenever you remove routing information, you risk suboptimal routing
• Keeping the route reflector topology in line with the layer 3 topology helps
• iBGP multipath can resolve some of these problems
  At the cost of additional memory
• Otherwise, use policy to choose the best exit point
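The difference between full mesh and route reflection can be reproduced with a small calculation. The exact link costs in the original figure are not recoverable from the text, so the IGP costs below are assumptions chosen only to match the slides' statements (A is closer to B, D is closer to C). The helper name `best_exit` is also hypothetical, and all other BGP attributes are treated as equal, as the slides state.

```python
# Assumed IGP cost from each router to each exit point (illustrative values).
IGP_COST = {
    ("A", "B"): 5, ("A", "C"): 10,
    ("D", "B"): 10, ("D", "C"): 5,
}
EXITS = ("B", "C")  # both learn 10.1.1.0/24 from E over eBGP


def best_exit(router, candidates):
    """With all other attributes equal, pick the exit with the lowest IGP cost."""
    return min(candidates, key=lambda exit_router: IGP_COST[(router, exit_router)])


# Full mesh iBGP: every router sees both exits and chooses for itself.
full_mesh = {router: best_exit(router, EXITS) for router in ("A", "D")}

# Route reflection: A picks one best path and reflects only that path,
# so D has a single candidate to choose from.
reflector_choice = best_exit("A", EXITS)
reflected = {"A": reflector_choice, "D": best_exit("D", (reflector_choice,))}

print("full mesh:", full_mesh)
print("reflected:", reflected)
```

D ends up on the exit that is best from A's position in the topology, which is the "removed routing information" the last slide refers to.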
Impacting Inbound Traffic Path
• I'm in AS65100
• Why does my traffic come in through AS65200 and AS65300, although I want it to come in through AS65300 only?
  Even though I do AS Path Prepend....
[Figure: AS65100 at the bottom, connected upward through AS65200 and AS65300; AS65400, AS65500, and AS65600 are also shown]

Impacting Inbound Traffic Path
• Why would AS65200 ever prefer Path 2 over Path 1?
  You pay for the AS65200 link
  They pay for the AS65200 to AS65300 link
  If they preferred Path 2, they would be paying to support your preferred inbound traffic path
  There's not much of a chance of this happening....
[Figure: same topology with Path 1 and Path 2 toward 10.1.1.0/24 in AS65100]

Impacting Inbound Traffic Path
• How does AS65200 implement this policy?
  Routes received from customers are preferred over routes received from peers, using Local Preference
  Adding AS Path hops won't overcome AS65200's Local Preference
  So, traffic from AS65500 will always come in through the AS65200 link, as long as you're advertising 10.1.1.0/24 through the link

Impacting Inbound Traffic Path
• Possible Solutions
  Live with traffic from AS65200's peers coming in through this link
  Use conditional advertisement
    Conditional advertisement could be slow, though
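Why prepending cannot override AS65200's policy becomes clear when the two relevant best-path steps are written out: Local Preference is compared first, and AS Path length is consulted only on a tie. The sketch below models just those two steps from AS65200's point of view. The Local Preference values 200 and 100 and the function name `best_path` are assumptions for illustration; real implementations evaluate more steps than this.

```python
def best_path(paths):
    """Higher Local Preference wins; shorter AS Path breaks a tie."""
    return max(paths, key=lambda path: (path["local_pref"], -len(path["as_path"])))


# AS65200's two paths to 10.1.1.0/24. AS65100 has prepended itself
# three extra times on the direct (customer) link.
customer = {"via": "customer link",
            "local_pref": 200,  # assumed customer preference
            "as_path": [65100, 65100, 65100, 65100]}
peer = {"via": "peer AS65300",
        "local_pref": 100,      # assumed peer preference
        "as_path": [65300, 65100]}

print(best_path([customer, peer])["via"])

# Only with equal Local Preference does the prepend change the outcome.
customer_equal = dict(customer, local_pref=100)
print(best_path([customer_equal, peer])["via"])
```

The first result is the customer link despite the longer AS Path; the second shows that prepending matters only where the receiving AS applies no preference of its own.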
Impacting Inbound Traffic Path
• Possible Solutions
  Use RFC1998 Communities
    You set a community on 10.1.1.0/24
    AS65200 translates this community into a Local Preference
    AS65200 then prefers the route through AS65300 over the connected route
    Don't count on this happening; most providers don't support RFC1998 communities

Impacting Inbound Traffic Path
• Why can't I load share traffic between the two links?
  I've tried AS Path prepend, why doesn't it work?

Impacting Inbound Traffic Path
• Any traffic from AS65500 will always come through AS65200
• Any traffic from AS65300 will always come through AS65300
• There's no way to alter this
• So, if the majority of your traffic comes from AS65500, there's not much you can do....

Impacting Inbound Traffic Path
• The only traffic you can really adjust with AS Path prepend is from AS65600
  You can influence which path AS65600 will take
    Through AS65200 or through AS65300
  This may or may not allow you to tune inbound traffic well