Question? Leave a message!




High Speed Router Design

High Speed Router Design 33
High Speed Router Design Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1Overview  Introduction  Evolution of HighSpeed Routers  High Speed Router Components: Lookup Algorithm Classification Switching Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 2What do switches/routers look like Access routers e.g. ISDN, ADSL Core router Core ATM switch e.g. OC48c POS Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 3Basic Architectural Components Congestion Control Admission Control Reservation Control Routing Datapath: Output perpacket Policing Switching Scheduling processing Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 4Basic Architectural Components: Forwarding Decision 3. 1. Output 2. Scheduling Forwarding Table Interconnect Forwarding Decision Forwarding Table Forwarding Decision Forwarding Table Forwarding Decision Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 5Perpacket processing in an IP Router 1. Accept packet arriving on an incoming link. 2. Lookup packet destination address in the forwarding table, to identify outgoing port(s). 3. Manipulate packet header: e.g., decrement TTL, update header checksum. 4. Send (switch) packet to the outgoing port(s). 5. Classify and buffer packet in the queue. 6. Transmit packet onto outgoing link. Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 6Lookup and Forwarding Engine Packet header payload Router Routing Lookup Destination Outgoing Data Structure Address Port Forwarding Table Destnetwork Port 65.0.0.0/8 3 128.9.0.0/16 1 149.12.0.0/19 7 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 7Example Forwarding Table Destination IP Prefix Outgoing Port 65.0.0.0/ 8 3 Prefix length 128.9.0.0/16 1 142.12.0.0/19 7 IP prefix: 032 bits 142.12.0.0/19 128.9.0.0/16 65.0.0.0/8 32 0 128.9.16.14 24 2 1 2 65.0.0.0 65.255.255.255 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 8Prefixes can Overlap Longest matching prefix 128.9.176.0/24 128.9.16.0/21128.9.172.0/21 142.12.0.0/19 65.0.0.0/8 128.9.0.0/16 32 0 128.9.16.14 2 1 Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address. Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 9Difficulty of Longest Prefix Match 2dimensional search: Prefix Length 32 Prefix Value 24 128.9.176.0/24 128.9.16.0/21 128.9.172.0/21 142.12.0.0/19 128.9.0.0/16 65.0.0.0/8 8 128.9.16.14 Prefix Values Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 10 Prefix LengthLookup Rates Required Year Line Linerate 40B (Gbps) packets (Mpps) 199899 OC12c 0.622 1.94 199900 OC48c 2.5 7.81 200001 OC192c 10.0 31.25 200203 OC768c 40.0 125 31.25 Mpps  33 ns DRAM: 5080 ns, SRAM: 510 ns Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 11Update Rates Required  Recent BGP studies show that updates can be: Bursty: several 100s of routes updated/withdrawn = insert/delete operations Frequent: Average 100+ updates per second  Need data structure to be efficient in terms of lookup as well as update (insert/delete) operations. Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 12Size of the Forwarding Table 100000 90000 Renewed Exponential 80000 Growth 70000 60000 10,000/year 50000 40000 30000 20000 10000 0 95 96 97 98 99 00 Year Renewed growth due to multihoming of enterprise networks Shivkumar Kalyanaraman Rensselaer Polytechnic Institute Source: http://www.telstra.net/op13 s/bgptable.html Number of PrefixesPotential HyperExponential Growth Global routing table vs Moore's law since 1999 160000 Global prefixes Moore's law 150000 Double growth 140000 130000 120000 110000 100000 90000 80000 70000 60000 50000 01/99 04/99 07/99 10/99 01/00 04/00 07/00 10/00 01/01 04/01 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 14 Prefixeslog N 2 Trees and Tries Binary Search Tree Binary Search Trie 0 1 0 1 0 1 010 111 N entries Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 15Trees and Tries Multiway tries 16ary Search Trie 0000, ptr 1111, ptr 1111, ptr 0000, 0 1111, ptr 0000, 0 000011110000 111111111111 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 16Lookup: Multiway Tries Tradeoffs Degree of Mem Nodes Total Memory Fraction 6 Tree References (Mbytes) Wasted () (x10 ) 2 48 1.09 4.3 49 4 24 0.53 4.3 73 8 16 0.35 5.6 86 16 12 0.25 8.3 93 64 8 0.17 21 98 256 6 0.12 64 99.5 15 Table produced from 2 randomly generated 48bit addresses Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 17Routing Lookups in Hardware Prefix length Most prefixes are 24bits or shorter Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 18 NumberRouting Lookups in Hardware Prefixes up to 24bits 24 2 = 16M entries 142.19.6 Next Hop Next Hop 1 24 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 19 142.19.6.14 14 142.19.6Routing Lookups in Hardware Prefixes up to 24bits 128.3.72 Next Hop 1 Next Hop 24 Pointer 0 Prefixes above 24bits Next Hop Next Hop 8 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 20 128.3.72.44 44 128.3.72 base offsetBasic Architectural Components: Interconnect 3. 1. Output 2. Scheduling Forwarding Table Interconnect Forwarding Decision Forwarding Table Forwarding Decision Forwarding Table Forwarding Decision Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 21FirstGeneration IP Routers Shared Backplane Buffer CPU Memory DMA DMA DMA Line Line Line Interface Interface Interface MAC MAC MAC  Most Ethernet switches and cheap packet routers  Bottleneck can be CPU, hostadaptor or I/O bus  What is costly Bus Memory Interface CPU Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 22SecondGeneration IP Routers Buffer CPU Memory DMA DMA DMA Line Line Line Card Card Card Local Local Local Buffer Buffer Buffer Memory Memory Memory MAC MAC MAC  Port mapping intelligence in line cards  Higher hit rate in local lookup cache  What is costly Bus Memory Interface CPU Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 23ThirdGeneration Switches/Routers Switched Backplane Line CPU Line Card Card Card Local Local Buffer Buffer Memory Memory MAC MAC Third generation switch provides parallel paths (fabric)  What’s costly Bus Memory, CPU Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 24FourthGeneration Switches/Routers Clustering and Multistage 1 2 3 4 5 6 13 14 15 16 17 18 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 111213141516 171819 20 2122232425262728 29303132 19 20 21 22 23 24 7 8 9 10 11 12 31 32 21 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 25Circuit switch  A switch that can handle N calls has N logical inputs and N logical outputs  N up to 200,000  Moves 8bit samples from an input to an output port  Recall that samples have no headers  Destination of sample depends on time at which it arrives at the switch  In practice, input trunks are multiplexed  Multiplexed trunks carry frames = set of samples  Goal: extract samples from frame, and depending on position in frame, switch to output  each incoming sample has to get to the right output line and the right slot in the output frame Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 26Call blocking  Can’t find a path from input to output  Internal blocking slot in output frame exists, but no path  Output blocking no slot in output frame is available  Output blocking is reduced in transit switches need to put a sample in one of several slots going to the desired next hop Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 27Multiplexors and demultiplexors  Most trunks time division multiplex voice samples  At a central office, trunk is demultiplexed and distributed to active circuits  Synchronous multiplexor N input lines Output runs N times as fast as input 1 1 2 2 3 De 3 MUX … MUX … 1 2 3 … N N N Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 28Switching: what does a switch do  Transfers data from an input to an output many ports (density), high speeds  Eg: Crossbar Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 29Time division switching  Key idea: when demultiplexing, position in frame determines output trunk  Time division switching interchanges sample position within a frame: time slot interchange (TSI) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 30Space division switching  Each sample takes a different path through the switch, depending on its destination Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 31Crossbar  Simplest possible spacedivision switch  Crosspoints can be turned on or off, long enough to transfer a packet from an input to an output  Internally nonblocking 2  but need N crosspoints  time to set each crosspoint grows quadratically Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 32Multistage crossbar  In a crossbar during each switching time only one crosspoint per row or column is active  Can save crosspoints if a crosspoint can attach to more than one input line (why)  This is done in a multistage crossbar  Need to rearrange connections every switching time Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 33Multistage crossbar  Can suffer internal blocking unless sufficient number of secondlevel stages 2  Number of crosspoints N  Finding a path from input to output requires a depthfirstsearch  Scales better than crossbar, but still not too well 120,000 call switch needs 250 million crosspoints Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 34Packet switches  In a circuit switch, path of a sample is determined at time of connection establishment  No need for a sample headerposition in frame used  In a packet switch, packets carry a destination field or label  Need to look up destination port onthefly  Datagram switches  lookup based on entire destination address (longestprefix match)  Cell or Labelswitches  lookup based on VCI or Labels Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 35Blocking in packet switches  Can have both internal and output blocking  Internal no path to output  Output trunk unavailable  Unlike a circuit switch, cannot predict if packets will block (why)  If packet is blocked = must either buffer or drop Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 36Dealing with blocking in packet switches  Overprovisioning internal links much faster than inputs  Buffers at input or output  Backpressure if switch fabric doesn’t have buffers, prevent packet from entering until path is available  Parallel switch fabrics increases effective switching capacity Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 37Switch Fabrics: Buffered crossbar  What happens if packets at two inputs both want to go to same output  Can defer one at an input buffer  Or, buffer crosspoints: complex arbiter Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 38Switch fabric element  Goal: towards building “selfrouting” fabrics  Can build complicated fabrics from a simple element  Routing rule: if 0, send packet to upper output, else to lower output If both packets to same output, buffer or drop Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 39Banyan  Simplest selfrouting recursive fabric  What if two packets both want to go to the same output output blocking Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 40Blocking in Banyan S/ws: Sorting  Can avoid blocking by choosing order in which packets appear at input ports  If we can  present packets at inputs sorted by output  remove duplicates  remove gaps  precede banyan with a perfect shuffle stage  then no internal blocking  For example: X, 010, 010, X, 011, X, X, X:  Sort = 010, 011, 011, X, X, X, X, X  Remove dups = 010, 011, X, X, X, X, X, X  Shuffle = 010, X, 011, X, X, X, X, X  Need sort, shuffle, and trap networks Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 41Sorting using Merging  Build sorters from merge networks  Assume we can merge two sorted lists  Sort pairwise, merge, recurse Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 42NonBlocking BatcherBanyan Batcher Sorter SelfRouting Network 3 7 7 7 7 7 7 000 7 2 5 0 4 6 6 001 5 3 2 5 5 4 5 010 2 5 3 1 6 5 4 011 6 6 1 3 0 3 3 100 0 1 0 4 3 2 2 101 1 0 6 2 1 0 1 110 4 4 4 6 2 2 0 111 • Fabric can be used as scheduler. •BatcherBanyan network is blocking for multicast. Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 43Basic Architectural Components: Queuing, Classification 3. 1. Output 2. Scheduling Forwarding Table Interconnect Forwarding Decision Forwarding Table Forwarding Decision Forwarding Table Forwarding Decision Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 44Queuing: Two basic techniques Input Queueing Output Queueing Usually a nonblocking Usually a fast bus switch fabric (e.g. crossbar) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 45Queuing: Output Queueing Individual Output Queues Centralized Shared Memory Memory b/w = 2N.R 1 2 N 1 2 Memory b/w = (N+1).R N Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 46Input Queueing Head of Line Blocking Load 100 58.6 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 47 DelaySolution: Input Queueing w/ Virtual output queues (VOQ) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 48Input Queues Virtual Output Queues Load 100 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 49 DelayPacket Classification H Forwarding Engine E Action A Packet Classification D E R Classifier (Policy Database) Predicate Action Incoming Packet Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 50Multifield Packet Classification Field 1 Field 2… Field k Action 152.163.190.69/21 152.163.80.11/32 UDP A1 Rule 1… 152.168.3.0/24 152.163.0.0/16 TCP A2 Rule 2 … … …………… 152.168.0.0/16 152.0.0.0/8 ANY An Rule N… Given a classifier with N rules, find the action associated with the highest priority rule matching Shivkumar Kalyanaraman an incoming packet. Rensselaer Polytechnic Institute 51Prefix matching: 1d range problem 128.9.19/24 128.9.25/24 128.9.16/20 128.9.176/20 128.9/16 32 2 1 0 128.9.16.14 Most specific route = “longest matching prefix” Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 52Classification: 2D Geometry problem Field 1 Field 2 Data R7 R6 P1 P2 R3 e.g. (144.24/16, 64/24) e.g. (128.16.46.23, ) R1 R4 R5 R2 Shivkumar Kalyanaraman Rensselaer Polytechnic Institute Field 1 53 Field 2Summary  High speed routers: lookup, switching, classification, buffer management  Lookup: Rangematching, tries, multiway tries  Switching: circuit s/w, crossbar, batcherbanyan,  Queuing: input/output queuing issues  Classification: Multidimensional geometry problem Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 54
Website URL
Comment