Availability, Survivability, Protection/Restoration, Fast Re-Route

Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Overview
• Availability: the driver…
• Survivability: protection and restoration architectures
• Fast Reroute

Availability: Impact of Outages
[Figure: service outage impact versus outage duration. Time axis: 0, 50 msec, 200 msec, 2 sec, 10 sec, 5 min, 30 min, divided into 1st through 6th ranges. Labels shown: APS, "Hit", May Drop Voiceband Calls, Trigger Changeover of CCS Links, Call Dropping, Private Line Disconnect, Packet (X.25) Disconnect, Social/Business Impacts, FCC Reportable.]
Disruptions cost a lot of money.

Market Drivers for Survivability
• Customer Relations
• Competitive Advantage
• Revenue
  - Negative: Tariff Rebates
  - Positive: Premium Services (Business Customers, Medical Institutions, Government Agencies)
• Impact on Operations
• Minimize Liability

Network Survivability: drivers
• Availability: 99.999% (5 nines) = roughly 5 min of downtime per year
• Since a network is made up of several components, the ONLY way to reach 5 nines is to add survivability in the face of failures…
• Survivability = continued services in the presence of failures
• Protection switching or restoration: mechanisms used to ensure survivability
  - Add redundant capacity, detect faults and automatically reroute traffic around the failure
• Restoration: related term, but slower timescale
• Protection: fast timescale: tens to hundreds of ms…
  - Implemented in a distributed manner to ensure fast restoration

Failure Types and Other Motivations
• Types of failure:
  - Components: links, nodes, channels in WDM, active components, software…
  - Human error: backhoe fiber cut (fiber inside oil/gas pipelines is less likely to be cut)
  - Systems: entire COs can fail due to catastrophic events
• Protection allows easy maintenance and upgrades:
  - E.g., switch traffic over when servicing a link…
• Single failure vs multiple concurrent failures…
• Goal: mean repair time << mean time between failures…
• Protection also depends upon the kind of application:
  - SONET/SDH: 60 ms (legacy dropped-call threshold)
  - Do data apps really need this level of protection?
• Survivability may hence be provided at several layers

Network Survivability Architectures
[Figure: taxonomy tree. Network survivability architectures split into protection switching and restoration. Boxes shown: Linear Protection Architectures, Ring Protection Architectures, Mesh Restoration Architectures, Self-healing Network, Reconfigurable Network.]

Network Availability and Survivability
Availability is the probability that an item will be able to perform its designed functions at the stated performance level, within the stated conditions and in the stated environment when called upon to do so.

Availability = Reliability / (Reliability + Recovery)

Quantification of Availability

Percent Availability   N-Nines   Downtime (approx.)
99%                    2-Nines   5,000 Min/Yr
99.9%                  3-Nines   500 Min/Yr
99.99%                 4-Nines   50 Min/Yr
99.999%                5-Nines   5 Min/Yr
99.9999%               6-Nines   0.5 Min/Yr

PSTN: The Yardstick
• Individual elements have an availability of 99.99%
• One cut-off call in 8,000 calls (3 min for the average call). Five ineffective calls in every 10,000 calls.
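The downtime column of the availability table above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, a 365-day year is assumed, and the MTBF/MTTR formula is the standard steady-state expression rather than something spelled out on the slides.

```python
# Minimal sketch: convert an availability percentage into yearly downtime.
# Assumes a 365-day year; function names are illustrative, not from the slides.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent):
    """Expected downtime per year for a given availability in percent."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def availability_from_mtbf(mtbf_hours, mttr_hours):
    """Steady-state availability (percent) from mean time between failures
    and mean time to repair (standard formula, assumed here)."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for nines, pct in enumerate([99, 99.9, 99.99, 99.999, 99.9999], start=2):
        minutes = downtime_minutes_per_year(pct)
        print(f"{nines}-nines ({pct}%): {minutes:.2f} min/yr")
```

Five nines works out to about 5.26 minutes per year, so the table's figures are order-of-magnitude roundings. The second function shows why the goal is a repair time far smaller than the time between failures: availability approaches 100% only as MTTR shrinks relative to MTBF.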
[Figure: PSTN end-to-end availability 99.94%. Per-segment unavailability values shown along the path (in percent): 0.005, 0.005, 0.01, 0.01, 0.005, 0.005, 0.02, attached to the NI, AN, Facility Entrance, LE and LD segments.]
NI: Network Interface
LE: Local Exchange
LD: Long Distance
AN: Access Network
Source: http://www.packetcable.com/downloads/specs/pkttrvoiparv01001128.pdf

Services Determine the Requirements on Network Availability
[Figure not recovered.]
Source: www.t1.org

IP Network Expectations

Service                                        Delay   Jitter   Loss   Availability
Real Time Interactive (VOIP, Cell Relay ..)    L       L        L      H
Layer 2 / Layer 3 VPNs (FR/Ethernet/AAL5)      M       L        L      H
Internet Service                               H       H        M      L
Video Services                                 L       M        M      H

L: Low, M: Medium, H: High

Measuring Availability: The Port Method
• Based on port count in the network:
  Availability (%) = [(total number of ports × sample period) - (number of impacted ports × outage duration)] / (total number of ports × sample period) × 100
• Does not take into account the bandwidth of ports, e.g. OC-192 and 64k are both ports
• Good for dedicated access service because ports are tied to customers.

The Port Method Example
• Network with 10,000 active access ports
• An access router with 100 access ports fails for 30 minutes.
• Total available port-hours = 10,000 × 24 = 240,000
• Total down port-hours = 100 × 0.5 = 50
• Availability for a single day = ((240,000 - 50) / 240,000) × 100 ≈ 99.9792%

The Bandwidth Method
• Based on the amount of bandwidth available in the network:
  Availability (%) = [(total amount of BW × sample period) - (amount of BW impacted × outage duration)] / (total amount of BW in network × sample period) × 100
• Takes into account the bandwidth of ports
• Good for core routers

The Bandwidth Method Example
• Total capacity of network: 100 Gigabits/sec
• An access router with 1 Gigabit/sec of BW fails for 30 minutes.
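The port method and the bandwidth method share one formula and differ only in what is counted (ports or bandwidth). As a rough check of the two examples, here is a minimal sketch; the function name and the 24-hour default sample period are assumptions made for illustration.

```python
# Minimal sketch of the shared availability formula behind the port and
# bandwidth methods. Names and the 24-hour default are illustrative.
def availability_percent(total_units, impacted_units, outage_hours,
                         sample_hours=24.0):
    """Availability in percent over one sample period.

    total_units / impacted_units are ports (port method) or
    Gbit/s of capacity (bandwidth method).
    """
    total = total_units * sample_hours       # e.g. port-hours available
    lost = impacted_units * outage_hours     # e.g. port-hours down
    return (total - lost) / total * 100


# Port method: 10,000 ports, 100 of them down for 30 minutes.
port_availability = availability_percent(10_000, 100, 0.5)

# Bandwidth method: 100 Gbit/s network, 1 Gbit/s down for 30 minutes.
bw_availability = availability_percent(100, 1, 0.5)

if __name__ == "__main__":
    print(f"port method:      {port_availability:.4f}%")
    print(f"bandwidth method: {bw_availability:.4f}%")
```

Both examples give the same result because the failed router carries 1% of the ports and 1% of the bandwidth; the two methods diverge as soon as the failed ports are faster or slower than average.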
Total BW available in network for a day = 100 × 24 = 2,400 (Gigabits/sec × hours)
Total BW lost in outage = 1 × 0.5 = 0.5
Availability for a single day = ((2,400 - 0.5) / 2,400) × 100 ≈ 99.9792%

Defects Per Million
• Used in PSTN networks, defined as the number of blocked calls per one million calls averaged over one year.
  DPM = (number of impacted customers × outage duration) / (total number of customers × sample period) × 10^6

Defects Per Million Example
• Network with 10,000 active access ports
• An access router with 100 access ports fails for 30 minutes.
• Total available port-hours = 10,000 × 24 = 240,000
• Total down port-hours = 100 × 0.5 = 50
• Daily DPM = (50 / 240,000) × 1,000,000 ≈ 208

Basic Ideas: Working and Protect Fibers
[Figure not recovered.]

Protection Topologies: Linear
• Two nodes connected to each other with two or more sets of links
[Figure: working and protect links in (1+1) and (1:n) arrangements.]

Protection Topologies: Ring
• Two or more nodes connected to each other with a ring of links
  - Line vs. Drop interfaces
  - East vs. West interfaces
[Figure: ring nodes with East (E), West (W), Line (L) and Drop (D) interfaces; working and protect links.]

Protection Topologies: Mesh
• Three or more nodes connected to each other
  - Can be sparse or complete meshes
  - Spans may be individually protected with linear protection
  - Overall edge-to-edge connectivity is protected through multiple paths
[Figure: mesh with working and protect paths.]

Topologies: Rings, Fibers, Directionality
[Figure: a 2-fiber ring and a 4-fiber ring of ADMs with DCC; each line is full duplex.]
• Uni- vs. bidirectional: all traffic runs clockwise, vs. either way

SONET: Automatic Protection Switching (APS)
Line Protection Switching
• Uses TOH
• Trunk application
• Backup capacity is idle
• Supports 1:n, where n = 1-14
Path Protection Switching
• Uses POH
• Access line applications
• Duplicate traffic sent on protect
• 1+1
Automatic Protection Switching
• Line or path based
• Revertive vs. non-revertive
• Restoration times: 50 ms
• K1, K2 bytes signal change

Protection Switching Terminology
• 1+1 architectures: permanent bridge at the source, select at the sink
• m:n architectures: m entities provide protection for n working entities, where m is less than or equal to n
  - Allows unprotected extra traffic
  - Most common: SONET linear 1:1 and 1:n
• Coordination protocol: provides coordination between controllers in source and sink
  - Required for all m:n architectures
  - Not required for 1+1 architectures unless they employ bidirectional protection switching

1+1 vs 1:n
[Figure: working and protect links in (1+1) and (1:n) arrangements.]

SONET Linear 1+1 APS
TX = Transmitter, BR = Bridge, RX = Receiver, SW = Switch
[Figure: working and protection lines in each direction, with a bridge (BR) at the transmit side and a switch (SW) at the receive side.]

SONET 1:1 Linear APS
TX = Transmitter, BR = Bridge, RX = Receiver, SW = Switch
[Figure: working and protection lines in each direction with bridge and switch, plus an APS channel.]

SONET Linear APS
[Figure: APS controller with inputs from management commands, local SF/SD detection and the K1/K2 bytes.]
Linear APS states (automatically initiated, external, or state request), K1 byte bits 1-4:

Bits 1234   Request
1111        Lockout of Protection
1110        Forced Switch
1101        SF High Priority
1100        SF Low Priority
1011        SD High Priority
1010        SD Low Priority
1001        Not Used
1000        Manual Switch
0111        Not Used
0110        Wait to Restore
0101        Not Used
0100        Exercise
0011        Not Used
0010        Reverse Request
0001        Do Not Revert
0000        No Request

Protection Switching: Terminology
• Dedicated vs shared: the working connection is assigned dedicated or shared protection bandwidth
  - 1+1 is dedicated, 1:n is shared
• Revertive vs non-revertive: after the failure is fixed, traffic is switched back automatically (revertive) or manually (non-revertive)
  - Shared protection schemes are usually revertive
• Unidirectional or bidirectional protection:
  - Uni: each direction of traffic is handled independently of the other. Fiber cut = only one direction is switched over to protection. Usually done with dedicated protection; no signaling required.
  - Bidirectional transmission on a fiber (full duplex) = requires bidirectional switching; signaling required

Current Architectures: Ring Protection
Today: multiple "stacked" rings over DWDM (different λs)

Unidirectional Path Switched Ring (UPSR): Failure-free State
[Figure: nodes A, B, C, D on a two-fiber ring; fiber 1 is working (W), fiber 2 is protect (P); A-B and B-A traffic is bridged at the source and path-selected at the destination.]
• One fiber is "working" and the other is "protect" at all nodes…
• Traffic sent SIMULTANEOUSLY on working and protect paths…
• Protection done at path layer (like 1+1)…

Unidirectional Path Switched Ring (UPSR): Failure State
[Figure: the same ring after a failure; path selection picks the surviving copy.]

UPSR: discussion
• Easily handles failures of links, transmitters, receivers or nodes
• Simple to implement: no signaling protocol or communication needed between nodes
• Drawback: does not spatially reuse the fiber capacity because it is similar to the 1+1 linear protection model
  - I.e. no sharing of protection of the kind the m:n model offers
  - BLSRs can support aggregate traffic capacities higher than the transmission rate
• UPSRs are popular in lower-speed local exchange and access networks (traffic is hubbed into the core)
• No specified limit on the number of nodes or ring length of a UPSR; only limited by the difference in delays of the paths

Deployment of UPSR and BLSR
[Figure: a Regional Ring (BLSR) connecting Intra-Regional Rings (BLSR), which in turn connect Access Rings (UPSR).]

Bidirectional Line Switched Ring (BLSR/2)
[Figure: 2-fiber BLSR with nodes A, B, C, D; A-C and C-A traffic on working capacity, with protection capacity reserved.]

Bidirectional Line Switched Ring (BLSR/2): Ring Switch
[Figure: 2-fiber BLSR after a span failure; the nodes adjacent to the failure perform a ring switch.]

Bidirectional Line Switched Ring (BLSR/2): Node Failure
[Figure: 2-fiber BLSR after a node failure; the neighboring nodes perform ring switches.]

Node Failures = "Squelching"
[Figure: 2-fiber BLSR carrying Customer 1 and Customer 2 traffic; after a node failure and ring switches, traffic that would be misconnected is squelched.]

Bidirectional Line Switched Ring (BLSR/4)
[Figure: 4-fiber BLSR with nodes A, B, C, D; separate working and protection fibers.]

Bidirectional Line Switched Ring (BLSR/4): Span Switch
[Figure: 4-fiber BLSR; traffic moves from the working to the protection fibers on the same span.]

Bidirectional Line Switched Ring (BLSR/4): Node Failure
[Figure: 4-fiber BLSR after a node failure; ring switches at the neighboring nodes.]
• Also need to squelch any misconnected traffic

BLSR: Discussion
• BLSR/2 can be thought of as BLSR/4 with the protection fibers embedded in the same fiber
  - I.e. ½ the capacity is used for protection purposes in each fiber
• Span switching and ring switching are possible only in BLSR, not in UPSR
• 1:n and m:n capabilities are possible in BLSR
• More efficient in protecting distributed traffic patterns due to the sharing idea
• Ring management is more complex in BLSR/4
• K1/K2 bytes of the SONET overhead are used to accomplish this

Mesh Restoration
[Figure: two architectures. Self-healing restoration architecture: DCS nodes, each with a distributed controller (DC). Reconfigurable (or rerouting) restoration architecture: DCS nodes under a central controller.]
DC = Distributed Controller

Mesh Restoration
[Figure: DCS mesh showing a working path, line (or link) restoration and path restoration.]
• Control: centralized or distributed
• Route calculation: preplanned or dynamic
• Type of alternate routing: line or path

Mesh Restoration vs Ring/Linear Protection

Attributes                          Linear APS   Ring PS          Mesh Restoration
Spare Capacity Needed               Most         Moderate         Least
Fiber Counts                        Highest      Moderate         Moderate
Restoration Time                    50 ms        50 ms            2-10 seconds
Software Complexity                 Least        Moderate         Most
Protection Against Major Failures   Worst        Medium           Best
Planning/Operations Complexity      Least        Moderate/least   Most

Extracted from: T.-H. Wu, Emerging Technologies for Fiber Network Survivability (see References)

Fast Reroute
• Do the "restoration" at the MPLS layer (i.e. Layer 2)…
• Also possible to do fast reroute at layer 3 (IP) with the BANANAS framework.
 Issues: Can MPLS reroute as fast as SONET (50ms) Can traditional IP reroute as fast as MPLS Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 47Fast Reroute (2)  First question: how fast is fast Do you really need 50 ms failover  Second question: can you reroute really quickly while maintaining network stability  Third question: what are the scalability issues with fast reroute Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 48Fast Reroute: MPLS vs. IP C 10 1000 pkt to B A B 10 IP routing to B MPLS detour to B Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 49Fast Reroute vs IP Routing IP MPLS (RSVPTE)  All nodes must be told of  Only the two ends of the failure link need be told (no signaling)  Local operation: explicit  Fast propagation, fast routing; more stable SPF trigger: how stable  Two step process: detour  One step to full re + converge convergence Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 50
Document Information
Category:
Presentations
User Name:
Dr.NeerajMittal
User Type:
Teacher
Country:
India
Uploaded Date:
19-07-2017