TCP Flow Control and Congestion Control

EECS 489 Computer Networks
http://www.eecs.umich.edu/courses/eecs489/w07
Z. Morley Mao, Monday Feb 5, 2007
Acknowledgement: Some slides taken from Kurose & Ross and Katz & Stoica.

TCP Flow Control
- Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast.
- The receive side of a TCP connection has a receive buffer; the application process may be slow at reading from it.
- Flow control is a speed-matching service: it matches the send rate to the receiving application's drain rate.

TCP Flow Control: how it works
- The receiver advertises the spare room by including the value of RcvWindow in segments:
  spare room in buffer = RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)
  (Suppose the TCP receiver discards out-of-order segments.)
- The sender limits unACKed data to RcvWindow - this guarantees the receive buffer doesn't overflow.
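A short sketch (not from the slides) of the receive-window computation above. The buffer size and byte counters are illustrative values, not anything prescribed by TCP.

    // Sketch: computing the advertised receive window (RcvWindow) as on the
    // slide above. All names and numbers are illustrative.
    public class RcvWindowExample {
        public static void main(String[] args) {
            long rcvBuffer    = 64 * 1024;  // total receive buffer (bytes), assumed
            long lastByteRcvd = 150_000;    // highest in-order byte received, assumed
            long lastByteRead = 120_000;    // last byte the application has read, assumed

            // Spare room the receiver can still absorb without overflowing:
            long rcvWindow = rcvBuffer - (lastByteRcvd - lastByteRead);
            System.out.println("Advertised RcvWindow = " + rcvWindow + " bytes"); // 35536

            // The sender must then keep unACKed data within this window:
            // LastByteSent - LastByteAcked <= RcvWindow
        }
    }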
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments.
- initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)
- client: connection initiator: Socket clientSocket = new Socket("hostname","port number");
- server: contacted by client: Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
- Step 1: client host sends a TCP SYN segment to the server: specifies the client's initial sequence number, carries no data.
- Step 2: server host receives the SYN, replies with a SYNACK segment: the server allocates buffers and specifies its own initial sequence number.
- Step 3: client receives the SYNACK, replies with an ACK segment, which may contain data.

TCP Connection Management (cont.)
Closing a connection: the client closes its socket: clientSocket.close();
- Step 1: client end system sends a TCP FIN control segment to the server.
- Step 2: server receives the FIN, replies with an ACK, closes the connection, and sends its own FIN.
- Step 3: client receives the FIN, replies with an ACK, and enters "timed wait", during which it will respond with an ACK to any received FINs.
- Step 4: server receives the ACK; the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
(Figures: client/server timelines for connection teardown showing FIN, ACK and the timed wait, and the TCP client and server lifecycle state diagrams.)

Principles of Congestion Control
- Congestion, informally: "too many sources sending too much data too fast for the network to handle".
- Different from flow control!
- Manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers).
- A top-10 problem!

Causes/costs of congestion: scenario 1
- two senders, two receivers; one router with infinite buffers; no retransmission
- large delays when congested; maximum achievable throughput
(Figure: Hosts A and B send original data at rate λin into unlimited shared output link buffers; λout is the delivered throughput.)

Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packets: λin is the original data, λ'in the original data plus retransmitted data
(Figure: Hosts A and B sharing finite output link buffers.)
- always: λin = λout (goodput)
- "perfect" retransmission, only when loss: λ'in > λout
- retransmission of delayed (not lost) packets makes λ'in larger (than in the perfect case) for the same λout
(Figure: three goodput-vs-offered-load curves, saturating at R/2, R/3 and R/4.)
"Costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: the link carries multiple copies of a packet

Causes/costs of congestion: scenario 3
- four senders, multihop paths, timeout/retransmit
- Q: what happens as λin and λ'in increase?
(Figure: throughput λout as a function of offered load, for Hosts A and B sharing finite output link buffers.)
- Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

Approaches towards congestion control
Two broad approaches:
- End-end congestion control: no explicit feedback from the network; congestion is inferred from end-system observed loss and delay; the approach taken by TCP.
- Network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) or an explicit rate the sender should send at.

Case study: ATM ABR congestion control
ABR (available bit rate): an "elastic service".
- If the sender's path is "underloaded", the sender should use the available bandwidth.
- If the sender's path is congested, the sender is throttled to a minimum guaranteed rate.
RM (resource management) cells:
- sent by the sender, interspersed with data cells
- bits in the RM cell are set by switches ("network-assisted"): the NI bit means no increase in rate (mild congestion), the CI bit is a congestion indication
- RM cells are returned to the sender by the receiver, with the bits intact

Case study: ATM ABR congestion control (cont.)
- A two-byte ER (explicit rate) field in the RM cell: a congested switch may lower the ER value in the cell, so the sender's send rate becomes the minimum supportable rate on the path.
- EFCI bit in data cells: set to 1 by a congested switch; if the data cell preceding an RM cell has EFCI set, the sender sets the CI bit in the returned RM cell.

TCP Congestion Control
- end-end control (no network assistance)
- the sender limits transmission: LastByteSent - LastByteAcked <= CongWin
- roughly, rate = CongWin / RTT bytes/sec
- CongWin is dynamic, a function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (CongWin) after a loss event
Three mechanisms: AIMD, slow start, and conservative behaviour after timeout events.
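A minimal sketch (not from the slides) of the send limit just described. In practice the sender is bounded by both the congestion window and the advertised receive window, so the sketch takes the smaller of the two; all field names and numbers are illustrative.

    // Sketch: may the sender transmit more, and at roughly what rate?
    // Illustrative values only.
    public class SendLimitExample {
        public static void main(String[] args) {
            long lastByteSent  = 200_000;  // assumed
            long lastByteAcked = 160_000;  // assumed
            long congWin       = 45_000;   // congestion window (bytes), assumed
            long rcvWindow     = 60_000;   // advertised receive window (bytes), assumed
            double rttSec      = 0.1;      // measured round-trip time, assumed

            // Effective window: both flow control and congestion control must hold.
            long window   = Math.min(congWin, rcvWindow);
            long inFlight = lastByteSent - lastByteAcked;   // 40,000 bytes unACKed
            long canSend  = Math.max(0, window - inFlight); // 5,000 more bytes allowed

            System.out.println("May still send " + canSend + " bytes");
            // Rough throughput from the slide: CongWin / RTT
            System.out.println("~rate = " + (congWin / rttSec) + " bytes/sec"); // 450000.0
        }
    }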
TCP AIMD
- Multiplicative decrease: cut CongWin in half after a loss event.
- Additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing.
(Figure: the congestion window of a long-lived TCP connection oscillates in a sawtooth between roughly 8, 16 and 24 Kbytes over time.)

TCP Slow Start
- When the connection begins, CongWin = 1 MSS.
  - Example: MSS = 500 bytes & RTT = 200 msec gives an initial rate of only 20 kbps.
- The available bandwidth may be much larger than MSS/RTT, so it is desirable to quickly ramp up to a respectable rate.
- When the connection begins, increase the rate exponentially fast until the first loss event.

TCP Slow Start (more)
- When the connection begins, increase the rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast.
(Figure: Host A sends one segment, then two, then four, one window per RTT.)

Refinement
- After 3 duplicate ACKs: CongWin is cut in half and the window then grows linearly.
- But after a timeout event: CongWin is instead set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy: 3 duplicate ACKs indicate the network is capable of delivering some segments; a timeout before 3 duplicate ACKs is "more alarming".

Refinement (more)
- Q: When should the exponential increase switch to linear?
- A: When CongWin gets to 1/2 of its value before timeout.
Implementation: a variable Threshold. At a loss event, Threshold is set to 1/2 of CongWin just before the loss event.

Summary: TCP Congestion Control
- When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially.
- When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control
- Event: ACK receipt for previously unACKed data; State: Slow Start (SS); Action: CongWin = CongWin + MSS, and if (CongWin > Threshold) set state to "Congestion Avoidance"; Commentary: results in a doubling of CongWin every RTT.
- Event: ACK receipt for previously unACKed data; State: Congestion Avoidance (CA); Action: CongWin = CongWin + MSS * (MSS/CongWin); Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
- Event: loss event detected by triple duplicate ACK; State: SS or CA; Action: Threshold = CongWin/2, CongWin = Threshold, set state to "Congestion Avoidance"; Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
- Event: timeout; State: SS or CA; Action: Threshold = CongWin/2, CongWin = 1 MSS, set state to "Slow Start"; Commentary: enter slow start.
- Event: duplicate ACK; State: SS or CA; Action: increment the duplicate ACK count for the segment being ACKed; Commentary: CongWin and Threshold are not changed.
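The table above is essentially a small state machine. Below is a compact sketch of it (a simplification for illustration, not any real TCP stack): one flow, windows kept in bytes, no pacing or SACK, and all constants are illustrative.

    // Sketch of the sender actions in the table above.
    public class CongestionControlSketch {
        static final double MSS = 1460;        // bytes, assumed
        double congWin = MSS;                  // start in slow start with 1 MSS
        double threshold = 64 * 1024;          // initial threshold, assumed
        boolean slowStart = true;

        void onNewAck() {                      // ACK for previously unACKed data
            if (slowStart) {
                congWin += MSS;                // doubles CongWin every RTT
                if (congWin > threshold) slowStart = false;  // -> congestion avoidance
            } else {
                congWin += MSS * (MSS / congWin);            // +1 MSS per RTT
            }
        }

        void onTripleDupAck() {                // loss detected by 3 duplicate ACKs
            threshold = congWin / 2;
            congWin = Math.max(threshold, MSS);  // multiplicative decrease, >= 1 MSS
            slowStart = false;                   // continue in congestion avoidance
        }

        void onTimeout() {                     // loss detected by timeout
            threshold = congWin / 2;
            congWin = MSS;                     // back to 1 MSS
            slowStart = true;                  // re-enter slow start
        }

        public static void main(String[] args) {
            CongestionControlSketch cc = new CongestionControlSketch();
            for (int i = 0; i < 100; i++) cc.onNewAck();   // ramp up
            cc.onTripleDupAck();                           // halve on loss
            System.out.printf("CongWin=%.0f Threshold=%.0f%n", cc.congWin, cc.threshold);
        }
    }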
TCP throughput
- What is the average throughput of TCP as a function of window size and RTT? (Ignore slow start.)
- Let W be the window size when loss occurs.
- When the window is W, the throughput is W/RTT.
- Just after a loss, the window drops to W/2 and the throughput to W/(2·RTT).
- Average throughput: 0.75·W/RTT.

TCP Futures
- Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput.
- This requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
  throughput = (1.22 · MSS) / (RTT · sqrt(L))
- For 10 Gbps this gives L = 2·10^-10 - wow!
- New versions of TCP are needed for high speeds.

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.
(Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.)

Why is TCP fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
(Figure: connection 1 throughput plotted against connection 2 throughput; each loss halves both windows and congestion avoidance increases both additively, so the operating point converges toward the equal-bandwidth-share line at total rate R.)

Fairness (more)
Fairness and UDP:
- Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control.
- Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss.
- Research area: TCP-friendly congestion control.
Fairness and parallel TCP connections:
- Nothing prevents an app from opening parallel connections between 2 hosts; Web browsers do this.
- Example: a link of rate R supports 9 connections. A new app asking for 1 TCP gets rate R/10; a new app asking for 11 TCPs gets R/2!

Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, the delay is influenced by:
- TCP connection establishment
- data transmission delay
- slow start
Notation, assumptions:
- one link between client and server of rate R
- S: MSS (bits); O: object size (bits)
- no retransmissions (no loss, no corruption)
Window size:
- first assume a fixed congestion window of W segments
- then a dynamic window, modeling slow start

TCP Delay Modeling: Slow Start (1)
Now suppose the window grows according to slow start. We will show that the delay for one object is:
  Latency = 2·RTT + O/R + P·(RTT + S/R) - (2^P - 1)·S/R
where P is the number of times TCP idles at the server: P = min{Q, K-1}
- Q is the number of times the server would idle if the object were of infinite size,
- K is the number of windows that cover the object.

TCP Delay Modeling: Slow Start (2)
Delay components:
- 2·RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start
The server idles P = min{K-1, Q} times.
Example: O/S = 15 segments, K = 4 windows, Q = 2, so P = min{K-1, Q} = 2: the server idles twice.
(Figure: timeline of initiating the TCP connection, requesting the object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered.)

TCP Delay Modeling (3)
- S/R + RTT = time from when the server starts to send a segment until it receives the acknowledgement
- 2^(k-1)·S/R = time to transmit the k-th window
- [S/R + RTT - 2^(k-1)·S/R] = idle time after the k-th window
  delay = O/R + 2·RTT + sum over p = 1..P of idleTime_p
        = O/R + 2·RTT + sum over k = 1..P of [S/R + RTT - 2^(k-1)·S/R]
        = O/R + 2·RTT + P·(RTT + S/R) - (2^P - 1)·S/R
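A small sketch (not from the slides) that evaluates the slow-start latency formula just derived, for the example with O/S = 15 segments. K and Q are found numerically from their definitions; R, S and RTT are illustrative values chosen so that K = 4 and Q = 2 as in the slide example.

    // Sketch: evaluate Latency = 2·RTT + O/R + P·(RTT + S/R) - (2^P - 1)·S/R.
    public class SlowStartLatency {
        public static void main(String[] args) {
            double R = 160_000;        // link rate, bits/sec (assumed)
            double S = 8_000;          // MSS, bits (assumed)
            double O = 15 * S;         // object size: 15 segments
            double RTT = 0.1;          // seconds (assumed)

            // K: number of windows that cover the object (window k holds 2^(k-1) segments)
            int K = 0;
            for (double covered = 0; covered < O; K++) covered += Math.pow(2, K) * S;

            // Q: number of windows after which the server would idle, for an infinite object
            int Q = 0;
            while (S / R + RTT - Math.pow(2, Q) * S / R > 0) Q++;

            int P = Math.min(Q, K - 1);                 // times the server actually idles
            double latency = 2 * RTT + O / R
                    + P * (RTT + S / R)
                    - (Math.pow(2, P) - 1) * S / R;

            System.out.printf("K=%d Q=%d P=%d latency=%.2f s%n", K, Q, P, latency);
            // Expected output: K=4 Q=2 P=2 latency=1.10 s
        }
    }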
TCP Delay Modeling (4)
Recall K = the number of windows that cover the object. How do we calculate K?
  K = min{k : 2^0·S + 2^1·S + ... + 2^(k-1)·S >= O}
    = min{k : 2^0 + 2^1 + ... + 2^(k-1) >= O/S}
    = min{k : 2^k - 1 >= O/S}
    = min{k : k >= log2(O/S + 1)}
    = ceil(log2(O/S + 1))
The calculation of Q, the number of idles for an infinite-size object, is similar (see HW).

HTTP Modeling
Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)
Non-persistent HTTP:
- M+1 TCP connections in series
- Response time = (M+1)·O/R + (M+1)·2·RTT + sum of idle times
Persistent HTTP:
- 2 RTT to request and receive the base HTML file, 1 RTT to request and receive the M images
- Response time = (M+1)·O/R + 3·RTT + sum of idle times
Non-persistent HTTP with X parallel connections (suppose M/X is an integer):
- 1 TCP connection for the base file, then M/X sets of X parallel connections for the images
- Response time = (M+1)·O/R + (M/X + 1)·2·RTT + sum of idle times

HTTP response time (in seconds), RTT = 100 msec, O = 5 Kbytes, M = 10, X = 5
(Figure: bar chart of non-persistent, persistent, and parallel non-persistent response times at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps.)
For low bandwidth, connection and response time are dominated by transmission time; persistent connections give only a minor improvement over parallel connections.

HTTP response time (in seconds), RTT = 1 sec, O = 5 Kbytes, M = 10, X = 5
(Figure: the same comparison with RTT = 1 sec.)
For larger RTT, response time is dominated by TCP establishment and slow-start delays. Persistent connections now give an important improvement, particularly in high delay·bandwidth networks.
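A small sketch (not from the slides) that evaluates the three response-time formulas from the HTTP modeling slide above, with the idle-time terms ignored for simplicity (so these are lower bounds). The parameters match the first chart: RTT = 100 ms, O = 5 Kbytes, M = 10, X = 5.

    // Sketch: compare non-persistent, persistent, and parallel non-persistent HTTP.
    public class HttpResponseTime {
        public static void main(String[] args) {
            double RTT = 0.1;            // seconds
            double O   = 5 * 8 * 1024;   // object size: 5 Kbytes in bits
            int    M   = 10;             // images
            int    X   = 5;              // parallel connections
            double[] rates = {28_000, 100_000, 1_000_000, 10_000_000}; // bps

            for (double R : rates) {
                double transmit      = (M + 1) * O / R;
                double nonPersistent = transmit + (M + 1) * 2 * RTT;
                double persistent    = transmit + 3 * RTT;
                double parallel      = transmit + ((double) M / X + 1) * 2 * RTT;
                System.out.printf("R=%8.0f bps  non-pers=%6.2f s  pers=%6.2f s  parallel=%6.2f s%n",
                        R, nonPersistent, persistent, parallel);
            }
        }
    }

With these numbers the transmission term dominates at 28 Kbps, which is why the three schemes are nearly indistinguishable at low bandwidth, as the slide notes.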
Issues to Think About
- What about short flows (setting the initial cwnd)? Most flows are short, but most bytes are in long flows.
- How does this work over wireless links? Packet reordering fools fast retransmit, and loss is not always congestion related.
- High speeds? To reach 10 Gbps, packet losses may occur only every 90 minutes.
- Fairness: how do flows with different RTTs share a link?

Security issues with TCP
Example attacks:
- Sequence number spoofing
- Routing attacks
- Source address spoofing
- Authentication attacks

Network Layer
Goals:
- understand the principles behind network layer services: routing (path selection), dealing with scale, how a router works, advanced topics (IPv6, mobility)
- instantiation and implementation in the Internet

Network layer
- transports segments from the sending to the receiving host
- on the sending side, encapsulates segments into datagrams
- on the receiving side, delivers segments to the transport layer
- network layer protocols run in every host and router
- a router examines header fields in all IP datagrams passing through it
(Figure: protocol stacks of end hosts and routers; routers implement only the network, data link and physical layers.)

Key Network-Layer Functions
- forwarding: move packets from a router's input to the appropriate router output
- routing: determine the route taken by packets from source to destination (routing algorithms)
Analogy:
- routing: the process of planning a trip from source to destination
- forwarding: the process of getting through a single interchange

Interplay between routing and forwarding
The routing algorithm fills in the local forwarding table, which maps header values to output links:
  header value 0100 -> output link 3
  header value 0101 -> output link 2
  header value 0111 -> output link 2
  header value 1001 -> output link 1
The value in an arriving packet's header is looked up in this table (e.g. a packet with header value 0111 is sent out on link 2).

Connection setup
- A 3rd important function in some network architectures: ATM, frame relay, X.25.
- Before datagrams flow, the two hosts and the intervening routers establish a virtual connection: the routers get involved.
- Network vs. transport layer connection service: the network layer connects two hosts, the transport layer connects two processes.

Network service model
Q: What service model for the "channel" transporting datagrams from sender to receiver?
Example services for individual datagrams:
- guaranteed delivery
- guaranteed delivery with less than 40 msec delay
Example services for a flow of datagrams:
- in-order datagram delivery
- guaranteed minimum bandwidth to the flow
- restrictions on changes in inter-packet spacing

Network layer service models: guarantees?
- Internet, best effort: no bandwidth guarantee, no loss, order or timing guarantees, no congestion feedback (inferred via loss).
- ATM CBR: constant rate bandwidth; loss, order and timing guaranteed; no congestion feedback (congestion does not occur).
- ATM VBR: guaranteed rate; loss, order and timing guaranteed; no congestion feedback (congestion does not occur).
- ATM ABR: guaranteed minimum bandwidth; no loss or timing guarantee; order guaranteed; congestion feedback: yes.
- ATM UBR: no bandwidth, loss or timing guarantee; order guaranteed; no congestion feedback.

Network layer connection and connection-less service
- A datagram network provides network-layer connectionless service.
- A VC network provides network-layer connection service.
- Analogous to the transport-layer services, but: the service is host-to-host, there is no choice (the network provides one or the other), and the implementation is in the core.

Virtual circuits
"The source-to-dest path behaves much like a telephone circuit": performance-wise, and in the network actions along the source-to-dest path.
- call setup and teardown for each call, before data can flow
- each packet carries a VC identifier (not the destination host address)
- every router on the source-dest path maintains "state" for each passing connection
- link and router resources (bandwidth, buffers) may be allocated to the VC

VC implementation
A VC consists of:
1. a path from source to destination
2. VC numbers, one number for each link along the path
3. entries in the forwarding tables of routers along the path
A packet belonging to a VC carries a VC number. The VC number must be changed on each link; the new VC number comes from the forwarding table.

Forwarding table (VC network)
Forwarding table in the northwest router (the figure shows its interfaces 1, 2 and 3 and VC numbers 12, 22 and 32 on the attached links):
  incoming interface 1, incoming VC# 12 -> outgoing interface 2, outgoing VC# 22
  incoming interface 2, incoming VC# 63 -> outgoing interface 1, outgoing VC# 18
  incoming interface 3, incoming VC# 7  -> outgoing interface 2, outgoing VC# 17
  incoming interface 1, incoming VC# 97 -> outgoing interface 3, outgoing VC# 87
  ...
Routers maintain connection state information!

Virtual circuits: signaling protocols
- used to set up, maintain, and tear down a VC
- used in ATM, frame relay, X.25
- not used in today's Internet
(Figure: 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data.)
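A tiny sketch (not from the slides) of the per-router VC translation described above: the router looks up (incoming interface, incoming VC number), rewrites the packet's VC number, and forwards it on the outgoing interface. The map contents are the example entries from the forwarding-table slide.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: per-connection VC state, mapping (in-interface, in-VC#) -> (out-interface, out-VC#).
    public class VcForwarding {
        // keys and values encoded as "interface:vc" strings for brevity
        static final Map<String, String> table = new HashMap<>();
        static {
            table.put("1:12", "2:22");
            table.put("2:63", "1:18");
            table.put("3:7",  "2:17");
            table.put("1:97", "3:87");
        }

        // Returns "outInterface:newVc" for a packet, or null if no VC state exists.
        static String forward(int inInterface, int vcNumber) {
            return table.get(inInterface + ":" + vcNumber);
        }

        public static void main(String[] args) {
            // A packet arriving on interface 1 with VC number 12 leaves on
            // interface 2 carrying VC number 22.
            System.out.println(forward(1, 12));   // 2:22
            System.out.println(forward(2, 63));   // 1:18
        }
    }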
Datagram networks
- no call setup at the network layer
- routers keep no state about end-to-end connections: there is no network-level concept of a "connection"
- packets are forwarded using the destination host address: packets between the same source-dest pair may take different paths
(Figure: the host simply sends data and the receiver receives it; no call setup exchange.)

Forwarding table (datagram network)
With 32-bit addresses there would be 4 billion possible entries, so the table holds address ranges:
  11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 -> link interface 0
  11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 -> link interface 1
  11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 -> link interface 2
  otherwise -> link interface 3

Longest prefix matching
  prefix 11001000 00010111 00010    -> link interface 0
  prefix 11001000 00010111 00011000 -> link interface 1
  prefix 11001000 00010111 00011    -> link interface 2
  otherwise                         -> link interface 3
Examples (a sketch working these out follows the last slide):
- DA: 11001000 00010111 00010110 10100001 - which interface?
- DA: 11001000 00010111 00011000 10101010 - which interface?

Datagram or VC network: why?
Internet:
- data exchange among computers: "elastic" service, no strict timing requirements
- "smart" end systems (computers): can adapt, perform control and error recovery; simple inside the network, complexity at the "edge"
- many link types with different characteristics: a uniform service would be difficult
ATM:
- evolved from telephony
- human conversation: strict timing and reliability requirements; need for guaranteed service
- "dumb" end systems (telephones): complexity inside the network
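A short sketch (not from the slides) of longest-prefix matching over the table above, answering the two example destination addresses. Addresses and prefixes are written as bit strings (spaces dropped) purely for readability.

    // Sketch: longest-prefix matching over the example table above.
    public class LongestPrefixMatch {
        // prefix -> link interface, from the longest-prefix-matching slide
        static final String[][] TABLE = {
            {"110010000001011100010",    "0"},
            {"110010000001011100011000", "1"},
            {"110010000001011100011",    "2"},
        };

        static String lookup(String destAddr) {
            String bestLink = "3";       // the "otherwise" entry
            int bestLen = -1;
            for (String[] entry : TABLE) {
                String prefix = entry[0];
                if (destAddr.startsWith(prefix) && prefix.length() > bestLen) {
                    bestLen = prefix.length();
                    bestLink = entry[1];
                }
            }
            return bestLink;
        }

        public static void main(String[] args) {
            // The two example destination addresses from the slide:
            System.out.println(lookup("11001000000101110001011010100001")); // interface 0
            System.out.println(lookup("11001000000101110001100010101010")); // interface 1
        }
    }

The second address matches both the 21-bit prefix for interface 2 and the 24-bit prefix for interface 1; the longer match wins, so it goes to interface 1.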