Question? Leave a message!




TCP Connection Management

TCP Connection Management
TCP EECS 489 Computer Networks http://www.eecs.umich.edu/courses/eecs489/w07 Z. Morley Mao Wednesday Jan 31, 2007 1 Mao W07 Acknowledgement: Some slides taken from KuroseRoss and KatzStoicaƒƒƒ ƒƒƒƒ TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 pointtopoint: full duplex data: one sender, one receiver bidirectional data flow in same connection reliable, inorder byte steam: MSS: maximum segment size no “message boundaries” connectionoriented: pipelined: handshaking (exchange of TCP congestion and flow control msgs) init’s sender, control set window size receiver state before data send receive buffers exchange flow controlled: sender will not overwhelm receiver application application writes data reads data socket socket door door TCP TCP send buffer receive buffer segment 2 Mao W07TCP segment structure 32 bits URG: urgent data counting source port dest port (generally not used) by bytes sequence number of data ACK: ACK (not segments) acknowledgement number valid head not UAP R S Receive window F PSH: push data now len used bytes (generally not used) checksum Urg data pnter rcvr willing to accept RST, SYN, FIN: Options (variable length) connection estab (setup, teardown commands) application data Internet (variable length) checksum (as in UDP) 3 Mao W07Seq=42, ACK=79, data = ‘C’ Seq=43, ACK=80 TCP seq. ’s and ACKs Host A Host B Seq. ’s: byte stream User types “number” of first byte ‘C’ in segment’s data host ACKs ACKs: receipt of ‘C’, echoes seq of next byte back ‘C’ expected from other side host ACKs cumulative ACK receipt Q: how receiver handles of echoed outoforder segments ‘C’ A: TCP spec doesn’t say, up to time implementor simple telnet scenario 4 Mao W07 Seq=79, ACK=43, data = ‘C’ƒƒ ƒƒƒ TCP Round Trip Time and Timeout Q: how to set TCP timeout Q: how to estimate RTT value SampleRTT: measured time from longer than RTT segment transmission until ACK receipt but RTT varies ignore retransmissions too short: premature timeout SampleRTT will vary, want estimated unnecessary RTT “smoother” retransmissions average several recent too long: slow reaction to measurements, not just current segment loss SampleRTT 5 Mao W07TCP Round Trip Time and Timeout EstimatedRTT = (1α)EstimatedRTT + αSampleRTT Exponential weighted moving average influence of past sample decreases exponentially fast typical value: α = 0.125 6 Mao W07Example RTT estimation: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr 350 300 250 200 150 100 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 time (seconnds) SampleRTT Estimated RTT 7 Mao W07 RTT (milliseconds)ƒƒ TCP Round Trip Time and Timeout Setting the timeout EstimtedRTT plus “safety margin” large variation in EstimatedRTT larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT: DevRTT = (1β)DevRTT + βSampleRTTEstimatedRTT (typically, β = 0.25) Then set timeout interval: TimeoutInterval = EstimatedRTT + 4DevRTT 8 Mao W07ƒƒ ƒƒƒƒ TCP reliable data transfer TCP creates rdt service on Retransmissions are top of IP’s unreliable triggered by: service timeout events Pipelined segments duplicate acks Cumulative acks Initially consider simplified TCP sender: TCP uses single ignore duplicate acks retransmission timer ignore flow control, congestion control 9 Mao W07ƒƒƒ ƒƒƒƒ TCP sender events: data rcvd from app: timeout: Create segment with seq retransmit segment that caused timeout seq is bytestream number of first data byte in segment restart timer start timer if not already Ack rcvd: running (think of timer as for If acknowledges previously oldest unacked segment) unacked segments expiration interval: update what is known to be TimeOutInterval acked start timer if there are outstanding segments 10 Mao W07NextSeqNum = InitialSeqNum SendBase = InitialSeqNum TCP sender loop (forever) switch(event) (simplified) event: data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP Comment: NextSeqNum = NextSeqNum + length(data) • SendBase1: last cumulatively event: timer timeout retransmit notyetacknowledged segment with ack’ed byte smallest sequence number Example: start timer • SendBase1 = 71; y= 73, so the rcvr event: ACK received, with ACK field value of y wants 73+ ; if (y SendBase) y SendBase, so SendBase = y that new data is if (there are currently notyetacknowledged segments) acked start timer / end of loop forever / 11 Mao W07Seq=100, 20 bytes data Seq=92, 8 bytes data Seq=92, 8 bytes data Seq=92, 8 bytes data Seq=92, 8 bytes data TCP: retransmission scenarios Host A Host A Host B Host B X loss Sendbase = 100 SendBase = 120 SendBase SendBase = 100 = 120 premature timeout time time lost ACK scenario 12 Mao W07 ACK=100 ACK=100 ACK=120 ACK=120 ACK=100 timeout Seq=92 timeout Seq=92 timeoutSeq=92, 8 bytes data Seq=100, 20 bytes data TCP retransmission scenarios (more) Host A Host B X loss SendBase = 120 time Cumulative ACK scenario 13 Mao W07 ACK=120 ACK=100 timeoutTCP ACK generation RFC 1122, RFC 2581 TCP Receiver action Event at Receiver Arrival of inorder segment with Delayed ACK. Wait up to 500ms for next segment. If no next segment, expected seq . All data up to send ACK expected seq already ACKed Immediately send single cumulative Arrival of inorder segment with expected seq . One other ACK, ACKing both inorder segments segment has ACK pending Arrival of outoforder segment Immediately send duplicate ACK, indicating seq. of next expected byte higherthanexpect seq. . Gap detected Immediate send ACK, provided that Arrival of segment that partially or completely fills gap segment startsat lower end of gap 14 Mao W07ƒ ƒƒ Fast Retransmit If sender receives 3 ACKs for Timeout period often the same data, it supposes relatively long: that segment after ACKed data long delay before was lost: resending lost packet fast retransmit: resend Detect lost segments via segment before timer expires duplicate ACKs. Sender often sends many segments backtoback If segment is lost, there will likely be many duplicate ACKs. 15 Mao W07Fast retransmit algorithm: event: ACK received, with ACK field value of y if (y SendBase) SendBase = y if (there are currently notyetacknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y a duplicate ACK for fast retransmit already ACKed segment 16 Mao W07ƒ ƒ TCP Flow Control flow control sender won’t overflow receive side of TCP receiver’s buffer by connection has a receive transmitting too much, buffer: too fast speedmatching service: matching the send rate to the receiving app’s drain rate app process may be slow at reading from buffer 17 Mao W07ƒƒ ƒ TCP Flow control: how it works Rcvr advertises spare room by including value of RcvWindow in segments Sender limits unACKed data to RcvWindow guarantees receive buffer doesn’t overflow (Suppose TCP receiver discards outoforder segments) spare room in buffer = RcvWindow = RcvBufferLastByteRcvd LastByteRead 18 Mao W07ƒƒƒ TCP Connection Management Three way handshake: Recall: TCP sender, receiver establish “connection” before Step 1: client host sends TCP exchanging data segments SYN segment to server initialize TCP variables: specifies initial seq seq. s no data buffers, flow control info (e.g. RcvWindow) Step 2: server host receives SYN, replies with SYNACK segment client: connection initiator Socket clientSocket = new server allocates buffers Socket("hostname","port specifies server initial seq. number"); Step 3: client receives SYNACK, server: contacted by client replies with ACK segment, Socket connectionSocket = which may contain data welcomeSocket.accept(); 19 Mao W07FIN ACK TCP Connection Management (cont.) client server Closing a connection: close client closes socket: clientSocket.close(); Step 1: client end system sends TCP FIN control close segment to server Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN. closed 20 Mao W07 ACK FIN timed waitFIN ACK TCP Connection Management (cont.) client server Step 3: client receives FIN, replies with ACK. closing Enters “timed wait” will respond with ACK to received FINs Step 4: server, receives ACK. closing Connection closed. Note: with small modification, can handle simultaneous FINs. closed closed 21 Mao W07 ACK FIN timed waitTCP Connection Management (cont) TCP server lifecycle TCP client lifecycle 22 Mao W07ƒƒƒƒ Principles of Congestion Control Congestion: informally: “too many sources sending too much data too fast for network to handle” different from flow control manifestations: lost packets (buffer overflow at routers) long delays (queueing in router buffers) a top10 problem 23 Mao W07ƒƒ ƒƒƒ Causes/costs of congestion: scenario 1 Host A λ out λ : original data in two senders, two receivers one router, infinite unlimited shared Host B output link buffers buffers no retransmission large delays when congested maximum achievable throughput 24 Mao W07ƒƒ Causes/costs of congestion: scenario 2 one router, finite buffers sender retransmission of lost packet Host A λ out λ : original data in λ' : original data, plus in retransmitted data Host B finite shared output link buffers 25 Mao W07ƒƒƒ Causes/costs of congestion: scenario 2 = λ λ always: (goodput) out in “perfect” retransmission only when loss: λ λ out in retransmission of delayed (not lost) packet makes larger (than λ in perfect case) for same λ out R/2 R/2 R/2 R/3 R/4 R/2 R/2 R/2 λλλ in in in a. b. c. “costs” of congestion: more work (retrans) for given “goodput” unneeded retransmissions: link carries multiple copies of pkt 26 Mao W07 λ out λ out λ outƒƒƒ Causes/costs of congestion: scenario 3 four senders Q: what happens as λ multihop paths in λ and increase timeout/retransmit in Host A λ out λ : original data in λ' : original data, plus in retransmitted data finite shared output link buffers Host B 27 Mao W07Causes/costs of congestion: scenario 3 H λ o o s u t t A H o s t B Another “cost” of congestion: when packet dropped, any “upstream transmission capacity used for that packet was wasted 28 Mao W07ƒ ƒƒƒ Approaches towards congestion control Two broad approaches towards congestion control: Networkassisted congestion Endend congestion control: control: no explicit feedback from network routers provide feedback to end systems congestion inferred from end system observed loss, delay single bit indicating congestion (SNA, DECbit, approach taken by TCP TCP/IP ECN, ATM) explicit rate sender should send at 29 Mao W07ƒƒƒ ƒƒƒ Case study: ATM ABR congestion control RM (resource management) ABR: available bit rate: cells: “elastic service” if sender’s path sent by sender, interspersed with “underloaded”: data cells sender should use bits in RM cell set by switches available bandwidth (“networkassisted”) if sender’s path congested: NI bit: no increase in rate sender throttled to (mild congestion) minimum guaranteed CI bit: congestion indication rate RM cells returned to sender by receiver, with bits intact 30 Mao W07ƒƒ Case study: ATM ABR congestion control twobyte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sender’ send rate thus minimum supportable rate on path EFCI bit in data cells: set to 1 in congested switch if data cell preceding RM cell has EFCI set, sender sets CI bit in returned RM cell 31 Mao W07ƒƒ ƒƒƒƒ TCP Congestion Control How does sender perceive endend control (no network assistance) congestion sender limits transmission: loss event = timeout or 3 LastByteSentLastByteAcked duplicate acks ≤ CongWin TCP sender reduces rate Roughly, (CongWin) after loss event three mechanisms: AIMD CongWin is dynamic, function of slow start perceived network congestion conservative after timeout events CongWin rate = Bytes/sec RTT 32 Mao W07TCP AIMD multiplicative decrease: additive increase: increase cut CongWin in half CongWin by 1 MSS every after loss event RTT in the absence of loss events: probing congestion window 24 Kbytes 16 Kbytes 8 Kbytes time Longlived TCP connection 33 Mao W07ƒƒ TCP Slow Start When connection begins, When connection begins, CongWin = 1 MSS increase rate exponentially fast Example: MSS = 500 bytes until first loss event RTT = 200 msec initial rate = 20 kbps available bandwidth may be MSS/RTT desirable to quickly ramp up to respectable rate 34 Mao W07ƒƒ one segment two segments four segments TCP Slow Start (more) When connection begins, Host A Host B increase rate exponentially until first loss event: double CongWin every RTT done by incrementing CongWin for every ACK received Summary: initial rate is slow but ramps up exponentially fast time 35 Mao W07 RTTƒƒ Refinement Philosophy: After 3 dup ACKs: • 3 dup ACKs indicates CongWin is cut in half window then grows linearly network capable of But after timeout event: delivering some segments CongWin instead set to 1 MSS; • timeout before 3 dup window then grows exponentially ACKs is “more alarming” to a threshold, then grows linearly 36 Mao W07ƒƒ Refinement (more) Q: When should the exponential increase switch to linear A: When CongWin gets to 1/2 of its value before timeout. Implementation: Variable Threshold At loss event, Threshold is set to 1/2 of CongWin just before loss event 37 Mao W07ƒƒƒƒ Summary: TCP Congestion Control When CongWin is below Threshold, sender in slow start phase, window grows exponentially. When CongWin is above Threshold, sender is in congestionavoidance phase, window grows linearly. When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold. When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS. 38 Mao W07TCP sender congestion control Event State TCP Sender Action Commentary ACK receipt Slow Start CongWin = CongWin + MSS, Resulting in a doubling of for (SS) If (CongWin Threshold) CongWin every RTT previously set state to “Congestion unacked Avoidance” data ACK receipt Congestion CongWin = CongWin+MSS Additive increase, resulting for Avoidance (MSS/CongWin) in increase of CongWin by previously (CA) 1 MSS every RTT unacked data Loss event SS or CA Threshold = CongWin/2, Fast recovery, detected by CongWin = Threshold, implementing multiplicative triple Set state to “Congestion decrease. CongWin will duplicate Avoidance” not drop below 1 MSS. ACK Timeout SS or CA Threshold = CongWin/2, Enter slow start CongWin = 1 MSS, Set state to “Slow Start” Duplicate SS or CA Increment duplicate ACK count CongWin and Threshold ACK for segment being acked not changed 39 Mao W07ƒƒƒƒƒ TCP throughput What’s the average throughout ot TCP as a function of window size and RTT Ignore slow start Let W be the window size when loss occurs. When window is W, throughput is W/RTT Just after loss, window drops to W/2, throughput to W/2RTT. Average throughout: .75 W/RTT 40 Mao W07ƒƒƒƒƒ TCP Futures Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput Requires window size W = 83,333 inflight segments Throughput in terms of loss rate: 1.22⋅MSS RTT L 10 ➜ L = 2·10 Wow New versions of TCP for highspeed needed 41 Mao W07TCP Fairness Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K TCP connection 1 bottleneck TCP router connection 2 capacity R 42 Mao W07ƒƒ Why is TCP fair Two competing sessions: Additive increase gives slope of 1, as throughout increases multiplicative decrease decreases throughput proportionally equal bandwidth share R loss: decrease window by factor of 2 congestion avoidance: additive increase loss: decrease window by factor of 2 congestion avoidance: additive increase Connection 1 throughput R 43 Mao W07 Connection 2 throughputƒƒƒ ƒƒƒ Fairness (more) Fairness and parallel TCP Fairness and UDP connections Multimedia apps often do not nothing prevents app from use TCP opening parallel cnctions between do not want rate throttled by 2 hosts. congestion control Web browsers do this Instead use UDP: Example: link of rate R supporting pump audio/video at 9 cnctions; constant rate, tolerate packet loss new app asks for 1 TCP, gets rate R/10 Research area: TCP friendly new app asks for 11 TCPs, gets R/2 44 Mao W07ƒƒƒƒƒƒ ƒƒƒ Delay modeling Notation, assumptions: Q: How long does it take to receive Assume one link between client and server of rate R an object from a Web server after S: MSS (bits) sending a request O: object size (bits) Ignoring congestion, delay is no retransmissions (no loss, no influenced by: corruption) TCP connection establishment Window size: data transmission delay First assume: fixed congestion slow start window, W segments Then dynamic window, modeling slow start 45 Mao W07TCP Delay Modeling: Slow Start (1) Now suppose window grows according to slow start Will show that the delay for one object is: O S S ⎡⎤ P Latency= 2RTT++P RTT+− (2−1) ⎢⎥ R R R ⎣⎦ where P is the number of times TCP idles at server: P= minQ,K−1 where Q is the number of times the server idles if the object were of infinite size. and K is the number of windows that cover the object. 46 Mao W07TCP Delay Modeling: Slow Start (2) Delay components: initiate TCP connection • 2 RTT for connection estab and request request object • O/R to transmit first window = S/R object • time server idles due RTT second wind to slow start = 2S/R Server idles: third window = 4S/R P = minK1,Q times Example: • O/S = 15 segments fourth window = 8S/R •K = 4 windows •Q = 2 • P = minK1,Q = 2 complete object transmission Server idles P=2 times delivered time at time at server client 47 Mao W07TCP Delay Modeling (3) S +RTT= time from when server starts to send segment R initiate TCP until server receives acknowledgement connection S k−1 2= time to transmit the kth window request R object first window = S/R + RTT S S ⎡⎤ k−1 second window +RTT− 2= idle time after the kth window = 2S/R ⎢⎥ R R ⎣⎦ third window = 4S/R P O fourth window delay=+ 2RTT+ idleTime ∑ p = 8S/R R p=1 P O S S k−1 =+ 2RTT+ +RTT− 2 ∑ R R R complete k=1 object transmission delivered O S S P time at =+ 2RTT+PRTT+ − (2−1) time at server R R R client 48 Mao W07TCP Delay Modeling (4) Recall K = number of windows that cover object How do we calculate K 0 1 k−1 K= mink : 2 S+ 2 S+L+ 2 S≥O 0 1 k−1 = mink : 2+ 2+L+ 2≥O /S O k = mink : 2−1≥ S O = mink :k≥ log (+1) 2 S O ⎡⎤ = log (+1) 2 ⎢⎥ S ⎢⎥ Calculation of Q, number of idles for infinitesize object, is similar (see HW). 49 Mao W07ƒƒƒƒ HTTP Modeling Assume Web page consists of: 1 base HTML page (of size O bits) M images (each of size O bits) Nonpersistent HTTP: M+1 TCP connections in series Response time = (M+1)O/R + (M+1)2RTT + sum of idle times Persistent HTTP: 2 RTT to request and receive base HTML file 1 RTT to request and receive M images Response time = (M+1)O/R + 3RTT + sum of idle times Nonpersistent HTTP with X parallel connections Suppose M/X integer. 1 TCP connection for base file M/X sets of parallel connections for images. Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times 50 Mao W07HTTP Response time (in seconds) RTT = 100 msec, O = 5 Kbytes, M=10 and X=5 20 18 16 14 nonpersistent 12 10 persistent 8 6 parallel non 4 persistent 2 0 28 100 1 Mbps 10 Kbps Kbps Mbps For low bandwidth, connection response time dominated by transmission time. Persistent connections only give minor improvement over parallel 51 Mao W07 connections.HTTP Response time (in seconds) RTT =1 sec, O = 5 Kbytes, M=10 and X=5 70 60 50 nonpersistent 40 persistent 30 20 parallel non persistent 10 0 28 100 1 Mbps 10 Kbps Kbps Mbps For larger RTT, response time dominated by TCP establishment slow start delays. Persistent connections now give important improvement: particularly in high delay•bandwidth networks. 52 Mao W07