Question? Leave a message!




Principles of Reliable data transfer

Principles of Reliable data transfer
Transport Layer EECS 489 Computer Networks http://www.eecs.umich.edu/courses/eecs489/w07 Z. Morley Mao Monday Jan 29, 2007 1 Mao W07 Acknowledgement: Some slides taken from KuroseRoss and KatzStoicaÆ ƒƒƒ Topdown New approach (E.g., Kurose Ross) – start from the application layer all the Application way down to the physical layer Advantages – goals are very clear Transport start from application needs Network Disadvantages – harder to understand (IP) some assumptions made about lower Link layers (e.g., packet losses in the Internet are because of congestion) Physical 2 Mao W07ƒ ƒ Transport Layer Our goals: learn about transport layer protocols in the understand principles Internet: behind transport layer services: UDP: connectionless transport multiplexing/demultiple xing TCP: connection oriented transport reliable data transfer TCP congestion control flow control congestion control 3 Mao W07ƒƒƒ logical endend transport Transport services and protocols provide logical communication application transport between app processes running network data link network on different hosts physical data link network physical transport protocols run in end data link physical systems network data link send side: breaks app physical network data link messages into segments, physical passes to network layer network data link rcv side: reassembles physical segments into messages, application passes to app layer transport network data link more than one transport physical protocol available to apps Internet: TCP and UDP 4 Mao W07ƒƒƒƒƒ ƒƒƒ Transport vs. network layer network layer: logical Household analogy: communication between 12 kids sending letters to 12 kids hosts processes = kids transport layer: logical communication between app messages = letters in processes envelopes relies on, enhances, hosts = houses network layer services transport protocol = Ann and Bill Q: what is an example networklayer protocol = postal property that network service layer does not have, transport layer provides And vice versa 5 Mao W07ƒƒƒ logical endend transport Internet transportlayer protocols application Reliable, inorder delivery transport network (TCP) data link network physical data link congestion control network physical data link flow control physical network data link connection setup physical network data link Unreliable, unordered physical network delivery: UDP data link physical nofrills extension of “best effort” IP application transport network Services not available: data link physical delay guarantees bandwidth guarantees 6 Mao W07Multiplexing/demultiplexing Multiplexing at send host: Demultiplexing at rcv host: gathering data from multiple delivering received segments sockets, enveloping data with to correct socket header (later used for demultiplexing) = socket = process P4 application P1 P2 P3 application P1 application transport transport transport network network network link link link physical physical physical host 3 host 2 host 1 7 Mao W07ƒƒ How demultiplexing works host receives IP datagrams 32 bits each datagram has source port dest port source IP address, destination IP address each datagram carries 1 other header fields transportlayer segment each segment has source, destination port application number data (recall: wellknown port (message) numbers for specific applications) host uses IP addresses TCP/UDP segment format port numbers to direct segment to appropriate socket 8 Mao W07ƒƒ ƒƒ Connectionless demultiplexing When host receives UDP Create sockets with port segment: numbers: checks destination port DatagramSocket mySocket1 = new DatagramSocket(99111); number in segment DatagramSocket mySocket2 = directs UDP segment to new DatagramSocket(99222); socket with that port number UDP socket identified by IP datagrams with different twotuple: source IP addresses and/or source port (dest IP address, dest port number) numbers directed to same socket 9 Mao W07Connectionless demux (cont) DatagramSocket serverSocket = new DatagramSocket(6428); P1 P2 P1 P3 SP: 6428 SP: 6428 DP: 9157 DP: 5775 SP: 9157 SP: 5775 DP: 6428 DP: 6428 client Client server IP:B IP: A IP: C SP provides “return address” 10 Mao W07ƒƒ ƒƒ Connectionoriented demux TCP socket identified by Server host may support 4tuple: many simultaneous TCP sockets: source IP address each socket identified by its source port number own 4tuple dest IP address Web servers have different dest port number sockets for each Recv host uses all four connecting client values to direct segment nonpersistent HTTP will to appropriate socket have different socket for each request 11 Mao W07Connectionoriented demux (cont) P1 P6 P2 P1 P4 P5 P3 SP: 5775 DP: 80 SIP: B DIP:C SP: 9157 SP: 9157 DP: 80 DP: 80 client Client server SIP: A SIP: B IP:B IP: A IP: C DIP:C DIP:C 12 Mao W07Connectionoriented demux: Threaded Web Server P1 P2 P1 P4 P3 SP: 5775 DP: 80 SIP: B DIP:C SP: 9157 SP: 9157 DP: 80 DP: 80 client Client server SIP: A SIP: B IP:B IP: A IP: C DIP:C DIP:C 13 Mao W07ƒƒƒƒ ƒƒƒ UDP: User Datagram Protocol RFC 768 “no frills,” “bare bones” Internet transport protocol Why is there a UDP “best effort” service, UDP no connection establishment segments may be: (which can add delay) lost simple: no connection state delivered out of order to at sender, receiver app small segment header connectionless: no congestion control: UDP no handshaking between can blast away as fast as UDP sender, receiver desired each UDP segment handled independently of others 14 Mao W07ƒƒƒ UDP Often used for streaming 32 bits multimedia apps loss tolerant source port dest port Length, in rate sensitive bytes of UDP checksum length segment, Other UDP uses including DNS header SNMP Application Reliable transfer over UDP: add reliability at application data layer (message) applicationspecific error recovery UDP segment format 15 Mao W07ƒƒ ƒƒƒ UDP checksum Goal: detect “errors” (e.g., flipped bits) in transmitted segment Sender: Receiver: treat segment contents compute checksum of received as sequence of 16bit segment integers check if computed checksum checksum: addition (1’s equals checksum field value: complement sum) of NO error detected segment contents YES no error detected. But sender puts checksum maybe errors nonetheless value into UDP More later …. checksum field 16 Mao W07ƒƒ Internet Checksum Example Note When adding numbers, a carryout from the most significant bit needs to be added to the result Example: add two 16bit integers 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 sum 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 checksum 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 17 Mao W07ƒ ƒƒ Principles of Reliable data transfer important in app., transport, link layers top10 list of important networking topics characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt) 18 Mao W07Reliable data transfer: getting started rdtsend(): called from above, deliverdata(): called by (e.g., by app.). Passed data to rdt to deliver data to upper deliver to receiver upper layer send receive side side udtsend(): called by rdt, rdtrcv(): called when packet to transfer packet over arrives on rcvside of channel unreliable channel to receiver 19 Mao W07ƒƒƒ Reliable data transfer: getting started We’ll: incrementally develop sender, receiver sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer but control info will flow on both directions use finite state machines (FSM) to specify sender, receiver event causing state transition actions taken on state transition state: when in this state “state” next state state event 1 uniquely determined 2 actions by next event 20 Mao W07ƒƒ Rdt1.0: reliable transfer over a reliable channel underlying channel perfectly reliable no bit errors no loss of packets separate FSMs for sender, receiver: sender sends data into underlying channel receiver read data from underlying channel rdtsend(data) rdtrcv(packet) Wait for Wait for call from call from extract (packet,data) packet = makepkt(data) below above deliverdata(data) udtsend(packet) sender receiver 21 Mao W07ƒƒƒ Rdt2.0: channel with bit errors Underlying channel may flip bits in packet checksum to detect bit errors The question: how to recover from errors: acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK New mechanisms in rdt2.0 (beyond rdt1.0): error detection receiver feedback: control msgs (ACK,NAK) rcvrsender 22 Mao W07rdt2.0: FSM specification receiver rdtsend(data) snkpkt = makepkt(data, checksum) udtsend(sndpkt) rdtrcv(rcvpkt) rdtrcv(rcvpkt) corrupt(rcvpkt) isNAK(rcvpkt) Wait for Wait for udtsend(NAK) call from ACK or udtsend(sndpkt) above NAK Wait for call from rdtrcv(rcvpkt) isACK(rcvpkt) below Λ sender rdtrcv(rcvpkt) notcorrupt(rcvpkt) extract(rcvpkt,data) deliverdata(data) udtsend(ACK) 23 Mao W07rdt2.0: operation with no errors rdtsend(data) snkpkt = makepkt(data, checksum) udtsend(sndpkt) rdtrcv(rcvpkt) isNAK(rcvpkt) Wait for Wait for rdtrcv(rcvpkt) call from ACK or udtsend(sndpkt) corrupt(rcvpkt) above NAK udtsend(NAK) rdtrcv(rcvpkt) isACK(rcvpkt) Wait for Λ call from below rdtrcv(rcvpkt) notcorrupt(rcvpkt) extract(rcvpkt,data) deliverdata(data) udtsend(ACK) 24 Mao W07rdt2.0: error scenario rdtsend(data) snkpkt = makepkt(data, checksum) udtsend(sndpkt) rdtrcv(rcvpkt) isNAK(rcvpkt) Wait for Wait for rdtrcv(rcvpkt) call from ACK or udtsend(sndpkt) corrupt(rcvpkt) above NAK udtsend(NAK) rdtrcv(rcvpkt) isACK(rcvpkt) Wait for Λ call from below rdtrcv(rcvpkt) notcorrupt(rcvpkt) extract(rcvpkt,data) deliverdata(data) udtsend(ACK) 25 Mao W07ƒƒƒ ƒƒ rdt2.0 has a fatal flaw What happens if Handling duplicates: ACK/NAK corrupted sender adds sequence number to each pkt sender doesn’t know what happened at receiver sender retransmits current pkt if ACK/NAK garbled can’t just retransmit: possible duplicate receiver discards (doesn’t deliver up) duplicate pkt stop and wait Sender sends one packet, then waits for receiver response 26 Mao W07rdt2.1: sender, handles garbled ACK/NAKs rdtsend(data) sndpkt = makepkt(0, data, checksum) udtsend(sndpkt) rdtrcv(rcvpkt) ( corrupt(rcvpkt) Wait for Wait for isNAK(rcvpkt) ) ACK or call 0 from udtsend(sndpkt) NAK 0 above rdtrcv(rcvpkt) rdtrcv(rcvpkt) notcorrupt(rcvpkt) notcorrupt(rcvpkt) isACK(rcvpkt) isACK(rcvpkt) Λ Λ Wait for Wait for ACK or call 1 from rdtrcv(rcvpkt) NAK 1 above ( corrupt(rcvpkt) rdtsend(data) isNAK(rcvpkt) ) sndpkt = makepkt(1, data, checksum) udtsend(sndpkt) udtsend(sndpkt) 27 Mao W07rdt2.1: receiver, handles garbled ACK/NAKs rdtrcv(rcvpkt) notcorrupt(rcvpkt) hasseq0(rcvpkt) extract(rcvpkt,data) deliverdata(data) sndpkt = makepkt(ACK, chksum) udtsend(sndpkt) rdtrcv(rcvpkt) (corrupt(rcvpkt) rdtrcv(rcvpkt) (corrupt(rcvpkt) sndpkt = makepkt(NAK, chksum) sndpkt = makepkt(NAK, chksum) udtsend(sndpkt) udtsend(sndpkt) Wait for Wait for 0 from rdtrcv(rcvpkt) 1 from rdtrcv(rcvpkt) below not corrupt(rcvpkt) below not corrupt(rcvpkt) hasseq1(rcvpkt) hasseq0(rcvpkt) sndpkt = makepkt(ACK, chksum) sndpkt = makepkt(ACK, chksum) udtsend(sndpkt) udtsend(sndpkt) rdtrcv(rcvpkt) notcorrupt(rcvpkt) hasseq1(rcvpkt) extract(rcvpkt,data) deliverdata(data) sndpkt = makepkt(ACK, chksum) udtsend(sndpkt) 28 Mao W07ƒƒ ƒƒƒƒ rdt2.1: discussion Sender: Receiver: seq added to pkt must check if received packet is duplicate two seq. ’s (0,1) will state indicates whether 0 suffice. Why or 1 is expected pkt seq must check if received note: receiver can not ACK/NAK corrupted know if its last ACK/NAK twice as many states received OK at sender state must “remember” whether “current” pkt has 0 or 1 seq. 29 Mao W07ƒƒƒ rdt2.2: a NAKfree protocol same functionality as rdt2.1, using ACKs only instead of NAK, receiver sends ACK for last pkt received OK receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as NAK: retransmit current pkt 30 Mao W07rdt2.2: sender, receiver fragments rdtsend(data) sndpkt = makepkt(0, data, checksum) udtsend(sndpkt) rdtrcv(rcvpkt) ( corrupt(rcvpkt) Wait for Wait for isACK(rcvpkt,1) ) ACK call 0 from 0 udtsend(sndpkt) above sender FSM fragment rdtrcv(rcvpkt) notcorrupt(rcvpkt) isACK(rcvpkt,0) rdtrcv(rcvpkt) (corrupt(rcvpkt) Λ Wait for hasseq1(rcvpkt)) receiver FSM 0 from fragment udtsend(sndpkt) below rdtrcv(rcvpkt) notcorrupt(rcvpkt) hasseq1(rcvpkt) extract(rcvpkt,data) deliverdata(data) sndpkt = makepkt(ACK1, chksum) udtsend(sndpkt) 31 Mao W07ƒƒƒ rdt3.0: channels with errors and loss Approach: sender waits New assumption: underlying “reasonable” amount of time for channel can also lose ACK packets (data or ACKs) retransmits if no ACK received in checksum, seq. , ACKs, this time retransmissions will be of help, but not enough if pkt (or ACK) just delayed (not lost): retransmission will be duplicate, but use of seq. ’s already handles this receiver must specify seq of pkt being ACKed requires countdown timer 32 Mao W07rdt3.0 sender rdtsend(data) rdtrcv(rcvpkt) sndpkt = makepkt(0, data, checksum) ( corrupt(rcvpkt) udtsend(sndpkt) isACK(rcvpkt,1) ) starttimer rdtrcv(rcvpkt) Λ Λ Wait Wait for timeout for call 0from udtsend(sndpkt) ACK0 above starttimer rdtrcv(rcvpkt) notcorrupt(rcvpkt) rdtrcv(rcvpkt) isACK(rcvpkt,1) notcorrupt(rcvpkt) isACK(rcvpkt,0) stoptimer stoptimer Wait Wait for timeout for call 1 from udtsend(sndpkt) ACK1 above rdtrcv(rcvpkt) starttimer Λ rdtsend(data) rdtrcv(rcvpkt) sndpkt = makepkt(1, data, checksum) ( corrupt(rcvpkt) udtsend(sndpkt) isACK(rcvpkt,0) ) starttimer Λ 33 Mao W07rdt3.0 in action 34 Mao W07rdt3.0 in action 35 Mao W07ƒƒ Performance of rdt3.0 rdt3.0 works, but performance stinks example: 1 Gbps link, 15 ms ee prop. delay, 1KB packet: L (packet length in bits) 8kb/pkt T = = = 8 microsec transmit R (transmission rate, bps) 109 b/sec .008 L / R U 0.00027 = = = sender 30.008 RTT + L / R U : utilization – fraction of time sender busy sending sender 1KB pkt every 30 msec 33kB/sec thruput over 1 Gbps link network protocol limits use of physical resources 36 Mao W07rdt3.0: stopandwait operation sender receiver first packet bit transmitted, t = 0 last packet bit transmitted, t = L / R first packet bit arrives RTT last packet bit arrives, send ACK ACK arrives, send next packet, t = RTT + L / R .008 L / R U 0.00027 = = = sender 30.008 RTT + L / R 37 Mao W07ƒ Pipelined protocols Pipelining: sender allows multiple, “inflight”, yettobe acknowledged pkts range of sequence numbers must be increased buffering at sender and/or receiver Two generic forms of pipelined protocols: goBackN, selective repeat 38 Mao W07Pipelining: increased utilization sender receiver first packet bit transmitted, t = 0 last bit transmitted, t = L / R first packet bit arrives RTT last packet bit arrives, send ACK nd last bit of 2 packet arrives, send ACK rd last bit of 3 packet arrives, send ACK ACK arrives, send next packet, t = RTT + L / R Increase utilization by a factor of 3 .024 3 L / R U 0.0008 = = = sender 30.008 RTT + L / R 39 Mao W07ƒƒ ƒƒƒ GoBackN Sender: kbit seq in pkt header “window” of up to N, consecutive unack’ed pkts allowed ACK(n): ACKs all pkts up to, including seq n “cumulative ACK” may deceive duplicate ACKs (see receiver) timer for each inflight pkt timeout(n): retransmit pkt n and all higher seq pkts in window 40 Mao W07GBN: sender extended FSM rdtsend(data) if (nextseqnum base+N) sndpktnextseqnum = makepkt(nextseqnum,data,chksum) udtsend(sndpktnextseqnum) if (base == nextseqnum) starttimer nextseqnum++ else Λ refusedata(data) base=1 nextseqnum=1 timeout starttimer Wait udtsend(sndpktbase) udtsend(sndpktbase+1) rdtrcv(rcvpkt) … corrupt(rcvpkt) udtsend(sndpktnextseqnum1) rdtrcv(rcvpkt) notcorrupt(rcvpkt) base = getacknum(rcvpkt)+1 If (base == nextseqnum) stoptimer else starttimer 41 Mao W07ƒ GBN: receiver extended FSM default udtsend(sndpkt) rdtrcv(rcvpkt) notcurrupt(rcvpkt) hasseqnum(rcvpkt,expectedseqnum) Λ Wait extract(rcvpkt,data) expectedseqnum=1 deliverdata(data) sndpkt = sndpkt = makepkt(expectedseqnum,ACK,chksum) makepkt(expectedseqnum,ACK,chksum) udtsend(sndpkt) expectedseqnum++ ACKonly: always send ACK for correctlyreceived pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnum outoforder pkt: discard (don’t buffer) no receiver buffering ReACK pkt with highest inorder seq 42 Mao W07GBN in action 43 Mao W07ƒƒƒ Selective Repeat receiver individually acknowledges all correctly received pkts buffers pkts, as needed, for eventual inorder delivery to upper layer sender only resends pkts for which ACK not received sender timer for each unACKed pkt sender window N consecutive seq ’s again limits seq s of sent, unACKed pkts 44 Mao W07Selective repeat: sender, receiver windows 45 Mao W07ƒƒƒƒƒ ƒƒƒƒ Selective repeat receiver sender pkt n in rcvbase, rcvbase+N1 data from above : send ACK(n) if next available seq in outoforder: buffer window, send pkt inorder: deliver (also deliver timeout(n): buffered, inorder pkts), advance window to next notyetreceived resend pkt n, restart timer pkt ACK(n) in pkt n in rcvbaseN,rcvbase1 sendbase,sendbase+N: ACK(n) mark pkt n as received otherwise: if n smallest unACKed pkt, ignore advance window base to next unACKed seq 46 Mao W07Selective repeat in action 47 Mao W07
Website URL
Comment