
Transport Layer

A. Basic Concepts

1. The purpose of the transport layer is to provide "efficient, reliable, and cost-effective" service to its users (processes in the application layer). Essentially, the machine-to-machine connection should be transparent: a process should get the same level of reliability it would have talking to another process on the same machine. In particular, it should not have to know anything about the network.

2. Users "own" the transport layer and can demand from it the reliability that data transfer normally needs.
a. What users want from the connection is specified through QoS (quality of service) parameters, and it is the transport layer's responsibility to negotiate the best quality of service that it can get from the network layer (at a price that the users are willing to pay) and to maintain that quality.
b. Some QoS parameters are: connection establishment delay, throughput, residual error ratio, and priority.

3. Transport Primitives
a. While few user processes "see" the network layer, many will see the transport layer. Consequently, the transport primitives (the basic commands through which applications access the transport service) must be few and simple.
b. Tanenbaum uses the following terminology: frame = entity sent by the data link layer; packet = entity sent by the network layer; TPDU (transport protocol data unit) = entity sent by the transport layer.

Fig. 6-4 Nesting of TPDU's, packets, and frames.

c. A bare-bones set of primitives is shown below.

Fig. 6-3 The primitives for a simple transport service.

The sequence starts with a server application like FTP or TELNET telling the transport entity (a piece of hardware or software) through a library-callable routine that it is ready to accept clients. This is LISTEN. A client can then request a connection by sending its transport entity a CONNECT request. The transport entity responds by sending out a connection request TPDU. The server sends back a connection accepted TPDU to establish the connection. Once the connection is established, data can be sent back and forth using the SEND and RECEIVE primitives. [Note: This is STOP-AND-WAIT, the simplest transmission procedure.]
Disconnects can be symmetric or asymmetric.
Asymmetric: Either party can send a disconnect, releasing the connection (like the phone system).
Symmetric: When one party sends a disconnect, that means it has no more data to send. The connection isn't released until both parties send a disconnect.
d. Transport layer protocols can be very complex, and state diagrams (like the one in Fig. 6-5 below) are used to understand them.

Fig. 6-5 A state diagram for a simple connection management scheme. Transitions labeled in italics are caused by packet arrivals. The solid lines show the client's state sequence. The dashed lines show the server's state sequence.
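To make the state-diagram idea concrete, here is a toy connection state machine in Python. This is a sketch only: the state and event names are my own illustrative choices, loosely modeled on the figure rather than quoted from it.

# A toy connection state machine in the spirit of Fig. 6-5.
# Upper-case events are primitives issued locally; lower-case events are
# TPDU arrivals (the transitions the figure labels in italics).
TRANSITIONS = {
    ("IDLE", "LISTEN"): "PASSIVE_ESTABLISH_PENDING",
    ("IDLE", "CONNECT"): "ACTIVE_ESTABLISH_PENDING",
    ("PASSIVE_ESTABLISH_PENDING", "connection_request_received"): "ESTABLISHED",
    ("ACTIVE_ESTABLISH_PENDING", "connection_accepted_received"): "ESTABLISHED",
    ("ESTABLISHED", "DISCONNECT"): "ACTIVE_DISCONNECT_PENDING",
    ("ESTABLISHED", "disconnect_request_received"): "PASSIVE_DISCONNECT_PENDING",
    ("ACTIVE_DISCONNECT_PENDING", "disconnect_request_received"): "IDLE",
    ("PASSIVE_DISCONNECT_PENDING", "DISCONNECT"): "IDLE",
}

def step(state, event):
    """Return the next state, or raise if the event is illegal in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")

# The client-side walk through the diagram (solid lines in the figure):
s = "IDLE"
for e in ["CONNECT", "connection_accepted_received",
          "DISCONNECT", "disconnect_request_received"]:
    s = step(s, e)
print(s)  # back to IDLE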

4. Berkeley Sockets

Fig. 6-6 The socket primitives for TCP.

a. The primitives used in Berkeley UNIX for TCP are a bit more complex.
b. Primitives:
Socket: This opens a connection between the application (server or client) and its transport entity.
Bind: This gives the socket an address (a port number). Some services like FTP and TELNET have "well-known numbers" (specified in the Internet RFCs) and do not require BIND.
Listen: Sets aside space for multiple incoming calls, but does not ready the server to accept them (block). That is done by ACCEPT.
The other primitives follow our simple example.
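These primitives map almost one-for-one onto Python's socket module, so a minimal echo exchange can be sketched as below. This is illustrative only: the port number 6000 is an arbitrary choice, error handling is omitted, and the two functions are meant to run in separate processes.

import socket

def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    s.bind(("", 6000))                                      # BIND to a local port
    s.listen(5)                                             # LISTEN: queue up to 5 callers
    conn, addr = s.accept()                                 # ACCEPT blocks until a client connects
    data = conn.recv(1024)                                  # RECEIVE
    conn.sendall(data)                                      # SEND (echo the data back)
    conn.close()                                            # release this side of the connection
    s.close()

def client():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
    s.connect(("localhost", 6000))                          # CONNECT (no explicit BIND needed)
    s.sendall(b"hello")                                     # SEND
    print(s.recv(1024))                                     # RECEIVE the echoed bytes
    s.close()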

5. Relation of transport protocols to data link protocols:

Fig. 6-7 (a) Environment of the data link layer. (b) Environment of the transport layer.

a. Like data link protocols, transport protocols must deal with error control, flow control, and sequencing.
b. Unlike data link protocols, transport protocols must deal with (possibly) large variations in arrival times and with multiple connections whose number varies with time.
c. One consequence is that transport protocols tend to be quite complex. [If you look, for example, at Tanenbaum's example protocol, pp. 514-517, you will see that it is as long as the longest of the example data link protocols, even though it assumes a reliable, connection-oriented network service.]

6. Addressing
a. The first problem to be addressed is that it is not machines that need to talk to each other; it is processes inside the machines. Indeed, each process, like FTP, that is meant to talk to multiple users must be capable of spawning subprocesses or threads. In the Internet, a complete address is: IP address + local port. In ATM, these addresses are AAL-SAPs (AAL Service Access Points). Tanenbaum uses TSAP (transport service access point) to identify this generically.
b. A key question: How do clients that wish to connect to a process on another host know the port number? Example: In host 2, a time-of-day server is attached to TSAP 122 (via LISTEN or SOCKET+BIND or something else). Possible answers:
TSAP 122 could be "well-known" (like port 21 for FTP on the Internet).

If the port number is known but the process only runs when there is a user (to save RAM), the client will connect to a process server that starts up the application (the time-of-day server) and hands the connection to it.
If the port number is unknown, then a special process (which must itself be well-known) called a name server looks up the port number corresponding to a service and returns it to the application.
c. In some networks (especially LANs), the port may be known, but the machine on which the service is offered may not be. In that case, one must: (1) use a name server to specify the machine, or (2) send a special packet requesting the service to all machines.

7. Connecting
a. In an error-free system, the client simply sends a connection request and the server responds with a connection acknowledgment, BUT real subnets can lose or delay packets for indefinite periods of time.

Worst Case: Due to congestion or router failures, a complete transmission is repeated and is executed twice in the appropriate order (e.g., a bank transfer is done twice).
b. Bad Methods to Deal with Delayed Duplicates
Change transport addresses with each transmission: This costs too much overhead.
Give each connection a connection identifier, allowing a check for duplicates: When a host crashes, it loses memory of the identifiers.
c. A Method That Works
Assure that each packet has a finite time to live (restrict the maximum path length, use a hop counter, or use a time stamp). Some time T after this (to account for acknowledgments), the packet is dead.
Number each TPDU sequentially. Each host has a digital clock (the clocks need not be synchronized) with at least as many bits as in the sequence number. This clock does not fail when the host crashes.
The low-order k bits of one of the clocks, e.g., the client's, is used to initialize the sequence number and is agreed on by both hosts. The number k should be large enough to avoid wrap-around. At this point a sliding window protocol may be used.
To avoid problems with crashes (e.g., starting with a sequence number of 70, going past 80, and then having an old TPDU with sequence number 80 show up from before the crash), there are forbidden sequence numbers at any time T. The width of the forbidden region is determined by the maximum lifetime of a packet. The protocol must avoid entering the forbidden region from above or below. (A small code sketch follows Fig. 6-10.)

Fig. 6-10 (a) TPDU's may not enter the forbidden region. (b) The resynchronization problem
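A minimal sketch of the clock-driven numbering idea follows. The parameters are assumptions chosen for illustration (k = 32 bits and a 250 Hz clock), not values taken from the notes, and the window test is the usual modulo-2^k comparison.

import time

K = 32                    # width of the sequence-number space (assumed)
MODULUS = 1 << K

def initial_seq(clock_hz=250):
    """Derive an initial sequence number from the low-order k bits of a clock."""
    ticks = int(time.time() * clock_hz)   # the clock keeps running across host crashes
    return ticks % MODULUS

def in_window(seq, start, size):
    """True if seq lies in the window [start, start+size) modulo 2^k."""
    return (seq - start) % MODULUS < size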
To avoid the initial setup of sequence numbers going awry, one must use a three-way handshake in which both the client and server send out independent sequence numbers. No combination of duplicate connection requests and accepts can make the system go astray.

8. Disconnecting
a. Disconnecting is simpler in principle than connecting but has, nonetheless, some real difficulties associated with it.
b. Asymmetric disconnects lead to data loss when one side has not finished its transmission.
c. Symmetric disconnects are better, but not foolproof, because of the two-army problem: Two armies on the hills want to synchronize their attack on the army in the valley.

Commander #1 says: Let's attack at 10am. Commander #2 says: Fine. How does Commander #2 know his message got through? Commander #1 could send an acknowledgment, but how does he know that got through and so on... The last message must be unacknowledged and is inherently unreliable.


d. The same issue holds for breaking connections, and that, too, is never completely reliable; i.e., the connection might not be broken when it should be. To resolve this situation, one must use timers. For example, every time a TPDU arrives, one can restart a timer. After a certain (very large) amount of time, one can close a connection even if a disconnect request (or its acknowledgment) is not received.

9. Flow Control and Buffering
a. As in the data link layer, the connection is point-to-point and the goal is to avoid overloading the receiver. Unlike the data link layer, a transport layer will typically have a large number of connections. Tying a buffer to each sequence number in a sliding window protocol (as, for example, in Tanenbaum's protocol 6) requires a large amount of buffering--HOW CAN WE REDUCE THE LOAD?
b. One approach is for the receiver to hold a dynamic pool of buffers. If there is no buffer available, then the TPDU is discarded and must be re-sent (which implies that the sender must also buffer what it is sending). Conversely, with reliable transmission, it is possible to eliminate the sender's buffers. (A small sketch of this buffer pool appears after item d.)
c. Variable-size buffers depending on TPDU size can also be used but require complicated protocols. The protocol can also manage variable buffer sizes that are dynamically increased or reduced, depending upon the amount of traffic. This approach requires control packets (or piggy-backed control information) and leads to more complicated protocols.
d. The sender must also be aware of the subnet's carrying capacity (which can also vary dynamically) and adjust its flow rate so as not to overwhelm the subnet. [This issue does not affect the data link layer because the physical layer has a fixed transmission rate.]
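Here is the small sketch of the dynamic buffer pool mentioned in item b. It is purely illustrative: the class name and the policy of silently dropping TPDUs when no buffer is free are my own simplifications.

class BufferPool:
    """Receiver-side pool: grant credit while buffers remain, drop otherwise."""
    def __init__(self, nbuffers):
        self.free = nbuffers      # buffers currently available
        self.held = []            # TPDUs waiting to be read by the application

    def credits(self):
        # How many more TPDUs the receiver is willing to accept right now.
        return self.free

    def tpdu_arrived(self, tpdu):
        if self.free == 0:
            return False          # no buffer: drop it, the sender must re-send
        self.free -= 1
        self.held.append(tpdu)
        return True

    def application_read(self):
        tpdu = self.held.pop(0)
        self.free += 1            # the buffer returns to the pool
        return tpdu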

10. Multiplexing

Fig. 6-17 (a) Upward multiplexing. (b) Downward multiplexing.

a. Upward multiplexing is used when multiple transport connections share the same network connection. This is useful if a virtual circuit (or, worse yet, a physical circuit) is being underutilized.
b. Downward multiplexing is the inverse scenario, where a single network connection cannot handle the traffic from the transport layer process. By dividing the traffic among different network connections (e.g., different satellite connections), it is possible to get better throughput.

11. Crash Recovery
a. It is the transport layer's responsibility to deal with router crashes. With connection-less service, TPDU's get lost all the time anyway, so the crash is transparent. With connection-oriented service, one must reestablish a virtual circuit, determine which TPDU's have been received, and then re-send the lost TPDU's.
b. Host crashes cannot be dealt with completely successfully at the transport layer. The reason is that acknowledging a transmission and writing it to the application layer are discrete acts, and it is possible for an ACK to be sent with no WRITE being made, or vice versa. Thus, asking the other host for its status does not solve the problem. Example: A simple stop-and-wait protocol is being used. The host queries the client and finds that it is in state S0 (no TPDU's outstanding) or in state S1 (one TPDU outstanding). There are four different possible transmission strategies, and all of them fail under some circumstances. Making the protocol more complicated doesn't help.

B. The Internet--Transport Layer (TCP & UDP)

1. The principal transport layer protocol used by the Internet is TCP (Transmission Control Protocol). It is used to achieve a reliable byte stream over an unreliable internetwork consisting of many different networks. It is well-designed and highly robust. There is also a protocol called UDP (User Datagram Protocol), which is essentially IP with a short header and is used mostly for control.

2. Basic TCP Properties
a. TCP is responsible for: (1) re-ordering out-of-order packets, (2) eliminating duplicates, and (3) getting errored or missing packets retransmitted.
b. TCP takes a byte stream and breaks it into IP datagrams of up to 64 KB (but typically about 1500 bytes). On the opposite end, it reassembles them into a byte stream. It does not preserve message boundaries from the applications: it might get four 512-byte writes and deliver them as one 2048-byte read on the far end. In this sense, it acts like a UNIX pipe. (A small sketch follows item d.)
c. TCP connects between sockets, which consist of an IP address and a port. Port numbers below 256 are well-known (21 = FTP, 23 = TELNET). TCP is point-to-point; multicasting and broadcasting are not possible. [Hence the need for UDP for server-to-all-clients control.]
d. TPDU (Tanenbaum) --> segment (TCP/IP). A segment contains a 20-byte fixed header, an optional header part, and data, and it carries some special flags (discussed below). Each network has a maximum transfer unit (MTU), and the IP packet must be less than that (typically 2-3 KB and always less than 65,535 bytes). We note that TSAP (Tanenbaum) --> socket (TCP/IP).
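The sketch promised in item 2b: a toy illustration of how TCP re-segments a byte stream without regard to the application's write boundaries. The MSS value of 1460 bytes is an assumption for the example (a 1500-byte MTU minus 20-byte IP and TCP headers).

MSS = 1460   # assumed maximum segment payload

stream = b""
for _ in range(4):
    stream += b"x" * 512          # four separate 512-byte application writes

# TCP is free to cut the accumulated byte stream wherever it likes:
segments = [stream[i:i + MSS] for i in range(0, len(stream), MSS)]
print([len(s) for s in segments])  # [1460, 588] -- the write boundaries are gone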

3. TCP Segment Header

Fig. 6-24 The TCP header.

a. Source port and destination port identify the local end points of the connection; together with the IP addresses, they form the two sockets.
b. Sequence number and acknowledgment number have the usual meaning. Note that all bytes are numbered sequentially and the acknowledgment number is the next byte expected.
c. TCP header length gives the number of 32-bit words in the TCP header.
d. Six unused bits and six 1-bit flags:
URG = 1 if the Urgent pointer is in use. That pointer gives an offset to urgent data.
ACK = 1 if there is a piggy-backed acknowledgment.
PSH = 1 if the data in this segment should be pushed (not buffered).
RST = 1 indicates a reset, due to a host crash or an invalid segment (acting like a NAK), or refuses a connection.
SYN = 1 and ACK = 0 requests a connection. SYN = 1 and ACK = 1 acknowledges the request.
FIN = 1 closes one side of the connection.
e. TCP uses variable window sizes. Zero is allowed when a host does not want to receive new data for a while. To start receiving again, it must send a segment with the right sequence number and a non-zero window.
f. A checksum is provided for "extreme reliability." It covers the header, the data, and a "conceptual pseudo-header" (additional network layer information extracted from IP, which includes the source and destination addresses and thus helps catch misdirected packets). It is calculated by:
Filling the checksum field with zero.
If the number of bytes is odd, padding the data out with one zero byte to make it even.
Adding all 16-bit words in 1's complement arithmetic.
Putting the complement of the sum in the checksum field.
(This calculation is sketched in code after the options list below.)
g. Options
Maximum Segment Size: Each host proposes a value at connection setup and the minimum wins. The absolute minimum that the Internet specifications require hosts to accept is 556 bytes.
Increased Window Size: A 2^16-byte window = 64 KB is a bit small for present-day systems. A window scale option allows the users to have windows of up to 2^30 bytes.
Selective Repeat: TCP was originally built using go-back-n. One can request to use selective repeat instead.
There are others...
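Here is the checksum calculation from item f sketched in code. It is a generic Internet-checksum routine over an arbitrary byte string; in real use the pseudo-header, the TCP header (with a zeroed checksum field), and the data would be concatenated before calling it.

def internet_checksum(data: bytes) -> int:
    """1's complement of the 1's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry (1's complement add)
    return ~total & 0xFFFF                         # complement of the sum

# A receiver summing the same bytes *including* this checksum gets 0xFFFF.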

4. Connection Management
a. TCP uses the three-way handshake to establish connections.

Fig. 6-26 (a) TCP connection establishment in the normal case. (b) Call collision.

The server side attaches to a port using the LISTEN and ACCEPT primitives.


The client executes a CONNECT primitive, which leads to the TCP layer sending a segment with SYN = 1, ACK = 0, and SEQ = x (occupying 1 byte of sequence space). If the connection is rejected (there is no process listening, or the server is too busy), a segment with RST = 1 is returned. If the connection is accepted, an acknowledgment segment with SEQ = y and ACK = x+1 is returned. The "third" segment of the handshake is then sent, establishing the connection. This still works even if both ends try to establish the connection independently. (A toy trace appears after item b.)
b. TCP uses symmetric disconnects along with timers to avoid the two-army problem. At the end of its transmission, each side sends a segment with FIN = 1, which must be acknowledged. When both sides have done that, the connection is released. If there is no response to the FIN, the sender releases its side of the connection after two packet lifetimes (2T). Eventually, the other host times out as well, since no one is out there.
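Here is the toy trace promised in item a: the three segments of the handshake with illustrative sequence numbers (x = 1000 and y = 5000 are arbitrary picks; real stacks derive them from a clock, as discussed earlier).

x, y = 1000, 5000

handshake = [
    # (sender,  flags,           seq,   ack)
    ("client", {"SYN"},          x,     None),   # CONNECT: SYN=1, ACK=0, SEQ=x
    ("server", {"SYN", "ACK"},   y,     x + 1),  # accepted: SEQ=y, ACK=x+1
    ("client", {"ACK"},          x + 1, y + 1),  # third segment completes the handshake
]

for who, flags, seq, ack in handshake:
    print(f"{who:6s} flags={sorted(flags)} seq={seq} ack={ack}")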

5. TCP Transmission Control

Fig. 6-29 Window management in TCP.

a. In contrast to data link layer protocols, window management is not tied only to acknowledgments, because acknowledged data can remain in the receiver's buffer until the application reads it. Example: A sending application writes 2 KB, which lands in the receiver's empty 4 KB buffer. The receiver advertises a 2 KB window, which is all the space it has left. Once the application on the receiving end reads the data, the TCP entity on that end must send a new segment advertising the larger window before more data will be sent.
b. There are two exceptions to "no transmission with a zero window": (1) urgent data may be sent; (2) a one-byte segment can be sent to force the receiver to re-advertise its window size (avoiding deadlock if a window announcement is lost).
c. To optimize bandwidth, senders and receivers can delay sending segments or acknowledgments, respectively, until their buffers are relatively full or empty:
Wait 500 msec before acknowledging, hoping for data to piggy-back the acknowledgment on.
Nagle's Algorithm: when the application delivers data a little at a time, send the first piece and buffer the rest until the outstanding data are acknowledged.


Clark's Algorithm: This deals with applications that read one byte at a time, so that TCP ends up advertising a one-byte window each time ("silly window syndrome"). No window update is sent until the receiver can handle either its advertised maximum segment size or half its buffer, whichever is smaller.
The receiving TCP can also force its applications to accept data in larger chunks by blocking READ requests. If buffer space is at a premium, it can discard out-of-order segments.
d. Note that this window management can be quite complex (depending on the details of the implementation).

6. TCP Congestion Control
a. While the network layer deals with network congestion, congestion is also a transport layer issue, because the best solution is to slow down transmission.
b. First Issue: How is congestion detected? A timeout caused by a lost packet could be due to (1) noise on the line or (2) congestion. On modern fiber-optic links the former is rare, so modern TCP algorithms assume that timeouts are due to congestion. THIS IS A PROBLEM FOR WIRELESS IMPLEMENTATIONS, where the opposite is true. [See Tanenbaum, pp. 543-545 for how to deal with this case.]
c. In addition to the receiver window, which is controlled by the receiver, each TCP sender maintains a congestion window and may only send the minimum allowed by the two windows.
d. Congestion Control Algorithm:

Fig. 6-32 An example of the Internet congestion algorithm.

The sender initially sets the congestion window to the maximum segment size on the network (in this example, 1024 bytes). It then repeatedly doubles the congestion window on each successfully acknowledged burst until a timeout occurs or the receiver's window size is reached. This is called slow start. There is also a threshold, initially set at 64 KB. Whenever a timeout occurs, the threshold is set to half the current congestion window and slow start re-initializes from one maximum segment. After the congestion window reaches the threshold, it grows only linearly until there is a timeout or the receiver's window is reached. An ICMP SOURCE QUENCH is treated like a timeout.
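A small simulation of these rules, under simplifying assumptions of my own (one window update per burst, sizes in bytes, and a timeout injected by hand on burst 6); it sketches the idea rather than any real TCP implementation.

def next_cwnd(cwnd, threshold, mss, receiver_window, timeout):
    if timeout:
        threshold = max(cwnd // 2, mss)   # threshold drops to half the current window ...
        return mss, threshold             # ... and slow start begins again at one segment
    if cwnd < threshold:
        cwnd *= 2                         # exponential growth (slow start)
    else:
        cwnd += mss                       # linear growth past the threshold
    return min(cwnd, receiver_window), threshold

cwnd, threshold, mss = 1024, 64 * 1024, 1024
for burst in range(10):
    timeout = (burst == 6)                # pretend a timeout happens on burst 6
    cwnd, threshold = next_cwnd(cwnd, threshold, mss, 32 * 1024, timeout)
    print(burst, cwnd, threshold)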

7. TCP Timer Management
a. TCP uses a variety of timers to do its work, and these must be carefully managed for good performance.
b. Retransmission Timer:

Fig. 6-33 (a) Probability density of acknowledgment arrival times in the data link layer. (b) Probability density of acknowledgment arrival times for TCP.

The most critical of the various timers is the retransmission timer. When a segment is sent, this timer is set while waiting for an acknowledgment. When it goes off, the segment is presumed lost and is retransmitted. The situation is familiar from the data link layer, but since the transport layer does not run over a single dedicated line, the variance of the round-trip time is much larger. Additionally, the mean of the round-trip time varies dynamically with congestion. If the timeout interval is too short (T1 in the figure), the bandwidth is clogged with useless packets; if it is too long, time is wasted before retransmission. A dynamic algorithm is used to determine the retransmission time:
(1) One keeps a variable RTT, which is the best current estimate of the round-trip time. On each successful round trip (data sent --> acknowledgment received), it is updated using the formula
RTT = A*RTT + (1-A)*M,
where M is the measured round-trip time and A is a smoothing factor (typically 7/8).
(2) Given RTT, one wants to set the retransmission timer to B*RTT. What should B be? It should be dynamic as well, since the variance can change. The mean deviation is estimated using
D = A*D + (1-A)*|RTT - M|,
and the timer is then set to
Timeout = RTT + 4*D.
How do we deal with retransmissions? Including them messes up the estimate of RTT (it is unclear whether an acknowledgment belongs to the original transmission or the retransmission). Answer: We don't include them; we just keep doubling the timeout until segments get through on the first try. This is Karn's algorithm. (A code sketch of this estimator appears after item c.)
c. Some other timers:
Persistence Timer: This is set when the receiver advertises a zero-size window. When it goes off, the sender probes to see whether the window has opened.
Keep-Alive Timer: This timer goes off if a connection has been idle for some period of time (not all implementations have it).
Timed Wait Timer: This is set to twice the maximum packet lifetime and is used to make sure that all packets from the connection have died off before it is fully closed; i.e., between the CLOSE/FIN exchange and actually deleting the connection, one waits 2T.
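Here is the estimator sketch promised in item b, combining the smoothing formulas above with Karn's rule. ALPHA = 7/8 is the conventional smoothing factor, and the 3-second initial estimate is an assumption for illustration.

ALPHA = 7 / 8

class RetransmissionTimer:
    def __init__(self, initial_rtt=3.0):
        self.rtt = initial_rtt      # smoothed round-trip estimate (seconds)
        self.dev = 0.0              # smoothed mean deviation
        self.timeout = initial_rtt

    def ack_for_fresh_segment(self, measured):
        """Update the estimates from a segment that was NOT retransmitted (Karn's rule)."""
        self.dev = ALPHA * self.dev + (1 - ALPHA) * abs(self.rtt - measured)
        self.rtt = ALPHA * self.rtt + (1 - ALPHA) * measured
        self.timeout = self.rtt + 4 * self.dev

    def segment_timed_out(self):
        """Retransmission: leave RTT alone and just back the timer off."""
        self.timeout *= 2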

8. UDP (User Datagram Protocol)

Fig. 6-34 The UDP header.

UDP is a very simple protocol with a very simple header. The checksum is calculated just as in TCP and can be disabled by storing all zeros in the field; a checksum that truly computes to zero is stored as all ones (its equivalent in 1's complement notation).

C. ATM--Transport Layer (AAL Protocols)

1. Recall that the structure of ATM is different from the OSI reference model. AAL (the ATM Adaptation Layer) sits on top of ATM (which is like a network layer), but unlike TCP it does not provide reliable end-to-end transport. However, one of its protocols (AAL5) is like UDP. What is the function of AAL?
a. To provide the interface between ATM (no error control, no flow control) and applications.
b. It was meant to deal with a variety of possible services and has had a tortuous history (which is unfortunately still evolving and still relevant).
c. It was designed by phone companies originally, and computer companies didn't start participating until late in the day.

2. History of AAL Protocols

Fig. 6-36 Original service classes supported by AAL (now obsolete).

a. Four classes of traffic, among eight possibilities, were thought to be of interest:

A = time-sensitive, constant bit rate, connection-oriented (voice, uncompressed video)
B = time-sensitive, variable bit rate, connection-oriented (compressed video)
C = time-insensitive, variable bit rate, connection-oriented (data)
D = time-insensitive, variable bit rate, connection-less (data)
b. A --> AAL1, B --> AAL2, C & D --> AAL3/4, because there is no real difference between C and D.
c. The data community didn't like AAL3/4 because it is inefficient; so they came up with AAL5.

3. Structure of the ATM Adaptation Layer

Fig. 6-37 The ATM model showing the ATM adaptation layer and its sub-layers.

a. The AAL has two parts:
(1) The convergence sub-layer, which interfaces to the application. Its upper part is specific to the application; its lower part is specific to the protocol.
(2) Below it is the SAR (segmentation and reassembly sub-layer). It adds headers and trailers, depending on the protocol, and then segments messages into cells and reassembles them.
b. The convergence sub-layer deals with messages; the SAR deals with cells.
c. Communication Primitives
The AAL uses the primitives discussed by Tanenbaum on pp. 25-27. There are four classes: (1) Request: the application requests service from the transport entity; (2) Indication: the transport entity tells the application about a request; (3) Response: the application tells the transport entity what it wants to do; (4) Confirm: the transport entity tells the application that there is a response.
Eight Service Primitives (illustrated with the phone-call analogy):
CONNECT.request: request a connection (dial Aunt Millie's number)
CONNECT.indication: signal the called party (the phone rings; Aunt Millie knows there is a call)
CONNECT.response: accept or reject the call (Aunt Millie picks up the phone, or decides not to)
CONNECT.confirm: tell the caller whether the call was accepted (the ringing stops if accepted)
DATA.request: request that data be sent (invite Aunt Millie to tea)
DATA.indication: signal the arrival of data (Aunt Millie hears you)
DATA.request (2): Aunt Millie sends her reply (she accepts)
DATA.indication (2): you hear Aunt Millie
DISCONNECT.request: request connection release (you hang up the phone)
DISCONNECT.indication: let the peer know (Aunt Millie hears you hang up)
d. The generic behavior of the sub-layers is shown in the attached figure.

Fig. 6-38 The headers and trailers that can be added to a message in an ATM network.

4. AAL1

a. AAL1 is for time-sensitive, constant-bit-rate (class A) service, including voice. Phone companies have a lot of experience with voice, so it is reasonable that this protocol is well done.
b. The Convergence Sub-layer:
detects lost or misinserted (misdirected) cells
breaks messages into 46- or 47-byte units (46-byte units are used when a pointer is needed to mark message boundaries)
does not add headers or trailers

Fig. 6-39 The AAL1 cell format.

The SAR:
It adds a one-byte header with a 3-bit sequence number (SN) and a 3-bit CRC (SNP) using the generator x^3 + x + 1. An even parity bit covers the whole header.
Messages are not necessarily aligned to cell boundaries. P-cells are used to point out the start of the next message. Only even-numbered cells can be P-cells, so the pointer is in the range 0-92.
Partially filled cells can be used to avoid collection delays (e.g., digitizing 1 byte every 125 usec means a 5.875 msec delay to fill a 47-byte payload, which may be too long).

5. AAL2: This was originally meant for compressed video, in which the data rate is highly variable. In its current form it is broken (doesn't work). See Tanenbaum, pp. 549-550 for details.

6. AAL3/4
a. This protocol was originally designed for data. It has been replaced in practice by AAL5 but is still frequently referred to.
b. It has two modes, stream and message; the latter respects message boundaries. It also allows multiplexing.
c. The Convergence Sub-layer:

Fig. 6-42 AAL3/4 convergence sub-layer message format.

It adds both a header and a trailer to messages of up to 65,535 bytes (which are padded to a multiple of 4 bytes).
CPI (Common Part Indicator): the message type.
BTAG and ETAG: These are used for framing and are incremented by one on every message.
BA size: This is used to allocate a buffer. It is the same as the length in message mode, but it can be different in stream mode.

d. The SAR

Fig. 6-43 The AAL3/4 cell format.

Each cell has a 44-byte payload (carrying the convergence sub-layer data, including its header and trailer).
ST: indicates where the cell falls in the message
SN: sequence number
MID (multiplexing id): a session number for use with multiplexing
CRC: 10-bit checksum
e. Note that 8 bytes are added to every message and 4 bytes to every cell. This is a very large overhead.
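As a back-of-the-envelope illustration of item e, the sketch below counts AAL3/4 cells for a few message sizes; the 53-byte on-the-wire figure simply adds the 5-byte ATM cell header to each 48-byte cell.

import math

def aal34_cells(msg_len):
    padded = math.ceil((msg_len + 8) / 4) * 4   # 8 bytes of CS header+trailer, padded to 4 bytes
    return math.ceil(padded / 44)               # only 44 payload bytes per cell

for n in (44, 1000, 65535):
    cells = aal34_cells(n)
    wire = cells * 53                           # 48-byte cell payload + 5-byte ATM header
    print(f"{n:6d}-byte message -> {cells:5d} cells, {wire:7d} bytes on the wire")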

7. AAL5
a. The computer industry was unhappy with AAL3/4 because of (1) its large overhead and (2) its small checksum. Some researchers came up with SEAL (Simple Efficient Adaptation Layer), which was eventually adopted by the ATM Forum as AAL5.

b. Reliable and unreliable service, and unicast and multicast (multicast is always unreliable), are options; it has a message mode and a stream mode.
c. Messages of up to 65,535 bytes can be passed to the AAL layer; these are padded (together with the trailer described below) out to a multiple of 48 bytes.
d. Convergence Sub-layer: It adds no header and an eight-byte trailer.
UU (user-to-user): This byte is reserved for the applications or the upper part of the convergence sub-layer.
Length: the actual length of the payload, not including the padding
CRC: standard 32-bit checksum
e. SAR: (1) The SAR adds NO OVERHEAD; (2) it sets a bit in the PTI field of the ATM cell header in the last cell to preserve message boundaries.

8. Tanenbaum has a detailed critique of all these layers. My take: AAL1 and AAL5 are the only ones that are active; AAL1 will be used for voice, and AAL5 will be used for data. Tanenbaum notes that AAL5 could easily be modified to avoid putting TCP/IP on top of it. That is right, but TCP/IP is well developed and needed for inter-networking anyway.

9. SSCOP (Service Specific Connection-Oriented Protocol): AAL does have an end-to-end reliable protocol, but it is only for control, not for data. [Something like this is clearly needed to deal with signaling.]
