Friday, September 24, 2010

z/VSE TCP/IP Throughput Rates

I receive a number of queries about TCP/IP transfer (or throughput) rates every year. So, I thought I would make some comments about the issue.

At a high level, the concept is simple. The sending application sends data to a receiving application. Great! So, how much data can we send and how fast?

Basically, we would like to send as much data as possible in every request. Sounds simple, so lets start there. I allocate a large buffer, perhaps 1MB, in my application, fill it with data and send it all to the TCP/IP stack in a single request.

OK, the TCP/IP stack probably can not handle all that data at once, so lets look at how the request is broken down. First, the application's data is queued into a transmit buffer within the TCP/IP stack partition. The TCP/IP transmit buffer is probably limited to 32K to 64K in size. So, the size of the transmit buffer is the 1st limitation.

Once the data has been queued into the transmit buffer, the stack can begin the process of creating a packet to transmit the data. The 2nd limitation is the MTU (Maximum Transmission Unit) of the network interface being used. On a typical Ethernet network this is probably 1500 bytes. If you have z/VSE connected to a Gb (Gigabit) network you can take advantage of jumbo Ethernet frames which have an MTU size of 9000 bytes. If you are using a Hipersocket interface then the MTU size can move up to as much as 64K.

Well, assuming a 32K transmit buffer and an OSA Express QDIO Gb network interface, the stack will take about 9000 bytes (less headers, etc.) from the transmit buffer to create a packet. But, wait, can we really send 9000 bytes? Maybe, there are 2 more factors to consider.

The 3rd limitation is the MSS (Maximum Segment Size) negotiated by the local and remote host's TCP/IP stack when the socket was created. For example, if the sending TCP/IP stack supports an 8K MSS and the receiving TCP/IP stack supports only a 1500 byte MSS ... Guess what? The 1500 byte MSS wins.

The 4th limitation is the amount of space (number of bytes) available in the remote host's TCP Receive Window. TCP uses a 64K window to manage data transmission. Up to 64K of data can be transmitted to the remote host without waiting for an acknowledgement. Each byte of data sent must be acknowledged by the remote host. When an ACK packet is sent the sequence number of the last byte of data received and size of the current TCP Receive Window included. The sending TCP/IP stack can not send more data than will fit into the currently advertised TCP Receive Window.

Wow. OK, we started with 1MB of data being sent to the stack, 32K was queued into the transmit buffer. Now, of the 32K available in the transmit buffer the amount of data sent in a single packet is the smaller of the MTU size, Maximum Segment Size and the TCP Receive Window size.

As an example, if the MTU is 9000, TCP Receive Window 64K and MSS 1500, guess what? The amount of data sent in a single packet is 1500 bytes (less headers, etc.). Our 32K transmit buffer full of data will take around 22 packets to transmit.

As the packets are transmitted, the local TCP/IP stack is also watching for ACK packets from the remote host TCP/IP stack. The sooner the ACK packet arrives from the remote host, the sooner the local stack can begin or continue transferring data. The big factor here? Network speed and the number of hops needed to get the packet to its destination. In a word? Latency.

Having a high speed network is wonderful but latency can kill performance. You are always at the mercy of the slowest link.

I will discuss this more in another posting when I revisit using Hipersockets in z/VSE and consider some of the performance issues involved in optimizing throughput over a Hipersocket network interface.




No comments: