Monday, October 11, 2010

Optimizing HiperSockets in z/VSE

Let's review a bit about the z/VSE HiperSockets network interface ...

HiperSockets is a synchronous method of transferring data: the send operation does not complete until the data has been received by the destination. This is a big factor. If the destination system is busy, this can have a large impact on throughput.

HiperSockets is very much a CPU function. In general, the faster the CPU, the faster the HiperSockets interface can operate. Limiting CPU on an LPAR or virtual machine (VM) limits throughput.

Are all HiperSockets hosts using the same large MTU? The maximum MTU size of a HiperSockets network interface is determined by the frame size, which is specified via the OS parameter in the IOCP configuration of the IQD CHPID.

OS Frame Size vs. Maximum MTU:
OS = 00 : MTU =  8KB
OS = 40 : MTU = 16KB
OS = 80 : MTU = 32KB
OS = C0 : MTU = 64KB


Choose the OS parameter carefully. A frame of the size you choose is transferred for every transfer, even if there is only 1 byte of data to send.
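
For illustration only, here is a rough sketch of what an IQD CHPID definition with a 64KB frame size might look like in the IOCP deck. The CHPID number, the SHARED keyword and the partition names are placeholders, other required operands are omitted, and on some processors the frame size keyword is CHPARM rather than OS, so check the IOCP documentation for your machine before using anything like this:

   CHPID PATH=(FC),SHARED,PARTITION=((LPAR1,LPAR2)),TYPE=IQD,OS=C0

This example requests the 64KB frame size (OS=C0), which allows the largest HiperSockets MTU.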

And, now to continue ...

If you read my last posting about TCP/IP throughput rates, you know that the first limitation to throughput is the size of the TCP/IP stack's transmit buffer. BSI customers can adjust this value using the SBSIZE stack startup parameter. Setting SBSIZE to its maximum value of 65024 (SBSIZE 65024) may help throughput when using a HiperSockets network interface. Warning: Changing this value may actually reduce throughput. Your mileage will vary and testing is required to optimize this value. Do not change this value without contacting BSI first.

Set the MTU size of the HiperSockets network interface to 57216. This is the maximum value allowed and is specified on the LINK statement in the stack startup commands.
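
As a hypothetical sketch only (the exact statement syntax is in the BSI documentation; only the SBSIZE 65024 value and the MTU 57216 value come from this discussion), the relevant pieces of the stack startup commands would look something like this:

   SBSIZE 65024
   ...
   LINK ... MTU 57216

The first line sets the transmit buffer to its maximum, and the second defines the HiperSockets link with the maximum MTU; the '...' stands for the operands of your existing LINK statement and whatever other startup commands you already use.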

Now we have ...
A 64K transmit buffer
A 56K MTU

The TCP Receive Window and Maximum Segment Size are not directly controllable by customers. However, when using HiperSockets, the TCP Receive Window should be 64K and the MSS will be slightly less than the MTU size. All of this provides the framework for sending the maximum possible amount of data in each packet.

In my posting on 'Care and Feeding of Your z/VSE TCP/IP Stack' I talk about keeping the stack fed. You keep the stack fed by making sure that when it wants data to send, there is data in the transmit buffer. One way to do this is to use large application buffers (for example, 1MB in size). Fill a large buffer with data and issue a single socket send request to transfer the data. When you are using a HiperSockets network interface with a 64K transmit buffer and a 56K MTU, using a 64K buffer to send data into the stack is not as efficient as using a 1MB send buffer.

When you are using the BSI BSTTFTPC batch FTP application, you can tell BSTTFTPC to use large buffer socket send operations by specifying
// SETPARM MAXBUF=YES in the BSTTFTPC JCL.
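
For example, a hypothetical job skeleton might look like the one below. Only the SETPARM statement comes from this discussion; the job name, the EXEC operands and the FTP control statements are placeholders for whatever your existing BSTTFTPC job already uses:

// JOB FTPLARGE
// SETPARM MAXBUF=YES
// EXEC BSTTFTPC,SIZE=BSTTFTPC
   ...your existing BSTTFTPC control statements...
/*
/&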

Are there any other limitations on throughput? Maybe.

The IBM IJBOSA driver is used to access the HiperSockets network interface. Is IJBOSA a limitation to performance? In general, no. IJBOSA's design is good. However, understanding some of the workings of the IJBOSA routine can help you understand the overall HiperSockets performance picture.

The TCP/IP stack can send multiple packets into the IJBOSA routine in a single call. Currently the BSI TCP/IP stack supports sending up to 8 packets in a single call. Why was the number 8 chosen? Because IJBOSA provides for a default of 8 buffers (I know of no way to change the number of IJBOSA buffers used).

If the stack attempts to send packets to IJBOSA and all of its buffers are in use, the IJBOSA routine returns a code indicating that the HiperSockets device is busy. At this point the TCP/IP stack must wait and send the packets again. The BSI TCP/IP stack does this by simply dropping the packets and allowing normal TCP retransmission to resend them in 10ms to 20ms. This retransmission delay allows the buffer busy condition to clear. You can see that increasing the number of IJBOSA buffers might help throughput by reducing retransmission delays. However, this is only true if the machine receiving the data does so in an efficient and timely fashion. If the receiving machine is busy for some reason, increasing the number of available IJBOSA buffers will not help transfer rates.

Because TCP throughput is limited by the size of the TCP Receive Window (64K), only 2 buffers are likely in use for each bulk data socket (a full 64K window spans at most two 56K frames). IJBOSA's 8 buffers therefore provide for 4 high-speed bulk data transfer sockets operating at the same time. I suspect that for most customers, 8 buffers is plenty.

Still, the question remains: how can I tell if HiperSockets busy conditions are a problem for me?

Well, first, look for this message on the z/VSE console ...

R1 0497 0S39I ERROR DURING OSA EXPRESS PROCESSING,REASON=0046 CUU=xxxx
This message is output by the IJBOSA routine the first time a busy condition is encountered. Remember, it is output only once; it lets you know that a busy condition occurred, but not how often.

Since the BSI TCP/IP stack tracks TCP retransmission activity, the TCP retransmit counter can be used as a proxy for the number of HiperSockets busy conditions. To display the TCP/IP stack statistics, you can use the IP LOGSTATS command.


For example:
- MSG BSTTINET,D=IP LOGSTATS
- Run your job that uses the HiperSockets network interface
- MSG BSTTINET,D=IP LOGSTATS
- MSG BSTTINET,D=SEGMENT * $$ LST CLASS=...

Look in the BSTTINET SYSLST log output and locate these messages ...

From the 1st IP LOGSTATS command ...
01-Oct-2010 10:33:15 F6 BSTT613I   TcpOutSegs:          0
01-Oct-2010 10:33:15 F6 BSTT613I   TcpRetransSegs:     0

From the 2nd IP LOGSTATS command ...
01-Oct-2010 10:33:15 F6 BSTT613I   TcpOutSegs:    1563218
01-Oct-2010 10:33:15 F6 BSTT613I   TcpRetransSegs:   4493

The TCP retransmission rate is about 0.29% (4493 retransmitted out of 1563218 segments sent). Any rate under 1% is probably acceptable. Having 4493 segments retransmitted results in a total delay of between about 45 and 90 seconds (4493 retransmissions at 10ms to 20ms each). This probably sounds like a lot of time, but to achieve this level of retransmission activity I had to run 4 batch FTP jobs concurrently, each running 18 minutes. Eliminating the retransmissions would reduce the run time of each batch FTP job by only about 15 to 20 seconds.




Well, there you have it, an optimized HiperSockets network interface.