Tuesday, October 28, 2008

CICS TS Performance

In the process of debugging a problem for a customer, I found that CICS TS was doing this ...

. VTAM RECEIVE ANY is issued for a length of 256 bytes

. At some point later the receive is completed and the RPL EXIT is scheduled
. RPL EXIT executes, issuing a VTAM CHECK macro
. RPL EXIT processes the 1st 256 bytes of data
. RPL EXIT allocates a buffer for the remaining input bytes
. RPL EXIT issues a VTAM RECEIVE SPECific to read the remaining bytes.
. RPL EXIT completes

. VTAM has data available to complete the RECEIVE SPECific
. RPL EXIT is scheduled
. RPL EXIT executes, issuing a VTAM CHECK macro
. RPL EXIT processes the remaining data and posts CICS
. RPL EXIT issues a RESETSR to change the CS/CA mode back to Continue Any
. RPL EXIT completes

Thinking that this process was messy and carried a lot of CPU overhead, I dug into the RAPOOL and RAMAX SIT parameters. The RAPOOL parameter controls the number of RECEIVE ANY RPLs CICS has available. The default is 50 and is fine for most systems. The RAMAX value is the size of the RECEIVE ANY buffer. The default is 256 and is terrible. Since these buffers are allocated out of 31-bit storage, a value of 8096 is much better. 8096 is a good size for a 3270 receive buffer: virtually any 3270 input will fit into it with a single RECEIVE request. And, after all, 50 x 8K is only about 400K of 31-bit buffer space.
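
Just to put numbers on that, here is a little Python sketch of the arithmetic (purely illustrative; the RAPOOL and RAMAX values are the ones discussed above):

    # Illustrative arithmetic only: the RECEIVE ANY pool takes roughly
    # RAPOOL buffers of RAMAX bytes each out of 31-bit storage.
    def ra_pool_storage(rapool, ramax):
        return rapool * ramax   # bytes

    for rapool, ramax in [(50, 256), (50, 8096)]:
        kb = ra_pool_storage(rapool, ramax) / 1024
        print(f"RAPOOL={rapool},RAMAX={ramax} -> about {kb:.1f}K of 31-bit buffer space")

Even at the larger RAMAX the whole pool stays well under half a megabyte.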

After changing to RAPOOL=50,RAMAX=8096, I looked at the performance again.

. VTAM RECEIVE ANY is issued for a length of 8096 bytes

. At some point later the receive is completed and the RPL EXIT is scheduled
. RPL EXIT executes, issuing a VTAM CHECK macro
. RPL EXIT processes all of the input data and posts CICS
. RPL EXIT completes

You can see the process is much shorter: fewer VTAM macros are issued, and several passes through the dispatcher and several waits are eliminated. I would estimate a 60% reduction (or more) in CPU usage for this process as well.

For those of you still running CICS/VSE 2.3, try RAPOOL=10,RAMAX=8096 and you will still see the improvement.

Enjoy.

Regards,
Jeff

Tuesday, August 19, 2008

Simple Network Tuning for the Mainframe Sysprog

There have been some interesting postings on the VSE-L list this week about throughput to Windows-based PCs. So, I thought I would cover some basic network tuning points.

First, use Ethereal/Wireshark (a free, open-source packet sniffer) to get a trace of the data transfer (e.g., an FTP) you are interested in tuning. This trace will tell you a lot about the data transfer.

Now look at the size of the packets being transferred. In the old days of the Internet (dial-up days) it was common to use packets of 576 bytes. In fact, all TCP/IP stacks are required to handle this size packet. However, now that Ethernet is pretty much the standard, look for 1500-byte packets. Those 1500 bytes include the 40 bytes of IP and TCP headers and 1460 bytes of data. I am ignoring the 18-byte Ethernet header in these numbers.

If the two hosts (mainframe and PC) have Gigabit Ethernet adapters AND they are connected to a Gigabit switch, then you might expect to see jumbo Ethernet frames. Jumbo Ethernet frames are usually 9000 bytes.
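
A quick way to sanity-check the packet sizes you see in the trace: the usable segment size is just the MTU less the 40 bytes of IP and TCP headers. A tiny Python sketch (illustrative only):

    # Illustrative only: MSS = MTU minus 20-byte IP header minus 20-byte TCP header
    for mtu in (1500, 9000):
        print(f"MTU {mtu} -> MSS {mtu - 40}")
    # MTU 1500 -> MSS 1460; MTU 9000 -> MSS 8960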

In a bulk data transfer you will normally see a bunch of packets of the same size. If this size is not 1500 bytes then either the MTU size is set incorrectly or the MSS (Maximum Segment Size) is not correct. Usually it is the MSS that is incorrect. When two hosts establish a connection (socket), each host sends the segment size it wants to use. The smallest value wins! Windows PCs like to use a segment size of 536 for some types of applications (e.g., web servers). If the MSS is set to 536 bytes, it will take 2.7 of these smaller packets to send as much data as can be carried in a single packet with an MSS of 1460 bytes.
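
Here is that arithmetic spelled out in Python (illustrative only; the 10MB transfer size is just an example I picked):

    # Illustrative only: packets needed for a bulk transfer at each MSS.
    import math

    transfer_bytes = 10 * 1024 * 1024        # example: a 10MB file
    for mss in (536, 1460):
        packets = math.ceil(transfer_bytes / mss)
        print(f"MSS {mss}: {packets} packets")
    print(f"MSS 536 needs about {1460 / 536:.1f} times as many packets as MSS 1460")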

The next item to look for is the TCP window size. This value should be something close to 64K minus one (65535). Windows always wants to use a TCP window size that is an exact multiple of the segment size. With an MSS of 1460 you can get 44 segments into a 64K TCP window. The actual size is 1460 * 44 = 64240 (x'FAF0').
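
The same calculation in a couple of lines of Python (illustrative only):

    # Illustrative only: Windows picks the largest multiple of the MSS
    # that fits under 64K minus one.
    mss = 1460
    segments = 65535 // mss
    window = segments * mss
    print(f"{segments} segments x {mss} bytes = {window} (x'{window:X}')")
    # 44 segments x 1460 bytes = 64240 (x'FAF0')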

These two values are set by registry entries, and there are various scripts and utilities available around the net that will set them for you.

Another source of throughput problems is TcpAckFrequency.

When Windows receives a packet, it will wait up to 200ms for another packet to arrive before sending an ACK. Why? Windows hopes that it can send a single ACK for both packets. Why does this slow down a transfer? 200ms is a very long time at network speeds. Why would this come into play at all for an FTP? If a host is sending data and has sent one packet but there is not enough space available in the TCP window to send the next packet, the sender will wait for an ACK of the data already sent. At the same time, Windows is waiting up to 200ms for another packet to come in. This does not have to happen very often for these 200ms delays to add up.
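
To see how quickly those pauses accumulate, here is a rough, purely illustrative estimate in Python (the stall counts are made-up examples, not measurements):

    # Illustrative only: each delayed-ACK stall costs up to 200ms.
    delayed_ack_ms = 200
    for stalls in (10, 100, 1000):           # hypothetical number of stalls in a transfer
        added = stalls * delayed_ack_ms / 1000
        print(f"{stalls} stalls add about {added:.0f} seconds to the transfer")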

A couple of good web pages to look at are ...
http://smallvoid.com/article/winnt-nagle-algorithm.html
http://support.microsoft.com/kb/Q328890

If you follow the instructions on these web pages, you can add a DWORD registry entry named TcpAckFrequency with a value of 1. This disables the 200ms timer Windows uses for delayed ACK processing. I have seen disabling this feature result in a 10x improvement in throughput. Your mileage will vary!

The last two items to look at for data transfer throughput are your CPU and memory. If data transfer rates are important, get the fastest CPU you can afford and the maximum amount of memory your motherboard will support. If Windows has lots of memory, it will cache the data from the transfer in memory and not spend time writing it to disk until the transfer has finished.

Well, there you have it: basic network tuning for the mainframe system programmer.