Tuesday, October 28, 2008

CICS TS Performance

In the process of debugging a problem for a customer I found that CICS TS was doing this ...

. VTAM RECEIVE ANY is issued for a length of 256 bytes

. At some point later the receive is completed and the RPL EXIT scheduled
. RPL EXIT executes issuing a VTAM CHECK macro
. RPL EXIT processes the first 256 bytes of data
. RPL EXIT allocates a buffer for the remaining input bytes
. RPL EXIT issues a VTAM RECEIVE SPECific to read the remaining bytes
. RPL EXIT completes

. VTAM has data available to complete the RECEIVE SPECific
. RPL EXIT is scheduled
. RPL EXIT executes issuing a VTAM CHECK macro
. RPL EXIT processes the remaining data and posts CICS
. RPL EXIT issues a RESETSR to change the CS/CA mode back to Continue Any
. RPL EXIT completes

Thinking that this process was messy and involved a lot of CPU overhead, I discovered the RAPOOL and RAMAX SIT parameters. The RAPOOL parameter controls the number of RECEIVE ANY RPLs CICS has available. The default is 50 and is fine for most systems. The RAMAX value is the size of the RECEIVE ANY buffer. The default is 256 and is terrible. Since these buffers are allocated out of 31-bit storage, using a value of 8096 is much better. 8096 is a good size for a 3270 receive buffer: virtually any 3270 input will fit into it with a single RECEIVE request. And, after all, 50 x 8K is only about 400K of 31-bit buffer space.
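Expressed as CICS SIT overrides, the change is just the following two lines (shown in isolation; merge them with whatever overrides your system already uses):

```
RAPOOL=50,
RAMAX=8096
```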

After changing RAPOOL=50,RAMAX=8096 I looked into the performance again.

. VTAM RECEIVE ANY is issued for a length of 8096 bytes

. At some point later the receive is completed and the RPL EXIT scheduled
. RPL EXIT executes issuing a VTAM CHECK macro
. RPL EXIT processes all of the input data and posts CICS
. RPL EXIT completes

You can see the process is much shorter: fewer VTAM macros are used, and several passes through the dispatcher, along with the associated waits, are eliminated. I would estimate a 60% (or more) reduction in CPU usage for this process too.

For those of you still running CICS/VSE 2.3, try RAPOOL=10,RAMAX=8096 and you will still see the improvement.

Enjoy.

Regards,
Jeff

Tuesday, August 19, 2008

Simple Network Tuning for the Mainframe Sysprog

There have been some interesting postings on the VSE-L list this week about throughput to Windows-based PCs. So, I thought I would cover some basic network tuning points.

First, use Ethereal/Wireshark (a free, open source packet sniffer) to get a trace of the data transfer (e.g., FTP) you are interested in tuning. This trace will tell you a lot about the data transfer.

Now look at the size of the packets being transferred. In the old dial-up days of the Internet it was common to use packets of 576 bytes. In fact, all TCP/IP stacks are required to handle this size packet. However, now that Ethernet is pretty much the standard, look for 1500 byte packets. The 1500 bytes includes the 40 bytes of IP and TCP headers and 1460 bytes of data. I am ignoring the 18 byte Ethernet header in these numbers.

If the 2 hosts (mainframe and PC) have Gigabit Ethernet adapters AND they are connected to a Gigabit switch then you might expect to see jumbo Ethernet frames. Jumbo Ethernet frames are usually 9000 bytes.

In a bulk data transfer you will normally see a bunch of packets of the same size. If this size is not 1500 bytes then either the MTU size is set incorrectly or the MSS (Maximum Segment Size) is not correct. Usually it is the MSS that is incorrect. When 2 hosts establish a connection (socket), each host sends the segment size it wants to use. The smallest value wins! Windows PCs like to use a segment size of 536 for some types of applications (e.g., web servers). If the MSS is set to 536 bytes, it takes about 2.7 of these smaller packets to send as much data as can be contained in a single packet with an MSS of 1460 bytes.

The next item to look for is the TCP window size. This value should be something close to 64K minus one (65535). Windows always wants to use a TCP window size that is an exact multiple of the segment size. With an MSS of 1460 you can fit 44 segments in a 64K TCP window, so the actual window size is 1460 * 44 = 64240 (x'FAF0').
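The arithmetic above is easy to check. Here is a short Python sketch (the names are mine, not from any tuning tool) that works out the packet-count penalty of a 536 byte MSS and the largest MSS-aligned window that fits under 64K:

```python
MSS_SMALL = 536     # MSS some Windows applications negotiate
MSS_FULL = 1460     # MSS for a 1500 byte Ethernet MTU, minus 40 bytes of headers
MAX_WINDOW = 65535  # 64K minus one

# How many small packets does it take to carry one full-size packet's payload?
penalty = MSS_FULL / MSS_SMALL
print(f"{penalty:.1f} packets at MSS 536 per packet at MSS 1460")

# Largest window that is an exact multiple of the full-size MSS
segments = MAX_WINDOW // MSS_FULL
window = segments * MSS_FULL
print(f"{segments} segments, window = {window} (hex {window:#x})")
```

Running this confirms the numbers in the text: about 2.7 small packets per full-size packet, and a 44 segment window of 64240 (x'FAF0') bytes.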

These two values are set by registry entries and there are various scripts and utilities available around the net that will set these values for you.

Another source of throughput problems is TcpAckFrequency.

When Windows receives a packet, it will wait up to 200ms for another packet to arrive before sending an ACK. Why? Windows hopes it can send a single ACK for both packets. Why does this slow down a transfer? 200ms is a very long time at network speeds. Why would this come into play at all for an FTP? If a host is sending data and has sent one packet, but there is not enough space available in the TCP window to send the next packet, the sender will wait for an ACK of the data already sent. At the same time, Windows is waiting up to 200ms for another packet to come in. This does not have to happen very often for the 200ms delays to add up.
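To get a feel for how quickly these delays add up, here is a back-of-the-envelope Python sketch. The once-per-window stall is my worst-case assumption, not a measured figure:

```python
FILE_SIZE = 10 * 1024 * 1024  # a 10MB transfer
WINDOW = 64240                # the TCP window size worked out earlier
ACK_DELAY = 0.2               # the Windows delayed-ACK timer, in seconds

# Worst case: the sender fills the window, stalls, and waits out
# the delayed-ACK timer once per window's worth of data.
stalls = FILE_SIZE // WINDOW
extra = stalls * ACK_DELAY
print(f"{stalls} stalls -> about {extra:.1f} seconds of added delay")
```

Even if only a fraction of the windows actually stall, tens of seconds of potential delay on a 10MB transfer shows why this timer matters.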

Couple of good web pages to look at are ...
http://smallvoid.com/article/winnt-nagle-algorithm.html
http://support.microsoft.com/kb/Q328890

If you follow the instructions on these web pages you can add a TcpAckFrequency registry entry (a DWORD with a value of 1). This will disable the 200ms timer Windows uses for delayed ACK processing. I have seen disabling this feature result in a 10x improvement in throughput. Your mileage will vary!
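Following the KB article, the entry goes under the per-interface key for the adapter in question. A sketch of the registry change (the interface GUID below is a placeholder; use your adapter's actual GUID, and expect to reboot for the change to take effect):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{your-interface-GUID}]
"TcpAckFrequency"=dword:00000001
```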

The last two items to look at for data transfer throughput are your CPU and memory. If data transfer rates are important, get the fastest CPU you can afford and the maximum amount of memory your motherboard will support. If Windows has lots of memory, it will cache the data from the transfer in memory and not spend time writing it to disk until the transfer has finished.

Well, there you have basic network tuning for the mainframe system programmer.

Tuesday, October 9, 2007

Hipersockets in VSE/ESA and z/VSE

Hipersockets is very much a CPU function. In general, the faster the CPU, the faster the transfer.

Random thoughts ...

The processor and the processor level you have both make a difference.
Is your microcode current for the processor you have?

Are any of the LPARs being used CPU limited?
Hipersockets is a CPU (microcode) function. Limiting CPU on an LPAR limits throughput.

Hipersockets is a synchronous method of transferring data. Sending data does not complete until the data is received. This is a big factor. If the receiving system is busy ...

QDIO is asynchronous. This does affect throughput. Testing I have done shows that a QDIO OSA Express connection can be faster than a Hipersockets connection.

Is z/VM in the picture?
If so, what z/VM and is it current?
There are APARs affecting Hipersockets performance under z/VM.

VSE. What version of VSE/ESA or z/VSE?
There are APARs related to performance for VSE/ESA 2.7 (DY46197 comes to mind).
This may or may not apply to z/VSE.

Are the hosts using Hipersocket devices on a separate subnet?

Are all Hipersockets hosts using the same large MTU?
The maximum MTU size of a Hipersockets connection is determined in the IOCP configuration of the IQD CHPID through the OS parameter.

Maximum MTU:
OS = 00 : MTU = 8KB
OS = 40 : MTU = 16KB
OS = 80 : MTU = 32KB
OS = C0 : MTU = 56KB

Choose the OS parameter carefully. The full frame size you choose is used for every transfer, even if there is only 1K of data to transfer.

What is the PRTY of the FTP client/server partition?
Is it higher PRTY than the stack?
Are other batch jobs competing for CPU time?

What is the access method being used?
POWER is slowest, VSAM fastest.

What other jobs are running on the system? LPARs?
Other activity uses CPU that could be used by Hipersockets.

Thursday, October 4, 2007

Activating CWS on a z/VSE 3.1.1 System

This blog entry will show the steps I went through to activate CICS Web Support (CWS) on a base install z/VSE 3.1.1 system. The process is pretty simple, and here is what I did ...

First, I installed z/VSE 3.1.1 from the z/VSE 3.1.1 refresh tapes. Since this was done on a z/VM system, I created a new z/VSE 3.1.1 virtual machine with 2 3390-3's to handle this. After completing the z/VSE 3.1.1 install I did some basic customization and started the POWER RDR and LST devices. Now I can submit jobs from CMS.

I submitted a job from my CMS user to define a new sublibrary called PRD2.TCPIP. Then, I submitted the INSTTOOL.JOB TCP/IP-TOOLS installation job and released it to run. I installed TCP/IP-TOOLS into the new PRD2.TCPIP sublibrary.

To start up the TCP/IP stack, I did the usual things: defined the network interface, cataloged the BSTTPARM.A license member, etc., and brought up the BSTTINET TCP/IP stack. Once the stack was up and running (tested using the MSG BSTTINET,D=IP PING ... command), I brought up our BSTTFTPS FTP server.

Next I used the CMS FTP client to FTP into my new z/VSE 3.1.1 system. Once there, I FTP'ed the CICSICCF JCL from the POWER RDR queue to my CMS user. This is a simple way to get the currently running JCL.

At this point, I used xedit to edit the CICS JCL and added the * $$ JOB and * $$ EOJ statements, which POWER does not store in its data file.

Then I added the following JCL to the CICS job ...

1) // OPTION SYSPARM='00' Stack ID
2) Added PRD2.TCPIP first in the LIBDEF SEARCH chain
3) Added the BSTTWAIT JCL

// EXEC BSTTWAIT,SIZE=BSTTWAIT
/*

Note: BSTTWAIT sets the VSE JCL return code. RC=0 indicates the BSTTINET TCP/IP stack is up and available. RC=8 indicates the TCP/IP stack was not started within 10 minutes. If you want to be able to wait more than 10 minutes using BSTTWAIT, use VSE conditional JCL to re-execute the step if the return code is equal to 8.
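For example, here is a sketch of conditional JCL that gives the stack a second 10 minute window by running BSTTWAIT again on an RC=8 (the TCPUP label is my own invention; adjust to your JCL conventions):

```
// EXEC BSTTWAIT,SIZE=BSTTWAIT
/*
// IF $RC NE 8 THEN
// GOTO TCPUP
* STACK NOT UP YET - WAIT UP TO 10 MORE MINUTES
// EXEC BSTTWAIT,SIZE=BSTTWAIT
/*
/. TCPUP
```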

4) Add TCPIP=YES to the CICS SYSIPT overrides on both // EXEC DFHSIP steps

Now your CICS TS JCL is ready to go. However, there are a couple more things to do before you bring up CICS TS CWS for the first time.

Now logon to your current CICSICCF system to copy and activate the TCPIP CEDA entries.

From a CEDA display ...

COPY GROUP(DFH$SOT) TCPIPSERVICE(HTTPNSSL) TO(BSI$SOT)
ALTER GROUP(BSI$SOT) TCPIPSERVICE(HTTPNSSL)
Change the Portnumber if desired (I used 8080)

And finally
ADD GROUP(BSI$SOT) LIST(VSELIST) AFTER(TCPIP)

At this point we need to create an ASCII/EBCDIC conversion table. This JCL was taken directly from the IBM manual. You will need to modify it as necessary for your system.

* $$ JOB JNM=DFHCNV,CLASS=8,LDEST=(,BARNARD),PDEST=(,BARNARD)        
* $$ LST JSEP=0,CLASS=O
* $$ PUN JSEP=0,CLASS=O
// JOB DFHCNV
// LIBDEF *,CATALOG=PRD2.CONFIG
// LIBDEF SOURCE,SEARCH=(PRD1.BASE,PRD1.MACLIB)
// OPTION CATAL,LIST
// EXEC ASMA90,SIZE=(ASMA90,64K),PARM='EXIT(LIBEXIT(EDECKXIT)),SIZE(MAXC
-200K,ABOVE)'
DFHCNV TYPE=INITIAL
DFHCNV TYPE=ENTRY,RTYPE=PC,RNAME=DFHWBHH,USREXIT=NO, X
SRVERCP=037,CLINTCP=437
DFHCNV TYPE=SELECT,OPTION=DEFAULT
DFHCNV TYPE=FIELD,OFFSET=0,DATATYP=CHARACTER,DATALEN=32767, X
LAST=YES
DFHCNV TYPE=ENTRY,RTYPE=PC,RNAME=DFHWBUD,USREXIT=NO, X
SRVERCP=037,CLINTCP=437
DFHCNV TYPE=SELECT,OPTION=DEFAULT
DFHCNV TYPE=FIELD,OFFSET=0,DATATYP=CHARACTER,DATALEN=32767, X
LAST=YES
DFHCNV TYPE=FINAL
END
/*
// IF $MRC GT 4 THEN
// GOTO NOLINK
// EXEC LNKEDT
/. NOLINK
/*
/&
* $$ EOJ


Now you are ready to shutdown your original CICSICCF partition and bring up your new CICSICCF JCL that will include CWS support.


To test the CWS interface, use this URL from your web browser ...
http://192.168.1.228:8080/CICS/CWBA/DFH$WB1A
Of course, you will have to change the IP address to reflect your system's IP address.

When I entered the above URL in my Firefox browser I received this response ...

DFH$WB1A on system DBDCCICS successfully invoked through CICS Web Support on CICS Transaction Server for VSE/ESA.

Enjoy!

Barnard Software, Inc. Blog

This is the first post to the BSI blog. I plan to post here from time to time with thoughts, hints and tips for using BSI products. I will also comment on VSE/ESA and z/VSE topics too.

As a first thought, I am going to install CICS TS Web Support (CWS) on a z/VSE 3.1.1 base system from scratch and write up the steps I go through to do it.

I welcome comments and suggestions on the subjects I write about and ideas for future subjects.