Sunday, March 20, 2016

CICS TS Workload Management Using OPTI-WORKLOAD

Workload Management

Recently we had a customer request that we update our OPTI-WORKLOAD product to support the latest versions of z/VSE and CICS TS. The updates for supporting the latest z/VSE version was fairly easy and the customer reports that the z/VSE batch workload management works very well on their systems. The customer has a very large number of batch jobs that run throughout the day (and night). Using OPTI-WORKLOAD's facilities allowed them to have multiple balance groups and to have each group balanced based on the resource usage of each job.

However, updating the CICS/VSE Workload Manager to support CICS TS proved to be more difficult. And, in the process, we learned a bit so we thought we would pass on the information.

Before diving into CICS TS Workload Management I will review some of the basic features of Barnard Software, Inc.'s OPTI-WORKLOAD product.

VSE/ESA and z/VSE Balancing

The priority of the partitions/dynamic classes can be set and changed by using the AR PRTY command. When a dynamic class contains more than one allocated dynamic partition, the partitions within the dynamic class are balanced (time sliced). The time slice value can be modified via the MSECS command.

Partition Balancing

The partition balancing routine inspects the CPU time for each partition of the balancing group and decides how to rearrange the priority of the partitions. VSE/ESA 2.1 (and newer) running the Turbo dispatcher offers a new balancing algorithm. If partition balancing is specified for static and dynamic classes (via equal signs in the PRTY command), static and dynamic partitions will receive the same time slice. Without the Turbo dispatcher a dynamic class (with all its partitions) got the same time slice as a static partition.

OPTI-WORKLOAD Workload Management


OPTI-WORKLOAD introduces the concept of performance groups to VSE/ESA 1.3 (or higher) systems. Performance groups are a group of static partitions and/or dynamic classes with similar performance goals. Once a performance group has been identified, OPTI-WORKLOAD monitors the velocity of work being done by each partition and class. At specified intervals the partition or class achieving the lowest velocity of work is given the highest dispatching priority of those partitions and classes in the performance group. This results in a more equitable allocation of resources and greater overall system throughput.

Velocity of Work

Velocity of work is calculated by monitoring the number of times the partition is actively using the CPU, actively waiting on I/O, delayed waiting on the CPU and delayed waiting on other resources. Velocity is expressed as the percentage of the time a partition has the resources that it needs available to it. Velocity is not a measure of the quality of the work being done, it is simply a measure of how much of the time resources needed by a partition are available to it.

CICS Workload Management


OPTI-WORKLOAD provides a special CICS Workload Management feature. The CICS Workload Manager uses Velocity calculations to manage CICS transaction priorities. Proper CICS Workload Management results in dramatic improvements in CICS response times and throughput.

The CICS Workload Manager uses Velocity calculations to increase the priority of CICS transactions with low velocity-of-work (or transactions ‘starving’ for resources) and to decrease the priority of CICS transactions with high velocity-of-work (or transactions monopolizing resources).

The CICS Workload Manager defines a transaction as the work done between sync point requests and
measures the amount of work done by counting the number of KCP requests made by the transaction. On CICS systems where applications do not perform sync point requests for themselves, CICS will automatically issue a sync point request at end of task or when the transaction enters a terminal wait. If a transaction uses more than the specified number of work units (the CICSINT threshold value), it is moved down by one priority increment. At the same time, if a transaction is delayed by a higher priority transaction using all available resources, the delayed transaction is moved up by one priority increment. As transactions enter the CICS system their initial priority defines the its starting position in the CICS Workload Managers management scheme.

The CICSINT threshold value is normally set to a value about 50% higher than the number of work units used by an average transaction. In this way most CICS transactions enter and leave the system with little, if any, management required. However, when longer running transactions enter the system, the priority of these transactions will float downward while the priorities of transactions competing for resources will float upward. By managing the priorities of the various CICS transaction active in a system, the CICS Workload Manager maximizes throughput resulting in reduced and more consistent response times.

CICS Workload Management Thresholds


Five OPTI-WORKLOAD THRESHOLD commands are used to customize the CICS Workload
Management feature. The CICS Workload Manager extracts these threshold values from the OPTI-
WORKLOAD partition. Therefore, the OPTI-WORKLOAD partition should be active before the CICS Workload Management feature is initialized. If the OPTI-WORKLOAD partition is not active initialization will complete and the CICS Workload Manager will enter ‘quiet’ or ‘inactive’ mode. The CICS Workload Manager will synchronize with the OPTI-WORKLOAD partition when it becomes active.

The THRESHOLD CICSINT nnnn specifies the CICS Workload interval. The default value is 100 work units. Work units are defined as the work done between DFHKCP requests.

The THRESHOLD PRTYMIN nnn specifies the minimum transaction priority to be managed by the CICS Workload Management feature. Any transaction with a priority less than the minimum will not be managed. The default value is 2.

The THRESHOLD PRTYMAX nnn specifies the maximum transaction priority to be managed by the CICS Workload Management feature. Any transaction with a priority greater than the maximum will not be managed. The default value is 4.

The THRESHOLD RUNAWAY nnn specifies the Workload Runaway Task multiplier. Any transaction using more than the RUNAWAY * CICSINT work units is considered a Workload Runaway Task. For example, if the CICSINT is 250 work units and the RUNAWAY is 100, any transaction using more than 25000 work units would be a Workload Runaway Task. The default is 100.

The THRESHOLD TRIVIAL nnn specifies the trivial transaction work unit limit. Any transaction using less work units than the specification is considered trivial. Trivial transaction are excluded from the non-trivial transaction statistics displayed in the CICS Workload Manager termination statistics. The default is zero.

CICS TS Workload Management

To begin, the CICS TS Performance Guide describes CICS TS transaction priorities like this ...

The overall priority is determined by summing the priorities in all three definitions for any given task, with the maximum priority being 255.

Priority = Terminal priority + Operation priority + Transaction Priority

The value of the system initialization parameter, PRTYAGE also influences
the dispatching order, for example, PRTYAGE=1000 causes the task's
priority to increase by 1 every 1000ms it spends on the ready queue.

This is a nice description and basically incorrect.

With the help of Eugene (Gene) Hudders, a CICS TS internals expert and author the C/Trek CICS TS performance monitor, I found out basically how the CICS TS dispatcher actually works.

The one-byte task priority is initially calculated as described above. When PRTYAGE=0 (disabled) the 1-byte task priority is converted to an 8-byte priority field with the 1-byte priority in the low order byte. With PRTYAGE=nnnn the 1-byte priority is converted in some way and combined with other information (E.g., transaction arrival time, etc) and stored as an 8-byte priority field.

This 8-byte priority field becomes the value used to control dispatching of the transaction for the transactions remaining lifetime.

The CICS TS dispatcher keeps 3 queues of tasks within a CICS TS image. The 1st queue is the Ready queue (also called the Private queue). This is a queue of ready to run tasks in priority sequence. The 2nd queue is the Dispatchable queue (also called the Public queue). This is a queue of dispatchable tasks ordered by queue arrival time. The 3rd queue is the overall task queue. Basically this queue is all tasks that are in the system.

The CICS TS dispatcher will dispatch tasks from the Ready queue in priority sequence until the queue is empty and "from time to time" (IBM's term) it will add tasks from the Dispatchable queue as it looks for more work. When a task on the waiting/suspend queue is ready to run, it is added to the Dispatchable queue. And the dispatch process continues ...

Now, this is my own interpretation of how the dispatcher works and some might argue that the 3rd queue is not really a queue at all or that the 2nd queue (Public) is not used on z/VSE (since it has only one QR TCB) but I think it is pretty close to the process the dispatcher uses.

One more note about the 8-byte transaction priority field, this value can, at times, be modified by CICS TS to ensure quick or delayed dispatching. For example, when a task is first attached (started), its priority is set to a 'very high' value to ensure the initial dispatch of the task is done quickly. This ensures the attach process, including the calls to any Task Related User Exit (Start of Task TRUE) is completed very quickly. After this the task priority is returned to normal. At other times, the dispatcher may add a delay to the dispatch time. For example, if available storage becomes less than certain values, the dispatcher may add a task to the Dispatchable queue but add some number of milli-seconds to its arrival time (remember the Dispatchable queue is ordered by arrival time).  This effectively delays dispatching.

Are there problems with this design?

Overall the design is quite good. Certainly far better than the old CICS/VSE dispatcher. But there are a couple of issues ...

One issue with CICS TS PRTYAGE is that is only increases the priority of a task. To be effective a workload manager must also understand that heavy resource usage tasks (that should be batch applications) do exist and when running must have their priority reduced to allow other light resource tasks a chance to run. 

Also you need to use a very low setting for the PRTYAGE parameter because if you use the default setting of 32768 (32.768 seconds) then the task's priority would not be increased for 32.768 seconds. If a task had to wait to be dispatched for 32 seconds, you have a bigger problem than the PRTYAGE parameter is going to help. In fact, in z/OS this parameter's default value has been lowered from 32768 to 1000.  I would go further and suggest a value between 250 and 500.

Remember, as the CICS TS dispatcher ages a transaction the priority of the transaction increases until the next successful dispatch of the task. At that point the dispatcher resets the priority back to normal. The affect on the priority by PRTYAGE is temporary. While this makes sense (at least to me) it does show that PRTYAGE has little affect on dispatching unless the CICS TS system is very busy and transactions are backed up waiting to run.


OPTI-WORKLOAD to the Rescue

The CICS Workload Manager in OPTI-WORKLOAD will both decrease and increase the priority of a CICS transaction based, not on time, but on the amount of resources used (work done) by the transaction. These changes in the transaction's priority are permanent until or unless the CICS Workload Manager decides to change the priority again.

The CICS Workload Manager can manage the workload with or without PRTYAGE active. 

When using the CICS Workload Manager you specify the number of work units that are considered normal. This is the CICSINT value. If a transaction uses less than CICSINT work units nothing is done to its priority. If the transaction does exceed CICSINT work units and its initial priority falls in the management range (PRTYMIN to PRTYMAX) then the transaction will have its priority reduced by one. Over time a long running or heavy resource usage transaction will see its priority gradually lowered until it reaches the PRTYMIN value. At the same time, transactions that are waiting to run but unable to run because high resource usage transactions will see their priority gradually increased until it reaches the PRTYMAX value. The net result is transactions within CICS will have their priority managed up or down based on resource usage achieving maximum throughput. 

With the CICS Workload Manager active long running (batch like) transactions no longer block or 'lock out' normal or short running transactions.

Defining Transaction Priorities

Priority              Comment
_____________________________________________________________
0                     Lowest
1
...
...
PRTYMIN            \  Minimum managed by the Workload Manager
...                 \
...                  |- Range managed by the Workload Manager
...                 /
PRTYMAX            /  Maximum managed by the Workload Manager
...
...
254
255                   Highest 

There are many ways to define the initial priority of transaction within CICS. Some basic recommendations are ...

#1, Avoid 0 and 255. These priorities are the lowest and highest. And, often used by CICS housekeeping and utility transactions.

#2, Long running, high resource usage transactions should be defined with a low priority.

#3, Short running, low resource usage transactions should be defined with a high priority.

#4, Mixed usage transactions are difficult to manage without help.

Sometimes obeying these rules is easy. For example, a bank might have a Savings Balance (SBAL) transaction that simply reads a single customer record and displays some information about the customer's account. This is a transaction that should have a high priority. On the other hand, when the branch manager is balancing all the tellers with a TBAL transaction that reads the entire teller file, you have a transaction the needs a low priority.

Sometimes obeying the rules is not so easy. Perhaps you have a 'master' MENU transaction that invokes many functions and may even transfer control to many other transactions. Yet all of these transaction run with the priority of the 'master' MENU transaction. Another example might be a TCP/IP socket transaction that listens for connections. When a connection is made the socket is given to a child transaction for processing. Yet the child transaction might have the same transaction ID as the listener transaction. With these transactions, some are low resource usage transactions and some are high resource usage transactions. Proper management of their priorities is very difficult unless you have a CICS Workload Manager.

When using the OPTI-WORKLOAD CICS Workload Manager the whole process becomes quite simple. For example, define your CICSINT typical transaction work units, PRTYMIN (perhaps 50) and PRTYMAX (perhaps 150).

Rule #2, These transactions get a priority between 1-49
Rule #3, These transactions get a priority between 151-254
Rule #4, These transactions get a priority of 100 and are managed.

With proper CICS Workload Management CICS transactions response times are reduced, stabilized and far more consistent. 

Well there you have it. Enjoy.

Jeff Barnard
Barnard Software, Inc.










Thursday, January 21, 2016

Comparing SSL/TLS Facilities for z/VSE

Comparing SSL/TLS Facilities for z/VSE


We have recently received a number of questions about the SSL/TLS functionality available for IPv6/VSE. These questions are generally about the differences between IPv6/VSE SSL/TLS functionality and the SSL/TLS functionality available in the SSL for VSE feature of TCP/IP for VSE.

The question of SSL/TLS support is far more complex that just having an SSL/TLS client that is able to connect to z/VSE.

Most of the information presented here is also in the IBM publication z/VSE 6.1 TCP/IP Support SC34-2706-00 which provides information about SSL/TLS features available for both IPv6/VSE and TCP/IP for VSE. However, I will try to layout the information in a way that will allow you to easily compare the types of available SSL/TLS support.

For reference, when we talk about SSL/TLS sockets we are referring to a secure and trusted connection. The word secure means the connection is encrypted. The word trusted means the connection has been authenticated.

Authentication itself means the certificate presented to the SSL/TLS client by the server has been checked against the Certificate Authority's certificate to verify correctness, expiration, etc. Optionally, the SSL/TLS client can present a certificate to the SSL/TLS server allowing the server to authenticate the client too. This means that the SSL/TLS client and the SSL/TLS server are known to each other and have verified they are authorized to establish a connection.

The article will also refer to "Control Program Assist for Cryptographic Function" (CPACF). This is a feature of System z machines. The feature is disabled by default. After ordering and installing a System z machine, you much contact IBM to have this feature enabled. The feature provides hardware micro/millicode assist instructions to dramatically improve the performance of encryption and message digest/hash functions. CPACF features discussed in this article require using a z10 or newer machine.

Certification


To begin, lets look at the support itself. IPv6/VSE uses the IBM z/VSE port of the Open Source package OpenSSL. The OpenSSL package used by IBM has US government FIPS 142-2 certification and IBM itself updates the OpenSSL port with security patches and makes these updates available to customers in a timely fashion. US government certification of OpenSSL is a time consuming and expensive process where every part of the OpenSSL code is tested for correctness.

The SSL for VSE feature of TCP/IP for VSE was developed by Connectivity Systems, Inc. (CSI) and is exclusive to TCP/IP for VSE. I could find no information about the certification status of SSL for VSE.

A certified SSL/TLS solution is important because the certification process tests to make sure the SSL/TLS connections are made correctly and that no information is leaked in the process. Certification is far more that just being able to establish a connection or having someone tell you "Hey! I got connected!".

SSL/TLS Connection Types

IPv6/VSE supports SSLv3, TLSv1 and TLSv1.2 using RSA and Diffie-Hellmann key exchange.
SSL for VSE supports SSLv3 and TLSv1 using RSA key exchange.

While SSLv3 is supported, it should not be used. SSLv3 is completely broken and is little, if any, better than using plain text socket connections. The IBM OpenSSL port plans to discontinue SSLv3 support in the near future.

While TLSv1 is secure, IPv6/VSE supports TLSv1.2 connections. TLSv1.2 is the current TLS standard and is more secure than TLSv1.

While both SSL/TLS solutions support RSA keys, IPv6/VSE supports RSA keys sizes up to 4096 bits in length in both software and hardware.

While the SSL for VSE feature of TCP/IP for VSE Version 2.1 supports these RSA key sizes too, RSA keys sizes of 2048 and 4096 bits require Crypto Express2 (or later) adapter hardware. The SSL for VSE feature of TCP/IP for VSE Version 1.4/1.5 does not support RSA key sizes greater than 1024 bits.

Since RSA keys sizes less than 2048 bits are now deprecated and insecure, the SSL for VSE feature of TCP/IP for VSE Version 2.1 and a Cyrpto Express2 (or later) adapter hardware is required for secure sockets using SSL for VSE.

IPv6/VSE also supports using the more secure Diffie-Hellmann (DH) key exchange. DH key exchange is different method of managing session keys and is more secure than RSA key exchange. SSL for VSE does not support this type of key exchange. 

Why is DH Important?

The following diagram shows an SSL session key exchange using RSA.

The weak point is that the secret session is part of the SSL session data. The key is sent from the client to the server, encrypted via the public server RSA key. If the private server RSA key is compromised, stolen, or broken, the session key is no longer secure and it is possible to decrypt and read the complete session data. 

Another method of exchanging the session key, is by using Diffie-Hellman. Using Diffie-Hellman, the session key is never sent over the network and is therefore never part of the network session data. The following example shows how the session key is negotiated using DH. Therefore, the common term is not “exchanging a session key”, but rather “agreeing on a common session key” through the DH key agreement process.


The important point is that the encryption key is created independently on both sides. Therefore, it is not possible to reveal the session key from a given recorded network session later. For simplification, the example does not show how the two communication partners are authenticated. In practice, DH is mostly used together with certificate authentication by using RSA.

DH key exchange is the most secure method currently available and is fully supported by IPv6/VSE. Again, SSL for VSE does not support DH key exchange.

Encryption Methods

Supported by SSL for VSE ...
01 SSL_RSA_WITH_NULL_MD5
02 SSL_RSA_WITH_NULL_SHA 
08 SSL_RSA_EXPORT_WITH_DES40_CBC_SHA 
09 SSL_RSA_WITH_DES_CBC_SHA 
0A SSL_RSA_WITH_3DES_EDE_CBC_SHA 
2F TLS_RSA_WITH_AES_128_CBC_SHA 
35 TLS_RSA_WITH_AES_256_CBC_SHA 

Of these encryption methods 01, 02, 08 and 09 should not be used because they are insecure.

Supported by IPv6/VSE ...
0A DES-CBC3-SHA 
2F AES128-SHA 
35 AES256-SHA 
3C AES128-SHA256 
3D AES256-SHA256 
16 EDH-RSA-DES-CBC3-SHA
33 DHE-RSA-AES128-SHA
39 DHE-RSA-AES256-SHA 
67 DHE-RSA-AES128-SHA256
6B DHE-RSA-AES256-SHA256 
C012 ECDHE-RSA-DES-CBC3-SHA
C013 ECDHE-RSA-AES128-SHA
C014 ECDHE-RSA-AES256-SHA
C027 ECDHE-RSA-AES128-SHA256

Note: The z/VSE 6.1 TCP/IP Support manual states the IPv6/VSE supports method 09 and C011 but these algorithms are insecure and no longer supported.

IPv6/VSE supports far more encryption methods using both software and hardware.

Message Digest/Hash Algorithms

Each message (packet) transferred using SSL/TLS includes a message digest or hash value. This value is used to verify the data has not been altered. A Message Digest is also used to secure certificates and ensure the certificate has not been altered. Therefore the algorithm used to calculate this value is very important.

The message digest MD5 is completely broken/insecure and should no longer be used.
The SHA1 family of message digests can be used for secure sockets but should not be used to authenticate certificates. The difference is how long the hash value is used. For secure sockets the hash value is used only for the duration of an SSL/TLS transfer. For certificates the value is used for a long time, perhaps years. This gives hackers far to much time to hack/break the digest key.
The SHA2 family of message digests are current and secure.
The SHA3 family of message digests are also secure and the industry is moving to SHA3.

IPv6/VSE supports using SHA1 and SHA2. It is planned to add SHA3 in the near future. These values are supported using either software or hardware (CPACF).
SSL for VSE supports using SHA1 and with the introduction of TCP/IP for VSE 2.1 under z/VSE 6.1 SHA2 was added. However, the CPACF hardware on a z10 (or newer machine) must be available and enabled for SHA2 to be used.

The industry is now using SHA2 and moving to SHA3 to secure certificates and communications.

Sample Connection

Google Chrome was used to create this blog entry. I used the information available from Chrome to display the type of SSL/TLS connection.


So, Chrome created a TLSv1.2 secure socket using ECDHE_ECDSA. This is an Elliptic Curve cryptographic session created using DH key exchange. After the key exchange process completed a 128-bit AES key is being used to encrypt each message. IPv6/VSE fully supports this type of TLS session while SSL for VSE does not.

Viewing the certificate information shows that the certificate's fingerprint or message digest hash uses both SHA-1 and SHA-2 (SHA-256). This allows both older (less secure) and newer (more secure) clients and servers to authenticate the certificate.



Summary

Maybe the real question is simple.

Can I get a secure and trusted socket using IPv6/VSE? Yes, easily. Many options exist to provide secure communications using both software and hardware.

Can I get a secure and trusted socket using the SSL for VSE feature of TCP/IP for VSE?

Maybe. Only if you ...
  1. Are using TCP/IP for VSE 2.1 and
  2. Running on a z10 (or newer) machine and
  3. Have CPACF  available and enabled and
  4. Have a Crypto Express2 (or later) adapter

Basically, if you need truly secure and trusted socket support I strongly recommend you move to IPv6/VSE.



Credits

IBM publication z/VSE 6.1 TCP/IP Support SC34-2706-00

The diagrams in the "Why is DH Important?" section and the text around them was lifted directly from the IBM manual with minor edits to the text by me. Actual credit here is likely to be to Joerg Schmidbauer at the IBM z/VSE lab in Germany. Joerg is the person who did the actual port of OpenSSL to z/VSE. Very impressive stuff if you ask me.