Friday, December 6, 2013

z/VSE GbE Throughput Rates

Your new z box just arrived, was installed, and now you are up and running on that amazing new machine.

Wait! Now we have OSA Express Gigabit Ethernet (GbE) adapters connected to our GbE switch! Let's see how fast the new network will run!

And then disappointment. Sending data to the FTP server on your network isn't much, if any, faster than it was before. What do I do?

Before you start, remember the first rule of GbE networking: everyone does GbE or no one does. Wait, what does that mean? It means that a GbE rollout requires effort and planning. Even if you have a new GbE network adapter and a GbE switch, when you are connected to someone who does not have a GbE network adapter, you must talk at their level (10Mb or 100Mb).

What do I need to think about?
Switches, Routers, cables and network adapters (just to start).

OK, what else? What else affects throughput on a GbE network?

TCP Window Size

The TCP receive window size (rwin), advertised by the receiving host, is the amount of data that can be sent to that host before the sending TCP/IP stack must stop transmitting and wait for an acknowledgement. Believe it or not, it is common in the Windows PC world to use 8K or 16K as a receive window size. Normal window sizes range up to 64K. The solution to this issue is to use TCP window scaling. z/OS, z/VM, Linux, Windows and IPv6/VSE on z/VSE all support this. However, a Windows PC will use TCP window scaling only if the remote host requests it and the application has requested it (through the SO_RCVBUF size on the setsockopt() call). On other platforms, including Linux, using TCP window scaling is automatic.
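As a rough illustration of the SO_RCVBUF request mentioned above, here is a minimal Python sketch. The function name and the 256KB figure are only examples, not something from a particular product. The buffer size is set before the connection is made because the window scale factor is negotiated when the connection is established. The operating system is free to adjust or cap the value, so the sketch reads it back.

```python
import socket

def open_socket_with_rcvbuf(rcvbuf_bytes):
    """Create a TCP socket and ask for a larger receive buffer.

    Hypothetical helper for illustration. The request is made before
    connect() so the stack can take it into account when it negotiates
    window scaling during connection setup.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return s

requested = 256 * 1024  # example size only
sock = open_socket_with_rcvbuf(requested)
# The OS may round, double or cap the request, so check what was granted.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested:", requested, "granted:", granted)
sock.close()
```

If the granted value is much smaller than the requested one, a system-wide limit is probably capping the buffer.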

The benefit? With TCP window scaling window sizes can be much larger. 256KB to 8MB are commonly used. This allows far more data to be transmitted before the transmitting host must stop and wait for an ACK from the remote host.
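To see why the window size matters so much, remember that a sender can have at most one window of data outstanding per round trip. The sketch below works out that ceiling. The 1ms round trip time is an assumed value for illustration only, so substitute your own ping timings.

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000

ASSUMED_RTT = 0.001  # 1ms round trip, an assumption for this example

for window in (16 * 1024, 64 * 1024, 256 * 1024):
    limit = max_throughput_mbps(window, ASSUMED_RTT)
    print(f"{window // 1024}K window: at most {limit:.0f} Mb/s")
```

With these assumed numbers a 16K window tops out at about 131 Mb/s and a 64K window at about 524 Mb/s. Only the 256K window is large enough that the GbE link itself becomes the limit.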

TCP Packet Size

Typical Ethernet packets are 1500 bytes in length. Sending 1GB of data takes about 685,000 packets (each packet carries roughly 1460 bytes of data once the TCP/IP headers are subtracted). Can't we send more data in every Ethernet frame? Yes, GbE network interfaces support Jumbo Ethernet Frames. Jumbo Ethernet Frames can be up to 9000 bytes in length. That is the official story. The truth is some GbE adapters only support 'Baby' Jumbo Frames. Baby frames are up to 7000 bytes in length. More truth? Lots of adapters support Jumbo Frames but of smaller sizes. For example, I have an HP laptop with an R8168 GbE adapter in it. It turns out the R8168 only supports Jumbo Frames of up to 4096 bytes in size. You also have to check the switches used by the GbE adapters. They too must support Jumbo Frames.

Needless to say, you must check your adapters, every one of them, before attempting to use Jumbo Ethernet Frames.

So, why go through all that work? Well, sending 9000-byte frames means that you will send 1/6th as many frames (or about 83% fewer packets). The overhead of handling a 9000-byte frame is only slightly more than that of handling a 1500-byte frame. More data for about the same overhead means higher throughput. And, since 1/6th as many packets are used to send the same amount of data, less CPU is used to handle the transfer.
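The frame counts are easy to check. This sketch assumes 40 bytes of IPv4 and TCP headers per packet with no options, which is a simplification. It then counts how many packets it takes to move 1GB (taken here as 1,000,000,000 bytes) at each frame size.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options

def packets_needed(total_bytes, mtu):
    """Number of packets needed to carry total_bytes of data at a given MTU."""
    payload = mtu - HEADER_BYTES
    return (total_bytes + payload - 1) // payload  # integer ceiling

ONE_GB = 1_000_000_000

standard = packets_needed(ONE_GB, 1500)
jumbo = packets_needed(ONE_GB, 9000)
print("1500-byte frames:", standard)
print("9000-byte frames:", jumbo)
print("reduction: {:.0%}".format(1 - jumbo / standard))
```

Under these assumptions the count drops from 684,932 packets to 111,608, which is roughly the 1/6th figure quoted above.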

What about my older network adapters? Do I have to change them all? No. Absolutely not. When a host using Jumbo Ethernet Frames establishes a connection to a host that supports only 1500-byte frames (standard frames), smallest wins and the connection will use 1500-byte frames.

Latency

Latency is a big deal. If you are sending packets through a switch that adds 2ms of latency to each packet, you have a big problem. You are always at the mercy of the slowest hop. Large companies often daisy chain switches for several levels. Each link in the daisy chain adds to the latency involved in sending a packet from one host to another.

In general this is not a problem, but for applications that require bulk data transfer (FTP servers, database servers, etc.) it is best to ensure they are all connected to the same high-speed switch.

TCP Retransmission

TCP re-transmits always slow a transfer. TCP re-transmits at GbE speeds are horrible. Historically network folks would say that as long as you have less than 3% re-transmission you are OK. Not true at GbE speeds.

1Gb of data is 125,000,000 bytes or 83,333 packets of 1500 bytes each. To transfer that much data in 1 second you must send a packet every 12us (microseconds). Since re-transmission is timer based, and the timer is usually in the 2ms to 20ms range, even 2ms is a very long time at GbE speeds.

If your re-transmission rate is 0.1%, which sounds good, you are re-transmitting 83 packets every second. At 2ms per re-transmit, that is about 166ms out of every second, so you are spending 15-20% of your transfer time waiting on timers instead of transmitting. That alone will reduce your throughput dramatically.
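The arithmetic above can be reproduced with a few lines of Python. This is a deliberately simple model: it assumes every re-transmit stalls the transfer for one full timer period, which is the same simplification used in the text. The function names are just for this example.

```python
LINK_BYTES_PER_SECOND = 125_000_000  # 1Gb per second expressed in bytes
PACKET_BYTES = 1500

def packets_per_second():
    """Packets needed each second to keep a GbE link full."""
    return LINK_BYTES_PER_SECOND / PACKET_BYTES

def stall_fraction(retransmit_rate, timer_seconds):
    """Fraction of each second spent waiting on re-transmit timers.

    Simplified model: each re-transmit costs one full timer period.
    """
    retransmits_per_second = packets_per_second() * retransmit_rate
    return retransmits_per_second * timer_seconds

print("packets per second: {:.0f}".format(packets_per_second()))
print("time per packet: {:.0f}us".format(1_000_000 / packets_per_second()))
print("stalled at 0.1% with a 2ms timer: {:.1%}".format(stall_fraction(0.001, 0.002)))
```

Try plugging in the old 3% rule of thumb. In this model the stall time comes out to several times the length of the transfer itself, which is why that rule does not hold at GbE speeds.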

Fortunately, re-transmission on a local subnet should be almost non-existent. Check your network statistics. If you are seeing any re-transmission at all on a local subnet, I would suggest looking into the issue. I have seen cases where a failing network adapter was causing a number of problems with network throughput. Dropped packets and re-transmission resulting in throughput issues were the only real indicators of a problem.

Ethtool, netstat and Wireshark will be your friends in helping to locate possible problems in this area.

CPU Speed and Availability

Until fairly recently there was a one-to-one relationship between a processor and an OS image (Windows, Linux, etc.) running on a PC. With the arrival of virtualization on PC hardware there are now hypervisors (Linux KVM, VMware, Xen, Hyper-V, etc.) running on the physical hardware and the OS is running inside a virtual machine. The virtual machine actually uses virtual hardware with a virtual CPU, memory, disk and network adapters.

It is important to remember that a virtual device, and more specifically, a virtual network device is basically a CPU function. The hardware is at some point being emulated by the hypervisor using the physical CPU. All of this emulation takes CPU away from the OS reducing the ability of the OS to get work done.

Often a hypervisor will be running several OS images, requiring the hypervisor to create a virtual network switch with virtual network adapters connected to it. All of this is actually a CPU function. So, emulating a virtual GbE network environment may take more CPU overhead than you might expect. Knowing this, don't be surprised to find that your virtual network adapter is defined as a 100Base-TX adapter instead of a GbE adapter. And this will affect your throughput.

Those of us in the mainframe world have been doing this for upwards of 40 years using z/VM, VSwitch, Guest LAN, etc., but this is all fairly new to the PC world.

Another CPU issue to take into consideration is other activity on the sending or receiving host. If a file transfer is sending data to a remote host and the file transfer is running at a lower priority than other applications on the host, there may be CPU availability issues. After all, while a file transfer application uses very little CPU itself, it is important that it have access to the CPU when it is needed to keep the flow of data moving. If the file transfer application wants to run but other, higher priority, applications are using the available CPU, then the flow of data will slow. The same concept applies to the receiving end of the data transfer too.

A Simple Network


+z box---------------------------+
|  z/VSE --- OSA Express GbE     | 
+--------------------------------+
                    |
                +--------------+
                |  GbE Switch  |
                +--------------+
                        |
                    +PC box--------------+
                    |  Windows           |
                    |  Linux             |
                    +--------------------+


What could go wrong?

It's a pretty simple picture. Two machines connected to a high-speed GbE switch. Here is another thing to think about: virtualization. The very simple network diagram above may have many more levels if the z box and the PC box are using virtualization. The z box may be running z/VM and the OSA Express GbE adapter may be the physical part of a VSwitch (virtual switch) being shared by many z/VSE, z/OS, zLinux and CMS images. At the other end of the diagram the same concept may be true of the PC box. It may be running a hypervisor with a virtual switch connecting many OS images (Windows, Linux, etc.) using virtual network interfaces.

All of this must be taken into consideration when determining potential network bottlenecks.

Questions to ask ...
Are all the devices (including virtual devices) used from beginning to end GbE? (Slowest wins)
What TCP window size is being used? (Smallest wins)
What MTU size is being used? (Smallest wins)
Is latency a problem? (Check your ping timings)
Is TCP re-transmission a problem? (Check your network statistics)
Is the application sending data getting the CPU it needs to achieve maximum throughput? (Check priorities)

Well, there you have it. A few items to think about if your GbE throughput isn't what you want or expect it to be.

