Powered by the Techdirt Insight Community.

Common SAN Performance Issues

 

Submitted by Lukas Kubin on June 5th, 2008

Storage performance often becomes a subject of controversy between the storage reseller and the customer. There are always some false expectations, wrong setups and hot heads when it comes to performance. I decided to write about the sources of trouble I meet most often. I believe enumerating and discussing them might not only help storage beginners but, hopefully, also inspire storage vendors to improve existing features or develop new ones.

Issue 1 - NAS Protocols’ Single Client Limits

Many of us see this every now and then: with a shiny new array attached to a Windows server, the unlucky new owner benchmarks it by copying some ISO file between his laptop and a share on that server. When the copy passes 40 MBps we start celebrating, which makes the man wonder even more.

More seriously now. Don’t expect any common NAS client to get much above 40-50 MBps. File sharing protocols like SMB/CIFS (Windows sharing) or NFS (Unix file sharing) are designed to serve many clients in parallel. A single client cannot get the most out of them and therefore cannot demonstrate the SAN’s performance. No, even copying multiple files in parallel from a single client doesn’t help.

If you really need to demonstrate SAN performance through a NAS protocol, consider deploying more NAS clients, more laptops. Some 10 to 15 might suffice to fill the uplink from the array to the NAS appliance.
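To get a feel for how many clients such a test needs, here is a minimal sketch in Python. The per-client figure of 40 MBps comes from the text above; the uplink figures of 400 and 600 MBps are purely hypothetical placeholders, so substitute the usable bandwidth of your own link between the array and the NAS appliance.

```python
import math


def clients_needed(uplink_mbps: float, per_client_mbps: float) -> int:
    """Smallest number of clients whose combined throughput fills the uplink."""
    if uplink_mbps <= 0 or per_client_mbps <= 0:
        raise ValueError("both rates must be positive")
    return math.ceil(uplink_mbps / per_client_mbps)


# Per-client rate taken from the article; uplink rates are assumed examples.
PER_CLIENT_MBPS = 40
for uplink in (400, 600):
    n = clients_needed(uplink, PER_CLIENT_MBPS)
    print(f"{uplink} MBps uplink: about {n} clients at {PER_CLIENT_MBPS} MBps each")
```

The estimate assumes every client actually reaches its single-client ceiling and that the NAS appliance itself is not the bottleneck, which is worth verifying before blaming the array.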

Future?: Does the single-client performance of NAS protocols need improvement? I’m quite sure it does. File sizes are increasing, and users and even application backends need to transfer larger files between machines. The backend block-level storage infrastructure is speeding up too; we have 10 Gbps there now. The traditional protocols, however, are not keeping up with this trend. A few vendors like F5 or Ibrix are developing their own ways to get around these limits. Their coverage is minor, however, compared to how many users depend on CIFS. There is a lot of room for improvement in my opinion.

Issue 2 - Random vs. Sequential vs. Rotating Drives

Many people believe new 10 Gbps Ethernet or 8 Gbps FC infrastructure will help their applications run faster. Will it? To a large extent it won’t. How large an extent? Well, as large as the share of database-like applications they run. The core of the problem is a common misunderstanding: the expectation that all applications need big pipes to perform “fast”.

This brings me to the important point, which is to realize what traffic pattern each application generates: whether it is close to random (relational databases, mail systems, ERP applications) or sequential (file servers, streaming applications). As long as we are talking about conventional rotating drives, different drives suit different types of traffic. Perhaps everyone has heard that SAS is for databases and SATA for files. The difference is in the drive’s design, latency, seek algorithms and rotation speed. This set of features allows SAS (or SCSI, FC) drives to move their heads more often and reach more sectors across the drive to serve the database. Database applications are hungry for many small blocks residing in various places on the drive’s platters. That’s why SAS drives are able to feed them faster, but almost never at the speed of a sequential transfer.

Today, the simplest performance solution for sequential traffic is to use “big pipes”, while for random traffic, apart from choosing the right drive type, adding more drives is what gains more I/O operations per second.
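Why the number of drives matters can be illustrated with a common rule-of-thumb estimate, sketched here in Python: each random operation on a rotating drive costs roughly one average seek plus half a rotation. The seek times below (3.5 ms for a 15,000 rpm drive, 8.5 ms for a 7,200 rpm drive) are assumed illustrative values, not measurements, and the function names are made up for this example.

```python
import math


def drive_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOps of one rotating drive.

    One operation is modelled as an average seek plus the average
    rotational latency (half of one full rotation).
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def drives_needed(target_iops: float, rpm: float, avg_seek_ms: float) -> int:
    """Minimum number of drives to reach target_iops, ignoring RAID overhead."""
    return math.ceil(target_iops / drive_iops(rpm, avg_seek_ms))


# Assumed drive characteristics, for illustration only.
for label, rpm, seek_ms in (("15k rpm", 15_000, 3.5), ("7.2k rpm", 7_200, 8.5)):
    per_drive = drive_iops(rpm, seek_ms)
    count = drives_needed(500, rpm, seek_ms)
    print(f"{label}: ~{per_drive:.0f} IOps per drive, {count} drives for 500 IOps")
```

This deliberately ignores controller cache, RAID write penalties and headroom for peaks, so a real design will usually call for more drives than the bare minimum it prints.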

Future?: I’m quite sure the future will bring a great change to the conventional concept. As memory-based drives have no rotating parts, they spend no time seeking the sectors required by the application. Theoretically, their random performance is the same as their sequential performance, and it is really high, tens of times faster than rotating drives. In practice, however, today’s NAND-based SSDs suffer from a big latency problem caused by the need to erase each block before it can be rewritten. I believe, and there are even some notes by anonymous engineers, that disk engineers are working hard to solve this problem and thus allow flash drives to become the best performance option for database applications.

Issue 3 - IOps vs. MBps

The last performance issue I’m going to discuss is not a real problem; it’s just a common misunderstanding of the units used to measure performance. In promotional materials, magazine articles, discussions and many other places I read statements that random (database) performance is measured in IOps while sequential performance is measured in MBps. What these sources usually don’t mention is that the two units are related to each other and both can be used to describe any sort of traffic. It is just that IOps better expresses database requirements, which are operation-related, while MBps better expresses the throughput numbers used for file transfers.

The third quantity, which binds all this together, is blocksize, i.e. the amount of data an application uses to form each I/O request. All three can be put into a simple equation:

throughput (kBps) = operations (IOps) x blocksize (kB)

How to interpret these numbers? Take, for example, a common mail system generating a load of 500 I/O operations per second. It is a database-based application using 8k blocks. Placing these numbers in the equation above, you get a throughput of 4000 kilobytes per second. Such a number would be really bad when seen on a file server. For the mail system, however, it means 500 operations per second, which should suffice for about 500 users. That is quite enough and, what is more, you need only some 4 or 5 SAS drives for it.
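The mail system arithmetic can be replayed in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the link utilisation figure assumes decimal units (10 Gbps taken as 1,250,000 kBps) and ignores all protocol overhead.

```python
def throughput_kbps(iops: float, blocksize_kb: float) -> float:
    """throughput (kBps) = operations (IOps) x blocksize (kB)."""
    return iops * blocksize_kb


def iops_for_throughput(kbps: float, blocksize_kb: float) -> float:
    """The same equation solved for IOps."""
    return kbps / blocksize_kb


def link_utilisation(kbps: float, link_gbps: float) -> float:
    """Fraction of a link used, assuming decimal units and no overhead."""
    link_kbps = link_gbps * 1_000_000 / 8  # Gbit/s -> kB/s
    return kbps / link_kbps


# The mail system from the text: 500 IOps at 8 kB blocks.
mail_kbps = throughput_kbps(500, 8)
print(f"throughput: {mail_kbps:.0f} kBps")
print(f"back to IOps: {iops_for_throughput(mail_kbps, 8):.0f}")
print(f"share of a 10 Gbps link: {link_utilisation(mail_kbps, 10):.2%}")
```

The last line makes the point of Issue 2 concrete: a random workload like this one uses well under one percent of a 10 Gbps pipe, so a bigger pipe cannot make it faster.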

Look at the attached picture (sorry, I couldn’t embed it into this page). It displays the results of a single IOmeter test running a random operations benchmark. The two graphs display MBps (left) and IOps (right) on their y-axes; blocksize is on the x-axis. Note that the two graphs are two interpretations of the same test, the same performance at a given blocksize.

There is no future paragraph for this issue; I just hope I helped someone understand these units a little better ;-)

 
