Storage decisions for virtual servers

 

Submitted by Todd DiGirolamo on September 5th, 2008

While these concepts and decisions should be fairly basic, I thought I would throw them out there to see whether I get some feedback from others who have gone through the same decision-making process, or possibly help out someone who is going through it right now.

As I worked on the design of my upcoming server virtualization infrastructure, storage was obviously one of my primary decisions. I already had my primary storage platform in place, with various SAN arrays sitting behind a storage virtualization appliance, and I pretty much knew that I would just be expanding on that to support the new virtual machines. Plus, that fit in perfectly with the HA design that had already been decided for the VM host machines. I rolled out two quad-processor, quad-core hosts with 64 GB of RAM each, plus one smaller two-processor, quad-core machine with some DASD and a decent amount of RAM as a staging box for my P-to-V (physical-to-virtual) migrations.

Next I had to decide on the arrays. I pretty much knew that any physical box that was currently RAID 10 would be RAID 10 as a VM. What I had to decide was whether I would just go RAID 10 across the board. Many physical machines were RAID 5, but virtualization adds its own bit of overhead, and there’s never anything wrong with boosting performance. Buying the physical disks really wasn’t cost prohibitive, but going all RAID 10 would max out the physical disk count per controller a little quicker than I liked. My IOPS per dollar also looked better going all RAID 10, but again, it would mean bringing in additional controllers to support the number of physical disks required. So in the end, I did a server-by-server worksheet and ended up about half and half. I simply have Tier 1 and Tier 2 disk groups in my storage arrays to support the two groups of servers I decided on. Everything is very manageable and scalable, and I felt like I got the best bang for my buck.
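
To put rough numbers on that tradeoff, here is a minimal Python sketch of the kind of arithmetic a worksheet like this involves. The write penalties (2 back-end I/Os per write for RAID 10, 4 for RAID 5) are the standard rule of thumb; the 180 IOPS per 15K disk, the 146 GB disk size, and the sample workloads are assumptions for illustration only, not figures from my environment.

```python
import math

# Rule-of-thumb back-end cost of one host write: 2 disk I/Os on RAID 10
# (two mirror copies), 4 on RAID 5 (read data, read parity, write both).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def disks_for_iops(host_iops, read_pct, level, iops_per_disk=180):
    """Disks needed to serve host_iops (iops_per_disk is an assumed figure)."""
    reads = host_iops * read_pct / 100
    writes = host_iops * (100 - read_pct) / 100
    backend_iops = reads + writes * WRITE_PENALTY[level]
    return math.ceil(backend_iops / iops_per_disk)


def disks_for_capacity(usable_gb, level, disk_gb=146):
    """Disks needed to provide usable_gb in one array (disk_gb is assumed)."""
    data_disks = math.ceil(usable_gb / disk_gb)
    if level == "raid10":
        return data_disks * 2   # every data disk has a mirror
    return data_disks + 1       # RAID 5: one disk's worth of parity


def disks_needed(host_iops, read_pct, usable_gb, level):
    """An array has to satisfy both the IOPS and the capacity requirement."""
    by_iops = disks_for_iops(host_iops, read_pct, level)
    if level == "raid10" and by_iops % 2:
        by_iops += 1            # mirrors come in pairs
    return max(by_iops, disks_for_capacity(usable_gb, level))


if __name__ == "__main__":
    for level in ("raid10", "raid5"):
        # Capacity-bound example: 1000 IOPS, 70% reads, 1000 GB usable.
        print(level, disks_needed(1000, 70, 1000, level))
        # Write-heavy example: 2000 IOPS, 30% reads, 500 GB usable.
        print(level, disks_needed(2000, 30, 500, level))
```

With these assumed inputs, the capacity-bound workload needs 14 disks on RAID 10 versus 11 on RAID 5, while the write-heavy one needs 20 on RAID 10 versus 35 on RAID 5. That is the tension described above: RAID 10 wins on IOPS for write-heavy machines, but it eats controller slots quickly wherever capacity is the limit.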


We started our P-to-V migrations, and quickly had our first dozen or so servers up and going. We monitored performance, tested failover, and felt comfortable with what we had put in place. We were ready to proceed aggressively at this point. We knew we couldn’t virtualize everything we had in the data center (about 130 servers), because we needed to stay under 50% utilization on the cluster to maintain high availability (either host has to be able to carry everything if the other fails), but we wanted to get a good number of physical boxes out of there. Soon we were near 40 VMs, and all was running great. Disk performance was more than satisfactory, including for the VMs on the large shared RAID 5 arrays.
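
The 50% rule falls straight out of the host count. As a minimal sketch, assuming identical hosts with load spread evenly across them (the function names and sample utilization values here are made up for illustration):

```python
def ha_utilization_ceiling(hosts, tolerated_failures=1):
    """Highest cluster-wide utilization that still fits on the surviving hosts.

    Assumes identical hosts and evenly distributed load.
    """
    if hosts <= tolerated_failures:
        raise ValueError("need more hosts than tolerated failures")
    return (hosts - tolerated_failures) / hosts


def survives_failover(current_utilization, hosts, tolerated_failures=1):
    """True if the remaining hosts can absorb the load after the failures."""
    return current_utilization <= ha_utilization_ceiling(hosts, tolerated_failures)


if __name__ == "__main__":
    print(ha_utilization_ceiling(2))      # two hosts, one may fail
    print(survives_failover(0.45, 2))     # sample value under the ceiling
    print(survives_failover(0.60, 2))     # sample value over the ceiling
```

For a two-host cluster the ceiling is 0.5, so anything above that on CPU or memory means a single host failure can no longer be absorbed. Adding a third identical host would raise the ceiling to roughly two thirds.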

Next came the part that I’m sure many virtualization admins run into. Besides virtualizing production boxes, we had to deal with our developers (around 100 or so), who had little test servers all over the place. Some were retired production boxes, some were just PCs running Server, and so on. Time to clean house, get all these things virtualized, and give them the benefit of snapshots and so on. It all went great as we piled up the carcasses. But all of a sudden, our utilization was spiking up near 60% on CPU and memory. While no single one of those machines was a hog, as a group they took a significant amount of disk, processor, and memory. Now I was stuck in the position of not being able to virtualize anything else for the rest of this year.

Then I thought: while it was great to virtualize all those little test machines, why am I chewing up resources on my HA cluster and throwing high-end SAN disk at them? What to do? Then it hit me: now that the bulk of the P-to-V migrations are done, and we are totally comfortable doing them going forward, that staging server just sits idle. For all that I push for centralized SAN disk, moving those low-end test machines onto that staging server and its DASD array sure looked attractive. And that’s exactly what I did. 8 cores, 24 GB of RAM, and over ½ TB of 15K DASD: a perfect home for those machines, and now my HA cluster has breathing room. I worried a little about losing HA on those, but then again, they’re test (no SLA for test machines!). So test machines, snapshots, clones, etc. are all happily living on that box. It’s a small risk, and it would be minimal pain if we lost it.

So, in the end, my virtual infrastructure ended up as a combination of large RAID 5 arrays, pooled smaller RAID 10 arrays for IO-intensive machines, and some good old DASD for Dev/Test. I nearly went into this with the “what’s good for one is good for all” mentality, but I am very happy with the mixed environment I ended up with.
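
For anyone who wants to turn that kind of server-by-server decision into something repeatable, here is a rough Python sketch of the placement rules as described in this post. The `Server` fields, the placement labels, and especially the 500 IOPS cutoff for "IO intensive" are hypothetical; the real call was made by hand, server by server.

```python
from dataclasses import dataclass

# Hypothetical cutoff for calling a machine "IO intensive"; pick your own.
IO_INTENSIVE_IOPS = 500


@dataclass
class Server:
    name: str
    is_test: bool
    physical_raid: str  # RAID level of the physical box: "raid10" or "raid5"
    peak_iops: int


def place(server):
    """Return a storage placement for one server."""
    if server.is_test:
        # Dev/test has no SLA, so it gives up HA and lives on local DASD.
        return "staging-dasd"
    if server.physical_raid == "raid10":
        # Anything that was RAID 10 as a physical box stays RAID 10 as a VM.
        return "san-raid10"
    if server.peak_iops >= IO_INTENSIVE_IOPS:
        # RAID 5 boxes that turn out to be IO intensive get promoted.
        return "san-raid10"
    return "san-raid5"


if __name__ == "__main__":
    servers = [
        Server("db01", False, "raid10", 1200),
        Server("file01", False, "raid5", 150),
        Server("app01", False, "raid5", 800),
        Server("devtest07", True, "raid5", 40),
    ]
    for s in servers:
        print(s.name, place(s))
```

The order of the checks matters: test machines are pulled out first regardless of their IO profile, which is what keeps them off both the HA cluster and the SAN.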

 
