July 29, 2008

VMware Virtualization and VDI with NetApp FlexClone and De-Duplication: Which Solution and When

(All views my own, not those of my employer; mileage may vary, first born may be sacrificed, value of investments may go very down and not just down.)

When it comes to virtualizing your servers and workstations, you are going to spend quite a bit more money on the SAN than you would on a bunch of local disks. But what's the point of buying 300GB disks when the OS and any applications you have use up 10% or so of them? Virtualization is all nice and Green because it optimizes disk utilization on the SAN and saves kilowatt hours on the server, the aircon and all the rest of the Green marketing story.

The problem.
The problem is that, unless you’re careful, you might end up with far more spinning stuff on the SAN than you really wanted. Remember, every storage platform has a disk or terabyte limit, beyond which you get a cordial invitation to participate in the next model. With NetApp there is a simple process of replacing a 2000 series with a 3000 series and then with a 6000 series, or with any one of the models within that range. With EMC you might end up tearing out your CLARiiON for a Symmetrix, which isn’t the simple migration process you might think.
Fortunately, salvation is at hand. The first thing to understand is what the heck FlexClone and De-duplication are.

What’s what.
FlexClone is a feature, unique to NetApp, that allows you to take a LUN on the SAN (NetApp people, go with me; this is a gross oversimplification for illustrative purposes) and clone it so that another server can use it as if it were a plain, normal, conventional LUN. The clever part is that the clone (or FlexClone, since NetApp markets just about everything as FlexThis or SnapThat) takes up practically zero space on disk: nothing has (yet) changed on the clone, so there is no changed data between the original and the clone. When the new server comes up it (obviously) gets a new IP address, name and other configuration changes. Those changes mean changed disk blocks, so the FlexClone starts to grow in size, but only by mere kilobytes at a time.
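To make the "practically zero space" idea concrete, here is a toy copy-on-write sketch in Python. It is emphatically not how NetApp implements FlexClone (WAFL works at a much lower level than this); the `Volume` and `Clone` classes are hypothetical and only show why a fresh clone owns no blocks of its own and grows only as blocks change.

```python
# Toy copy-on-write model. NOT NetApp's implementation; the class names
# Volume and Clone are hypothetical, for illustration only.

class Volume:
    """A parent 'LUN': a fixed list of blocks (the gold image)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class Clone:
    """A clone that stores only the blocks it has changed."""

    def __init__(self, parent):
        self.parent = parent
        self.changed = {}  # block index -> new contents

    def read(self, index):
        # Changed blocks come from the clone; everything else from the parent.
        return self.changed.get(index, self.parent.blocks[index])

    def write(self, index, data):
        # Writes never touch the parent; only the clone grows.
        self.changed[index] = data

    def private_blocks(self):
        return len(self.changed)


gold = Volume(f"os-block-{i}" for i in range(10_000))
server = Clone(gold)
print(server.private_blocks())  # 0: a brand-new clone owns no blocks

# New hostname and IP address: a handful of configuration blocks change.
server.write(17, "hostname=web02")
server.write(42, "ip=10.0.0.12")
print(server.private_blocks())  # 2
print(server.read(17), server.read(18))  # hostname=web02 os-block-18
```

The gold image is untouched by the clone's writes, which is also why you can keep cloning it safely.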

De-Duplication is another unique NetApp feature. Oooohhhh no it isn’t, I hear you all cry. True, but nobody else can do de-duplication on their primary data without it running like a dog as a result. NetApp can, and how, so yes, it’s unique. De-duplication works in a different manner depending on who the SAN vendor is, and I am nowhere near intelligent enough to talk about the intricacies of how duplicates are identified. Anyway, the algorithms look for identical blocks on disk, not simply identical files, and take the duplicate information away, leaving pointers to a single retained copy. As an aside, a typical file share will be 90% duplicate data, so you can see the scope for space savings. When the data is called up it is reconstructed from the de-duplicated blocks and presented seamlessly, and without performance impact, to the user.
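A toy block-level de-duplicator helps show the "identical blocks, leaving pointers" idea. This is a generic sketch, not NetApp's (or anyone else's) actual algorithm: the 4 KB block size and the SHA-256 fingerprint are assumptions chosen for illustration, and a real system would also have to guard against fingerprint collisions before sharing a block.

```python
import hashlib

# Generic block-level de-duplication sketch (not any vendor's algorithm).
BLOCK_SIZE = 4096  # assumption: fixed 4 KB blocks


def deduplicate(data: bytes):
    """Split data into blocks; keep one physical copy per unique block."""
    store = {}     # fingerprint -> the single physical copy of that block
    pointers = []  # logical layout: one fingerprint per logical block
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # duplicates are not stored again
        pointers.append(fp)
    return store, pointers


def rebuild(store, pointers) -> bytes:
    # Reads follow the pointers, so the user sees the original data.
    return b"".join(store[fp] for fp in pointers)


# Ten hypothetical "servers": the same 10 blocks of system files each,
# plus one block of server-specific configuration.
system_files = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
data = b"".join(
    system_files + f"server-{n}".encode().ljust(BLOCK_SIZE, b"\0")
    for n in range(10)
)
store, pointers = deduplicate(data)
print(len(pointers), len(store))  # 110 20
```

110 logical blocks are presented, but only 20 are physically kept: the 10 shared system blocks plus one configuration block per server.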


So what?
Using FlexClone allows you to have dozens of servers, either booted from SAN or running as VMDKs (yes, and VHDs), using a single core image and a lot of extremely small difference-based LUNs.
Using De-duplication allows you to have a whole load of normal-sized LUNs (or so you think) and lets NetApp scour the place for duplicates. Since every Windows server has largely the same core set of DLLs, EXEs and all the rest of it, you’ll see huge scope for reduction in the physical disk space consumed on the SAN.
The same goes for workstations that are presented to clients using VDI. All the core workstation files are identical, but the trick is to know which technology to use, when and why.


Scenario, anyone?
Let’s take a look at Windows server virtualization first. De-Duplication is the way to go here. A server is going to be there for five years (the support life of Exchange or SQL, for example) and you will need to service pack it with jolly large files. Whilst FlexClone is perfectly acceptable, you will understand that each SP will increase the size of each cloned LUN by (more than) the size of the service pack. Since every server receives the same service pack, the clones all end up storing the same new blocks, hence the suggestion to use De-duplication to reduce the space. Another reason is that if you need to “take the server image somewhere else” it’s quicker to “re-inflate” the server than to split the clone, which NetApp will let you do pretty easily anyway, but that’s not the point.
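The service pack argument comes down to simple arithmetic. This sketch uses hypothetical figures (20 cloned servers, a 500 MB service pack) and an idealized best case for de-duplication, in which every server's patched blocks are byte-for-byte identical and line up on block boundaries.

```python
def service_pack_growth_mb(servers, sp_mb):
    """Return (growth with FlexClone alone, growth with de-duplication) in MB.

    Idealized model with hypothetical inputs; real growth depends on how
    the patched blocks actually land on disk.
    """
    # FlexClone alone: each clone writes its own copy of the patched files,
    # so every clone grows by at least the size of the service pack.
    clone_only = servers * sp_mb
    # De-duplication (best case): the identical patched blocks collapse back
    # to roughly one physical copy, however many servers received them.
    deduplicated = sp_mb
    return clone_only, deduplicated


# Hypothetical: 20 cloned Windows servers, one 500 MB service pack.
print(service_pack_growth_mb(20, 500))  # (10000, 500)
```

Repeat that for every SP over a five-year server life and the gap keeps widening.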

Workstations are a different issue. If you have planned your VDI workstation deployment properly (and of course you have) there will be no user data on those workstation images. Folder redirection will allow the users to save to the network (on a NetApp CIFS share naturally) and group policies etc. will have essentially denied users write access to what is seen as the C drive.


All your applications and their configurations will be available through SMS and its successor product set. Profiles (pretty desktops etc.) will be stored in home or profile directories, again on that nice flexible NetApp-based CIFS share. The thing I’m trying to get you to walk away with is that the desktop image is just about the most dispensable bit of infrastructure you own. In this case FlexClone is the right thing to do: you have a gold image and then clone off as many desktop images as you need.

If you need to install a service pack, you can make a new gold image and deploy that to the users as required, using FlexClone all over again. User- or department-specific applications are installed as and when required. The beauty is that you’re not destroying the original set of data; you create an entirely parallel set of images and clones, and when the users are happy with their new workstations you can destroy the original set. Unless you have duplicate PCs for everyone, you cannot give that kind of service in the physical world. Over the course of a week or so you eventually eliminate all of the original FlexClones, and the users exclusively access their new FlexCloned workstations. You are left with a single image of the service-packed workstation and a bunch of small clones. Better yet, you don't end up with the old "my workstation is slow but I haven't done anything to it" story, because everything is refreshed every year or so.
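The parallel refresh can be sketched as a small space-accounting model. The sizes (a 10 GB gold image, 5 MB per desktop clone) and the migration rate are hypothetical, and the model assumes the old gold image can only be retired once its last clone is gone.

```python
# Space accounting for a parallel gold-image refresh, in MB.
# All sizes are hypothetical, for illustration only.
GOLD_MB = 10 * 1024  # one gold image
CLONE_MB = 5         # one desktop clone


def space_used(gold_images, clones):
    return gold_images * GOLD_MB + clones * CLONE_MB


def refresh(users, migrated_per_day):
    """Yield (day, MB used) while users move from old clones to new ones."""
    if migrated_per_day <= 0:
        raise ValueError("migrated_per_day must be positive")
    # The new gold image and a full set of new clones sit alongside the old.
    old_clones, new_clones = users, users
    day = 0
    yield day, space_used(2, old_clones + new_clones)
    while old_clones > 0:
        day += 1
        # Users happy with their new desktop have their old clone destroyed.
        old_clones = max(0, old_clones - migrated_per_day)
        # The old gold image goes once its last clone has gone.
        gold_images = 2 if old_clones else 1
        yield day, space_used(gold_images, old_clones + new_clones)


print(list(refresh(1000, 200)))
# [(0, 30480), (1, 29480), (2, 28480), (3, 27480), (4, 26480), (5, 15240)]
```

Even at the peak, with both generations side by side, the space used is a couple of gold images plus small clones, not two full desktops per user.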


Can you mix and match? Heck, sure. There are plenty of times when you can both FlexClone and de-duplicate. Anything that isn’t a Windows server is a good candidate, because such servers don’t generally suffer from multi-hundred-megabyte-service-pack-itis. A bunch of Linux servers running as a web farm would be cloned from a single image and then de-duplicated as well. The initial de-duplication wouldn’t save much, since the clones already share almost all of their blocks, but as the content management system pushes the same data out to every server, the overall storage would increase only marginally because NetApp finds and eliminates the duplicate information.
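Here is a toy model of the cloned-and-de-duplicated web farm, counting physical blocks. The block contents, base image size, server count and page count are all hypothetical; the point is only that de-duplication saves nothing on day one (the clones already share the base) but keeps growth flat as the same content lands on every server.

```python
# Physical blocks used by a hypothetical cloned web farm, with and
# without de-duplication. Illustrative model only.

def farm_space(servers, pages, base_blocks=1000, dedup=True):
    base = [f"linux-{i}" for i in range(base_blocks)]  # shared gold image
    clones = []
    for n in range(servers):
        writes = {0: f"config-web{n:02d}"}  # unique per server
        for p in range(pages):
            writes[100 + p] = f"page-{p}"  # same content pushed to every server
        clones.append(writes)
    if not dedup:
        # Clones alone: the base is shared, but every private write is stored.
        return base_blocks + sum(len(w) for w in clones)
    # Clones plus de-duplication: identical blocks are stored only once.
    unique = set(base)
    for writes in clones:
        unique.update(writes.values())
    return len(unique)


print(farm_space(10, 0, dedup=False), farm_space(10, 0))    # 1010 1010
print(farm_space(10, 50, dedup=False), farm_space(10, 50))  # 1510 1060
```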


What a load of nonsense, my users have all got different requirements! Err, no, chances are that they don’t. If you’re investing in innovative technologies such as VDI, you really should have your house in order and be using policies, profiles and application deployment technologies as well. There’s no point doing half the job. If you have several thousand users on a site and you end up with a dozen or so (almost) full-sized images, that’s better than several thousand images. Of course, another option is to start with the base build of Windows and Office and then clone that out, one clone for each department, for example. Each department gets its own set of generic applications and you then FlexClone the result out hundreds of times (did I mention you can FlexClone FlexClones without any issue?). You end up with one image that’s 10GB in size, a dozen that are a gigabyte or so in size and then many thousands that are a few megabytes in size.
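The tiered layout can be checked with back-of-an-envelope arithmetic. The 10 GB base and roughly 1 GB per department come from the paragraph above; the 12 departments, 3,000 users and 5 MB per user clone are assumptions.

```python
MB_PER_GB = 1024


def tiered_mb(departments, users, base_gb=10, dept_gb=1, user_mb=5):
    # One base image, one clone per department, one small clone per user.
    return (base_gb * MB_PER_GB
            + departments * dept_gb * MB_PER_GB
            + users * user_mb)


def full_copies_mb(users, base_gb=10, dept_gb=1):
    # Every user gets a complete, independent copy of base + department image.
    return users * (base_gb + dept_gb) * MB_PER_GB


tiered = tiered_mb(departments=12, users=3000)
full = full_copies_mb(users=3000)
print(tiered, full)  # 37528 33792000
print(f"{full / tiered:.0f}x smaller")  # 900x smaller
```

Roughly 37 GB against roughly 33 TB, before de-duplication is even switched on.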


Are there pictures? Naturally. We do pictures. I’ll get a few together.

2 comments:

Jeff said...

I hear all this buzz surrounding FlexClones and data de-dupe with NetApp. But has anyone ever considered the performance penalties incurred with such an architecture? Thin provisioning will save you disk space, but you will be penalized in performance if different blocks are being written out to disk regularly, and NetApp is especially susceptible to fragmentation with its WAFL system. So unless you understand your access patterns well, you might be putting the rest of your production LUNs at risk by turning FlexClones and de-dupe on just to save disk space.

Mark Arnold said...

Maintaining performance is at the very top of the list of considerations. It is always important to ensure that there are enough disks to provide sufficient I/O to meet user and application demands. If certain criteria are not met, then FlexCloning and De-Duplication simply are not recommended. Remember, just because the technology is there does not make it either necessary or appropriate in every case.

The way WAFL works does not make NetApp any more susceptible to fragmentation than anyone else. It's a topic for a deep-down specialist, but suffice it to say that the fragmentation story is an item of FUD spread by NetApp's competitors. Everyone spreads FUD; there's an element of truth in everything, but you certainly shouldn't jump on the narrow interpretation.

Key take away: necessary and appropriate.