January 24, 2011

Exchange "Evolution"

Thousands of years ago I rambled on about where MS had missed the boat a couple of times in the evolution of its flagship server product, but given what's happened with other products, and given that I seem to have pruned out some old posts, the topic bears a redux and an update.
Ye Olde News (1):
Exchange 5.5: One 16GB store, or one 'unlimited' store (not that large really, since Windows couldn't cope in terms of either memory or raw storage, and backup tapes were very small, even in the large libraries; 50GB was about as large as many dared go). The RTO of Exchange, in hours, could never be a number less than three.
Exchange 2000: 20 stores
Exchange 2003: 20 stores (16GB for a while and then up to 75GB but Microsoft never wanted you above 40GB)
Exchange 2007: 50 'unlimited' stores. Unlimited here largely meant 50GB, except with CCR, where 200GB was oft mooted. I can point to endless customers who wouldn't take it above 40GB.
Exchange 2010: 100 unlimited stores. This is better, in that 2TB is the number. Imagine backing that up without the snapshot protection that gets it restored in less than five minutes.
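
To put the 2TB figure in perspective, here is a back-of-the-envelope sketch of how long a plain streaming restore takes at a fixed throughput. The 100 MB/s rate and the `restore_hours` helper are hypothetical assumptions for illustration, not measured Exchange or tape-library figures, and the estimate ignores log replay and verification, so a real restore would take longer.

```python
def restore_hours(db_size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream db_size_gb back at a sustained throughput_mb_s.

    Uses decimal units (1 GB = 1000 MB). Log replay, consistency checks and
    tape handling are deliberately ignored, so this is a lower bound.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = db_size_gb * 1000 / throughput_mb_s
    return seconds / 3600


# Hypothetical sustained restore rate; substitute your own measured figure.
ASSUMED_MB_S = 100.0

for size_gb in (16, 50, 200, 2000):  # store sizes mentioned in the post
    print(f"{size_gb:>5} GB: {restore_hours(size_gb, ASSUMED_MB_S):5.1f} h")
```

At the assumed 100 MB/s, a 2TB database works out at roughly 5.6 hours before any log replay, which is why snapshot-based recovery matters at that size.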

Ye Olde News (2) - and some newish thoughts.......
Exchange 5.5: Single copy clustering and the number of people who couldn't keep a cluster up for more than five minutes was scary.
Exchange 2000: Ditto. In some cases, more so! SAN became a requirement because of the increased storage capability.
Exchange 2003: Same storage story, lots more stability. Exchange 2003 was more than stable enough for one copy and the availability requirements of most businesses. Those that deployed clusters never really got a huge uplift in availability because the failover always used to take a little while and, maddeningly, admins always wanted to fail it back for some wacky reason, mainly because one box was called mail01 and the other was called mail02. Heaven forbid that they run on mail02, regardless of the fact that the cluster was called mail.
Exchange 2007: This is where it all went wrong.
There was nothing wrong with the availability model in 2003. But the people who looked after the storage in the Exchange 2003 days did something "sub-optimal", on a platform that wasn't forgiving of sub-optimal and was so badly configured that it couldn't have handled sub-optimal anyway, so Microsoft decided to take the storage out of the equation as a potential source of trouble. Exchange 2007 brought in two cool-sounding but ultimately ill-conceived things: LCR and CCR. Nobody used LCR and everybody wanted CCR because it was the new thing. Nobody really needed CCR, and in fact few NEEDED its sister, SCC. But because it was new, the admins just had to have it. MCS consultants were pushing CCR like someone in North Philly pushes little plastic bags and their rock-like contents.
Now, SCR was a pretty cool invention: 9/10 for innovation but 2/10 for execution. Storage vendors whose replication solutions did not force a full resync after a site failure had nothing to worry about. SCR had its place, but it was pretty niche.
Then, of course, there was the fact that the Exchange PG and MCS tried to shift everyone to DAS right at the point when everyone with architectural vision wanted to virtualize and/or implement blades to reduce their data centre footprints. Good luck with DAS on a blade or VMware without doing something inefficient like putting a storage chassis into your blades. #fail. Suddenly you had three copies of the storage running on three active servers, when only a couple of years earlier there had been one active server and some storage in the DR site. You went from a decent level of storage efficiency down to 33% or less. Remember that you also changed RAID types from RAID5 to RAID1+0 (0+1, 10, etc.), so there were more disks for less overall data. Overall efficiency is now probably in the 20% area compared to Exchange 2003.
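
The roughly 20% figure can be reproduced with simple arithmetic: the usable fraction of the RAID layout divided by the number of database copies. The sketch below assumes a 4+1 RAID5 group for the single-copy Exchange 2003 baseline and mirrored RAID1+0 for three copies under Exchange 2007; both layouts and the helper names are illustrative assumptions, not sizing guidance.

```python
def raid_usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity in one RAID group that holds user data."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) / disks  # one disk's worth of parity per group
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID1+0 needs an even number of disks, at least 4")
        return 0.5  # every block is mirrored once
    raise ValueError(f"unsupported RAID level: {level}")


def storage_efficiency(copies: int, level: str, disks: int) -> float:
    """Unique mailbox data as a fraction of the raw disk purchased."""
    if copies < 1:
        raise ValueError("need at least one database copy")
    return raid_usable_fraction(level, disks) / copies


# Assumed layouts, for illustration only.
e2003 = storage_efficiency(copies=1, level="raid5", disks=5)   # single copy, RAID5 4+1
e2007 = storage_efficiency(copies=3, level="raid10", disks=8)  # three copies, RAID1+0
print(f"2003: {e2003:.0%}  2007: {e2007:.0%}  2007 vs 2003: {e2007 / e2003:.0%}")
```

With those assumptions the baseline is 80% and the three-copy RAID1+0 layout is about 17%, which relative to the baseline comes out at roughly 21%, in line with the "20% area" estimate. A wider RAID5 group raises the baseline and makes the comparison worse for 2007.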

Having just seen the industry fly off at a tangent to their own world view, the Exchange product group looked at Exchange 2010 and decided that what they had done in 2007 wasn't nearly enough. First they improved the IO again, and then they said to use SATA rather than SAS (three versions of the product, three disk types). That wasn't so bad, but what came next was, err, umm, interesting. Not content with ramping up from one copy to three active copies between 2003 and 2007, Microsoft went from three copies to four copies in 2010. The original idea was for that fourth copy to be an hour out of date, with logic to bring it up to date as and when necessary. Papers came out citing a fifth copy that was even further out of date (a day, two, etc.) in order to eliminate backup. It's to their credit that they have now* stepped back a little, but I assure you the number of customers who are planning a fourth copy where there is zero need is amazing. Luckily I am usually able to talk them down from that particular cloud (ahh, cloud, that reminds me - more later). So where n=1 with Exchange 2003, n=4 with Exchange 2010 (25% efficiency) - except that it wasn't. As part of the IO reduction effort the database was flattened and single instancing had to go, although compression did come in. With a lagged copy you were down to 20% efficiency on storage and 25 to 33% efficiency on servers.
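
Losing single instancing matters because a message sent to many recipients on the same database used to be stored once and referenced from each mailbox. The toy model below contrasts that with a flattened layout in which every recipient gets a private copy. The `MailStore` class, its methods and the SHA-256 keying are hypothetical illustrations, not how the Exchange store is actually implemented; note also that Exchange's single instancing only ever applied within a single database.

```python
import hashlib
from collections import defaultdict


class MailStore:
    """Toy mailbox database (hypothetical, for illustration only).

    With single_instance=True, a body shared by several mailboxes is stored
    once and referenced by its SHA-256 digest; otherwise each recipient
    gets a private copy, as in a flattened schema.
    """

    def __init__(self, single_instance: bool) -> None:
        self.single_instance = single_instance
        self.bodies: dict[str, bytes] = {}  # storage key -> message body
        self.mailboxes: dict[str, list[str]] = defaultdict(list)  # user -> keys
        self._counter = 0

    def deliver(self, body: bytes, recipients: list[str]) -> None:
        if self.single_instance:
            key = hashlib.sha256(body).hexdigest()
            self.bodies.setdefault(key, body)  # stored once per database
            for user in recipients:
                self.mailboxes[user].append(key)
        else:
            for user in recipients:
                self._counter += 1
                key = f"copy-{self._counter}"  # one private copy per recipient
                self.bodies[key] = body
                self.mailboxes[user].append(key)

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.bodies.values())


body = b"x" * 1_000_000  # a hypothetical 1 MB attachment
recipients = [f"user{i}" for i in range(50)]

for sis in (True, False):
    store = MailStore(single_instance=sis)
    store.deliver(body, recipients)
    print(f"single_instance={sis}: {store.bytes_stored():,} bytes per database copy")
```

Under these assumptions the flattened layout stores 50 times as much for this one message, and that cost is then multiplied again by every database copy.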

So, where are we?
Exchange 5.5 was DAS because SANs were expensive.
Exchange 2000 and 2003 were SAN because the storage requirements of Exchange exceeded the available DAS chassis capacity in a lot of cases.
Exchange 2007 was suddenly local SAS because MS wanted nothing more than to eliminate the SAN: they couldn't manage one particular model themselves and assumed that everyone else was less competent than them.**
Exchange 2010 was suddenly local SATA rather than SAS because MS made some serious improvements to the IO model.
Pro-DAS Exchange 2007 was released at the point when many customers were looking at virtualization. Exchange 2007 single-handedly delayed the tipping point of virtualization. It is perhaps no coincidence that Microsoft did not have a production-grade virtualization product at the time, and didn't really know enough about their own product to accurately predict its IO profile in a virtual environment.
Pro-DAS Exchange 2010 was released at the tipping point to virtualization, but because Microsoft had missed the boat on Hyper-V they still didn't want you operating a virtual environment, because that meant putting money into the hands of VMware. Same blocking factor, different day. Only this time people aren't wearing it. Virtual environments are winning in the dynamic data centre, and centralized storage is back where it should be: efficiently managing the integrity of customer data.

What's coming?
What's coming might be what's happening with other Microsoft products. The SharePoint people have hit an absolute home run with 2010 and RBS (Remote BLOB Storage). Having a small database containing metadata and a big old BLOB store on a file share is the next big thing. In fact, it's the now, not the next. But there's a problem. If Exchange 20xx did this right now, DAS would be useless. Windows file servers would be useless too (Windows and SMB (CIFS) is not an optimal file-serving platform). What does SharePoint look like in the SAN world? It's small databases stored on the SAN, with the disk metadata on an SSD-based victim cache. It's data files on NAS, with the disk metadata on that same SSD. It's database and data files on SATA disk, SAS at worst. It's high-density, high-efficiency, high-performance storage. Exchange 2010 is the thoroughly annoying stepchild in the corner that everyone hates. The data centre people hate it because of the footprint. The server teams hate it because they have to manage physical resources at a time when most of their assets are virtual guests. The storage people hate the Exchange people because they are lowering the ROI of the SAN and increasing the overall TCO for storage and support across the board. The finance people hate the Exchange people because the Exchange people blindly follow what Microsoft tells them, insist on three or four copies and bloat the cost of their environment. It's a hard life being an Exchange operations guy in 2011 when you don't see the writing on the wall because there are too many of your own servers between you and the wall!
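
The RBS pattern described above, a small metadata database plus a separate BLOB store, can be sketched in a few lines. This is a minimal illustration under stated assumptions: SQLite stands in for the metadata database, a local directory stands in for the file share, and `BlobBackedStore` with its `put`/`get` methods is a hypothetical class, not the SharePoint RBS provider API.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class BlobBackedStore:
    """Hypothetical store: metadata rows in SQL, content bytes in files
    named by their SHA-256 digest under blob_root."""

    def __init__(self, db_path: str, blob_root: Path) -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            " id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
            " size INTEGER NOT NULL, blob_id TEXT NOT NULL)"
        )
        self.blob_root = blob_root
        self.blob_root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, content: bytes) -> int:
        """Write the BLOB (once per distinct content) and record its metadata."""
        blob_id = hashlib.sha256(content).hexdigest()
        blob_path = self.blob_root / blob_id
        if not blob_path.exists():
            blob_path.write_bytes(content)
        cur = self.db.execute(
            "INSERT INTO items (name, size, blob_id) VALUES (?, ?, ?)",
            (name, len(content), blob_id),
        )
        self.db.commit()
        return cur.lastrowid

    def get(self, item_id: int) -> bytes:
        """Look up the BLOB reference in SQL, then read the bytes from disk."""
        row = self.db.execute(
            "SELECT blob_id FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        if row is None:
            raise KeyError(item_id)
        return (self.blob_root / row[0]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as share:
        store = BlobBackedStore(":memory:", Path(share) / "blobs")
        item = store.put("quarterly-report.docx", b"large document body")
        print(store.get(item))
        store.db.close()
```

The database stays small because it only holds names, sizes and references, while the bulk bytes live on whatever storage tier backs the share. Keying BLOBs by content hash in this sketch also dedupes identical content; a production design would additionally need clean-up of orphaned BLOBs and consistency between the two tiers.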

To summarise.
Exchange had it almost right with Exchange 2003. They went just that little bit too far with 2007 and missed virtualization and the new data centre. They compounded that problem with 2010 and hurt their customers' bottom lines, their own DPM and Hyper-V product groups, and their own credibility. They have positioned themselves at the opposite pole to the coming raging success within Microsoft: SharePoint.

The bottom line here is that anyone who looks back more than one year into the past should see what a success the Exchange product group has had at seeing the trends and exploiting storage capabilities. Right about now the numbers you are looking for are (1) 408 822 6000, (2) 866 438 3622 and (3) 877 426 2233.

*edited 'not' for 'now' - my bad.
** I will point out that not having a SAN in use in such high-turnover environments as Windows/Exchange builds and dogfooding is not a bad thing. HBA drivers and firmware are all on a particular support matrix for Windows builds. If there's a new version or SP of Windows or Exchange on the box, there could very well be problems until the HBA vendor updates drivers or firmware. The same applies, of course, to system boards, disks and chipsets, but to a lesser and more manageable degree.

2 comments:

Quazex13 said...

No problem on your first two phone numbers, the first is for NetApp, the second is for EMC, but the third is for Busseys, a bed and breakfast in Glen Rose TX. Not exactly Exchange related.

Mark Arnold said...

Hey, that's a super-secret storage vendor. It's like Universal Exports in the Bond movies.