Disaster Recovery on a Shoestring

At Loughborough we're currently gearing up to roll out the Terminal Four Site Manager CMS. We're looking at quite a nice hosting environment: multiple front-end web server VMs at separate locations, with Linux Virtual Server handling load balancing and failover.

This setup is great for handling problems like localised hardware failures and operating system bugs, but what happens in the event of a catastrophic failure such as the fire that destroyed the School of Electronics and Computer Science at Southampton University?  (Ballardian picture above from Dr John Bullas)

I'll blog about our wider institutional emergency planning separately.  For this post let's consider what we could do to maintain a Web presence if circumstances conspire to cut off our Internet connection, or there's a major IT systems failure.  We live in straitened times, so I'll frame this in monetary terms!



Option 1 - Dedicated server with hosting company (~£6,000/year)

Let's take a look at some sample pricing from one of the market leaders, RackSpace, for a dedicated Linux server. This works out as around £500/month for a fairly basic server (Quad core 2.5GHz Opteron with 2GB RAM and 2 x 250GB mirrored SATA drives, with a monthly bandwidth allocation of 1TB). So, for some £6,000/year we could have our own DR server in the clouds. However, our actual bandwidth use is of the order of 3.5TB/month peak outbound, well over the 1TB allocation, so the real bill would likely be higher. I'll also note that those SATA drives might have trouble keeping up with our peak loads of some 2.5m URL requests/day.

That's if we tried to replicate our full web presence - and whether we need to is probably a key question here.

An emergency website could consist of a small subset of the normal presence, and key information has already been identified as part of the University's emergency planning work.  However, I contend that it would be difficult to predict exactly which additional information would be needed in an emergency, and that a minimal version of the institutional site would only be appropriate for a very short period.  We also need to keep in mind that in a disaster scenario the demand for the website is likely to be significantly greater than on a typical day.

Option 2 - Community based hosting deal (£250/year to £6,500/year)

We'll look at JANET Web Hosting here although there are other community based options, notably Eduserv Hosting.  For an annual fee of some £250 per virtual machine the JANET hosting contract with RM provides 5GB of storage on a virtualized RedHat Linux or Windows platform.  There are no bandwidth charges for this service and there is no bandwidth quota at present.

However, each additional 5GB of storage is chargeable at a rate of £200/5GB. This is where the discussion about whether to replicate the full site or just specific content becomes more significant.  It's interesting to note that Loughborough's 160GB of web content would take the bill up to around £6,500/year, comparable with a dedicated server.
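To show where that figure comes from, here's a minimal Python sketch of the sum. It assumes the first 5GB is covered by the £250 base fee, that extra storage is billed annually in whole 5GB blocks, and that one virtual machine holds everything; the function and constant names are made up for illustration.

```python
import math

BASE_FEE_GBP = 250         # annual fee per virtual machine, includes 5GB
INCLUDED_GB = 5
EXTRA_BLOCK_GB = 5
EXTRA_BLOCK_FEE_GBP = 200  # assumed to be an annual charge per extra block


def janet_annual_cost_gbp(content_gb):
    """Estimate the yearly bill for one VM holding content_gb of content."""
    extra_gb = max(0, content_gb - INCLUDED_GB)
    # Part-filled blocks are assumed to be charged as whole blocks.
    extra_blocks = math.ceil(extra_gb / EXTRA_BLOCK_GB)
    return BASE_FEE_GBP + extra_blocks * EXTRA_BLOCK_FEE_GBP


print(janet_annual_cost_gbp(5))    # 250
print(janet_annual_cost_gbp(160))  # 6450
```

On these assumptions the full 160GB comes to £6,450/year, which is the "around £6,500" quoted above, while an emergency subset that fits in 5GB stays at the £250 base fee.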

Option 3 - Reciprocal arrangement with peer institution (~£3,500/year)

At the recent Institutional Web Managers' Workshop 2010, UCL's Jeremy Speller spoke about emergency communications. In addition to some trenchant observations about the potential of technology such as Twitter in an emergency, Jeremy noted that there would be some logic in institutions working together on a bilateral basis to host backup servers and services for each other. To a degree this already happens with infrastructure services such as DNS secondaries and NTP peers for time synchronisation. Some organizations have already gone further. For example, we have a long-standing agreement with several other institutions to come to each other's aid in a disaster situation. The expectation is that this would be likely to include everything from technical assistance to Internet connectivity and server hosting.

At first sight, this reciprocal option might seem like the most cost effective route to take - most of the infrastructure is already in place and paid for, after all. However, if we are talking about a physical server then this will need to be networked, powered and cooled. It's often observed that these ancillary costs can exceed the price of the server hardware. For our purposes a suitably spec'd enterprise class server (e.g. HP DL380) could be bought for around £8,000, with an expected five year lifespan. So, let's call the total cost £16,000 over five years, or some £3,500/year (£7,000 if you consider both institutions). In a bilateral agreement such as this both parties would ultimately be contributing a four figure sum, even if this was difficult to quantify due to the vagaries of utility charging, power metering etc.
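As a rough check on those sums, here's a minimal Python sketch. It assumes the ancillary costs over five years come to about the same as the £8,000 hardware price, which is what the £16,000 total implies; the function name is made up for illustration.

```python
def annual_cost_gbp(hardware_gbp, ancillary_gbp, lifespan_years):
    """Spread hardware plus ancillary costs evenly over the server's lifespan."""
    return (hardware_gbp + ancillary_gbp) / lifespan_years


per_institution = annual_cost_gbp(8000, 8000, 5)
print(per_institution)      # 3200.0
print(per_institution * 2)  # 6400.0
```

Strictly the division gives £3,200/year per institution, so the "some £3,500" figure is rounded up a little. That seems fair given that ancillary costs can exceed the hardware price.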

Of course this figure could be reduced by re-using old hardware or using cheaper hardware, but there would be a concomitant increase in the costs to both institutions of dealing with failures. Staffing costs associated with a flaky system could easily dwarf facilities costs for server hardware and hosting.

The above notwithstanding, momentum in the industry is very much towards server virtualization.  This potentially offers much lower ongoing costs for hosting, and many institutions have already virtualized large proportions of their server estate.  However, one would expect that dedicated cloud hosting providers would enjoy the best economies of scale and be able to pass these on to their customers.  Let's see how this could work out in practice...  [And don't forget that we are only expecting to run our DR website live for a few days while normal service is restored!]

Option 4 - "Best of breed" cloud hosting (£100 to £5,500)

Cloud hosting tends to come in one of three flavours:
  • Software as a Service, such as Google Apps, Microsoft Live@edu and Salesforce.com - typically a subscription service delivered via a website
  • Platform as a Service, such as Google App Engine or Microsoft Windows Azure - giving you an API to write against for hosting your application in the cloud
  • Infrastructure as a Service, such as Amazon's Elastic Compute Cloud (EC2) or Rackspace Cloud - giving you a virtual machine from a library of operating systems and preconfigured appliances. Typically you are charged on a pay-as-you-go basis for bandwidth and CPU capacity, though you often have the option of pre-paying for anticipated usage at lower rates 
For this post we'll assume that access to the underlying operating system is required in order to run scripts and manage the more complex aspects of the web server config. This leads us in turn to Infrastructure as a Service, and we'll use Amazon EC2 as our example.

A "small" EC2 Linux instance specification has enough storage (160GB) to be comparable with our main web server, although it may be resource constrained in other areas - 1.4GB RAM, CPU resource equivalent to a single core Xeon clocked at 1.2GHz. This costs $0.11/hour while it's active. It's presently free to upload material to EC2, but outgoing bandwidth at our typical usage rates would be charged at $0.18/GB. So, if we had uploaded all 160GB of content to an EC2 instance, and had to run our site from it for DR purposes for about a week, our bill would be some $18 for CPU usage and $160 for bandwidth (assuming we transfer around a quarter of our monthly 3.5TB of outbound data) - or just over £100 at current exchange rates.
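The week-long DR estimate can be sketched in a few lines of Python. The hourly and bandwidth rates are the ones quoted above; the exchange rate of $1.55 to the pound is an assumption picked purely for illustration, as are the function and constant names.

```python
HOURLY_RATE_USD = 0.11         # "small" Linux instance, per hour while active
EGRESS_RATE_USD_PER_GB = 0.18  # outbound bandwidth
MONTHLY_EGRESS_GB = 3500       # 3.5TB/month peak outbound
USD_PER_GBP = 1.55             # assumed exchange rate, not a quoted figure


def ec2_dr_cost_usd(days, share_of_monthly_egress):
    """Return (compute, bandwidth) cost in dollars for a short DR run."""
    compute = days * 24 * HOURLY_RATE_USD
    bandwidth = MONTHLY_EGRESS_GB * share_of_monthly_egress * EGRESS_RATE_USD_PER_GB
    return compute, bandwidth


compute, bandwidth = ec2_dr_cost_usd(days=7, share_of_monthly_egress=0.25)
total_usd = compute + bandwidth
print(f"compute ${compute:.2f}, bandwidth ${bandwidth:.2f}, total ${total_usd:.2f}")
print(f"about £{total_usd / USD_PER_GBP:.0f} at the assumed exchange rate")
```

This comes out at a little under $180 in total, in line with the rounded $18 and $160 above. It leaves out any charge for keeping the 160GB of content stored between emergencies.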

Now for comparative purposes let's imagine that we wanted to run our web server off EC2 full time, 24x7x365.  It's possible to pay Amazon upfront for a "reserved instance", which dramatically reduces the cost per CPU hour.  If our peak time bandwidth requirements were maintained, the total for this would be around £5,500/year, so comparable with (if not slightly cheaper than) a more traditional hosting approach.  [All that's missing from the picture here is an academic discount rate ;-]

I'll note in passing that exchange rates could change dramatically, in our favour or against us, and that the true picture for Amazon is a little more complex - e.g. for persistence, storage would likely be done via Elastic Block Store. The Amazon offering is also particularly interesting given the recent availability of the Amazon Virtual Private Cloud, which allows you to host institutional IP addresses in the cloud via an IPsec tunnel between Amazon and your organization.

We'll be aiming to trial some of these options in the near future, so watch this space for further developments...