Picking an IaaS technology

Three years ago I was faced with an interesting challenge: how to build an AWS-compatible IaaS system that would be suitable for both public and private cloud deployments, and that could be delivered as a turnkey hardware-software solution. At the time, there was only one game in town: Eucalyptus. Unfortunately, the 1.x code was what a colleague described as “student project” quality, and there was considerable confusion about licensing and governance.

Fast forward to today: what’s the situation? There are three major open source initiatives (Eucalyptus, OpenStack and CloudStack), together with a number of commercial products. Yet there is still no obvious solution for my problem. Ideally I want a cloud management system with the following characteristics:

  • Solid software engineering. Robust implementation, clean language-independent internal APIs, versioning, continuous integration with 100% test coverage, upgradable without downtime.
  • Forward-looking. The system should work well with the best compute, networking and storage technologies that we can expect to become popular over the next 3-5 years. That implies pluggability at the right points.
  • Designed for scale now. One way to deliver a turnkey solution is to use containerized data centers, which start at around 10,000 cores, so that’s a sensible minimum. And AWS-grade availability requires multiple locations. I’d prefer to deploy a subset of the total system in one data center rather than have to refactor and rearchitect later to retrofit wide-area capabilities.
  • Native binary API compatibility with AWS. A common response to this requirement is that it’s easy to work around incompatibilities using proxies or libraries, and there are certainly plenty to choose from. While there are technical objections to this approach, the fundamental issue is one of behavioral divergence. I may be able to use an AWSOME proxy to map an EC2 call into an OpenStack call, but this doesn’t help me if the underlying semantics of volume snapshots are different. The best way to avoid semantic incompatibility is to adopt the same API.
  • Integratable. (Apologies for the ugly neologism.) A commercial cloud management system is going to have to be integrated into a variety of customer environments with different requirements and practices, from billing to identity management. In some cases the integration points will align with existing API and plugin interfaces; in others, it may be necessary to rework or replace a major subsystem. Obviously this will depend on the software engineering methodology, but it will also be affected by the choice of open-source license. (Yes, GPL is a problem.)
  • Supports the refactored operational model that accompanies IaaS. As Brad wrote, this means that:

    …cloud architects and the devops crowd will gain primacy and control over the network. This trend is reflected already in the press releases from Nicira. […] It changes the customer power dynamic, putting the cloud architects and the programmers in the driver’s seat, effectively placing the network under their control. (Jason Edelman has begun thinking about what the rise of SDN means for the network engineer.) In this model, the network eventually gets subsumed under the broader rubric of computing and becomes just another flexible piece of cloud infrastructure.

    There’s a lot more to it than that, of course. The system must support the automation of infrastructure lifecycle management practices, compliance, RBAC, key management, and many other operational features.

So what are the choices?

  • I was at the OpenStack Conference in San Francisco last week, and OpenStack is still extraordinarily immature. It doesn’t even come close to meeting my software engineering requirements. The presence of HP and IBM will certainly improve things in this regard, but it’s unclear what this will do to the community. OpenStack is certainly forward-looking (at least in the area of networking) and addresses many operational issues. However, it’s not (yet) ready to scale, and the API story is unfortunate given my requirement for native AWS compatibility. The licensing model is such that one could build off the OpenStack source, but it would be difficult to contemplate this before the internal API and data model issues are resolved.
  • CloudStack is written in Java (good), but as Randy Bias wrote, “1999 called and wants its application architecture back”. As Rich Wolski of Eucalyptus has pointed out, refactoring and packaging a large-scale Java distributed system for Linux is a lot of work. The Apache licensing is the best that money could buy, but it’s unclear how the community will evolve. (Will there be multiple commercial distros based on the Apache source?) Citrix plans to fix the API issue, but that functionality has slipped.
  • Eucalyptus is back: they’ve done a good job of refactoring and reimplementing to enterprise standards. With the recent agreement with Amazon, the API story is as good as it can possibly be (which doesn’t necessarily mean as good as one might want). The GPL licensing is problematic: it seems unlikely that an independent distribution could get traction. (This is clearly the MySQL model rather than the Hadoop one.) Perhaps the biggest concern is that their focus seems to be on fairly small deployments into legacy data centers with minimal disruption. This may satisfy the existing customer base, who want a safe incremental solution, but it means that optimizing the IaaS through infrastructure innovation will take a back seat.

There are some notable omissions in this piece. VMware? I’m sorry, but vCloud doesn’t provide the kind of abstraction that’s needed for IaaS: an application shouldn’t know or care what kind of hypervisor, CPU or storage fabric it’s running over. Nimbula, Surgient, Enomaly, Cloupia… actually, you can read the whole list at Cloudbzz, and as John says there, “I hope you’ll pardon my dubious take, but I can’t possibly understand how most of these will survive”. (Some have already gone since he last updated the list.) I have good friends at many of these companies, and I hope they’re successful, but numbers (customers, partners, dollars) matter.

So what’s a turnkey cloud architect to do? I’m still mulling that over…
