The case for an OpenStack Public Cloud WG

When OpenStack was launched five years ago, public and private clouds were equally important. The first two users were NASA and Rackspace, representing the private and public use cases, and AWS API compatibility was an important feature. People were starting to (mis)use the H word, referring to “hybrid clouds” when they really meant “hybrid applications”, and OpenStack held out the promise of hybrid solutions based on public and private OpenStack clouds. A year later, this was one of the reasons advanced by Rackspace and others for deprecating AWS compatibility in favor of a “more advanced” OpenStack API model.

Fast forward to last week, when the OpenStack Operators’ Midcycle meeting took place in Palo Alto. I borrowed 15 minutes from the session of the Large-scale Deployment Working Group to argue for the creation of a group — full-blown WG or SIG — focussed on Public Clouds. I based this on the fact that there are a number of important use cases and functional requirements which are specific to public clouds, and which are not represented in any other working groups. Here are four examples:

  • Legal tenancy. In a public cloud, the tenants are generally independent legal entities from the cloud service provider (CSP). What is the contractual model for tenancy — who has what rights over what resources? What happens if a tenant is a “bad actor”, if their activities attract the attention of law enforcement or other agencies? What does the CSP have to do about lawful intercept, digital forensics, sequestration, or other actions? And even if OpenStack isn’t going to implement such things, do we need to (e.g.) extend the life cycle models for instances, volumes, and other resources to support them?
  • Multitenancy. Most public cloud customers want to run multiple applications in the cloud, sharing resources between them, and to do so in a way that is completely hidden from other customers. There is also growing interest in cloud service resale and brokerage, and in the use of federation to support multiregion and hybrid deployments. The hierarchical multitenancy (HMT) work in Keystone looks on the surface to be ideal for this purpose, supporting multiple projects per domain. Unfortunately the work is incomplete — the resource name spaces don’t support arbitrary hierarchies, and administrative delegation is broken — and none of the other OpenStack projects have incorporated public cloud style HMT in their plans.
  • Service assurance. OpenStack supports a variety of test and certification frameworks, from Rally and Tempest to Refstack. These are great for acceptance testing, but none of them is suitable for the kind of continuous service assurance needed for a public cloud. Rather than reinventing things from scratch, it would be very useful if existing tests could be integrated into a framework that could be run continuously, from a tenant’s perspective (i.e. outside the firewall), providing real-time information on service availability and latency for both CSPs and users.
  • Billing. When Ceilometer was introduced, it promised the ability to capture both billing data (resource consumption metrics) and near-real-time behavioral data (for use in elastic provisioning, load balancing, and application monitoring). Unfortunately, this “converged” approach overlooked the significantly different requirements for each. For billing data, we need to emphasize completeness and accuracy, together with long term storage supporting audits with non-repudiation. Behavioral data is latency sensitive, bursty, and ephemeral. The highest priority is to route the data to the control system which consumes it, so that the system can respond quickly. Late data is useless. These requirements are sufficiently different that no single system can adequately support them both, particularly at scale. Ceilometer might be sufficient in a private cloud, where “billing” is based on best-effort chargebacks using virtual money, but in a public cloud we’re dealing with real money and legal contracts.
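To make the service-assurance point concrete, here is a minimal sketch of the kind of continuous, tenant-perspective probe loop described above. Everything here is illustrative, not part of any existing framework: `check` is a hypothetical stand-in for a real tenant-facing test (say, an authenticated “list servers” call), and the availability and latency figures are kept in a simple in-memory sliding window.

```python
import time
from collections import deque


class ServiceProbe:
    """Repeatedly run a tenant-perspective check, tracking
    availability and latency over a sliding window of samples."""

    def __init__(self, check, window=100):
        self.check = check                    # callable: True on success
        self.samples = deque(maxlen=window)   # (ok, latency_seconds) pairs

    def run_once(self):
        start = time.monotonic()
        try:
            ok = bool(self.check())
        except Exception:
            ok = False                        # failures count against availability
        self.samples.append((ok, time.monotonic() - start))

    def availability(self):
        if not self.samples:
            return None
        return sum(1 for ok, _ in self.samples if ok) / len(self.samples)

    def mean_latency(self):
        good = [lat for ok, lat in self.samples if ok]
        return sum(good) / len(good) if good else None


# Illustration with a stubbed check; a real probe would call a public
# API endpoint from outside the firewall, using tenant credentials.
results = iter([True, True, False, True])
probe = ServiceProbe(lambda: next(results))
for _ in range(4):
    probe.run_once()
print(f"availability={probe.availability():.2f}")  # prints availability=0.75
```

A real framework would export the window to a dashboard and alert on thresholds; the point is only that existing Tempest-style checks could plausibly be wrapped as `check` callables and run continuously rather than as one-shot acceptance tests.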

It’s important to remember that all of the CSPs based on OpenStack have already done a lot of work to address these issues, as well as many others. Unfortunately they’ve all had to solve them in isolation, often by wrapping OpenStack mechanisms in proprietary software systems or by forking OpenStack. There has been little attention paid to these issues in the developer community. This is not a unique situation; it has historically been difficult for the “voice of the customer” to get a hearing in the OpenStack community. But things are changing. The User Committee and its associated working groups are becoming more active, and the Product Working Group (of which I’m a member) is making progress in building a roadmap by capturing requirements and transforming them into blueprints and resource commitments. It’s tempting to see this as a “pivot” from a developer-centric community to one that includes all of the stakeholders; we’ll see how it goes.

In any case, the reaction to my proposal for the creation of a Public Cloud group was uniformly positive, and most people recommended that we structure it as a full-blown “WG” under the User Committee. So I’m inviting everyone to discuss this over the next two months, so that we can submit a proposal to the User Committee at the Tokyo Summit.

Today, there is a widespread view that the future of cloud computing is hybrid: distributed applications, most developed using PaaS frameworks, incorporating various SaaS services, and continuously deployed using container technology into public and private IaaS infrastructure. Many of these deployments will be heterogeneous, using different technologies. However there is significant opportunity for innovation and advantage — especially in connectivity, agility, and security — when the same stack is used by different participants. Cisco’s Intercloud architecture provides a compelling vision using public clouds, federated partner clouds, and managed private clouds. All of this is a great reason for making sure that OpenStack can support state-of-the-art public clouds.

Two essential talks from the Vancouver OpenStack Summit

If you care about the evolution of OpenStack, there are two talks from the Vancouver Summit which you need to watch. First, Randy Bias’s State of the Stack:

Then OpenStack Is Doomed And It Is Your Fault by @termie (Andy Smith):

This is not about what’s right or wrong (although both Randy and Andy get a lot of things right). It’s more about:

  • What did we set out to do?
  • What are we actually doing?
  • What do we tell ourselves (and others) that we’re doing?
  • What should we be doing?

Hybrid is the new normal

This morning I took part in a panel discussion on the subject “Cable’s cloud forecast: More apps and infrastructure”. It was held at the annual cable industry engineering forum, the SCTE Expo, in Denver, which meant that the audience was very heterogeneous. (Far more so than most software conferences.) The moderator, Comcast CTO Tony Werner, mentioned that I was wearing my Google Glass, so of course I had to take a picture of the audience:

During the discussion I emphasized the fact that hybrid application patterns were going to be the norm, and that the biggest challenge would be adapting both business and operational decision making and governance to catch up with the speed of the cloud.

CED posted a summary of the panel discussion here. It seemed to go down pretty well.

What I’m doing at Cisco: unpacking CCS

Last December I joined Cisco, and over the last nine months I’ve frequently been asked what my role is here. I didn’t say much about it, mostly because I was still figuring things out. At this point, however, everything looks pretty stable, and I’m happy to nail my colors to the mast.

In one sentence: I’m the OpenStack architect in Cisco Cloud Services (CCS), which is a Federated, Multi-tenant, Intercloud Service. Let me unpack that mouthful, from right to left.

First, Service. I’m working in the Cisco Services organization, and CCS is first and foremost a service, built and operated by Cisco. Other parts of the company are working on cloud-related products, including our new joint initiative with Red Hat. Still others work on upstreaming OpenStack plugins and drivers for various Cisco networking products. Our group is laser-focussed on building and operating a service, not selling products.

Second, Intercloud. CCS is a cloud service, similar to that provided by other cloud service providers. It’s based on the OpenStack IaaS architecture, to which we are adding various capabilities and services to meet the Cisco Intercloud hybrid cloud vision described by Rob Lloyd and Faiyaz Shahpurwala earlier this year. We’re using Cisco’s UCS converged infrastructure together with the Application Centric Infrastructure fabric from Insieme. And we’re building a cloud application marketplace which will provide access to CCS, partner applications, and Cisco SaaS services for our partners.

Third, Multi-tenant. Originally CCS was developed to support Cisco SaaS applications such as WebEx and EnergyWise. This involved building out a private cloud service in several global data centers, with a shared backbone network, while leveraging Cisco IT services. In March, we pivoted, extending CCS to include a variety of non-Cisco partners. Some will use CCS to extend their own hybrid cloud operations; others plan to resell a “white label” CCS to their own customers. Some CCS regions will be deployed in Cisco data centers and others in the facilities of our partners, such as Telstra, but they will all be owned and operated by Cisco. Every region will be fully multi-tenant, hosting workloads from any of our partners. Virtual machines from Telstra and its customers will run side-by-side with VMs from WebEx, with full security and compliance.

Fourth, Federated. To make all of this work requires a deep integration with our partners. Hybrid operations are complex, especially in the areas of network integration, global scale, service assurance, capacity management, OSS/BSS and identity management. Cisco and its technology partners are investing heavily in delivering these capabilities, which go far beyond what a generic OpenStack cloud provides.

So we’re building a state-of-the-art cloud service. We’re using Cisco technologies, and collaborating with Cisco partners such as Red Hat and Citrix, but at the end of the day our goal is to deliver a world-class service as a “black box”. As the Cisco CTO, Padmasree Warrior, made clear 5 years ago, we are not going head to head with Amazon. You can’t simply go to Cisco and sign up for public cloud services. But almost everybody will wind up consuming CCS services through our partners, leveraging the global reach, federated integration, and network capabilities that we’ll bring to bear. And because of our business model, CCS has to deliver all of the capabilities of a public cloud, and then some.

Why are we doing this? Isn’t the global cloud business pretty much sewn up? I don’t believe so. True hybrid clouds – “Interclouds” – are challenging, and most of the complexity lies in the networking. I think that Cisco has a huge opportunity, because enterprises and service providers view us as a trusted partner who can help them to solve the problems of hybrid integration, and do so in ways that other cloud service providers cannot. In earlier blog posts, I came to the conclusion that the sweet spot for OpenStack was in supporting SaaS workloads. CCS starts with this and builds on it in a way that I think has compelling business value. So that’s what we’re doing. And it’s insanely exciting.

Cloud M&A: Irrational exuberance vs. analysis

So Eucalyptus is to be swallowed up by HP, in what remains one of the most ambiguous deals in the fashion industry of cloud computing. As I tweeted in response to several contradictory assessments:

@geoffarnold: @hui_kenneth @brianmccallion That’s why the studied ambiguity in the HP presser was so interesting. Should generate many tweets….

Probably the most extreme interpretation was from Simon Wardley (why am I not surprised), who saw it as validating his assessment of OpenStack:

I never expected HP to have the wits for this… that’s such a blinding move. Very impressed…This is a good play by HP – a lot better than relying on OpenStack… you know my opinion on OpenStack, it hasn’t changed – collective prisoner dilemma etc.

Others hailed this as a brilliant move by HP, even though there was absolutely no information provided on how (or if) HP was going to use the Eucalyptus technology. Some assumed that HP would offer both Helion and Eucalyptus; others such as Ben Kepes concluded that it was an acqui-hire that signalled the failure of Eucalyptus.

But will HP have the freedom to keep going with Eucalyptus, either as a parallel effort or as a source of AWS compatibility features for Helion? Barb Darrow explored that, and found varying opinions on whether the AWS API license that Amazon granted Eucalyptus would survive the takeover. Lydia Leong seems to think that HP has some latitude in this regard.

Personally, I think that this is likely to turn out as a pure acqui-hire. Marten is an excellent choice to lead HP’s cloud efforts, particularly after Biri’s departure and the reboot that we saw at the Atlanta Summit. Adding significant AWS compatibility to OpenStack is an idea whose time has passed. Readers of this blog will know that I was a strong supporter of this, but it would have required a community-wide commitment to limit semantic divergence from AWS. (Replicating the syntax of an API is easy; it’s the semantics that cause the problems.) I suppose it’s possible that HP might try to contribute a new Euca-based AWS compatibility project to OpenStack, but I doubt that the community would be very receptive…

PS For me, the biggest surprise is that it was HP that made this move. I half expected IBM to grab Eucalyptus and use it to transform SoftLayer into an AWS-compatible hybrid of Eucalyptus and CloudStack, rather than the current hybrid OpenStack-CloudStack. I guess I should stick to my day job.

UPDATE: Check out this GigaOm piece, including a podcast with an interview with Marten. I’m listening to it now.

UPDATE: Marten’s positioning Eucalyptus as a value-added contribution to OpenStack. And within a couple of minutes he (a) said that one of the values he brings is that he’s not afraid to point out the weaknesses in OpenStack, and (b) declined to express any criticism. Oh well. And now the previous Martin (Fink) is hand-waving about AWS as a design pattern. Sigh.

“Don’t give up on OpenStack”? I’m trying not to…

Over at Information Week, Andrew Froehlich pleads “Don’t Give Up On OpenStack“. And I’m not. But as I commented, the change that is needed conflicts with one of the deepest impulses in any community-based activity, from parent-teacher organizations to open source software projects:

The hardest thing for an open source project to establish is a way of saying “no” to contributors, whether they be individual True Believers in Open Source, or vast commercial enterprises seeking an architectural advantage over their competitors. The impulse is always to do things which increase the number of participants, and it is assumed that saying “no” will have the opposite effect. In a consensus-driven community, this is a hard issue to resolve.

(My emphasis.)

“Ubiquitous standard” vs. “open source”

Last week I wrote the following on Facebook:

I need to write a blog post about open source: specifically about those people who respond to any criticism of a project by saying, “The community is open to all – if you want to influence the project, start contributing, write code.” Because, seriously, if you want your project to be really successful, that attitude simply won’t work. It doesn’t scale. You want to have more users than implementors, and those users need a voice. (And at scale, implementors are lousy proxies for users.)

Maybe this weekend.

This is that blog post. It has been helped by the many friends who added their comments to that Facebook post, and I’ll be quoting from some of them.

Over the last few months, I’ve been trying to figure out why I’m uncomfortable with the present state of OpenStack, and whether the problem is with me or with OpenStack. Various issues have surfaced – the status of AWS APIs, the “OpenStack Core” project, the state of Neutron, the debate over Solum – and they all seem to come down to a pair of related questions: what is OpenStack trying to do, and who’s making the decisions? And my discomfort arises from my fear that, right now, the two answers are mutually incompatible.

First, the what. That’s pretty simple on the surface. We can look at the language on the OpenStack website, and listen to the Summit keynotes, and (since actions speak truer, if not louder, than words) see the kind of projects which are being accepted. OpenStack is all about building a ubiquitous cloud operating system. There is to be one code base, with an open source version of every single function. And although the system will be delivered through many channels to a wide variety of customers, the intent is that all deployed OpenStack clouds will be fundamentally interoperable; that to use the name OpenStack one must pass a black-box compatibility test. (That’s the easy bit; the more fuzzy requirement is that your implementation must be based on some minimum set of the open source code.) This interoperability is motivated by two goals: first, to avoid the emergence of (probably closed) forks, and second to enable the creation of a strong hybrid cloud marketplace, with brokers, load-bursting, and so forth.

This means that the OpenStack APIs (and, in some less well defined sense, the code) are intended to become a de facto standard. In the public cloud space, while Rackspace, HP, IBM and others will compete on price, support, and added-value services, they are all expected to offer services with binary API compatibility. This means that their customers, presumably numbering in the tens or hundreds of thousands, are all users of the OpenStack APIs. They have a huge stake in the governance and quality of those APIs, and will have their opinions about how they should evolve.

How are their voices heard? What role do they play in making the decisions?

Today, the answers are “they’re not” and “none”. Because that’s not how open source projects work. As Simon Phipps wrote, “…open source communities are not there to serve end users. They are there to serve the needs of the people who show up to collaborate.” And he went on to say,

What has never worked anywhere I have watched has been users dictating function to developers. They get the response Geoff gave in his original remarks, but what that response really means is “the people who are entitled to tell me what to do are the people who pay me or who are the target of my volunteerism — not you with your sense of entitlement who don’t even help with documentation, bug reports or FAQ editing let alone coding.”

As I see it, there are two basic problems with this thinking. First, it doesn’t scale. If you have hundreds of thousands of users, of whom a small percentage want to help, and a couple of dozen core committers, the logistics simply won’t work. (And developers are often very bad at understanding the requirements of real users.) Maybe you can use the big cloud operators and distribution vendors as proxies for the users, but open source communities can be just as intolerant of corporate muscle as they are of users who don’t get involved. Some of that can be managed — much of the resources that Simon terms “volunteerism” actually comes from corporations — but not all.

But the second problem is that the expectation of collaborator primacy is incompatible with the goal of creating a standard. The users of OpenStack have a choice, and the standard will be what they choose. If the OpenStack community wants their technology to become that standard, they must find a way to respect the needs and expectations of the users. And that, quid pro quo, means giving up some control, which will affect what the community gets to do — or, more often, not do.

This is a much bigger issue for OpenStack than for other open source projects, for a couple of reasons. Several of the most successful projects chose to implement existing de facto or de jure standards — POSIX, x86, X Windows, SQL, SMB, TCP/IP. For these projects, the users knew what to expect, there were existing (non-FOSS) alternatives, and the open source communities accepted the restrictions involved in not compromising the standards. Other projects were structured around a relatively compact functional idea — Hadoop, MongoDB, Xen, MemCache, etc. — and the target audience was relatively small: like-minded developers. OpenStack is neither compact, nor (by choice) standards-based. It has a large number of components, with a very large number of pluggable interfaces. The only comparable open source project is OpenOffice, which had a similarly ambitious mission statement and has also had its share of governance issues.

One obvious step towards addressing the problem would be to distinguish between the “what” and the “how” of OpenStack. Users are interested in the APIs that they use to launch and kill VMs, but they are unlikely to know or care about the way in which Nova uses Neutron services to set up and tear down the network interfaces for those VMs. Good software engineering principles call for a clear distinction between different kinds of interfaces in a distributed system, together with clear policies about evolution, backward compatibility, and deprecation for each type. This is not simply a matter of governance. For example, a user-facing interface may have specific requirements in areas such as security, load balancing, metering, and DNS visibility that are not applicable to intrasystem interfaces. There is also the matter of consistency. Large systems typically have multiple APIs — process management, storage, networking, security, and so forth — and it is important that the various APIs exhibit consistent semantics in matters that cut across the different domains.

The adoption of some kind of interface taxonomy and governance model seems necessary for OpenStack, so that even if community members have to relinquish some control over the user-facing interfaces (the “what”), they still have unfettered freedom with the internal implementation (the “how”). Today, however, we are a long way from that. OpenStack consists of a collection of services, each with its own API and complex interdependencies. There is no clear distinction between internal and external interfaces, and there are significant inconsistencies between the different service APIs. At the last OpenStack Summit I was appalled to read several blueprints (project proposals) which described changes to user-facing APIs without providing any kind of external use-case justification.
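As a thought experiment (nothing like this exists in OpenStack today), an interface taxonomy could start as simply as requiring every endpoint to declare its audience and stability contract, so that tooling can hold user-facing interfaces to stricter review rules than internal ones. All of the names below are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical contract metadata attached to each endpoint.
@dataclass(frozen=True)
class InterfaceContract:
    audience: str   # "external" (user-facing) or "internal"
    stability: str  # "stable", "deprecated", or "experimental"
    since: str      # version at which the contract took effect


REGISTRY = {}


def api(audience, stability, since):
    """Decorator recording an endpoint's contract in REGISTRY."""
    def wrap(fn):
        REGISTRY[fn.__name__] = InterfaceContract(audience, stability, since)
        return fn
    return wrap


@api(audience="external", stability="stable", since="1.0")
def create_server(spec):
    ...  # user-facing: changes require a use-case justification


@api(audience="internal", stability="experimental", since="2.3")
def plug_vif(port):
    ...  # internal plumbing: free to evolve


# A review tool could then apply different governance per audience:
external = [name for name, c in REGISTRY.items() if c.audience == "external"]
print(external)  # prints ['create_server']
```

The design choice being illustrated is just the separation itself: once external interfaces are machine-identifiable, a blueprint that touches one can be flagged automatically for the stricter review the post argues for.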

The present situation seems untenable. If OpenStack wants to become an industry standard for cloud computing, it will have to accept that there are multiple stakeholders involved in such a process, and that the “people who show up to collaborate” can’t simply do whatever they want. At a minimum, this will affect the governance — consistency, stability — of the user-facing interfaces; in practice it will also drive functional requirements. Without this, traditional enterprise software vendors delivering OpenStack-based products and services will have a hard time reconciling their customers’ expectations of stability with the volatility of a governance-free project. The bottom line: either the community will evolve to meet these new realities, or OpenStack will fail to meet its ambitious goals.

UPDATE: Rob Hirschfeld seems to want it both ways in his latest piece. Resolving the tension between ubiquitous success and participatory primacy doesn’t necessarily require a “benevolent dictator”, but that’s one way of getting the community to agree on a durable governance model.

Whither OpenStack: A footnote on scale

My blog post yesterday, Whither OpenStack, was already too long, but I wish I’d included some of the points from this outstanding piece by Bernard Golden on AWS vs. CSPs: Hardware Infrastructure. You should read the whole thing, but the key message is this:

Amazon, however, appears to hold the view that it is not operating an extension to a well-established industry, in which effort is best expended in marketing and sales activities pursued in an attempt to build a customer base that can be defended through brand, selective financial incentives, and personal attention. Instead, it seems to view cloud computing as a new technology platform with unique characteristics — and, in turn, has decided that leveraging established, standardized designs is inappropriate. This decision, in turn, has begotten a decision that Amazon will create its own integrated cloud environment incorporating hardware components designed to its own specifications.

As you read Bernard’s piece, think about the architecture of the software that transforms Amazon’s custom hardware into the set of services which AWS users experience. It has about as much in common with, say, OpenStack’s DevStack (or, to be fair, Eucalyptus FastStart – sorry, Mårten!) as a supertanker does with a powerboat.
In this world, you can’t start small and then “add scale”; the characteristics needed to operate at extreme scale become fundamental technical (and business) requirements that drive the systems architecture.

This is the challenge that Rackspace, HP and others face. Their core software was designed – and is continually evolved – by a community that doesn’t have those architectural requirements. This is not a criticism; it’s simply the reality of working in a world defined by things like:

  • Tempest should be able to run against any OpenStack cloud, be it a one node devstack install, a 20 node lxc cloud, or a 1000 node kvm cloud.

and this piece on OpenStack HA (read through to see all of the caveats). I know the guys at HP and Rackspace: they are smart, creative engineers, and I’m sure they can build software as good as AWS. The question is, can they do so while remaining coupled to an open source community that doesn’t have the same requirements?