This is long. For the “tl;dr” crowd: Seagate just reinvented the disk interface using Ethernet, TCP/IP, Protocol Buffers, and key-value objects. And it’s really, really cool.
But if you have a few minutes to spare…
When I first got involved in data center networking in the early 1980s, there were several competing technologies. The two leaders were Ethernet and Token Ring, and although Bob Metcalfe had invented Ethernet, his first company 3Com actually sold both. Within a couple of years, economics, obstinacy by IBM, and a patent troll had taken Token Ring out of the picture, and Ethernet ruled. It quickly evolved from its shared-media topology: in 1987 SynOptics introduced the first Ethernet Hub, and two years later Kalpana broke the mold with the first Ethernet switch. Many of us concluded that whatever the future LAN technologies might look like, they would be called Ethernet.
The history of protocol stacks roughly paralleled that of LAN technology. In the early 1980s there were many candidates – NetWare, XNS, ARCnet, NetBEUI, OSI, AppleTalk, and others, as well as TCP/IP. By the end of the decade, TCP/IP had won. Some companies rehosted their application protocols on top of TCP/IP (I’m ashamed to say that my name is on the RFCs for NetBIOS-over-TCP), but most disappeared or pivoted away, like Novell.
Over the last 20 years, we’ve seen a steady process of convergence around Ethernet and TCP/IP. (Metro Ethernet is a fascinating and unexpected example.) Fibre Channel was introduced in 1988 as a replacement for HIPPI in storage area networking. Twenty years later some companies tried to layer the FC protocols directly over Ethernet (FCoE). Most regard this as a failed experiment: although it slightly simplified cabling, the FC protocols were too inflexible to work well in a noisy LAN, and the lack of routability conflicted with data center networking practices. Instead, people started to experiment with storage protocols running over TCP/IP: iSCSI for block access, S3-like HTTP-based protocols for moving large objects around, and the perennial NFS and CIFS for file access.
One area that has so far remained untouched by this process of convergence is the connection between storage devices and computers. Even though the actual technologies have evolved – IDE, ATA, ATAPI, PATA, SCSI, ESDI, SATA, eSATA – the most common storage interconnection topologies are pretty much the same as the ones IBM introduced with the S/360 mainframe in 1964: a controller device integrated into the computer, communicating with a small number of storage devices over a private short-range interconnect. The “private” bit is important; although various techniques have been created for shared (multi-master) access to the interconnect, all were relatively expensive, and none are supported by the consumer-grade drives which are often used for scale-out storage systems.
Historically, storage servers have been constructed as “black box” turnkey systems, from the Auspex NFS servers in the 1980s to the storage arrays from vendors like EMC and NetApp. More recently, people have been constructing interesting scale-out storage services from commodity hardware, using an x86 with a tray of consumer-grade disks as a building block. However, these architectures are constrained by the single point of failure and performance bottleneck introduced by the private interconnect between CPU and disks. (One odd consequence is that it is often hard to put together an economical “proof of concept” system, because the scale-out algorithms perform poorly with a small number of nodes.)
Over the years there have been various attempts at re-inventing this pattern. Most of these are based on the idea of moving more of the processing to the disk itself, taking advantage of the fact that every disk already has a certain amount of processing capacity to do things like bad sector remapping. Up until now, these efforts have been unsuccessful because of cost or architectural mis-match. But that’s about to change.
Yesterday Seagate introduced its Kinetic Open Storage Platform, and I’m simply blown away by it. It’s a truly elegant design, “as simple as possible, but no simpler”. The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value, object-oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 22.214.171.124”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
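To make the shape of that interface concrete, here is a toy sketch of key-based CRUD plus a third-party transfer. It is not the Kinetic API: the class `SimulatedDrive`, its method names, and the `drive-a.example` style addresses are all invented for illustration, and everything lives in memory rather than going over Ethernet with Protocol Buffers.

```python
class SimulatedDrive:
    """Toy in-memory stand-in for a key-value drive (hypothetical, not Kinetic)."""

    def __init__(self, address):
        self.address = address  # made-up network name for the drive
        self._store = {}        # maps key (bytes) to value (bytes)

    def put(self, key, value):
        """Create or update the object stored under key."""
        self._store[key] = value

    def get(self, key):
        """Return the object stored under key, or None if there is none."""
        return self._store.get(key)

    def delete(self, key):
        """Remove the object under key; return True if it existed."""
        return self._store.pop(key, None) is not None

    def push(self, keys, target):
        """Third-party transfer: copy the named objects directly to another
        drive, so the data never passes through the requesting client.
        Returns the keys that were actually copied."""
        copied = []
        for key in keys:
            if key in self._store:
                target.put(key, self._store[key])
                copied.append(key)
        return copied


# A client asks drive A to send X, Y and Z to drive B. Z does not exist.
drive_a = SimulatedDrive("drive-a.example")
drive_b = SimulatedDrive("drive-b.example")
drive_a.put(b"X", b"first object")
drive_a.put(b"Y", b"second object")
copied = drive_a.push([b"X", b"Y", b"Z"], drive_b)
```

The point of the `push` call is that the client only issues the request; the bytes move from one drive to the other. In a real deployment that hop would be drive-to-drive traffic on the LAN.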
I love this design.
Don’t fall into the trap of thinking that this means we’ll see thousands upon thousands of individual smart disks on the data center LANs. That’s not the goal. (Or I don’t think it is.) EMC or NetApp can still use these drives to build big honking storage arrays, if they want to. The difference is that they have much more freedom in designing the internals of those arrays, because they don’t have to use one kind of (severely constrained) technology for one kind of traffic (disk data) and a completely different kind of technology for their internal HA traffic. They’re free to develop new kinds of internal topologies based on Ethernet, and to implement their services more efficiently using the Kinetic API.
For those vendors who are building out commodity-based scale-out storage, things are even more exciting. It becomes possible to build extremely scalable, highly available configurations using commodity Ethernet switches. And the servers used to implement the external storage service – Swift, Gluster, Ceph, NFS – are likely to change, too: CPU, RAM for caching, multiple NICs, little or no PCI, a little SSD – no moving parts. Perhaps someone will integrate one into a top-of-rack switch, to produce a very efficient dense array for cool or cold storage.
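As a rough illustration of how a scale-out service might spread objects over a rack of Ethernet-attached drives, here is a small sketch using rendezvous (highest-random-weight) hashing. This is my own choice of technique for the example, not something Seagate, Swift, Gluster, or Ceph is stated to use; the function `pick_drives`, the replica count, and the drive addresses are all made up.

```python
import hashlib


def pick_drives(key, drives, replicas=2):
    """Choose which drives hold a key using rendezvous hashing.

    Every drive gets a score derived from hashing (drive, key); the
    highest-scoring drives win. Any server can compute the same answer
    without consulting a central directory.
    """
    def score(drive):
        digest = hashlib.sha256(drive.encode() + b"/" + key).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(drives, key=score, reverse=True)[:replicas]


# Hypothetical drive addresses on the storage LAN.
drives = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]
placement = pick_drives(b"photos/cat.jpg", drives)
```

A useful property of this scheme is that removing a drive that does not hold a given key leaves that key's placement unchanged, so losing one drive only forces re-replication of the objects that actually lived on it.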
A bunch of very smart engineers at Seagate have developed this system (that’s Jim Hughes, allowing me to touch a prototype unit), but they know it won’t be accepted if it’s proprietary. So they’re opening up the protocol, the clients, and a simulator for design verification. If everything works out, this will become the new standard interface for disk drives. (And, well, any kind of mass storage.)
This is going to be fun. “Disruptive” seems inadequate.