Announcing AutoClone: zero-copy data distribution for Docker

Efficient automated data distribution for containers

Containers are a great way to isolate application run-time dependencies and state from the host platform. Compared to virtual machines, they drastically simplify application deployment and use resources more efficiently. As a result, containers are quickly becoming standard building blocks for scale-out applications and microservices-based architectures. However, this “new stack” approach reintroduces many traditional computer science problems.

One of the more vexing classes of problems involves coordinated state between containers. Here are a few frequently asked questions:

How can I atomically update configuration for distributed containers?
How can I distribute large data sets to distributed containers?
How can I revert container state to a previous point in time?

Some coordinated-state problems, such as service discovery, can be solved with a distributed key-value store like etcd. However, operating a clustered system adds a new level of complexity, and many distributed solutions have trouble with large datasets and complex transactional updates. For many applications, there is a better approach: thin clones.

What is a thin clone?

A thin clone is a virtual disk that records changes relative to a basis disk. If you are new-school, think of a git branch. If you’re old-school, think transparency paper on an overhead projector.

A thinly provisioned clone offers efficient resource utilization and low provisioning latency: it consumes no additional storage until changes are written, and creation takes milliseconds. Clones also let an application make isolated changes that can be reverted instantly. And, in the case of Blockbridge, these changes are also cryptographically isolated from the basis itself!
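To make the copy-on-write idea concrete, here is a toy, in-memory model of a block-level thin clone. It is a sketch under simplifying assumptions, not Blockbridge's implementation: `BasisDisk`, `ThinClone`, and `BLOCK_SIZE` are hypothetical names, writes are whole blocks, and cryptographic isolation is not modeled. It shows why creation is cheap (the delta starts empty), why overhead grows only with changes, and why revert is instant (the delta is simply discarded).

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


@dataclass
class BasisDisk:
    """Read-only basis: block number -> contents. Unwritten blocks read as zeros."""
    blocks: dict[int, bytes]

    def read(self, n: int) -> bytes:
        return self.blocks.get(n, bytes(BLOCK_SIZE))


@dataclass
class ThinClone:
    """Records only blocks written since creation; other reads fall through to the basis."""
    basis: BasisDisk
    delta: dict[int, bytes] = field(default_factory=dict)  # empty at creation: nothing copied

    def read(self, n: int) -> bytes:
        # A block written through the clone shadows the basis copy.
        if n in self.delta:
            return self.delta[n]
        return self.basis.read(n)

    def write(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only accepts whole-block writes")
        # Copy-on-write: the change lands in the clone's delta, never in the basis.
        self.delta[n] = data

    def revert(self) -> None:
        # Discarding the delta returns the clone to the basis state; nothing is copied back.
        self.delta.clear()

    def used_bytes(self) -> int:
        # Space consumed by the clone itself: only the blocks it has changed.
        return len(self.delta) * BLOCK_SIZE


if __name__ == "__main__":
    basis = BasisDisk({0: b"A" * BLOCK_SIZE})
    clone = ThinClone(basis)  # creation copies no blocks
    clone.write(0, b"B" * BLOCK_SIZE)
    print(clone.read(0)[:1], basis.read(0)[:1], clone.used_bytes())  # b'B' b'A' 4096
    clone.revert()
    print(clone.read(0)[:1], clone.used_bytes())  # b'A' 0
```

The git-branch analogy maps directly: the basis plays the role of the parent commit, and the delta holds the uncommitted changes on the branch.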

An application example

Suppose you have a producer/consumer application in which the producer renders a database of inventory or market data at scheduled intervals. With clones, you can easily implement a publish/subscribe architecture that distributes this data to your web-tier applications: at each interval, just restart your web-application container to pick up the most recently published data. Because the data is cloned rather than copied, it doesn’t matter whether it sits in a filesystem, a flat file, or a database. Dataset size isn’t a factor either: a MiB or a PiB, it’s all the same.
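The publish/subscribe flow can be sketched at the data level with Python's `collections.ChainMap`, which behaves much like a thin clone for key-value data: writes go to the first (initially empty) mapping, while reads fall through to the mappings behind it. `Publisher` and `WebApp` are hypothetical names for illustration; a real deployment would clone volumes through the Docker volume driver rather than dictionaries.

```python
from __future__ import annotations

from collections import ChainMap
from collections.abc import Mapping
from types import MappingProxyType


class Publisher:
    """Hypothetical producer: each publish freezes an immutable snapshot."""

    def __init__(self) -> None:
        self._snapshots: list[Mapping[str, str]] = []

    def publish(self, dataset: dict[str, str]) -> int:
        # Copy, then wrap read-only, so later producer edits cannot alter a published basis.
        self._snapshots.append(MappingProxyType(dict(dataset)))
        return len(self._snapshots) - 1

    def latest(self) -> tuple[int, Mapping[str, str]]:
        if not self._snapshots:
            raise LookupError("nothing has been published yet")
        version = len(self._snapshots) - 1
        return version, self._snapshots[version]


class WebApp:
    """Hypothetical consumer: every (re)start attaches a fresh clone of the newest snapshot."""

    def __init__(self, publisher: Publisher) -> None:
        self.publisher = publisher
        self.version = -1
        self.view = ChainMap()
        self.restart()

    def restart(self) -> None:
        self.version, basis = self.publisher.latest()
        # New empty write layer over the shared basis; no published data is copied.
        self.view = ChainMap({}, basis)
```

Because each restart builds a new empty layer over a shared, read-only snapshot, the cost of a restart does not depend on how much data was published, and any scratch changes the web tier made are discarded automatically.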

Introducing AutoClone

AutoClone is the latest addition to our “Storage As A Container” portfolio, and it is now integrated into our open source Docker volume driver. Clones are administered like any other persistent volume, natively from the Docker runtime. No additional software is needed.

Stay tuned: video demonstrations with OpenStack, Machine, Swarm, and Compose are on the way.