Better support for bare metal server persistent storage in swarm mode

I need to run various databases using local NVMe SSD storage (for performance) in swarm mode. Support for bare metal volumes for docker services is very minimal in 1.12. I have been thinking about how Docker could support host volumes better in swarm mode, and I have a few suggestions:

  1. Add support for creating a named volume (call these host volumes) on a specified node at a specified path using docker volume create (e.g., host=<hostname>,path=<hostpath>; see the sketch after this list).

  2. Add support for attaching environment variables (maybe as labels? or an env file stored in the volume root?) to host volumes. Maybe just look for a .dockerenvs file in the volume root whose entries are added automatically to any service that mounts the host volume.

  3. Add support to the swarm scheduler so that new tasks for a service that mounts a host volume run only on the node that hosts the volume. If the service mounts more than one host volume and those volumes are allocated on different nodes, the tasks are not run but fail with an error status.

  4. Services add the environment variables attached to any mounted host volume on task startup (so the container can configure itself using these environment variables). I want to standardize how environment variables are attached to host volumes so Docker Official Images (like mysql) can be configured via the host volume rather than on the docker service create command line.

  5. Services that use host volumes only support replicas=1 (i.e., a single-task service scheduled on the node containing the host volume).

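To make suggestion 1 concrete, here is a rough sketch of what pinning a named volume to a node might look like. This syntax is part of the proposal, not something docker volume create supports today; the option names, node name, and path are illustrative:

```
# Proposed (hypothetical) syntax: create a host volume bound to a specific
# node and local NVMe path; the swarm scheduler would later use this binding
docker volume create \
  --opt host=worker1 \
  --opt path=/mnt/nvme0/mysql-a \
  mysql-data-a
```
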
My use case for host volumes is to support MySQL Replication using local NVMe SSD storage. I want to be able to create a swarm of at least 3 workers and run an instance of the database on each of the 3 workers. The database replication topology would be defined by attaching different values for the environment variables that configure my MySQL image to each of the 3 host volumes (so 1 host volume starts as the replication master and the other 2 host volumes start as replication slaves).
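
As an illustration of suggestions 2 and 4 applied to this use case, the three host volumes might each carry a small .dockerenvs file like the ones below. The variable names (MYSQL_SERVER_ID, MYSQL_REPLICATION_ROLE, MYSQL_REPLICATION_MASTER_HOST) are hypothetical and would be consumed by my custom MySQL image, not by the official image as it exists today:

```
# .dockerenvs in the volume root on worker1 (the replication master)
MYSQL_SERVER_ID=1
MYSQL_REPLICATION_ROLE=master

# .dockerenvs in the volume root on worker2 (a replication slave)
MYSQL_SERVER_ID=2
MYSQL_REPLICATION_ROLE=slave
MYSQL_REPLICATION_MASTER_HOST=mysql-a

# .dockerenvs in the volume root on worker3 (a replication slave)
MYSQL_SERVER_ID=3
MYSQL_REPLICATION_ROLE=slave
MYSQL_REPLICATION_MASTER_HOST=mysql-a
```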

I realize that I can achieve most of this today using service constraints to schedule a single task on a specific node and a custom image that configures itself from an env file in the mounted host directory, so these new features aren't strictly necessary (a sketch of that workaround follows below).
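
For reference, a minimal sketch of that workaround with the CLI that exists in 1.12 (node name, paths, and image name are illustrative):

```
# Pin a single replica to the node that owns the local NVMe directory and
# bind-mount that directory into the container; the custom image is assumed
# to read its replication settings from an env file under /var/lib/mysql
docker service create \
  --name mysql-a \
  --replicas 1 \
  --constraint 'node.hostname == worker1' \
  --mount type=bind,source=/mnt/nvme0/mysql-a,target=/var/lib/mysql \
  my-custom-mysql:latest
```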

But I like the idea of attaching the initial container configuration data to a volume when I create the volume rather than when I define the service. I also like the idea that the swarm scheduler uses the node that hosts the volume to make the scheduling decision, without the command that creates the service having to specify which node the service should run on. This would scale better in production because it allows the local SSD storage to be formatted and allocated to volumes when a new node is provisioned rather than when the services are actually deployed on the cluster.