So, let me get this straight

I have been hearing about Docker. I am not sure I get it yet. Can I confirm a few initial impressions?

  • Most people run everything as root in a container.
  • Templates for docker containers are called “images”
  • People who want to run RoR in a container start with an image that is NAMED after ruby and the version they have
  • No one runs stuff like rvm in a container
  • The problem of credentials hasn’t really been solved
  • A container is somehow lighter than a VM, but in ways that no one really talks about ever

Am I off to a good start? Or am I having lots of crazy misconceptions?

I guess perhaps I could start by trying to forget everything I know, but I have been programming for 35 years, and I value my experience too much for that. Perhaps I am just too old for docker, and I should go get a different career.

  • Most people run everything as root in a container.

“Most people” is probably a bad way to start that list. It’s not best practice to run apps as root, regardless of whether the app is running inside or outside of a container.

Containers are akin to FreeBSD jails or Solaris zones, etc.

If I’m running PostgreSQL, I run it as the postgres user in Docker containers and outside of them. If I’m running RabbitMQ, I run it as the rabbitmq user in Docker containers and outside of them. Nothing changes because of docker. People who run apps as root in Docker are likely to be the same people who run apps as root outside of Docker.

  • Templates for docker containers are called “images”

Given how long you’ve been programming, think of the Golden Images of 20 years ago, where you’d burn an app to a CD or create an R/O floppy for an app. It’s the same thing. Dockerfiles are like shell scripts that tell docker how to make your golden image for your app. That image is then immutable. If you want to release a new version, you create and run a new image.
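
As a rough sketch of that workflow (the myapp name and tags are just placeholders, not something from this thread):

docker build -t myapp:1.0 .   # build an immutable image for release 1.0
# edit your code, then build a brand-new image rather than patching the old one
docker build -t myapp:1.1 .
docker run myapp:1.1          # containers are started from whichever image you choose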

  • People who want to run RoR in a container start with an image that is NAMED after ruby and the version they have

Usually the image/container is the app itself. Your ruby app called foo is run from an image called foo that may extend an OS-level image that has Ruby/Rails installed.
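
A rough sketch of what that might look like (the ruby:2.3 tag, the file names, and the commands are assumptions for illustration):

# the app image "foo" extends an official ruby base image
FROM ruby:2.3
WORKDIR /app
# install the app's gems first so this layer can be cached between code changes
COPY Gemfile Gemfile.lock ./
RUN bundle install
# then copy in the application code itself
COPY . .
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]

Building it with docker build -t foo . gives you the foo image described above.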

  • No one runs stuff like rvm in a container

I’m not a ruby person, so I can’t say for sure. I don’t know what rvm is or why one would run it.

  • The problem of credentials hasn’t really been solved

Which problem of credentials? Baking credentials into images or storing them on disk is generally a bad idea. Systems like Hashicorp’s Vault are a really keen way to go when you’re dealing with Dockerized applications, since you can ask for credential leases and the authentication information is only stored in your app’s runtime, not committed to a filesystem or image.
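
A hedged sketch of that idea (the Vault secret path, field name, and image name are placeholders, and this assumes the vault CLI is already configured on the host):

# fetch a short-lived credential at launch time...
DB_PASSWORD=$(vault read -field=password secret/myapp/database)
# ...and hand it to the container as an environment variable, so it lives only
# in the running container, never in the image or a Dockerfile
docker run -e DB_PASSWORD="$DB_PASSWORD" myapp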

  • A container is somehow lighter than a VM, but in ways that no one really talks about ever

Apps running in containers are isolated by the system kernel to their own process space. They are not running inside a virtual server running inside a physical server. Take these terrible ascii representations as examples:

  Application
----------------
Operating System
----------------
 Virtual Machine
----------------
Operating System
----------------
Physical Machine

vs

[ Application ]
----------------
Operating System
----------------
Physical Machine

Where [ Application ] is the containerized application that appears to be running in its own operating system, but it is really just the OS kernel presenting that view to the application. The kernel provides isolation for the process space and the filesystem in a similar way to a VM, but without the overhead of a VM. You can run your app in an isolated way on hardware without having to run your app in an OS on a VM that’s running in an OS on a real machine.
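
One quick way to see the kernel sharing for yourself (a hedged example, assuming a Linux host and the ubuntu:16.04 image):

# both commands report the same kernel version, because the container has no kernel of its own
uname -r
docker run --rm ubuntu:16.04 uname -r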

FWIW I read your post as if you were either angry or frustrated with the technology stack, which is why I replied.

There are lots and lots of good resources that explain these concepts. It might be good to start with the basis of the technology that Docker uses in Linux, which is based upon LXC. I’d then look for blog posts or intro articles on Docker and not worry about what “everyone else” is doing. The best practices regarding security, etc., don’t change because of Docker.

I’m going to answer several of these questions out of order. @gavinmroy has a good answer too.

Docker containers share the host system’s kernel and all hardware access (VMs each run their own separate kernel and have to emulate hardware). Most typical Docker containers run a single process, not a full Linux distribution’s init system and the pile of associated daemons.

There is the problem where applications have dependencies, but different applications have different and possibly conflicting dependencies. I like to think of Docker as a convenient way of packaging single tasks as parts of applications that include all of the application code and their dependencies. Then if you have two components but one only runs on CentOS 6 and one absolutely requires Ubuntu 16.04, you can run each in their own local container and connect them together.
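
A hedged sketch of that scenario (the image names and the network name are made up for illustration):

# each component ships with its own distro userland, but both share the host kernel
docker network create appnet
docker run -d --net appnet --name legacy my-centos6-component
docker run -d --net appnet --name modern my-ubuntu1604-component
# "modern" can now reach "legacy" by name over the appnet network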

There are a couple of “styles” of Docker containers. Three I’m used to seeing:

  1. A Docker image contains a complete, packaged application, including all code required to run it; if you change the code you need to rebuild the image (which is pretty quick typically)
  2. A Docker image only contains an application runtime, and it depends on the user providing the actual application via the docker run -v option (see the sketch after this list)
  3. A Docker image does contain an init system (frequently supervisord, rarely systemd) and runs multiple processes, most often including an ssh daemon and frequently a cron daemon

IMHO the first style is the best, but browsing the forum, I see many uses of the other two.
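
For the second style, a rough sketch (the image tag, paths, and script name are assumptions):

# the image provides only the runtime; the application code is mounted in at run time
docker run --rm -it -v "$PWD":/usr/src/app -w /usr/src/app ruby:2.3 ruby app.rb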

An image is in effect a binary snapshot of what you’re trying to run. There are more pieces than that, of course. You can start with an ubuntu:16.04 image, create a second image that includes that plus a standard set of libraries, and then create a third image, based on that one, that includes your actual application code. You can then run individual container instances based on any or all of these.
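
A hedged illustration of that layering (the image names and packages are placeholders):

# Dockerfile for the "base plus libraries" image, built as, say, mycompany/base
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y libxml2 libyaml-0-2

# Dockerfile for the application image, built on top of the image above
FROM mycompany/base
COPY myapp /usr/local/bin/myapp
CMD ["myapp"]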

Docker includes a native docker build command which includes a shell-script-like system for building your own images. That can look like this (my background is all Python and not Ruby):

FROM ubuntu:16.04
# update the package index and pass -y so the build runs non-interactively
RUN apt-get update && apt-get install -y python2.7 python-pip
# copy the pre-built wheel into the image and install it
COPY myapp.whl /
RUN pip install /myapp.whl
# the default command run when a container is started from this image
CMD ["myapp"]

Put the myapp.whl file in the same directory, run docker build -t myapp . to build the image, and then docker run --rm -it myapp to run it.

Running everything as root is the easiest thing to do. It’s less bad than it sounds, since it’s difficult to escape the container: you can’t steal the host’s root password, launch more network services, or change kernel settings without explicitly being given permission.

It’s straightforward to RUN adduser and USER notroot in a Dockerfile, but one-off containers frequently don’t bother.
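
That looks roughly like this in a Dockerfile (the notroot name is just an example):

# create an unprivileged user and switch to it; everything after USER runs as notroot
RUN adduser --disabled-password --gecos "" notroot
USER notroot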

In my experience there are a lot of Docker issues that are best solved by a step before the docker build runs. Most “credentials” problems fall into this space. (My specific convention is that every image has a setup.sh script which, when run in an empty directory, produces the Dockerfile and any required files; so if you needed to git pull or curl with fixed basic auth, that setup script is the place to do it without causing the hostname/username/password to be baked into docker history forever more.)
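
A hedged sketch of that convention (the URL, token variable, and file names are all invented for illustration):

#!/bin/sh
# setup.sh: run in an empty directory, produce everything "docker build" needs
set -e
# fetch private material using a credential that lives only on the build host,
# so it never ends up in the Dockerfile, the image, or "docker history"
curl -u "builduser:$BUILD_TOKEN" -o app.tar.gz https://example.com/releases/app.tar.gz
cp ~/docker/templates/Dockerfile .
docker build -t myapp .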

I’m a ruby user; I’ve used both RVM and rbenv on my computers. They both aim to solve convoluted dependencies and multiple projects on the same machine.
By the nature of ‘containers’, if you use one process (or at least one project) per container, you have no use for multiple ruby versions. The great thing about docker is that it does this not just for ruby versions, but for any other type of software, and even for the raw code itself.
An example: I build one docker image with ruby, my app server (Passenger), and my code for a web app. When I want to launch that app, I simply launch that container and easily inject secret keys, environment variables, or whatever at run time.

Why is this cool? My staging, testing, and development image is literally the same container (operating system and code), with the variable parts (secrets, RAILS_ENV, PASSENGER_ENV, etc.) injected at runtime. That means you should have far fewer things differing between builds and runs.
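
As a rough illustration (the image name and variable values are placeholders):

# identical image, different behaviour per environment
docker run -e RAILS_ENV=staging    -e SECRET_KEY_BASE="$STAGING_KEY"    mywebapp
docker run -e RAILS_ENV=production -e SECRET_KEY_BASE="$PRODUCTION_KEY" mywebapp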

I think lots of these are best solved AFTER the docker build runs. When you actually RUN the container, you can have a different env-file for each environment and use --env-file=some_file on each run. There you can keep non-version-controlled secrets and other configs. :slight_smile:
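
A hedged sketch of that approach (the file name, keys, and image name are made up):

# staging.env is a plain KEY=value file kept out of version control, e.g.
#   RAILS_ENV=staging
#   DATABASE_PASSWORD=example-password
docker run --env-file=staging.env myapp
docker run --env-file=production.env myapp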

I guess I meant “run their app” rather than “run everything”, and I guess I should not have watched the tutorials, because they seem to run their apps as root? Is there a tutorial you can recommend that runs the app as a user other than root?

I guess it seems a lot like the software world already has a lot of definitions of the word “image” and it’s a pet peeve of mine that we use the same word for many things and also sometimes many words for the same thing.

I had not read about Hashicorp’s Vault. Thanks. I will look into it.

I guess I am confused now about what is an image and what is a container. If I have an image that has ruby in it and a Dockerfile that installs a bunch of gems, ruby is in the image, but not the gems? Does that mean that different containers using that image share the ruby but not the gems? I guess all of that stuff goes in the “Application” layer in your ascii diagrams? How about if I install a shell? I guess that is in the “Application” layer?

I am still confused about what you are putting in the application and in the OS. If I do a docker run with an image that is for Ubuntu 16.04, and then do another docker run with an image for CentOS, then I guess I have 2 of what you are including in the OS. If I do docker run 2x with Ubuntu 16.04, then does it share the OS? And if the Dockerfile has some RUN directives, are those also shared? Or just the stuff that is in the ‘image’? Also, I am on a Mac, using the new mac app that runs docker, so I guess that is a layer or two added to your diagram. If I run df in a container, is that disk space shared or separate? If I run free, is that memory shared or separate?

Is there some example in which someone has created a VirtualBox VM and a Docker Container that do the same thing and shown the difference in resource usage? What % savings is expected/normal/reasonable for some example type of app?

I guess I wouldn’t say I’m “angry”, but I am having a hard time figuring out how to use Docker, and I would like to. I have been using Vagrant with VirtualBox, but our environment no longer works on my vagrant VMs due to not enough disk space on a dynamic virtual disk (because I used a box that had, in my opinion, an anemic disk size specified, and it seems like most/all published boxes have the same problem). Before solving that problem, I thought I would look into Docker.

Thanks for taking the time!

I guess I am still trying to answer the main question: If I currently have 2 VMs in VirtualBox on Mac that each run a dev env for a complicated RoR app stack (RoR, memcache, spring, nginx, mysql, sphinx, etc), how likely is it that I will save anything by running 2 containers in the Mac Docker app? I guess also the side question that has come up is that I currently run one dev env directly on my mac, and I have the Ubuntu VMs configured very similarly to my mac environment, and that is very nice, so I would like the 2 virtual environments to be such that I could have them very similar to the env on the mac. I use the 2 virtual environments to be able to work on more than one thing at a time, or to run tests while doing stuff on one of the other environments.

Hi Nroose!

For the root user vs non-root user, you can refer to the documentation (I have linked directly to the part about user).

But I agree with Gavin, this has nothing to do with Docker. Many tutorials online will show you how to install MySQL, ActiveMQ or I don’t know what, and will do so with the root user. Running a process as root or not is a general security concern and is not typically addressed in tutorials. However, tutorials should probably point out further reading on security topics related to their application, and we would have far fewer security issues.

I think of docker images as I think of VM images. That might be an imperfect way of picturing it, but I’m fine with it so far. So: I can use VM images to instantiate virtual machines and I can use Docker images to instantiate containers. That’s really all there is to it. From a Docker image, you can start containers. Just experiment with a public image, add your stuff to it (applications, new shells, rvm, you name it), and create a new image from it.

I’m not sure how one would compare containers to VMs in terms of an absolute % of space or memory saved. I sure would be wary of anyone saying “you will save exactly X MB of memory”. How would you compare running 4 bare-metal servers with running those same servers as virtual machines? Savings will depend on the distro you are running, the amount of memory allocated, whether that memory is fixed or dynamic, etc. Comparing VMs to Docker has similar issues, and a clear line can’t be drawn. Try running 500 Ruby on Rails VMs on your machine; it will probably fail. Run 500 RoR containers instead.

Basically, you will save space if there is any overlap at all between the OS images of the 2 VMs. Docker repositories (i.e., disk images) are stored in layers, and common layers are shared between repositories with common ancestry. So if you build two distinct repositories with the same base (e.g., “FROM ubuntu:16.04”), they will share at least those layers. If they are essentially the same OS image, even more is shared.
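
To make that concrete (names are placeholders): two images like these share the ubuntu:16.04 layers on disk, so the base is stored only once:

# app-a/Dockerfile
FROM ubuntu:16.04
COPY a /usr/local/bin/a

# app-b/Dockerfile
FROM ubuntu:16.04
COPY b /usr/local/bin/b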

You’ll also save performance-wise since everything executes within the bare-metal kernel and CPU rather than having to negotiate virtualized hardware and a hypervisor.

Each container has its own filesystem by default. When something is installed in the container, it is stored and destroyed with that container (you can change this). However, a Dockerfile actually builds that ‘stuff’, like gems, into IMAGE layers. You can create a shared filesystem for containers so that multiple containers share certain areas of a file system. Here is a great tutorial for data containers.
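
A hedged sketch of that data-container pattern (the names are invented):

# a "data container" exposing a volume...
docker create -v /shared-data --name datastore ubuntu:16.04 /bin/true
# ...that several app containers can mount and share
docker run -d --volumes-from datastore --name app1 myapp
docker run -d --volumes-from datastore --name app2 myapp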

Can I recommend you start with this tutorial on docker-compose? It shows you a use case for Rails and Docker. It’s really easy, and for complex environments like yours it’s amazing. It will also enable you to do your test/staging/even dev on the same or VERY similar environments, and easily. Here is a docker-compose example with someone using RoR, postgres, memcached, and redis. On a production machine each image can be shared by many containers, so you could have many rails apps hooked into one memcached/redis. Here’s the cool part though: when you have your apps container’ed up, it’s so easy to set up and tear down these complex environments just to do simple testing and dev. Just try it and get back to us.
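
As a rough sketch of what such a docker-compose.yml might look like (the service names, images, and ports are assumptions, not taken from the linked tutorial):

version: '2'
services:
  web:
    build: .
    command: bundle exec rails server -b 0.0.0.0 -p 3000
    ports:
      - "3000:3000"
    environment:
      - RAILS_ENV=development
    depends_on:
      - db
      - memcached
      - redis
  db:
    image: postgres:9.5
  memcached:
    image: memcached
  redis:
    image: redis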

OS containers share the host operating system’s kernel. They may also share the same image (in your case, Ubuntu 16.04), but once created they are not polluting each other’s namespaces, and they share resources only in the sense that they share the same computer’s resources. You can also limit resources per container, so each of those two containers could be given different memory and CPU limits. Here is a really interesting article on OS containers and application containers; it will answer almost all the questions you asked about containers, images, sharing, etc.
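
A hedged example of per-container limits (the values and names are placeholders):

# give each container its own memory and CPU allowance
docker run -d --memory=512m --cpu-shares=512  --name ubuntu-env my-ubuntu-image
docker run -d --memory=1g   --cpu-shares=1024 --name centos-env my-centos-image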

Thank you all for taking the time to reply! I will look into this further over the next few weeks. I hope that I will get performance/resource benefits, but I am more confident that I will benefit from being able to reproduce the containers I want.


I am having a problem after running as root and deploying to AWS. Can you help me with that?