Docker image optimization

Hi, I want to create an optimized Docker image (on Linux, for example). At the same time, I want to write an algorithm that will:

  1. Retrieve information from the system and the Docker image about what the program requires and what the system can provide.
  2. Apply patches conditionally based on that information. (It will not download packages from scratch, but will use those already on the operating system, so as not to duplicate them.)
  3. Create an optimized Docker image.

What do you think about this idea? I am new to this and need guidance and advice. I made this preliminary plan for myself and would like opinions. Also, is creating such an algorithm an advanced-level task?

You mean you don’t want to install nginx in a container image if nginx is already installed on the host?

What I mean is that when I install nginx in a container image, I don’t want to additionally install libraries or packages from the repository when they are already installed on the host. (It will not download packages and libraries from the repository, but will use those on the host, so as not to duplicate them.)

Even if it were possible, that is not a good idea. A container has to contain all of its dependencies. Not to mention that the required libraries can be different on each distro. For example, Alpine (which uses musl libc) is not binary-compatible with Debian (which uses glibc).

We use containers to contain everything. And the only kernel namespace that you can’t disable in a container is the mount namespace, so all containers will have their own filesystem and it would be pretty hard to individually mount every single library one by one.

So this is not something that should be “optimized” in an image. Make sure you don’t install what is not necessary. Use multi-stage builds and copy the result to a final image. If possible, use the scratch image with a single binary in it, and so on.

By the way, if you are using Docker Desktop, as the category you chose suggests, the host where Docker is running is not the same as the one where you run the docker client. So you would be trying to mount libraries from outside a virtual machine into a virtual machine, which is even worse.

OK, but this is my college project, so I have to complete it :wink:
This is the exact topic:
In the practice of developing application images and containers, default settings are used to enable the application image to interoperate with the widest possible set of servers. With the popularization of containerization, this phenomenon has resulted in increased disk and memory requirements. This is a motivation to develop a system that would generate application containers based on adapting the software package to programs already available in the system that meet the dependencies of the generated container.
The work will present the development of a system enabling the customization of the created container, maximizing the use of dependencies existing in the system. This will enable the creation of customized containers that can be deployed in multi-machine environments.
Now I am not sure this topic is a good fit for me.

Containers are used to isolate an application and supply all necessary dependencies in the right version.

When running two apps, you don’t have the issue that they might require different versions of the same library. Plus, upgrading and downgrading is usually seamless.

If you want it smaller, you could just install the application the regular way on Linux.

It is still not a good idea and probably impossible to do it correctly. If it is possible somehow, you will create a new container engine, and won’t use Docker. If you manage to do that, you will probably have a company in the end which will be bought by Docker Inc eventually :smiley:

Maybe the task is not clear enough and it made sense in the mind of the person who gave it to you, but I don’t think we can help to solve it.

I understand, but can you recommend a topic that would be interesting to implement and related to optimization? :slight_smile:

There are many articles about optimizing the size of Docker images, since the size of an image affects how long it takes to pull or push it, which in turn makes a CI/CD pipeline take longer, which could cost more money in multiple senses. So I don’t think I could come up with something that nobody else has already thought about and solved. You can use multi-stage builds, order the instructions properly to benefit from the build cache, use distroless images, or even start from scratch and install a single binary that includes all of its requirements.

Maybe you could write an algorithm that recognizes incorrectly ordered instructions, and build a command-line tool or an IDE extension that suggests changes to the Dockerfile. That would be really hard to do as well: a command installed from a repository could use a variable in a way that isn’t obvious to the scanner, so the definition of that variable couldn’t be moved after the command. But as long as the scanner only suggests changes and doesn’t do anything automatically, it wouldn’t do any harm.
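
To make that a bit more concrete, here is a minimal sketch of such a scanner in Python. It is only an illustration under my own assumptions: the names `parse_instructions`, `is_broad_copy` and `suggest_reordering` and the list of install commands are made up for this example, it only understands the shell form of COPY/ADD, and it covers a single ordering problem (a broad `COPY . .` placed before a package installation, which invalidates the cached install layer whenever any source file changes). It only prints suggestions and never edits the Dockerfile.

```python
import re

# Assumption: a short, hypothetical list of commands that usually install dependencies.
INSTALL_PATTERN = re.compile(
    r"\b(apt-get install|apk add|yum install|dnf install|pip install|npm ci|npm install)\b"
)


def parse_instructions(dockerfile_text):
    """Return (line_number, KEYWORD, arguments) tuples, joining backslash continuations."""
    instructions = []
    buffer, start = "", None
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if start is None:
            start = number  # remember where this instruction begins
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        buffer += line
        parts = buffer.split(None, 1)
        args = parts[1] if len(parts) > 1 else ""
        instructions.append((start, parts[0].upper(), args))
        buffer, start = "", None
    return instructions


def is_broad_copy(args):
    """True if a shell-form COPY/ADD copies the whole build context ('.' or './')."""
    words = [w for w in args.split() if not w.startswith("--")]
    sources = words[:-1]  # the last word is the destination
    return any(src in (".", "./") for src in sources)


def suggest_reordering(dockerfile_text):
    """Return human-readable suggestions; nothing is changed automatically."""
    suggestions = []
    broad_copy_line = None
    for number, keyword, args in parse_instructions(dockerfile_text):
        if keyword == "FROM":
            broad_copy_line = None  # a new build stage starts with a clean slate
        elif keyword in ("COPY", "ADD") and is_broad_copy(args):
            if broad_copy_line is None:
                broad_copy_line = number
        elif keyword == "RUN" and broad_copy_line is not None and INSTALL_PATTERN.search(args):
            suggestions.append(
                f"line {number}: this RUN installs packages after the broad COPY/ADD "
                f"on line {broad_copy_line}; consider installing first, or copying only "
                f"the dependency manifest before it, so the install layer stays cached"
            )
    return suggestions


if __name__ == "__main__":
    sample = """FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
"""
    for suggestion in suggest_reordering(sample):
        print(suggestion)
```

The variable problem mentioned above is exactly what a scanner like this cannot see: it has no idea whether the install command depends on an ENV or ARG defined in between, which is why it should stay a suggestion tool.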

Or you could try to detect when someone removes something from the filesystem in a separate RUN instruction, but that would be hard too when the deletion happens somewhere in a script or in a binary. Maybe it would be possible by monitoring the system calls while building the image. Unless the process has changed since I last checked, a RUN instruction starts a new container and executes the commands in that container. So if you could monitor the system calls in that container, you could notice that nothing happened in it except that some files were removed. That means those files were just hidden from the next layer, which doesn’t make the image larger, but doesn’t make it smaller either. If people could be notified about that, they could move the deletion to the previous layer where the data was generated and remove it right away. An example is when you run apt-get install, which produces a cache, and you remove the cache in a separate RUN instruction instead of removing it in the same instruction where apt-get install ran.
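
A much simpler starting point than system call monitoring would be a static check of the Dockerfile text. The sketch below, in Python, flags RUN instructions that consist only of `rm` or `rmdir` calls. It rests on my own assumptions: the helper names (`run_instructions`, `is_delete_only`, `find_delete_only_runs`) are invented for this example, only the shell form of RUN is handled, and deletions hidden inside scripts or binaries are not detected at all, so it does not replace the system call idea.

```python
import re
import shlex

# Assumption: only these commands count as "pure deletion" in this sketch.
DELETE_COMMANDS = {"rm", "rmdir"}


def run_instructions(dockerfile_text):
    """Yield (line_number, shell_command) for each RUN, joining backslash continuations."""
    buffer, start = "", None
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if start is None:
            start = number
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        buffer += line
        parts = buffer.split(None, 1)
        if parts[0].upper() == "RUN" and len(parts) > 1:
            yield start, parts[1]
        buffer, start = "", None


def is_delete_only(command):
    """True if every '&&'- or ';'-separated step of the command is an rm/rmdir call."""
    steps = [step.strip() for step in re.split(r"&&|;", command) if step.strip()]
    if not steps:
        return False
    for step in steps:
        try:
            words = shlex.split(step)
        except ValueError:
            return False  # unbalanced quotes: do not guess
        if not words or words[0] not in DELETE_COMMANDS:
            return False
    return True


def find_delete_only_runs(dockerfile_text):
    """Return the line numbers of RUN instructions that do nothing but delete files."""
    return [
        number
        for number, command in run_instructions(dockerfile_text)
        if is_delete_only(command)
    ]


if __name__ == "__main__":
    sample = """FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
"""
    for number in find_delete_only_runs(sample):
        print(f"line {number}: this RUN only deletes files; the data stays in an "
              f"earlier layer, so consider deleting it in the RUN that created it")
```

A layer that only deletes files cannot shrink the image, so every hit is worth at least a warning, even when the tool cannot tell which earlier instruction produced the data.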

I’m not sure these ideas are enough for you as a project, but they are probably a little easier than the original idea.