Way to reduce Docker image size

After adding the two highlighted tar extract commands after `&&`, the image size increased from 8.6 GB to 10.5 GB. Is there any way to reduce it? If yes, please suggest.

```
# Get JDKs and place them into '/AzDOAgents' folder
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK11U-jdk_x64_linux_hotspot_11.0.15_10.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK11U-jdk_x64_linux_hotspot-11.0.6_10.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -P /AzDOAgents \
&& tar -xvf /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -C /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openjdk-21.0.2_linux-x64_bin.tar.gz -P /AzDOAgents \
&& tar -xvf /AzDOAgents/openjdk-21.0.2_linux-x64_bin.tar.gz -C /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openlogic-openjdk-8u292-b10-linux-x64.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openlogic-openjdk-8u432-b06-linux-x64.tar.gz -P /AzDOAgents
```

Please replace your screenshot with a code block of the actual text. Screenshots make it unnecessarily hard to read the content; personally I don't look at them because of that. Make sure to follow these instructions while doing so:


Please format your post according to the following guide: How to format your forum posts.
In short: please use the </> button to share code, terminal output, error messages, or anything that can contain special characters which would be interpreted by the Markdown filter. Use the preview feature to make sure your text is formatted as you would expect, and check your post after you have sent it so you can still fix it.

Example code block:

```
echo "I am a code."
echo "An athletic one, and I wanna run."
```

After fixing your post, please send a new comment so people are notified about the fixed content.


here is the docker file code:

```
# Get JDKs and place them into '/AzDOAgents' folder
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK11U-jdk_x64_linux_hotspot_11.0.15_10.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK11U-jdk_x64_linux_hotspot-11.0.6_10.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -P /AzDOAgents \
&& tar -xvf /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -C /AzDOAgents   # newly added
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openjdk-21.0.2_linux-x64_bin.tar.gz -P /AzDOAgents \
&& tar -xvf /AzDOAgents/openjdk-21.0.2_linux-x64_bin.tar.gz -C /AzDOAgents   # newly added
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openlogic-openjdk-8u292-b10-linux-x64.tar.gz -P /AzDOAgents
RUN wget --http-user=$AUSER --http-password=$APASS /OpenJDK/Linux/openlogic-openjdk-8u432-b06-linux-x64.tar.gz -P /AzDOAgents
```

What do you want to achieve?

What’s the reason for downloading many JDKs in the first place, and then unzipping only a few?


The Dockerfile includes multiple versions of Java because the associated container is used as an agent to run pipelines in Azure DevOps. This container builds and compiles code pushed by various developers who use different Java versions. Currently, most developers are still using Java 11, with some on Java 17 and 21 as well.

Why not one image per Java version? (This is the standard approach.)

If not, no problem, but then what exactly is your problem? I mean, you're downloading a bunch of archive files; if each is huge, then a large image is normal, no?

Even with one image per Java version, that image also has so many other software packages installed; I shared only the Java snippet. That is why the image size was already 8.6 GB, and after adding the tar commands for extraction it became 10.5 GB.

If you download and extract many JDKs in a Docker image, it will be large.

If you want to reduce size, create different images for each JDK.

If you install all other tools first, the images might share some layers, which makes image push and pull faster.
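The layer-sharing point can be sketched roughly like this (the base image, tool list, and download host are placeholders, not the actual Dockerfile):

```dockerfile
# Shared base: all common tools installed first, in identical order.
# Every per-JDK image built FROM this stage reuses these layers,
# so registries and nodes store and pull them only once.
FROM rockylinux:9 AS common-tools
RUN yum install -y git wget zip unzip jq \
 && yum clean all \
 && rm -rf /var/cache

# Per-JDK variant: only this comparatively small layer differs.
# The host 'artifacts.example' is a placeholder for the real server.
FROM common-tools AS jdk17
RUN wget -q http://artifacts.example/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -P /AzDOAgents \
 && tar -xf /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -C /AzDOAgents \
 && rm /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz
```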

But creating different images for each JDK does not fulfil our purpose, because we use one image that has other software alongside the JDKs, all of which is required to build the code from pipelines used by many users. How do we tackle the size in that case? Please suggest.

They say ”You can’t make an omelette without breaking a few eggs”.

There is no miracle that makes an image small if you download and unzip a lot of files.

Have you already read existing tutorials? (There are so many.)

For instance: using a multi-stage build, merging multiple RUN statements, cleaning temp folder(s), running apt-get clean, …

There are also a lot of tools that allow you to analyze each layer of an image, one by one.

Below is the list of software installed in our Dockerfile in the exact order of installation, along with the methods used:

Given this list of installed software and their installation methods, do you think there are any amendments or optimizations we can make to reduce the image size without affecting the functionality of the image? Specifically:

  • Are there alternative methods for installing any of these tools that would result in smaller image layers?
  • Should any of the tools be moved to a multi-stage build process if they are not required in the final runtime image?
  • Are there redundant or overlapping dependencies we could remove?
  • Any best practices or recommendations to further optimize this Dockerfile for size reduction?

Thank you for your suggestions and guidance!

The most obvious recommendations would be:

  • Each RUN that performs a yum install should also clean up the index and cache by adding && yum -y clean all && rm -fr /var/cache.

  • When downloading files with curl or wget, make sure to delete the downloaded files after either extracting archives (like tar.gz files), or installing them (like rpm packages) in the same RUN instruction where the files are downloaded.

Note: files persisted in a layer cannot be removed in a following layer. Deleting them later only marks them as deleted; the data still exists in the earlier layer.
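Applied to one of the JDK downloads from the Dockerfile above, that advice looks roughly like this (the URL is kept as in the original post; the point is that the `rm` runs in the same RUN, so the archive never lands in any layer):

```dockerfile
# Download, extract, and delete the archive inside a single RUN,
# i.e. a single layer: the .tar.gz is never persisted in the image.
RUN wget --http-user=$AUSER --http-password=$APASS \
      /OpenJDK/Linux/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -P /AzDOAgents \
 && tar -xf /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -C /AzDOAgents \
 && rm /AzDOAgents/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz
```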

An image that installs many tools and dependencies will never be small. If your CI tool allows setting a container image per pipeline job, separating the tools into dedicated images would be the way to go, especially since every pipeline will only use a fraction of the installed tools.

I am not sure how AzureDO works, but If the pipeline leverages multiple build nodes, each node would need to pull this huge image at least once, and of course every time the image is updated. That’s a huge waste of traffic, node storage and time needed to pull the large image.


Can we use a multi-stage build in my case?

What is meant by "each RUN yum install should also clean up the index and cache by adding /…"?

Below is the command we use in the Dockerfile. Could you please suggest where exactly we need to add your cleanup commands?

Hi

As mentioned by at least two people, you should try to consolidate multiple RUN statements into one, and one of the last statements in your RUN block should do some cleaning, like removing temporary files and deleting archive files (once they have been extracted). Think of each RUN as a single layer, and make sure to optimize them one by one.

I don't use yum, so I've used Google: there are a lot of "yum clean" commands (like yum clean packages, yum clean all, …).

So, a suggestion would be

```
RUN yum update -y \
 && yum install -y yum-utils jq git wget zip unzip bzip2 nc gettext buildah skopeo podman iputils lsof gcc.x86_64 \
 && yum install -y gcc-c++* make* procps \
 && yum clean packages \
 && yum clean all \
 && rm -fr /var/cache
```

(This is just to give you the idea; there are a lot of tutorials on the Internet.)

I got it: I have to consolidate everything into ONE RUN statement, and I will check the Internet as well. But just to confirm, I can use a single RUN in the code below too, right?

And what about using a multi-stage build? Is that possible in my scenario?

Please take a look at the last response of @cavo789 and how he chained several commands using the && operator in the same RUN instruction.

Note: every COPY and RUN instruction creates a new image layer.

Of course you can. You could have all downloads and extraction of archives in a stage, and only copy the extracted files into the final image.

See https://docs.docker.com/get-started/docker-concepts/building-images/multi-stage-builds/ for details.
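A minimal sketch of that idea, assuming a yum-based base image (the `almalinux:9` tag here is a placeholder) and reusing a download path from the original post:

```dockerfile
# Stage 1: throwaway downloader; its layers are NOT part of the final image,
# so it does not matter that the archive exists here temporarily.
FROM almalinux:9 AS jdk-downloader
ARG AUSER
ARG APASS
RUN yum install -y wget \
 && mkdir -p /AzDOAgents \
 && wget --http-user=$AUSER --http-password=$APASS \
      /OpenJDK/Linux/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -P /tmp \
 && tar -xf /tmp/OpenJDK17U-jdk_x64_linux_hotspot_17.0.2_8.tar.gz -C /AzDOAgents

# Stage 2: final image; only the extracted JDK is copied in, never the archive.
FROM almalinux:9
# ... install the other required build tools here ...
COPY --from=jdk-downloader /AzDOAgents /AzDOAgents
```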