I am using an Azure VM. Since we want to migrate from one VM to another, I have taken a snapshot of the VM, created a disk from that snapshot, and then created a new VM from the disk. However, Docker, which came pre-installed on the snapshot, is now working very slowly.
How do you see the slowness? When trying to access a website running in a container, or when commands in the container run slowly?
When I try to run Docker commands, I can see that they take much more time than usual.
Okay, but which part? Can you identify that?
- Pulling the images?
- Just creating the container?
- Starting an already created but stopped container?
- Waiting for the process inside the containers to be ready?
Hello,
I am using a Docker Compose file, but when I run `docker-compose up` or `docker-compose down`, it takes a long time. I am using the latest version of Docker. After adding DNS settings in the `daemon.json` file like this:

```json
"dns": ["8.8.8.8", "1.1.1.1"]
```

and restarting, everything seems to be working fine. Could you explain why we need to add this configuration? Also, in the old VM, I didn't add anything, and everything was working fine.
Below is my `docker info`.
```
root@dev-vm:/home/devmesuser/Backups# cat /etc/docker/daemon.json
{
    "dns": ["8.8.8.8", "1.1.1.1"],
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "http://loki:3100/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}
root@dev-vm:/home/devmesuser/Backups# docker info
Client: Docker Engine - Community
 Version:    28.0.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.22.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.34.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 41
  Running: 30
  Paused: 0
  Stopped: 11
 Images: 38
 Server Version: 28.0.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: loki
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.11.0-1012-azure
 Operating System: Ubuntu 24.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.57GiB
 Name: dev-vm
 ID: bcc731fe-6d6b-47ee-9182-0563eb5cdf29
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
```
Please next time make sure to format your complete post properly: I edited your post and wrapped the last two commands and their outputs in code blocks.
When you created a snapshot of the Azure Disk Volume, was the compute node (aka vm) running?
Creating snapshots of a running system always has the risk that the system is modifying one or more files while the snapshot is created, which might result in inconsistent or corrupt files in the snapshot.
I have no idea how Azure handles it, but does it really take care of overriding the hostname (and whatever machine-specific configurations there are)?
The cleanest way to move to a new docker host, is to create a new docker host, then back up the volumes from the old system, and restore them on the new system. Then deploy everything based on the compose files.
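The backup/restore flow described above could be sketched with two small helper functions. These are illustrative helpers, not official Docker tooling; the volume name `mydata` in the usage comments is just an example.

```shell
# Illustrative helpers: each runs a throwaway alpine container that mounts the
# named volume and a backup directory, and tars the volume contents.
backup_volume() {   # usage: backup_volume <volume-name> <backup-dir>
  docker run --rm -v "$1":/data -v "$2":/backup alpine \
    tar czf "/backup/$1.tar.gz" -C /data .
}

restore_volume() {  # usage: restore_volume <volume-name> <backup-dir>
  docker volume create "$1" >/dev/null
  docker run --rm -v "$1":/data -v "$2":/backup alpine \
    tar xzf "/backup/$1.tar.gz" -C /data
}

# Old host:  backup_volume mydata "$PWD"   # then copy mydata.tar.gz to the new host
# New host:  restore_volume mydata "$PWD"  # then deploy from the compose files
```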
Thank you for your response!
Regarding the snapshot, I made sure to take it while the VM was stopped to ensure data consistency. I understand that creating snapshots of running systems can sometimes result in inconsistencies, so I took precautions to avoid that.
As for the DNS warning I'm seeing in the `journalctl` logs:

```
msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
```
This message appeared after restoring the snapshot. From what I gather, it seems like the `/etc/resolv.conf` file might not have been properly configured after the restore, and the system is falling back to default external DNS servers.

Below is the current content of my `/etc/resolv.conf`:
```
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
search k2vh5evtd5neffh1zfcryykifg.tx.internal.cloudapp.net
```
It seems like the DNS resolution is pointing to a local stub resolver (`127.0.0.53`). I will check and configure the system to ensure proper DNS resolution, and I'll make sure Docker's DNS settings are correct as well.
Thanks again for your help!
This is most likely the system stub resolver. I usually disable the stub resolver and let netplan manage the resolver.
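On Ubuntu with systemd-resolved, one common way to disable the stub listener is a drop-in file (a sketch; the drop-in filename below is arbitrary):

```ini
# /etc/systemd/resolved.conf.d/no-stub.conf (filename is an example)
[Resolve]
DNSStubListener=no
```

After that, point `/etc/resolv.conf` at `/run/systemd/resolve/resolv.conf` (or let netplan manage it) and restart `systemd-resolved`.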
Docker mounts `/etc/resolv.conf` into all containers attached to the default bridge. User-defined networks use whatever resolver is configured in `/etc/resolv.conf` as the upstream resolver for the custom network's internal DNS server.
It is a little bit more complicated. I remembered that it copied the `resolv.conf` to the Docker data root for the new container and mounted that copy, except when it was the stub resolver, because it always mounts something that should work in a container.

Testing that now, I see that when I have systemd and I use the stub resolver, it mounts `/run/systemd/resolve/resolv.conf` and I see this in the logs:

```
Apr 09 18:53:52 docker-vm-noble dockerd[3887]: time="2025-04-09T18:53:52.799236585+02:00" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
```
If I unlink the symbolic link that points to the stub resolver and use a static resolv.conf containing servers pointing to a loopback interface, it generates this one for the containers:
```
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 8.8.8.8
nameserver 8.8.4.4
search .
options edns0 trust-ad

# Based on host file: '/etc/resolv.conf' (legacy)
# Used default nameservers.
# Overrides: []
```
And logs this:

```
Apr 09 19:17:58 docker-vm-noble dockerd[4784]: time="2025-04-09T19:17:58.319918805+02:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
```
If I use the static resolv.conf but with a non-loopback (non-localhost) IP, it uses the value from the static resolv.conf, but not the original content, only the IP with new comments:
```
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 192.168.100.1
search .
options edns0 trust-ad

# Based on host file: '/etc/resolv.conf' (legacy)
# Overrides: []
```
So if "No non-localhost" was logged, it looks like it used Google's DNS servers, which could indeed be slower.
But the resolv.conf shared by @abhishekpatwari started with the systemd-resolved stub header, which should not have produced the "non-localhost" log entry unless something else was broken on the system and Docker read another file, or the Docker daemon was not restarted after the resolv.conf was changed, since it seems Docker reads the file when the daemon starts, not when the container is created (I tested it just now).
Can you please provide the steps you have taken? I will try them.
I'm not sure what steps you mean. I just described how the DNS works. Can you be more specific?
I tried unlinking the stub resolver symlink and replaced `/etc/resolv.conf` with a static file containing external DNS servers (like `8.8.8.8` and `8.8.4.4`), but DNS resolution inside the container still isn't working as expected.
Can I ask how exactly you resolved the issue in your case? Did you restart the Docker daemon after modifying `/etc/resolv.conf`? Also, did you have to change any Docker daemon configs (like `--dns` flags or `daemon.json`)?
Thanks in advance!
Thank you for the clarification. Yes, you need to restart the Docker daemon as it will read the new config only when starting.
To find out what the DNS server is in the container, you can run a test container:

```
docker run --rm bash cat /etc/resolv.conf
```
But you can also directly set the DNS if you think that helps.
```
docker run --help | grep dns
```

output:

```
      --dns list                   Set custom DNS servers
      --dns-option list            Set DNS options
      --dns-search list            Set custom DNS search domains
```
But you can do the same in a compose file:
https://docs.docker.com/reference/compose-file/services/#dns
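For example (a sketch; the service name and image below are placeholders):

```yaml
services:
  web:              # placeholder service name
    image: nginx    # placeholder image
    dns:
      - 8.8.8.8
      - 1.1.1.1
    dns_search:
      - example.com
```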
Thanks for your suggestions! I tried the steps you recommended, and here's what I've done:
- Updated `/etc/resolv.conf` on the host: I updated the host's `/etc/resolv.conf` to directly include external DNS servers (8.8.8.8 and 1.1.1.1). Here's the current state of the host's `/etc/resolv.conf`:
```
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 8.8.8.8
nameserver 1.1.1.1
options edns0 trust-ad
search .
```
- Restarted Docker: After editing the `/etc/resolv.conf` on the host, I restarted Docker to ensure the changes were applied:

```
sudo systemctl restart docker
```
- Checked DNS in the container: I checked the DNS configuration inside the container, and it's still showing `127.0.0.11` (the internal Docker DNS resolver). This seems to be correct, but the external DNS servers (8.8.8.8 and 1.1.1.1) are listed as fallback DNS servers:
```
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search .
options edns0 trust-ad ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [host(8.8.8.8) host(1.1.1.1)]
# Overrides: []
# Option ndots from: internal
```
This seems to be working as expected, but I wanted to make sure everything was properly configured.
4. Slow Container Shutdowns: Despite fixing the DNS issue, I'm still facing slow shutdowns when running `docker-compose down`. For example, it's taking over 10 minutes per container to stop (see the following log):
```
[+] Running 0/9
 ⠦ Container prism_5_plan                Stopping    19.7s
 ⠦ Container prism_9_quality             Stopping    19.7s
 ⠦ Container prism_1_dashboard           Stopping    19.7s
 ⠦ Container prism_4_masterdata          Stopping    19.7s
 ⠦ Container prism_8_execution           Stopping    19.7s
 ⠦ Container prism_6_audit_notification  Stopping    19.7s
 ⠦ Container prism_3_workflow_forms      Stopping    19.7s
 ⠦ Container prism_7_warehouse           Stopping    19.7s
 ⠦ Container prism_2_users               Stopping    19.7s
```
Additionally, I've been seeing the following warnings in the Docker logs related to containers failing to exit within the expected time frame:
```
Apr 12 02:49:37 mes-dev-vm dockerd[9305]: time="2025-04-12T02:49:37.285702694Z" level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL" container=c5eff2ed776987b6eb00cda8564b31bfd672463face1843605189eb63704c6ce error="context deadline exceeded"
Apr 12 02:49:37 mes-dev-vm dockerd[9305]: time="2025-04-12T02:49:37.298844653Z" level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL" container=4a36a91d91bc651cf2278346ec08e7ec8ce691b38d5329b2cf4db3548d76061e error="context deadline exceeded"
Apr 12 02:49:37 mes-dev-vm dockerd[9305]: time="2025-04-12T02:49:37.301984403Z" level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL" container=1205239cf1af506932983a6f72df7b02a2db450b38a3558b022f390fd21c7f66 error="context deadline exceeded"
```
These logs indicate that Docker is unable to gracefully stop some of the containers within the usual 10-second timeout and attempts to forcefully kill them with a direct `SIGKILL`. This further suggests that something might be wrong with the container shutdown process or the Docker configuration.
The containers stop very slowly, and it doesn't seem to be related to DNS anymore, but possibly related to Docker's storage driver (`overlay2`) or other system configurations.
Has anyone encountered similar issues with slow shutdown times, and if so, any suggestions on how to troubleshoot or improve this?
Thanks again for the help, looking forward to hearing your thoughts!
Your snapshot brought over Docker's cached layers, but the new VM may have different IOPS limits or missing resources. You can try the following steps:
- Checking the VM size and disk type (Standard vs Premium).
- Running `docker system prune` to clean up dangling layers.
- Ensuring Docker is using the right storage driver.
A fresh image without leftover cache may help if performance doesn't bounce back.
Since I have no idea how Azure handles things, I suspect the problem is more along the lines of cloud-init, or whatever Azure uses with their VMs to handle the initial bootstrapping of a compute node.

Since the volume is based on a restored snapshot of a VM volume that already went through the initial bootstrapping, it appears that with the new VM this initial bootstrapping wasn't done at all, or only partially.
I feel this is rather a topic for an Azure community (or the Azure support) than for the docker community. Docker is just affected by the problem, not causing it.
Don't forget that you still have a user-defined network created by Compose. So Docker has to use another IP to connect to its built-in name server to resolve Compose service names to their IP addresses. It will then forward the request to whatever IP address you defined on your host.
But you can see in the comments that the ExtServers are set to what you wanted.
That is another issue with how the process is configured in the container, but it will not make running it slower. It only means that you need to make sure the container handles signals correctly, i.e., that you don't just use a command in a bash shell as the ENTRYPOINT or CMD instruction like:
```shell
#!/usr/bin/env bash
myprocess
```

but

```shell
#!/usr/bin/env bash
exec myprocess
```
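The difference is easy to demonstrate outside a container: with `exec`, the new process replaces the shell and inherits its PID, which inside a container means it becomes PID 1 and receives `SIGTERM` directly.

```shell
# With exec, the inner shell replaces the outer one, so both lines print the same PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Without exec, the inner shell is a separate child process with its own PID.
# (The trailing "true" stops the shell from implicitly exec-ing the last command.)
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

set -- $with_exec;    echo "with exec:    shell pid $1, process pid $2"
set -- $without_exec; echo "without exec: shell pid $1, process pid $2"
```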
I had a tutorial here:
If you let the container be killed forcefully, in some cases that could lead to data loss or inconsistency.
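If a process legitimately needs more than the default 10 seconds to shut down, the timeout can also be raised, either with `docker stop -t <seconds>` or per service in the compose file (a sketch; the service name is taken from the logs above as an example):

```yaml
services:
  prism_7_warehouse:          # example service name
    stop_grace_period: 60s    # wait up to 60s before SIGKILL
```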