Hi Everyone,
Before I explain my problem, I wanted to let you know that I'm new to Docker and have only recently started working with it, so please bear with me if I sound a bit naive.
Scenario:
Currently I have a Python script running within a Docker container. The script queries data from a DB, loads the results into a pandas dataframe, and writes out a CSV file to a folder within an NFS mount. The NFS share is mounted at runtime.
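For context, the script boils down to roughly the following (the real DB connection, query, and output path differ; an in-memory SQLite DB stands in here so the snippet is self-contained):

```python
import sqlite3            # stand-in for the real DB driver used in my script
import pandas as pd

def export_to_csv(conn, query, out_path):
    """Query the DB into a pandas DataFrame, then write it out as CSV."""
    df = pd.read_sql(query, conn)
    df.to_csv(out_path, index=False)
    return len(df)

# Placeholder data so the sketch runs on its own:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])

rows = export_to_csv(conn, "SELECT * FROM t", "/tmp/out.csv")
```

In production the `out_path` points at the folder under the NFS mount.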
Server Properties:
[fc8tdtsr@fc8tdbitmapconvs08]$ uname -r
3.10.0-957.5.1.el7.x86_64
Docker version:
[fc8tdtsr@fc8tdbitmapconvs08]$ docker --version
Docker version 18.03.1-ce, build 9ee9f40
Docker Info
[fc8tdtsr@fc8tdbitmapconvs08]$ docker info
Containers: 46
Running: 46
Paused: 0
Stopped: 0
Images: 459
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.5.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 1.968TiB
Name: fc8tdbitmapconvs08
ID: NXBX:GCN7:UY6S:QWW4:RB5G:JDLW:FRMI:YJQZ:37SY:RDV5:5NO6:V2MS
Docker Root Dir: /local/docker-data-root
Debug Mode (client): false
Debug Mode (server): false
HTTPS Proxy: uswwwp1.gfoundries.com:74
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
0.0.0.0/0
127.0.0.0/8
Live Restore Enabled: false
Docker Command that I run:
docker run -d --restart=always --volume-driver=nfs -v /td-bmp:/td-bmp:rw \
--memory=128g --memory-reservation=32g --cpu-shares=28 \
--name=worker-1 daas-worker:latest --spark-name daas1
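To compare raw write throughput on the host vs inside the container, I can run a quick check like this against the mount (assuming the NFS share at /td-bmp; defaults to /tmp here so the snippet runs anywhere):

```shell
# Write 64 MiB of zeros and force it to disk, reporting throughput.
# Run once on the host and once inside the container against the same path.
TARGET="${1:-/tmp}"
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=64 conv=fsync 2>&1 | tail -n 1
rm -f "$TARGET/ddtest.bin"
```

A large gap between the two numbers would point at the bind-mounted NFS path rather than the script itself.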
My Problem
CSV generation takes a LONG time when I run the script inside the Docker container compared to running it directly on the host.
For example, to generate a 5 GB CSV file, the host takes an average of 30 minutes (including querying the DB and writing out the CSV file), whereas the same scenario inside the container takes almost 1.5 hours to produce the same result. That is an hour more than the host.
From what I understand, the difference shouldn't be that huge. I do expect some overhead, but a whole extra hour seems really bad. Am I doing something wrong?
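To narrow down where the extra hour goes, I can time the query phase and the CSV-write phase separately, along these lines (placeholders for the real connection, query, and output path):

```python
import sqlite3            # stand-in for the real DB
import time
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

t0 = time.perf_counter()
df = pd.read_sql("SELECT * FROM t", conn)        # phase 1: DB query
t1 = time.perf_counter()
df.to_csv("/tmp/timing_test.csv", index=False)   # phase 2: CSV write
t2 = time.perf_counter()

query_s, write_s = t1 - t0, t2 - t1
print(f"query: {query_s:.2f}s  write: {write_s:.2f}s")
```

If the write phase dominates inside the container, that would implicate the NFS mount rather than the DB or pandas.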
Please do let me know if you need anything else from me.
Thanks!