Docker Community Forums

Killing a docker process that uses cpusets crashes the node


(Abria) #1

Dear all,

We’re running:

$ docker --version
Docker version 1.4.1, build 5bc2ff8/1.4.1

$ uname -r
2.6.32-431.29.2.el6.x86_64

$ cat /etc/redhat-release
Scientific Linux release 6.5 (Carbon)

We are running a batch system (UGE), and we have integrated Docker so
that the core binding the batch system assigns to a job is passed on to
Docker, which then creates its own cpuset and applies the memory limit.
This part works fine: the cpuset is created correctly and the memory
limit is set properly.
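
To make the setup concrete, here is a rough sketch of how a core binding and a memory limit can be turned into a `docker run` command line. This is only an illustration, not our actual integration code: the function name, the image name, the job script and the values are all hypothetical. The only parts taken from Docker itself are the `--cpuset` and `-m` options of `docker run` as they exist in the 1.4 series.

```python
def build_docker_run(cores, mem_limit, image, command):
    """Build a `docker run` argument list from a core binding and a memory limit.

    cores     -- iterable of CPU ids handed out by the batch system (hypothetical input)
    mem_limit -- memory limit string in Docker's format, e.g. "4g"
    image     -- image name (hypothetical)
    command   -- list with the job command and its arguments
    """
    # Docker accepts a comma-separated CPU list, e.g. "2,3".
    cpuset = ",".join(str(c) for c in sorted(set(cores)))
    return ["docker", "run",
            "--cpuset=" + cpuset,   # Docker 1.4 flag that sets the container's cpuset
            "-m", mem_limit,        # memory limit for the container
            image] + list(command)


if __name__ == "__main__":
    # Example values only; none of them come from a real job.
    argv = build_docker_run([2, 3], "4g", "example/image", ["./job.sh"])
    print(" ".join(argv))
```

With the example values, the resulting command restricts the container to cores 2 and 3 and to 4 GB of memory, which matches the kind of cpuset and memory limit described above.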

But when one of those jobs is killed (kill -9), the node crashes. The
vmcore-txt shows:

<6>lo: Disabled Privacy Extensions
<6>device veth369ed45 entered promiscuous mode
<6>ADDRCONF(NETDEV_UP): veth369ed45: link is not ready
<4>EXT4-fs (dm-3): warning: maximal mount count reached, running e2fsck is recommended
<6>EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
<6>ADDRCONF(NETDEV_CHANGE): veth369ed45: link becomes ready
<6>docker0: port 1(veth369ed45) entering forwarding state
<6>lo: Disabled Privacy Extensions
<6>device vethaa625ad entered promiscuous mode
<6>ADDRCONF(NETDEV_UP): vethaa625ad: link is not ready
<6>ADDRCONF(NETDEV_CHANGE): vethaa625ad: link becomes ready
<6>docker0: port 2(vethaa625ad) entering forwarding state
<6>device veth0457980 entered promiscuous mode
<6>ADDRCONF(NETDEV_UP): veth0457980: link is not ready
<6>ADDRCONF(NETDEV_CHANGE): veth0457980: link becomes ready
<6>docker0: port 3(veth0457980) entering forwarding state
<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/virtual/net/vethaa625ad/flags
<4>CPU 0
<4>Modules linked in: veth nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables bridge dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio ipmi_devintf 8021q garp stp llc autofs4 cpufreq_ondemand freq_table pcc_cpufreq ipv6 ext3 jbd microcode power_meter iTCO_wdt iTCO_vendor_support hpwdt hpilo bnx2x libcrc32c mdio sg serio_raw lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 2174, comm: docker Not tainted 2.6.32-431.29.2.el6.x86_64 #1 HP ProLiant BL460c G6
<4>RIP: 0010:[] [] list_del+0x10/0xa0
<4>RSP: 0018:ffff8806158e9dc8 EFLAGS: 00010092
<4>RAX: dead000000200200 RBX: ffff880c04ba2e18 RCX: 0000000000000010
<4>RDX: 0000000000000002 RSI: 0000000000000003 RDI: ffff880c04ba2e18
<4>RBP: ffff8806158e9dd8 R08: 0000000000000010 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000246 R12: ffff880c04ba2e00
<4>R13: ffff880c12fd1f18 R14: 0000000000000010 R15: 0000000000000000
<4>FS: 00007fc4893cb700(0000) GS:ffff880635400000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00007f4e6ef00000 CR3: 0000000c1366f000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process docker (pid: 2174, threadinfo ffff8806158e8000, task ffff8806136ad540)
<4>Stack:
<4> ffff880c12fd1f18 ffff880c04ba2e40 ffff8806158e9e08 ffffffff810c9d42
<4> 0000000100000004 ffff880c02370df8 0000000000000000 ffff880c02370e10
<4> ffff8806158e9e58 ffffffff810546b9 ffff8806158e9f58 0000000300000001
<4>Call Trace:
<4> [] cgroup_event_wake+0x42/0x70
<4> [] __wake_up_common+0x59/0x90
<4> [] __wake_up+0x48/0x70
<4> [] eventfd_release+0x2d/0x40
<4> [] __fput+0xf5/0x210
<4> [] fput+0x25/0x30
<4> [] filp_close+0x5d/0x90
<4> [] sys_close+0xa5/0x100
<4> [] system_call_fastpath+0x16/0x1b
<4>Code: 89 95 fc fe ff ff e9 ab fd ff ff 4c 8b ad e8 fe ff ff e9 db fd ff ff 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 <4c> 8b 00 4c 39 c7 75 39 48 8b 03 4c 8b 40 08 4c 39 c3 75 4c 48
<1>RIP [] list_del+0x10/0xa0
<4> RSP

Is anyone aware of this kind of problem? I’ve been googling for it
but found no results…

TIA,
Arnau