Shared memory performance

Hi,
I have been doing some benchmarking and noticed that shared memory is always a bit slower when one of the processes runs in a container.
I have two processes that share a memory region created with shm_open() and mapped with mmap().
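For readers less familiar with this setup, here is a minimal sketch of how such a region can be created and mapped. The helper name `map_shared_region` and its `create` flag are hypothetical; only `shm_open()`, `ftruncate()` and `mmap()` are the standard POSIX calls. On older glibc versions you may need to link with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>     /* O_CREAT, O_RDWR */
#include <stddef.h>
#include <sys/mman.h>  /* shm_open, mmap, MAP_FAILED */
#include <sys/types.h> /* off_t */
#include <unistd.h>    /* ftruncate, close */

/* Hypothetical helper: open (and, if `create` is non-zero, create and size)
 * the POSIX shared memory object `name`, then map `size` bytes of it
 * read/write with MAP_SHARED so that writes are visible to every process
 * mapping the same object. Returns MAP_FAILED on error. */
static void *map_shared_region(const char *name, size_t size, int create)
{
    assert(name[0] == '/'); /* portable shm names start with a slash */

    int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd == -1)
        return MAP_FAILED;

    /* A newly created object has length 0; the creator sets its size.
     * The added bytes read as zero. */
    if (create && ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p;
}
```

The writer would call it with `create = 1` and the reader with `create = 0`, both using the same name; the object is removed with `shm_unlink()` once neither side needs it.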
The first process generates a random message and writes it into the shared memory region; the second process, which runs in a container (using the host's IPC namespace), copies this message into its own virtual address space.
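A sketch of the two roles, under assumptions the post does not state: the message layout (`struct shm_msg` with a sequence counter and a length header), the function names, and the busy-wait handshake are all hypothetical. It also assumes writer and reader strictly alternate (the writer waits until the previous message has been consumed before writing the next one); that acknowledgement step is omitted. The C11 atomic counter is assumed to be lock-free, which is what makes it usable across processes on common platforms.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of the shared region. */
struct shm_msg {
    atomic_uint seq;      /* incremented by the writer after each message */
    uint32_t len;         /* payload length in bytes */
    unsigned char data[]; /* payload, written in place */
};

/* Writer side: generate `len` random bytes directly in the shared region,
 * then publish them. The release store orders the payload and `len`
 * before the new sequence number. */
static void write_random_message(struct shm_msg *m, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        m->data[i] = (unsigned char)(rand() & 0xff);
    m->len = len;
    atomic_fetch_add_explicit(&m->seq, 1, memory_order_release);
}

/* Reader side: spin until the sequence number differs from `last_seen`,
 * then copy the payload into the private buffer `out` (capacity `cap`).
 * Returns the sequence number that was consumed. */
static unsigned read_message(struct shm_msg *m, unsigned last_seen,
                             unsigned char *out, size_t cap,
                             uint32_t *out_len)
{
    unsigned s;
    while ((s = atomic_load_explicit(&m->seq, memory_order_acquire)) ==
           last_seen)
        ; /* busy-wait; a semaphore or futex would avoid burning CPU */
    assert(m->len <= cap);
    memcpy(out, m->data, m->len); /* copy into this process's own memory */
    *out_len = m->len;
    return s;
}
```

In the real setup `m` would point at the start of the mapped region, sized to hold the header plus the largest message.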
I noticed that it takes a little longer to transmit the message (this includes writing the message into the shared memory and copying it into the second process's address space) compared to running both parts as plain old processes. I ran the benchmark with message sizes of 32, 1024, 2048, 4096 and 9812 bytes, 5000 times each. It takes between 25% and 60% longer (growing linearly with the message size) when the second part runs in a container.
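A sketch of how the per-message cost can be measured with a monotonic clock. It runs both steps (write, then copy) in a single process, so it illustrates only the timing method; it does not reproduce the container setup or any cross-process synchronization. The names `time_transfer` and `sweep`, and the use of `memset` as a stand-in for generating a random message, are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Message sizes (bytes) used in the benchmark described above. */
static const size_t sizes[] = {32, 1024, 2048, 4096, 9812};
#define NUM_SIZES (sizeof sizes / sizeof sizes[0])

/* Current time in nanoseconds from a monotonic clock. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost (ns) of one transmission: write `len` bytes into `shared`,
 * then copy them into the private buffer `priv`; repeated `iters` times. */
static double time_transfer(unsigned char *shared, unsigned char *priv,
                            size_t len, int iters)
{
    assert(iters > 0);
    long long start = now_ns();
    for (int i = 0; i < iters; i++) {
        memset(shared, i & 0xff, len); /* stand-in for generating a message */
        memcpy(priv, shared, len);     /* reader copies into its own memory */
    }
    return (double)(now_ns() - start) / iters;
}

/* Time every message size and store the average per-message cost in `out`.
 * Both buffers must hold at least the largest size. */
static void sweep(unsigned char *shared, unsigned char *priv, int iters,
                  double out[NUM_SIZES])
{
    for (size_t k = 0; k < NUM_SIZES; k++) {
        out[k] = time_transfer(shared, priv, sizes[k], iters);
        printf("%5zu bytes: %.1f ns per message\n", sizes[k], out[k]);
    }
}
```

Running the same sweep with the reader on the host and then in a container (for example started with `docker run --ipc=host`) would compare the two cases on the same code path.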
So my question is: where does this overhead come from? Is some technique comparable to KSM (Kernel Samepage Merging) being used? Probably not, right?
I hope the question is not too trivial, but I have a very limited understanding of cgroups, namespaces etc., so I thought I’d just ask you guys and girls, because I didn’t really find anything regarding shmem performance on Docker.

Thanks a lot for even reading this.