Hi,
I am currently setting up a Docker environment for shared web hosting.
We want to segregate every website into its own container to limit cross-site infection in case a client does not upgrade their CMS (e.g. WordPress).
I’ve been playing with Docker a lot recently and built our own container to host the websites.
I’d like to know if anyone has “best practices” for that kind of setup; I searched and searched but could not find anything.
So far, here is how I’m planning to set up the environment.
First, there is an nginx proxy running on ports 80/443.
Then all the sites are under /web/www.website.com
Every container is started with /web/www.website.com → /web (inside the container); its ports 80/443 are forwarded and set up in the nginx proxy.
Logs are centralized but that’s not important for the moment.
Inside the container there is a nginx/php-fpm setup to serve /web
Clients will be able to update their website through an FTP service running on the Docker host and will be chrooted into their /web/www.website.com (I’m thinking about running the FTP service in a container as well).
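Roughly, starting a single site under this plan looks something like the following sketch, where web_image is just a placeholder for our in-house nginx/php-fpm image and 8001/8443 are arbitrary host ports for the front proxy to point at:

$ docker run -d --name www.website.com \
    -v /web/www.website.com:/web \
    -p 8001:80 -p 8443:443 \
    web_image

The front nginx then proxies www.website.com to 127.0.0.1:8001 (and 8443 for HTTPS).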
Am I on the right path? Has anyone tried a similar setup? What is your opinion?
Feel free to add suggestions; we’re in the design/proof-of-concept stage.
Use AppArmor / grsec / SELinux as much as possible
Restrict network access (e.g. egress) in running containers as much as possible
Look into --cap-drop and drop the caps those containers won’t need (there’s a rough sketch after this list)
Do not allow anyone access to the Docker API or CLI
Do not bind mount files to/from the host
Stay on top of CVEs, especially for the Linux kernel itself
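To illustrate the capability and network points, starting one of these site containers in a hardened way might look roughly like this; client_web, client_net and the exact list of retained capabilities are only examples, since what you can safely drop depends on what actually runs inside:

$ docker network create --internal client_net   # containers on this network get no route to the outside
$ docker run -d --name www.website.com \
    --net client_net \
    --cap-drop ALL \
    --cap-add CHOWN --cap-add SETUID --cap-add SETGID --cap-add NET_BIND_SERVICE \
    client_web
# plus a custom AppArmor/SELinux profile via --security-opt, per the first point above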
It’s always a risk: you’re letting people execute arbitrary code on your servers, or at least upload arbitrary files. However, if you’re willing to follow the guidelines above, you’ll be in better shape than if you didn’t.
Because there seems to be no need for it, and it’s a way to “break out” of the container: imagine an RCE is discovered on your host server; with a bind mount, dropping a malicious script onto the host to execute just got one step easier.
I’d run the FTP servers in their own containers with the content serving directory as a named volume.
Then, I’d run the actual websites themselves using the same volume from the FTP server containers. Your customers can upload their files to the FTP servers running in containers, and they’ll share this directory with the web servers.
Something like this:
$ docker volume create --name ftpserver0
ftpserver0
$ docker run -v ftpserver0:/web/www.website.com alpine sh -c 'echo Hello World >/web/www.website.com/index.html'
$ docker run -v ftpserver0:/foo alpine cat /foo/index.html
Hello World
I’m in the exact same situation as the OP (you can read what I tried here), so I’m curious: how would volumes protect you from an RCE? They are still, in essence, directories on a host partition.
@morpheu5 I admit I’m stretching a bit, but my two main concerns would be:
Permissions. If you haven’t configured user namespace remapping properly and are running containers as root (which is, unfortunately, common practice), files (including executable files) dropped into the container are going to show up on the host as being owned by root. Suddenly interacting with this directory on the host requires root permissions, and it really shouldn’t. To avoid this you have to use USER and make sure the shared volume is readable/writable by both users, etc. (there’s a sketch of the remapping setup after these two points).
By virtue of requiring a bind mount instead of a normal volume it’s implied that you will be doing some kind of direct interaction with these files from the host machine, including potentially executing them. Why not do this inside of a container instead? It would be more isolated.
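In case it’s useful, user namespace remapping is mostly a daemon-level switch. A minimal sketch, assuming a systemd-managed host and the built-in “default” remap user:

$ cat /etc/docker/daemon.json
{
    "userns-remap": "default"
}
$ sudo systemctl restart docker
# "default" makes the daemon create a dedicated "dockremap" user and use its
# /etc/subuid and /etc/subgid ranges, so UID 0 inside containers maps to an
# unprivileged UID range on the host

(For a single container you can also just pass --user to docker run, or set USER in the image, as mentioned.)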
@morpheu5 As for your StackOverflow post, on this point:
To make things worse, I also realised that, creating a container to access all those htdocs volumes, would mean creating it with volumes_from a long list of other containers. However, all those containers would have the exact same internal mount point (/server/htdocs), so how would that work out? This is where my grand plan started to really show its limits.
Why not just re-use the named volumes for this? This is one design dilemma named volumes are intended to help with. Something like this should work, unless I’m misremembering:
version: '2'
volumes:
  myvol:
    driver: local
services:
  foo:
    image: alpine
    volumes:
      - myvol:/opt/foo
  bar:
    image: alpine
    volumes:
      - myvol:/opt/bar
Fair point. A bit of a stretch indeed, but a very fair point. I suppose, given that most images implicitly run as root, I could configure namespace mapping; I’d just have to do some research, as I’ve never done it before. Any pointers? Alternatively, I could just as easily impose a user when a container is created, right?
I still fail to see how /var/lib/docker/volumes/volume_name/_data would be more isolated than, say, /my/more_convenient/location_for/volume_name.
Regarding your point on my SO post, I think you got it backwards. What I meant was that, when I fire up several containers from a certain image that mounts a volume in /some/place, and then I have another container running an FTP server like
Well, look at the path names above. One is clearly something that you intend to interact with directly, whereas the other is intended to stay in Dockerland. Maybe I’m being paranoid, but I feel exposing an FS from inside the container to the host user to be directly interacted with increases risk surface area. So, if you can keep the isolation a little better, why not?
Seems like you understand volumes_from correctly, but I still don’t understand why you want to use volumes_from for this instead of named volumes. Say containers A, B, C, and D all need to share a directory but have it mounted at arbitrary paths in each container. This is trivially expressed using named volumes: just create the volume and specify the path in each one individually. You can do pretty much whatever you want in that regard. For instance, try this example:
$ docker volume create --name myvol
myvol
$ docker run -v myvol:/var/www/html alpine \
sh -c 'echo "<p>Welcome to my sweet website</p>" >/var/www/html/index.html'
$ docker run -d -v myvol:/usr/share/nginx/html nginx
b820125db90cf46c096b836c1b0c386ff8db1e746900403e96ea2f2b672f5802
$ docker run --net container:$(docker ps -lq) \
nathanleclaire/curl curl -s localhost
<p>Welcome to my sweet website</p>
What do you need to do that isn’t covered by this feature? As you can see above, the file gets created initially at directory /var/www/html in the container using volume myvol, but the nginx container serves the content inside of /usr/share/nginx/html. If you’re sharing a volume, you can mount it wherever you want. You don’t need volumes_from, and it may even get deprecated at some point.
That’s what I’m not getting: the isolation. I suppose if an attacker can exploit an RCE on the host, there really isn’t much of a difference where the executable has been dropped. I mean, I can easily ls into a named volume just as much as I can with a host mount, or even a named volume with the local-persist driver and a specified mountpoint, which is effectively a proper named volume that I can mount anywhere I like on my host.
In fact, I probably was not considering the difference between volumes and volumes_from clearly enough. In my use case, I’d like to have an SFTP service that exposes the htdocs volumes from the various httpd containers. You can clearly see that having a single SFTP container mounting volumes_from would be very hard, considering that the volumes from every individual httpd container are internally mounted at the same mountpoint.
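In other words, if I’ve got this right, a single SFTP container could mount each site’s named volume at its own path, while each httpd container keeps its usual internal mountpoint. Something like this, with made-up names:

$ docker volume create --name foo_htdocs
$ docker volume create --name bar_htdocs
$ docker run -d --name sftp -p 2122:22 \
    -v foo_htdocs:/home/foo/htdocs \
    -v bar_htdocs:/home/bar/htdocs \
    my_sftp_image
$ docker run -d -v foo_htdocs:/server/htdocs foo_httpd_image
$ docker run -d -v bar_htdocs:/server/htdocs bar_httpd_image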
Thanks for the clarifications, I should be able to do something more sensible about my SO question now. For reference, here is the Dockerfile I’m currently using for the SFTP container:
FROM alpine:3.3
RUN apk add --update openssh && rm -rf /var/cache/apk/*
RUN ssh-keygen -A
COPY sshd_config /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
with an sshd_config like this:
Port 22
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
SyslogFacility AUTHPRIV
PermitRootLogin no
PubkeyAuthentication no
AuthorizedKeysFile .ssh/authorized_keys
PasswordAuthentication yes
ChallengeResponseAuthentication yes
UsePrivilegeSeparation sandbox
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
AcceptEnv XMODIFIERS
Subsystem sftp internal-sftp
Match Group sftponly
    ChrootDirectory %h
    ForceCommand internal-sftp
    AllowTcpForwarding no
Now, my plan was to have different users, each chrooted in their own home directory, and symlink their various htdocs volumes into their homes. However, I verified that the users can’t make any changes to their symlinked htdocs dirs because of permissions. So I thought I could create a group, add every user to this group, assign every mounted htdocs volume to that group, and set the setgid bit on them so new files would be created belonging to the group. I could even set umask 0002 so every new file/directory would be created g+rw(x).
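Roughly what I had in mind inside the SFTP container, with webclients, foo and bar as made-up names:

# shared group for all SFTP users
addgroup webclients
adduser foo webclients
adduser bar webclients
# group-own every mounted htdocs dir, make it group-writable,
# and set the setgid bit so new files inherit the group
chgrp -R webclients /home/foo/htdocs /home/bar/htdocs
chmod -R g+rwX /home/foo/htdocs /home/bar/htdocs
chmod g+s /home/foo/htdocs /home/bar/htdocs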
Sadly, this would mess up permissions when seen from other containers, namely those that have to serve the htdocs, and potentially add, change, and delete files themselves (you know, like any other CMS-based website out there).
Question: is there a way to mount the same volumes in different containers having different owners and permissions? Or how am I supposed to deal with this? I think this would be crucial to the OP’s plan as well…
Why not have an SFTP server + application server container pair per user? It seems like that could potentially be much easier to architect; there will be some overhead from running many containers instead of only a couple, but it will be pretty small.
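Sketched out, each customer would get a pair sharing a named volume (names, ports, and images are placeholders):

$ docker volume create --name foo_htdocs
$ docker run -d --name foo_sftp -p 2201:22 \
    -v foo_htdocs:/home/foo/htdocs \
    my_sftp_image
$ docker run -d --name foo_web \
    -v foo_htdocs:/usr/share/nginx/html \
    nginx

Since each pair serves a single customer, both containers can agree on one UID/GID for that customer’s files, which sidesteps the shared-group permission juggling entirely.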
I’ve come across that post several times, but never dug deep.
Regarding the combo, I thought about that, but I was told that’s not “The Docker Way”: it would require running two daemons through an “init” script, and you’d lose the ability to easily send signals directly to the daemons, unless you do clever things like exec killall or other signal catch-and-remapping. As a last resort, I’ll look into that.
I see, my bad. Well, I thought about that at the very beginning, but the thing is: how do you proxy based on the domain name? HTTP passes that information with the request (the Host header), but I’m not sure FTP passes it, and SSH definitely does not.
BTW, until recently I was going for OpenSSH’s internal SFTP, but I was having trouble setting umasks. I gave up on that and I’m looking into ProFTPD, which should handle SFTP and make it easier to configure umasks. No luck so far, but I only had 10 minutes with it before I left the office.
Then the domain/hostname routing layer could live in an HAProxy container, regenerating the config and reloading when needed, and handing off the TCP connections to the downstream containers. Interlock does stuff like this to enable load balancing. All the containers could live on the same Docker network to enable this.
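For the HTTP side, a minimal sketch of host-based routing with the official haproxy image; domain names, backend container names, and the network are placeholders:

$ cat haproxy.cfg
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_in
    bind *:80
    use_backend foo_web if { hdr(host) -i www.foo.com }
    use_backend bar_web if { hdr(host) -i www.bar.com }

backend foo_web
    server s1 foo_web:80

backend bar_web
    server s1 bar_web:80

$ docker network create webnet
$ docker run -d --net webnet --name foo_web nginx
$ docker run -d --net webnet --name bar_web nginx
$ docker run -d --net webnet -p 80:80 \
    -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" haproxy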
That might be very interesting indeed. Would that allow me to keep a single port open on the host (say 2122) for incoming sftp connections, and have foo.com:2122 go to container foo_sftp:22, and bar.com:2122 go to container bar_sftp:22?