Understanding iptables rules added by Docker

Let’s say I’ve got a server with lo and eth0 (1.1.1.1) interfaces. I’ve just installed Docker (no swarm mode). When I start it, it adds the docker0 interface (172.17.0.1) and the following iptables rules:

*nat
:PREROUTING ACCEPT
:INPUT ACCEPT
:OUTPUT ACCEPT
:POSTROUTING ACCEPT
:DOCKER -

# (nat.1)
# when receiving a connection targeting a local address
# from the outside world to 1.1.1.1,
# or from a container to 172.17.0.1, 1.1.1.1
# jump to the DOCKER chain
-A PREROUTING -m addrtype --dst-type LOCAL
    -j DOCKER

# (nat.2)
# when establishing a connection from the host
# to a local address (1.1.1.1, 172.17.0.1),
# jump to the DOCKER chain
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype
    --dst-type LOCAL -j DOCKER

# (nat.3)
# when receiving a connection
# from a container to the outside world,
# or establishing from the host to 172.17.0.1
# do SNAT
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0
    -j MASQUERADE

# (nat.4)
# return if a connection is coming from a container
-A DOCKER -i docker0 -j RETURN

# here we're left with connections coming from the outside world to 1.1.1.1,
# and from the host to 1.1.1.1, 172.17.0.1
# and here's where DNAT rules will be added

*filter
:INPUT ACCEPT
:FORWARD DROP  # DROP policy
:OUTPUT ACCEPT
:DOCKER -
:DOCKER-ISOLATION-STAGE-1 -
:DOCKER-ISOLATION-STAGE-2 -
:DOCKER-USER -

# (filter.1)
-A FORWARD -j DOCKER-USER

# (filter.2)
-A FORWARD -j DOCKER-ISOLATION-STAGE-1

# (filter.3)
# accept established and related connections
# to a container
# from the outside world (in case they are forwarded, none by default),
# or from another container
-A FORWARD -o docker0 -m conntrack
    --ctstate RELATED,ESTABLISHED -j ACCEPT

# (filter.4)
# jump to the DOCKER chain
# for packets coming
# to a container
# from the outside world
# or from another container
-A FORWARD -o docker0 -j DOCKER

# (filter.5)
# accept packets coming
# from a container
# to the outside world
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT

# (filter.6)
# accept packets between containers
-A FORWARD -i docker0 -o docker0 -j ACCEPT

# (filter.7)
# jump to DOCKER-ISOLATION-STAGE-2
# for packets coming
# from a container
# to the outside world
-A DOCKER-ISOLATION-STAGE-1 -i docker0
    ! -o docker0 -j DOCKER-ISOLATION-STAGE-2

# (filter.8)
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

# (filter.9)
# drop packets coming
# to a container
# from the outside world,
# or from another container
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP

# (filter.10)
-A DOCKER-ISOLATION-STAGE-2 -j RETURN

# (filter.11)
# placeholder for user rules
# https://docs.docker.com/network/iptables/
-A DOCKER-USER -j RETURN

The same rules are available in a more succinct form on a separate page, so you don’t have to scroll up and down continuously.

So I’m trying to see the big picture.

Docker adds SNAT for connections coming from containers (nat.3). That the rule also matches connections from the host to 172.17.0.1 is most likely unintentional.
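Here is a minimal Python sketch of the nat.3 match (`-s 172.17.0.0/16 ! -o docker0`). It shows which flows get masqueraded. It models only this one rule, and `nat3_masquerades` is a hypothetical name. It is not how the kernel evaluates rules.

```python
import ipaddress

# Subnet of the default bridge network, as in the listing above.
DOCKER0_SUBNET = ipaddress.ip_network("172.17.0.0/16")


def nat3_masquerades(src: str, out_iface: str) -> bool:
    """Mimic the match of: -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE"""
    return ipaddress.ip_address(src) in DOCKER0_SUBNET and out_iface != "docker0"


# Container -> outside world via eth0: masqueraded.
print(nat3_masquerades("172.17.0.2", "eth0"))     # True
# Container -> container, stays on docker0: left alone.
print(nat3_masquerades("172.17.0.2", "docker0"))  # False
# Host -> 172.17.0.1: assuming the kernel picks 172.17.0.1 as the source and
# sends the packet via lo (as it does for local destinations), this matches too.
print(nat3_masquerades("172.17.0.1", "lo"))       # True
```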

It then presumably sets up the hooks for DNAT rules (nat.1, nat.2, nat.4).

The DOCKER-USER chain is presumably for rules added manually by the user (filter.1, filter.11).

DOCKER-ISOLATION-STAGE-2 seems to be useless (filter.9, filter.10), at least right after startup. It drops packets going to a container, but DOCKER-ISOLATION-STAGE-1 only jumps to it for packets coming from a container (filter.7, filter.8).

filter.3 and filter.4 presumably deal with connections to containers. filter.4 is probably a hook for rules restricting access to containers from the outside world (new connections). filter.3 then allows further traffic once a new connection has passed filter.4.

filter.5 and filter.6 accept packets from containers to the outside world and between containers.
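To check the reading above, here is a rough Python model of the filter FORWARD path right after startup (rules filter.1 to filter.11, docker0 only). It is a sketch under simplifying assumptions:

- a packet carries one conntrack state;
- `Packet` and the chain functions are hypothetical names.

It also shows why filter.9 can never fire here. Stage 2 is only entered when the output interface is *not* docker0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    in_iface: str
    out_iface: str
    ctstate: str = "NEW"  # simplification: a single conntrack state


def docker_user(p: Packet) -> Optional[str]:
    return None  # filter.11: placeholder chain, just RETURNs


def isolation_stage_2(p: Packet) -> Optional[str]:
    if p.out_iface == "docker0":  # filter.9 (unreachable with one bridge)
        return "DROP"
    return None  # filter.10


def isolation_stage_1(p: Packet) -> Optional[str]:
    if p.in_iface == "docker0" and p.out_iface != "docker0":  # filter.7
        verdict = isolation_stage_2(p)
        if verdict is not None:
            return verdict
    return None  # filter.8


def docker_chain(p: Packet) -> Optional[str]:
    return None  # filter DOCKER chain: empty until a port is published


def forward(p: Packet) -> str:
    for chain in (docker_user, isolation_stage_1):  # filter.1, filter.2
        verdict = chain(p)
        if verdict is not None:
            return verdict
    if p.out_iface == "docker0" and p.ctstate in ("RELATED", "ESTABLISHED"):
        return "ACCEPT"  # filter.3
    if p.out_iface == "docker0":  # filter.4
        verdict = docker_chain(p)
        if verdict is not None:
            return verdict
    if p.in_iface == "docker0" and p.out_iface != "docker0":
        return "ACCEPT"  # filter.5
    if p.in_iface == "docker0" and p.out_iface == "docker0":
        return "ACCEPT"  # filter.6
    return "DROP"  # FORWARD policy


print(forward(Packet("docker0", "eth0")))                 # ACCEPT
print(forward(Packet("eth0", "docker0")))                 # DROP
print(forward(Packet("eth0", "docker0", "ESTABLISHED")))  # ACCEPT
print(forward(Packet("docker0", "docker0")))              # ACCEPT
```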

Then, when I publish a port (-p 111:222), I get three more rules (marked with +):

 *nat
 -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
 -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
 -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

 # SNAT connections coming from a container to itself to port 222
 # but this rule never matches (these packets don't reach the host)
+-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 222 -j MASQUERADE

 -A DOCKER -i docker0 -j RETURN

 # DNAT connections coming to port 111
 # from the outside world to 1.1.1.1,
 # and from the host to 1.1.1.1, 172.17.0.1
 # but actually it works without this rule from the host
 # presumably owing to docker-proxy listening on *:111
+-A DOCKER ! -i docker0 -p tcp -m tcp --dport 111 -j DNAT --to-destination 172.17.0.2:222

 *filter
 -A FORWARD -j DOCKER-USER
 -A FORWARD -j DOCKER-ISOLATION-STAGE-1
 -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 -A FORWARD -o docker0 -j DOCKER
 -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
 -A FORWARD -i docker0 -o docker0 -j ACCEPT

 # allow DNAT'ed connections from the outside world to the container
 # (needed because FORWARD's policy is DROP)
+-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 222 -j ACCEPT
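The path of a connection to the published port can be traced with a small Python sketch: nat PREROUTING, then the nat DOCKER chain, then the filter DOCKER ACCEPT rule. It is a simplification. `LOCAL_ADDRS` is a hypothetical stand-in for `--dst-type LOCAL` that lists only this example host’s addresses, and the function names are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for `-m addrtype --dst-type LOCAL` (the real match
# covers every local address, e.g. all of 127.0.0.0/8).
LOCAL_ADDRS = {ipaddress.ip_address(a) for a in ("127.0.0.1", "1.1.1.1", "172.17.0.1")}


@dataclass(frozen=True)
class Packet:
    in_iface: str
    dst: str
    dport: int
    proto: str = "tcp"


def nat_docker_chain(p: Packet) -> Optional[Packet]:
    """nat DOCKER chain after `-p 111:222`: rewritten packet, or None if nothing applies."""
    if p.in_iface == "docker0":  # -A DOCKER -i docker0 -j RETURN
        return None
    if p.proto == "tcp" and p.dport == 111:  # ! -i docker0 -p tcp --dport 111 -j DNAT
        return replace(p, dst="172.17.0.2", dport=222)
    return None


def prerouting(p: Packet) -> Packet:
    # -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
    if ipaddress.ip_address(p.dst) in LOCAL_ADDRS:
        rewritten = nat_docker_chain(p)
        if rewritten is not None:
            return rewritten
    return p


def filter_docker_accepts(p: Packet, out_iface: str) -> bool:
    # -A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp --dport 222 -j ACCEPT
    return (p.dst == "172.17.0.2" and p.in_iface != "docker0"
            and out_iface == "docker0" and p.proto == "tcp" and p.dport == 222)


outside = prerouting(Packet("eth0", "1.1.1.1", 111))
print(outside.dst, outside.dport)                 # 172.17.0.2 222
print(filter_docker_accepts(outside, "docker0"))  # True
# From a container the DOCKER chain RETURNs first, so the packet is left alone
# and delivered locally (presumably to docker-proxy listening on *:111).
inside = prerouting(Packet("docker0", "1.1.1.1", 111))
print(inside.dst, inside.dport)                   # 1.1.1.1 111
```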

When you additionally specify an IP address (-p 127.0.0.1:111:222), you get the same three rules, except that the DNAT rule now also matches on the destination address:

 -A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 222 -j MASQUERADE
--A DOCKER ! -i docker0 -p tcp -m tcp --dport 111 -j DNAT --to-destination 172.17.0.2:222
+-A DOCKER -d 127.0.0.1/32 ! -i docker0 -p tcp -m tcp --dport 111 -j DNAT --to-destination 172.17.0.2:222
 -A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 222 -j ACCEPT

And again, it works without the DNAT rule.
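One likely reason, at least for connections from the host, is visible in nat.2 itself. The OUTPUT jump excludes `127.0.0.0/8`, so a host connection to 127.0.0.1:111 never reaches the DOCKER chain. A connection to 1.1.1.1:111 does reach it, but no longer matches `-d 127.0.0.1/32`. The Python sketch below models only that OUTPUT path. Traffic arriving from other machines goes through PREROUTING and is not covered. `LOCAL_ADDRS` and the function names are hypothetical.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")
# Hypothetical stand-in for `-m addrtype --dst-type LOCAL` on the example host.
LOCAL_ADDRS = {ipaddress.ip_address(a) for a in ("127.0.0.1", "1.1.1.1", "172.17.0.1")}


def output_enters_docker_chain(dst: str) -> bool:
    """-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER"""
    addr = ipaddress.ip_address(dst)
    return addr not in LOOPBACK and addr in LOCAL_ADDRS


def dnat_rule_matches(dst: str, in_iface: str, dport: int) -> bool:
    """-A DOCKER -d 127.0.0.1/32 ! -i docker0 -p tcp -m tcp --dport 111 -j DNAT ..."""
    return dst == "127.0.0.1" and in_iface != "docker0" and dport == 111


def host_connection_is_dnatted(dst: str, dport: int) -> bool:
    # Assumption: locally generated packets have no input interface,
    # modeled as "", so `! -i docker0` would match.
    return output_enters_docker_chain(dst) and dnat_rule_matches(dst, "", dport)


print(host_connection_is_dnatted("127.0.0.1", 111))  # False: OUTPUT skips 127.0.0.0/8
print(host_connection_is_dnatted("1.1.1.1", 111))    # False: rule wants -d 127.0.0.1
```

This fits the observation that docker-proxy, rather than the DNAT rule, is what answers on 127.0.0.1:111 from the host.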

Adding a user-defined bridge network duplicates docker0-specific rules for the new interface:

 *nat
 -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
 -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
+-A POSTROUTING -s 172.18.0.0/16 ! -o br-b215ed0febb5 -j MASQUERADE
 -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
+-A DOCKER -i br-b215ed0febb5 -j RETURN
 -A DOCKER -i docker0 -j RETURN

 *filter
 -A FORWARD -j DOCKER-USER
 -A FORWARD -j DOCKER-ISOLATION-STAGE-1
+-A FORWARD -o br-b215ed0febb5 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
+-A FORWARD -o br-b215ed0febb5 -j DOCKER
+-A FORWARD -i br-b215ed0febb5 ! -o br-b215ed0febb5 -j ACCEPT
+-A FORWARD -i br-b215ed0febb5 -o br-b215ed0febb5 -j ACCEPT
 -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 -A FORWARD -o docker0 -j DOCKER
 -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
 -A FORWARD -i docker0 -o docker0 -j ACCEPT
+-A DOCKER-ISOLATION-STAGE-1 -i br-b215ed0febb5 ! -o br-b215ed0febb5 -j DOCKER-ISOLATION-STAGE-2
 -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
 -A DOCKER-ISOLATION-STAGE-1 -j RETURN
+-A DOCKER-ISOLATION-STAGE-2 -o br-b215ed0febb5 -j DROP
 -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
 -A DOCKER-ISOLATION-STAGE-2 -j RETURN
 -A DOCKER-USER -j RETURN 
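The duplicated rules follow a simple per-bridge template. Here is a Python sketch of that template: a hypothetical helper that reproduces the rule text from the listings for a given bridge name and subnet. It is not Docker’s actual code, and it ignores rule order. In the listing, the newer bridge’s rules sit above docker0’s.

```python
from __future__ import annotations


def bridge_rules(bridge: str, subnet: str) -> dict[str, list[str]]:
    """Per-bridge rules as they appear in the listings (hypothetical helper)."""
    return {
        "nat": [
            f"-A POSTROUTING -s {subnet} ! -o {bridge} -j MASQUERADE",
            f"-A DOCKER -i {bridge} -j RETURN",
        ],
        "filter": [
            f"-A FORWARD -o {bridge} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT",
            f"-A FORWARD -o {bridge} -j DOCKER",
            f"-A FORWARD -i {bridge} ! -o {bridge} -j ACCEPT",
            f"-A FORWARD -i {bridge} -o {bridge} -j ACCEPT",
            f"-A DOCKER-ISOLATION-STAGE-1 -i {bridge} ! -o {bridge} -j DOCKER-ISOLATION-STAGE-2",
            f"-A DOCKER-ISOLATION-STAGE-2 -o {bridge} -j DROP",
        ],
    }


for bridge, subnet in (("br-b215ed0febb5", "172.18.0.0/16"), ("docker0", "172.17.0.0/16")):
    for table, rules in bridge_rules(bridge, subnet).items():
        for rule in rules:
            print(f"{table}: {rule}")
```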

Do you have anything to add?

I noticed something interesting on the different Docker servers I have.

Some contain this MASQUERADE rule specifically for traffic leaving via docker_gwbridge:

-A POSTROUTING -o docker_gwbridge -m addrtype --src-type LOCAL -j MASQUERADE

Others contain a more generic MASQUERADE rule for my actual Ethernet interface, which, to be honest, seems too generic for Docker to have created:

-A POSTROUTING -o eno1 -m addrtype --src-type LOCAL -j MASQUERADE
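I can’t say why one variant is created rather than the other. What the two rules match is easier to pin down. Both apply only to packets whose source is one of the host’s own addresses and that leave via the named interface, so neither touches traffic forwarded from containers. The Python sketch below assumes hypothetical host addresses (192.0.2.10 on eno1, 172.18.0.1 on docker_gwbridge) as a stand-in for `--src-type LOCAL`.

```python
import ipaddress

# Hypothetical host addresses (stand-in for `-m addrtype --src-type LOCAL`):
# eno1 = 192.0.2.10, docker_gwbridge = 172.18.0.1.
HOST_ADDRS = {ipaddress.ip_address(a) for a in ("127.0.0.1", "192.0.2.10", "172.18.0.1")}


def masquerades(src: str, out_iface: str, rule_iface: str) -> bool:
    """Match of: -A POSTROUTING -o <rule_iface> -m addrtype --src-type LOCAL -j MASQUERADE"""
    return out_iface == rule_iface and ipaddress.ip_address(src) in HOST_ADDRS


print(masquerades("172.18.0.1", "docker_gwbridge", "docker_gwbridge"))  # True: host-originated
print(masquerades("172.18.0.5", "docker_gwbridge", "docker_gwbridge"))  # False: not a host address
print(masquerades("172.18.0.5", "eno1", "eno1"))                        # False: forwarded traffic
print(masquerades("192.0.2.10", "eno1", "eno1"))                        # True
```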

Do you also see these different rules? And do you have any idea when/why one is created instead of the other? I cannot pinpoint the reason yet.

Context: these are Docker servers in a Docker swarm; some are managers and some are not, but I can’t see a relationship with that either.

I can remove the rule and restart Docker, and the same specific rule reappears. For some reason it’s consistent per server, even though it’s not consistent across all servers.

Re the isolation stages: their purpose isn’t apparent when you only have a single Docker network (the default bridge network, whose interface is docker0), but it becomes apparent once you have multiple networks.

Isolation stage 1 sends any traffic that comes in from a container’s bridge and leaves through a different interface to stage 2.
Stage 2 then drops that traffic if its output interface is another Docker bridge, i.e. if it targets a container on a different network.

So, in effect, stages 1 and 2 prevent containers on different Docker networks from communicating with each other (but allow containers to communicate with the outside world, at this point).
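A small Python model of the two stages makes this concrete. It uses the two bridges from the user-defined network example above. It mirrors the per-bridge rules in DOCKER-ISOLATION-STAGE-1 and -STAGE-2 and nothing else. `isolation_verdict` is a hypothetical name, and `None` stands for the chains returning control to FORWARD.

```python
from typing import Optional, Sequence

BRIDGES = ("br-b215ed0febb5", "docker0")


def isolation_verdict(in_iface: str, out_iface: str,
                      bridges: Sequence[str] = BRIDGES) -> Optional[str]:
    """Return "DROP" if the isolation chains drop the packet, None if they RETURN."""
    for br in bridges:
        # DOCKER-ISOLATION-STAGE-1: -i <br> ! -o <br> -j DOCKER-ISOLATION-STAGE-2
        if in_iface == br and out_iface != br:
            # DOCKER-ISOLATION-STAGE-2: one `-o <bridge> -j DROP` per bridge
            if out_iface in bridges:
                return "DROP"
            # otherwise stage 2 RETURNs and stage 1 continues with its next rule
    return None  # DOCKER-ISOLATION-STAGE-1 -j RETURN: FORWARD carries on


print(isolation_verdict("docker0", "br-b215ed0febb5"))  # DROP: across networks
print(isolation_verdict("docker0", "eth0"))             # None: to the outside world
print(isolation_verdict("docker0", "docker0"))          # None: same network
```

With only docker0 in `bridges`, the DROP branch can never be taken, which matches the earlier observation that stage 2 looks useless on a fresh install.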

(A slightly different case: a container can be attached to two different networks. Docker then creates two virtual Ethernet interfaces inside the container, one attached to each network’s bridge, with the appropriate subnet routes set up. Traffic to a container on either network then goes directly through that network’s bridge, rather than leaving one bridge for the host and coming back in on the other. AFAIU.)

See also the Server Fault question “Why does docker bypass ufw rules one time and another time not?” on how to debug what is happening in general, using network namespaces, packet tracing, etc.