Docker/MySQL replication fails with connection error when server has second IP address

For a few days I’ve been struggling with a MySQL + Docker + ClusterIP problem. I’m trying to set up a failover for my servers: if Server1 fails, Server2 should take over automatically. Both servers run MySQL 8.0.31 (the latest image from the hub) in a Docker container. These databases are synchronized via replication (master->slave or master-master). That’s the idea…

That works quite well so far. The problem comes when I assign the cluster IP address as a second IP address to the second server. On Server1 (which is the primary server in the cluster) it works fine this way.

The databases run and work wonderfully, and can also be reached with

mysql --host=SERVER_IP_1 --port=33060 -u replicator --password=REPLPW
mysql --host=SERVER_IP_2 --port=33060 -u replicator --password=REPLPW

from the other server. But as soon as I start the replication and display the status with

SHOW SLAVE STATUS\G;

the following error message always appears after about 60 seconds:

Last_IO_Error: error connecting to master 'replicator@SERVER2:33060' - 
retry-time: 60 retries: 5 message: Can't connect to MySQL server on 'SERVER2:33060' (110)

This only happens if both computers are running in the IP cluster. Unfortunately, I do not understand the cause and am urgently looking for a solution.

My guess is that it has something to do with Docker, because a MariaDB replication, which also runs on the servers but outside of Docker, works fine.

The very simple docker config looks like this:

docker run \
    --name dbname \
    -v /root/server-mgmt/mysql/mysql.cnf:/etc/my.cnf \
    -e MYSQL_ROOT_PASSWORD=ROOTPW \
    --restart unless-stopped \
    -p 33060:3306 \
    -p 33061:33060  \
    -d mysql:latest

I’m pretty new to Docker and don’t really have an idea how to make this work, but by now I’m convinced that it has something to do with running Docker in the described environment.

How should those server instances communicate? I see that both containers are listening on 33060 instead of 3306. Is that because of the replication, or for security reasons?

Then there is the port forwarding, which looks strange to me: why does host port 33060 go to container port 3306, while host port 33061 goes to container port 33060?

The last question is: what is “SERVER2”? Is that a hostname, or just a placeholder so we can understand it better than plain IP addresses?

Hi Akos,
MySQL 8 communicates on port 3306 and on port 33060 by default. These ports are used “in the container”. On my servers a MariaDB is also running and uses port 3306 by default (MariaDB doesn’t use 33060, so that one is free). So port 3306 on the host is already taken and I need to do a port mapping. Furthermore, I also have a second container running that maps to 33070 (and has the same issue).
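
To make the mapping visible: docker port reports it for the container from my run command above (dbname is the name I used there). Port 3306 is the classic MySQL protocol and 33060 is the X protocol.

docker port dbname
# should report roughly:
#   3306/tcp  -> 0.0.0.0:33060   (classic protocol, the one replication uses)
#   33060/tcp -> 0.0.0.0:33061   (X protocol)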

SERVER1 and SERVER2 are placeholders for the real IP addresses of the hosts. Let’s say we have SERVER1 = 1.1.1.1 and SERVER2 = 2.2.2.2. The configuration is tested with:

mysql --host=1.1.1.1 --port=33060 -u replicator --password=REPLPW
mysql --host=2.2.2.2 --port=33060 -u replicator --password=REPLPW

Now I’m doing (1):

mysql --host=2.2.2.2 --port=33060 -u root --password=ROOTPW

CHANGE MASTER TO 
	MASTER_HOST = '1.1.1.1', 
	MASTER_PORT = 33060, 
	MASTER_USER = 'replicator', 
	MASTER_PASSWORD = 'REPLPW';
START SLAVE;

This works fine and the replication is started up and running.
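
To confirm that, I check the replica roughly like this (same placeholder host/port as above; the grep just pulls out the interesting lines):

mysql --host=2.2.2.2 --port=33060 -u root --password=ROOTPW \
      -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Last_IO_Err'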

Now the ClusterIP (or floating IP), e.g. 3.3.3.3, is assigned as a second IP address to servers 1.1.1.1 and 2.2.2.2, and it breaks.

I do the same procedure as described in (1), and after about 60 seconds I get the error message

Last_IO_Error: error connecting to master 'replicator@1.1.1.1:33060' - retry-time: 60 retries: 1 message: 
Can't connect to MySQL server on '1.1.1.1:33060' (110)

The strange thing is: as mentioned, there is also a MariaDB running on 1.1.1.1 and 2.2.2.2, but not in Docker. There everything works fine; the replication also works when the cluster IP is assigned to the hosts.

To me it looks like there is some communication problem with Docker. But what, and how do I resolve it?

BTW: I have also tried other ports and have turned off my firewalls for testing purposes.

I am 99 percent sure it is not a bug in Docker. It must be a configuration issue (MySQL, firewall, host machine), but I don’t have a lot of experience with MySQL replication. Not everything works the same way in containers as directly on the host.

MariaDB or MySQL? I know they are similar, but not the same.

I don’t remember exactly how Docker forwards ports and how the iptables rules are created, but if you publish ports on the host and the host doesn’t have the IP address yet, Docker might not be able to forward from that IP address until you recreate the container. I may be wrong, but after I read your question again, I realized the problem is not that the replication works without containers and not with containers, but the virtual IP. I missed that fact before. It is possible that I still don’t understand everything, so if that is the case, I am sorry for it :slight_smile:
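
If you want to look at what Docker actually set up for the published ports, as far as I remember the rules end up in these chains (this is only a way to inspect them, I don’t know yet whether anything is wrong there):

# NAT rules Docker creates for published ports (-p ...)
sudo iptables -t nat -L DOCKER -n -v
# filter rules for traffic forwarded to the containers
sudo iptables -L DOCKER -n -v
sudo iptables -L DOCKER-USER -n -v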

Hi Akos,

you’re right, MariaDB and MySQL are similar but not the same. I used this example/scenario for testing purposes to find out where the problem comes from.

In fact the replication with MySQL works fine as long as I haven’t assigned the second IP (cluster IP) to the host. It hangs as soon as my host gets assigned this second IP.

Because all other configurations/scenarios work fine, I’m guessing that it has something to do with Docker or its configuration. But what?

I also recreated/restarted the container after I assigned the second IP. I did it back and forth, but without any success. The real problem is that I have absolutely no idea where to look to debug. :face_with_raised_eyebrow:

Can you get the same results when you run both containers on the same host? Of course that would require changing the host ports.

Forget it. I realised that would not make sense :slight_smile:

Yes, I get the same failure. I have two MySQL containers running on server 2.2.2.2, one on port 33060 and another one on port 33070. I also tried to replicate between these two containers, without success.

How did you assign the virtual IP to the host in this case? I edited my previous post since I thought it would not make sense to test it on the same machine because of the virtual IP. How exactly did you test it? When everything is running on the same host, the floating IP cannot move anywhere.

My server is managed with Plesk, so I don’t need to modify any /etc/… files. Plesk provides a simple interface for adding/changing IP addresses, and I followed the tutorial from my provider and from Plesk itself.

Okay, but if it doesn’t work with both MySQL instances on the same machine, then the issue is not the virtual IP. I can’t imagine how that could be.

When do you get a “second IP” in this case?

My provider gave me that second IP. It is assigned to all hosts in the cluster. You have multiple hosts assigned to a cluster group and one host is “active”. As soon as the “active” host goes down (for whatever reason), another host from the group can take over. In fact it is used for failover scenarios.

However, I only use the “real” IP addresses of the hosts for the replication between the databases; otherwise I would never know which host is being addressed.

 mysql --host=2.2.2.2 	<--- real host IP

I feel we are starting to confuse each other. :slight_smile: So I should probably say “I don’t know”, but I am still trying to figure it out. I know how a virtual IP works (at least I know how I use it). I have never used Plesk, but I have configured a virtual IP with Keepalived for HAProxy in a Kubernetes cluster. All of the hosts have the IP address, so I can start a new service on that IP even if it is not the active host.

In the case of only one host, that host has to be the active host, so I don’t see how assigning the VIP to the host can have anything to do with the MySQL connection.

Yes, of course. I know that, but it is still good that you mention it, so I keep it in mind.

Since the VIP is only required for a client to connect to the MySQL server, can you test it locally (not on Plesk) with a simple Docker Compose file? If you get the same result, and you share the compose file and everything that is required to reproduce the issue, I can try it later.
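
One more thing that might be worth comparing in the meantime (just a guess from my side): once the VIP is added, the kernel may pick a different source address for outgoing connections, so the master could see the replica connecting from 3.3.3.3 instead of 2.2.2.2. Something like this, run before and after adding the VIP, should show it:

ip addr show             # all addresses on the host, including the VIP
ip route get 1.1.1.1     # the "src" field is the address used towards the master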

Hi Akos,

I really, really appreciate your support and help! By now I’m getting desperate because I can’t find a reason/solution. To be honest, I’m not really familiar with network setups; let’s say I’m an ambitious user who wants to get things running. That’s why I’m also very happy with Plesk, which saves me from dealing with all the /etc/… config files :innocent:

Unfortunately, I haven’t worked much with docker-compose files. So before I send you a non-working YML file, here is how I actually do it.

Docker Start

docker run \
    --name mysql-server1 \
    -v /root/server-mgmt/mysql/mysql-server-1.cnf:/etc/my.cnf \
    -v /var/lib/mysql-docker/server1:/var/lib/mysql \
    -v /var/log/mysql/server1:/var/log/mysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    --restart unless-stopped \
    -p 33060:3306 \
    -p 33061:33060  \
    -d mysql:latest

docker run \
    --name mysql-server2 \
    -v /root/server-mgmt/mysql/mysql-server-2.cnf:/etc/my.cnf \
    -v /var/lib/mysql-docker/server2:/var/lib/mysql \
    -v /var/log/mysql/server2:/var/log/mysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    --restart unless-stopped \
    -p 33070:3306 \
    -p 33071:33060  \
    -d mysql:latest

MySQL Config mysql-server-1.cnf

[mysqld]
skip-host-cache
skip-name-resolve
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
secure-file-priv=/var/lib/mysql-files
user=mysql

pid-file=/var/run/mysqld/mysqld.pid

sql-mode=""

server-id=1       
log_bin=mysql-bin
log_error=mysql-bin.err

binlog_do_db=demo

gtid_mode = ON
enforce-gtid-consistency = ON

log_error=/var/log/mysql/mysql_error.log
general_log_file=/var/log/mysql/mysql.log
general_log=1

[client]
socket=/var/run/mysqld/mysqld.sock

!includedir /etc/mysql/conf.d/

MySQL Config mysql-server-2.cnf

[mysqld]
skip-host-cache
skip-name-resolve
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
secure-file-priv=/var/lib/mysql-files
user=mysql

pid-file=/var/run/mysqld/mysqld.pid

sql-mode=""

server-id=2       
log_bin=mysql-bin
log_error=mysql-bin.err

binlog_do_db=demo

gtid_mode = ON
enforce-gtid-consistency = ON

log_error=/var/log/mysql/mysql_error.log
general_log_file=/var/log/mysql/mysql.log
general_log=1

[client]
socket=/var/run/mysqld/mysqld.sock

!includedir /etc/mysql/conf.d/
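
To make sure the mounted config files are really picked up, a quick check like this shows the relevant variables (host/port have to be adjusted for whichever instance is being checked; this example targets server 1):

mysql --host=1.1.1.1 --port=33060 -u root --password=secret \
      -e "SHOW VARIABLES WHERE Variable_name IN ('server_id','gtid_mode','enforce_gtid_consistency','log_bin')"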

SQL init script on all servers

CREATE USER 'replicator'@'%' IDENTIFIED BY 'replsecrep';
ALTER USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'replsecrep';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

CREATE DATABASE demo;
USE demo;
CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);
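
To verify that the replication user exists with the intended grants, something like this can be run against either instance (again with host/port adjusted):

mysql --host=1.1.1.1 --port=33060 -u root --password=secret \
      -e "SELECT user, host, plugin FROM mysql.user WHERE user = 'replicator'; SHOW GRANTS FOR 'replicator'@'%';"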

On Server2, access the MySQL CLI from the shell:

mysql --host=2.2.2.2 --port=33060 -u root --password=secret

RESET MASTER;
STOP SLAVE;
RESET SLAVE ALL;
CHANGE MASTER TO MASTER_HOST = '1.1.1.1', MASTER_PORT = 33060, MASTER_USER = 'replicator', MASTER_PASSWORD = 'replsecret', MASTER_AUTO_POSITION = 1;
START SLAVE;
SHOW SLAVE STATUS\G;

That’s the setup.

Ok, I think it will be enough. I will try it soon. Or at least I will try to try it :smiley:

I tried your setup locally in virtual machines using multipass. In the end I ran

show slave status\G

without the semicolon, because using both separators gave me an error message. \G is for showing the result in a user-friendly way (if you use the mysql client), but then you don’t need the semicolon.

My error message was a little bit different though:

error connecting to master 'replicator@192.168.205.34:33060' - retry-time: 60 retries: 1 message: Access denied for user 'replicator'@'192.168.205.35' (using password: YES)

Maybe I used a wrong password, but I don’t have more time to continue testing. The fact that I got “Access denied” indicates that my replica could reach the MySQL server, which yours could not. Since I installed it locally using Multipass, it is possible that the issue is related to the default setup of Plesk.
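
One more thing that might be worth trying on your side (only an idea): the replica’s IO thread connects from inside the container, not from the host shell, so running the same manual test from inside mysql-server2 would show whether the container itself can reach the master:

docker exec -it mysql-server2 mysql --host=1.1.1.1 --port=33060 -u replicator -p
# the -p flag prompts for the replicator password interactively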

This is the same error message I get. It comes up after about 60 seconds.
What is the connection status (first line of the `SHOW SLAVE STATUS\G` output)?

It was your error message. “Can’t connect”. And I got “access denied”.

Sorry for the semicolon. That was a copy&paste mistake by me.

But maybe we are not that far apart. What do Slave_IO_State (first line) and Last_SQL_Errno show for you? I always had:

Slave_IO_State: Connecting to source

and after about 60 seconds I got:

Last_IO_Errno: 2003
Last_IO_Error: error connecting to master 'replicator@1.1.1.1:33060' - retry-time: 60 retries: 1 message: 
Can't connect to MySQL server on '1.1.1.1:33060' (110)

You do have a different error message and even a different error code. I didn’t share the whole output, but let’s compare the two errors:

The output when I have the two mysql servers running:

*************************** 1. row ***************************
               Slave_IO_State: Connecting to source
                  Master_Host: 192.168.205.34
                  Master_User: replicator
                  Master_Port: 33060
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: 295080255d9c-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 337
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1045
                Last_IO_Error: error connecting to master 'replicator@192.168.205.34:33060' - retry-time: 60 retries: 1 message: Access denied for user 'replicator'@'192.168.205.35' (using password: YES)
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
                  Master_UUID:
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 221204 18:52:54
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

It shows that it could connect to the port, which doesn’t mean it could log in to the MySQL service, since there is something wrong with my permissions and it says “Access denied”. But this message, or at least the error code, was returned by the remote server. It also knows that I am trying to connect as the replicator@192.168.205.35 user, since I am connecting from mysql-server2 and that host’s IP address is 192.168.205.35.

My error code was 1045 and yours was 2003.

Your error message said that you could not connect to the MySQL server. This error must be thrown by the client side after a timeout, which means it is a network-related problem. Let’s see what happens when I stop mysql-server1 (which gave me “access denied” before) and execute show slave status\G again from mysql-server2.

*************************** 1. row ***************************
               Slave_IO_State: Connecting to source
                  Master_Host: 192.168.205.34
                  Master_User: replicator
                  Master_Port: 33060
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: 295080255d9c-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 337
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error connecting to master 'replicator@192.168.205.34:33060' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on '192.168.205.34:33060' (111)
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
                  Master_UUID:
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 221204 18:54:54
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

Now I get the same error message and error code as you, because there is nothing listening on the port I want to connect to.
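
So a plain TCP test from the replica side should already fail in your setup before MySQL is involved at all. Something like this would confirm it (first from the replica host, then from inside the container; it assumes bash and timeout are available in the image):

# from the host of the replica
nc -vz -w 5 1.1.1.1 33060
# from inside the replica container, using bash's built-in /dev/tcp
docker exec mysql-server2 bash -c 'timeout 5 bash -c "</dev/tcp/1.1.1.1/33060" && echo reachable || echo not reachable'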

I used two Ubuntu virtual machines created by Multipass. I guess you have Red Hat or CentOS or something like that, and SELinux does not allow Docker to listen on that port, or you have a default firewall configuration. I am not sure, but one thing I am sure of is that your error is network related :slight_smile: and mine isn’t.
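
If you want to rule those out, these are the kinds of checks I would start with (adjust them to whatever your distribution actually uses):

getenforce                       # SELinux mode, if SELinux is installed at all
sudo ufw status verbose          # Ubuntu's default firewall frontend
sudo iptables -S DOCKER-USER     # rules Docker consults before forwarding to containers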

Here is a similar issue to yours:

Thank you very much for the time and effort you spent!

I’m also using Ubuntu 22.04. The network config (firewall) was one of my first assumptions (a few days ago), so I turned off the firewalls on my hosts. The result was the same.

I already know the link you sent, but I guess it doesn’t apply to Ubuntu. Do you maybe have any other ideas?