For replication to be resilient to unexpected halts of the server (sometimes described as crash-safe), it must be possible for the slave to recover its state as it was before the halt. This section describes the impact of an unexpected halt of a slave during replication, and how to configure a slave for the best chance of recovering and continuing replication.
After an unexpected halt of a slave, upon restart the slave's SQL thread must recover which transactions have already been executed. The information required for recovery is stored in the slave's relay log info log. From MySQL 8.0, this log is created by default as an InnoDB table named `mysql.slave_relay_log_info` (with the system variable `relay_log_info_repository` set to the default of `TABLE`). By using this transactional storage engine, the information is always recoverable upon restart.
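As a quick way to confirm where the relay log info is kept, the following statements can be run on the slave (a sketch; the exact columns returned from the table vary between MySQL versions, and reading the table requires appropriate privileges):

```sql
-- Check that the relay log info repository is the default TABLE setting
SHOW VARIABLES LIKE 'relay_log_info_repository';

-- Inspect the positions recorded transactionally by the SQL thread
SELECT * FROM mysql.slave_relay_log_info;
```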
Updates to the relay log info log table are committed together with the transactions, meaning that the slave's progress information recorded in that log is always consistent with what has been applied to the database, even in the event of an unexpected server halt. Previously, this information was stored by default in a file in the data directory that was updated after the transaction had been applied. This carried the risk of losing synchrony with the master, depending on which stage of processing a transaction the slave had reached when it halted, or even of corruption of the file itself. The setting `relay_log_info_repository = FILE` is now deprecated, and will be removed in a future release. For further information on the slave logs, see Section 17.2.4, “Replication Relay and Status Logs”.
When the relay log info log is stored in the `mysql.slave_relay_log_info` table, DML transactions and also atomic DDL make the following three updates together, atomically:

- Apply the transaction on the database.
- Update the replication positions in the `mysql.slave_relay_log_info` table.
- Update the GTID in the `mysql.gtid_executed` table (when GTIDs are enabled and the binary log is disabled on the server).
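To see the state these atomic updates leave behind, the recorded coordinates and GTIDs can be queried directly (a sketch; the column names shown are those used in MySQL 8.0 and may differ in other versions, and `mysql.gtid_executed` is only the authoritative record when the binary log is disabled on the slave):

```sql
-- Replication coordinates written in the same transaction as the applied changes
SELECT Relay_log_name, Relay_log_pos, Master_log_name, Master_log_pos
FROM mysql.slave_relay_log_info;

-- GTIDs recorded as executed
SELECT * FROM mysql.gtid_executed;
```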
In all other cases, including DDL statements that are not fully atomic and storage engines that do not support atomic DDL, the `mysql.slave_relay_log_info` table might be missing updates associated with replicated data if the server halts unexpectedly. Restoring updates in this case is a manual process. For details on atomic DDL support in MySQL 8.0, and the resulting behavior for the replication of certain statements, see Section 13.1.1, “Atomic Data Definition Statement Support”.
Exactly how a replication slave recovers from an unexpected halt is influenced by the chosen method of replication, whether the slave is single-threaded or multithreaded, the setting of variables such as `relay_log_recovery`, and whether features such as `MASTER_AUTO_POSITION` are being used.
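For orientation, this is roughly how the two most important of these factors are set. `relay_log_recovery` cannot be changed at runtime, so it belongs in the option file, while `MASTER_AUTO_POSITION` is enabled with `CHANGE MASTER TO`. The host and user names below are placeholders only:

```sql
-- In the slave's option file (read at server startup):
--   [mysqld]
--   relay_log_recovery = 1

-- On the slave, enable GTID auto-positioning for the connection to the master
-- (host and user are placeholders).
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_AUTO_POSITION = 1;
```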
The following table shows the impact of these different factors on how a single-threaded slave recovers from an unexpected halt.
Table 17.3 Factors Influencing Single-threaded Replication Slave Recovery
| GTID | MASTER_AUTO_POSITION | relay_log_recovery | relay_log_info_repository | Crash type | Recovery guaranteed | Relay log impact |
|---|---|---|---|---|---|---|
| OFF | Any | 1 | TABLE | Server | Yes | Lost |
| OFF | Any | 1 | Any | OS | No | Lost |
| OFF | Any | 0 | TABLE | Server | Yes | Remains |
| OFF | Any | 0 | TABLE | OS | No | Remains |
| ON | ON | Any | Any | Any | Yes | Lost |
| ON | OFF | 0 | TABLE | Server | Yes | Remains |
| ON | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a single-threaded slave the following configurations are most resilient to unexpected halts:
- When using GTIDs and `MASTER_AUTO_POSITION`, set `relay_log_recovery=1`. With this configuration the setting of `relay_log_info_repository` and other variables has no impact on recovery. Note that to guarantee recovery, `sync_binlog=1` (which is the default) must also be set on the slave, so that the slave's binary log is synchronized to disk at each write. Otherwise, committed transactions might not be present in the slave's binary log.
- When using file position based replication, set `relay_log_recovery=1` and `relay_log_info_repository=TABLE`. A configuration sketch for both approaches follows this list.

Note: During recovery the relay log is lost.
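A minimal option-file sketch of the two recommendations above, assuming MySQL 8.0 defaults elsewhere (the settings restate what the list recommends; `MASTER_AUTO_POSITION=1` itself is enabled with `CHANGE MASTER TO` rather than in the option file, and enabling GTIDs also requires `enforce_gtid_consistency`):

```ini
# GTID-based replication with auto-positioning
[mysqld]
gtid_mode                = ON
enforce_gtid_consistency = ON
relay_log_recovery       = 1
sync_binlog              = 1        # the MySQL 8.0 default

# Alternative: file position based replication
# relay_log_recovery        = 1
# relay_log_info_repository = TABLE  # the MySQL 8.0 default
```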
The following table shows the impact of these different factors on how a multithreaded slave recovers from an unexpected halt.
Table 17.4 Factors Influencing Multithreaded Replication Slave Recovery
| GTID | sync_relay_log | MASTER_AUTO_POSITION | relay_log_recovery | relay_log_info_repository | Crash type | Recovery guaranteed | Relay log impact |
|---|---|---|---|---|---|---|---|
| OFF | 1 | Any | 1 | TABLE | Any | Yes | Lost |
| OFF | >1 | Any | 1 | TABLE | Server | Yes | Lost |
| OFF | >1 | Any | 1 | Any | OS | No | Lost |
| OFF | 1 | Any | 0 | TABLE | Server | Yes | Remains |
| OFF | 1 | Any | 0 | TABLE | OS | No | Remains |
| ON | Any | ON | Any | Any | Any | Yes | Lost |
| ON | 1 | OFF | 0 | TABLE | Server | Yes | Remains |
| ON | 1 | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a multithreaded slave the following configurations are most resilient to unexpected halts:
- When using GTIDs and `MASTER_AUTO_POSITION`, set `relay_log_recovery=1`. With this configuration the setting of `relay_log_info_repository` and other variables has no impact on recovery.
- When using file position based replication, set `relay_log_recovery=1`, `sync_relay_log=1`, and `relay_log_info_repository=TABLE`. A configuration sketch follows this list.

Note: During recovery the relay log is lost.
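A minimal option-file sketch of the file position based recommendation for a multithreaded slave (the worker count of 4 is an arbitrary example; any value greater than 1 makes the slave multithreaded):

```ini
[mysqld]
slave_parallel_workers    = 4       # example value; > 1 means a multithreaded slave
relay_log_recovery        = 1
sync_relay_log            = 1
relay_log_info_repository = TABLE   # the MySQL 8.0 default
```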
It is important to note the impact of `sync_relay_log=1`, which requires the relay log to be written to disk after each transaction. Although this setting is the most resilient to an unexpected halt, with at most one unwritten transaction being lost, it also has the potential to greatly increase the load on storage. Without `sync_relay_log=1`, the effect of an unexpected halt depends on how the relay log is handled by the operating system. Also note that when `relay_log_recovery=0`, the next time the slave is started after an unexpected halt the relay log is processed as part of recovery. After this process completes, the relay log is deleted.
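If storage load is a concern, the current value can be checked and, because `sync_relay_log` is a dynamic system variable, adjusted at runtime (a sketch; the value shown is only an example):

```sql
-- Check the current setting
SHOW VARIABLES LIKE 'sync_relay_log';

-- Trade some resilience for fewer disk synchronizations (example value)
SET GLOBAL sync_relay_log = 10000;
```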
An unexpected halt of a multithreaded replication slave using the recommended file position based replication configuration above may leave the relay log with transaction inconsistencies (gaps in the sequence of transactions). See Section 17.4.1.33, “Replication and Transaction Inconsistencies”. If the relay log recovery process encounters such gaps, they are filled and the recovery process continues automatically.
When you are using multi-source replication and `relay_log_recovery=1`, after restarting due to an unexpected halt all replication channels go through the relay log recovery process. Any inconsistencies found in the relay log due to an unexpected halt of a multithreaded slave are filled.
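To verify recovery on each channel after the restart, the per-channel applier status can be inspected (a sketch; the channel name `'source_1'` is a placeholder for whatever channel names are configured):

```sql
-- Show the replication state for one channel; repeat for each configured channel
SHOW SLAVE STATUS FOR CHANNEL 'source_1'\G

-- Restart replication on a single channel if it is not running
START SLAVE FOR CHANNEL 'source_1';
```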