The utf8mb4 character set has these characteristics:
Supports BMP and supplementary characters.
Requires a maximum of four bytes per multibyte character.
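The difference between character count and byte count can be checked directly. The following is a minimal sketch: it uses hex literals with the _utf8mb4 introducer so that the result does not depend on the connection character set. The two sample characters (U+20AC and U+1F600) are chosen here purely for illustration.

```sql
-- U+20AC EURO SIGN is a BMP character: 1 character, 3 bytes in utf8mb4.
SELECT CHAR_LENGTH(_utf8mb4 X'E282AC') AS chars,
       LENGTH(_utf8mb4 X'E282AC')      AS bytes;

-- U+1F600 is a supplementary character: 1 character, 4 bytes in utf8mb4.
SELECT CHAR_LENGTH(_utf8mb4 X'F09F9880') AS chars,
       LENGTH(_utf8mb4 X'F09F9880')      AS bytes;
```

CHAR_LENGTH() counts characters and LENGTH() counts bytes, so the second statement shows the four-byte maximum in practice.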
utf8mb4 contrasts with the utf8mb3 character set, which supports only BMP characters and uses a maximum of three bytes per character:
For a BMP character, utf8mb4 and utf8mb3 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8mb4 requires four bytes to store it, whereas utf8mb3 cannot store the character at all. When converting utf8mb3 columns to utf8mb4, you need not worry about converting supplementary characters because there will be none.
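A conversion of this kind might look like the following sketch. The table name t1, the column name col1, and its VARCHAR(50) definition are hypothetical, and no collation is specified, so the default collation for utf8mb4 on the server would apply.

```sql
-- Convert every character column (CHAR, VARCHAR, TEXT) of the
-- hypothetical table t1 and change the table default character set.
ALTER TABLE t1 CONVERT TO CHARACTER SET utf8mb4;

-- Alternatively, convert a single column. The column definition shown
-- here is an assumption and must match the real column.
ALTER TABLE t1 MODIFY col1 VARCHAR(50) CHARACTER SET utf8mb4;

-- Inspect the resulting column and table character sets.
SHOW CREATE TABLE t1;
```

Because existing utf8mb3 data contains only BMP characters, the stored values keep the same encoding and length after the change.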
utf8mb4 is a superset of utf8mb3, so for an operation such as the following concatenation, the result has character set utf8mb4 and the collation of utf8mb4_col:
SELECT CONCAT(utf8mb3_col, utf8mb4_col);
Similarly, the following comparison in the WHERE clause works according to the collation of utf8mb4_col:
SELECT * FROM utf8mb3_tbl, utf8mb4_tbl
WHERE utf8mb3_tbl.utf8mb3_col = utf8mb4_tbl.utf8mb4_col;
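The character set and collation chosen for a mixed expression can be inspected with the CHARSET() and COLLATION() functions. The sketch below assumes a hypothetical table t_demo. The utf8mb4_unicode_ci collation is picked only so that the result is easy to recognize.

```sql
-- Hypothetical table with one column in each character set.
CREATE TABLE t_demo (
  utf8mb3_col VARCHAR(10) CHARACTER SET utf8mb3,
  utf8mb4_col VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);

INSERT INTO t_demo VALUES ('a', 'b');

-- Report the character set and collation of the concatenated result.
SELECT CHARSET(CONCAT(utf8mb3_col, utf8mb4_col))   AS result_charset,
       COLLATION(CONCAT(utf8mb3_col, utf8mb4_col)) AS result_collation
FROM t_demo;
```

Under the rule described above, the query should report utf8mb4 as the character set and the collation of utf8mb4_col, which is utf8mb4_unicode_ci in this sketch.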
For information about data type storage as it relates to multibyte character sets, see String Type Storage Requirements.