12.3.5 Column Character Set and Collation (original) (raw)
12.3.5 Column Character Set and Collation
Every “character” column (that is, a column of typeCHAR,VARCHAR, aTEXT type, or any synonym) has a column character set and a column collation. Column definition syntax for CREATE TABLE andALTER TABLE has optional clauses for specifying the column character set and collation:
col_name {CHAR | VARCHAR | TEXT} (col_length)
[CHARACTER SET charset_name]
[COLLATE collation_name]
These clauses can also be used forENUM andSET columns:
col_name {ENUM | SET} (val_list)
[CHARACTER SET charset_name]
[COLLATE collation_name]
Examples:
CREATE TABLE t1
(
col1 VARCHAR(5)
CHARACTER SET latin1
COLLATE latin1_german1_ci
);
ALTER TABLE t1 MODIFY
col1 VARCHAR(5)
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
MySQL chooses the column character set and collation in the following manner:
- If both
CHARACTER SET_`charsetname`_
andCOLLATE_`collationname`_
are specified, character set_charsetname
_ and collation_collationname
_ are used.
CREATE TABLE t1
(
col1 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
) CHARACTER SET latin1 COLLATE latin1_bin;
The character set and collation are specified for the column, so they are used. The column has character setutf8mb4
and collationutf8mb4_unicode_ci
.
- If
CHARACTER SET_`charsetname`_
is specified withoutCOLLATE
, character set_charsetname
_ and its default collation are used.
CREATE TABLE t1
(
col1 CHAR(10) CHARACTER SET utf8mb4
) CHARACTER SET latin1 COLLATE latin1_bin;
The character set is specified for the column, but the collation is not. The column has character setutf8mb4
and the default collation forutf8mb4
, which isutf8mb4_0900_ai_ci
. To see the default collation for each character set, use theSHOW CHARACTER SET statement or query the INFORMATION_SCHEMA
CHARACTER_SETS table.
- If
COLLATE_`collationname`_
is specified withoutCHARACTER SET
, the character set associated with_collationname
_ and collation_collationname
_ are used.
CREATE TABLE t1
(
col1 CHAR(10) COLLATE utf8mb4_polish_ci
) CHARACTER SET latin1 COLLATE latin1_bin;
The collation is specified for the column, but the character set is not. The column has collationutf8mb4_polish_ci
and the character set is the one associated with the collation, which isutf8mb4
.
- Otherwise (neither
CHARACTER SET
norCOLLATE
is specified), the table character set and collation are used.
CREATE TABLE t1
(
col1 CHAR(10)
) CHARACTER SET latin1 COLLATE latin1_bin;
Neither the character set nor collation is specified for the column, so the table defaults are used. The column has character set latin1
and collationlatin1_bin
.
The CHARACTER SET
andCOLLATE
clauses are standard SQL.
If you use ALTER TABLE to convert a column from one character set to another, MySQL attempts to map the data values, but if the character sets are incompatible, there may be data loss.