14.13 Encryption and Compression Functions (original) (raw)

Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with aVARBINARY orBLOB binary string data type. This avoids potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR,VARCHAR,TEXT).

Some encryption functions return strings of ASCII characters:MD5(),SHA(),SHA1(),SHA2(),STATEMENT_DIGEST(),STATEMENT_DIGEST_TEXT(). Their return value is a string that has a character set and collation determined by thecharacter_set_connection andcollation_connection system variables. This is a nonbinary string unless the character set isbinary.

If an application stores values from a function such asMD5() orSHA1() that returns a string of hex digits, more efficient storage and comparisons can be obtained by converting the hex representation to binary usingUNHEX() and storing the result in aBINARY(N) column. Each pair of hexadecimal digits requires one byte in binary form, so the value of N depends on the length of the hex string. N is 16 for an MD5() value and 20 for aSHA1() value. ForSHA2(),N ranges from 28 to 32 depending on the argument specifying the desired bit length of the result.

The size penalty for storing the hex string in aCHAR column is at least two times, up to eight times if the value is stored in a column that uses theutf8mb4 character set (where each character uses 4 bytes). Storing the string also results in slower comparisons because of the larger values and the need to take character set collation rules into account.

Suppose that an application storesMD5() string values in aCHAR(32) column:

CREATE TABLE md5_tbl (md5_val CHAR(32), ...);
INSERT INTO md5_tbl (md5_val, ...) VALUES(MD5('abcdef'), ...);

To convert hex strings to more compact form, modify the application to use UNHEX() andBINARY(16) instead as follows:

CREATE TABLE md5_tbl (md5_val BINARY(16), ...);
INSERT INTO md5_tbl (md5_val, ...) VALUES(UNHEX(MD5('abcdef')), ...);

Applications should be prepared to handle the very rare case that a hashing function produces the same value for two different input values. One way to make collisions detectable is to make the hash column a primary key.

Note

Exploits for the MD5 and SHA-1 algorithms have become known. You may wish to consider using another one-way encryption function described in this section instead, such asSHA2().

Caution

Passwords or other sensitive values supplied as arguments to encryption functions are sent as cleartext to the MySQL server unless an SSL connection is used. Also, such values appear in any MySQL logs to which they are written. To avoid these types of exposure, applications can encrypt sensitive values on the client side before sending them to the server. The same considerations apply to encryption keys. To avoid exposing these, applications can use stored procedures to encrypt and decrypt values on the server side.

16 * (trunc(string_length / 16) + 1)  

keystr
The encryption key, or the input keying material that is used as the basis for deriving a key using a key derivation function (KDF). For the same instance of data, use the same value of_keystr_ for encryption withAES_ENCRYPT() and decryption withAES_DECRYPT().
If you are using a KDF, which you can from MySQL 8.0.30,keystr can be any arbitrary information such as a password or passphrase. In the further arguments for the function, you specify the KDF name, then add further options to increase the security as appropriate for the KDF.
When you use a KDF, the function creates a cryptographically strong secret key from the information passed in keystr and any salt or additional information that you provide in the other arguments. The derived key is used to encrypt and decrypt the data, and it remains in the MySQL Server instance and is not accessible to users. Using a KDF is highly recommended, as it provides better security than specifying your own premade key or deriving it by a simpler method as you use the function.
If you are not using a KDF, for a key length of 128 bits, the most secure way to pass a key to the_keystr_ argument is to create a truly random 128-bit value and pass it as a binary value. For example:

INSERT INTO t  
VALUES (1,AES_ENCRYPT('text',UNHEX('F3229A0B371ED2D9441B830D21A390C3')));  

A passphrase can be used to generate an AES key by hashing the passphrase. For example:

INSERT INTO t  
VALUES (1,AES_ENCRYPT('text', UNHEX(SHA2('My secret passphrase',512))));  

If you exceed the maximum key length of 128 bits, a warning is returned. If you are not using a KDF, do not pass a password or passphrase directly to_keystr, hash it first. Previous versions of this documentation suggested the former approach, but it is no longer recommended as the examples shown here are more secure.
initvector
An initialization vector, for block encryption modes that require it. Theblock_encryption_mode system variable controls the mode. For the same instance of data, use the same value of
initvector_ for encryption with AES_ENCRYPT() and decryption withAES_DECRYPT().
Note
If you are using a KDF, you must specify an initialization vector or a null string for this argument, in order to access the later arguments to define the KDF.
For modes that require an initialization vector, it must be 16 bytes or longer (bytes in excess of 16 are ignored). An error occurs if_initvector_ is missing. For modes that do not require an initialization vector, it is ignored and a warning is generated if_initvector_ is specified, unless you are using a KDF.
The default value for theblock_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.
A random string of bytes to use for the initialization vector can be produced by callingRANDOM_BYTES(16).
kdfname
The name of the key derivation function (KDF) to create a key from the input keying material passed in_keystr, and other arguments as appropriate for the KDF. This optional argument is available from MySQL 8.0.30.
For the same instance of data, use the same value of
kdfname_ for encryption withAES_ENCRYPT() and decryption withAES_DECRYPT(). When you specify kdfname, you must specify initvector, using either a valid initialization vector, or a null string if the encryption mode does not require an initialization vector.
The following values are supported:
hkdf
HKDF, which is available from OpenSSL 1.1.0. HKDF extracts a pseudorandom key from the keying material then expands it into additional keys. With HKDF, you can specify an optional salt (salt) and context-specific information such as application details (info) to include in the keying material.
pbkdf2_hmac
PBKDF2, which is available from OpenSSL 1.0.2. PBKDF2 applies a pseudorandom function to the keying material, and repeats this process a large number of times to produce the key. With PBKDF2, you can specify an optional salt (salt) to include in the keying material, and set the number of iterations used to produce the key (iterations).
In this example, HKDF is specified as the key derivation function, and a salt and context information are provided. The argument for the initialization vector is included but is the empty string:

SELECT AES_ENCRYPT('mytext','mykeystring', '', 'hkdf', 'salt', 'info');  

In this example, PBKDF2 is specified as the key derivation function, a salt is provided, and the number of iterations is doubled from the recommended minimum:

SELECT AES_ENCRYPT('mytext','mykeystring', '', 'pbkdf2_hmac','salt', '2000');  

salt
A salt to be passed to the key derivation function (KDF). This optional argument is available from MySQL 8.0.30. Both HKDF and PBKDF2 can use salts, and their use is recommended to help prevent attacks based on dictionaries of common passwords or rainbow tables.
A salt consists of random data, which for security must be different for each encryption operation. A random string of bytes to use for the salt can be produced by calling RANDOM_BYTES(). This example produces a 64-bit salt:

SET @salt = RANDOM_BYTES(8);  

For the same instance of data, use the same value of_salt_ for encryption withAES_ENCRYPT() and decryption withAES_DECRYPT(). The salt can safely be stored along with the encrypted data.
info
Context-specific information for HKDF to include in the keying material, such as information about the application. This optional argument is available from MySQL 8.0.30 when you specify hkdf as the KDF name. HKDF adds this information to the keying material specified in keystr and the salt specified in_salt_ to produce the key.
For the same instance of data, use the same value of_info_ for encryption withAES_ENCRYPT() and decryption withAES_DECRYPT().
iterations
The iteration count for PBKDF2 to use when producing the key. This optional argument is available from MySQL 8.0.30 when you specify pbkdf2_hmac as the KDF name. A higher count gives greater resistance to brute-force attacks because it has a greater computational cost for the attacker, but the same is necessarily true for the key derivation process. The default if you do not specify this argument is 1000, which is the minimum recommended by the OpenSSL standard.
For the same instance of data, use the same value of_iterations_ for encryption with AES_ENCRYPT() and decryption withAES_DECRYPT().

mysql> SET block_encryption_mode = 'aes-256-cbc';  
mysql> SET @key_str = SHA2('My secret passphrase',512);  
mysql> SET @init_vector = RANDOM_BYTES(16);  
mysql> SET @crypt_str = AES_ENCRYPT('text',@key_str,@init_vector);  
mysql> SELECT CAST(AES_DECRYPT(@crypt_str,@key_str,@init_vector) AS CHAR);  
+-------------------------------------------------------------+  
| CAST(AES_DECRYPT(@crypt_str,@key_str,@init_vector) AS CHAR) |  
+-------------------------------------------------------------+  
| text                                                        |  
+-------------------------------------------------------------+  
mysql> SELECT LENGTH(COMPRESS(REPEAT('a',1000)));  
        -> 21  
mysql> SELECT LENGTH(COMPRESS(''));  
        -> 0  
mysql> SELECT LENGTH(COMPRESS('a'));  
        -> 13  
mysql> SELECT LENGTH(COMPRESS(REPEAT('a',16)));  
        -> 15  

The compressed string contents are stored the following way:

mysql> SELECT MD5('testing');  
        -> 'ae2b1fca515949e5d54fb22b8ed95575'  

This is the “RSA Data Security, Inc. MD5 Message-Digest Algorithm.”
See the note regarding the MD5 algorithm at the beginning this section.

mysql> SELECT SHA1('abc');  
        -> 'a9993e364706816aba3e25717850c26c9cd0d89d'  

SHA1() can be considered a cryptographically more secure equivalent ofMD5(). However, see the note regarding the MD5 and SHA-1 algorithms at the beginning this section.

mysql> SELECT SHA2('abc', 224);  
        -> '23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7'  

This function works only if MySQL has been configured with SSL support. See Section 8.3, “Using Encrypted Connections”.
SHA2() can be considered cryptographically more secure thanMD5() orSHA1().

mysql> SET @stmt = 'SELECT * FROM mytable WHERE cola = 10 AND colb = 20';  
mysql> SELECT STATEMENT_DIGEST(@stmt);  
+------------------------------------------------------------------+  
| STATEMENT_DIGEST(@stmt)                                          |  
+------------------------------------------------------------------+  
| 3bb95eeade896657c4526e74ff2a2862039d0a0fe8a9e7155b5fe492cbd78387 |  
+------------------------------------------------------------------+  
mysql> SELECT STATEMENT_DIGEST_TEXT(@stmt);  
+----------------------------------------------------------+  
| STATEMENT_DIGEST_TEXT(@stmt)                             |  
+----------------------------------------------------------+  
| SELECT * FROM `mytable` WHERE `cola` = ? AND `colb` = ?  |  
+----------------------------------------------------------+  
mysql> SELECT UNCOMPRESS(COMPRESS('any string'));  
        -> 'any string'  
mysql> SELECT UNCOMPRESS('any string');  
        -> NULL  
mysql> SELECT UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('a',30)));  
        -> 30