12.9.7 Adding a User-Defined Collation for Full-Text Indexing (original) (raw)

12.9.7 Adding a User-Defined Collation for Full-Text Indexing

This section describes how to add a user-defined collation for full-text searches using the built-in full-text parser. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.14, “Adding a Collation to a Character Set”; it is assumed that you have read it and are familiar with the files involved.

To add a collation for full-text indexing, use the following procedure. The instructions here add a collation for a simple character set, which as discussed inSection 10.14, “Adding a Collation to a Character Set”, can be created using a configuration file that describes the character set properties. For a complex character set such as Unicode, create collations using C source files that describe the character set properties.

  1. Add a collation to the Index.xml file. The permitted range of IDs for user-defined collations is given in Section 10.14.2, “Choosing a Collation ID”. The ID must be unused, so choose a value different from 1025 if that ID is already taken on your system.
<charset name="latin1">  
...  
<collation name="latin1_fulltext_ci" id="1025"/>  
</charset>  
  1. Declare the sort order for the collation in thelatin1.xml file. In this case, the order can be copied fromlatin1_swedish_ci:
<collation name="latin1_fulltext_ci">  
<map>  
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F  
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F  
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F  
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F  
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F  
60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F  
50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F  
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F  
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F  
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF  
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF  
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49  
44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF  
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49  
44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF  
</map>  
</collation>  
  1. Modify the ctype array inlatin1.xml. Change the value corresponding to 0x2D (which is the code for the'-' character) from 10 (punctuation) to 01 (uppercase letter). In the following array, this is the element in the fourth row down, third value from the end.
<ctype>  
<map>  
00  
20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20  
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  
48 10 10 10 10 10 10 10 10 10 10 10 10 01 10 10  
84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10  
10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01  
01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10  
10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02  
02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20  
10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00  
00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01  
48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10  
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10  
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01  
01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02  
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02  
02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02  
</map>  
</ctype>  
  1. Restart the server.
  2. To employ the new collation, include it in the definition of columns that are to use it:
mysql> DROP TABLE IF EXISTS t1;  
Query OK, 0 rows affected (0.13 sec)  
mysql> CREATE TABLE t1 (  
    a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci,  
    FULLTEXT INDEX(a)  
    ) ENGINE=InnoDB;  
Query OK, 0 rows affected (0.47 sec)  
  1. Test the collation to verify that hyphen is considered as a word character:
mysql> INSERT INTO t1 VALUEs ('----'),('....'),('abcd');  
Query OK, 3 rows affected (0.22 sec)  
Records: 3  Duplicates: 0  Warnings: 0  
mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);  
+------+  
| a    |  
+------+  
| ---- |  
+------+  
1 row in set (0.00 sec)