Unicode Character Database

About the Unicode Character Database

The Unicode Character Database (UCD) consists of a number of data files listing Unicode character properties and related data. It also includes data files containing test data for conformance to several important Unicode algorithms. Full documentation for the UCD can be found in Unicode Standard Annex #44, Unicode Character Database.

Latest Version of the Unicode Character Database

All files for the most up-to-date version of the Unicode Character Database can be found at: https://www.unicode.org/Public/UCD/latest/

Files in the UCD/latest/ subdirectories are unversioned: they do not contain any version indicator in their file name. However, most of the data files contain a file header in a standard format, which indicates the Unicode version and the date of last revision of that file.
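Because the file names in UCD/latest/ carry no version, the header is the place to look. The following is a minimal sketch of reading it, assuming the header consists of leading "#" comment lines in which the first line names the file with a version suffix and a later line begins with "Date:". That layout is an assumption of this example, not something this page specifies, and the sample text and the function name read_header are purely illustrative.

```python
import re
from typing import Optional, Tuple

# Assumed header shape (illustrative, not defined on this page):
#   # Example-14.0.0.txt
#   # Date: 2021-01-01, 00:00:00 GMT
NAME_RE = re.compile(r"^#\s*(?P<name>.+?)-(?P<version>\d+\.\d+\.\d+)\.txt\s*$")
DATE_RE = re.compile(r"^#\s*Date:\s*(?P<date>\d{4}-\d{2}-\d{2})")


def read_header(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, date) from the leading comment lines.

    Either value is None if the corresponding line is not found.
    """
    version: Optional[str] = None
    date: Optional[str] = None
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        name_match = NAME_RE.match(line)
        if name_match and version is None:
            version = name_match.group("version")
        date_match = DATE_RE.match(line)
        if date_match and date is None:
            date = date_match.group("date")
    return version, date


# Hypothetical sample data, made up for this example.
sample = (
    "# Example-14.0.0.txt\n"
    "# Date: 2021-01-01, 00:00:00 GMT\n"
    "#\n"
    "0000..007F; Example Range\n"
)
version, date = read_header(sample)
```

Files that lack such a header simply yield None for both values, so a caller can fall back to the directory it downloaded from.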

The latest version of the Unicode Standard, which corresponds to the latest version of the UCD, can be found at:https://www.unicode.org/versions/latest/.


Specific Versions of the UCD

Each specific version of the UCD is available for archival access in a versioned directory. For example, the UCD for Unicode 14.0 is available at:
https://www.unicode.org/Public/14.0.0/

The UCD for Unicode 13.0 is available at:
https://www.unicode.org/Public/13.0.0/
and so on for each earlier version of the standard.

For versions of the UCD earlier than Version 4.1, the archival directories are structured somewhat differently. For full details, see Unicode Standard Annex #44, Unicode Character Database.
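The directory naming pattern shown in the examples above can be sketched as a small helper. This is an illustration only: the function names are hypothetical, treating a two-part version such as "14.0" as "14.0.0" is a convenience assumed here, and versions before 4.1 are rejected rather than guessed at because their layout differs.

```python
from typing import Tuple

BASE = "https://www.unicode.org/Public/"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor' or 'major.minor.update' into three integers."""
    parts = [int(piece) for piece in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected 2 or 3 version fields, got {version!r}")
    while len(parts) < 3:
        parts.append(0)  # assumption: a missing update field means 0
    return parts[0], parts[1], parts[2]


def ucd_url(version: str) -> str:
    """Build the archival directory URL for a UCD version of 4.1 or later."""
    major, minor, update = parse_version(version)
    if (major, minor) < (4, 1):
        # Earlier directories are laid out differently; see UAX #44.
        raise ValueError("versions before 4.1 use a different directory layout")
    return f"{BASE}{major}.{minor}.{update}/"
```

For instance, ucd_url("14.0") produces the Unicode 14.0 directory URL given above, while ucd_url("4.0.1") raises ValueError.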

A comprehensive list of the exact data files that make up a given version of the UCD can be found in the component lists at Enumerated Versions of the Unicode Standard.

The UCD in XML

The contents of each version of the UCD are also available in XML format. The XML files are zipped and stored in a subdirectory for each version. For example, the XML version of UCD Version 14.0 can be found in:
https://www.unicode.org/Public/14.0.0/ucdxml/

Full documentation about the XML versions of the UCD can be found in Unicode Standard Annex #42, Unicode Character Database in XML.

BETA Versions

During periods when a preliminary (beta) version of the standard is released for public comment, Public Beta files are available. For more information about any ongoing public beta, see the BETA notice as well as Public Review Issues.

FTP Access

All files and directories in the Unicode Character Database are accessible via both HTTPS and FTP. For FTP access, use an FTP client with anonymous login.

For example, to access the contents of https://www.unicode.org/Public/UCD/latest/ by FTP, point an FTP client to www.unicode.org as the host, and /Public/UCD/latest as the path.
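The host and path mapping described above can be sketched with Python's standard ftplib module. The function names are hypothetical, and whether the server accepts the connection at any given time is not something this example can guarantee. Nothing here contacts the network until list_directory is called.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlsplit


def ftp_target(https_url: str) -> Tuple[str, str]:
    """Split an HTTPS URL into the (host, path) pair to give an FTP client."""
    parts = urlsplit(https_url)
    path = parts.path.rstrip("/") or "/"
    return parts.netloc, path


def list_directory(https_url: str, timeout: float = 30.0) -> List[str]:
    """List a directory over anonymous FTP, given its HTTPS URL."""
    host, path = ftp_target(https_url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # with no arguments, ftplib logs in anonymously
        ftp.cwd(path)
        return ftp.nlst()


host, path = ftp_target("https://www.unicode.org/Public/UCD/latest/")
```

A call such as list_directory("https://www.unicode.org/Public/UCD/latest/") would then return the names found in that directory, provided the FTP service is reachable.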