Unicode equivalence

dbo:abstract

This article covers Unicode equivalences. Unicode contains a large number of characters. To maintain compatibility with existing standards, some of them are equivalent to other characters or to sequences of characters. Unicode provides two notions of equivalence, canonical equivalence and compatibility, the former being a subset of the latter. For example, the character n followed by the combining tilde diacritic ◌̃ is canonically equivalent to, and therefore also compatible with, the single Unicode character ñ, whereas the typographic ligature ﬀ is only compatible (but not canonically equivalent) with the sequence of two f characters. Unicode normalization is a text normalization that transforms characters or sequences of characters into a single equivalent representation, called the "normal form" in this article. This transformation is important because it makes it possible to compare, search, and sort Unicode strings. For each of the two notions of equivalence, Unicode defines two forms, one composed and the other decomposed, which yields four normal forms, abbreviated NFC, NFD, NFKC, and NFKD; they are detailed further in the full article and are also described in the separate article on Unicode normalization. (fr)
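The subset relationship described above (canonically equivalent implies compatible, but not the reverse) can be checked with Python's standard unicodedata module. A minimal sketch, assuming that equality of NFD forms is used as the test for canonical equivalence and equality of NFKD forms as the test for compatibility; the helper names canonically_equivalent and compatible are hypothetical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical canonical decompositions (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatible(a: str, b: str) -> bool:
    # Compatible strings have identical compatibility decompositions (NFKD).
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


n_tilde_decomposed = "n\u0303"  # LATIN SMALL LETTER N + COMBINING TILDE
n_tilde_composed = "\u00f1"     # LATIN SMALL LETTER N WITH TILDE (ñ)
ligature_ff = "\ufb00"          # LATIN SMALL LIGATURE FF (ﬀ)

print(canonically_equivalent(n_tilde_decomposed, n_tilde_composed))  # True
print(compatible(n_tilde_decomposed, n_tilde_composed))              # True
print(canonically_equivalent(ligature_ff, "ff"))                     # False
print(compatible(ligature_ff, "ff"))                                 # True
```

The ñ pair passes both tests, while the ﬀ ligature passes only the compatibility test, which mirrors the "canonical is a subset of compatibility" statement.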
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. Unicode provides two such notions, canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications when alphabetizing names or searching, and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo. Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the typographic ligature "ﬀ") is defined to be compatible with, but not canonically equivalent to, the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as sorting and indexing), but not in others, and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the opposite is not necessarily true. 
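The Hangul case mentioned above is algorithmic rather than table-driven: each precomposed syllable's jamo can be computed arithmetically from its code point. A minimal Python sketch of that decomposition, cross-checked against unicodedata.normalize("NFD", ...); the function name decompose_hangul is hypothetical, and the constants are the ones given in the Unicode Standard's description of conjoining jamo behavior.

```python
import unicodedata

# Constants from the Hangul syllable decomposition algorithm
# (Unicode Standard, chapter 3, "Conjoining Jamo Behavior").
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading jamo
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable; leave unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # trailing jamo is optional
    return lead + vowel + trail


for ch in "한가":
    jamo = decompose_hangul(ch)
    print(ch, [f"U+{ord(c):04X}" for c in jamo], jamo == unicodedata.normalize("NFD", ch))
```

Here 한 (U+D55C) splits into three jamo because it has a final consonant, while 가 (U+AC00) splits into only a leading consonant and a vowel; in both cases the result matches the library's NFD output.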
The standard also defines a text normalization procedure, called Unicode normalization, that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normalization form or normal form of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one fully composed (where multiple code points are replaced by single code points whenever possible), and one fully decomposed (where single code points are split into multiple ones). (en)
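The two equivalence notions and the composed/decomposed choice combine into four normal forms (NFC, NFD, NFKC, NFKD). A small Python sketch applying all four to one string with the standard library's unicodedata.normalize; the sample string is only an illustration.

```python
import unicodedata

# "ñ" spelled as n + COMBINING TILDE, followed by the ligature "ﬀ" (U+FB00).
text = "n\u0303\ufb00"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, text)
    # Show the length and the code points of each normal form.
    print(form, len(normalized), [f"U+{ord(c):04X}" for c in normalized])
```

Only the composed forms (NFC, NFKC) merge n and the combining tilde into U+00F1, and only the compatibility forms (NFKC, NFKD) replace ﬀ with two f letters, so the lengths come out as 2, 3, 3, and 4.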
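Unicode equivalence is the specification in the Unicode character encoding standard that certain sequences of code points represent the same character. This feature was introduced into the standard to allow compatibility with preexisting standard character sets, which included similar or identical characters. Unicode provides two such notions: canonical equivalence and compatibility. Code point sequences defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined as equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, these sequences should be displayed in the same way and should be treated in the same way by applications when alphabetizing names, searching, and so on. The standard also defines a procedure called Unicode normalization, which replaces equivalent character sequences so that any two equivalent texts are reduced to the same sequence of code points. (ko)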
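Unicode contains many characters that exist to maintain compatibility with existing standards. Some of them are functionally equivalent to other characters or to sequences of characters. For this reason, Unicode defines several kinds of equivalence. For example, the character n followed by the combining tilde ◌̃ is equivalent to the single Unicode character ñ. Unicode maintains two standards of equivalence for this purpose, canonical equivalence and compatibility. (ja)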
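Unicode equivalence exists because, in order to remain compatible with many existing standards, Unicode includes many special characters, some of which are functionally equivalent to other characters or character sequences. Unicode therefore defines certain code point sequences as equivalent. Unicode provides two notions of equivalence, canonical equivalence and compatibility equivalence, the former being a subset of the latter. For example, the character n followed by the combining tilde ◌̃ is both canonically equivalent and compatibility-equivalent to the Unicode character ñ, whereas the ligature ﬀ is only compatibility-equivalent to two f characters. Unicode normalization is a form of text normalization that converts mutually equivalent sequences into one and the same sequence, which the Unicode standard calls the normal form. For each notion of equivalence, Unicode defines two forms, one fully composed and one fully decomposed, which yields four forms in total, abbreviated NFC, NFD, NFKC, and NFKD. Normalization is important for Unicode text-processing programs because it affects the meaning of comparison, searching, and sorting. (zh)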
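Because canonically equivalent strings can differ code point by code point, a plain equality or substring test can miss matches. A minimal Python sketch of normalization-aware comparison and search using unicodedata.normalize; nfc_equal and nfkc_contains are hypothetical helper names, and choosing NFC for equality and NFKC for search is an illustrative assumption, not a rule from the standard.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    # Compare under canonical equivalence by normalizing both sides to NFC.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfkc_contains(haystack: str, needle: str) -> bool:
    # Substring search under compatibility equivalence, so ligatures such as
    # U+FB03 (ﬃ) match their plain-letter spelling.
    return unicodedata.normalize("NFKC", needle) in unicodedata.normalize("NFKC", haystack)


composed = "Espa\u00f1a"     # "España" with precomposed ñ
decomposed = "Espan\u0303a"  # "España" with n + COMBINING TILDE

print(composed == decomposed)                # False: different code points
print(nfc_equal(composed, decomposed))       # True
print("office" in "o\ufb03ce")               # False
print(nfkc_contains("o\ufb03ce", "office"))  # True
```

Sorting by a normalized key makes equivalent strings sort together, but code point order is not linguistic order; locale-aware sorting would additionally need a collation algorithm.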

rdfs:comment

This article covers Unicode equivalences. Unicode contains a large number of characters. To maintain compatibility with existing standards, some of them are equivalent to other characters or to sequences of characters. Unicode provides two notions of equivalence, canonical equivalence and compatibility, the former being a subset of the latter. For example, the character n followed by the combining tilde diacritic ◌̃ is canonically equivalent to, and therefore also compatible with, the single Unicode character ñ, whereas the typographic ligature ﬀ is only compatible (but not canonically equivalent) with the sequence of two f characters. (fr)
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. (en)