UTF-1

UTF-1 is a method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes substring searching and error recovery difficult. It reuses the byte values of ASCII printing characters inside multi-byte sequences, making it unsuited for some uses (for instance, Unix filenames cannot contain the byte value used for the forward slash). UTF-1 is also slow to encode and decode because it relies on division and multiplication by 190, a number that is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8.
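
The two design problems described above can be illustrated with a minimal encoder sketch. It assumes the byte ranges and the base-190 digit mapping as they are commonly given for the UTF-1 definition in ISO/IEC 10646 Annex G; the function names `utf1_encode_char` and `_t` are hypothetical and not part of any standard library, and the sketch covers encoding only.

```python
def _t(z: int) -> int:
    """Map a base-190 digit (0..189) to a UTF-1 trailing byte (assumed mapping)."""
    # Digits 0..93 land on 0x21..0x7E, i.e. printable ASCII, which is why
    # bytes such as '/' (0x2F) can appear inside a multi-byte sequence.
    # Digits 94..189 land on 0xA0..0xFF.
    return z + 0x21 if z < 0x5E else z + 0x42


def utf1_encode_char(cp: int) -> bytes:
    """Encode one code point as UTF-1 (illustrative sketch, not a codec)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0xA0:                      # ASCII, C0 and C1 controls: unchanged
        return bytes([cp])
    if cp < 0x100:                     # two bytes: 0xA0 prefix, then the value
        return bytes([0xA0, cp])
    if cp < 0x4016:                    # two bytes, one base-190 digit
        y = cp - 0x100
        return bytes([0xA1 + y // 190, _t(y % 190)])
    if cp < 0x38E2E:                   # three bytes, two base-190 digits
        y = cp - 0x4016
        return bytes([0xF6 + y // 190**2, _t(y // 190 % 190), _t(y % 190)])
    y = cp - 0x38E2E                   # five bytes, four base-190 digits
    return bytes([
        0xFC + y // 190**4,
        _t(y // 190**3 % 190),
        _t(y // 190**2 % 190),
        _t(y // 190 % 190),
        _t(y % 190),
    ])


for cp in (0x41, 0xA0, 0x010E, 0xFEFF):
    print(f"U+{cp:04X} -> {utf1_encode_char(cp).hex(' ')}")
```

Under these assumptions, U+010E encodes as `a1 2f`, so a byte-level scan for `/` would match in the middle of a character. Every digit extraction is a division or remainder by 190 rather than a bit shift or mask.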

Property	Value
dbo:abstract	UTF-1 war das erste UCS Transformation Format für Unicode und ISO 10646 und wurde 1993 im Anhang G der ursprünglichen Version von ISO 10646 veröffentlicht, ist jedoch heute nicht mehr Teil dieser Norm. UTF-1 ist kompatibel zu ISO 2022. ASCII-Zeichen, C0- und C1-Steuerzeichen werden wie in ISO 8859 unverändert (1:1) kodiert. Andere Zeichen werden – über eine relativ rechenaufwändige Modulo-190-Arithmetik – als Zeichenfolgen von 2, 3 oder 5 Byte Länge kodiert. Dabei können auch ASCII-Zeichen Teil dieser Zeichenfolgen sein. Das hat den Nachteil, dass zum Beispiel der Schrägstrich in so einer Zeichenfolge enthalten sein kann, so dass diese Kodierung nicht für Dateinamen verwendet werden kann. Aufgrund dieses Nachteils wurde später eine andere Kodierung für Unicode entwickelt, welche anfangs „UTF-FSS“ (file system safe) genannt wurde und sich heute unter dem Namen UTF-8 allgemein durchgesetzt hat. (de) UTF-1 is a method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8. (en) UTF-1은 국제 문자 세트/유니코드를 바이트 스트림으로 변환하는 한 방법이다. 설계 상의 이유로, 디코딩이 문자 중간에 시작하면 재동기화가 불가능하며 검색 루틴은 이와 함께 신뢰성있게 사용할 수 없다. UTF-1은 또한 제곱이 아닌 수의 나누기를 사용하기 때문에 상당히 느리다. 이러한 문제로 UTF-1은 폭넓게 채택되지 못했으며 UTF-8로 대체되었다. (ko) UTF-1 é um formato de transformação de ISO 10646/Unicode em fluxos de bytes, a fim de serialização. Devido ao seu formato não é possível resincronizar se a decodificação começa no meio dum caractere (o que dificulta o truncamento) e rotinas de busca de caractere não podem ser usadas de forma confiável. Dados tais problemas, esse padrão nunca ganhou grande adoção, sendo quase que completamente substituído pelo UTF-8. (pt) UTF-1是一种将ISO 10646 / Unicode转化成字节流的方式。由于其本身的设计问题，如果自中间的一个字符开始解码，UTF-1將無法重新同步（這造成擷取的困難），而且UTF-1也沒辦法進行可靠的字节搜索。又因为UTF-1使用的除数不是2的幂，所以转化得也相当缓慢。由于以上这些问题，UTF-1从来没有得到广泛採用，并已被UTF-8所取代。 (zh)
dbo:wikiPageExternalLink	http://kikaku.itscj.ipsj.or.jp/ISO-IR/178.pdf https://web.archive.org/web/20150318032101/http:/kikaku.itscj.ipsj.or.jp/ISO-IR/178.pdf https://web.archive.org/web/20160607111732/http:/czyborra.com/utf/%23UTF-1 http://czyborra.com/utf/%23UTF-1 https://www.rfc-editor.org/rfc/rfc3629.html https://www.unicode.org/versions/Unicode1.1.0/appF.pdf
dbo:wikiPageID	2535122 (xsd:integer)
dbo:wikiPageLength	5498 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1092940825 (xsd:integer)
dbo:wikiPageWikiLink	dbr:UTF-8 dbr:Unicode dbr:Unicode_Transformation_Format dbr:Modulo_operation dbr:Comparison_of_Unicode_encodings dbr:Byte dbr:Byte-order_mark dbr:C0_and_C1_control_codes dbr:UCS-4 dbr:US-ASCII dbr:ASCII dbr:Hexadecimal dbc:Unicode_Transformation_Formats dbr:Binary_Ordered_Compression_for_Unicode dbr:Code_point dbr:ISO/IEC_2022 dbr:MIME dbr:Universal_Character_Set dbr:Variable-width_encoding dbr:Extended_ASCII dbr:ISO/IEC_10646 dbr:Self-synchronizing_code dbr:Substring
dbp:classification	dbr:Unicode_Transformation_Format dbr:Variable-width_encoding dbr:Extended_ASCII
dbp:encodes	ISO/IEC 10646 (en)
dbp:extends	dbr:US-ASCII
dbp:lang	International (en)
dbp:mime	ISO-10646-UTF-1 (en)
dbp:name	UTF-1 (en)
dbp:next	dbr:UTF-8
dbp:status	Obscure, of mainly historical interest. (en)
dbp:wikiPageUsesTemplate	dbt:Cite_document dbt:Cite_web dbt:Notelist dbt:Short_description dbt:Unicode_navigation dbt:Character_encoding dbt:Infobox_character_encoding
dct:subject	dbc:Unicode_Transformation_Formats
gold:hypernym	dbr:Way
rdf:type	yago:WikicatUnicodeTransformationFormats yago:Abstraction100002137 yago:Communication100033020 yago:Format106636806 yago:Information106634376 yago:Message106598915
rdfs:comment	UTF-1 is a method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8. (en) UTF-1은 국제 문자 세트/유니코드를 바이트 스트림으로 변환하는 한 방법이다. 설계 상의 이유로, 디코딩이 문자 중간에 시작하면 재동기화가 불가능하며 검색 루틴은 이와 함께 신뢰성있게 사용할 수 없다. UTF-1은 또한 제곱이 아닌 수의 나누기를 사용하기 때문에 상당히 느리다. 이러한 문제로 UTF-1은 폭넓게 채택되지 못했으며 UTF-8로 대체되었다. (ko) UTF-1 é um formato de transformação de ISO 10646/Unicode em fluxos de bytes, a fim de serialização. Devido ao seu formato não é possível resincronizar se a decodificação começa no meio dum caractere (o que dificulta o truncamento) e rotinas de busca de caractere não podem ser usadas de forma confiável. Dados tais problemas, esse padrão nunca ganhou grande adoção, sendo quase que completamente substituído pelo UTF-8. (pt) UTF-1是一种将ISO 10646 / Unicode转化成字节流的方式。由于其本身的设计问题，如果自中间的一个字符开始解码，UTF-1將無法重新同步（這造成擷取的困難），而且UTF-1也沒辦法進行可靠的字节搜索。又因为UTF-1使用的除数不是2的幂，所以转化得也相当缓慢。由于以上这些问题，UTF-1从来没有得到广泛採用，并已被UTF-8所取代。 (zh) UTF-1 war das erste UCS Transformation Format für Unicode und ISO 10646 und wurde 1993 im Anhang G der ursprünglichen Version von ISO 10646 veröffentlicht, ist jedoch heute nicht mehr Teil dieser Norm. UTF-1 ist kompatibel zu ISO 2022. Aufgrund dieses Nachteils wurde später eine andere Kodierung für Unicode entwickelt, welche anfangs „UTF-FSS“ (file system safe) genannt wurde und sich heute unter dem Namen UTF-8 allgemein durchgesetzt hat. (de)
rdfs:label	UTF-1 (de) UTF-1 (ko) UTF-1 (pt) UTF-1 (en) UTF-1 (zh)
owl:sameAs	freebase:UTF-1 yago-res:UTF-1 wikidata:UTF-1 dbpedia-de:UTF-1 dbpedia-ko:UTF-1 dbpedia-pt:UTF-1 dbpedia-zh:UTF-1 https://global.dbpedia.org/id/4wYs1
prov:wasDerivedFrom	wikipedia-en:UTF-1?oldid=1092940825&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:UTF-1
is dbo:wikiPageDisambiguates of	dbr:UTF
is dbo:wikiPageRedirects of	dbr:CsISO10646UTF1 dbr:ISO-10646-UTF-1
is dbo:wikiPageWikiLink of	dbr:UTF-8 dbr:UTF-EBCDIC dbr:Unicode dbr:Comparison_of_Unicode_encodings dbr:CsISO10646UTF1 dbr:Binary_Ordered_Compression_for_Unicode dbr:Byte_order_mark dbr:ISO/IEC_2022 dbr:UTF dbr:Variable-width_encoding dbr:Universal_Coded_Character_Set dbr:ISO-10646-UTF-1
is dbp:prev of	dbr:UTF-8
is foaf:primaryTopic of	wikipedia-en:UTF-1