[I18n-sig] Re: [Python-Dev] Unicode debate

Tue, 2 May 2000 08:59:03 +0200

# example 1

u = aUnicodeStringFromSomewhere
s = an8bitStringFromSomewhere

DoSomething(s + u)
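
Example 1 is the core question: what should happen when an 8-bit string meets a Unicode string? For comparison, here is a minimal sketch in present-day Python 3, not the Python of this thread. `bytes` stands in for the 8-bit string and `str` for the Unicode string, and the helper names and sample data are hypothetical. Python 3 refuses the implicit coercion, so the caller has to name a codec explicitly.

```python
def implicit_concat_fails(raw: bytes, text: str) -> bool:
    """Return True if Python refuses to add an 8-bit string to a Unicode string."""
    try:
        raw + text  # Python 3 has no implicit coercion between bytes and str
    except TypeError:
        return True
    return False


def concat_mixed(raw: bytes, text: str, encoding: str) -> str:
    """Decode the 8-bit string with an explicitly chosen codec, then concatenate."""
    return raw.decode(encoding) + text


print(implicit_concat_fails(b"caf\xe9", "!"))            # True
print(ascii(concat_mixed(b"caf\xe9", "!", "latin-1")))   # 'caf\xe9!'
```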

# example 2

u = aUnicodeStringFromSomewhere
s = an8bitStringFromSomewhere

if len(u) + len(s) == len(u + s):
    print "true"
else:
    print "not true"
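
Whether example 2 prints "true" depends on how the 8-bit string is decoded when it is coerced. The following minimal Python 3 sketch assumes that the coercion amounts to decoding the 8-bit string with some codec. The function name and sample bytes are hypothetical. Latin-1 maps every byte to exactly one character, so the lengths always add up. UTF-8 can fold several bytes into a single character, so they need not.

```python
def lengths_add_up(raw: bytes, text: str, encoding: str) -> bool:
    """Example 2 with the coercion made explicit.

    len(raw) counts bytes, while the concatenation counts decoded characters.
    """
    return len(text) + len(raw) == len(raw.decode(encoding) + text)


data = b"caf\xc3\xa9"  # "cafe" with an acute e, encoded as UTF-8: 5 bytes, 4 characters

print(lengths_add_up(data, "x", "latin-1"))  # True: one character per byte
print(lengths_add_up(data, "x", "utf-8"))    # False: two bytes become one character
```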

From the XML 1.0 specification:

A parsed entity contains text, a sequence of characters,
which may represent markup or character data.

A character is an atomic unit of text as specified by
ISO/IEC 10646.
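
If a Python 3 `str` element (a Unicode code point) is taken as the spec's "character", a small sketch shows how character counts and byte counts diverge once text is encoded. `describe` and the sample string are hypothetical. Characters outside the Basic Multilingual Plane would take four bytes in UTF-16, as a surrogate pair.

```python
def describe(text: str) -> dict:
    """Compare the number of characters (code points) with encoded byte counts."""
    return {
        "characters": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        # utf-16-le writes no BOM, so this counts only the character data
        "utf-16 bytes": len(text.encode("utf-16-le")),
    }


print(describe("na\u00efve \u20ac"))
# {'characters': 7, 'utf-8 bytes': 10, 'utf-16 bytes': 14}
```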

Each external parsed entity in an XML document may
use a different encoding for its characters. All XML
processors must be able to read entities in either
UTF-8 or UTF-16.

Entities encoded in UTF-16 must begin with the Byte
Order Mark /.../ XML processors must be able to use
this character to differentiate between UTF-8 and
UTF-16 encoded documents.
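
The Byte Order Mark rule can be sketched in a few lines of Python 3, using the BOM constants from the standard `codecs` module. This is a simplified, hypothetical helper. It only chooses between the two encodings every processor must support, and it returns a Python codec name. Python's `utf-16` codec reads and strips the BOM itself, and `utf-8-sig` also accepts plain UTF-8.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Choose between UTF-8 and UTF-16 for an XML entity by looking for a BOM.

    Returns a Python codec name that can be passed straight to bytes.decode().
    """
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"      # Python's utf-16 codec consumes the BOM
    return "utf-8-sig"       # no UTF-16 BOM: UTF-8, dropping an optional EF BB BF


for raw in ("<doc/>".encode("utf-16"), b"<doc/>"):
    codec = sniff_encoding(raw)
    print(codec, raw.decode(codec))
```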

Parsed entities which are stored in an encoding other
than UTF-8 or UTF-16 must begin with a text declaration
containing an encoding declaration.
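
For entities in other encodings, the processor has to read the name from the text declaration. Below is a minimal Python 3 sketch that does this with a simplified regular expression rather than a full XML parser. It assumes the declaration itself is written in an ASCII-compatible encoding. `declared_encoding` is a hypothetical helper.

```python
import re

# Matches the encoding declaration inside a text declaration such as
#   <?xml version="1.0" encoding="ISO-8859-1"?>
# EncName per the XML grammar: a Latin letter, then letters, digits, '.', '_' or '-'.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def declared_encoding(data: bytes):
    """Return the encoding named in a leading text declaration, or None."""
    match = _ENCODING_DECL.match(data)
    return match.group(2).decode("ascii") if match else None


entity = b'<?xml encoding="ISO-8859-1"?><p>caf\xe9</p>'
print(declared_encoding(entity))            # ISO-8859-1
print(declared_encoding(b"<p>plain</p>"))   # None
```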