[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Tres Seaver tseaver at palladion.com
Fri Jan 8 22:19:10 CET 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

M.-A. Lemburg wrote:

Shouldn't this encoding guessing be a separate function that you call on either a file or a seekable stream?

After all, detecting encodings is just as useful to have for non-file streams.

Other stream sources typically have out-of-band ways to signal the encoding: only when reading from the filesystem do we pretty much have to guess, and in that case the BOM / signature is the best heuristic we have. Also, some non-file streams are not seekable, and so can't be guessed via a pre-pass.
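A minimal sketch of the BOM / signature heuristic, assuming only the BOM constants in the standard codecs module. The helper name guess_bom_encoding and its default parameter are hypothetical and not part of any API proposed in this thread. Longer signatures are checked first because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM, which is also why this stays a guess rather than a certainty.

```python
import codecs

# Longest signatures first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
# Each codec name below consumes the BOM while decoding.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_bom_encoding(head, default=None):
    """Hypothetical helper: map the leading bytes of a file to a codec name.

    Returns `default` when no known BOM is found.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

The caller only has to supply the first four bytes, however they were obtained, so the same helper could serve files and other byte sources.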

You'd then avoid having to stuff everything into a single function call and also open up the door for more complex application-specific guesswork or defaults.

The whole process would then have two steps:

1. guess encoding

   import codecs
   encoding = codecs.guessfileencoding(filename)

Filename is not enough information: or do you mean that API to actually open the stream?

2. open the file with the found encoding

   f = open(filename, encoding=encoding)

For seekable streams f, you'd have:

1. guess encoding

   import codecs
   encoding = codecs.guessstreamencoding(f)

2. wrap the stream with a reader for the found encoding

   readerclass = codecs.getreader(encoding)
   g = readerclass(f)
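The two steps for a seekable stream might look roughly like the following. The guessing function named in the proposal is not an existing function in the codecs module, so this sketch uses a hypothetical stand-in, guess_stream_encoding, which checks only for BOMs and rewinds the stream afterwards. The "utf-8" fallback is likewise an assumption. codecs.getreader is the real standard-library call.

```python
import codecs
import io


def guess_stream_encoding(stream, default="utf-8"):
    """Hypothetical stand-in for the proposed guessing function.

    Peeks at the first bytes of a seekable binary stream, then restores
    the original position so the caller can read from the start.
    """
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)
    # Longest BOMs first, since the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if head.startswith(bom):
            return name
    return default


# 1. guess encoding (an in-memory stream stands in for a real file here)
f = io.BytesIO(codecs.BOM_UTF8 + "héllo".encode("utf-8"))
encoding = guess_stream_encoding(f)

# 2. wrap the stream with a reader for the found encoding
reader_class = codecs.getreader(encoding)
g = reader_class(f)
text = g.read()  # the BOM is consumed by the "utf-8-sig" reader
```

The rewind in step 1 is what requires the stream to be seekable; a stream without seek() would need some other way to look ahead.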

Tres.
- --
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAktHoU4ACgkQ+gerLs4ltQ5o3QCeLOJ7J91E+5f66vhgu1BUhYh4
9UgAnR2IeCd0BCsPez8ZilGNHJfhRn3Y
=SoPb
-----END PGP SIGNATURE-----


