[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Tres Seaver tseaver at palladion.com
Fri Jan 8 22:59:04 CET 2010



Eric Smith wrote:

Shouldn't this encoding guessing be a separate function that you call on either a file or a seekable stream?

After all, detecting encodings is just as useful to have for non-file streams.

Other stream sources typically have out-of-band ways to signal the encoding: only when reading from the filesystem do we pretty much have to guess, and in that case the BOM / signature is the best heuristic we have. Also, some non-file streams are not seekable, and so can't be guessed via a pre-pass.

But what if the file were in (for example) a zip file? I think you definitely want to have access to this functionality outside of open().
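As a rough illustration of the "separate function" idea discussed above, here is a minimal sketch of a BOM sniffer that works on any seekable binary stream, not only on files opened via open(). The name guess_encoding_from_bom and its default parameter are hypothetical and were not proposed in this thread; only the codecs.BOM_* constants and the codec names come from the standard library.

```python
import codecs
import io

# Longest signatures first: the UTF-32-LE BOM begins with the UTF-16-LE BOM,
# so checking UTF-16 first would misreport UTF-32-LE data.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(stream, default=None):
    """Hypothetical helper: peek at the BOM of a seekable binary stream.

    Returns a codec name that consumes the BOM while decoding, or
    `default` when no known signature is found.
    """
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)  # rewind so the caller still sees the whole stream
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Usage on a non-file stream, e.g. bytes read out of a zip member:
raw = io.BytesIO(codecs.BOM_UTF8 + "héllo".encode("utf-8"))
encoding = guess_encoding_from_bom(raw, default="utf-8")
text = io.TextIOWrapper(raw, encoding=encoding).read()
```

Because the helper has to rewind, it cannot be used on non-seekable streams, which is the limitation raised in the discussion.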

If the application expects a possibly-BOM-signature-marked file, but you pass it mismatched garbage:

f = open('some.zip', encoding='BOM')

the error handling should be the same as if you passed any other mismatched encoding:

f = open('some.zip', encoding='UTF8')

i.e., you discover the error when you try to read from the (non)encoded stream, not when you open it.
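The behaviour described for an ordinary mismatched encoding can be checked with a small sketch like the one below. It assumes a throwaway temporary file standing in for some.zip, filled with a few bytes that are not valid UTF-8; the 'BOM' encoding value is only a proposal in this thread, so it is not exercised here.

```python
import os
import tempfile

# Stand-in for 'some.zip': a few bytes that are not valid UTF-8
# (0xFF can never appear in a UTF-8 byte sequence).
with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp:
    tmp.write(b"PK\x03\x04\xff\xfe\x80")
    path = tmp.name

f = open(path, encoding="UTF8")  # succeeds: nothing has been decoded yet
try:
    f.read()                     # decoding happens here
    failed_on_read = False
except UnicodeDecodeError:
    failed_on_read = True
finally:
    f.close()
    os.remove(path)
```

Here open() returns normally and the UnicodeDecodeError only surfaces on read(), which is the point at which a BOM-based mode would presumably report its mismatch as well.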

Tres.
--
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com


