[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Eric Smith eric at trueblade.com
Fri Jan 8 22:40:47 CET 2010


Shouldn't this encoding guessing be a separate function that you call on either a file or a seekable stream?

After all, detecting encodings is just as useful to have for non-file streams. Other stream sources typically have out-of-band ways to signal the encoding: only when reading from the filesystem do we pretty much have to guess, and in that case the BOM / signature is the best heuristic we have. Also, some non-file streams are not seekable, so their encoding can't be guessed via a pre-pass.

But what if the file were in (for example) a zip file? I think you definitely want to have access to this functionality outside of open().
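
To make that concrete, here is a rough sketch of what such a standalone function might look like. The name guess_encoding_from_bom and its default parameter are hypothetical, not an existing or proposed API; it only looks at the BOM and assumes a seekable binary stream. The zip member is copied into an io.BytesIO first so that the sketch does not depend on whether the zip member object itself is seekable.

```python
import codecs
import io
import zipfile

# UTF-32 is listed before UTF-16 because the UTF-32-LE BOM begins
# with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(stream, default=None):
    """Hypothetical helper: peek at a seekable binary stream.

    Returns a codec name that consumes the BOM while decoding,
    or `default` if no BOM is found. The stream position is restored.
    """
    pos = stream.tell()
    head = stream.read(4)  # the longest BOM is 4 bytes
    stream.seek(pos)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Build a small in-memory zip holding a UTF-8 file with a signature.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("note.txt", codecs.BOM_UTF8 + "h\xe9llo".encode("utf-8"))

# Read the member back and guess its encoding outside of open().
with zipfile.ZipFile(archive) as zf:
    member = io.BytesIO(zf.read("note.txt"))

enc = guess_encoding_from_bom(member)
text = io.TextIOWrapper(member, encoding=enc).read()
```

This is only a heuristic: UTF-16-LE data that starts with a NUL character would be mistaken for UTF-32-LE, and a file without a BOM falls through to whatever default the caller passes.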

Eric.


