[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Victor Stinner victor.stinner at haypocalc.com
Fri Jan 8 23:10:32 CET 2010


On Friday 08 January 2010 22:40:47, Eric Smith wrote:

>> Shouldn't this encoding guessing be a separate function that you call
>> on either a file or a seekable stream?
>>
>> After all, detecting encodings is just as useful to have for non-file
>> streams.
>
> Other stream sources typically have out-of-band ways to signal the
> encoding: only when reading from the filesystem do we pretty much
> have to guess, and in that case the BOM / signature is the best
> heuristic we have. Also, some non-file streams are not seekable, and so
> can't be guessed via a pre-pass.

But what if the file were in (for example) a zip file? I think you definitely want to have access to this functionality outside of open().

FYI my patch (encoding="BOM") is implemented in TextIOWrapper, and TextIOWrapper takes a binary stream as input, not a filename.
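To illustrate why working on a binary stream matters, here is a minimal sketch of BOM sniffing done before the stream is handed to TextIOWrapper. It is not the patch itself: the helper name wrap_with_bom_detection, its default parameter and the _BOMS table are made up for this example, and it assumes the stream is seekable.

```python
import codecs
import io

# Check the longest signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def wrap_with_bom_detection(binary_stream, default="utf-8"):
    """Hypothetical helper: wrap a seekable binary stream as text.

    The encoding is taken from a leading BOM when one is present,
    otherwise `default` is used. The BOM itself is skipped.
    """
    start = binary_stream.tell()
    head = binary_stream.read(4)  # the longest BOM is 4 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            # Position the stream just after the BOM so that the
            # decoder never sees it.
            binary_stream.seek(start + len(bom))
            return io.TextIOWrapper(binary_stream, encoding=encoding)
    # No BOM found: rewind and fall back to the default encoding.
    binary_stream.seek(start)
    return io.TextIOWrapper(binary_stream, encoding=default)


# Usage with an in-memory stream standing in for any binary source.
raw = io.BytesIO(codecs.BOM_UTF16_LE + "hello".encode("utf-16-le"))
text = wrap_with_bom_detection(raw).read()
```

Because the helper only needs read(), seek() and tell(), the same call should work for any seekable binary stream, not only for a file opened by name. Non-seekable streams would need a different approach, such as buffering the first bytes.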

--
Victor Stinner
http://www.haypocalc.com/


