[Python-Dev] PEP 461 updates
Oscar Benjamin oscar.j.benjamin at gmail.com
Sat Jan 18 15:39:29 CET 2014
On 17 January 2014 21:37, Chris Barker <chris.barker at noaa.gov> wrote:
> For the record, we've got a pretty good thread (not this good, though!) over on
> the numpy list about how to untangle the mess that has resulted from porting
> text-file-parsing code to py3 (and the underlying issue with the 'S' data type
> in numpy...)
>
> One note from the github issue:
> """
> The use of asbytes originates only from the fact that b'%d' % (20,) does not
> work.
> """
>
> So yeah PEP 461! (even if too late for numpy...)
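For context on the quoted note: PEP 461 (implemented in Python 3.5) made `b'%d' % (20,)` work. Before that, code had to format a str and then encode it explicitly. The sketch below contrasts the two; the function names are hypothetical and this is not numpy's actual `asbytes` helper.

```python
# Requires Python 3.5+ for the bytes %-interpolation in the first function.

def format_field_pep461(value):
    """Format an integer as ASCII bytes using PEP 461 bytes %-formatting."""
    return b"%d" % (value,)

def format_field_encode(value):
    """Pre-3.5 style: format as text, then encode with an explicit codec."""
    return ("%d" % value).encode("ascii")

print(format_field_pep461(20))  # b'20'
print(format_field_encode(20))  # b'20'
```

Both produce the same bytes; the second makes the choice of encoding visible at the call site.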
The discussion about numpy.loadtxt and the 'S' dtype is not relevant to PEP 461. PEP 461 is about facilitating the handling of ASCII/binary protocols and file formats. The loadtxt function is for reading text files. Reading text files is already handled very well in Python 3. The only caveat is that you need to specify the encoding when you open the file.
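A minimal sketch of that caveat, assuming the file is UTF-8 (the helper name `read_lines` and the sample data are hypothetical): passing `encoding=` to `open()` means decoding no longer depends on the platform's locale default.

```python
import os
import tempfile

def read_lines(path, encoding="utf-8"):
    """Read a text file, decoding with an explicitly chosen encoding."""
    # Without encoding=, open() falls back to the locale's preferred encoding.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

# Write a small UTF-8 file containing non-ASCII text (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("name value\ncafé 1.5\n€uro 2.0\n")
    path = tmp.name

try:
    lines = read_lines(path, encoding="utf-8")
    print(lines)
finally:
    os.remove(path)
```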
The loadtxt function doesn't specify the encoding when it opens the file, so on Python 3 it gets the system default encoding when reading from the file. Since the 'S' dtype is for an array of bytes, the loadtxt function has to encode the unicode strings before storing them in the array. The function has no idea what encoding the user wants, so it just uses latin-1, leading to mojibake if the file's content and encoding are not compatible with latin-1 (e.g. UTF-8).
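The failure modes described above can be simulated with plain `str`/`bytes` operations. This is a hypothetical illustration of a "always encode to latin-1" step, not numpy's actual code path; the function names are invented for the example.

```python
def store_as_latin1(text):
    """Hypothetical str -> bytes step that always uses latin-1."""
    return text.encode("latin-1")

def misread_utf8_as_latin1(text):
    """Encode text as UTF-8 but decode those bytes as latin-1."""
    return text.encode("utf-8").decode("latin-1")

stored = store_as_latin1("café")          # b'caf\xe9'
try:
    stored.decode("utf-8")                # a user expecting UTF-8 bytes fails
except UnicodeDecodeError:
    print("latin-1 bytes are not valid UTF-8")

try:
    store_as_latin1("€")                  # U+20AC is outside latin-1's range
except UnicodeEncodeError:
    print("'€' cannot be represented in latin-1")

print(misread_utf8_as_latin1("café"))     # cafÃ©  (mojibake)
```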
The loadtxt function is a classic example of how not to handle text, and whoever made it that way probably didn't understand unicode and the Python 3 text model. If they did understand what they were doing, then they knew that they were implementing a dirty hack.
If you want to draw a relevant lesson from that thread for this one, then the lesson argues against PEP 461: adding back the bytes formatting methods helps people who refuse to understand text processing and who continue implementing dirty hacks instead of doing it properly.
Oscar