[Python-Dev] csv module TODO list

Andrew McNamara andrewm at object-craft.com.au
Wed Jan 5 10:34:14 CET 2005


Andrew McNamara wrote:

There's a bunch of jobs we (CSV module maintainers) have been putting off - attached is a list (in no particular order):

* unicode support (this will probably uglify the code considerably).

Martin v. Löwis wrote:

Can you please elaborate on that? What needs to be done, and how is that going to be done? It might be possible to avoid considerable uglification.

I'm not altogether sure there. The parsing state machine is all written in C, and deals with signed chars - I expect we'll need two versions of that (or one version that's compiled twice using pre-processor macros). Quite a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:

Indeed. The trick is to convert to Unicode early and to use Unicode literals instead of string literals in the code.

Yes, although it would be nice to retain the 8-bit versions as well.

Note that the only real-life Unicode format in use is UTF-16 (with BOM) written by Excel. There's no standard for specifying the encoding in CSV files, so this is also the only feasible format.
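Since a CSV file carries no encoding declaration, the BOM is the only in-band hint available. A minimal sketch of checking for it, in present-day Python rather than anything the csv module does; the helper name sniff_utf16_bom is hypothetical:

```python
import codecs

def sniff_utf16_bom(raw):
    """Return "utf-16" if the bytes start with a UTF-16 BOM, else None.

    Hypothetical helper for illustration only. Caveat: the UTF-32-LE
    BOM (FF FE 00 00) also begins with the UTF-16-LE BOM, so this
    check alone cannot tell those two encodings apart.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM itself and picks
        # the right byte order when decoding.
        return "utf-16"
    return None
```

A caller would presumably fall back to the existing 8-bit handling when this returns None.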

Yes - that's part of the problem I hadn't really thought about yet - the csv module currently interacts directly with files as iterators, but it's clear that we'll need to decode as we go.
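For illustration, here is roughly what "decode as we go" could look like. This is only a sketch in modern Python 3, whose io module and str-based csv reader postdate this thread; the function name read_rows and the sample data are made up:

```python
import csv
import io

def read_rows(binary_stream, encoding="utf-16"):
    """Wrap a binary file object so csv.reader sees decoded text.

    Hypothetical helper; the default encoding is an assumption based
    on the Excel case discussed above.
    """
    # newline="" leaves line endings untouched so the csv parser can
    # handle newlines embedded in quoted fields itself.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return csv.reader(text)

# Usage: the "utf-16" codec writes a BOM on encode and consumes it on decode.
sample = "name,city\r\nZoë,Zürich\r\n".encode("utf-16")
rows = list(read_rows(io.BytesIO(sample)))
```

The wrapper decodes incrementally as the reader pulls lines, so the file is never decoded in one piece up front.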

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/


