[Python-Dev] Decoding incomplete unicode

M.-A. Lemburg mal at egenix.com
Thu Aug 19 22:09:09 CEST 2004


Walter Dörwald wrote:

M.-A. Lemburg wrote:

Walter Dörwald wrote:

Let's compare example uses:

1) Having feed() as part of the StreamReader API:

---
s = u"???".encode("utf-8")
r = codecs.getreader("utf-8")()
for c in s:
    print r.feed(c)
---

I consider adding a .feed() method to the stream codec bad design. .feed() is something you do on a stream, not a codec.

I don't care about the name, we can call it statefuldecodebytechunk() or whatever. (In fact I'd prefer to call it decode(), but that name is already taken by another method. Of course we could always rename decode() to internaldecode() like Martin suggested.)

It's not the name that doesn't fit; it's the fact that you are mixing a stream action into a codec, which I'd rather see well separated.
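For readers following along in current Python: the stateful, chunk-at-a-time decoding debated under option 1 can be sketched with codecs.getincrementaldecoder(). This is not the feed() method proposed in this thread, only a comparable illustration. The helper name feed_bytewise and the sample text "äöü" are assumptions (the sample string shows up as ??? in the archive).

```python
import codecs

def feed_bytewise(data, encoding="utf-8"):
    """Decode data one byte at a time and collect each call's output."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    outputs = []
    for i in range(len(data)):
        # An incomplete multi-byte sequence yields "" until it is completed.
        outputs.append(decoder.decode(data[i:i + 1]))
    # final=True makes the decoder raise if undecoded bytes are left over.
    outputs.append(decoder.decode(b"", final=True))
    return outputs

chunks = feed_bytewise("äöü".encode("utf-8"))
print(chunks)
```

Each two-byte character produces an empty string for its first byte and the decoded character for its second, which is the behaviour the feed() example relies on.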

2) Explicitly using a queue object:

---
from whatever import StreamQueue

s = u"???".encode("utf-8")
q = StreamQueue()
r = codecs.getreader("utf-8")(q)
for c in s:
    q.write(c)
    print r.read()
---

This is probably how an advanced codec writer would use the APIs to build new stream interfaces.

3) Using a special wrapper that implicitly creates a queue:

---
from whatever import StreamQueueWrapper

s = u"???".encode("utf-8")
r = StreamQueueWrapper(codecs.getreader("utf-8"))
for c in s:
    print r.feed(c)
---
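The queue in option 2 is left unspecified ("from whatever import ..."). A minimal sketch of what such an object might look like, written for current Python, follows. The StreamQueue class here is hypothetical, as is the sample text "äöü"; only codecs.getreader() is an existing API.

```python
import codecs

class StreamQueue:
    """Hypothetical in-memory byte queue with a file-like read()."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data

    def read(self, size=-1):
        # Hand back everything queued (or at most `size` bytes) and drop it.
        if size is None or size < 0:
            size = len(self._buffer)
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk

data = "äöü".encode("utf-8")
queue = StreamQueue()
reader = codecs.getreader("utf-8")(queue)

decoded = []
for i in range(len(data)):
    queue.write(data[i:i + 1])
    # The reader keeps undecoded trailing bytes until the next read().
    decoded.append(reader.read())
print(decoded)
```

The stream action (write) stays on the queue and the decoding stays in the reader, which is the separation argued for above.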

This could be turned into something more straightforward, e.g.

    from codecs import EncodedStream

    # Load data
    s = u"???".encode("utf-8")

    # Write to encoded stream (one byte at a time) and print
    # the read output
    q = EncodedStream(inputencoding="utf-8", outputencoding="unicode")

This is confusing, because there is no encoding named "unicode". This should probably read:

    q = EncodedQueue(encoding="utf-8", errors="strict")

Fine.

I was thinking of something similar to EncodedFile() which also has two separate encodings, one for the file side of things and one for the Python side.
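As a reminder of the two-encoding shape referred to here, codecs.EncodedFile() takes one encoding for the Python side (data_encoding) and one for the file side (file_encoding). A small sketch in current Python, with io.BytesIO standing in for a real file and the encodings picked arbitrarily for illustration:

```python
import codecs
import io

raw = io.BytesIO()
# data_encoding is the Python-side encoding, file_encoding the file-side one.
recoder = codecs.EncodedFile(raw, data_encoding="latin-1", file_encoding="utf-8")

# Bytes written in Latin-1 end up stored as UTF-8 in the underlying file.
recoder.write("é".encode("latin-1"))
stored = raw.getvalue()

# Reading goes the other way: UTF-8 in the file, Latin-1 handed back.
raw.seek(0)
round_trip = recoder.read()
print(stored, round_trip)
```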

    for c in s:
        q.write(c)
        print q.read()

    # Make sure we have processed all data:
    if q.haspendingdata():
        raise ValueError, 'data truncated'

This should be the job of the error callback, the last part should probably be:

    for c in s:
        q.write(c)
        print q.read()
    print q.read(final=True)
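The two ways of detecting truncated input discussed here (an explicit pending-data check versus a final read that lets the error handler complain) can be compared in a small sketch. EncodedQueue and has_pending_data() below are hypothetical stand-ins for the names used in the thread, built for illustration on the incremental decoder API of current Python; they are not an existing codecs class.

```python
import codecs

class EncodedQueue:
    """Hypothetical queue: write encoded bytes, read decoded text."""

    def __init__(self, encoding="utf-8", errors="strict"):
        self._decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
        self._pending = bytearray()

    def write(self, data):
        self._pending += data

    def read(self, final=False):
        data = bytes(self._pending)
        self._pending.clear()
        # With final=True, leftover bytes are passed to the error handler.
        return self._decoder.decode(data, final=final)

    def has_pending_data(self):
        # getstate()[0] holds input bytes the decoder has buffered so far.
        return bool(self._pending) or bool(self._decoder.getstate()[0])

q = EncodedQueue(encoding="utf-8", errors="strict")
q.write(b"\xc3")  # first byte of a two-byte sequence, then the input ends
partial = q.read()
pending = q.has_pending_data()

try:
    q.read(final=True)
    truncated = False
except UnicodeDecodeError:
    truncated = True
print(repr(partial), pending, truncated)
```

With errors="strict" the final read raises; with another error handler (for example "replace") the same call would return a substitute character instead, which is the point of leaving the decision to the error callback.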

Ok; both methods have their use cases. (You seem to be obsessed with this final argument ;-)

I very much prefer option 1).

I prefer the above example because it's easy to read and makes things explicit.

"If the implementation is hard to explain, it's a bad idea."

The user usually doesn't care about the implementation, only its interfaces.

Bye,
   Walter Dörwald



-- Marc-Andre Lemburg eGenix.com



