[Python-Dev] Decoding incomplete unicode

M.-A. Lemburg mal at egenix.com
Thu Aug 19 12:15:49 CEST 2004


Walter Dörwald wrote:

Let's compare example uses:

1) Having feed() as part of the StreamReader API:
---
s = u"???".encode("utf-8")
r = codecs.getreader("utf-8")()
for c in s:
    print r.feed(c)
---

I consider adding a .feed() method to the stream codec bad design. .feed() is something you do on a stream, not a codec.
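For comparison, here is a minimal sketch of feed-style decoding in current Python 3, using the standard library's incremental decoder rather than the .feed() method proposed in this thread. The helper name feed_bytes is an assumption made for illustration only.

```python
import codecs

def feed_bytes(data, encoding="utf-8"):
    """Decode `data` one byte at a time and return each partial result."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for i in range(len(data)):
        # final=False lets the decoder hold back an incomplete sequence
        pieces.append(decoder.decode(data[i:i + 1], final=False))
    # final=True raises UnicodeDecodeError if bytes are still pending
    pieces.append(decoder.decode(b"", final=True))
    return pieces

# "a", a-umlaut (2 bytes in UTF-8), euro sign (3 bytes in UTF-8)
pieces = feed_bytes("a\u00e4\u20ac".encode("utf-8"))
print(pieces)
```

Each call returns only the characters completed so far, so the lead bytes of a multi-byte sequence yield an empty string.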

2) Explicitly using a queue object:
---
from whatever import StreamQueue

s = u"???".encode("utf-8")
q = StreamQueue()
r = codecs.getreader("utf-8")(q)
for c in s:
    q.write(c)
    print r.read()
---

This is probably how an advanced codec writer would use the APIs to build new stream interfaces.
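A minimal sketch of the explicit-queue approach, written for Python 3. StreamQueue is only a placeholder name in the example above, so the class below is a hypothetical in-memory stand-in and not an existing codecs API. It relies on codecs.StreamReader keeping undecoded bytes between read() calls.

```python
import codecs

class StreamQueue:
    """Hypothetical in-memory byte queue (not part of the codecs module)."""

    def __init__(self):
        self._buffer = b""

    def write(self, data):
        # Append raw bytes to the queue
        self._buffer += data

    def read(self, size=-1):
        # Return up to `size` bytes, or everything queued when size < 0
        if size is None or size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

s = "a\u00e4\u20ac".encode("utf-8")
q = StreamQueue()
r = codecs.getreader("utf-8")(q)

outputs = []
for i in range(len(s)):
    q.write(s[i:i + 1])       # feed one byte
    outputs.append(r.read())  # read whatever is decodable so far
print(outputs)
```

The queue must return an empty bytes object when it has nothing left, so that the reader's read() stops instead of blocking.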

3) Using a special wrapper that implicitly creates a queue:
----
from whatever import StreamQueueWrapper

s = u"???".encode("utf-8")
r = StreamQueueWrapper(codecs.getreader("utf-8"))
for c in s:
    print r.feed(c)
----

This could be turned into something more straightforward, e.g.

    from codecs import EncodedStream

    # Load data
    s = u"???".encode("utf-8")

    # Write to encoded stream (one byte at a time) and print
    # the read output
    q = EncodedStream(input_encoding="utf-8", output_encoding="unicode")
    for c in s:
        q.write(c)
        print q.read()

    # Make sure we have processed all data:
    if q.has_pending_data():
        raise ValueError, 'data truncated'
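The EncodedStream interface above is a proposal, not something that can be imported from codecs. As a rough sketch of how such an object might behave in Python 3, the hypothetical class below wraps an incremental decoder. It takes only an input encoding and always produces str, and it assumes that the first element of getstate() holds the still-undecoded bytes, which is the case for buffered decoders such as UTF-8.

```python
import codecs

class EncodedStream:
    """Hypothetical sketch of the proposed interface (bytes in, str out)."""

    def __init__(self, input_encoding):
        self._decoder = codecs.getincrementaldecoder(input_encoding)()
        self._chars = []

    def write(self, data):
        # Decode what is possible; incomplete sequences stay in the decoder
        self._chars.append(self._decoder.decode(data, final=False))

    def read(self):
        # Return and clear all characters decoded so far
        result = "".join(self._chars)
        self._chars = []
        return result

    def has_pending_data(self):
        # Assumption: getstate()[0] is the undecoded byte buffer
        pending_bytes, _flags = self._decoder.getstate()
        return bool(pending_bytes)

s = "a\u00e4\u20ac".encode("utf-8")
q = EncodedStream("utf-8")
decoded = []
for i in range(len(s)):
    q.write(s[i:i + 1])
    decoded.append(q.read())

# Make sure all data has been processed
if q.has_pending_data():
    raise ValueError("data truncated")
```

Keeping the truncation check as a separate call leaves it to the caller to decide when the input is complete.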

I very much prefer option 1).

I prefer the above example because it's easy to read and makes things explicit.

"If the implementation is hard to explain, it's a bad idea."

The user usually doesn't care about the implementation, only its interfaces.

-- Marc-Andre Lemburg eGenix.com



