[Python-Dev] Decoding incomplete unicode
M.-A. Lemburg mal at egenix.com
Wed Aug 18 10:36:06 CEST 2004
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>> I've thought about this some more. Perhaps I'm still missing something, but wouldn't it be possible to add a feeding mode to the existing stream codecs by creating a new queue data type (much like the queue you have in the test cases of your patch) and using the stream codecs on these?
>
> Here is the problem. In UTF-8, how does the actual algorithm tell (the application) that the bytes it got on decoding provide for three fully decodable characters, and that 2 bytes are left undecoded, and that those bytes are not inherently ill-formed, but lack a third byte to complete the multi-byte sequence?
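A minimal illustration of the problem, using the `final` flag and consumed-byte count that `codecs.utf_8_decode` exposes in present-day Python 3. This API is used here only as an assumption for illustration; it is not necessarily what this thread proposes. Because the decode function reports how many bytes it actually used, the caller can tell a truncated sequence apart from an ill-formed one:

```python
import codecs

# Three complete characters followed by the first two bytes of the
# three-byte UTF-8 sequence for U+20AC EURO SIGN (b"\xe2\x82\xac").
incomplete = b"abc\xe2\x82"

# With final=False the decoder stops before the truncated sequence and
# reports how many bytes it consumed.
text, consumed = codecs.utf_8_decode(incomplete, "strict", False)
leftover = incomplete[consumed:]  # the 2 undecoded bytes, kept for later
print(text, consumed, leftover)

# With final=True the same truncated tail is treated as an error.
try:
    codecs.utf_8_decode(incomplete, "strict", True)
except UnicodeDecodeError as exc:
    print("final chunk:", exc.reason)

# A byte that can never start a UTF-8 sequence is rejected even though
# more data might follow, so "ill-formed" and "incomplete" stay distinct.
try:
    codecs.utf_8_decode(b"abc\xff", "strict", False)
except UnicodeDecodeError as exc:
    print("ill-formed:", exc.reason)
```

The consumed count is the piece of information the question asks for: three characters were produced, and the remaining two bytes are neither output nor an error.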
This state (the undecoded trailing bytes) can be stored in the stream codec instance, e.g. by keeping a special state object in the instance and passing it to the codec's encode/decode APIs, or by implementing the stream codec itself in C.
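A sketch of keeping that state in the codec instance, assuming Python 3 and `codecs.utf_8_decode` as the low-level decode function. The class name `FeedingUTF8Decoder` and its `feed()` method are hypothetical; the standard library's `codecs.BufferedIncrementalDecoder` follows a similar "buffer the undecoded tail" approach:

```python
import codecs


class FeedingUTF8Decoder:
    """Hypothetical stateful decoder: the undecoded tail of the previous
    chunk is kept in the instance between calls."""

    def __init__(self, errors="strict"):
        self.errors = errors
        self.pending = b""  # bytes of an incomplete multi-byte sequence

    def feed(self, data, final=False):
        data = self.pending + data
        # The decode function reports how many bytes it consumed; with
        # final=False, anything after that is an incomplete sequence.
        text, consumed = codecs.utf_8_decode(data, self.errors, final)
        self.pending = data[consumed:]
        return text


decoder = FeedingUTF8Decoder()
chunks = [b"abc\xe2", b"\x82", b"\xacdef"]
pieces = [decoder.feed(chunk) for chunk in chunks]
pieces.append(decoder.feed(b"", final=True))  # flush at end of input
print(pieces)  # ['abc', '', '€def', '']
```

The caller never sees the split multi-byte sequence; only the instance and the decode function exchange the consumed count.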
We do need to extend the API between the stream codec and the encode/decode functions, no doubt about that. However, this is an extension that is well hidden from the user of the codec and won't break existing code.
> On top of that, you can implement whatever queuing or streaming APIs you want, but you need an efficient way to communicate incompleteness.
Agreed.
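Once incompleteness can be signalled, a feeding or queue-style interface is a thin layer on top. In present-day Python 3 this is available as `codecs.getincrementaldecoder()` and `codecs.iterdecode()`; the sketch below uses them only to show the idea, not as the API discussed in this thread:

```python
import codecs

# Feed the UTF-8 bytes of "a€" one byte at a time. The decoder returns
# an empty string until a multi-byte sequence is complete.
decoder = codecs.getincrementaldecoder("utf-8")()
out = [decoder.decode(bytes([b])) for b in "a€".encode("utf-8")]
out.append(decoder.decode(b"", final=True))  # flush; raises if bytes remain
print(out)  # ['a', '', '', '€', '']

# A queue-like source of arbitrarily split chunks works the same way;
# iterdecode() yields only the non-empty results.
chunks = [b"a\xe2", b"\x82\xac", b"b"]
print(list(codecs.iterdecode(chunks, "utf-8")))  # ['a', '€', 'b']
```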
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Aug 18 2004)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/