[Python-Dev] email package status in 3.X
Nick Coghlan ncoghlan at gmail.com
Mon Jun 21 14:20:13 CEST 2010
On Mon, Jun 21, 2010 at 11:58 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:
>> Perhaps if people could identify which specific string methods are causing problems?
>
> __getitem__(int) returns an integer rather than a bytestring, so anything that manipulates individual characters can't be given bytes and have it work.
It can if you use length one slices rather than simple indexing. Depending on the details, such algorithms may still fail for multi-byte codecs though.
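A quick illustration of the slicing point, as a sketch for Python 3. The helper name `first_char` is made up for this example and does not come from the thread:

```python
def first_char(data):
    """Return the first element with the same type as the input.

    Slicing keeps the container type for both str and bytes,
    whereas indexing bytes yields an int.
    """
    return data[:1]


# Indexing differs between the two types...
assert "abc"[0] == "a"
assert b"abc"[0] == 97  # an int, not b"a"

# ...but length-one slices behave the same way for both.
assert first_char("abc") == "a"
assert first_char(b"abc") == b"a"

# Caveat: with a multi-byte codec, one byte is not one character.
encoded = "\u00e9".encode("utf-8")  # two bytes: b"\xc3\xa9"
assert len(encoded) == 2
assert first_char(encoded) == b"\xc3"  # only half of the character
```

The last assertion is the multi-byte failure mode: the slice is type-correct, but it no longer corresponds to a whole character.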
> That was one of the key differences I had in mind for a bstr type, apart from designing it to coerce normal strings to bstrs in cross-type operations, and to allow O(1) "conversion" to/from bytes.
Erk, that just sounds like a recipe for recreating the problems 2.x has in a new form.
> Another randomly chosen byte/string incompatibility (Python 3.1; I don't have 3.2 handy at the moment):
>
>     >>> os.path.join(b'x','y')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "c:\Python31\lib\ntpath.py", line 161, in join
>         if b[:1] in seps:
>     TypeError: Type str doesn't support the buffer API
>     >>> os.path.join('x',b'y')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "c:\Python31\lib\ntpath.py", line 161, in join
>         if b[:1] in seps:
>     TypeError: 'in <string>' requires string as left operand, not bytes
>
> Ironically, it seems to me that in trying to make the type distinction more rigid, Py3K fails in this area precisely because it is not a rigidly typed language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I need two stringlike objects of the same type", not even in its docstring.
I believe it actually needs the objects to be compatible with the type of os.sep, rather than just with each other (i.e. the type restrictions on os.path.join are the same as those on os.sep.join, even though the join algorithm itself is slightly different). This restriction should be mentioned in the Py3k docstring and docs for os.path.join - if it isn't, that would be a doc bug.
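To make the restriction concrete, here is a small sketch. The helper `join_accepts` is a name invented for this example. It assumes a Python 3 where mixing str and bytes in os.path.join raises TypeError; the exact error message varies between versions and platforms, so it is not checked here:

```python
import os

# Same-type arguments work for both str and bytes.
text_path = os.path.join("x", "y")
bytes_path = os.path.join(b"x", b"y")


def join_accepts(*parts):
    """Return True if os.path.join accepts the parts, False on TypeError."""
    try:
        os.path.join(*parts)
    except TypeError:
        return False
    return True


assert join_accepts("x", "y")
assert join_accepts(b"x", b"y")
assert not join_accepts(b"x", "y")  # mixed types are rejected
assert not join_accepts("x", b"y")
```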
> At least in Java, you would either implement a "path" type with coercions from bytes and strings, or you'd have a class with overloaded methods for handling join operations on bytes and strings, respectively, thereby avoiding this whole mess.
>
> (Alas, this little example on the 'in' operator also shows that my bstr effort would probably fail anyway, because there's no '__rcontains__' (__lcontains__?) to allow it to override the str type's __contains__.)
OK, these examples convince me that the incompatibility problem is real. However, I don't think a bstr type can solve them even without the __rcontains__ problem - it would just recreate the pain that we already have in the 2.x world.
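The asymmetry behind the missing reflected hook can be sketched as follows. The class `Reflecting` is a hypothetical stand-in for a bstr-like type, not an actual proposal from the thread:

```python
class Reflecting:
    """Hypothetical stand-in for a bstr-like type."""

    def __radd__(self, other):
        # Reflected hook: used for `other + self` when the left operand
        # cannot handle the addition itself.
        return "radd called"

    def __rcontains__(self, container):
        # Not part of the data model: Python never calls this method.
        return True


obj = Reflecting()

# Binary `+` gives the right-hand operand a chance via __radd__.
add_result = "abc" + obj

# `in` only ever consults the container, so str.__contains__ rejects obj.
try:
    obj in "abc"
    contains_result = "no error"
except TypeError:
    contains_result = "TypeError"
```

Here `add_result` ends up as "radd called" while `contains_result` is "TypeError": a new type can intercept `str + x`, but it has no way to intercept `x in str`.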
Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow "encoding" keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).
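As a rough sketch of what such an argument might look like from the caller's side: os.path.join has no `encoding` parameter, so the wrapper `join_path` below is purely hypothetical and only illustrates the idea:

```python
import os


def join_path(*parts, encoding=None):
    """Hypothetical wrapper: join path parts, decoding bytes if allowed.

    All-str input is the recommended usage. Bytes components are only
    accepted when an encoding is given, and a single encoding applies
    to every bytes component.
    """
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            if encoding is None:
                raise TypeError("bytes path component given without an encoding")
            part = part.decode(encoding)
        decoded.append(part)
    return os.path.join(*decoded)
```

With this sketch, `join_path("x", b"y", encoding="utf-8")` gives the same result as `os.path.join("x", "y")`, while `join_path("x", b"y")` raises TypeError. Mixed encodings are deliberately unsupported, matching the "do the decoding yourself" rule.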
For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage "first get it working, then get it working fast").
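The "decode once on the way in, encode once on the way out" shape is roughly the following. The function `process` and the strip/upper step are placeholders chosen for this sketch, standing in for whatever text processing the application actually does:

```python
def process(raw, encoding="utf-8"):
    """Decode at the input boundary, work on str, encode at the output boundary."""
    text = raw.decode(encoding)  # decode once on the way in

    # Everything in the middle operates on str only (placeholder work).
    lines = [line.strip().upper() for line in text.splitlines()]

    return "\n".join(lines).encode(encoding)  # encode once on the way out
```

For example, `process(b" a \n b ")` returns `b"A\nB"` with exactly one decode and one encode, however much string manipulation happens in between.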
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia