[Python-3000] Question about email/generator.py
Guido van Rossum guido at python.org
Tue Oct 23 21:36:16 CEST 2007
- Previous message: [Python-3000] Three new failing tests?
- Next message: [Python-3000] Question about email/generator.py
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
There's an issue in the email package that I can't resolve by myself. I described it to Barry like this:
> So in generator.py on line 291, I read:
>
>     print(part.get_payload(decode=True), file=self)
>
> It turns out that part.get_payload(decode=True) returns a bytes
> object, and printing a bytes object to a text file is not the right
> thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2
> it will probably print the repr() of the bytes object. Right now, it
> errors out because I'm removing the encode() method on PyString
> objects, and print() converts PyBytes to PyString; then the
> TextIOWrapper.write() method tries to encode its argument.
>
> If I change this to (decode=False), all tests in the email package
> pass. But is this the right fix???
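To make the quoted problem concrete, here is a minimal sketch written against the current Python 3 email package, not the 3.0 alpha code under discussion. The sample text and variable names are made up for illustration. It shows that a decoded text payload comes back as bytes, and that printing bytes to a text stream writes their repr() rather than the text.

```python
import io
from email.mime.text import MIMEText

# A text part with a non-ASCII charset; the package base64-encodes the body.
part = MIMEText("caf\u00e9", _charset="utf-8")

decoded = part.get_payload(decode=True)   # transfer-decoded octets (bytes)
raw = part.get_payload(decode=False)      # still the base64 text (str)

# Printing a bytes object to a text stream writes its repr(), not the text.
buf = io.StringIO()
print(decoded, file=buf)
printed = buf.getvalue()
```

Here `decoded` holds the UTF-8 octets of the original text, while `raw` is the undecoded transfer encoding, which is why neither value can simply be printed as the message text.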
I should note that this was checked in by the time Barry replied, even though it clearly was the wrong thing to do. Barry replied:
Maybe. ;) The problem is that this API is either being too smart for its own good, or not smart enough. The intent of decode=True is to return the original object encoded in the payload. So for example, if MIMEImage was used to encode some jpeg, then decode=True should return that jpeg.
The problem is that what you really want is something that's content-type aware, such that if your main type is some non-text type like image/* or audio/* or even application/octet-stream, you will almost always want a bytes object back. But text can also be encoded via charset and/or transfer-encoding, and (at least in Py2.x), you'd use the same method to get the original, unencoded text back. In that case, you definitely want the string, since that's the most natural API (i.e. you fed it a string object when you created the MIMEText, so you want a string on the way back out).

This is yet another corner case where the old API doesn't really fit the new bytes/string model correctly, and of course you can (rightly!) argue we were sloppy in Py2.x but were able to (mostly) get away with it.

In this /specific/ situation, generator.py:291 can only be called when the main type is text, so I think it is clearly expecting a string, even though .get_payload() will return a bytes there.

Short of redesigning the API, I can think of two options. First, we can change .get_payload() to specifically return a string when the main type is text and decode=True. This is ugly because the return type will depend on the content type of the message. OTOH, get_payload() is already fairly ugly here because its return type differs based on its argument, although I'd like to split this into a separate .get_decoded_payload() method.

The other option is to let .get_payload() return bytes in all cases, but in generator.py:291, explicitly convert it to a string, probably using raw-unicode-escape. Because we know the main type is text here, we know that the payload must contain a string. get_payload() will return the bytes of the decoded unicode string, so raw-unicode-escape should do the right thing. That's ugly too for obvious reasons.

The one thing that doesn't seem right is for decode=False to be used, because should the payload be an encoded string, it won't get correctly decoded.
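The content-type-aware behavior described above can be sketched as a standalone helper. This is only an illustration against the current Python 3 email package, not a proposed patch: the function name `get_decoded_payload` is hypothetical, and it decodes with the part's declared charset (falling back to us-ascii, which is an assumption) instead of the raw-unicode-escape codec mentioned in the reply.

```python
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText


def get_decoded_payload(part):
    """Hypothetical helper: str for text/* parts, bytes for other leaf parts."""
    data = part.get_payload(decode=True)  # undo the transfer encoding
    if part.get_content_maintype() != "text":
        return data
    # Assumption: use the declared charset, defaulting to us-ascii.
    charset = part.get_content_charset("us-ascii")
    return data.decode(charset, errors="replace")


text_part = MIMEText("caf\u00e9", _charset="utf-8")
blob_part = MIMEApplication(b"\x00\x01\x02")

text_result = get_decoded_payload(text_part)   # str
blob_result = get_decoded_payload(blob_part)   # bytes
```

The return type still depends on the content type, which is exactly the ugliness noted for the first option. Keeping it in a separate function at least leaves `get_payload()` itself returning bytes in all decoded cases.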
This is part of the DecodedGenerator, which honestly is probably not much used outside the test cases, but the intent of that generator is clearly to print the decoded text parts with the non-text parts stripped and replaced by a placeholder. So I think it definitely wants decoded text payloads, otherwise there's not much point in the class.

I hope that explains the situation. I'm open to any other idea -- it doesn't even have to be better. ;) I see that you made the decode=False change in svn, but that's the one solution that doesn't seem right.
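For readers who have not used the class, the following is a small sketch of what DecodedGenerator is meant to do, run against the current Python 3 email package. The message contents are invented. The text part is deliberately plain 7-bit ASCII so that the output does not depend on which decode flag the generator uses internally.

```python
import io
from email.generator import DecodedGenerator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# One text part and one binary part inside a multipart container.
msg = MIMEMultipart()
msg["Subject"] = "demo"
msg.attach(MIMEText("Hello, world"))            # us-ascii, no transfer encoding
msg.attach(MIMEApplication(b"\x00\x01\x02"))    # application/octet-stream

out = io.StringIO()
DecodedGenerator(out).flatten(msg)
result = out.getvalue()
```

The flattened `result` contains the top-level headers, the text of the first part, and a one-line placeholder naming the content type of the binary part in place of its data.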
At this point I (Guido) am really hoping someone will want to "own" this issue and redesign the API properly...
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)