| msg56154 - (view) |
Author: Robert T McQuaid (rtmq) |
Date: 2007-09-27 05:49 |
| imaplib does not run under Python 3. The following two-line Python program, named testimap.py, works when run from a Windows XP system shell prompt using Python 2.5.1, but fails with Python 3.0. It appears that the logic does not follow the distinction between characters and bytes in Python 3.

    import imaplib
    mail=imaplib.IMAP4("mail.rtmq.infosathse.com")

    e:\python25\python testimap.py
    e:\python30\python testimap.py 2>f:syserr

The last line produced the trace:

    Traceback (most recent call last):
      File "testimap.py", line 10, in <module>
        mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
      File "e:\python30\lib\imaplib.py", line 184, in __init__
        self.welcome = self._get_response()
      File "e:\python30\lib\imaplib.py", line 962, in _get_response
        self._append_untagged(typ, dat)
      File "e:\python30\lib\imaplib.py", line 800, in _append_untagged
        if typ in ur:
    TypeError: unhashable type: 'bytes' |
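A minimal sketch of the character/bytes mismatch behind this report, assuming a current Python 3 where bytes is hashable (in the 3.0 alpha used here bytes was apparently still mutable, hence the TypeError). The dictionary `untagged` is a hypothetical stand-in for imaplib's table of untagged responses, not the module's actual data structure.

```python
# Hypothetical stand-in for a response table keyed by str.
untagged = {"OK": ["ready"]}

# A response type as it arrives from a socket opened in binary mode.
typ = b"OK"

# A bytes key never compares equal to a str key, so the lookup misses.
found_as_bytes = typ in untagged

# Decoding first (response types are plain ASCII) makes the lookup succeed.
found_as_text = typ.decode("ascii") in untagged

print(found_as_bytes, found_as_text)
```

Even once bytes became hashable, mixing the two types would fail silently rather than raise, which is why the types have to be chosen consistently throughout the module.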
|
|
| msg56156 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2007-09-27 06:10 |
| Would you like to work on a patch? |
|
|
| msg56163 - (view) |
Author: Raghuram Devarakonda (draghuram)  |
Date: 2007-09-27 14:39 |
| Just to further understand the issue, I added "imaplib.Debug=5" and here is the output preceding the exception stack trace (I replaced the real IMAP server name):

***************
20:19.52 imaplib version 2.58
20:19.52 new IMAP4 connection, tag=LOLD
20:19.52 < * OK Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.
20:19.52 matched r'\* (?P<type>[A-Z-]+)( (?P<data>.*))?' => (b'OK', b' Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.', b'Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.')
***************

So it appears that the response is of type "bytes", which in turn is due to reading the socket in binary mode (self.file = self.sock.makefile('rb')). I would like to see how the problem can be fixed, but any pointers are appreciated. |
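The debug output above can be reproduced in isolation. The sketch below assumes a bytes version of the untagged-response pattern and uses illustrative variable names; it shows that matching a bytes line yields bytes groups, which then have to be decoded before being used as text.

```python
import re

# Bytes pattern for an untagged response line ("* TYPE data").
untagged_response = re.compile(br"\* (?P<type>[A-Z-]+)( (?P<data>.*))?")

line = (b"* OK Microsoft Exchange Server 2003 IMAP4rev1 server "
        b"version 6.5.7638.1 (imapserver.com) ready.")

match = untagged_response.match(line)
typ = match.group("type")    # bytes, because the input was bytes
data = match.group("data")   # bytes as well

# Response types are ASCII, so decoding them is safe.
typ_text = typ.decode("ascii")

# A str pattern cannot be applied to bytes input at all.
try:
    re.compile(r"\* OK").match(line)
    str_pattern_works = True
except TypeError:
    str_pattern_works = False
```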
|
|
| msg56193 - (view) |
Author: Raghuram Devarakonda (draghuram)  |
Date: 2007-09-28 18:41 |
| I have gone through the python-3000 discussions about similar problems in other stdlib modules (email, imghdr, sndhdr etc) and found PEP 3137 (Immutable Bytes and Mutable Buffer). Since that work is in progress, I don't think it is worthwhile to fix this problem at this point. |
|
|
| msg57242 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2007-11-08 13:53 |
| The transition is done. Can you work on a patch and maybe add some tests, too? It helps when you start Python with the -bb flag:

$ ./python -bb -c 'import imaplib; imaplib.Debug=5; imaplib.IMAP4("mail.rtmq.infosathse.com")'
  52:01.86 imaplib version 2.58
  52:01.86 new IMAP4 connection, tag=PNFO
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 184, in __init__
    self.welcome = self._get_response()
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 907, in _get_response
    resp = self._get_line()
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 1009, in _get_line
    self._mesg('< %s' % line)
  File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 62, in warn
    globals)
  File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 102, in warn_explicit
    raise message
BytesWarning: str() on a bytes instance |
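The effect of the -bb flag can be checked without an IMAP server. This is a small sketch that runs a child interpreter with -bb; the one-line child programs are made up for illustration and only stand in for the `'< %s' % line` debug call in the traceback.

```python
import subprocess
import sys

# str() on a bytes object is an error under -bb.
implicit = subprocess.run(
    [sys.executable, "-bb", "-c", "str(b'< * OK ready')"],
    capture_output=True, text=True,
)

# An explicit decode states the intended encoding and raises no warning.
explicit = subprocess.run(
    [sys.executable, "-bb", "-c", "print(b'< * OK ready'.decode('ascii'))"],
    capture_output=True, text=True,
)

print(implicit.returncode, explicit.returncode)
```

Running a test suite under -bb is a cheap way to find every place where bytes are turned into text implicitly.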
|
|
| msg57254 - (view) |
Author: Raghuram Devarakonda (draghuram)  |
Date: 2007-11-08 14:59 |
| I will see what I can do but it may take a while. |
|
|
| msg57430 - (view) |
Author: Raghuram Devarakonda (draghuram)  |
Date: 2007-11-12 21:42 |
| Index: Lib/imaplib.py
===================================================================
--- Lib/imaplib.py (revision 58956)
+++ Lib/imaplib.py (working copy)
@@ -228,7 +228,7 @@
         self.port = port
         self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
         self.sock.connect((host, port))
-        self.file = self.sock.makefile('rb')
+        self.file = self.sock.makefile('r', encoding='ASCII', newline='')


     def read(self, size):
-------------

This patch fixes the issue, but I am not entirely sure that it is correct. I quickly looked at the IMAP RFC and there does seem to be a spec for CHARSET, in which case that will have to be used instead of ASCII. It requires more research and IMAP knowledge, which I can't claim. As for the tests, we need an IMAP server to connect to. Perhaps Google wouldn't mind being used for this purpose? |
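To see what the one-line change in the patch does, the following sketch compares the two makefile() modes over a local socket pair instead of a real server. The helper `first_line` is hypothetical and exists only for this illustration.

```python
import socket

def first_line(payload, **makefile_kwargs):
    """Send payload over a local socket pair and read one line back."""
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(payload)
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        with receiver.makefile(**makefile_kwargs) as stream:
            return stream.readline()

greeting = b"* OK ready\r\n"

# Binary mode (the original code): lines come back as bytes.
as_bytes = first_line(greeting, mode="rb")

# Text mode (the patch): lines come back as str; newline="" keeps the CRLF.
as_text = first_line(greeting, mode="r", encoding="ascii", newline="")

# The weakness of a fixed ASCII decoder: any 8-bit octet makes it fail.
try:
    first_line(b"* OK caf\xe9\r\n", mode="r", encoding="ascii", newline="")
    ascii_survives_8bit = True
except UnicodeDecodeError:
    ascii_survives_8bit = False
```

The last case is the practical risk of decoding at the socket level: a single non-ASCII octet in a message body would abort the read.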
|
|
| msg59609 - (view) |
Author: Jean-Paul Calderone (exarkun) *  |
Date: 2008-01-09 16:17 |
| You're correct in pointing out that IMAP4 supports arbitrary encodings, so simply hard-coding ASCII is not correct. The encoding isn't connection-level, but applies to particular sequences of bytes in the connection stream. To correctly interpret the bytes as characters, decoding must be integrated with the rest of the protocol implementation. |
|
|
| msg61918 - (view) |
Author: Bill Janssen (janssen) *  |
Date: 2008-01-31 18:03 |
| IMAP doesn't really support multiple charsets (just looked at RFC 3501). There are two places where character sets other than ASCII are used. One is in the SEARCH command; there's an optional parameter which can indicate that the search strings are in a non-ASCII character set. The other is in transmission of message literals (email messages) back and forth. So setting the default encoding at this level probably isn't quite right, as you should definitely be reading raw bytes from the socket, not characters, but it isn't too far off. It looks like _command() needs a bit of work (it shouldn't try to quote bytes, only strings), and the documentation needs to be improved to say that non-ASCII search strings and message bodies should be passed as bytes encoded according to the specified CHARSET, but with those fixes it should work. Assuming that bytes are hashable in Python 3K. |
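As a rough illustration of the SEARCH case, the sketch below builds a command whose search string is sent as a literal encoded in the declared CHARSET. The function `build_search` and the tag `A001` are hypothetical and not part of imaplib; a real client would also wait for the server's continuation response before sending the literal.

```python
def build_search(tag, charset, text):
    """Return (command line, literal) for a SEARCH with a non-ASCII string."""
    # The search string is encoded in the declared charset, not ASCII.
    literal = text.encode(charset)
    # The command itself stays ASCII; {n} announces an n-octet literal.
    head = "{} SEARCH CHARSET {} TEXT {{{}}}\r\n".format(
        tag, charset, len(literal)
    ).encode("ascii")
    # The CRLF after the literal terminates the command.
    return head, literal + b"\r\n"

head, literal = build_search("A001", "UTF-8", "café")
print(head)
print(literal)
```

The point of the design is that the octet count is computed on the encoded bytes, so the caller, not the socket layer, decides the encoding.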
|
|
| msg71894 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2008-08-24 22:22 |
| Is this still a problem? |
|
|
| msg71989 - (view) |
Author: Ismail Donmez (donmez) * |
Date: 2008-08-26 17:50 |
| Still fails with beta2:

>>> import imaplib
>>> mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.0/imaplib.py", line 185, in __init__
    self.welcome = self._get_response()
  File "/usr/local/lib/python3.0/imaplib.py", line 912, in _get_response
    if self._match(self.tagre, resp):
  File "/usr/local/lib/python3.0/imaplib.py", line 1021, in _match
    self.mo = cre.match(s)
TypeError: can't use a string pattern on a bytes-like object |
|
|
| msg71992 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2008-08-26 18:37 |
| This may not be a real release blocker, but I want to raise the priority. It is a regression and we should try to fix it, especially if it's easy. |
|
|
| msg72459 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2008-09-04 02:12 |
| This should be fixed but it's not a release blocker. |
|
|
| msg72479 - (view) |
Author: Bill Janssen (janssen) *  |
Date: 2008-09-04 04:58 |
| Take a look at the thread here: http://mailman2.u.washington.edu/mailman/htdig/imap-protocol/2008-February/000811.html I think the summary is, arbitrary bytes may occur in some places, but they're likely to be UTF-8. Otherwise, it's mainly ASCII, but purposely left vague to see what convention developed. |
|
|
| msg74731 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 11:27 |
| Here is a patch for imaplib:
- add an encoding attribute to the IMAP4 class (as in ftplib; see also issue 3727 for my poplib patch)
- use makefile('r', encoding=self.encoding) instead of a binary file (mode='rb')
- remove duplicate code in IMAP4_SSL

I chose ISO-8859-1 as the default charset. I tested the library on my local IMAP4 server using the IMAP4 and IMAP4_SSL classes. But the library needs more unit tests, as was done for poplib. |
|
|
| msg74752 - (view) |
Author: Bill Janssen (janssen) *  |
Date: 2008-10-14 15:57 |
| Victor, what kind of content have you tried this with? For instance, have you passed unencoded (Content-Transfer-Encoding: binary) binary data through it, by mailing a JPEG, for instance? These things are strings really only at the application level; the data is still bytes. In addition, the use of Latin-1 goes against the explicit directives of the IMAP group, doesn't it? They're pushing UTF-8. Bill |
|
|
| msg74760 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 18:14 |
| IMAP4_stream is also broken because it uses os.popen2(), which has long been deprecated and is now replaced by subprocess. Here is a patch replacing os.popen2() with subprocess, and also using transparent conversion from/to unicode using io.TextIOWrapper(). |
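A minimal sketch of the os.popen2() replacement, assuming a child process that talks over stdin/stdout. The child here is a throwaway Python one-liner that upper-cases its input; it is not an IMAP server and only stands in for whatever command IMAP4_stream would launch.

```python
import subprocess
import sys

# Stand-in child: copies stdin to stdout in upper case, as raw bytes.
child_code = (
    "import sys; "
    "sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"
)

# Two pipes, as os.popen2() used to provide, but through subprocess.
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Pipes carry bytes by default; nothing is decoded unless asked for.
reply, _ = proc.communicate(b"a001 noop\r\n")
print(reply)
```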
|
|
| msg74761 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 18:21 |
| > what kind of content have you tried this with? I only tried the most basic commands like capability(). I retried with search() and... hey, search() has a charset argument!? It should reuse self.encoding. Same for sort(). Then I tried to get the content of an email but fetch(num, '(RFC822)') fails with "imaplib.abort: command: FETCH => unexpected response: 'Return-Path: <example@example.com'". RFC822 is not supported by imaplib? The test also fails with Python 2.5. |
|
|
| msg74767 - (view) |
Author: Bill Janssen (janssen) *  |
Date: 2008-10-14 19:31 |
| Maybe the first thing to do is to expand the Lib/test/test_imaplib.py file, which right now is pretty darn minimal. We really need an IMAP server somewhere to test against, with a standard library of varied messages. Perhaps Python.org is running an IMAP server? |
|
|
| msg74775 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 22:14 |
| The server can send raw 8-bit email in any charset (the charset is specified in the email headers). That's why I think that it's better to keep bytes instead of converting to unicode using a fixed charset. Each email can use a different charset.

Types used in my new patch:
- unicode:
  * IMAP commands (charset=ASCII)
  * untagged_responses keys (charset=ASCII)
- bytes:
  * answer
  * regex
  * tagre attribute
  * untagged_responses values

I chose to keep unicode for some variables to minimize the changes in the imaplib library and to keep the code readable.

Patch TODO:
- Remove the assert (added for quicker debugging)
- Test more functions
- Restore _checkquote() in the _command() method, or use _quote()/_checkquote() in the methods which need it. login() already quotes the password (but why not the login?)

I also wrote a patch for a "pure bytes string" version, but the patch is complex and long, and the resulting module source code is hard to read. |
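The argument for keeping bytes can be illustrated with the standard email package: if the IMAP layer hands back raw bytes, each message can be decoded with the charset named in its own headers. The helpers `decode_body` and `make_raw` below are hypothetical and the two messages are made up for the example.

```python
import email

def decode_body(raw_message):
    """Decode a single-part message body using its own declared charset."""
    msg = email.message_from_bytes(raw_message)
    charset = msg.get_content_charset() or "ascii"
    return msg.get_payload(decode=True).decode(charset)

def make_raw(charset):
    """Build a tiny 8-bit message whose body is 'café' in the given charset."""
    headers = ("Content-Type: text/plain; charset=%s\r\n"
               "Content-Transfer-Encoding: 8bit\r\n\r\n" % charset)
    return headers.encode("ascii") + "café".encode(charset)

# Two messages on the same connection, two different charsets.
latin1_text = decode_body(make_raw("iso-8859-1"))
utf8_text = decode_body(make_raw("utf-8"))
print(latin1_text, utf8_text)
```

A single connection-level encoding could decode at most one of these two bodies correctly.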
|
|
| msg74778 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 22:34 |
| New version of my bytes patch:
- fix IMAP4_stream: use subprocess.Popen() as in my previous imap_stream.patch, but use bytes instead of characters
- fix IMAP4_SSL: sslobj wasn't set in IMAP4_SSL.open() but was used, for example, in the read() method; remove the duplicate method (simplifies the code)
- IMAP4.read(): call file.read() multiple times if the result is smaller than size (needed especially for the SSL version)

FIXME: does this function raise an error on EOF or just loop forever? Should we stop the loop if data is b''? |
|
|
| msg74779 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-14 22:43 |
| Oops, my previous patch didn't include changes to the documentation. New patch changes:
- fix the documentation: os.popen2() => subprocess.Popen(); no more ssl() method: use socket()
- use a buffer of 4096 bytes in the read() method (as suggested in the socket documentation)
- break the read() loop if read() returns an empty bytes string |
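The read loop described in the last two messages can be sketched as follows. This is not the code from the patch: `read_exact` and `ShortReader` are hypothetical names, and the fake reader only imitates a transport (such as SSL) that returns fewer bytes than requested.

```python
import io

def read_exact(fileobj, size, bufsize=4096):
    """Read up to size bytes, tolerating short reads; stop early on EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = fileobj.read(min(remaining, bufsize))
        if not data:
            # Empty bytes means EOF: break instead of looping forever.
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)

class ShortReader:
    """Fake file object that returns at most 3 bytes per read() call."""
    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)
    def read(self, size):
        return self._buffer.read(min(size, 3))

complete = read_exact(ShortReader(b"0123456789"), 10)
truncated = read_exact(ShortReader(b"0123"), 10)
print(complete, truncated)
```

Returning the truncated data on EOF is one possible choice; a caller that requires exactly `size` bytes would compare the lengths and raise an error itself.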
|
|
| msg75282 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-10-28 15:02 |
| Can anyone review my last patch (imaplib_bytes-3.patch)? |
|
|
| msg75479 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2008-11-03 23:58 |
| The assertion on line 813 is indented incorrectly. Please fix that. I'm concerned we really need better test coverage for this code, but it's doubtful we'll get that before 3.0 final is released. I think this is the best we're going to do, and nothing else about the code jumps out at me. Go ahead and land it after that minor fix. |
|
|
| msg75501 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-04 18:34 |
| On Tuesday 04 November 2008 00:59:02, Barry A. Warsaw wrote:
> The assertion on line 813 is indented incorrectly. Please fix that.

Oops. I'm using the following command because my editor is configured to remove trailing spaces: svn diff --diff-cmd="/usr/bin/diff" -x "-ub"

Line 813 was an assertion. I added many assertions to check types (for easier debugging), but they are not needed anymore (my code is bug-free, haha, no, it's a joke). The newly attached patch has no more assertions. |
|
|
| msg75527 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-11-05 19:40 |
| Committed in r67107 |
|
|