msg120158 |
Author: Julien ÉLIE (jelie) |
Date: 2010-11-01 19:58 |
Following the first example of the documentation:

    import nntplib
    s = nntplib.NNTP('news.trigofacile.com')
    resp, count, first, last, name = s.group('fr.comp.lang.python')
    print('Group', name, 'has', count, 'articles, range', first, 'to', last)
    resp, overviews = s.over((last - 9, last))
    for id, over in overviews:
        print(id, nntplib.decode_header(over['subject']))
    s.quit()

An exception is raised:

    "OVER/XOVER response doesn't include names of additional headers"

I believe the issue comes from the fact that the source code does not handle the case described in Section 8.3.2 of RFC 3977:

    For all fields, the value is processed by first removing all CRLF pairs (that is, undoing any folding and removing the terminating CRLF) and then replacing each TAB with a single space. If there is no such header in the article, no such metadata item, or no header or item stored in the database for that article, the corresponding field MUST be empty.

    Example of a successful retrieval of overview information for a range of articles:

    [C] GROUP misc.test
    [S] 211 1234 3000234 3002322 misc.test
    [C] OVER 3000234-3000240
    [S] 224 Overview information follows
    [S] 3000234|I am just a test article|"Demo User" <nobody@example.com>|6 Oct 1998 04:38:40 -0500|
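
The failure reported above can be reproduced without a server. The following is a minimal sketch, not the nntplib source: `naive_parse`, `FMT`, `N_DEFAULT` and the two sample lines are hypothetical names and data made up for illustration. It assumes an overview format whose only additional header is Xref, and it shows how a parser that insists on a "Name:" prefix rejects the empty field that RFC 3977 requires when the header is missing.

```python
# Hypothetical, simplified stand-in for an overview parser (not the nntplib source).
FMT = ["subject", "from", "date", "message-id", "references",
       ":bytes", ":lines", "xref"]
N_DEFAULT = 7  # the first seven fields carry bare values without a name

def naive_parse(line, fmt=FMT):
    """Split one TAB-separated overview line into (article number, fields)."""
    number, *tokens = line.split("\t")
    fields = {}
    for i, token in enumerate(tokens):
        name = fmt[i]
        if i >= N_DEFAULT:
            # Additional headers are expected as "Name: value".
            prefix = name + ":"
            if not token.lower().startswith(prefix):
                # An empty field (header absent) ends up here.
                raise ValueError("OVER/XOVER response doesn't include "
                                 "names of additional headers")
            token = token[len(prefix):].lstrip(" ")
        fields[name] = token
    return int(number), fields

# Made-up overview lines: the second one has an empty Xref field.
with_xref = ("1\tSubj\tme@example.com\t6 Oct 1998\t<a@example.com>\t\t"
             "1234\t17\tXref: news.example.com misc.test:1")
without_xref = "2\tSubj\tme@example.com\t6 Oct 1998\t<b@example.com>\t\t1234\t17\t"

print(naive_parse(with_xref)[1]["xref"])
try:
    naive_parse(without_xref)
except ValueError as exc:
    print("error:", exc)
```

The first line parses normally; the second raises because its last field is the empty string rather than "Xref: ...".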
msg120272 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-02 22:41 |
I am wondering how to return the corresponding information. Should the field be totally absent from the returned dictionary, should it map to the empty string, or should it map to None? I'm leaning towards the latter (map to None), but perhaps the empty string is better? |
|
|
msg120275 |
Author: Julien ÉLIE (jelie) |
Date: 2010-11-02 22:45 |
The empty string would mean the header exists, and is empty (though not RFC-compliant). For instance:

    "User-Agent: \r\n"

I believe None is better. |
|
|
msg120278 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-02 22:52 |
Here is a patch for returning None on absent fields. (works with trigofacile.com) |
|
|
msg120282 |
Author: Julien ÉLIE (jelie) |
Date: 2010-11-02 23:05 |
OK, thanks. By the way, why is the token stripped?

    token = token[len(h):].lstrip(" ")

"X-Header: test \r\n" in a header is kept in the overview as-is. I do not see why " test " should not be the value returned.

Also, with:

    token = token or None

"X-Header: \r\n" becomes None if I understand how the source code works... Yet, it is a real '', not None. |
|
|
msg120283 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-02 23:08 |
> OK, thanks.
> By the way, why is the token stripped?
> token = token[len(h):].lstrip(" ")
>
> "X-Header: test \r\n" in a header is kept in the overview as-is.
> I do not see why " test " should not be the value returned.

It's a simple way of handling "Xref: foo" and returning "foo" rather than " foo". If spaces are supposed to be significant I can just strip the first one, though.

> Also, with:
> token = token or None
>
> "X-Header: \r\n" becomes None if I understand how the source code
> works... Yet, it is a real '', not None.

Er, so you're disagreeing with your previous message? Or am I missing something? :) |
|
|
msg120286 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-02 23:18 |
Here is a patch trying to better handle whitespace. Would it be ok for you? |
|
|
msg120287 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-02 23:19 |
Oops, sorry. |
|
|
msg120299 |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-11-03 02:00 |
My conclusion in working on the email package is that only the first space after the ':', if it exists, should be stripped. That is, even though the RFC (for email) reads as if the space after the colon is part of the value, in practice it is part of the delimiter, but is optional (and almost always present, in email). Whether additional leading spaces are significant depends on why they are there. Since they are an unusual case, I would choose to preserve them on the theory that someone might care, and that someone who doesn't care can strip them. |
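
The difference between the two stripping strategies discussed so far can be sketched as follows. Both helper names are hypothetical, and the input is assumed to be the text that follows the colon of a header line.

```python
def strip_all_leading(raw):
    # Strategy of the first patch: drop every leading space.
    return raw.lstrip(" ")

def strip_delimiter_space(raw):
    # Strategy proposed here: drop at most one leading space,
    # treating it as an optional part of the ": " delimiter.
    return raw[1:] if raw.startswith(" ") else raw

after_colon = "  test "  # what follows "X-Header:" when the value is " test "
print(repr(strip_all_leading(after_colon)))
print(repr(strip_delimiter_space(after_colon)))
```

Only the second helper preserves the extra leading space, so a caller who does not care about it can still call `lstrip()` afterwards, while the reverse is not possible.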
|
|
msg120332 |
Author: Julien ÉLIE (jelie) |
Date: 2010-11-03 17:55 |
> Er, so you're disagreeing with your previous message?
> Or am I missing something? :)

I was saying that if an empty string is returned, then it means that the header exists and is empty. An example was "User-Agent: \r\n". And my remark "I believe None is better." concerned your initial question "Should the field be totally absent [...]" regarding how to deal with a header that does not exist.

Therefore, "User-Agent: \r\n" becomes a real '', not None. None is returned only when the User-Agent: header field is absent from the headers.

> Here is a patch trying to better handle whitespace.
> Would it be ok for you?

Yes Antoine, thanks! |
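
From the caller's side, the convention described in this message gives three distinguishable states. A small sketch, assuming the overview is a plain dictionary keyed by lower-cased header name (`describe_header` and the sample dictionary are hypothetical):

```python
def describe_header(over, name):
    # `over` stands in for one overview dictionary.
    value = over[name]
    if value is None:
        return "absent from the article"
    if value == "":
        return "present but empty"
    return "present: %r" % value

# Made-up overview entry for illustration.
over = {"subject": "Hello", "user-agent": "", "xref": None}
for name in ("subject", "user-agent", "xref"):
    print(name, "->", describe_header(over, name))
```

Collapsing '' into None with `token or None` would merge the second state into the first, which is the concern raised earlier in the thread.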
|
|
msg120333 |
Author: Julien ÉLIE (jelie) |
Date: 2010-11-03 18:01 |
> My conclusion in working on the email package is that only
> the first space after the ':', if it exists, should be stripped.
> That is, even though the RFC (for email) reads as if the space
> after the colon is part of the value, in practice it is part
> of the delimiter, but is optional (and almost always present,
> in email).

That is why the RFC (for netnews) explicitly mentions that the space after the colon is not part of the value. See the grammar for OVER in RFC 3977:

    hdr-n-content = [(header-name ":" / metadata-name) SP hdr-content]

So yes, only the first space should be stripped. |
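
The grammar rule quoted above translates fairly directly into a regular expression. This is a minimal sketch covering only the `header-name ":"` branch (metadata names are not handled), and it is stricter than a real client might want to be because it requires the SP. `parse_additional_field` is a hypothetical helper, not the code committed to nntplib.

```python
import re

# hdr-n-content = [(header-name ":" / metadata-name) SP hdr-content]
# Group 1: header name; then ":" and exactly one SP; group 2: content.
_FIELD = re.compile(r"([^\t :]+): ([^\t]*)")

def parse_additional_field(field):
    """Return (name, value), or None when the field is empty (header absent)."""
    if field == "":
        return None  # the optional [...] part of the rule is missing
    m = _FIELD.fullmatch(field)
    if m is None:
        raise ValueError("malformed overview field: %r" % field)
    return m.group(1), m.group(2)

print(parse_additional_field("Xref: foo"))
print(parse_additional_field("User-Agent: "))
print(parse_additional_field("X-Header:  test "))
print(parse_additional_field(""))
```

Because the single SP belongs to the delimiter, any further leading spaces stay in the value, and an empty value is kept as '' rather than turned into None.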
|
|
msg120335 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-11-03 18:19 |
Ok, committed in r86139. |
|
|