msg250088
Author: Thomas Guettler (guettli) *
Date: 2015-09-07 09:01
At the top of the htmllib module docs:

> Deprecated since version 2.6: The htmllib module has been removed in Python 3.

Source: https://docs.python.org/2/library/htmllib.html#module-htmllib

Newcomers need more advice: which library should be used? I know there are many HTML parsing libraries, but there should be a sane default for newcomers. Is there already an agreement on a sane default HTML parsing library?
|
msg250092
Author: Martin Panter (martin.panter) *
Date: 2015-09-07 09:50
PEP 3108 says “Superseded by HTMLParser”. I presume this means Python 3’s “html.parser” module (called “HTMLParser” in Python 2). I guess a lot of work would be involved in changing existing code over, but it shouldn’t be much of a problem for someone writing new code.
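For newcomers wondering what new code against the suggested default looks like, here is a minimal sketch (Python 3 only) that subclasses `html.parser.HTMLParser` and collects the `href` of every `<a>` tag. The class name `LinkCollector` and the sample markup are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: collect href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs,
        # where value is None for attributes written without a value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p>See <a href="https://docs.python.org/">the docs</a>.</p>')
parser.close()
print(parser.links)  # ['https://docs.python.org/']
```

Unlike htmllib, no formatter object is involved: the parser only reports events, and the subclass decides what to keep.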
|
msg250123
Author: Thomas Guettler (guettli) *
Date: 2015-09-07 19:54
This issue is just about documentation; no code change is required for it. How do I update the docs to point to html.parser?
|
msg250125
Author: Ezio Melotti (ezio.melotti) *
Date: 2015-09-07 20:07
If you want to create a patch, you have to edit the file Doc/library/htmllib.rst in the 2.7 branch. You can find information about cloning the CPython repository and switching branches in the devguide. The warning should suggest :mod:`HTMLParser` for Python 2 and the equivalent :mod:`html.parser` for Python 3.
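Code that has to run on both interpreters commonly bridges the module name split described here (`HTMLParser` in Python 2, `html.parser` in Python 3) with a guarded import. A minimal sketch; the subclass `TextCollector` is a hypothetical placeholder, not part of either standard library.

```python
try:
    # Python 3: the module is named html.parser.
    from html.parser import HTMLParser
except ImportError:
    # Python 2: the same class lives in the HTMLParser module.
    from HTMLParser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical example: gather the text content of a document."""

    def __init__(self):
        # Explicit base call instead of super(): Python 2's HTMLParser
        # is an old-style class, where super() does not work.
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = TextCollector()
collector.feed("<h1>Title</h1><p>Body text</p>")
collector.close()
print("".join(collector.chunks))  # TitleBody text
```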
|
msg253098
Author: Nan Wu (Nan Wu) *
Date: 2015-10-16 20:52
Added a small patch for this change.
|
msg253274
Author: Berker Peksag (berker.peksag) *
Date: 2015-10-21 03:17
Thanks for the patch. I think we can move the Python 3 part of the patch to a new note directive (similar to the example in the httplib documentation: https://docs.python.org/2/library/httplib.html). For example:

    .. deprecated:: 2.6
       Use 📳`HTMLParser` instead.

    .. note::
       The :mod:`htmllib` module has been removed in Python 3.
       Use :mod:`html.parser` (equivalent of 📳`HTMLParser`) instead.
|
msg253279
Author: Martin Panter (martin.panter) *
Date: 2015-10-21 08:02
Also beware it should be :mod: not 📳 :)
|
msg253285
Author: Nan Wu (Nan Wu) *
Date: 2015-10-21 12:56
Updated the patch. The typo was fixed too. Thanks for catching it.
|
msg253533
Author: Martin Panter (martin.panter) *
Date: 2015-10-27 12:35
This looks good enough to me. I would have probably avoided littering the page with too many Deprecated and Note boxes, but I can respect your and Berker’s preference to add the separate box.
|
msg253541
Author: R. David Murray (r.david.murray) *
Date: 2015-10-27 14:24
The note should actually be parallel to the httplib one (assuming 2to3 does do the translation), rather than saying "use instead", which would be incorrect advice for a Python 2 user :)
|
msg253562
Author: Martin Panter (martin.panter) *
Date: 2015-10-27 20:41
Not quite. This is a two-step deprecation:

1. “htmllib” is removed in favour of HTMLParser. The API is different, so no automatic 2to3 change would be practical.
2. HTMLParser is renamed to “html.parser”, and 2to3 handles this. This is already documented at <https://docs.python.org/2/library/htmlparser.html>.
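To illustrate why step 1 is not a mechanical rename: htmllib (built on sgmllib) dispatched each tag to per-tag methods such as `start_a` and `end_a` and drove a formatter object, while html.parser only calls generic hooks like `handle_starttag` and `handle_endtag`. The sketch below is a hypothetical shim (Python 3) that restores sgmllib-style per-tag dispatch on top of html.parser; the names `TagDispatchParser` and `AnchorTracker` are invented for illustration, and it is not a drop-in replacement for htmllib.

```python
from html.parser import HTMLParser


class TagDispatchParser(HTMLParser):
    """Hypothetical shim: route tags to start_<tag>/end_<tag> methods,
    loosely mimicking the sgmllib-style dispatch htmllib relied on."""

    def handle_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()


class AnchorTracker(TagDispatchParser):
    """Example subclass written in the old per-tag style."""

    def __init__(self):
        super().__init__()
        self.events = []

    def start_a(self, attrs):
        self.events.append(("start_a", dict(attrs)))

    def end_a(self):
        self.events.append(("end_a",))


tracker = AnchorTracker()
tracker.feed('<a href="/x">x</a><b>ignored</b>')  # no start_b/end_b defined
tracker.close()
print(tracker.events)  # [('start_a', {'href': '/x'}), ('end_a',)]
```

Even with such a shim, any code that depended on htmllib's formatter output still has to be rewritten by hand, which is why 2to3 cannot automate this step.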
|
msg253565
Author: R. David Murray (r.david.murray) *
Date: 2015-10-27 21:40
OK, then the note should be dropped.
|
msg254256
Author: Martin Panter (martin.panter) *
Date: 2015-11-07 05:59
David: are you saying you like the first patch better (ignoring the markup mistakes)?
|
msg254313
Author: R. David Murray (r.david.murray) *
Date: 2015-11-07 23:21
Yes, though I hadn't looked at it before this :)
|
msg254582
Author: Martin Panter (martin.panter) *
Date: 2015-11-13 02:44
Here is a cleaned-up version of Nan’s first patch.
|
msg254586
Author: Berker Peksag (berker.peksag) *
Date: 2015-11-13 03:11
htmllib_deprecation_warning_3.patch looks good to me.
|
msg254639
Author: Roundup Robot (python-dev)
Date: 2015-11-14 00:45
New changeset 7bc8f56ef1f3 by Martin Panter in branch '2.7': Issue #25017: Document that htmllib is superseded by module HTMLParser https://hg.python.org/cpython/rev/7bc8f56ef1f3
|