gh-135661: Fix parsing start and end tags in HTMLParser by serhiy-storchaka · Pull Request #135930 · python/cpython
* Whitespaces no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are `\t\n\r\f` and space.
* Null character (U+0000) no longer ends the tag name.
* End tag can have attributes and slashes after tag name. It no longer ends after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>` are now accepted in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None.

Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
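The cases listed above can be checked against a given interpreter with a small event recorder. This is only a sketch: `EventCollector` and `collect` are hypothetical helper names, not part of `html.parser`, and the printed events depend on whether the running Python includes this change and the later follow-up fix, so no expected output is shown.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record start-tag, end-tag and data events in order (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect(source):
    """Feed `source` to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


# Inputs taken from the examples in the description above.
SAMPLES = [
    "<script>x</ script>y</script>",
    '</script/foo=">"/>',
    "<a foo=bar/ //>",
    "<a foo==bar>",
]

for sample in SAMPLES:
    # Output varies between Python versions; compare before and after upgrading.
    print(repr(sample), "->", collect(sample))
```

Running the same script on an unpatched and a patched interpreter and comparing the two outputs is a simple way to see which of the listed behaviors changed.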
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…ng to the HTML5 standard (pythonGH-135930)
* Whitespaces no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are `\t\n\r\f` and space.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None.
* Fix Sphinx errors.
* Apply suggestions from code review

Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>

* Address review comments.
* Move to Security.

(cherry picked from commit 0243f97)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
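The whitespace rule in this commit message differs from what Python's own `\s` pattern matches, which may explain why `\v` and non-ASCII spaces were accepted before. The comparison below is a minimal sketch: `HTML_WHITESPACE` and `classify` are hypothetical names for illustration, not the code used in `html.parser`, and the set assumes the space character is included alongside `\t\n\r\f`.

```python
import re

# Whitespace characters named in the commit message, plus the space character
# (assumed here, following the HTML standard's definition of ASCII whitespace).
HTML_WHITESPACE = "\t\n\r\f "

# Python's \s on str patterns also matches \v and non-ASCII whitespace.
python_ws_re = re.compile(r"\s")


def classify(ch):
    """Return (matches Python's \\s, is HTML whitespace) for one character."""
    return (python_ws_re.fullmatch(ch) is not None, ch in HTML_WHITESPACE)


for ch in ["\t", "\n", "\r", "\f", " ", "\v", "\u00a0", "\u2003"]:
    print(repr(ch), classify(ch))
```

The last three characters match `\s` but are not in the HTML set, so a tokenizer built on `\s` would treat them as separators where the rule above does not.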
Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull request
…ng to the HTML5 standard (pythonGH-135930)
picnixz pushed a commit to picnixz/cpython that referenced this pull request
…ng to the HTML5 standard (pythonGH-135930)
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request
…=" separator in HTMLParser
This fixes a regression introduced in pythonGH-135930.
ambv pushed a commit that referenced this pull request
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…=" separator in HTMLParser (pythonGH-136908)
This fixes a regression introduced in pythonGH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
This was referenced
Jul 21, 2025
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request
…d the "=" separator in HTMLParser (pythonGH-136908)
This fixes a regression introduced in pythonGH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this pull request
…"=" separator in HTMLParser (GH-136908) (GH-136918)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this pull request
…"=" separator in HTMLParser (GH-136908) (GH-136919)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this pull request
…"=" separator in HTMLParser (GH-136908) (GH-136920)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this pull request
…"=" separator in HTMLParser (GH-136908) (GH-136921)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this pull request
…=" separator in HTMLParser (GH-136908) (GH-136922)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull request
…ccording to the HTML5 standard (pythonGH-135930) (pythonGH-136268) (python#136293)
* Whitespaces no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are `\t\n\r\f` and space.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None.

Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull request
… the "=" separator in HTMLParser (pythonGH-136908) (pythonGH-136922)
This fixes a regression introduced in pythonGH-135930.
(cherry picked from commit dee6501)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull request
…ng to the HTML5 standard (pythonGH-135930)
taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull request
Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull request
…ng to the HTML5 standard (pythonGH-135930)
Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull request