Issue 975556: HTMLParser lukewarm on bogus bare attribute chars

I tripped over the same problem mentioned in bug #921657 (HTMLParser.py), except that my bogus attribute char is '|' instead of '@'.

May I suggest that HTMLParser either require strict compliance with the HTML spec, or alternatively that it accept everything reasonable? The latter approach would be much more useful, and it would also be valuable to have this decision documented.

In particular, 'attrfind' needs to be changed to accept (following the '=\s*') something like the subpattern given for 'locatestarttagend' (see the "bare value" line).
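
To illustrate the mismatch, here is a rough sketch comparing a whitelist-style bare-value subpattern with a lenient one modeled on the "bare value" idea. Both patterns are simplified stand-ins written for this example, not copies of the HTMLParser.py source, and the names strict_attrfind, lenient_attrfind and first_attr are hypothetical.

```python
import re

# Hypothetical, simplified stand-ins for the patterns discussed above;
# they are NOT copied from HTMLParser.py.
ATTR_NAME = r'[a-zA-Z_][-.:a-zA-Z_0-9]*'

# Whitelist-style bare value: only a fixed set of characters is accepted,
# so '|' (or '@') silently ends the value.
strict_attrfind = re.compile(
    r'\s*(' + ATTR_NAME + r')(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$()_#=~]*))?')

# Lenient bare value: anything up to a quote, '>' or whitespace.
lenient_attrfind = re.compile(
    r'\s*(' + ATTR_NAME + r')(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^\'">\s]+))?')


def first_attr(pattern, text):
    """Return (name, value) for the first attribute matched in text."""
    m = pattern.match(text)
    return m.group(1), m.group(3)


sample = ' href=foo|bar>'
strict = first_attr(strict_attrfind, sample)    # ('href', 'foo')
lenient = first_attr(lenient_attrfind, sample)  # ('href', 'foo|bar')
```

With the whitelist-style class the value is cut off at the '|', whereas the lenient class keeps the whole bare value. Whether the lenient form is the right fix depends on the strict-versus-permissive decision raised above.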