Issue 1504333: sgmllib should allow angle brackets in quoted values

Created on 2006-06-11 12:58 by rubys, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
sgmllib.patch	rubys, 2006-06-11 20:23	better patch
sgmllib_2008-03-08.patch	barnabas79, 2008-03-09 05:01	patch to allow angle brackets, newlines in quoted attributes

Messages (7)
msg28773	Author: Sam Ruby (rubys)	Date: 2006-06-11 12:58
Real-life example (search for "other corrections"): http://latticeqcd.blogspot.com/2006/05/non-relativistic-qcd.html
This addresses the following comment, which is included in the file:
# XXX The following should skip matching quotes (' or ")
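A minimal reproduction of what that comment is about, using a made-up tag rather than the markup on the linked blog page. The pattern is sgmllib's end-of-tag expression as it appears on the `-endbracket` line of the diff in msg28776; it accepts the first `<` or `>` it finds, whether or not that character sits inside a quoted attribute value.

```python
import re

# sgmllib's end-of-tag pattern before the patches in this issue:
# the first '<' or '>' wins, quoted or not.
endbracket = re.compile('[<>]')

data = '<a title="x > y">link</a>'   # example input, not taken from the issue
j = endbracket.search(data, 1).start(0)
print(j)               # 12, the '>' inside the quoted value
print(repr(data[:j]))  # '<a title="x '
```

The real end of the start tag is at index 16, so anything that trusts `j` as the tag boundary sees a truncated tag.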
msg28774	Author: Fred Drake (fdrake)	Date: 2006-06-29 17:17
I checked in a modified version of this patch. It uses separate REs for start and end tags to reduce the matching cost for end tags, extends the tests, and avoids breaking earlier changes that support IPv6 addresses in unquoted attribute values. Committed as revisions 47154 (trunk) and 47155 (release24-maint).
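The idea of separate start-tag and end-tag expressions can be sketched as follows. These patterns are hypothetical and are not the expressions committed in r47154; they only show why the split helps. The start-tag pattern has to step over quoted strings as units, while end tags carry no attributes and can use a much simpler pattern.

```python
import re

# Hypothetical start-tag pattern: after the tag name, a quoted string is
# consumed as a unit, so a '<' or '>' inside "..." or '...' cannot end the
# tag early.  The three alternatives start with different characters, so
# the repetition does not backtrack explosively.
STARTTAG = re.compile(r"""<[a-zA-Z][-_.a-zA-Z0-9]*(?:"[^"]*"|'[^']*'|[^"'<>])*>""")

# End tags have no attributes, so a shorter and cheaper pattern is enough.
ENDTAG = re.compile(r'</[a-zA-Z][-_.a-zA-Z0-9]*\s*>')

print(STARTTAG.match('<img alt="a<b" src=x.png>rest').group())  # <img alt="a<b" src=x.png>
print(ENDTAG.match('</p >').group())                            # </p >
# With the closing quote missing there is no match yet; a parser fed data
# in chunks can treat None as "wait for more input".
print(STARTTAG.match('<a title="x>'))                           # None
```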
msg28775	Author: Neal Norwitz (nnorwitz) *	Date: 2006-09-11 04:26
I reverted the patch and added the test case for sgml so the infinite loop doesn't recur. This was mentioned several times on python-dev. Committed revisions 51854 (head), 51850 (2.5), and 51853 (2.4).
msg28776	Author: Haejoong Lee (haepal)	Date: 2007-01-11 18:01
Could someone check whether the following patch fixes the problem? It was made against revision 51854.

--- sgmllib.py.org	2006-11-06 02:31:12.000000000 -0500
+++ sgmllib.py	2007-01-11 12:39:30.000000000 -0500
@@ -16,6 +16,35 @@
 
 # Regular expressions used for parsing
 
+class MyMatch:
+    def __init__(self, i):
+        self._i = i
+    def start(self, i):
+        return self._i
+
+class EndBracket:
+    def search(self, data, index):
+        s = data[index:]
+        bs = None
+        quote = None
+        for i,c in enumerate(s):
+            if bs:
+                bs = False
+            else:
+                if c == '<' or c == '>':
+                    if quote is None:
+                        break
+                elif c == "'" or c == '"':
+                    if c == quote:
+                        quote = None
+                    else:
+                        quote = c
+                elif c == '\\':
+                    bs = True
+        else:
+            return None
+        return MyMatch(i+index)
+
 interesting = re.compile('[&<]')
 incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
                            '<([a-zA-Z][^<>]*|'
@@ -29,7 +58,8 @@
 shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
 shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
 piclose = re.compile('>')
-endbracket = re.compile('[<>]')
+#endbracket = re.compile('[<>]')
+endbracket = EndBracket()
 tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
 attrfind = re.compile(
     r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
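The `EndBracket` object in this patch can replace the compiled regex because it answers the same two calls the parser makes on it: `search(data, index)`, and `start(0)` on the result. When testing it, note that the quote branch (`if c == quote: quote = None else: quote = c`) lets a quote of the other kind replace the open one. For a value such as `title="don't > stop"` the apostrophe flips the state, the real closing `>` is never accepted, and `search` returns None. The patch also treats a backslash as escaping the next character, which HTML attribute values do not do. The sketch below is one possible correction, not a reviewed fix: it keeps the patch's class names, closes a quote only with the character that opened it, and gives backslashes no special meaning.

```python
class MyMatch:
    """Minimal stand-in for a re match object; only start() is provided."""

    def __init__(self, i):
        self._i = i

    def start(self, group=0):
        return self._i


class EndBracket:
    """Quote-aware replacement for re.compile('[<>]')."""

    def search(self, data, index):
        quote = None
        for i in range(index, len(data)):
            c = data[i]
            if quote is not None:
                if c == quote:
                    quote = None      # matching close quote
            elif c == '"' or c == "'":
                quote = c             # open quote: brackets are now ignored
            elif c == '<' or c == '>':
                return MyMatch(i)     # first bracket outside any quotes
        return None                   # no unquoted bracket in the data yet


endbracket = EndBracket()
data = '<a title="don\'t > stop">x'
m = endbracket.search(data, 1)
print(m.start(0), data[:m.start(0) + 1])  # 23 <a title="don't > stop">
```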
msg28777	Author: Neal Norwitz (nnorwitz) *	Date: 2007-01-12 06:04
You should be able to check this yourself. Use the current version of Python and apply both the test case from the original patch and your patch to the code. If the test passes, I'll be happy to check in the fix. If it does work, please create a new patch with your code and the test case from the original patch.
msg63409	Author: Paul Molodowitch (barnabas79)	Date: 2008-03-09 05:01
Patch for sgmllib.py and test_sgmllib.py. It correctly parses quoted attribute values, allowing brackets, newlines, etc. inside them. It is implemented by altering the loop in parse_starttag that finds attributes, so that it checks for open-ended quotes and makes sure any closing bracket it finds is not within quotes.
In test_sgmllib, I added the test case from the original patch and re-enabled two other test cases, which both work now.
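The approach described here, letting the attribute loop itself find the end of the tag, can be sketched as a standalone function. This is not the code from sgmllib_2008-03-08.patch: `parse_starttag`, `TAGNAME` and `ATTR` below are hypothetical names with patterns loosely modelled on sgmllib's `tagfind` and `attrfind`. Quoted values are consumed whole (including `<`, `>` and newlines), a quote found where an attribute should start is treated as an open-ended quote, and the function returns None when the tag is not complete yet.

```python
import re

TAGNAME = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')
# Attribute name, optionally followed by '=' and a value.  [^"]* and [^']*
# also match newlines and angle brackets inside quoted values.
ATTR = re.compile(
    r"""([a-zA-Z_][-:.a-zA-Z_0-9]*)"""
    r"""(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]*))?""")


def parse_starttag(data, i):
    """Parse the start tag beginning with '<' at data[i].

    Returns (tag, attrs, end), where end is the index just past the tag,
    or None if the tag is not complete yet.
    """
    m = TAGNAME.match(data, i + 1)
    if m is None:
        return None
    tag = m.group().lower()
    k = m.end()
    attrs = []
    while k < len(data):
        c = data[k]
        if c.isspace():
            k += 1
        elif c == '>':
            return tag, attrs, k + 1      # '>' outside every quoted value
        elif c == '<':
            return tag, attrs, k          # unclosed tag: stop before the next one
        elif c == '"' or c == "'":
            # A quote where an attribute should start: the value's closing
            # quote has not arrived (open-ended quote).
            return None
        else:
            m = ATTR.match(data, k)
            if m is None:
                k += 1                    # skip a stray character such as '/'
                continue
            name, value = m.group(1).lower(), m.group(2)
            if value is None:
                value = name              # valueless attribute: reuse the name
            elif value[:1] in ('"', "'"):
                value = value[1:-1]       # strip the surrounding quotes
            attrs.append((name, value))
            k = m.end()
    return None                           # ran out of data before the tag ended


data = '<a title="1 < 2 > 0"\n href=\'x\ny\'>text'
tag, attrs, end = parse_starttag(data, 0)
print(tag, attrs, repr(data[end:]))
```

Returning None for an incomplete tag fits a parser that is fed data in pieces: it can keep the unparsed text and retry after the next chunk arrives.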
msg114668	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-08-22 10:40
sgmllib has been deprecated since 2.6 and has been removed from py3k.
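For comparison, Python 3's `html.parser.HTMLParser` (the standard-library parser that remains) finds the end of a start tag with a pattern that skips over quoted attribute values, so the case from this issue parses as expected there. A small check, using a made-up tag; `StartTagCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records (tag, attrs) for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


parser = StartTagCollector()
parser.feed('<a title="x > y" href=\'<b>\'>link</a>')
parser.close()
print(parser.starttags)  # [('a', [('title', 'x > y'), ('href', '<b>')])]
```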

History
Date	User	Action	Args
2022-04-11 14:56:18	admin	set	github: 43487
2010-08-22 10:40:37	BreamoreBoy	set	status: open -> closed; nosy: + BreamoreBoy; messages: +; resolution: out of date
2010-07-29 14:02:30	georg.brandl	link	issue745002 superseder
2008-03-09 05:01:25	barnabas79	set	files: + sgmllib_2008-03-08.patch; nosy: + barnabas79; messages: +; keywords: + patch
2006-06-11 12:58:36	rubys	create