msg28773 - (view) |
Author: Sam Ruby (rubys) |
Date: 2006-06-11 12:58 |
Real live example (search for "other corrections"): http://latticeqcd.blogspot.com/2006/05/non-relativistic-qcd.html

This addresses the following comment (included in the file):

# XXX The following should skip matching quotes (' or ") |
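A minimal sketch of the failure mode that XXX comment refers to, assuming the `[<>]` end-bracket pattern that sgmllib used at the time (it appears in the diff later in this thread). The sample tag is made up for illustration and is not taken from the linked page:

```python
import re

# Pattern sgmllib used to locate the end of a start tag
# (as shown in the diff later in this thread).
endbracket = re.compile('[<>]')

# Hypothetical start tag with a '>' inside a quoted attribute value.
tag = '<a title="x > y" href="u">'

# Search from index 1, i.e. just past the opening '<'.
match = endbracket.search(tag, 1)
naive_end = match.start()

# Index of the bracket that actually closes the tag.
real_end = len(tag) - 1

# The naive search stops at the quoted '>' and cuts the tag short.
print(naive_end, real_end, repr(tag[:naive_end + 1]))
```

Because the pattern knows nothing about quotes, the tag appears to end in the middle of the `title` value.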
|
|
msg28774 - (view) |
Author: Fred Drake (fdrake)  |
Date: 2006-06-29 17:17 |
I checked in a modified version of this patch. It uses separate REs for start and end tags to reduce the matching cost for end tags, extends the tests, and avoids breaking the earlier changes that support IPv6 addresses in unquoted attribute values. Committed as revisions 47154 (trunk) and 47155 (release24-maint). |
|
|
msg28775 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2006-09-11 04:26 |
I reverted the patch and added the test case for sgml so the infinite loop doesn't recur. This was mentioned several times on python-dev. Committed as revisions 51854 (head), 51850 (2.5), and 51853 (2.4). |
|
|
msg28776 - (view) |
Author: Haejoong Lee (haepal) |
Date: 2007-01-11 18:01 |
Could someone check if the following patch fixes the problem? This patch was made against revision 51854.

--- sgmllib.py.org 2006-11-06 02:31:12.000000000 -0500
+++ sgmllib.py 2007-01-11 12:39:30.000000000 -0500
@@ -16,6 +16,35 @@

 # Regular expressions used for parsing

+class MyMatch:
+    def __init__(self, i):
+        self._i = i
+    def start(self, i):
+        return self._i
+
+class EndBracket:
+    def search(self, data, index):
+        s = data[index:]
+        bs = None
+        quote = None
+        for i,c in enumerate(s):
+            if bs:
+                bs = False
+            else:
+                if c == '<' or c == '>':
+                    if quote is None:
+                        break
+                elif c == "'" or c == '"':
+                    if c == quote:
+                        quote = None
+                    else:
+                        quote = c
+                elif c == '\\':
+                    bs = True
+        else:
+            return None
+        return MyMatch(i+index)
+
 interesting = re.compile('[&<]')
 incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
                         '<([a-zA-Z][^<>]*|'
@@ -29,7 +58,8 @@
 shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
 shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
 piclose = re.compile('>')
-endbracket = re.compile('[<>]')
+#endbracket = re.compile('[<>]')
+endbracket = EndBracket()
 tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
 attrfind = re.compile(
     r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'

msg28777 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2007-01-12 06:04 |
You should be able to check yourself. Use the current version of Python, apply the test case from the original patch and your patch to the code. If the test passes, I'll be happy to check in the fix. If that does work, please create a new patch with your code and the test case from the original patch. |
|
|
msg63409 - (view) |
Author: Paul Molodowitch (barnabas79) |
Date: 2008-03-09 05:01 |
Patch for sgmllib.py (and test_sgmllib.py). It correctly parses quoted attributes, allowing brackets, newlines, etc. within attribute values. This is implemented by altering the loop in parse_starttag that finds attributes, so that it checks for open-ended quotes and makes sure any closing bracket it finds is not within quotes. In test_sgmllib, I added the test case from the original patch and re-enabled two other test cases, both of which now pass. |
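The idea of only accepting a closing bracket that lies outside quotes can be sketched roughly as follows. This is an illustrative sketch, not the code from the attached patch; the function name `find_tag_end` is hypothetical and does not exist in sgmllib:

```python
def find_tag_end(data, start):
    """Return the index of the first '<' or '>' at or after `start`
    that is not inside a quoted attribute value, or -1 if none is found.

    Hypothetical helper for illustration; not part of sgmllib.
    """
    quote = None  # the quote character currently open, if any
    for i in range(start, len(data)):
        c = data[i]
        if quote is not None:
            # Inside a quoted value: only the matching quote closes it.
            if c == quote:
                quote = None
        elif c in ('"', "'"):
            # Opening quote of an attribute value.
            quote = c
        elif c in '<>':
            # Bracket outside any quotes: this ends the tag.
            return i
    # Ran out of data, possibly with a quote still open.
    return -1


tag = '<a title="x > y" href="u">'
print(find_tag_end(tag, 1))                        # bracket that closes the tag
print(find_tag_end('<a title="unterminated', 1))   # open-ended quote
```

Returning -1 for an open-ended quote lets a caller wait for more data instead of guessing where the tag ends. One limitation of this sketch: a stray apostrophe in an unquoted attribute value is treated as an opening quote, so a real fix would need to be stricter about where quotes may start.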
|
|
msg114668 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-08-22 10:40 |
sgmllib has been deprecated since 2.6 and has been removed from py3k. |
|
|