Heh. The good old manual approach. :-) How bad indeed?
>>> from idlelib import colorizer; colorizer.make_pat()
'\\b(?P<KEYWORD>False|None|True|and|as|assert|break|class|continue|def|del|elif|else|except|finally|for|from|global|if|import|in|is|lambda|nonlocal|not|or|pass|raise|return|try|while|with|yield)\\b|([^.\'\\"\\\\#]\\b|^)(?P<BUILTIN>ArithmeticError|AssertionError|AttributeError|BaseException|BlockingIOError|BrokenPipeError|BufferError|BytesWarning|ChildProcessError|ConnectionAbortedError|ConnectionError|ConnectionRefusedError|ConnectionResetError|DeprecationWarning|EOFError|Ellipsis|EnvironmentError|Exception|FileExistsError|FileNotFoundError|FloatingPointError|FutureWarning|GeneratorExit|IOError|ImportError|ImportWarning|IndentationError|IndexError|InterruptedError|IsADirectoryError|KeyError|KeyboardInterrupt|LookupError|MemoryError|ModuleNotFoundError|NameError|NotADirectoryError|NotImplemented|NotImplementedError|OSError|OverflowError|PendingDeprecationWarning|PermissionError|ProcessLookupError|RecursionError|ReferenceError|ResourceWarning|RuntimeError|RuntimeWarning|StopAsyncIteration|StopIteration|SyntaxError|SyntaxWarning|SystemError|SystemExit|TabError|TimeoutError|TypeError|UnboundLocalError|UnicodeDecodeError|UnicodeEncodeError|UnicodeError|UnicodeTranslateError|UnicodeWarning|UserWarning|ValueError|Warning|ZeroDivisionError|abs|all|any|ascii|bin|bool|bytearray|bytes|callable|chr|classmethod|compile|complex|copyright|credits|delattr|dict|dir|divmod|enumerate|eval|exec|exit|filter|float|format|frozenset|getattr|globals|hasattr|hash|help|hex|id|input|int|isinstance|issubclass|iter|len|license|list|locals|map|max|memoryview|min|next|object|oct|open|ord|pow|print|property|quit|range|repr|reversed|round|set|setattr|slice|sorted|staticmethod|str|sum|super|tuple|type|vars|zip)\\b|(?P<COMMENT>#[^\\n]*)|(?P<STRING>(?i:\\br|u|f|fr|rf|b|br|rb)?\'\'\'[^\'\\\\]*((\\\\.|\'(?!\'\'))[^\'\\\\]*)*(\'\'\')?|(?i:\\br|u|f|fr|rf|b|br|rb)?"""[^"\\\\]*((\\\\.|"(?!""))[^"\\\\]*)*(""")?|(?i:\\br|u|f|fr|rf|b|br|rb)?\'[^\'\\\\\\n]*(\\\\.[^\'\\\\\\n]*)*\'?|(?i:\\br|u|f|fr|rf|b|br|rb)?"[^"\\\\\\n]*(\\\\.[^"\\\\\\n]*)*"?)|(?P<SYNC>\\n)'
>>>
On Mon, Apr 2, 2018 at 11:32 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 2018-04-02 05:43, Guido van Rossum wrote:
My question for you: how on earth did you find this?! Speaking of a needle in a haystack. Did you run some kind of analysis program that looks for regexprs? (We've received some good reports from someone who did that looking for possible DoS attacks.)
The thread was about string prefixes.
Terry Reedy wrote "IDLE's colorizer does its parsing with a giant regex."
I wondered: "How bad could it be?" (It's smaller now that the IGNORECASE flag can have a local scope.)
It wasn't hard to find because it was in a file called "colorizer.py" in a folder called "idlelib".
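To illustrate the remark about the locally scoped IGNORECASE flag, here is a minimal sketch. The patterns are simplified stand-ins written for this example, not the ones in colorizer.py, and the spelled-out alternative is only an assumption about what a pattern without scoped flags would need to look like.

```python
import re

# Scoped flag (Python 3.6+): only the prefix group ignores case.
scoped = re.compile(r"(?i:rb|br|r|b)?'[^']*'")

# Without a scoped flag, every case variant has to be listed by hand
# (hypothetical equivalent, written out for comparison).
spelled_out = re.compile(r"(?:rb|rB|Rb|RB|br|bR|Br|BR|r|R|b|B)?'[^']*'")

samples = ["rb'x'", "Rb'x'", "BR'x'", "r'x'", "'x'"]
agree = all(
    (scoped.fullmatch(s) is not None) == (spelled_out.fullmatch(s) is not None)
    for s in samples
)

# The flag really is local: text outside the (?i:...) group stays case-sensitive.
local_only = re.compile(r"(?i:r)?'abc'")
matches_prefix_case = local_only.fullmatch("R'abc'") is not None
matches_body_case = local_only.fullmatch("R'ABC'") is not None
```

Both patterns accept the same sample literals, and the last two checks show that the scoped group affects the prefix only.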
On Sun, Apr 1, 2018 at 6:49 PM, MRAB <python@mrabarnett.plus.com> wrote:
A thread on python-ideas is talking about the prefixes of string
literals, and the regex used in IDLE.
Line 25 of Lib\idlelib\colorizer.py is:
stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
which looks slightly wrong to me.
The \b will apply only to the first choice.
Shouldn't it be more like:
stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
?
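The difference between the two forms can be checked with a small sketch. The two prefix patterns are the ones quoted above; appending a single quote and the test strings are assumptions made for this example, not how colorizer.py uses the prefix.

```python
import re

# Prefix as quoted from colorizer.py: \b belongs to the first alternative only.
current = re.compile(r"(?i:\br|u|f|fr|rf|b|br|rb)?'")

# Proposed prefix: \b sits outside the alternation and guards every choice.
proposed = re.compile(r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?'")

# "xu'" ends an identifier with 'u', so 'u' is not a real string prefix here.
current_on_xu = current.search("xu'").group()
proposed_on_xu = proposed.search("xu'").group()

# "xr'" ends with 'r', the one alternative that \b does guard in both forms.
current_on_xr = current.search("xr'").group()
proposed_on_xr = proposed.search("xr'").group()

# A genuine prefix at a word boundary is still picked up by the proposed form.
proposed_on_prefix = proposed.search("Rb'").group()
```

With the current pattern the trailing `u` of an identifier is taken as a prefix, while a trailing `r` is not; the proposed pattern treats both the same way.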
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
--
--Guido van Rossum (python.org/~guido)