oss-fuzz 69058: TokenError · Issue #1787 · nedbat/coveragepy

This link seems to be private, so I'm copying the details here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=69058

Project: coveragepy
Fuzzing Engine: libFuzzer
Fuzz Target: fuzz_parse
Job Type: libfuzzer_asan_coveragepy
Platform Id: linux

Crash Type: Uncaught exception
Crash Address: 
Crash State:
  _removeHandlerRef
  _tokenize
  generate_tokens

This is the claimed stack trace:

      === Uncaught Python exception: ===
    TokenError: ('EOF in multi-line string', (2, 0))
    Traceback (most recent call last):
      File "fuzz_parse.py", line 33, in TestOneInput
      File "coverage/parser.py", line 265, in parse_source
      File "coverage/parser.py", line 143, in _raw_parse
      File "coverage/phystokens.py", line 179, in generate_tokens
      File "tokenize.py", line 461, in _tokenize
    TokenError: ('EOF in multi-line string', (2, 0))
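
For reference, tokenize raises this TokenError when it reaches the end of the input while a string is still open, for example an unterminated triple-quoted string. Below is a minimal sketch that uses only the standard library. first_token_error is a hypothetical helper, and the input is an illustrative string, not the fuzzer's test case:

```python
import io
import tokenize

def first_token_error(source: str):
    """Tokenize `source`; return the TokenError args, or None if tokenizing succeeds."""
    try:
        # generate_tokens() takes a readline callable that returns str lines.
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except tokenize.TokenError as exc:
        return exc.args
    return None

# Illustrative input: a triple-quoted string opened on line 2 and never closed.
print(first_token_error("x = 1\n'''never closed\n"))
```

The pure-Python tokenizer in Python 3.11 and earlier prints ('EOF in multi-line string', (2, 0)), which has the same shape as the report above. On 3.12+, tokenize wraps the C tokenizer, so the reported position may differ.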

The provided test case is an 8-byte file:

% hexdump -C /dwn/clusterfuzz-testcase-minimized-fuzz_parse-5820066691088384
00000000  ff 8d a7 dc 0a 27 27 a7                           |.....''.|
00000008
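
Reading those 8 bytes as Latin-1 maps each byte to the code point with the same value. That produces the same string as the one used in the reproduction below. This is an assumption about one possible decoding: how fuzz_parse.py turns its input bytes into text isn't shown here, so it may do something different.

```python
# The minimized test case, copied byte-for-byte from the hexdump above.
data = bytes.fromhex("ff 8d a7 dc 0a 27 27 a7")
assert len(data) == 8

# Latin-1 is a one-to-one map from byte 0xNN to code point U+00NN.
text = data.decode("latin-1")

# Same string as the literal passed to PythonParser in the reproduction.
print(text == "\xFF\x8D\xA7\xDC\n''\xA7")  # True
```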

I've tried to reproduce this problem and can't:

from coverage.parser import PythonParser
parser = PythonParser(text="\xFF\x8D\xA7\xDC\n''\xA7")
parser.parse_source()

produces:

Traceback (most recent call last):
  File "/Users/ned/coverage/trunk/coverage/parser.py", line 265, in parse_source
    self._ast_root = ast_parse(self.text)
                     ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ned/coverage/trunk/coverage/misc.py", line 381, in ast_parse
    return ast.parse(text)
           ^^^^^^^^^^^^^^^
  File "/usr/local/pyenv/pyenv/versions/3.11.9/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 1
    ÿ�§Ü
     ^
SyntaxError: invalid non-printable character U+008D

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ned/coverage/trunk/fuzz.py", line 3, in <module>
    parser.parse_source()
  File "/Users/ned/coverage/trunk/coverage/parser.py", line 268, in parse_source
    raise NotPython(
coverage.exceptions.NotPython: Couldn't parse '<code>' as Python source: 'invalid non-printable character U+008D' at line 1
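
The fuzzer's traceback runs through _raw_parse into tokenize, while this one stops in ast.parse(). To compare the two paths on the same text, you can call ast.parse() and the standard library tokenizer separately. Below is a minimal sketch. try_ast_parse and try_tokenize are hypothetical helpers, and TEXT assumes the same Latin-1 reading of the test-case bytes used in the reproduction above:

```python
import ast
import io
import tokenize

# The reproduction text (test-case bytes read as Latin-1).
TEXT = "\xFF\x8D\xA7\xDC\n''\xA7"

def try_ast_parse(text: str):
    """Return the SyntaxError message from ast.parse(), or None if it parses."""
    try:
        ast.parse(text)
    except SyntaxError as exc:
        return exc.msg
    return None

def try_tokenize(text: str):
    """Tokenize directly, skipping ast.parse(); return TokenError args or None."""
    try:
        for _ in tokenize.generate_tokens(io.StringIO(text).readline):
            pass
    except tokenize.TokenError as exc:
        return exc.args
    return None

print("ast.parse:", try_ast_parse(TEXT))
print("tokenize: ", try_tokenize(TEXT))
```

The tokenize result depends on the Python version. In 3.11 and earlier, the pure-Python tokenizer emits ERRORTOKEN tokens for characters it doesn't recognize instead of raising. In 3.12+, tokenize is built on the C tokenizer and converts that tokenizer's SyntaxError into a TokenError.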

Somehow their run ends in a TokenError from tokenize, but mine ends in a NotPython error, raised because ast.parse() rejects the text with a SyntaxError. I don't understand how they are getting their error.