get_python_source gets called on shared object file and causes SyntaxError · Issue #1160 · nedbat/coveragepy

Describe the bug

While testing with pytest-cov we started noticing some failures due to an "internal" error. Digging deeper, it looks like a shared object (.so) file somehow made its way into get_python_source, which then fails with

INTERNALERROR> SyntaxError: invalid or missing encoding declaration

with the actual issue being

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/Users/gabriele.tornetta/.pyenv/versions/3.5.10/lib/python3.5/tokenize.py", line 392, in find_cookie
INTERNALERROR>     line_string = line.decode('utf-8')
INTERNALERROR> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte
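The two messages are the same failure seen at two levels. tokenize.detect_encoding reads the first line of a file and tries to decode it as UTF-8. When that raises UnicodeDecodeError, it raises SyntaxError("invalid or missing encoding declaration") instead, and the UnicodeDecodeError appears as the chained context. The minimal sketch below reproduces this with the standard library only. It assumes the .so file is a 64-bit little-endian Mach-O binary, the format macOS uses for compiled extension modules, whose magic number begins with the byte 0xcf. MACHO_HEADER and detect_or_error are illustrative names, not part of coverage.py.

```python
import io
import tokenize

# Assumption: the first bytes of a 64-bit little-endian Mach-O binary,
# the format macOS uses for compiled extension modules such as Cython .so files.
MACHO_HEADER = b"\xcf\xfa\xed\xfe\x07\x00\x00\x01"


def detect_or_error(data: bytes) -> str:
    """Run tokenize.detect_encoding on raw bytes; return the encoding or the error."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return encoding
    except SyntaxError as exc:
        # detect_encoding converts the UnicodeDecodeError raised while
        # decoding the first line into this SyntaxError.
        return "SyntaxError: " + str(exc)


print(detect_or_error(b"x = 1\n"))    # utf-8
print(detect_or_error(MACHO_HEADER))  # SyntaxError: invalid or missing encoding declaration
```

Byte 0xcf starts a two-byte UTF-8 sequence, but the next byte (0xfa) is not a valid continuation byte. That matches the "invalid continuation byte" in the traceback.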

While the root cause may lie further upstream (perhaps .so files should never get this far), get_python_source could be made more robust against such cases. For instance, the extension check at

if ext == ".py" and env.WINDOWS:
    exts = [".py", ".pyw"]
else:
    exts = [ext]

could be enhanced to skip files that are clearly not Python source?
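One possible shape for such a guard is sketched below. It is a hypothetical helper, not coverage.py's actual code or API; the names looks_like_extension_module, decode_source_or_none and read_source_or_none are invented for illustration. It combines two checks. First, it skips any filename ending in one of importlib.machinery.EXTENSION_SUFFIXES, the suffixes the running interpreter uses for compiled extension modules. Second, it falls back to returning None when the bytes cannot be decoded as Python source.

```python
import importlib.machinery
import io
import tokenize
from typing import Optional


def looks_like_extension_module(filename: str) -> bool:
    """True if filename ends with a suffix this interpreter uses for compiled
    extension modules (e.g. ".so" on Linux/macOS, ".pyd" on Windows)."""
    return filename.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


def decode_source_or_none(data: bytes) -> Optional[str]:
    """Decode raw bytes as Python source, or return None if that is not possible."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return data.decode(encoding)
    except (SyntaxError, UnicodeDecodeError):
        # SyntaxError: detect_encoding rejected the first lines (e.g. binary data);
        # UnicodeDecodeError: a later byte is invalid in the detected encoding.
        return None


def read_source_or_none(filename: str) -> Optional[str]:
    """Hypothetical guard: skip compiled extensions before trying to decode."""
    if looks_like_extension_module(filename):
        return None
    with open(filename, "rb") as f:
        return decode_source_or_none(f.read())
```

The suffix check alone would not catch a binary file with an unexpected name. The decode fallback alone would still try to read every .so file. Using both keeps the common case cheap and the unusual case safe.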

To Reproduce

Unfortunately, it's not clear to me how to reproduce this problem: it appeared all of a sudden, and I can't see any changes in the dependency versions currently pulled in by the CI jobs. The project under test contains some Cythonized code, which is where the shared object comes from.

  1. What version of Python are you using? 3.5, 3.6, 3.8
  2. What version of coverage.py are you using? The output of coverage debug sys is helpful. 5.5
  3. What versions of what packages do you have installed? The output of pip freeze is helpful.
  4. What code are you running? Give us a specific commit of a specific repo that we can check out.
  5. What commands did you run?

Expected behavior

No internal errors 🙂