Exception on unencodable filename · Issue #891 · nedbat/coveragepy

Apologies for the quick report - I'm only seeing this in the CPython GitHub Actions CI against Python 3.9 right now. I'll come back with more details when I get a chance, but wanted to provide a heads-up in case this is an obvious enough issue.

See the build logs at https://github.com/python/cpython/commit/9707e8e22d80ca97bf7a9812816701cecde6d226/checks?check_suite_id=363085844. To navigate there, open "Coverage" / "Ubuntu (Coverage)" / "Tests with coverage". The build step is marked as successful because we suppress the failure (and then the upload fails because the file was never written).

  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/cmdline.py", line 555, in command_line
    return self.do_run(options, args)
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/cmdline.py", line 710, in do_run
    self.coverage.save()
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/control.py", line 613, in save
    data = self.get_data()
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/control.py", line 667, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/collector.py", line 427, in flush_data
    self.covdata.add_arcs(self.mapped_file_dict(self.data))
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/sqldata.py", line 474, in add_arcs
    file_id = self._file_id(filename, add=True)
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/sqldata.py", line 367, in _file_id
    cur = con.execute("insert or replace into file (path) values (?)", (filename,))
  File "/home/runner/work/cpython/cpython/.venv/lib/python3.9/site-packages/coverage/sqldata.py", line 1025, in execute
    return self.con.execute(sql, parameters)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 74: surrogates not allowed
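
For illustration, here is a minimal sketch of how the same error can be triggered outside coverage. It assumes the `'\udcff'` in the traceback is a lone surrogate produced by decoding the byte 0xFF with the `surrogateescape` handler, which is how Python represents undecodable filename bytes on POSIX. The filename `bad_\xff.py`, the one-table schema, and the helper `try_insert` are made up for the example and are not taken from the CPython tests or from coverage's real schema.

```python
import sqlite3

# Hypothetical filename: the byte 0xFF is not valid UTF-8, so decoding it
# with "surrogateescape" yields the lone surrogate U+DCFF.
filename = b"bad_\xff.py".decode("utf-8", "surrogateescape")
assert filename == "bad_\udcff.py"


def try_insert(path):
    """Insert `path` into a throwaway table; return the encode error, or None."""
    con = sqlite3.connect(":memory:")
    try:
        # Simplified stand-in table, not coverage's actual schema.
        con.execute("create table file (id integer primary key, path text)")
        con.execute("insert or replace into file (path) values (?)", (path,))
        return None
    except UnicodeEncodeError as exc:
        # sqlite3 has to encode str parameters as UTF-8, and a lone
        # surrogate cannot be encoded.
        return exc
    finally:
        con.close()


error = try_insert(filename)
print(type(error).__name__, error)
```

An ordinary name such as `"ok.py"` goes through the same helper without error, so the failure depends only on the surrogate in the path.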

I suspect the 0xDCFF character comes from one of our deliberately-failing tests, but the filename seems to be recorded at a point that we can't override through coverage without changing the test itself.
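
As a rough sketch of one possible mitigation (an assumption on my part, not something coverage is known to do), a path containing escaped bytes could be rewritten into a form that UTF-8 can encode before it is stored. The helper name `storable_path` is hypothetical.

```python
def storable_path(path: str) -> str:
    """Return a UTF-8-encodable version of `path` (hypothetical helper).

    Recover the original bytes with "surrogateescape", then decode them
    again with "backslashreplace" so undecodable bytes become visible
    escapes such as \\xff. This only handles surrogates in the range
    U+DC80..U+DCFF; any other lone surrogate would still raise here.
    """
    raw = path.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8", "backslashreplace")


safe = storable_path("bad_\udcff.py")
print(safe)
```

The trade-off is that the stored name no longer matches the name on disk, so anything that later reopens the file from the recorded path would need to reverse the mapping.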

As I said, when I get a few more minutes, I'll try to turn this into a minimal repro. But I wanted to at least provide the easily accessible info now rather than hold the report until I'm available.