Chokes on source files with non-utf-8 encoding · Issue #157 · nedbat/coveragepy

If you have Python source files that are encoded as, e.g., latin-1, the HTML reporter dies like this:

    coverage.main()
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 657, in main
    status = CoverageScript().command_line(argv)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 549, in command_line
    directory=options.directory, **report_args)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/control.py", line 599, in html_report
    reporter.report(morfs, config=self.config)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 83, in report
    self.report_files(self.html_file, morfs, config, config.html_dir)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/report.py", line 86, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 198, in html_file
    self.write_html(html_path, html)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 103, in write_html
    write_encoded(fname, html, 'ascii', 'xmlcharrefreplace')
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/backward.py", line 137, in write_encoded
    f.write(text.decode('utf8'))
  File "/usr/local/python2.6/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14451: invalid continuation byte
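
For illustration, the failure on the last line can be reproduced in isolation. This is a minimal sketch in Python 3 syntax (the traceback above is from Python 2.6), and the sample string is made up, not taken from the file that triggered the report:

```python
# Hypothetical sample: "März" encoded as latin-1, where "ä" is the single byte 0xe4.
data = "März".encode("latin-1")

try:
    data.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xe4 starts a multi-byte sequence in utf-8, but the next byte ("r")
    # is not a continuation byte, so decoding fails here.
    utf8_error = exc

# Decoding with the encoding the bytes were actually written in succeeds.
text = data.decode("latin-1")
```

The same bytes are valid latin-1 and invalid utf-8, which is why the report only breaks once something assumes utf-8.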

The workaround is simple, of course: change the file's encoding and its declaration (and you should be using utf-8 anyway, if anything). Still, I wonder whether this could be handled more gracefully, with an error message that says what is going on.
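
As a rough idea of what "more gracefully" could look like, here is a sketch using only the Python 3 standard library. It is not coverage.py's code: `read_source` and its error wording are hypothetical, and the only real API relied on is `tokenize.detect_encoding`, which reads the PEP 263 declaration (or a BOM) and falls back to utf-8.

```python
import io
import tokenize


def read_source(raw, filename="<unknown>"):
    """Decode Python source bytes, honoring the PEP 263 encoding declaration.

    Hypothetical helper for illustration; raises ValueError with a message
    that names the file and the offending byte instead of a bare traceback.
    """
    try:
        # Looks at the first two lines for a coding cookie or a BOM.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    except SyntaxError as exc:
        raise ValueError(f"{filename}: bad encoding declaration: {exc}") from exc

    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = raw[exc.start]
        raise ValueError(
            f"{filename}: cannot decode as {encoding!r} "
            f"(byte {bad:#04x} at offset {exc.start}); "
            "check the file's encoding declaration"
        ) from exc


# A file that declares latin-1 decodes cleanly.
declared = b"# -*- coding: latin-1 -*-\nname = 'M\xe4rz'\n"
decoded = read_source(declared, "declared.py")

# A latin-1 file without a declaration is reported with a readable message.
undeclared = b"x = 1\nname = 'M\xe4rz'\n"
try:
    read_source(undeclared, "undeclared.py")
    message = None
except ValueError as exc:
    message = str(exc)
```

With a declaration present the source decodes as written; without one, the caller gets the file name and byte offset rather than a crash deep inside the report writer.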