HTMLReporter fails when source file is encoded in UTF-8 with BOM signature · Issue #179 · nedbat/coveragepy

Originally reported by pablodcar (Bitbucket: pablodcar, GitHub: pablodcar)


Hi, I'm thankful for this wonderful tool. We use it extensively, and I hope to contribute new APIs and features in the future.

When a source file is encoded in UTF-8 with a BOM signature, //coverage.phystokens.source_encoding// returns the correct encoding: //"utf-8-sig"//. But when the file is rendered inside the HTML template and that same encoding is used to write the report to disk, a //UnicodeDecodeError// is raised, because the BOM cannot be in the middle of the final output:

  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/control.py", line 603, in html_report
    reporter.report(morfs)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 87, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/report.py", line 83, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 222, in html_file
    html = html.encode(encoding)
  File "/home/pablo/baco-dyn/lib/python2.6/encodings/utf_8_sig.py", line 15, in encode
    return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0], len(input))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 18296: ordinal not in range(128)

I'm attaching a patch to decode and encode the source file in advance, using UTF-8 when utf-8-sig is detected. I hope you can review it and consider adding this change.
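
For readers without the attachment, here is a minimal sketch of the general idea, not the attached patch itself. It is written for Python 3 and assumes the source bytes and the detected encoding are already in hand; the helper names `read_source_text` and `output_encoding` are hypothetical and do not exist in coverage.py. Decoding with "utf-8-sig" drops a leading BOM, and writing with plain "utf-8" avoids adding a signature to the report:

```python
import codecs


def read_source_text(raw_bytes, detected_encoding):
    """Decode source bytes to text (hypothetical helper).

    When detected_encoding is "utf-8-sig", the codec strips a leading BOM,
    so the signature never reaches the rendered HTML.
    """
    return raw_bytes.decode(detected_encoding)


def output_encoding(detected_encoding):
    """Pick the encoding for the written report (hypothetical helper).

    "utf-8-sig" is mapped to plain "utf-8"; anything else is kept as-is.
    """
    if codecs.lookup(detected_encoding).name == "utf-8-sig":
        return "utf-8"
    return detected_encoding


# A source file that starts with the UTF-8 BOM signature.
raw = codecs.BOM_UTF8 + "x = 'café'\n".encode("utf-8")

text = read_source_text(raw, "utf-8-sig")
html = "<pre>" + text + "</pre>"
encoded = html.encode(output_encoding("utf-8-sig"))
```

With this approach the source text ends up in the middle of the page without a signature, and the output bytes are ordinary UTF-8.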

Thanks in advance,

Pablo Carballo