HTMLReporter fails when source file is encoded in UTF-8 with BOM signature · Issue #179 · nedbat/coveragepy
Originally reported by pablodcar (Bitbucket: pablodcar, GitHub: pablodcar)
Hi, I'm thankful for this wonderful tool. We are using it very extensively, and I hope to contribute by adding new APIs and features in the future.
When a source file is encoded in UTF-8 with a BOM signature, //coverage.phystokens.source_encoding// returns the correct encoding, //"utf-8-sig"//. But when the file is rendered inside the HTML template and that same encoding is used to write the report to disk, it raises a //UnicodeDecodeError//, because the BOM cannot appear in the middle of the final output:
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/control.py", line 603, in html_report
    reporter.report(morfs)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 87, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/report.py", line 83, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 222, in html_file
    html = html.encode(encoding)
  File "/home/pablo/baco-dyn/lib/python2.6/encodings/utf_8_sig.py", line 15, in encode
    return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0], len(input))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 18296: ordinal not in range(128)
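For anyone who wants to check the codec behaviour in isolation, here is a minimal sketch. It is written for Python 3, so it does not reproduce the Python 2.6 failure above (there, the traceback suggests `html` was a byte string that still contained the BOM bytes, so `encode` attempted an implicit ASCII decode first). The sample text is an assumption made up for illustration.

```python
import codecs

# Hypothetical source text, used only for illustration.
text = "# coding: utf-8\nprint('hola')\n"

# Encoding with "utf-8-sig" always prepends the 3-byte BOM (EF BB BF).
with_bom = text.encode("utf-8-sig")
assert with_bom.startswith(codecs.BOM_UTF8)

# Decoding with "utf-8-sig" strips a leading BOM ...
assert with_bom.decode("utf-8-sig") == text

# ... whereas plain "utf-8" keeps it as the character U+FEFF.
assert with_bom.decode("utf-8") == "\ufeff" + text
```

So a file read as plain UTF-8 bytes keeps the signature inside its content, which is how a BOM can end up in the middle of a larger page once the source is embedded in a template.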
I'm attaching a patch that decodes and encodes the source file in advance, using UTF-8 whenever utf-8-sig is detected. I hope you can review it and consider adding this change.
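The attached diff is not reproduced here. As a rough illustration of the idea only, the sketch below (Python 3) decodes the source with the detected encoding and falls back to plain UTF-8 when writing the page. The names `normalize_output_encoding` and `render_html` are hypothetical and are not part of coverage.py, and the `<pre>` wrapper is just a stand-in for the real HTML template.

```python
import codecs
import html


def normalize_output_encoding(encoding):
    """Pick the encoding for the written report (hypothetical helper)."""
    # codecs.lookup() canonicalizes spellings such as "UTF-8-SIG" or "utf_8_sig".
    if codecs.lookup(encoding).name == "utf-8-sig":
        return "utf-8"  # same characters, but no BOM is emitted on encode
    return encoding


def render_html(source_bytes, source_encoding):
    """Decode the source, embed it in markup, and encode the page (illustrative)."""
    # Decoding with utf-8-sig drops the leading BOM from the source text.
    text = source_bytes.decode(source_encoding)
    # Stand-in for the real template: escape the source and wrap it.
    page = "<pre>" + html.escape(text) + "</pre>"
    return page.encode(normalize_output_encoding(source_encoding))


# Usage: a BOM-prefixed source produces a page with no BOM anywhere in it.
source = codecs.BOM_UTF8 + "x = 1  # ñ\n".encode("utf-8")
page_bytes = render_html(source, "utf-8-sig")
assert codecs.BOM_UTF8 not in page_bytes
```

Other encodings pass through unchanged, so only the BOM-signed case is treated specially.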
Thanks in advance,
Pablo Carballo
- Bitbucket: https://bitbucket.org/ned/coveragepy/issue/179
- This issue had attachments: html.py.diff. See the original issue for details.