bz2 parses the filename argument with the "s" format, which encodes a unicode filename to a byte string using the default (unicode) encoding. It should use the file system encoding instead. It should also support surrogates in unicode filenames, as well as bytes and bytearray filenames. The attached patch uses PyUnicode_FSConverter() to implement that.
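
For illustration only, here is a rough Python-level sketch of the behaviour requested above. It is not the attached patch (which is C code): `to_fs_bytes` is a hypothetical helper name, and it relies on `os.fsencode()`, which applies the file system encoding with its error handler (surrogateescape on POSIX). The explicit bytearray branch is an assumption made for this sketch, since `os.fsencode()` itself rejects bytearray.

```python
import os


def to_fs_bytes(filename):
    """Hypothetical helper: convert a filename to bytes for the OS.

    Sketch of the requested behaviour, not the actual patch.
    """
    if isinstance(filename, bytearray):
        # Assumption for this sketch: treat bytearray like bytes.
        return bytes(filename)
    # str is encoded with the file system encoding and its error
    # handler; bytes are returned unchanged.
    return os.fsencode(filename)
```

On POSIX, a name obtained from `os.fsdecode()` that contains surrogates (from undecodable bytes) should encode back to the original bytes through this helper, which is what encoding with the default unicode encoding cannot do.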