The streamreader in utf_8_sig.py fails when asked to read a specified bytelength of data that ends up in the middle of a multibyte utf8 code. I will attached a atandalone unittest (which does work from autotest, but doesn't use test_support), test_utf8sig_stream.py. I will attach a patch (applied to the trunk 2.6 version), u8sig26.diff. Regards, ..jim
Oops, it looks like my patch may have broken test_partial in test_codecs. I will try to figure out what the test_partial does in the next day or so, unless someone else can add some insignt in the meantime. .jim
One additional clue: test_codecs succeeds in verbose mode but fails in non- verbose mode (autotest "verbosity") .. I think. My eyes are getting blurry. More tomorrow, I guess. ..j
I found the errror in my previous patch. It lacked a self.decode=.. line in the StreamReader.decode elif branch. I attach a replacement patch diff-u.py26_utf8sig (apply to the 2.6 version of utf_8_sig.py. (If allowed, I will next remove the incorrect patch.) This one passes test_codecs.py as well as my previously attached test module. The resulting utf_8_sig.py may benefit from further refctoring, but I didn't want to do more than necessary to fix the immediate bug. Regards, ..jim