Issue 1276: LookupError: unknown encoding: X-MAC-JAPANESE
When I compile Python-3.0a1 on Mac OS X with a Japanese locale, I get a LookupError like the one below.
==========================================
running build_scripts
creating build/scripts-3.0
Traceback (most recent call last):
  File "./setup.py", line 1572, in <module>
    main()
  File "./setup.py", line 1567, in main
    'Lib/smtpd.py']
  File "/private/tmp/Python-3.0a1/Lib/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 943, in run_commands
    self.run_command(cmd)
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 963, in run_command
    cmd_obj.run()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build.py", line 106, in run
    self.run_command(cmd_name)
  File "/private/tmp/Python-3.0a1/Lib/distutils/cmd.py", line 317, in run_command
    self.distribution.run_command(command)
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 963, in run_command
    cmd_obj.run()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build_scripts.py", line 51, in run
    self.copy_scripts()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build_scripts.py", line 82, in copy_scripts
    first_line = f.readline()
  File "/private/tmp/Python-3.0a1/Lib/io.py", line 1259, in readline
    decoder = self._decoder or self._get_decoder()
  File "/private/tmp/Python-3.0a1/Lib/io.py", line 1111, in _get_decoder
    make_decoder = codecs.getincrementaldecoder(self._encoding)
  File "/private/tmp/Python-3.0a1/Lib/codecs.py", line 951, in getincrementaldecoder
    decoder = lookup(encoding).incrementaldecoder
LookupError: unknown encoding: X-MAC-JAPANESE
make: *** [sharedmods] Error 1
This problem happens because there is no appropriate codec for that encoding, so it also occurs in applications that use getdefaultencoding.
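As a stopgap on the application side, the failure can be sketched and worked around roughly like this. The helper name lookup_or_fallback and the choice of UTF-8 as the fallback are assumptions made for illustration, not part of any patch here. Whether "X-MAC-JAPANESE" itself resolves depends on the Python version in use, so the example also tries a deliberately made-up encoding name.

```python
import codecs


def lookup_or_fallback(name, fallback="utf-8"):
    """Return the CodecInfo for `name`, or for `fallback` if `name` is unknown.

    Hypothetical helper: codecs.lookup() raises LookupError for an
    encoding it does not know, which is the error in the traceback.
    """
    try:
        return codecs.lookup(name)
    except LookupError:
        return codecs.lookup(fallback)


# A made-up name is certain to be unknown, so the fallback is used.
print(lookup_or_fallback("x-hypothetical-unknown-encoding").name)

# The result for the name from the traceback depends on the Python version.
print(lookup_or_fallback("X-MAC-JAPANESE").name)
```

This only hides the symptom; text in the real Mac encoding would still need a proper codec to be decoded correctly.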
After patching Tools/unicode/Makefile, running make generates build/mac_japanese.py, a mac-japanese codec.
Added a patch that implements codecs for the CJK Macintosh encodings. I tried to implement them just like the other existing CJK codecs, but that required many inefficient mapping tables because of their odd mappings (like this: u'ABCDE' <-> 'ab' AND u'ABCD' <-> 'ac'!).
So I decided to implement a general extension codec wrapper that can be easily customized with dictionaries supplied from Python code. Because codecs for the base encodings of all the Mac CJK encodings already exist, I put only their differences in the Python codec code. The extension mechanism could be reused for customized codecs in in-house applications or for legacy encoding support.
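To illustrate the idea only (this is not the code in the patch), a pure-Python sketch of such a wrapper might look like the following. The class name ExtendedCodec, its methods, and the two table entries are assumptions chosen for the example. The table is not the real Mac Japanese difference table.

```python
import codecs


class ExtendedCodec:
    """Layer a difference table over an existing base codec (sketch only)."""

    def __init__(self, base, decode_map):
        self.base = base
        self.decode_map = dict(decode_map)  # bytes -> str
        self.encode_map = {s: b for b, s in self.decode_map.items()}  # str -> bytes
        # Longer keys are tried first so multi-unit mappings win over shorter ones.
        self._dkeys = sorted(self.decode_map, key=len, reverse=True)
        self._ekeys = sorted(self.encode_map, key=len, reverse=True)

    def encode(self, text):
        out = bytearray()
        i = 0
        while i < len(text):
            key = next((k for k in self._ekeys if text.startswith(k, i)), None)
            if key is not None:
                out += self.encode_map[key]
                i += len(key)
            else:
                # No override applies: let the base codec encode one character.
                out += text[i].encode(self.base)
                i += 1
        return bytes(out)

    def decode(self, data):
        decoder = codecs.getincrementaldecoder(self.base)()
        out = []
        i = 0
        at_boundary = True  # False while the base decoder is inside a character
        while i < len(data):
            key = None
            if at_boundary:
                key = next((k for k in self._dkeys if data.startswith(k, i)), None)
            if key is not None:
                out.append(self.decode_map[key])
                i += len(key)
                continue
            # Feed one byte to the base decoder; '' means it needs more bytes.
            piece = decoder.decode(data[i:i + 1])
            out.append(piece)
            at_boundary = bool(piece)
            i += 1
        out.append(decoder.decode(b"", final=True))
        return "".join(out)


# Illustrative entries only, layered over the existing shift_jis codec.
codec = ExtendedCodec("shift_jis", {b"\x5c": "\u00a5", b"\x80": "\\"})
print(codec.decode(b"a\x5cb\x80"))
print(codec.encode("a\u00a5b\\"))
```

The boundary flag keeps an override such as 0x5C from firing on the trail byte of a double-byte character. It is a simplification that assumes a stateless base encoding and strict error handling, and a C implementation would presumably handle this more efficiently.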
The first patch was generated for 2.6 trunk. I'm working on porting it to 3.0.