Issue 1382096: MacRoman Encoding Bug (OHM vs. OMEGA)

The file encodings/mac_roman.py in Python 2.4.1 contains the following incorrect character definition on line 96:

    0x00bd: 0x2126, # OHM SIGN

This should read:

    0x00bd: 0x03A9, # GREEK CAPITAL LETTER OMEGA

Presumably this bug occurred due to a misreading, given that OHM and OMEGA have the same glyph. Evidence that the OMEGA interpretation is correct:

    0xBD    0x03A9  # GREEK CAPITAL LETTER OMEGA

Source: http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT

Further evidence can be found by Googling for MacRoman tables. Because the codec's encoding map then has no entry for U+03A9, the following code, for example, raises a UnicodeEncodeError when it shouldn't:

    u'\u03a9'.encode('macroman')
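A quick way to check whether a given installation is affected is to decode the byte 0xBD and look at which character comes back. The following is only a sketch: the helper names are hypothetical and not part of the encodings package, and it assumes a Python that accepts both b'' and u'' literals (2.6+ or 3.3+), which is newer than the 2.4.1 release discussed here.

```python
import unicodedata


def macroman_0xbd_name(codec="mac_roman"):
    """Return the Unicode name of the character that byte 0xBD decodes to.

    Hypothetical helper for illustration; not part of the standard library.
    """
    ch = b"\xbd".decode(codec)
    return unicodedata.name(ch)


def has_ohm_bug(codec="mac_roman"):
    """Return True if 0xBD decodes to U+2126 OHM SIGN instead of U+03A9."""
    return b"\xbd".decode(codec) == u"\u2126"


if __name__ == "__main__":
    print(macroman_0xbd_name())
    print(has_ohm_bug())
```

On an affected installation the first helper would be expected to report OHM SIGN; on one that follows Apple's table it reports GREEK CAPITAL LETTER OMEGA.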

For a workaround, I've been using the following code:

    import codecs
    from encodings import mac_roman

    mac_roman.decoding_map[0xBD] = 0x03A9
    mac_roman.encoding_map = codecs.make_encoding_map(mac_roman.decoding_map)
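The reason the workaround rebuilds encoding_map, rather than only patching decoding_map, is that the encoding direction is derived by inverting the decoding direction. The sketch below illustrates this on a small toy dictionary instead of the real module; toy_decoding_map is a made-up name holding just two MacRoman entries, so nothing in the installed codec is modified.

```python
import codecs

# Toy excerpt of a byte -> code point table (hypothetical name, two entries only).
toy_decoding_map = {
    0xBC: 0x00BA,  # MASCULINE ORDINAL INDICATOR
    0xBD: 0x2126,  # OHM SIGN (the incorrect entry)
}

# Inverting the buggy table: U+03A9 has no entry, so it cannot be encoded.
buggy_encoding_map = codecs.make_encoding_map(toy_decoding_map)
omega_before = buggy_encoding_map.get(0x03A9)

# Patch the decoding side, then rebuild the encoding side from it.
toy_decoding_map[0xBD] = 0x03A9  # GREEK CAPITAL LETTER OMEGA
fixed_encoding_map = codecs.make_encoding_map(toy_decoding_map)
omega_after = fixed_encoding_map.get(0x03A9)

print(omega_before)  # no mapping before the fix
print(omega_after)   # 0xBD after the fix
```

Patching only decoding_map would fix decoding of the byte 0xBD, but encoding U+03A9 would still fail until the inverse map is regenerated.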

And then, to use the example above:

    >>> u'\u03a9'.encode('macroman')
    '\xbd'

Thanks,

-- Sean B. Palmer