Issue 1313051: mac_roman codec missing "apple" codepoint

Created on 2005-10-04 16:37 by tony_nelson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
mac_roman.py tony_nelson,2005-10-05 02:16 mac_roman codec generated from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
Messages (7)
msg26497 - Author: Tony Nelson (tony_nelson) Date: 2005-10-04 16:37
The mac_roman codec is missing a single codepoint for the trademarked Apple logo (0xF0 <=> 0xF8FF per Apple docs), which prevents round-tripping of mac_roman text through Unicode. Adding the codepoint as a private encoding (per Apple) has no trademark implications; only the character itself, in a font, would have such issues. I'm using Python 2.3, but AFAICT it is an issue in later Python versions as well.
msg26498 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-10-04 18:07
The codepoint 0xF8FF is in the Private Use Area, so this is not an official Unicode character, and for other uses 0xF8FF might mean something completely different. So I think this mapping shouldn't be added to mac_roman.
msg26499 - Author: Tony Nelson (tony_nelson) Date: 2005-10-04 20:41
It isn't Python's job to tell people what characters they are allowed to use. Apple defined the codepoint and its mapping to Unicode. Python is not the Unicode Police, and should not damage the data it was given just to prove a point. Damaging the user's data isn't very "batteries included".
msg26500 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-10-04 21:48
Tony, comments like yours are not very helpful. Python's codecs rely on facts defined by standards bodies, e.g. the Unicode Consortium, ISO, etc. If you don't present proof of your claim then there's nothing much we can do about your particular problem.

Fortunately, proof isn't hard to find in this case: http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT Looks like Apple added the mapping sometime after the codec was created.

Walter: it is common for companies to add their logos as private Unicode characters. This happens a lot in the Asian world. Of course, interop isn't great, but at least you don't lose information by converting to Unicode.

Tony: Python is not damaging your data - the codec will raise an exception in case that particular character is converted to Unicode.

Please recreate the codec using gencodec.py (which you can find in the Tools/ directory) and add it as an attachment to this bug report. Thanks.
msg26501 - Author: Tony Nelson (tony_nelson) Date: 2005-10-05 02:16
>Tony: Python is not damaging your data - the codec will
>raise an exception in case that particular character is
>converted to Unicode.

Right, crashing the unsuspecting user's program and destroying the data utterly. Anyway, it doesn't damage /my/ data because I add the missing codepoint to the codec:

    # Fix missing Apple logo in mac_roman.
    import encodings.mac_roman
    if not encodings.mac_roman.decoding_map[0xF0]:
        encodings.mac_roman.decoding_map[0xF0] = 0xF8FF
        encodings.mac_roman.encoding_map[0xF8FF] = 0xF0

It just damages data for all the other users of the codec.

>Please recreate the codec using gencodec.py (which you can
>find in the Tools/ directory) and add it as an attachment to this
>bug report. Thanks.

Umm, I take it you want me to download a mapping file first. Here is a new mac_roman.py.
msg26502 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-10-21 14:27
This should be resolved with the new codec in CVS.
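
The round-trip property this issue is about can be checked on a current interpreter. The following is a minimal sketch, assuming Python 3 and its bundled mac_roman codec; the helper name failed_round_trips and the two constants are hypothetical and are not part of the attached mac_roman.py.

```python
# Assumption: Python 3 with the standard library's mac_roman codec.
APPLE_LOGO_BYTE = 0xF0         # mac_roman byte named in the report
APPLE_LOGO_CODEPOINT = 0xF8FF  # Private Use Area codepoint from Apple's ROMAN.TXT


def failed_round_trips(codec="mac_roman"):
    """Return the byte values that do not survive bytes -> str -> bytes."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(codec).encode(codec) != raw:
                failures.append(value)
        except UnicodeError:
            # An undefined byte raises instead of decoding.
            failures.append(value)
    return failures


decoded = bytes([APPLE_LOGO_BYTE]).decode("mac_roman")
print(hex(ord(decoded)))
print(APPLE_LOGO_BYTE in failed_round_trips())
```

On an interpreter that still lacked the mapping, 0xF0 would show up in the list returned by failed_round_trips().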
msg26503 - Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2005-10-27 19:38
tony_nelson: Raising an exception during execution need not crash a user program; that's why Guido added try/except clauses to the language. You would be well advised to learn about and use them, as you will no doubt run into other exception-causing situations in the future.
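
The two behaviours debated in this thread, raising on an unmapped byte versus silently losing it, can be sketched as follows. This assumes Python 3 and uses cp1252 byte 0x81 as a stand-in for an undefined byte, since 0xF0 now decodes in mac_roman; decode_or_report is a hypothetical helper, not a standard library function.

```python
def decode_or_report(raw, codec):
    """Decode strictly; on failure, describe the offending byte instead of crashing."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec could not map.
        return "undecodable byte 0x%02X at offset %d" % (raw[exc.start], exc.start)


sample = b"caf\x81"  # 0x81 has no mapping in Python's cp1252 codec

# Strict decoding raises, and the caller decides what to do with the error.
strict_result = decode_or_report(sample, "cp1252")

# errors="replace" never raises, but the original byte value is lost.
lossy_result = sample.decode("cp1252", errors="replace")

print(strict_result)
print(repr(lossy_result))
```

Catching the exception keeps the program running, but neither path recovers the character; only a complete mapping in the codec preserves the data.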
History
Date User Action Args
2022-04-11 14:56:13 admin set github: 42445
2005-10-04 16:37:41 tony_nelson create