Issue 4060: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC

Created on 2008-10-06 22:27 by trentm, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue4060_macosx_endian.patch trentm,2008-10-06 22:53 patch to fix (as described in original comment)
pymacconfig.h.patch ronaldoussoren,2008-10-07 06:13 Moving detection of WORDS_BIGENDIAN to pymacconfig.h
pymacconfig.h.patch2 ronaldoussoren,2008-10-07 12:31
Messages (16)
msg74398 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-06 22:27
Revision 63955 removed a block from configure.in (and effectively from pyconfig.h.in) having to do with endianness, which results in an incorrect setting for "WORDS_BIGENDIAN" in Universal builds on Mac OS X. The removed part was this:

> AH_VERBATIM([WORDS_BIGENDIAN],
> [
> /* Define to 1 if your processor stores words with the most significant byte
>    first (like Motorola and SPARC, unlike Intel and VAX).
>
>    The block below does compile-time checking for endianness on platforms
>    that use GCC and therefore allows compiling fat binaries on OSX by using
>    '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
>    such that the configure-result is used on systems that don't use GCC.
> */
> #ifdef __BIG_ENDIAN__
> #define WORDS_BIGENDIAN 1
> #else
> #ifndef __LITTLE_ENDIAN__
> #undef WORDS_BIGENDIAN
> #endif
> #endif])

This used to allow "WORDS_BIGENDIAN" to be correct for all parts of a universal Python build done via `gcc -arch i386 -arch ppc ...`. This was originally added for issue 1471883 (see for a discussion of this particular bit).

The result of this bug is that Python extensions using either of the following to get native byte ordering for UTF-16 decoding:

PyUnicode_DecodeUTF16(..., byteorder=0);
PyUnicode_DecodeUTF16Stateful(..., byteorder=0, ...);

on Mac OS X/PowerPC with a universal build built on Intel hardware (most such builds) will get the wrong byte-ordering.

The fix is to restore that section to configure.in and re-run autoconf and autoheader.

Ronald, was there a particular reason that this block was removed from configure.in (and pyconfig.h.in)? I'd like to hear comments from either Ronald or Martin, and then I can commit the fix.
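To make the failure mode concrete, here is a minimal sketch of how a compile-time byte-order flag feeds a "native order" UTF-16 decode. It is not CPython's code: `SKETCH_BIGENDIAN` and `sketch_decode_utf16_unit` are hypothetical names, the `__BYTE_ORDER__` fallback is an addition for compilers that define neither Apple-style macro, and BOM handling is left out. Only the -1/0/1 meaning of `byteorder` is taken from the API discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* SKETCH_BIGENDIAN is a hypothetical stand-in for WORDS_BIGENDIAN.
   The first two branches mirror the removed block: the compiler sets the
   macro separately for each -arch slice of a fat binary. */
#if defined(__BIG_ENDIAN__)
#  define SKETCH_BIGENDIAN 1
#elif defined(__LITTLE_ENDIAN__)
#  define SKETCH_BIGENDIAN 0
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
   /* Assumption of this sketch: newer GCC/Clang fallback, not in the thread. */
#  define SKETCH_BIGENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#else
#  error "no compile-time byte-order macro; a configure result would be needed"
#endif

/* Decode one UTF-16 code unit from two bytes.
   byteorder: -1 = little-endian, 1 = big-endian, 0 = native order. */
static uint16_t sketch_decode_utf16_unit(const unsigned char *p, int byteorder)
{
    assert(byteorder >= -1 && byteorder <= 1);
    if (byteorder == 0)
        byteorder = SKETCH_BIGENDIAN ? 1 : -1;   /* resolved at compile time */
    if (byteorder == 1)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)((p[1] << 8) | p[0]);
}
```

If the flag were frozen to the value measured on the Intel build host, the `byteorder == 0` branch in the PowerPC slice would pick little-endian and every code unit would come out byte-swapped, which matches the symptom reported here.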
msg74399 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-06 22:31
This also shows up in the byte ordering that Python uses to encode utf-16:

$ uname -a
Darwin sphinx 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc
$ python2.6 -c "import codecs; codecs.open('26.txt', 'w', 'utf-16').write('hi')"
$ od -cx 26.txt
0000000  377 376   h  \0   i  \0
           fffe    6800    6900
0000006
$ /usr/bin/python -c "import codecs; codecs.open('system.txt', 'w', 'utf-16').write('hi')"
$ od -cx system.txt
0000000  376 377  \0   h  \0   i
           feff    0068    0069
0000006

The BOM here ensures, of course, that this is still valid UTF-16 content, but the difference in behaviour here between Python versions might not be intended.
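The two dumps differ only in byte order, and the BOM is what tells a reader which one it is looking at. A small sketch of that check, assuming a hypothetical helper name `sketch_utf16_bom_order` and reusing the -1/1 convention from the decode API:

```c
#include <assert.h>
#include <stddef.h>

/* Returns -1 for a little-endian BOM (FF FE), 1 for a big-endian BOM (FE FF),
   and 0 when the buffer does not start with a UTF-16 BOM. */
static int sketch_utf16_bom_order(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return 0;                       /* too short to hold a BOM */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return -1;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return 1;
    return 0;
}
```

Applied to the bytes shown by od, 26.txt (377 376 ...) is little-endian and system.txt (376 377 ...) is big-endian, so the 2.6 build wrote non-native order on this PowerPC machine.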
msg74400 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-10-06 22:40
Does this also affect sys.byteorder and the struct module ? I think those would be more important to get right than the UTF-16 codec, since this only uses the native byte ordering for increased performance and compatibility with other OS tools. Since UTF-16 is not wide-spread on Mac OS X, it's not so much an issue... it would be on Windows.
msg74402 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-10-06 22:47
BTW: Does this simplified approach really work for Python on Mac OS X:

On 2008-10-07 00:27, Trent Mick wrote:
>> The block below does compile-time checking for endianness on platforms
>> that use GCC and therefore allows compiling fat binaries on OSX by using
>> '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
>> such that the configure-result is used on systems that don't use GCC.

For most other tools that require configure tests regarding endianness on Mac OS X, the process of building a universal binary goes something like this:

http://developer.apple.com/opensource/buildingopensourceuniversal.html

i.e. you run the whole process twice and then combine the results using lipo.
msg74404 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-06 22:49
> Does this also affect sys.byteorder and the struct module ?

Doesn't seem to affect sys.byteorder:

$ /usr/bin/python -c "import sys; print sys.byteorder"
big
$ python2.6 -c "import sys; print sys.byteorder"
big

> I think those would be more important to get right than the UTF-16
> codec, since this only uses the native byte ordering for increased
> performance and compatibility with other OS tools. Since UTF-16 is not
> wide-spread on Mac OS X, it's not so much an issue... it would be on
> Windows.

It is an issue for Python extensions that use that API. For example, it is the cause of recent Komodo builds not starting on Mac OS X/PowerPC (http://bugs.activestate.com/show_bug.cgi?id=79366) because the PyXPCOM extension and embedded Python 2.6 build were getting UTF-16 data mixed up when talking with Mozilla APIs.
msg74406 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-06 22:52
> BTW: Does this simplified approach really work for Python on Mac OS X

It works for Python 2.5:

http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299

Search for "BIGENDIAN".
msg74407 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-10-06 22:59
On 2008-10-07 00:52, Trent Mick wrote:
> Trent Mick <trentm@gmail.com> added the comment:
>
>> BTW: Does this simplified approach really work for Python on Mac OS X
>
> It works for Python 2.5:
>
> http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299
>
> search for "BIGENDIAN".

Thanks... I didn't see that the setting enables a compile-time check.
msg74424 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2008-10-07 06:13
The issue was introduced while moving universal-binary specific trickery from pyconfig.h.in to a separate header file. Obviously I must have been drunk at the time, because I didn't move the WORDS_BIGENDIAN bits correctly.

The attached patch in "pymacconfig.h.patch" adds detection of WORDS_BIGENDIAN to pymacconfig.h, the header that also holds the other pyconfig.h overrides for universal builds.

Background: this work was done while adding support for 4-way universal builds, that is x86, x86_64, ppc and ppc64. This required many more updates to pyconfig.h, most of which couldn't be done in a clean platform-independent way. That's why I (tried to) move the setting of pyconfig.h values that are affected by the current architecture to Include/pymacconfig.h.

NOTE: I haven't tested my patch yet, I'll do a full test round later today.
msg74425 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-07 07:06
> Added file: http://bugs.python.org/file11723/pymacconfig.h.patch I'll test that on my end tomorrow -- though it looks like it will work fine. Thanks.
msg74442 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2008-10-07 12:31
Annoyingly enough my patch isn't good enough: it turns out that ctypes has introduced a SIZEOF__BOOL definition in configure.in, and that needs special casing as well.

pymacconfig.h.patch2 fixes that issue as well. Do you have access to a PPC G5 system? I've determined the correct value of SIZEOF__BOOL for that platform by reading the assembly code for a small test program and hence am not 100% sure that sizeof(_Bool) actually is 1 on that architecture.

One other annoying issue cropped up: regrtest.py consistently hangs in test_signal (with 100% CPU usage) when I run it in Rosetta (the PPC emulator). I'll test this on an actual PPC machine as well; this might well be an issue with the PPC emulator.
msg74448 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-10-07 13:37
I agree with Trent that this is a bug, and I agree with the second patch (pymacconfig.h.patch2).

Marc-Andre, sys.byteorder is not affected because it is detected at run-time, not at compile-time. Likewise, in the struct module, several code paths rely on dynamic determination of the endianness, such as _PyLong_FromByteArray, the float packing, and the whichtable function.
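For contrast with the compile-time macro, a run-time probe of the kind described here can be sketched as follows. This is an illustration only, not the code from sysmodule.c or the struct module; `sketch_runtime_byteorder` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Report the byte order of the machine the code is actually running on. */
static const char *sketch_runtime_byteorder(void)
{
    const unsigned long number = 1;
    unsigned char first;

    assert(sizeof number > 1);      /* the probe needs a multi-byte type */
    memcpy(&first, &number, 1);     /* lowest-addressed byte of the value */
    /* On a big-endian machine the least significant byte (the 1) is stored
       last, so the first byte is 0. */
    return first == 0 ? "big" : "little";
}
```

Because the answer is computed when the process runs, each slice of a universal binary reports its own byte order regardless of what configure recorded on the build host.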
msg74459 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-10-07 15:23
On 2008-10-07 14:33, Ronald Oussoren wrote:
> Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
>
> Annoyingly enough my patch isn't good enough, it turns out that ctypes
> has introduced a SIZEOF__BOOL definition in configure.in and that needs
> special casing as well.
>
> pymacconfig.h.patch2 fixes that issue as well. Do you have access to a
> PPC G5 system? I've determined the correct value of SIZEOF__BOOL for
> that platform by reading the assembly code for a small test program and
> hence am not 100% sure that sizeof(_Bool) actually is 1 on that
> architecture.

Using this helper:

#include <stdio.h>
main() {
    printf("sizeof(_Bool)=%i bytes\n", sizeof(_Bool));
}

I get:

sizeof(_Bool)=4 bytes

on a G4 PPC. Seems strange to me, but reasonable since it is defined like this in stdbool.h:

#if __STDC_VERSION__ < 199901L && __GNUC__ < 3
typedef int _Bool;
#endif
msg74463 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-07 16:29
> I get:
>
> sizeof(_Bool)=4 bytes
>
> on a G4 PPC.

Same thing on a G5 PPC:

$ cat main.c
#include <stdio.h>

int main(void) {
    printf("sizeof(_Bool) is %d\n", sizeof(_Bool));
}
$ gcc main.c
$ ./a.out
sizeof(_Bool) is 4
msg74474 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2008-10-07 19:54
On 7 Oct, 2008, at 18:29, Trent Mick wrote:
>
> Trent Mick <trentm@gmail.com> added the comment:
>
>> I get:
>>
>> sizeof(_Bool)=4 bytes
>>
>> on a G4 PPC.
>
> Same thing on a G5 PPC:
>
> $ cat main.c
> #include <stdio.h>
>
> int main(void) {
>     printf("sizeof(_Bool) is %d\n", sizeof(_Bool));
> }
> $ gcc main.c

What if you compile using 'gcc -arch ppc64 main.c'?

Ronald
msg74494 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-10-07 23:06
> What if you compile using 'gcc -arch ppc64 main.c'?

$ gcc -arch ppc64 main.c
$ ./a.out
sizeof(_Bool) is 1

As you figured out.
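The measurements above (4 for a default 32-bit PowerPC build, 1 for ppc64) suggest a per-architecture override rather than a single configure-time value. The following is only a sketch of that idea under stated assumptions, not the contents of pymacconfig.h.patch2: `SKETCH_SIZEOF_BOOL` and both helper functions are hypothetical, and the value 1 for every other target is an assumption that holds on common ABIs but is not guaranteed by the C standard. It also prints the size with `%zu`, the conversion that matches `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-architecture override. __ppc__ is the macro Apple's
   compilers define for 32-bit PowerPC slices. */
#if defined(__APPLE__) && defined(__ppc__)
#  define SKETCH_SIZEOF_BOOL 4   /* value measured in this thread */
#else
#  define SKETCH_SIZEOF_BOOL 1   /* assumption for all other targets */
#endif

/* Returns 1 when the hard-coded value agrees with the compiler. */
static int sketch_sizeof_bool_matches(void)
{
    return sizeof(_Bool) == (size_t)SKETCH_SIZEOF_BOOL;
}

/* Formats the real size into buf; returns snprintf's result. */
static int sketch_format_bool_size(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "sizeof(_Bool) is %zu", sizeof(_Bool));
}
```

A check like `sketch_sizeof_bool_matches()` run once per architecture slice would catch a wrong hard-coded value without reading assembly output.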
msg78412 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-12-28 15:38
Applied the patch in r67982.
History
Date User Action Args
2022-04-11 14:56:40 admin set github: 48310
2008-12-28 15:38:00 benjamin.peterson set status: open -> closed, nosy: + benjamin.peterson, resolution: fixed, messages: +
2008-10-07 23:06:10 trentm set messages: +
2008-10-07 19:54:17 ronaldoussoren set messages: +
2008-10-07 16:29:01 trentm set messages: + title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC
2008-10-07 15:23:49 lemburg set messages: + title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC
2008-10-07 13:37:39 loewis set messages: +
2008-10-07 12:31:54 ronaldoussoren set files: + pymacconfig.h.patch2, messages: +
2008-10-07 07:06:41 trentm set messages: + title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC
2008-10-07 06:13:04 ronaldoussoren set files: + pymacconfig.h.patch, messages: +
2008-10-06 22:59:05 lemburg set messages: + title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC
2008-10-06 22:53:22 trentm set keywords: + patch, files: + issue4060_macosx_endian.patch
2008-10-06 22:52:49 trentm set messages: +
2008-10-06 22:49:39 trentm set messages: +
2008-10-06 22:47:14 lemburg set messages: + title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC
2008-10-06 22:40:34 lemburg set nosy: + lemburgmessages: +
2008-10-06 22:31:28 trentm set messages: +
2008-10-06 22:27:09 trentm create