msg74398 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:27 |
Revision 63955 removed a block from configure.in (and effectively from pyconfig.h.in) that deals with endianness. Its removal results in an incorrect setting for "WORDS_BIGENDIAN" in Universal builds on Mac OS X. The removed part was this:

> AH_VERBATIM([WORDS_BIGENDIAN],
> [
> /* Define to 1 if your processor stores words with the most significant byte
>    first (like Motorola and SPARC, unlike Intel and VAX).
>
>    The block below does compile-time checking for endianness on platforms
>    that use GCC and therefore allows compiling fat binaries on OSX by using
>    '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
>    such that the configure-result is used on systems that don't use GCC.
> */
> #ifdef __BIG_ENDIAN__
> #define WORDS_BIGENDIAN 1
> #else
> #ifndef __LITTLE_ENDIAN__
> #undef WORDS_BIGENDIAN
> #endif
> #endif])

This used to allow "WORDS_BIGENDIAN" to be correct for all parts of a universal Python build done via `gcc -arch i386 -arch ppc ...`. This was originally added for issue 1471883 (see for a discussion of this particular bit).

The result of this bug is that Python extensions using either of the following to get native byte ordering for UTF-16 decoding:

    PyUnicode_DecodeUTF16(..., byteorder=0);
    PyUnicode_DecodeUTF16Stateful(..., byteorder=0, ...);

on Mac OS X/PowerPC with a universal build built on Intel hardware (most such builds) will get the wrong byte ordering.

The fix is to restore that section to configure.in and re-run autoconf and autoheader.

Ronald, was there a particular reason that this block was removed from configure.in (and pyconfig.h.in)?

I'd like to hear comments from either Ronald or Martin, and then I can commit the fix. |
|
|
msg74399 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:31 |
This also shows up in the byte ordering that Python uses to encode utf-16:

$ uname -a
Darwin sphinx 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc

$ python2.6 -c "import codecs; codecs.open('26.txt', 'w', 'utf-16').write('hi')"
$ od -cx 26.txt
0000000 377 376 h \0 i \0
        fffe 6800 6900
0000006

$ /usr/bin/python -c "import codecs; codecs.open('system.txt', 'w', 'utf-16').write('hi')"
$ od -cx system.txt
0000000 376 377 \0 h \0 i
        feff 0068 0069
0000006

The BOM here ensures, of course, that this is still valid UTF-16 content, but the difference in behaviour here between Python versions might not be intended. |
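The same comparison can be made without od by looking at the BOM directly. The following is a minimal sketch, assuming Python 3 (the session above used Python 2); `bom_order` is a hypothetical helper name introduced only for this illustration.

```python
import codecs
import sys


def bom_order(data: bytes) -> str:
    """Return 'little' or 'big' according to the UTF-16 BOM that starts data."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    raise ValueError("no UTF-16 BOM found")


# The plain 'utf-16' codec writes a BOM followed by code units in the
# byte order the interpreter believes is native.
encoded = "hi".encode("utf-16")
print("codec wrote:", bom_order(encoded), "| sys.byteorder:", sys.byteorder)
```

On a correctly configured build the two values agree. In the od output above, 26.txt starts with FF FE (little-endian) on a PowerPC machine, which is the mismatch being reported.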
|
|
msg74400 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:40 |
Does this also affect sys.byteorder and the struct module? I think those would be more important to get right than the UTF-16 codec, since the codec only uses the native byte ordering for increased performance and compatibility with other OS tools. Since UTF-16 is not widespread on Mac OS X, it's not so much an issue... it would be on Windows. |
|
|
msg74402 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:47 |
BTW: Does this simplified approach really work for Python on Mac OS X?

On 2008-10-07 00:27, Trent Mick wrote:
>> The block below does compile-time checking for endianness on platforms
>> that use GCC and therefore allows compiling fat binaries on OSX by using
>> '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
>> such that the configure-result is used on systems that don't use GCC.

For most other tools that require configure tests regarding endianness on Mac OS X, the process of building a universal binary goes something like this:

http://developer.apple.com/opensource/buildingopensourceuniversal.html

i.e. you run the whole process twice and then combine the results using lipo. |
|
|
msg74404 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:49 |
> Does this also affect sys.byteorder and the struct module ?

Doesn't seem to affect sys.byteorder:

$ /usr/bin/python -c "import sys; print sys.byteorder"
big
$ python2.6 -c "import sys; print sys.byteorder"
big

> I think those would be more important to get right than the UTF-16
> codec, since this only uses the native byte ordering for increased
> performance and compatibility with other OS tools. Since UTF-16 is not
> wide-spread on Mac OS X, it's not so much an issue...

It is an issue for Python extensions that use that API. For example, it is the cause of recent Komodo builds not starting on Mac OS X/PowerPC (http://bugs.activestate.com/show_bug.cgi?id=79366), because the PyXPCOM extension and the embedded Python 2.6 build were getting UTF-16 data mixed up when talking with Mozilla APIs.

> it would be on Windows. |
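To illustrate why a wrong byte-order assumption matters for data exchanged without a BOM, here is a minimal sketch, assuming Python 3. It is not the PyXPCOM code; it only shows what happens when little-endian UTF-16 code units are decoded as big-endian.

```python
text = "hi"

# UTF-16 code units in little-endian order, without a BOM: 68 00 69 00
raw_le = text.encode("utf-16-le")

# Decoding with the matching byte order round-trips the text.
right = raw_le.decode("utf-16-le")

# Decoding with the opposite byte order swaps the two bytes of every
# code unit, so U+0068 and U+0069 become U+6800 and U+6900.
wrong = raw_le.decode("utf-16-be")

print(ascii(right))  # 'hi'
print(ascii(wrong))  # '\u6800\u6900'
```

No error is raised in the second case; the data is silently misread, which is why the problem surfaces only in whatever consumes the strings.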
|
|
msg74406 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:52 |
> BTW: Does this simplified approach really work for Python on Mac OS X

It works for Python 2.5:

http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299

Search for "BIGENDIAN". |
|
|
msg74407 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:59 |
On 2008-10-07 00:52, Trent Mick wrote:
> Trent Mick <trentm@gmail.com> added the comment:
>
>> BTW: Does this simplified approach really work for Python on Mac OS X
>
> It works for Python 2.5:
>
> http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299
>
> search for "BIGENDIAN".

Thanks... I didn't see that the setting enables a compile-time check. |
|
|
msg74424 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 06:13 |
The issue was introduced while moving universal-binary specific trickery from pyconfig.h.in to a separate header file. Obviously I must have been drunk at the time, because I didn't move the WORDS_BIGENDIAN bits correctly.

The attached patch in "pymacconfig.h.patch" adds detection of WORDS_BIGENDIAN to pymacconfig.h, the header that also holds the other pyconfig.h overrides for universal builds.

Background: this work was done while adding support for 4-way universal builds, that is x86, x86_64, ppc and ppc64. This required many more updates to pyconfig.h, most of which couldn't be done in a clean platform-independent way. That's why I (tried to) move the setting of pyconfig.h values that are affected by the current architecture to Include/pymacconfig.h.

NOTE: I haven't tested my patch yet, I'll do a full test round later today. |
|
|
msg74425 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 07:06 |
> Added file: http://bugs.python.org/file11723/pymacconfig.h.patch I'll test that on my end tomorrow -- though it looks like it will work fine. Thanks. |
|
|
msg74442 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 12:31 |
Annoyingly enough, my patch isn't good enough: it turns out that ctypes has introduced a SIZEOF__BOOL definition in configure.in, and that needs special-casing as well.

pymacconfig.h.patch2 fixes that issue as well. Do you have access to a PPC G5 system? I've determined the correct value of SIZEOF__BOOL for that platform by reading the assembly code for a small test program and hence am not 100% sure that sizeof(_Bool) actually is 1 on that architecture.

One other annoying issue cropped up: regrtest.py consistently hangs in test_signal (with 100% CPU usage) when I run it in Rosetta (the PPC emulator). I'll test this on an actual PPC machine as well; this might well be an issue with the PPC emulator. |
|
|
msg74448 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-10-07 13:37 |
I agree with Trent that this is a bug, and I agree with the second patch (pymacconfig.h.patch2).

Marc-Andre, sys.byteorder is not affected because it detects the byte order at run-time, not at compile-time. Likewise, in the struct module, several code paths rely on dynamic determination of the endianness, such as _PyLong_FromByteArray, the float packing, and the whichtable function. |
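As a rough illustration of what run-time detection means (this is not CPython's actual implementation, which is written in C), the byte order can be determined by storing a known integer in native order and inspecting its first byte. A minimal sketch, assuming Python 3; `runtime_byteorder` is a hypothetical helper name.

```python
import struct
import sys


def runtime_byteorder() -> str:
    """Detect the byte order of the running process by inspecting memory layout."""
    # '=' selects native byte order with standard sizes; 'H' is a 2-byte
    # unsigned integer. The value 1 is stored as 01 00 on little-endian
    # machines and as 00 01 on big-endian machines.
    first_byte = struct.pack("=H", 1)[0]
    return "little" if first_byte == 1 else "big"


print(runtime_byteorder(), sys.byteorder)
```

Because the check runs on whichever architecture slice of a universal binary is actually executing, it cannot be affected by a value that configure baked in on the build machine.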
|
|
msg74459 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-07 15:23 |
On 2008-10-07 14:33, Ronald Oussoren wrote:
> Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
>
> Annoyingly enough my patch isn't good enough, it turns out that ctypes
> has introduced a SIZEOF__BOOL definition in configure.in and that needs
> special caseing as well.
>
> pymacconfig.h.patch2 fixes that issue as well. Do you have access to a
> PPC G5 system? I've determined the correct value of SIZEOF__BOOL for
> that platform by reading the assembly code for a small test program and
> hence am not 100% sure that sizeof(_Bool) actually is 1 on that
> architecture.

Using this helper:

    #include <stdio.h>
    main() {
        printf("sizeof(_Bool)=%i bytes\n", sizeof(_Bool));
    }

I get:

    sizeof(_Bool)=4 bytes

on a G4 PPC. Seems strange to me, but reasonable since it is defined like this in stdbool.h:

    #if __STDC_VERSION__ < 199901L && __GNUC__ < 3
    typedef int _Bool;
    #endif |
|
|
msg74463 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 16:29 |
> I get:
>
> sizeof(_Bool)=4 bytes
>
> on a G4 PPC.

Same thing on a G5 PPC:

$ cat main.c
#include <stdio.h>

int main(void) {
    printf("sizeof(_Bool) is %d\n", sizeof(_Bool));
}
$ gcc main.c
$ ./a.out
sizeof(_Bool) is 4 |
|
|
msg74474 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 19:54 |
On 7 Oct, 2008, at 18:29, Trent Mick wrote:

> Trent Mick <trentm@gmail.com> added the comment:
>
>> I get:
>>
>> sizeof(_Bool)=4 bytes
>>
>> on a G4 PPC.
>
> Same thing on a G5 PPC:
>
> $ cat main.c
> #include <stdio.h>
>
> int main(void) {
>     printf("sizeof(_Bool) is %d\n", sizeof(_Bool));
> }
> $ gcc main.c

What if you compile using 'gcc -arch ppc64 main.c'?

Ronald |
|
|
msg74494 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 23:06 |
> What if you compile using 'gcc -arch ppc64 main.c'?

$ gcc -arch ppc64 main.c
$ ./a.out
sizeof(_Bool) is 1

As you figured out. |
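The size that the running interpreter itself uses can also be queried from Python. This is a minimal sketch, assuming Python 3 and assuming that `ctypes.c_bool` (documented as representing C99 `_Bool`) reflects the size the interpreter was built with for the architecture it is currently running as.

```python
import ctypes

# Size in bytes of the C _Bool type as seen by ctypes in this process.
# In a universal build the answer should depend on which architecture
# slice is executing, which is why a single configure-time constant is
# not sufficient.
bool_size = ctypes.sizeof(ctypes.c_bool)
print("sizeof(_Bool) as seen by ctypes:", bool_size)
```

Comparing this value with the output of the small C programs above, compiled for the same architecture, would be one way to check that SIZEOF__BOOL was set correctly.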
|
|
msg78412 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-12-28 15:38 |
Applied the patch in r67982. |
|
|