[Python-Dev] Unclear on the way forward with unsigned integers

Tim Peters tim.one@comcast.net
Sun, 06 Oct 2002 21:43:21 -0400


[Mark Hammond]

I'm a little confused by the new world order for working with integers in extension modules.

At the end of the day, my question is this: Assume my extension module has an unsigned integer it wishes to return to Python. Further, assume that this unsigned integer is not really an integer as such, but more a set of bits, or some other value "encoded" in 32 bits, such as an enum. (To put it another way, the "signedness" of this value seems more random than chosen)

Well, it matters: a bitset is most naturally thought of as unsigned, so that shifting doesn't introduce "mystery bits". OTOH, a C enum is, by definition, a signed integer.
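The "mystery bits" point can be seen from Python itself. This is only a small sketch using arbitrary-precision integers; the names BITS, as_signed, unsigned_shift and signed_shift are made up for the illustration.

```python
# A 32-bit bitset with only the top bit switched on.
BITS = 0x80000000

# The same 32-bit pattern read as a signed two's-complement value.
as_signed = BITS - (1 << 32)

# Shifting the unsigned reading right moves the bit down cleanly.
unsigned_shift = BITS >> 31

# Shifting the signed reading right drags copies of the sign bit in
# from the left: these are the "mystery bits".
signed_shift = as_signed >> 31

print(unsigned_shift, signed_shift)
```

The unsigned shift leaves a single bit set, while the signed shift yields -1, which is all ones when viewed as a bit pattern.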

How should I create the object to return to Python?

I'd create a Python long.

For a concrete example. Let's say I want to return the value of the Win32 function GetVersion().

The documentation for this function declares it is an unsigned 32 bit value. The documentation then explains that to decode this value, specific bits in the value should be examined. It then expounds on this with C sample code that relies on this unsigned behaviour by using a simple "> 0x80000000" comparison to check the high bit!

Yup. The docs also say:

This function has been superseded by GetVersionEx, which is the preferred method for obtaining system version number information. New applications should use GetVersionEx. The GetVersionEx function was developed because many existing applications err when examining the DWORD return value of a GetVersion function call, transposing the major and minor version numbers packed into that DWORD.

IOW, the bag-of-bits mixed-with bag-of-bytes model was too confusing to work with.

I see 2 choices for returning this value:

* Use PyInt_FromLong() - this will give me a signed Python integer, but with an identical bit pattern.
* Use PyLong_FromUnsignedLong() - the result will have the correct sign, but may no longer fit in 32 bits.

Python ints don't fit in 32 bits either: they've got object headers like all objects have. The space difference here is trivial.
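To make the difference between the two choices concrete, here is a rough sketch of how the same 32 bits look under each reading, and how Python code can move between them. The helper names to_signed32 and to_unsigned32 are invented for this example and are not part of any API.

```python
import struct


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as signed (same bit pattern)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def to_unsigned32(value):
    """Recover the unsigned reading of a signed 32-bit value by masking."""
    return value & 0xFFFFFFFF


raw = 0x80000004                  # sample value with the high bit set
signed_view = to_signed32(raw)    # what a signed conversion would hand back
round_trip = to_unsigned32(signed_view)

print(signed_view, hex(round_trip))
```

Either representation carries the same information; the masking step is what a caller has to remember if the signed form is returned.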

A third choice is to pick the values apart in C, delivering a

(NT_or_later_bool, major_version_int, minor_version_int, build_int)

tuple back to the Python user. Making people pick apart the bits in Python code seems too low-level here:

dwVersion = GetVersion();

// Get major and minor version numbers of Windows
dwWindowsMajorVersion = (DWORD)(LOBYTE(LOWORD(dwVersion)));
dwWindowsMinorVersion = (DWORD)(HIBYTE(LOWORD(dwVersion)));

// Get build numbers for Windows NT or Win32s
if (dwVersion < 0x80000000)                // Windows NT
    dwBuild = (DWORD)(HIWORD(dwVersion));
else if (dwWindowsMajorVersion < 4)        // Win32s
    dwBuild = (DWORD)(HIWORD(dwVersion) & ~0x8000);
else                  // Windows 95 -- No build numbers provided
    dwBuild = 0;
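For comparison, the same decoding could be written in Python to produce the tuple suggested above. This is a sketch only: decode_version is a hypothetical helper, it takes the DWORD as a plain integer rather than calling GetVersion(), and the sample inputs are chosen for illustration.

```python
def decode_version(dw_version):
    """Split a GetVersion()-style DWORD into (is_nt, major, minor, build)."""
    dw_version &= 0xFFFFFFFF          # accept a signed reading of the bits too
    low_word = dw_version & 0xFFFF
    high_word = dw_version >> 16

    major = low_word & 0xFF           # LOBYTE(LOWORD(dwVersion))
    minor = low_word >> 8             # HIBYTE(LOWORD(dwVersion))

    is_nt = dw_version < 0x80000000   # high bit clear
    if is_nt:
        build = high_word
    elif major < 4:                   # Win32s: drop the high bit of the word
        build = high_word & 0x7FFF
    else:                             # Windows 95: no build number provided
        build = 0
    return is_nt, major, minor, build


print(decode_version(0x0A280105))
print(decode_version(0xC0000004))
```

Because the value is masked on entry, the caller gets the same tuple whether the extension handed back the signed or the unsigned reading.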

Now, I think I am trying to stay too close to the hardware for a language like Python, but something just seems wrong with promoting my nice 32 bit value to a Python long, simply for the sake of retaining the sign of a value for which the whole concept of "signed" doesn't make much sense (as it doesn't in this case, or in the case of enums etc).

Except Python doesn't have unsigned ints, so the only faithful way to return one is to make a Python long. In this specific case, though, I think it would be better to pick the bits apart for the user -- there's really no use for the raw int, signed or unsigned, except after picking it apart.

Any suggestions or general advice? While this case seems quite trivial, I am starting to face this issue more and more, especially as I am seeing these lovely "FutureWarnings" from all my lovely 32 bit hexadecimal constants <wink/frown>

Sticking "L" at the end is usually all it takes.