[Python-Dev] Py_ssize_t

Guido van Rossum guido at python.org
Tue Feb 20 16:57:48 CET 2007


On 2/20/07, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:

After thinking more about Py_ssize_t, I'm surprised that we're not hearing about 64-bit users having a couple of major problems.

If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off.

Not until the hash table has 4 billion entries. I believe that would be 96 GB just for the hash table; plus probably at least that for that many unique key strings. Not to mention the values (but those needn't be unique). I think the benefits of a 64-bit architecture start to show above 2 or 3 GB of RAM, so there's quite a bit of expansion space for 64-bit users before they run into this theoretical problem.
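The 96 GB figure can be checked with a little arithmetic. The sketch below assumes (the message does not say so) that each hash table slot holds three 8-byte fields on a 64-bit build, namely a cached hash, a key pointer, and a value pointer, and that the hash range is 2**32 values. The names are made up for illustration.

```python
# Back-of-the-envelope check of the hash table size estimate.
# Assumptions, not stated in the message:
#   - a slot stores three 8-byte fields (hash, key pointer, value pointer)
#   - the hash range covers 2**32 distinct values ("4 billion")
FIELD_SIZE = 8           # bytes per field on a 64-bit build (assumed)
FIELDS_PER_SLOT = 3      # hash, key, value (assumed layout)
HASH_RANGE = 2 ** 32     # number of distinct hash values (assumed)


def table_bytes(slots, slot_size=FIELD_SIZE * FIELDS_PER_SLOT):
    """Return the memory needed for `slots` table slots, in bytes."""
    return slots * slot_size


size = table_bytes(HASH_RANGE)
print(size // 2 ** 30, "GiB for the table alone")
```

Under those assumptions the table alone comes to 96 GiB, before counting any key or value objects.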

The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.
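Why such a mismatch would stay hidden until the data gets huge can be sketched with ctypes. This assumes, purely for illustration, that the non-updated extension stores sizes in a 32-bit signed integer; the message does not state the actual widths, and the helper name is hypothetical.

```python
import ctypes


def through_int32(n):
    """Return n as it reads back after being stored in a 32-bit signed slot.

    ctypes integer types do no overflow checking, so out-of-range
    values silently wrap, much as a narrowing conversion in C would.
    """
    return ctypes.c_int32(n).value


# Width of the platform's ssize_t versus the assumed 32-bit slot.
print(ctypes.sizeof(ctypes.c_ssize_t), "bytes vs",
      ctypes.sizeof(ctypes.c_int32), "bytes")

small_length = 1000          # fits comfortably: survives unchanged
huge_length = 2 ** 31 + 5    # just past the 32-bit signed maximum

print(through_int32(small_length))
print(through_int32(huge_length))
```

Ordinary sizes pass through intact, so nothing goes wrong until a length exceeds 2**31 - 1, at which point it wraps to a negative number.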

In the absence of a bug report, it's hard to know whether there is a real problem. Have all major third-party extensions adopted Py_ssize_t or is some divine force helping unconverted extensions work with converted Python code? Maybe the datasets just haven't gotten big enough yet.

My suspicion is that building Python for a 64-bit address space is still a somewhat academic exercise. I know we don't do this at Google (we switch to other languages long before the datasets become so large we'd need a 64-bit address space for Python). What's your experience at EWT?

-- --Guido van Rossum (home page: http://www.python.org/~guido/)


