[Python-Dev] DRAFT: python-dev summary for 2006-05-16 to 2006-05-31

Steven Bethard steven.bethard at gmail.com
Thu Jun 15 19:28:28 CEST 2006


Ok, for the first time in a few months, you're getting this summary before the next one is due. Woo-hoo! (Yes, I know I'm not even a day ahead. Let me enjoy my temporary victory.) =)

Here's the draft summary for the second half of May. Let me know what comments/corrections you have. Thanks!

=============
Announcements
=============


QOTF: Quote of the Fortnight

Martin v. Löwis on what kind of questions are appropriate for python-dev:

    ... [python-dev] is the list where you say "I want to help", not
    so much "I need your help".

Contributing thread:


Python 2.5 schedule

Python 2.5 is moving steadily towards its next release. See `PEP 356`_ for more details and the full schedule. You may start to see a few warnings at import time if you've named non-package directories with the same names as your modules/packages. Python-dev suggests renaming these directories -- though the warnings won't give you any real trouble in Python 2.5, there's a chance that a future version of Python will drop the need for ``__init__.py``.

.. _PEP 356: http://www.python.org/dev/peps/pep-0356/

Contributing thread:


Restructured library reference

Thanks to work by A.M. Kuchling and Michael Spencer, the `development Library Reference documentation`_ is much better organized than the `old one`_. Thanks for your hard work, guys!

.. _development Library Reference documentation: http://docs.python.org/dev/lib/lib.html
.. _old one: http://docs.python.org/lib/lib.html

Contributing thread:


Need for Speed Sprint results

The results of the `Need for Speed Sprint`_ are all posted on the wiki. In particular, you should check out the `successes`_ they had in speeding up various parts of Python, including function calls, string and Unicode operations, and string<->integer conversions.

.. _Need for Speed Sprint: http://wiki.python.org/moin/NeedForSpeed/
.. _successes: http://wiki.python.org/moin/NeedForSpeed/Successes

Contributing threads:


Python old-timer memories

Guido's been collecting `memories of old-timers`_ who have been using Python for 10 years or more. Be sure to check 'em out and add your own!

.. _memories of old-timers: http://www.artima.com/weblogs/viewpost.jsp?thread=161207

Contributing thread:

=========
Summaries
=========


Struct module inconsistencies

Changes to the struct module to do proper range checking resulted in a few bugs showing up where the stdlib depended on the old, undocumented behavior. As a compromise, Bob Ippolito added code to do the proper range checking and issue DeprecationWarnings, and then made sure that all struct results were calculated with appropriate bit masking. The warnings are expected to become errors in Python 2.6 or 2.7.

Bob also updated the struct module to return ints instead of longs whenever possible, even for the format codes that had previously guaranteed longs (I, L, q and Q).
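
As a rough illustration of what range checking means in practice, here is a small sketch against current Python 3, where out-of-range values raise ``struct.error`` rather than issuing a warning. The helper name ``fits`` is made up for this example and is not part of the struct module:

```python
import struct

def fits(format_code, value):
    """Return True if value can be packed with format_code (hypothetical helper)."""
    try:
        struct.pack(format_code, value)
    except struct.error:
        return False
    return True

in_range = fits("B", 255)        # an unsigned byte holds 0..255
out_of_range = fits("B", 256)    # one past the maximum is rejected

# Round-trip a large unsigned value through the "I" format code.
unpacked = struct.unpack("<I", struct.pack("<I", 4000000000))[0]
```

Python 3 has a single integer type, so the int-versus-long distinction discussed above no longer shows up in the unpacked result.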

Contributing threads:


Using epoll for the select module

Ross Cohen implemented a `drop-in replacement for select.poll`_ using Linux's epoll (a more efficient I/O notification system than poll). The select interface is already much closer to the epoll API than the poll API, and people liked the idea of using epoll silently when available. Ross promised to look into merging his code with the current select module (though it wasn't clear whether he would do this using ctypes instead of an extension module, as some people had suggested).

.. _drop-in replacement for select.poll: http://sourceforge.net/projects/pyepoll
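
The idea of "using epoll silently when available" can be sketched with the ``select.epoll`` and ``select.poll`` objects that later Python versions ship. This is only an illustration, not Ross's code: it assumes a POSIX system, and the function name ``wait_readable`` is invented for the example:

```python
import os
import select

def wait_readable(fd, timeout_seconds=1.0):
    """Return the ready file descriptors, preferring epoll when it exists."""
    if hasattr(select, "epoll"):
        poller = select.epoll()
        read_flag = select.EPOLLIN
        timeout = timeout_seconds                 # epoll.poll() takes seconds
    else:
        poller = select.poll()
        read_flag = select.POLLIN
        timeout = int(timeout_seconds * 1000)     # poll.poll() takes milliseconds
    try:
        poller.register(fd, read_flag)
        events = poller.poll(timeout)
    finally:
        if hasattr(poller, "close"):              # only epoll objects need closing
            poller.close()
    return [ready_fd for ready_fd, _event in events]

# A pipe with one pending byte is readable straight away.
read_end, write_end = os.pipe()
os.write(write_end, b"x")
ready = wait_readable(read_end)
os.close(read_end)
os.close(write_end)
```

Note that the two objects disagree on timeout units, which is one of the details a transparent fallback has to hide.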

Contributing thread:


Negatives and sequences

Fredrik Lundh pointed out that using a negative sign and multiplying by -1 do not always produce the same behavior, e.g.::

    >>> -1 * (1, 2, 3)
    ()
    >>> -(1, 2, 3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bad operand type for unary -

Though no one seemed particularly concerned about the discrepancy, the thread did spend some time discussing the behavior of sequences multiplied by negatives. A number of folks were pushing for this to become an error until Uncle Timmy showed some use-cases like::

    # right-justify to 80 columns, padding with spaces
    s = " " * (80 - len(s)) + s

The rest of the thread turned into a (mostly humorous) competition for the best way to incomprehensibly alter sequence multiplication semantics.
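
To see why the use-case depends on negative multipliers being allowed, here is a minimal sketch. The function name ``right_justify`` and the ``width`` parameter are just for illustration:

```python
def right_justify(s, width=80):
    """Pad s on the left with spaces up to width columns."""
    # When s is already longer than width the multiplier is negative,
    # " " * negative is the empty string, and s comes back unchanged.
    return " " * (width - len(s)) + s

padded = right_justify("abc", width=5)
too_long = right_justify("abcdef", width=3)
```

If multiplying by a negative number raised an error, every caller would need an explicit length check first.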

Contributing thread:


Removing METH_OLDARGS

Georg Brandl asked about removing METH_OLDARGS, which has been deprecated since 2.2. Unfortunately, there are still a bunch of uses of it in Modules, and it's still the default if no flag is specified. Georg promised to work on removing the ones in Python core, and there was some discussion of trying to mark the API as deprecated. Issuing a DeprecationWarning seemed too heavy-handed, so Georg looked into generating C compile-time warnings by marking PyArg_Parse as Py_DEPRECATED.

Contributing thread:


Propagating exceptions in dict lookup

Armin Rigo offered up `a patch to stop dict lookup from hiding exceptions`_ in user-defined ``__eq__`` methods. The PyDict_GetItem() API gives no way of propagating such an exception, so previously the exceptions were just swallowed. Armin moved the exception-swallowing part out of lookdict() and into PyDict_GetItem() so that even though PyDict_GetItem() will still swallow the exceptions, all other ways of invoking dict lookup (e.g. ``value = d[key]`` in Python code) will now propagate the exception properly. Scott Dial brought up an odd corner case where the old behavior would cause insertion of a value into the dict because the exception was assumed to indicate a new key, but people didn't seem too worried about breaking this behavior.

.. _a patch to stop dict lookup from hiding exceptions: http://bugs.python.org/1497053
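
The Python-level effect can be sketched as follows, run against current Python 3. The class ``ExplodingKey`` is a made-up example type whose comparison always fails:

```python
class ExplodingKey:
    """Hypothetical key type whose comparison always raises."""

    def __hash__(self):
        return 1    # force every instance into the same hash bucket

    def __eq__(self, other):
        raise ValueError("comparison failed")

# Inserting into an empty dict needs no comparison, so this succeeds.
d = {ExplodingKey(): "stored"}

try:
    d[ExplodingKey()]      # same hash, different object: __eq__ is called
    propagated = False
except ValueError:
    propagated = True
```

With the exception propagated, a buggy ``__eq__`` shows up at the lookup site instead of looking like a missing key.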

Contributing thread:


String/unicode inconsistencies

After the Need for Speed Sprint unified some of the string and unicode code, some tests started failing where string and unicode objects had different behavior, e.g. 'abc'.find('', 100) used to return -1 but started returning 100. There was some discussion about what was the right behavior here and Fredrik Lundh promised to implement whatever was decided.
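
For reference, this is how the same calls behave in current Python 3, where ``str`` is the Unicode type and ``bytes`` plays the role of the old 8-bit string. It is shown only to make the edge case concrete:

```python
text = "abc"

at_end = text.find("", len(text))    # an empty match exactly at the end
past_end = text.find("", 100)        # start index beyond the end of the string
bytes_past_end = b"abc".find(b"", 100)
```

Both types agree: an empty substring is found at any valid position up to ``len(text)``, and a start index past the end gives ``-1``.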

Contributing thread:


Allowing inline "if" with for-loops

Heiko Wundram presented a brief PEP suggesting that if-statements in the first line of a for-loop could be optionally inlined, so for example instead of::

    for node in tree:
        if node.haschildren():
            <do something with node>

you could write::

    for node in tree if node.haschildren():
        <do something with node>

Most people seemed to feel that saving a colon character and a few indents was not a huge gain. Some also worried that this change would encourage code that was harder to read, particularly if the for-clause or if-clause got long. Guido rejected it, and Heiko promised to submit it as a full PEP so that the rejection would be properly recorded.
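
For comparison, existing syntax can already express the filtered loop on one line by iterating over a generator expression. The ``Node`` class below is a hypothetical stand-in for whatever ``tree`` contains; only ``haschildren()`` comes from the example above:

```python
class Node:
    """Hypothetical tree node, just enough to run the loop."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def haschildren(self):
        return bool(self.children)

tree = [Node("leaf"), Node("branch", [Node("twig")]), Node("stump")]

visited = []
# The filter lives in the generator expression instead of a nested "if".
for node in (n for n in tree if n.haschildren()):
    visited.append(node.name)
```

Whether this reads better than the nested ``if`` is a matter of taste, which is roughly where the thread ended up.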

Contributing thread:


Splitting strings with embedded quoted strings

Dave Cinege proposed augmenting str.split to allow a non-split delimiter to be specified so that splits would not happen within particular substrings, e.g.::

    >>> 'Here is "a phrase that should not get split"'.split(None,-1,'"')
    ['Here', 'is', 'a phrase that should not get split']

Most people were opposed to complicating the API of str.split, but even as a separate method, people didn't seem to think that the need was that great, particularly since the most common needs for such functionality were already covered by shlex.split() and the csv module.
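
For instance, the example string from the proposal can be handled with ``shlex.split()`` as it exists today:

```python
import shlex

phrase = 'Here is "a phrase that should not get split"'

# shlex applies shell-like quoting rules, so the quoted run stays together
# and the quote characters themselves are dropped.
words = shlex.split(phrase)
```

This gives the same list as the proposed three-argument ``str.split`` call, without changing the ``str`` API.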

Contributing thread:


Deadlocks with fork() and multithreading

Rotem Yaari ran into some deadlocks using the subprocess module in a multithreaded environment. If a thread other than the thread calling fork() is holding the import lock, then since POSIX only replicates the calling thread, the new child process ends up with an import lock that is held by a thread that no longer exists. Ronald Oussoren offered up a repeatable test case, and a number of strategies for solving the problem were discussed, including releasing the import lock during a fork and throwing away the old import lock after a fork.

Contributing threads:


string.partition

Fredrik Lundh asked about the status of string.partition, and there was a brief discussion about whether or not to return real string objects or lazy objects that would only make a copy if the original string disappeared. Guido opted for the simpler approach using real string objects, and Fredrik implemented it.
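
A quick sketch of the method as it exists on ``str`` in released Python versions, returning a plain 3-tuple of real strings:

```python
# Split once, at the first occurrence of the separator.
head, sep, tail = "key=value=more".partition("=")

# When the separator is absent, the whole string comes back first,
# followed by two empty strings.
missing = "no separator here".partition("=")
```

Because the result always has three items, it can be unpacked without checking first whether the separator was found.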

Contributing threads:


Speeding up parsing of longs

Runar Petursson asked about speeding up parsing of longs from a slice of a string, e.g. ``long(mystring[x:y])``. He initially proposed adding start= and end= keyword arguments to the long constructor, but that seemed like a slippery slope where every function that took a string would eventually need the same arguments. Tim Peters pointed out that a buffer object would solve the problem if PyLong_FromString() supported buffer's "offset & length" view of the world instead of only seeing the start index. While adding a PyLong_FromStringAndSize() would solve this particular problem, all the internal parsing routines have a similar problem -- none of them support a slice-based API.

As an alternate approach, Martin Blais was working on a "hot" buffer class, based on the design of the Java NIO ByteBuffer class, which would work without an intermediate string creation or memory copy.

Contributing thread:


Speeding up try/except

After Steve Holden noticed a ~60% slowdown between Python 2.4.3 and the Python trunk on the pybench try/except test, Sean Reifschneider and Richard Jones looked into the problem and found that the slowdown was due to creation of Exception objects. Exceptions had been converted to new-style objects by using PyType_New() as the constructor and then adding magic methods with PyMethodDef(). By changing BaseException to use a PyType_Type definition and the proper C struct to associate methods with the class, Sean and Richard Jones were able to speed up try/except to 30% faster than it was in Python 2.4.3.

Contributing thread:


Supporting zlib's inflateCopy

Guido noticed that the zlib module was failing with libz 1.1.4. Even though Python has its own copy of libz 1.2.3, it tries to use the system libraries on Unix, so when the zlib module's compress and decompress objects were updated with a copy() method (using libz's inflateCopy() function), this broke compatibility for any system that used a zlib older than 1.2.0. Chris AtLee provided a `patch conditionalizing the addition of the copy() method`_ on the version of libz available.

.. _patch conditionalizing the addition of the copy() method: http://bugs.python.org/1503046
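
Since copy() may be missing when Python is built against an old libz, calling code can check for it before relying on it. A minimal sketch, assuming the usual use of copy() for compressing several payloads that share a prefix; the function name ``compress_variants`` is made up for this example:

```python
import zlib

def compress_variants(prefix, suffixes):
    """Compress prefix + suffix for each suffix, reusing the prefix work."""
    base = zlib.compressobj()
    head = base.compress(prefix)
    results = []
    for suffix in suffixes:
        if hasattr(base, "copy"):
            # Fork the compressor state after the shared prefix.
            branch = base.copy()
            results.append(head + branch.compress(suffix) + branch.flush())
        else:
            # Fallback for builds without copy(): compress from scratch.
            results.append(zlib.compress(prefix + suffix))
    return results

streams = compress_variants(b"shared ", [b"one", b"two"])
decoded = [zlib.decompress(stream) for stream in streams]
```

Either branch yields a complete zlib stream, so callers do not need to know which one ran.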

Contributing thread:


Potential ssize_t values

Neal Norwitz looked through the Python codebase for longs that should potentially be declared as ssize_t instead. There was a brief discussion about changing int's ob_ival to ssize_t, but this would have been an enormous change this late in the release cycle and would have slowed down operations on short ints. Hash values were also discussed, but since there's no natural correlation between a hash value and the size of a collection, most people thought it was unnecessary for the moment. Martin v. Löwis suggested upping the recursion limit to ssize_t, and formalizing a 16-bit and 31-bit limit on line and column numbers, respectively.

Contributing threads:


itertools.iwindow

Torsten Marek proposed adding a windowing function to itertools like::

    >>> list(iwindow(range(0,5), 3))
    [[0, 1, 2], [1, 2, 3], [2, 3, 4]]

Raymond Hettinger pointed him to a `previous discussion`_ on comp.lang.python where he had explained that collections.deque() was usually a better solution. Nick Coghlan suggested putting the deque example in the collections module docs, but the thread trailed off after that.

.. _previous discussion: http://mail.python.org/pipermail/python-list/2005-March/270757.html
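
One way a deque-based window might look is sketched below. This is not Raymond's code from the linked discussion; it reuses the proposed name ``iwindow``, assumes ``size`` is at least 1, and relies on the ``maxlen`` argument that deque gained in later Python versions:

```python
from collections import deque
from itertools import islice

def iwindow(iterable, size):
    """Yield successive overlapping windows of the given size as lists."""
    iterator = iter(iterable)
    # Prime the window with the first `size` items; maxlen makes each
    # later append drop the oldest item automatically.
    window = deque(islice(iterator, size), maxlen=size)
    if len(window) == size:
        yield list(window)
    for item in iterator:
        window.append(item)
        yield list(window)

windows = list(iwindow(range(0, 5), 3))
too_short = list(iwindow(range(2), 3))
```

Inputs shorter than the window size produce no windows at all in this sketch; a real itertools function would have had to settle that question explicitly.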

Contributing thread:


Problems with buildbots and files left around

Neal Norwitz discovered some problems with the buildbots after finding a few tests that didn't properly clean up, leaving a few files around afterwards. Martin v. Löwis explained that forcing a build on a non-existing branch will remove the build tree (which should clean up a lot of the files) and also that "make distclean" could be added to the clean step of Makefile.pre.in and master.cfg.

Contributing thread:


PEP 3101: Advanced String Formatting

The discussion of `PEP 3101`_'s string formatting continued this fortnight. Guido generally liked the proposal, though he suggested following .NET's quoting syntax of doubling the braces, and maybe allowing all formatting errors to pass silently so that rarely raised exceptions don't hide themselves if their format string has an error. The discussion was then moved to the `python-3000 list`_.

.. _PEP 3101: http://www.python.org/dev/peps/pep-3101/
.. _python-3000 list: http://mail.python.org/mailman/listinfo/python-3000
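
To make the brace-doubling suggestion concrete, here is how it looks in ``str.format()`` as eventually released. This reflects the final implementation, not necessarily the draft under discussion at the time:

```python
template = "{{name}} is literal, {name} is substituted"
rendered = template.format(name="value")

# In the released implementation, errors do not pass silently:
# a field with no matching argument raises KeyError.
try:
    "{missing}".format(name="value")
    raised = False
except KeyError:
    raised = True
```

Doubled braces collapse to single literal braces, while single braces mark replacement fields.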

Contributing thread:


DONT_HAVE_* vs. HAVE_* macros

Neal Norwitz asked whether some recently checked-in DONT_HAVE_* macros should be replaced with HAVE_* macros instead. Martin v. Löwis indicated that these were probably written this way because Luke Dunstan (the contributor) didn't want to modify configure.in and run autoconf. Luke noted that the configure.in and autoconf effort is greater for Windows developers, but also agreed to convert things to autoconf anyway.

Contributing thread:


Changing python int to long long

Sean Reifschneider looked into converting the Python int type to long long. Though he saw speedups of around 25% for simple math, for ints that fit entirely within 32 bits the slowdown was around 11%. Sean was considering changing the int->long automatic conversion so that ints would first be up-converted to long longs and then to Python longs. Guido said that it would be okay to standardize all ints as 64 bits everywhere, but only for Python 2.6.

Contributing thread:


C-level exception invariants

Tim Peters was looking at what kind of invariants could be promised for C-level exceptions. In particular, he was hoping to promise that for PyFrameObject's f_exc_type, f_exc_value, and f_exc_traceback, either all are NULL or none are NULL. In his investigation, he found a number of errors, including that _PyExc_Init() tries to raise an AttributeError before the exception pointers have been initialized.

Contributing thread:


C-code style

Martin Blais asked about the policy for C code in Python core. `PEP 7`_ explains that for old code, the most important thing is to be consistent with the surrounding style. For new C files (and for Python 3000 code) indentation should be 4 spaces per indent, all spaces (no tabs in any file). There was a short discussion about reformatting the current C code, but that would unnecessarily break svn blame and make merging more difficult.

.. _PEP 7: http://www.python.org/dev/peps/pep-0007/

Contributing thread:

================
Deferred Threads
================

==================
Previous Summaries
==================

===============
Skipped Threads
===============


