[Python-Dev] DRAFT: python-dev summary for 2005-11-16 to 2005-11-31

Steven Bethard steven.bethard at gmail.com
Sun Dec 18 00:13:05 CET 2005


Here's the summary for the second half of November -- sorry for the bit of a delay. As always, let me or Tony know if you have any corrections!

=====================
Summary Announcements
=====================


Reminder: Python is now on Subversion!

Don't forget that the Python source code is now hosted on svn.python.org as a Subversion (rather than CVS) repository.

Note that because of the way the Subversion conversion was done, by-date revision specifications for dates prior to the switchover won't work. To work around this, you can use svn diff (find the changes since some date), svn up (check out the revision at some date), and svn annotate (aka svn blame).

Removing the CVS repository from SourceForge isn't possible without hacks (as a result of their "open source never goes away" policy). However, it's no longer available from the project page, and the repository is now filled with files pointing people to the new repository.

Contributing threads:

[TAM]

=========
Summaries
=========


Memory management in the AST

Thomas Lee's attempt to implement `PEP 341`_ brought up some issues about working with the new AST code. Because the AST code used its own custom objects instead of PyObjects, it also introduced its own set of allocation/deallocation functions instead of the existing Py_INCREF and Py_DECREF. There was some discussion about how best to simplify the scheme, with the two main suggestions being:

(1) Convert all AST objects into PyObjects so Py_INCREF and Py_DECREF work
(2) Create an arena API, where objects are added to the arena and then can be freed in one shot when the arena is freed

Neal Norwitz presented an example from the current AST code using the various asdl_*_free functions, and he, Greg Ewing and Martin v. Löwis compared how the code would look with the various API suggestions. While using ref-counting had the benefit of being consistent with the rest of Python, there were still some who felt that the arena API would simplify things enough to make the extra learning curve worthwhile. It seemed likely that branches or patches for the various APIs would appear shortly.

While the C API is still undergoing these changes, and thus the Python API is still a ways off, a few implementations for the Python API were suggested. If the AST code ends up using PyObjects, these could be passed directly to Python code, though they would probably have to be immutable. Brett Cannon suggested that another route would be a simple PyString marshalling like the current parser module, so that Python code and C code would never share the same objects.

.. _PEP 341: http://www.python.org/peps/pep-0341.html

Contributing threads:

[SJB]


Profilers in the stdlib

Armin Rigo summarised the current Python profiler situation, which includes profile.Profile (ages-old, slow, pure Python profiler with limited support for profiling C calls), hotshot (Python 2.2+, faster than profile.py, but very slow to convert the log file to the pstats.Stats format, possibly inaccurate, doesn't know about C calls), and lsprof_ (Brett Rosen, Ted Czotter, Michael Hudson, Armin Rigo; doesn't support C calls, incompatible interface with profile.py/hotshot, can record detailed stats about children). He suggested that lsprof be added to the standard library, keeping profile.py as a pure Python implementation and replacing hotshot with lsprof.

There was concern about maintenance of the library; however, since Armin and Michael are core developers, this seems covered. Martin suggested that lsprof be distributed separately for some time, and then included when it is more mature. Many people were concerned about having so many profilers included (with the preference for a single profiler that would suit beginners, since advanced users can easily install third-party modules, which could be referenced in the documentation).

Tim Peters explained that the aim of hotshot wasn't to reduce total time overhead, but to be less disruptive (than profile.py) to the code being profiled, while that code is running, via tiny little C functions that avoid memory allocation/deallocation. Hotshot can do much more than the minimalistic documentation says (e.g. it could be used as the basis of a tracing tool to debug software, or to measure test coverage). These capabilities are not discussed in the documentation, which makes the user experience mostly negative, but they are described in Tim's e-mails.

Discussion centered around whether lsprof should be added to the standard distribution, and whether hotshot and/or profile.py should be removed. Armin indicated that he favours removing hotshot, adding lsprof, which would be added as "cProfile" (cf. cPickle/pickle, cStringIO/StringIO), and possibly rewriting profile.py as a pure Python version of lsprof.
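For readers who want to see what a "cProfile"-style interface looks like in practice, here is a minimal sketch. It assumes a present-day Python 3 standard library, where a module named cProfile does exist; it is not code from the thread, and the busy_work function is a hypothetical workload invented for the illustration.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload, used only to give the profiler something to measure."""
    return sum(i * i for i in range(n))


# Collect profiling data around a single call.
profiler = cProfile.Profile()
profiler.enable()
result = busy_work(10_000)
profiler.disable()

# pstats can read the collected data directly from the profiler object,
# so there is no separate log-file conversion step.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
report_text = report.getvalue()
```

The report is written to an in-memory stream here only so that it can be inspected programmatically; printing to standard output works the same way if the stream argument is omitted.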

Floris Bruynooghe (for Google's Summer of Code) wrote a `replacement for profile.py`_ that uses hotshot directly. This replacement didn't fix the problems with hotshot, but did result in pstats loading hotshot data 30% faster, and would mean that profile.py could be removed.

There was a little debate about whether any profiler should even be included in the standard library, but there were several people who opined that it was an important 'battery'. A few people also liked the idea of adding a statistical profiler to the standard library at some point (e.g. http://wingolog.org/archives/2005/10/28/profiling).

Aahz suggested that Armin write a PEP for this, which seems the likely way that this will progress.

Contributing thread:

.. _lsprof: http://codespeak.net/svn/user/arigo/hack/misc/lsprof
.. _replacement for profile.py: http://savannah.nongnu.org/projects/pyprof/

[TAM]


The tp_free slot and multiple inheritance in C

Travis Oliphant started a thread discussing a memory problem in some new scipy core code where a huge number of objects were not being freed. Making the allocation code use malloc and free instead of PyObject_New and PyObject_Del made these problems go away. After an intense discussion, Armin Rigo figured out that the problem arose in a type that inherited both from int and from another scipy type. The tp_free slot of this type was being inherited from its second parent (int) instead of its first parent (the scipy type), and thus "deallocated" objects were put on the CPython free list of integers instead of being freed. It was unclear as to whether the code in typeobject.c which made this decision could be "fixed", so Armin suggested forcing the appropriate tp_alloc/tp_free functions in the static types instead.

Contributing threads:

[SJB]


Patches for porting Python to a new OS

Ben Decker asked for some feedback on patches porting Python to DOS/DJGPP. This led to a discussion of what the requirements for accepting a porting patch were. Guido made it clear that he wanted porting patches included in Python whenever reasonable so that the various obscure ports would be able to upgrade to new versions of Python when they were released. The basic conditions were that the submission came from a reputable platform maintainer, and that if the patches caused problems in future Python versions, the maintainer would either need to update the patch appropriately, or have it removed from Python.

Contributing thread:

[SJB]


Making StringIO behave more like a file

Walter Dörwald identified a number of situations where StringIO (but not cStringIO) does not behave like a normal file:

These were determined to be bugs in StringIO and will likely be fixed in an upcoming Python release.

Contributing threads:

[SJB]


User-defined data for logging calls

Vinay Sajip explained that on numerous occasions, requests have been made for the ability to easily add user-defined data to logging events. For example, a multi-threaded server application may want to output information specific to a particular server thread (e.g. the identity of the client, specific protocol options for the client connection).

While this is currently possible, you have to subclass the Logger class and override its makeRecord method to put custom attributes in the LogRecord; the approach is usable but requires more work than necessary.

Vinay proposed a simpler way of achieving the same result, which requires use of an additional optional keyword argument ("extra") in logging calls. The "extra" argument will be passed to Logger.makeRecord, which extends the LogRecord's dict with this argument; however, if any of the keys are already present (values calculated by the logging package), then a KeyError will be raised.
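The proposal can be sketched as follows, assuming the "extra" keyword behaves as it does in the logging module of current Python 3 releases. The logger name, the format string, the "clientip" and "user" keys, and the ListHandler helper are all hypothetical names chosen for the illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: keep formatted records in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


handler = ListHandler()
# The format string refers to the user-defined attributes by name.
handler.setFormatter(logging.Formatter("%(clientip)s %(user)s %(message)s"))

logger = logging.getLogger("tcpserver.example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Per-call user data travels in the "extra" dictionary.
logger.info("Protocol problem: %s", "connection reset",
            extra={"clientip": "192.0.2.1", "user": "fbloggs"})

# A key that clashes with an attribute the logging package sets itself
# ("name" here) is rejected with a KeyError.
try:
    logger.info("never emitted", extra={"name": "clash"})
    clash_error = None
except KeyError as exc:
    clash_error = exc
```

Compared with subclassing Logger and overriding makeRecord, the caller only has to supply a dictionary at the call site.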

Contributing thread:

[TAM]


Updating urlparse to support RFC 3986

Paul Jimenez complained that urlparse uses a table of url schemes to determine whether a protocol (e.g. http or ftp) supports specifying a username and password in the url (e.g. https://user:pass@host:port). He suggested that all protocols should be capable of using this format.

Guido pointed out that the main purpose of urlparse is to be RFC-compliant. Paul explained that the current code is valid according to `RFC 1808`_ (1995-1998), but that this was superseded by `RFC 2396`_ (1998-2004) and `RFC 3986`_ (2005-). Guido was convinced, and asked for a new API (for backwards compatibility) and a patch to be submitted via SourceForge.
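As an illustration of the scheme-independent behaviour Paul asked for, the sketch below uses urllib.parse from current Python 3 releases (the descendant of the urlparse module). It is not the API discussed in the thread, and "exampleproto" is a made-up scheme chosen because no scheme table could know about it.

```python
from urllib.parse import urlsplit

# The user name, password, host and port are split out of the authority
# part even though the scheme is unknown to the library.
parts = urlsplit("exampleproto://user:secret@host.example:8042/some/path?q=1")

credentials = (parts.username, parts.password)
endpoint = (parts.hostname, parts.port)
```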

Contributing thread:

.. _RFC 1808: http://www.ietf.org/rfc/rfc1808.txt
.. _RFC 2396: http://www.ietf.org/rfc/rfc2396.txt
.. _RFC 3986: http://www.ietf.org/rfc/rfc3986.txt

[TAM]


Magic methods on the instance and on the type

Nick Coghlan pointed out that the current semantics of `PEP 343`_ look up methods on the instance instead of on the type, and noted that slots are generally invoked as type(obj).__slot__(obj) instead. Guido explained that in general, using __xxxx__ methods in an undocumented way (e.g. relying on them being looked up in the instance) was not supported, and code relying on that could be expected to break if the __xxxx__ method was ever upgraded to a slot. So, it was okay that the `PEP 343`_ support looked up methods on the instance, but anyone depending on this behavior was asking for trouble.
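The difference between instance lookup and type lookup can be seen with a method that is already a slot. The sketch below uses __len__ and assumes current Python 3 behaviour; the Box class is a hypothetical example, and nothing here is taken from the PEP 343 implementation.

```python
class Box:
    """Hypothetical class with no special methods of its own."""


box = Box()

# Attach __len__ to the instance only: the implicit lookup done by len()
# goes through the type, so the instance attribute is ignored.
box.__len__ = lambda: 3
try:
    len(box)
    instance_lookup_worked = True
except TypeError:
    instance_lookup_worked = False

# Attach __len__ to the type: len(box) now behaves like type(box).__len__(box).
Box.__len__ = lambda self: 3
type_lookup_result = len(box)
```

Code that relied on the instance attribute being honoured would stop working as soon as the method became a slot, which is the breakage Guido warned about.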

.. _PEP 343: http://www.python.org/peps/pep-0343.html

Contributing thread:

[SJB]


Releasing the GIL in the re module

Duncan Grisby has a multi-threaded program that does a lot of complex regular expression searching, and has trouble with threads blocking because the GIL is not released while the re engine is running. He wanted to know whether there was any fundamental reason why the re engine could not release the interpreter lock.

Fredrik Lundh pointed out that SRE can operate on anything that implements the buffer interface. This means that the objects that the engine is accessing might be mutable, which could cause problems.

Several people suggested that a better solution would be using more efficient regular expressions; Duncan explained that the expressions are user-entered, which makes this difficult.

Eric Noyau put together a `patch to release the GIL`_ when the engine performs a low level search, if (and only if) the object searched is a [unicode] string.

.. _patch to release the GIL: http://python.org/sf/1366311

Contributing threads:

[TAM]

===============
Skipped Threads
===============


