2011/11/25 Brett Cannon <brett@python.org>

On Thu, Nov 24, 2011 at 07:46, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Thu, Nov 24, 2011 at 10:20 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
> The problem is not with maintaining the modified directory. The
> problem was always things like changing interface between the C
> version and the Python version or introduction of new stuff that does
> not run on pypy because it relies on refcounting. I don't see how
> having a subrepo helps here.

Indeed, the main thing that can help on this front is to get more
modules to the same state as heapq, io, datetime (and perhaps a few
others that have slipped my mind) where the CPython repo actually
contains both C and Python implementations and the test suite
exercises both to make sure their interfaces remain suitably
consistent (even though, during normal operation, CPython users will
only ever hit the C accelerated version).

This not only helps other implementations (by keeping a Python version
of the module continuously up to date with any semantic changes), but
can help people that are porting CPython to new platforms: the C
extension modules are far more likely to break in that situation than
the pure Python equivalents, and a relatively slow fallback is often
going to be better than no fallback at all. (Note that ctypes based
pure Python modules *aren't* particularly useful for this purpose,
though - due to the libffi dependency, ctypes is one of the extension
modules most likely to break when porting).

And the other reason I plan to see this through before I die is to help distribute the maintenance burden. Why should multiple VMs fix bad assumptions made by CPython in their own siloed repos and then we hope the change gets pushed upstream to CPython when it could be fixed once in a single repo that everyone works off of?

PyPy copied the CPython stdlib in a directory named "2.7", which is never modified; instead, adaptations are made by copying the file into "modified-2.7" and fixing it there. Both directories appear in sys.path.

This was done for this very reason: so that it's easy to identify the differences and suggest changes to push upstream.
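
To make the two-directory layout concrete, here is a small sketch of how one directory can shadow another on sys.path. It assumes "modified-2.7" comes before "2.7" in the search order (the message does not state the order), and the module names `demo_shadowed_mod` and `demo_untouched_mod` are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_shadowing():
    """Return the origin of a shadowed module and of an untouched one."""
    names = ("demo_shadowed_mod", "demo_untouched_mod")  # hypothetical modules
    with tempfile.TemporaryDirectory() as root:
        pristine = Path(root) / "2.7"
        modified = Path(root) / "modified-2.7"
        pristine.mkdir()
        modified.mkdir()
        # The pristine copy holds every module, unmodified.
        for name in names:
            (pristine / f"{name}.py").write_text("ORIGIN = 'pristine'\n")
        # Only the adapted module is copied into the modified directory.
        (modified / "demo_shadowed_mod.py").write_text("ORIGIN = 'modified'\n")

        # Assumption: the modified directory is searched first.
        sys.path[:0] = [str(modified), str(pristine)]
        importlib.invalidate_caches()
        try:
            shadowed = importlib.import_module("demo_shadowed_mod")
            untouched = importlib.import_module("demo_untouched_mod")
            return shadowed.ORIGIN, untouched.ORIGIN
        finally:
            del sys.path[:2]
            for name in names:
                sys.modules.pop(name, None)


origins = demo_shadowing()
assert origins == ("modified", "pristine")
```

With this layout, comparing the two directories file by file shows exactly which stdlib modules needed adapting.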

But this process was not very successful for several reasons:

- The definition of "bad assumptions" used to be very strict. It's much better nowadays, thanks for example to the ResourceWarning in 3.x (most changes in modified-2.7 are related to the garbage collector) and to wider acceptance by the core developers of the "@impl_detail" decorators in tests.

- 2.7 was already in maintenance mode, and such changes were not considered bug fixes, so modified-2.7 never shrinks. It was a bit hard to find the motivation to fix only the 3.2 version of the stdlib, which you cannot even test with PyPy!

- Some modules in the stdlib rely on specific behaviors of the VM or extension modules that are not always easy to implement correctly in PyPy. The ctypes module is the most obvious example to me, but also the pickle/copy modules which were modified because of subtle differences around built-in methods (or was it the __builtins__ module?)

And oh, I almost forgot distutils, which needs to parse some Makefile which of course does not exist in PyPy.

- Differences between C extensions and pure Python modules are sometimes considered "undefined behaviour" and are rejected. See issue13274; this one has a happy ending, but I remember that the _pyio.py module chose not to fix some obscure reentrancy issues (which I completely agree with).
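
As an illustration of the garbage-collector point in the first item above, here is a sketch of a test that relies on reference counting next to a portable version of the same check. The skip is hand-rolled with `platform.python_implementation()` as a stand-in for the "@impl_detail" idea; it is not the actual decorator from CPython's test suite, and the names `Resource`, `LifetimeTests` and `run_lifetime_tests` are invented for the example.

```python
import gc
import platform
import unittest
import weakref

# Stand-in guard for illustration; not the real test-suite decorator.
IS_CPYTHON = platform.python_implementation() == "CPython"


class Resource:
    """Placeholder object whose lifetime the tests observe."""


class LifetimeTests(unittest.TestCase):
    def test_portable(self):
        # Portable: ask the collector to run before checking.
        obj = Resource()
        ref = weakref.ref(obj)
        del obj
        gc.collect()
        self.assertIsNone(ref())

    @unittest.skipUnless(IS_CPYTHON, "relies on reference counting")
    def test_refcounting_detail(self):
        # Implementation detail: the object dies as soon as the last
        # reference goes away, with no explicit collection.
        obj = Resource()
        ref = weakref.ref(obj)
        del obj
        self.assertIsNone(ref())


def run_lifetime_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LifetimeTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Marking the second test as an implementation detail lets another VM skip it instead of carrying a locally modified copy of the test file.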

--
Amaury Forgeot d'Arc