[Python-Dev] Module renaming and pickle mechanisms

M.-A. Lemburg mal at egenix.com
Sun May 18 14:20:29 CEST 2008


On 2008-05-17 16:59, Alexandre Vassalotti wrote:

> On Sat, May 17, 2008 at 5:05 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>
>> I'd like to bring to attention a potential problem caused by the recent module renaming approach:
>>
>> Object serialization protocols like pickle usually store the complete module path to the object's class together with the object.
>
> Thanks for bringing this up. I was aware of the problem myself, but I hadn't yet worked out a good solution to it.
>
>> It can also happen in storage setups where Python objects are stored using e.g. pickle, ZODB being a prominent example. As soon as a Python 2.6 application starts writing to such storages, Python 2.5 and lower versions will no longer be able to read back all the data.
>
> The opposite problem exists for Python 3.0, too: pickle streams written by Python 2.x applications will not be readable by Python 3.0. One solution is to use Python 2.6 to regenerate the pickle streams.
>
> Another solution would be to write a 2to3 pickle converter using the pickletools module. It is surely not the most elegant or robust solution, but it could work.

I'm not really worried about going from 2.x to 3.x. Breakage is allowed for that transition.

However, the case is different when going from 2.5 to 2.6. Breakage should be avoided there if at all possible.

>> Now, I think there's a way to solve this puzzle:
>>
>> Instead of renaming the modules (e.g. Queue -> queue), we leave the code in the existing modules and packages and instead add the new module names and package structure with pointers and redirects to the existing 2.5 modules.
>
> This would certainly work for simple modules, but what about packages? For packages, you can't use sys.modules[__name__] = Queue to preserve module identity. Therefore, pickle will use the new package name when writing its streams, and we are back to the same problem.
>
> A possible solution could be writing a compatibility layer for the Pickler class, which would map new module names to their old ones at runtime. Again, this is neither an elegant nor a robust solution, but it should work in most cases.

While it's possible to fix pickle (at least the pure-Python version), this would not help with other serialization formats that rely on the .__module__ attribute mapping to an existing module.
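The difference can be illustrated with today's Python 3, whose pickle did grow a name-mapping layer of this kind: for protocols 0 to 2 with `fix_imports=True` (the default), the pickler writes the 2.x name `Queue` for the module `queue`. A format that builds its own class reference from `__module__` gets no such help. `dotted_class_path` below is a hypothetical stand-in for such a format, not an existing library function.

```python
import pickle
import queue


def dotted_class_path(cls):
    """Hypothetical serializer helper: record a class as 'module.qualname'.

    A format like this later re-imports cls.__module__, so it records
    whatever module name the class object itself reports.
    """
    return f"{cls.__module__}.{cls.__qualname__}"


# Python 3's pickler maps new names back to 2.x names for protocols 0-2
# (fix_imports defaults to True), so the stream names the module "Queue".
legacy_stream = pickle.dumps(queue.Queue, protocol=2)
print(legacy_stream)

# A format that reads __module__ directly gets no such mapping.
print(dotted_class_path(queue.Queue))
```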

It's better to address the problem at the module level.

Perhaps I misunderstand the reasoning behind doing the renaming in the 2.x branch, but the only reason appears to be getting people used to the new names. That's a rather low-priority argument compared to the breakage the renaming will cause in the 2.x branch.

I think it's much better to have 2to3.py do the renaming and only add warnings to the renamed modules in 2.x (without actually applying any renaming).
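One way to read "only add warnings" is to leave each module under its old name and have it warn on import about the name 2to3 will switch to. This is a sketch in Python 3 syntax; the module name `OldName`, the message text, and the choice of `DeprecationWarning` are assumptions for illustration. Because nothing is renamed, classes keep their old `__module__`, which is what keeps existing pickles readable.

```python
import importlib
import pathlib
import sys
import tempfile
import warnings

# Hypothetical module that keeps its old name and only warns about the new one.
MODULE_SOURCE = '''\
import warnings

warnings.warn(
    "OldName will be renamed to new_name in 3.0; 2to3 will update imports",
    DeprecationWarning,
    stacklevel=2,
)


class Thing:
    pass
'''

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "OldName.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings(record=True) as caught:
            # DeprecationWarning is hidden by default, so make it visible here.
            warnings.simplefilter("always")
            old_module = importlib.import_module("OldName")
    finally:
        sys.path.remove(tmp)

print([str(w.message) for w in caught])
# No renaming happened, so pickles still refer to the old module name.
print(old_module.Thing.__module__)
```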

It would also be possible to seed sys.modules with module proxy objects (see e.g. mx.Misc.LazyModule from egenix-mx-base) which turn into real module objects only when the module is referenced.
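A present-day analogue of such proxies is the standard library's `importlib.util.LazyLoader`, which places a module object in `sys.modules` but defers executing its code until an attribute is first accessed. The sketch follows the lazy-import recipe from the `importlib` documentation; it is not the `mx.Misc.LazyModule` API, and `colorsys` is just an arbitrary pure-Python module chosen for the demonstration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Register `name` in sys.modules without executing its code yet."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module only arranges for the real load to
    # happen on first attribute access.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The first attribute access triggers the actual import.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```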

This would allow adding a "from __future__ import new_module_names" which then results in loading proxies for all renamed modules (without actually loading the modules until they are used under their new names).
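The proposed future import would need interpreter support, but its effect can be approximated in library code. In this sketch, `install_aliases`, `_AliasModule`, and the alias name `new_fractions` are hypothetical: each placeholder sits in `sys.modules` under the new name, imports the real module under its old name on first use, and then replaces itself with that module. Because the classes still live in the old module, pickle keeps recording the old name.

```python
import importlib
import pickle
import sys
import types


class _AliasModule(types.ModuleType):
    """Hypothetical placeholder registered under a module's new name."""

    def __init__(self, new_name, old_name):
        super().__init__(new_name)
        self._old_name = old_name

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. on first real use.
        real = importlib.import_module(self._old_name)
        # From now on the new name refers to the real (old-named) module.
        sys.modules[self.__name__] = real
        return getattr(real, attr)


def install_aliases(mapping):
    """Seed sys.modules with lazy aliases given as {new_name: old_name}."""
    for new_name, old_name in mapping.items():
        sys.modules.setdefault(new_name, _AliasModule(new_name, old_name))


install_aliases({"new_fractions": "fractions"})

from new_fractions import Fraction  # resolved through the placeholder

import fractions  # the same module under its old name

print(Fraction is fractions.Fraction)
print(sys.modules["new_fractions"] is fractions)
print(b"fractions" in pickle.dumps(Fraction(1, 3)))
```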

-- Marc-Andre Lemburg eGenix.com


