[Python-ideas] simplification pep-like thing

tomer filiba tomerfiliba at gmail.com
Fri Jan 26 16:13:12 CET 2007


this text needs more thought and rephrasing, but i think that it covers all the details.


Abstract

The simplification API provides a mechanism to get the state of an object, and to reconstruct an object from a given state. This new API would affect the pickle, copy_reg, and copy modules, as well as the writers of serializers.

The idea is to separate state-gathering (getting the state of an object) from encoding (converting that state into a sequence of bytes).

This is somewhat similar to the ISerializable interface of .NET, which defines only GetObjectData(); the actual encoding is done by a Formatter class, which has different implementations for SOAP-formatting, binary-formatting, and other formats.

Motivation

There are many generic and niche serializers out there, including pickle, banana/jelly, Cerealizer, brine, and lots of serializers targeting XML output. Currently, each serializer has its own method of retrieving the contents, or state, of an object.

This API attempts to solve this issue by providing standard means for getting object state, and creating objects with their state restored.

Another issue is making the serialization process "proxy-friendly". Many frameworks use object proxies to indirectly refer to another object (for instance, RPC proxies, FFI, etc.). In this case, it's desirable to simplify the referenced object rather than the proxy, and this API addresses this issue too.

Simplification

Simplification is the process of converting a 'complex' object into its "atomic" components. You may think of these atomic components as the contents of the object.

This proposal does not state what "atomic" means -- this is open to the decision of the class. The only restriction imposed is that collections of any kind must be simplified as tuples.

Moreover, the simplification process may be recursive: object X may simplify itself in terms of object Y, which in turn may undergo further simplification.

Simplification Protocol

This proposal introduces two new special methods:

def __simplify__(self):
    return type, state

@classmethod
def __rebuild__(cls, state):
    return new_instance_of_cls_with_given_state

__simplify__ takes no arguments (bar 'self'), and returns a tuple of '(type, state)', representing the contents of 'self'.

__rebuild__ is expected to be a classmethod that takes the state returned by __simplify__, and returns a new instance of 'cls' with the given state.
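
As an illustration only, here is a minimal sketch of a class that follows the two-method protocol, written for a current Python 3 interpreter. The Point class and its attributes are hypothetical; they are not part of the proposal or of any existing library.

```python
class Point(object):
    """Hypothetical example class, used only to illustrate the protocol."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __simplify__(self):
        # Return (type, state); here the state is a tuple of two numbers.
        return type(self), (self.x, self.y)

    @classmethod
    def __rebuild__(cls, state):
        # Create a new instance from the state without calling __init__.
        self = cls.__new__(cls)
        self.x, self.y = state
        return self


p = Point(3, 4)
cls, state = p.__simplify__()   # state-gathering
q = cls.__rebuild__(state)      # reconstruction
```

Here q is a distinct Point with the same coordinates as p; no encoding to bytes has taken place, which is the separation the proposal is after.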

If a type does not wish to be simplified, it may raise a TypeError in its __simplify__ method; however, this is not recommended. Types that want to be treated as atomic elements, such as file, should just return themselves, and let the serializer handle them.

Default Simplification

All the built-in types would grow __simplify__ and __rebuild__ methods, which would follow these guidelines:

Primitive types (int, str, float, ...) are considered atomic.

Composite types (I think 'complex' is the only such type) are broken down into their components. For the complex type, that would be a tuple of (real, imaginary).

Container types (tuples, lists, sets, dicts) represent themselves as tuples of items. For example, dicts would be simplified according to this pseudocode:

def PyDict_Simplify(PyObject * self):
    return PyDictType, tuple(self.items())

Built-in types would be considered atomic. User-defined classes can be simplified into their metaclass, __bases__, and __dict__.

The type 'object' would simplify instances by returning their __dict__ and any __slots__ the instance may have. This is the default behavior; classes that desire a different behavior would override __simplify__ and __rebuild__.
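
The guidelines above can be approximated today with plain functions, since built-in types cannot be given new methods from Python code. This is a rough sketch for Python 3 under several assumptions: default_simplify, default_rebuild, ATOMIC_TYPES and CONTAINER_TYPES are hypothetical names, only exact built-in types are recognized (not subclasses), and __slots__ are ignored.

```python
ATOMIC_TYPES = (int, float, str, bytes, bool, type(None))
CONTAINER_TYPES = (tuple, list, set, frozenset)


def default_simplify(obj):
    """Hypothetical stand-in for the proposed built-in __simplify__ methods."""
    cls = type(obj)
    if cls in ATOMIC_TYPES:
        return cls, obj                       # atomic: state is the object
    if cls is complex:
        return cls, (obj.real, obj.imag)      # composite: (real, imaginary)
    if cls is dict:
        return cls, tuple(obj.items())        # tuple of (key, value) pairs
    if cls in CONTAINER_TYPES:
        return cls, tuple(obj)                # tuple of items
    # Default for other instances: a copy of the instance __dict__.
    # vars() raises TypeError for objects that have no __dict__.
    return cls, dict(vars(obj))


def default_rebuild(cls, state):
    """Hypothetical inverse of default_simplify."""
    if cls in ATOMIC_TYPES:
        return state
    if cls is complex:
        return complex(*state)
    if cls is dict or cls in CONTAINER_TYPES:
        return cls(state)
    self = cls.__new__(cls)                   # bypass __init__
    self.__dict__.update(state)
    return self


class Foo(object):
    def __init__(self):
        self.a = 5
        self.b = "spam"


cls, state = default_simplify(Foo())
copy_of_foo = default_rebuild(cls, state)
```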

Example of default behavior:

>>> class Foo(object):
...     def __init__(self):
...         self.a = 5
...         self.b = "spam"
...
>>> f = Foo()
>>> cls, state = f.__simplify__()
>>> cls
<class '__main__.Foo'>
>>> state
{"a" : 5, "b" : "spam"}
>>> shallow_copy = cls.__rebuild__(state)
>>> state.__simplify__()
(<type 'dict'>, (("a", 5), ("b", "spam")))

Example of customized behavior:

>>> class Bar(object):
...     def __init__(self):
...         self.a = 5
...         self.b = "spam"
...     def __simplify__(self):
...         return Bar, 17.5
...     @classmethod
...     def __rebuild__(cls, state):
...         self = cls.__new__(cls)
...         if state == 17.5:
...             self.a = 5
...             self.b = "spam"
...         return self
...
>>> b = Bar()
>>> b.__simplify__()
(<class '__main__.Bar'>, 17.5)

Code objects

I wish that modules, classes and functions were also simplifiable; however, there are some issues with that.

It would be nice if .pyc files were generated like so:

import foo
pickle.dump(foo)

It would also allow sending of code between machines, just like any other object.

Copying

Shallow copy, as well as deep copy, can be implemented using the semantics of this new API. The copy module should be rewritten accordingly:

def copy(obj):
    cls, state = obj.__simplify__()
    return cls.__rebuild__(state)

deepcopy() can be implemented similarly.
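
One possible shape for such a deepcopy(), sketched for Python 3. Because built-in types do not actually have the proposed methods, the hypothetical helpers simplify() and rebuild() emulate them for dict, tuple and list and treat every other object without __simplify__ as atomic. The sketch keeps no memo of visited objects, so it does not handle cyclic or shared references; the Node class exists only for the demonstration.

```python
def simplify(obj):
    """Hypothetical helper: use __simplify__ if present, else emulate built-ins."""
    if hasattr(obj, "__simplify__"):
        return obj.__simplify__()
    if type(obj) is dict:
        return dict, tuple(obj.items())
    if type(obj) in (tuple, list):
        return type(obj), tuple(obj)
    return type(obj), obj                 # everything else is treated as atomic


def rebuild(cls, state):
    """Hypothetical helper: inverse of simplify()."""
    if hasattr(cls, "__rebuild__"):
        return cls.__rebuild__(state)
    if cls in (dict, tuple, list):
        return cls(state)
    return state                          # atomic: the state is the object


def deepcopy(obj):
    cls, state = simplify(obj)
    if type(state) is tuple:
        # Container-like state: copy every element, then rebuild.
        return rebuild(cls, tuple(deepcopy(item) for item in state))
    if state is obj:
        return obj                        # atomic object, nothing to copy
    # Non-atomic state (for example an instance __dict__): copy it first.
    return rebuild(cls, deepcopy(state))


class Node(object):
    """Hypothetical class implementing the proposed protocol."""

    def __init__(self, value, children):
        self.value = value
        self.children = children

    def __simplify__(self):
        return type(self), self.__dict__

    @classmethod
    def __rebuild__(cls, state):
        self = cls.__new__(cls)
        self.__dict__.update(state)
        return self


tree = Node(1, [Node(2, [])])
clone = deepcopy(tree)
```

The tuple check comes before the identity check because tuple(t) returns t itself for an exact tuple, so a tuple would otherwise be mistaken for an atomic object.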

Deprecation

With the proposed API, copy_reg, __reduce__, __reduce_ex__, and possibly other modules and methods would become deprecated.

Apart from that, the pickle and copy modules need to be updated accordingly.

C API

The proposal introduces two new C-API functions:

PyObject * PyObject_Simplify(PyObject * self);
PyObject * PyObject_Rebuild(PyObject * type, PyObject * state);

This is only a suggestion, though; I'd like to hear ideas from someone with more experience in the core.

I do not see a need for a convenience routine such as simplify(obj) <--> obj.__simplify__(), since this API is not designed for everyday usage. This is the case with __reduce__ today.

Object Proxying

Because __simplify__ returns '(type, state)', it may choose to "lie" about its actual type. This means that when the state is reconstructed, another type is used. Object proxies will use this mechanism to serialize the referred object, rather than the proxy.

class Proxy(object):
    [...]
    def __simplify__(self):
        # this returns a tuple with the real type and state
        return self.value.__simplify__()
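
To make the effect concrete, here is a small runnable sketch for Python 3. The Account class and the attribute-forwarding __getattr__ are assumptions added for the demonstration; only the __simplify__ delegation comes from the proposal.

```python
class Account(object):
    """Hypothetical target class implementing the proposed protocol."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __simplify__(self):
        return type(self), dict(self.__dict__)

    @classmethod
    def __rebuild__(cls, state):
        self = cls.__new__(cls)
        self.__dict__.update(state)
        return self


class Proxy(object):
    def __init__(self, value):
        self.value = value

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself;
        # forward them to the referenced object.
        return getattr(self.value, name)

    def __simplify__(self):
        # Report the real type and state of the referenced object.
        return self.value.__simplify__()


proxy = Proxy(Account("alice", 10))
cls, state = proxy.__simplify__()
restored = cls.__rebuild__(state)
```

After the round trip, restored is a plain Account rather than a Proxy, which is the "lie" described above.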

Serialization

Serialization is the process of converting fully-simplified objects into byte sequences (strings). Fully simplified objects are created by a recursive simplifier, that simplifies the entire object graph into atomic components. Then, the serializer would convert the atomic components into strings.

Note that this proposal does not define how atomic objects are to be converted to strings, or how a 'recursive simplifier' should work. These issues are to be resolved by the implementation of the serializer.

For instance, file objects are atomic; one serializer may be able to handle them, by storing them as (filename, file-mode, file-position), while another may not be, so it would raise an exception.
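
The split between state-gathering and encoding might look roughly like this in a toy serializer for Python 3. Everything here is an assumption made for illustration: the names gather(), serialize() and Tag, the choice of JSON as the byte format, and the decision to raise TypeError for atomic objects the serializer does not understand (a file object would fall into that branch in this sketch).

```python
import json

ATOMS = (int, float, str, bool, type(None))


class Tag(object):
    """Hypothetical user-defined class implementing the proposed protocol."""

    def __init__(self, name):
        self.name = name

    def __simplify__(self):
        return type(self), {"name": self.name}


def gather(obj):
    """State-gathering step: reduce obj to nested dicts, lists and atoms."""
    kind = type(obj)
    if kind in ATOMS:
        return {"type": kind.__name__, "atom": obj}
    if kind is dict:
        items = tuple(obj.items())
    elif kind in (tuple, list):
        items = tuple(obj)
    elif hasattr(obj, "__simplify__"):
        cls, state = obj.__simplify__()
        return {"type": cls.__name__, "state": gather(state)}
    else:
        # This serializer refuses atomic objects it does not understand.
        raise TypeError("cannot serialize %s objects" % kind.__name__)
    return {"type": kind.__name__, "items": [gather(item) for item in items]}


def serialize(obj):
    """Encoding step: turn the gathered tree into bytes (JSON in this sketch)."""
    return json.dumps(gather(obj), sort_keys=True).encode("utf-8")


data = serialize([1, Tag("x")])
```

A different serializer could reuse gather() unchanged and swap only the encoding step, for instance to emit XML or a binary format.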

Recursive Simplifier

This code demonstrates the general idea of how recursive simplifiers may be implemented:

def recursive_simplifier(obj):
    cls, state = obj.__simplify__()

    # simplify all the elements inside tuples
    if type(state) is tuple:
        nested_state = []
        for item in state:
            nested_state.append(recursive_simplifier(item))
        return cls, nested_state

    # see if the object is atomic; if not, dig deeper
    if (cls, state) == state.__simplify__():
        # 'state' is an atomic object, no need to go further
        return cls, state
    else:
        # this object is not atomic, so dig deeper
        return cls, recursive_simplifier(state)

-tomer


