[Python-Dev] Re: Of slots and metaclasses...

Guido van Rossum guido@python.org
Thu, 28 Feb 2002 16:51:45 -0500


[me]

> A new-style class, with or without __slots__, should be considered no different from a new-style built-in type, except that all of the methods happen to be defined in Python (except maybe for inherited methods).

[Kevin]

Sure. Except that I also want to be able to extend existing new-style classes/types in C, as well as Python. Here is how I do it now (minus error checking and ref-counting):

    static PyMethodDef PyRow_methods[] = {
        {"__init__",    (PyCFunction)rowinit,    METH_VARARGS},
        {"__repr__",    (PyCFunction)rowstrrepr, METH_NOARGS},
        {"__getitem__", (PyCFunction)rowgetitem, METH_VARARGS},
        /* etc... */
        {NULL, NULL}  /* sentinel: the loop below stops at ml_name == NULL */
    };

    PyRow_Type = (PyTypeObject*)PyType_Type.tp_call((PyObject*)&PyType_Type, args, NULL);

    /* Methods must be added after PyRow_Type has been created, since the
       type is an argument to PyDescr_NewMethod */
    dict = PyRow_Type->tp_dict;
    meth = PyRow_methods;
    for (; meth->ml_name != NULL; meth++) {
        PyObject* method = PyDescr_NewMethod(PyRow_Type, meth);
        PyDict_SetItemString(dict, meth->ml_name, method);
    }

Heh?!?!!! Why can't you declare PyRow_Type as a statically initialized struct like all extensions and the core do?

[snip]

Sure. I was just hoping to have that list of descriptors pre-computed and stored in the class (like __mro__).

__mro__ gets used all the time; on every method lookup at least. The list of instance variable descriptors is only interesting to a small number of highly introspective tools.

I suppose the question is why even expose __slots__ if it is so worthless?

It's found in the dict when the class is defined. Why delete it? The idea is that you can make it a dict that has other info about the slots. It's got a __foo__ name. I can give it any semantics I damn well please. :-)

> If the descriptors don't tell you everything you need, too bad -- some types just are like that.

This has never been a concern of mine -- I don't mind if the C implementation chooses to hide things.

Exactly, and I'm telling you to have the same attitude about slots. Let me repeat something I just sent someone else about slots:

It seems that unfortunately __slots__ is Python 2.2's most misunderstood feature...

I see it as a hack that lets me define a special-purpose class whose instances are (almost) as efficient as I can do using C, but without having to write a C extension. (I say "almost", because a C extension can store simple values as C ints, while __slots__ only lets you store PyObject pointers. But still, it's a big savings compared to adding a dict to every instance, and sometimes the slot value is picked from a small number of interned or cached ints or strings.)
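To make the space argument concrete, here is a minimal sketch written for present-day Python 3 rather than the 2.2 interpreter under discussion. The class names WithDict and WithSlots are invented for illustration.

```python
class WithDict:
    """Ordinary class: every instance carries its own attribute dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    """Slotted class: values live in fixed per-instance slots, no dict."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


d = WithDict(1, 2)
s = WithSlots(1, 2)

print(hasattr(d, "__dict__"))  # True
print(hasattr(s, "__dict__"))  # False
print(s.x, s.y)                # 1 2
```

The slotted instance still holds references to ordinary Python objects; what it drops is the per-instance dictionary.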

It has different semantics from regular attributes, and I don't try to hide that: introspection doesn't find slots the same way as it finds regular instance vars, you can't provide a default via a class variable, and there are a bunch of "don't do that" things like modifying slots of an existing class or overriding a slot defined by a base class. (There's a whole list of warnings in http://www.python.org/2.2/descrintro.html!)
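A few of these differences can be observed directly. The sketch below assumes current CPython 3, whose behavior may differ in detail from 2.2; Broken and Point are hypothetical names.

```python
# 1) A class variable cannot act as the default for a slot of the same name.
default_error = None
try:
    class Broken:
        __slots__ = ("a",)
        a = 1  # clashes with the descriptor created for slot "a"
except ValueError as exc:
    default_error = exc


class Point:
    __slots__ = ("x",)


p = Point()

# 2) A slot that was never assigned has no value at all.
try:
    p.x
    unset_readable = True
except AttributeError:
    unset_readable = False

# 3) Without a per-instance dict, undeclared attributes are rejected.
try:
    p.y = 1
    extra_allowed = True
except AttributeError:
    extra_allowed = False

print(type(default_error).__name__, unset_readable, extra_allowed)
# ValueError False False
```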

I think as such, the feature is just right (except for the no-pickling bug). It's unfortunate that people have jumped on it as the answer to all their questions. I guess that means there's a big demand for more control over instance variables -- whether that demand is created by a real need or simply because that's how most other languages do it remains to be seen...

> Why do I reject your suggestion of making __slots__ (more) usable for introspection? Because it would create another split between built-in types and user-defined classes: built-in types don't have __slots__, so any strategy based on __slots__ will only work for user-defined types. And that's exactly what I'm trying to avoid!

Well, I'm busy creating C extension types that do have slots! One of my many current projects is to create a better type to store the results of relational database queries. I want the memory efficiency of tuples and the ability to query by name (via __getitem__ or __getattr__). So I basically need to re-invent a magic tuple type that adds descriptors for every named field. Strangely enough, this is basically what the slots mechanism does. I do realize that I could accomplish the same end by sub-classing tuple and adding a bunch of descriptors.

Note that there's something already there that you might reuse: Objects/structseq.c, which is used to create the return values of localtime(), stat() and a few others in a way that looks both like a tuple and like a read-only record. It may not be powerful enough because I think the assumption is that the set of field names is static, but you may be able to extend it or copy some good ideas.
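The tuple-plus-record behavior of a struct sequence is easy to see from Python. This short sketch uses time.gmtime(), which returns the same struct_time type as localtime() but gives a fixed result for the epoch; it is written for Python 3.

```python
import time

t = time.gmtime(0)  # the epoch, as a struct_time

print(isinstance(t, tuple))  # True
print(t[0], t.tm_year)       # 1970 1970

# Tuple unpacking works on the positional fields.
year, month, day = t[:3]

# The named fields are read-only.
try:
    t.tm_year = 2000
    writable = True
except AttributeError:
    writable = False
print(writable)  # False
```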

(Just don't try to understand what it does to make the tuple shorter than the record in some cases -- that's for backwards compatibility because lots of code would break if e.g. stat() returned a longer tuple than in previous Python versions, but we still want to provide new fields when using named fields. This part is not for the weak of heart, and I didn't write it, and can't guarantee that it's 100% bug-free.)
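The "tuple shorter than the record" effect is visible on the result of os.stat(). The sketch below assumes Python 3, where struct sequence types expose n_sequence_fields and n_fields; the exact number of named fields varies by platform, so it is printed without a stated expected value.

```python
import os

st = os.stat(os.curdir)

# As a tuple, the result keeps the traditional ten items.
print(len(st))  # 10

# As a record, it exposes more named fields than tuple positions.
print(os.stat_result.n_sequence_fields, os.stat_result.n_fields)

# Example of a field reachable by name only.
print(hasattr(st, "st_mtime_ns"))  # True
```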

[items I rejected]

> - Alter vars(obj) to return a dict of all attrs

Ok, I'm a little baffled by this. Why not?

Currently, the assumption is that vars() returns a dict that can be modified to modify the underlying object's attributes. If it were to return a synthetic dict, that wouldn't work, or it would require more implementation effort than I care for -- again, since I doubt there is much demand for this outside a small set of introspection tools.
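The write-through assumption looks like this in practice. This is a small Python 3 sketch with hypothetical classes Plain and Slotted.

```python
class Plain:
    def __init__(self):
        self.x = 1


class Slotted:
    __slots__ = ("x",)

    def __init__(self):
        self.x = 1


p = Plain()
vars(p)["x"] = 42  # vars() hands back the real instance dict
print(p.x)         # 42

s = Slotted()
try:
    vars(s)  # there is no instance dict to return
    vars_failed = False
except TypeError:
    vars_failed = True
print(vars_failed)  # True
```

A synthetic dict covering slots would have to forward every mutation back to the instance to preserve the first behavior.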

> I'll be the first to admit that some details are broken in 2.2.
>
> In particular, the fact that instances of classes with slots appear picklable but lose all their slot values is a bug -- these should either not be picklable unless you add a __reduce__ method, or they should be pickled properly.

My vote is that they should be pickled properly by default. In my mind, slots are a more static type of attribute. Since they are more static, my feeling is that they should be as or more accessible than dict attributes. Descriptors are fine for handling the black magic of making them addressable by name, but it just feels wrong to hide them from access by other means. Of course, I am really talking about slots defined at the Python level -- not necessarily all storage allocated in the 'members' array.

Slots share their descriptor implementation with anything defined by the tp_members array in a type object. E.g. file.softspace is a descriptor of the same type as used by slots. What they share is that they refer to "real" data stored in the instance -- either a PyObject* or some basic C type like int or double. I don't want to trust that __slots__ has the right data: even if I made it immutable, someone could still rebind C.__dict__['__slots__'], and I don't want to go so far as to make __slots__ a property stored in the type object. So I can't really tell which descriptors are slots and which are other things -- and I don't want to, because I believe that would be breaking through an abstraction.
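The shared descriptor type can be checked from Python. This sketch is specific to CPython 3, where file.softspace no longer exists, so it relies on types.MemberDescriptorType, the type CPython uses for attributes defined through a C-level member table. The class C is hypothetical.

```python
import types


class C:
    __slots__ = ("a",)


slot_descr = C.__dict__["a"]

# The descriptor created for a slot is an ordinary member descriptor.
print(type(slot_descr).__name__)                         # member_descriptor
print(isinstance(slot_descr, types.MemberDescriptorType))  # True

# Nothing on the descriptor itself says "I came from __slots__".
print(slot_descr.__name__)  # a
```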

Unless attribute access becomes scoped based on the static type of the method, then I think it is a bug. Re-declared slots become effectively orphaned and just waste memory. Coalescing them or raising an exception when they are re-declared seem much better alternatives.

It's a bug to redeclare a slot. I don't find it Python's job to make it an error.
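The wasted space from a redeclared slot can be measured. The sketch below assumes CPython 3, which still accepts the redeclaration silently; Base, Clean and Redeclared are invented names.

```python
class Base:
    __slots__ = ("a",)


class Clean(Base):
    __slots__ = ()  # adds nothing


class Redeclared(Base):
    __slots__ = ("a",)  # allocates a second slot also named "a"


print(Clean.__basicsize__ == Base.__basicsize__)      # True
print(Redeclared.__basicsize__ > Base.__basicsize__)  # True

r = Redeclared()
r.a = 1  # stored in the derived class's slot

# The base class's slot still exists but is never filled.
try:
    Base.a.__get__(r, Base)
    base_slot_set = True
except AttributeError:
    base_slot_set = False
print(base_slot_set)  # False
```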

> I think you're mostly right with your proposal "Update standard library to use new reflection API". Insofar as there are standard support classes that use introspection to provide generic services for classic classes, it would be nice if these could work correctly for new-style classes even if they use slots or are derived from non-trivial built-in types like dict or list. This is a big job, and I'd love some help. Adding the right things to the inspect module (without breaking pydoc :-) would probably be a first priority.

Well, I'm happy to contribute, though my primary concern (other than correctness and completeness) is efficiency. The whole reason I'm using slots is to save space when allocating huge numbers of fairly small objects. I believe that there is a big performance difference between being able to pickle based on arbitrary descriptors and pickling just slots. Slots are already nicely laid out in rows, just waiting to be plucked out and stuffed into a pickle. Even without flattened slots lists, it is a fast and trivial operation to iterate over a class and all its bases and extract slots. Doing so over dictionaries is not nearly so trivial.

I think you're overstating the simplicity of pickling slots. There is no guarantee that the slots of a derived class are contiguous with the slots of a base class; a weakref and a dict field may be placed in between, and another metaclass could add other things. For example, you could write a metaclass in C that took the slots idea one step further and let you declare the types of the slots as basic C types, so that other structmember keys could be used, e.g. T_INT or T_FLOAT.

If you want your instances to be pickled efficiently, you should write a custom __reduce__ method in C anyway -- right now, new-style classes are pickled by a piece of Python code at the end of copy_reg.py.
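For readers unfamiliar with the hook, here is what a custom __reduce__ looks like when written in Python rather than C. It is a sketch for Python 3 with a hypothetical Row class, and it exercises the hook through copy.copy(), which consults the same reduction protocol that pickle uses.

```python
import copy


class Row:
    __slots__ = ("id", "name")

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __reduce__(self):
        # Describe the object as "call Row(id, name) to rebuild me".
        return (Row, (self.id, self.name))


original = Row(1, "spam")
clone = copy.copy(original)

print(clone is original)     # False
print(clone.id, clone.name)  # 1 spam
```

Because the class says explicitly how to rebuild an instance, no generic introspection of slots is needed.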

> Maybe you can formulate it as a set of tentative clarifying patches to PEPs 252, 253, and 254?

To be honest, I forgot that those PEPs existed! I've been working off of the Python 2.2 source and the tutorials. I'll read them over tonight and see.

I had a feeling you were missing something basic. :-)

When I say SOMMCP, I really mean the "metaclass protocol" defined by the various postulates and theorems in the first few chapters of the book.

As I said, I don't have the whole set in my head, so you'll have to be more specific in your questions. (Basically, I don't expect to be adding much from the book, but I'll be looking to the book for clues as we find problems with how things are implemented now, e.g. the automatically derived metaclass issue below.)

> - I currently don't complain when there are serious order disagreements. I haven't decided yet whether to make these an error (then I'd have to implement an overridable way of defining "serious") or whether it's more Pythonic to leave this up to the user.

Sure -- I noticed this. Maybe you should store the order-safety in the metaclass? That way, the user can inspect it when they decide it is important.

You mean in the class object? I'm not sure what you mean by "storing the order-safety". I currently don't calculate whether there are any order conflicts: serious_order_disagreements() returns 0 without doing anything. Someone who wants it can easily implement the check from the book though.
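For comparison, later Python versions settled this question: since the C3 linearization was adopted, a class whose bases disagree about ordering is rejected when it is defined. A minimal Python 3 sketch with hypothetical classes:

```python
class A:
    pass


class B:
    pass


class X(A, B):  # says A comes before B
    pass


class Y(B, A):  # says B comes before A
    pass


conflict = None
try:
    class Z(X, Y):  # no ordering can satisfy both X and Y
        pass
except TypeError as exc:
    conflict = exc

print(type(conflict).__name__)  # TypeError
```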

> - I don't enforce any of their rules about cooperative methods. This is Pythonic: you can be cooperative but you don't have to be. It would also be too incompatible with current practice (I expect few people will adopt super().)

I agree with most of that, except that I expect that MANY people will start using 'super'.

I doubt it with the current super(Class,self).method(args) notation. Probably they will once super is a keyword so you can write super.method(args).

I've trained an office full of Java programmers to program in Python and they are always complaining about the lack of super calls. Also, I've always considered this idiom ugly and hackish:

    class Foo(Bar, Baz):
        def __init__(self):
            Bar.__init__(self)
            Baz.__init__(self)

Strange that you mention Java in the same paragraph as an example using multiple inheritance. ;-/

Also note that this is pretty much what C++ wants you to do, except it uses '::' instead of '.' and doesn't require you to pass self (which is a different issue).

I don't see this as a serious issue, just syntactic sugar.

It's so much better as:

    class Foo(Bar, Baz):
        def __init__(self):
            # when super becomes a keyword and we write nice cooperative
            # __init__ methods
            super.__init__(self)

But that's not what you'd be writing -- you'd be writing super.__init__().
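As a point of reference, Python 3 did not make super a keyword but does accept a zero-argument super() call, which gets close to the spelling discussed here. The sketch below is for Python 3; the Base class and the log attribute are added for illustration.

```python
class Base:
    def __init__(self):
        self.log = ["Base"]


class Bar(Base):
    def __init__(self):
        super().__init__()  # 2.2 spelling: super(Bar, self).__init__()
        self.log.append("Bar")


class Baz(Base):
    def __init__(self):
        super().__init__()
        self.log.append("Baz")


class Foo(Bar, Baz):
    def __init__(self):
        super().__init__()  # one call reaches Bar, Baz and Base in MRO order
        self.log.append("Foo")


print(Foo().log)  # ['Base', 'Baz', 'Bar', 'Foo']
```

Each initializer runs exactly once, whereas calling Bar.__init__ and Baz.__init__ explicitly would run Base.__init__ twice.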

> - I don't automatically derive a new metaclass if multiple base classes have different metaclasses.

I have my own ideas about this, but like you, don't have enough experience with them in practice to do anything about it.

Can you share them? This might be interesting.

> Since I expect that non-trivial metaclasses are often implemented in C, I'm not so comfortable with automatically merging multiple metaclasses -- I can't prove to myself that it's always safe.

It is always safe when the assumption of monotonicity is not violated.

And that we can't know.
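The practical consequence of not deriving a metaclass automatically is that the user has to merge them by hand. A sketch in Python 3 syntax, with hypothetical metaclasses MetaA and MetaB:

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


error = None
try:
    class C(A, B):  # the two metaclasses are unrelated
        pass
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError


# The user states explicitly that combining them is safe.
class MetaAB(MetaA, MetaB):
    pass


class D(A, B, metaclass=MetaAB):
    pass


print(type(D).__name__)  # MetaAB
```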

> - I don't check that a base class doesn't override instance variables. As I stated above, I don't think I should, but I'm not 100% sure.

Do you mean slots or all Python instance attributes in this statement?

I just meant slots, but in a sense it's also true for other ivars: if you don't know that your base class defines an ivar 'foo', you might create your own ivar named 'foo' and use it in a way that's inconsistent with the base class. Because there are no type checks and no ivar declarations, that's much harder to avoid in Python than in more static languages like C++ or Java (I assume those will complain when you redefine an ivar, even with the same type).

> > 3) Do you intend to enforce monotonicity for all methods and slots? (Clearly, this is not desirable for instance dict attributes.)
>
> If I understand the concept of monotonicity, no. Python traditionally allows you to override methods in ways that are incompatible with the contract of the base class method, and I don't intend to forbid this.

For Python, monotonicity means that the instance attributes and instance methods of a class are a superset of those of all its ancestors. This is not the way that normal dict attributes work in Python, so let's talk only about slots when discussing monotonic properties.

I'm not sure what you mean by "this is not the way that normal dict attrs work", unless you are talking about overriding __init__ without calling the base class __init__ (and perhaps the same for other methods), which of course can mean that a derived class instance lacks an ivar that a base class instance would have. This is Pythonic freedom IMO.

Since it's not true for regular ivars, why worry about it for slots?
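The non-monotonic case described above takes only a few lines to reproduce. A sketch with hypothetical classes Base and Derived:

```python
class Base:
    def __init__(self):
        self.foo = 1


class Derived(Base):
    def __init__(self):
        # Deliberately does not call Base.__init__, so "foo" is never set.
        self.bar = 2


d = Derived()
print(isinstance(d, Base))  # True
print(hasattr(d, "foo"))    # False
print(hasattr(d, "bar"))    # True
```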

In other words, it means that the metaclass interface does not provide a way to delete a slot or a method, only ways to add and override them. Combined with some static type information, the assumption of monotonicity will be very helpful when we can eventually compile Python.

I don't think we should be guided here by what might be needed by a compiler. Without actually trying to build a compiler, we'll probably miss important requirements that mean we'll have to change the language anyway, and we'll impose requirements that we think might be important without a good reason. (E.g. structured programming was once thought of as an aid to compiler technology as well as to the human reader. Nowadays, optimizers reduce all control flow to labels and goto statements. :-)

> It would be good if PyChecker checked for accidental mistakes in this area, and maybe there should be a way to declare that you do want this enforced; I don't know how though.

I have a pretty good idea how. It's essentially a proof-based method that works by solving metatype constraints.

Isn't that how most of PyChecker works? At least the proof-based part?

> There's also the issue that (again, if I remember the concepts right) there are some semantic requirements that would be really hard to check at compile time for Python.

True for dict instance attributes, not for slots!

Again, you're trying to hijack slots for purposes for which they weren't created. Think of slots as an efficiency hack, not as a better way to declare ivars.

> > 4) Should descriptors work cooperatively? i.e., allowing a 'super' call within __get__ and __set__.
>
> I don't think so, but I haven't thought through all the consequences (I'm not sure why you're asking this, and whether it's still a relevant question after my responses above). You can do this for properties though.

    class Foo(object):
        __slots__ = ()
        a = 1

    class Bar(Foo):
        __slots__ = ('a',)

    bar = Bar()
    print dir(a)
    print a

That's a NameError, I suppose you meant 'bar' instead of 'a' in the last two lines, then it makes sense. :-)

The resolution rule for descriptors could work cooperatively to find Foo's class attribute 'a' instead of giving up with an AttributeError.

Once a descriptor is found, that's the end of the line. When you find a method, you call it, and it raises an exception, you're not going to continue looking for a base class method either!

The descriptor type used to implement slots could do this, but doesn't. I don't care about this feature. With a dict, there's some real saving in not storing default values, since it means a smaller dict, which can save space. The slot space is always there, so you might as well initialize it.
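With the NameError corrected, the example behaves as follows under current CPython 3. This is a sketch of the lookup rule described above, not of the 2.2 interpreter.

```python
class Foo:
    __slots__ = ()
    a = 1


class Bar(Foo):
    __slots__ = ("a",)  # the slot descriptor shadows Foo.a


bar = Bar()
print("a" in dir(bar))  # True

# Lookup stops at Bar's slot descriptor, which has no value yet;
# it does not fall back to the class attribute on Foo.
try:
    bar.a
    found = True
except AttributeError:
    found = False
print(found)  # False
print(Foo.a)  # 1

bar.a = 5     # initializing the slot makes the attribute readable
print(bar.a)  # 5
```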

Concluding: don't expect that you can take an arbitrary class, analyze what ivars it uses, and add a __slots__ variable to speed it up. There are lots of differences in semantics when you use slots, and I don't want to hide those.

--Guido van Rossum (home page: http://www.python.org/~guido/)