[Python-Dev] Classes and Metaclasses in Smalltalk

Guido van Rossum guido@digicool.com
Tue, 01 May 2001 19:52:29 -0500


Jim Althoff (a big commercial user of J[P]ython) sent me a summary of how metaclasses work in Smalltalk. He should know, since he invented them! :-) I include it below, with his permission.

While implementing more class-like behavior for built-in types in the experimental descr-branch in the 2.2 CVS tree, I've noticed problems caused by Python's collapsing of class attributes and instance attributes.

For example, suppose d is a dictionary. My experimental changes make d.__class__ return DictType (from the types module). (DictType.__class__ is TypeType, by the way.) I also added special methods. For example, d.__repr__() now returns repr(d). I am preparing for subclassing of built-in types, so I will eventually be able to derive a class MyDictType from DictType, as follows:

class MyDictType(DictType): ...

Now comes the fun part. Suppose MyDictType wants to define its own __repr__():

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

But (surprise, surprise!) DictType itself also has a __repr__() method: it returns the string "<type 'dictionary'>".

So the above code would fail: DictType.__repr__() returns repr(DictType), and DictType.__repr__(self) raises an argument count error. The correct __repr__ method for dictionary objects can be found as DictType.__dict__['__repr__'], but that looks hideous!
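For readers following along on a current interpreter, here is a minimal sketch of the two __repr__ descriptors in question. It assumes Python 3, where the built-in dict plays the role of DictType and type plays the role of TypeType; the names MyDict, dict_repr and type_repr are made up for the illustration, and the experimental descr-branch described above may well have behaved differently.

```python
# Assumption: Python 3, with dict standing in for DictType and type for TypeType.
d = {"a": 1}

# The repr implementation for dictionary *instances* lives in dict's own namespace.
dict_repr = dict.__dict__["__repr__"]

# The repr implementation for *type objects* (dict included) lives in type's namespace.
type_repr = type.__dict__["__repr__"]

print(dict_repr(d))     # repr of the dictionary
print(type_repr(dict))  # repr of the type object itself


class MyDict(dict):
    """Hypothetical subclass mirroring MyDictType from the text."""

    def __repr__(self):
        # Explicitly reuse the instance-side repr of the base type.
        return "MyDict(%s)" % dict.__repr__(self)


print(repr(MyDict(a=1)))
```

On such an interpreter, dict.__repr__ resolves to the instance-side descriptor, so the subclass runs without the argument count error.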

What to do? Pragmatically, I can make DictType.__repr__ return DictType.__dict__['__repr__'], and all will be well in this example. But we have to tread carefully here: DictType.__class__ is TypeType, but DictType.__dict__['__class__'] is a descriptor for the __class__ attribute on dictionary objects.

The best rule I can think of so far is that DictType.__dict__ gives the true set of attribute descriptors for dictionary objects, and is thus similar to Smalltalk's class.methodDict that Jim describes below. DictType.foo is a shortcut that can resolve to either DictType.__dict__['foo'] or to an attribute (maybe a method) of DictType described in TypeType.__dict__['foo'], whichever is defined. If both are defined, I propose the following clumsy but backwards compatible rule: if DictType.__dict__['foo'] describes a method, it wins. Otherwise, TypeType.__dict__['foo'] wins.
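The rule can be sketched as a small function. This is only an illustration under assumptions: resolve_type_attr, Meta and Thing are hypothetical names, callable() is a rough stand-in for "describes a method", base classes are ignored, and the sketch returns the raw dictionary entry rather than binding it.

```python
_MISSING = object()


def resolve_type_attr(tp, name):
    """Hypothetical model of the proposed rule for looking up tp.name.

    tp.__dict__ describes attributes of tp's instances; type(tp).__dict__
    describes attributes of tp itself. Only these two namespaces are
    consulted; base classes are ignored to keep the sketch small.
    """
    instance_side = tp.__dict__.get(name, _MISSING)
    type_side = type(tp).__dict__.get(name, _MISSING)

    if instance_side is not _MISSING and type_side is not _MISSING:
        # Both defined: a method on the instance side wins, otherwise the type side.
        return instance_side if callable(instance_side) else type_side
    if instance_side is not _MISSING:
        return instance_side
    if type_side is not _MISSING:
        return type_side
    raise AttributeError(name)


class Meta(type):
    label = "label from Meta"

    def describe(cls):
        return "describe from Meta"


class Thing(metaclass=Meta):
    label = "label from Thing"

    def describe(self):
        return "describe from Thing"


# A method defined for instances wins over the type-side method.
print(resolve_type_attr(Thing, "describe") is Thing.__dict__["describe"])
# A non-method defined on both sides resolves to the type side.
print(resolve_type_attr(Thing, "label"))
```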

Sigh.

--Guido van Rossum (home page: http://www.python.org/~guido/)

------------------------- Jim Althoff's message ---------------------------

Hi Guido,

I was reading the discussion on class methods in the python-dev archive and noticed your question about how Smalltalk determines the difference between instance methods and class methods. I have some info on this which I can't post to python-dev, not being a member; but I thought you might be interested in it anyway.

It turns out that I am the one that devised metaclasses in Smalltalk-80. (On the other hand, I haven't looked at any Smalltalk implementation code in a long time so this is merely a description of how it all started.)

Basically (I think) Smalltalk doesn't have the ambiguity you mention for instance methods versus class methods (as Python would) because Smalltalk doesn't do method lookup the same as Python does.

To illustrate, suppose you have object.method() (using Python-style syntax)

The Smalltalk method lookup is as follows:
o find the class that object is an instance of -- this resulting thing is a "class object" (a first-class object, same as in Python)
o since class is a "class object" one of its fields will be a dict of methods -- let's call it class.methodDict
o find method in class.methodDict
o if found, execute method on object
o if not, do the same thing traversing the (single inheritance) superclass chain (follow class.superClass)
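The steps above can be modeled in a few lines of Python. This is a toy sketch, not Smalltalk's implementation: ClassObject, Instance and send are hypothetical names, and only methodDict and superClass come from the description.

```python
class ClassObject:
    """Toy stand-in for a Smalltalk class object."""

    def __init__(self, name, super_class=None, method_dict=None):
        self.name = name
        self.superClass = super_class            # single inheritance
        self.methodDict = dict(method_dict or {})


class Instance:
    """Toy object that only remembers which class it is an instance of."""

    def __init__(self, klass):
        self.klass = klass


def send(receiver, selector, *args):
    """Look up selector starting at the receiver's class, then up the chain."""
    klass = receiver.klass
    while klass is not None:
        method = klass.methodDict.get(selector)
        if method is not None:
            return method(receiver, *args)       # execute the method on the object
        klass = klass.superClass
    raise AttributeError(selector)


def describe(self):
    return "an instance of " + self.klass.name


def following_date(self):
    return "tomorrow"


Object = ClassObject("Object", method_dict={"describe": describe})
Date = ClassObject("Date", super_class=Object,
                   method_dict={"followingDate": following_date})

d = Instance(Date)
print(send(d, "followingDate"))  # found directly in Date.methodDict
print(send(d, "describe"))       # found by following Date.superClass
```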

I believe Python works roughly as follows (Just testing my own understanding here -- correct me if I don't get it right):
o convert (conceptually at least) object.method() into object.__class__.method(object)
o find a function corresponding to method in object.__class__.__dict__
o if found, execute the found function (with object bound as the first arg to function)
o if not, traverse the (multiple inheritance) superclass chain (depth first)
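A small check of this equivalence, assuming a current Python 3 interpreter. Greeter, LoudGreeter and find_in_mro are hypothetical names. The sketch walks __mro__ because today's Python linearizes base classes with the C3 algorithm rather than the depth-first search of the classic classes discussed here.

```python
class Greeter:
    def greet(self):
        return "hello from %s" % type(self).__name__


class LoudGreeter(Greeter):
    pass


def find_in_mro(cls, name):
    """Return the raw function stored in the first class dict that defines name."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


g = LoudGreeter()

via_instance = g.greet()                         # object.method()
via_class = g.__class__.greet(g)                 # object.__class__.method(object)
via_dict = find_in_mro(g.__class__, "greet")(g)  # explicit search of the class dicts

print(via_instance, via_class, via_dict)
```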

I think the key difference is that Python treats object.method() the same as it treats object.__class__.method(object). Smalltalk doesn't do this. In Smalltalk, object.__class__.method(object) would mean:
o consider object.__class__ to be an "object" like any other "object" in Smalltalk (which it is)
o get the "class object" of object.__class__, namely object.__class__.__class__
o find method in object.__class__.__class__.methodDict
o if found, execute the method on object.__class__
o if not, do the same thing traversing the (single inheritance) superclass chain (follow object.__class__.__class__.superClass)

In other words, it is exactly the same lookup mechanism. So there is no ambiguity.

To summarize, in Smalltalk:

o instance methods (for instances that are not "class objects") are specified by: instance.instanceMethod()

o class methods are specified by: class.classMethod()

o both of these are just object.objectMethod() since classes are objects and the method lookup mechanism is no different from that of any other kind of object.

A concrete example:

If I have a class Date in Smalltalk and an instance of it referenced by a variable d, I would do:
o d.followingDate() for an instance method, and
o Date.currentDate() for a class method
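To make the uniformity concrete, here is a toy Python model in which a class object is just another object with its own class. Obj, ClassObj, send and DateMetaClass are hypothetical names, and the method bodies are placeholders.

```python
class Obj:
    """Any object: it only knows the class it is an instance of."""

    def __init__(self, klass):
        self.klass = klass


class ClassObj(Obj):
    """A class object: an object that also carries methods for its instances."""

    def __init__(self, name, klass=None, super_class=None, methods=None):
        super().__init__(klass)
        self.name = name
        self.superClass = super_class
        self.methodDict = dict(methods or {})


def send(receiver, selector, *args):
    """One lookup rule for every receiver, class object or not."""
    klass = receiver.klass
    while klass is not None:
        if selector in klass.methodDict:
            return klass.methodDict[selector](receiver, *args)
        klass = klass.superClass
    raise AttributeError(selector)


# Methods for the class object Date live in the class of Date.
DateMetaClass = ClassObj(
    "DateMetaClass",
    methods={"currentDate": lambda cls: "today, made by " + cls.name},
)
# Methods for Date instances live in Date itself.
Date = ClassObj(
    "Date",
    klass=DateMetaClass,
    methods={"followingDate": lambda self: "tomorrow"},
)
d = Obj(Date)

print(send(d, "followingDate"))   # instance method: looked up in Date
print(send(Date, "currentDate"))  # class method: looked up in Date's class
```

Sending currentDate to d, or followingDate to Date, fails in this model, because each lookup starts from the receiver's own class and nowhere else.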

I think this is a nice, conceptually simple model. Things get interesting, though, when you start to consider how the mechanism of class.__class__ -- which is the thing that makes class methods no different than instance methods -- actually works. And this leads to metaclasses in Smalltalk.

Here's a rough sketch of how metaclasses work:

Standard principles of Smalltalk:
o everything is an object (first-class)
o every object is an instance of a class
o a class inherits (single-inheritance) from its superclass (except the root class Object, which has no superclass)
o methods can be invoked on an object. All such methods are defined as part of the object's class definition (or a class going up the superclass chain)

Because of the first 2 principles above:
o every class is an object (because everything is an object)
o every class is, itself, an instance of some class (because every object is an instance of a class)

Originally in Smalltalk-76, there was one metaclass, Class. All classes (class objects) were instances of Class. Class was an instance of itself. Class had methods defined for it just like all classes did. In particular, it had a method "new" -- this being the method that creates instances of classes. So suppose you had class Rectangle. Rectangle is an instance of Class (hence it is a class object). If you wanted to create an instance of Rectangle, you would do: myRect = Rectangle.new(). This would mean: "find the 'new' method in the definition of Rectangle's class (Class) and invoke it on Rectangle (which is a class object)." The result is a Rectangle instance which is assigned to the variable myRect. The Rectangle class object held data (state -- same rules as any other kind of object) -- such as the number and names of the fields its instances would have, a dictionary of methods for its instances, etc. So the "new" method in Class would have access to all the info it needed to create a Rectangle instance (as opposed to a Point instance, for example).

The limitation with this scheme was that all classes had to share exactly the same methods, namely all the methods defined in Class. The method "new" was one of these methods along with lots of "reflection-type" methods for class creation, modification, and inspection. But if you wanted an "application-oriented" class method -- like Date.currentDate() -- you couldn't do that because then the method "currentDate" would be shared amongst all class objects (instances of Class) and wouldn't make any sense (e.g., Rectangle.currentDate()).
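A toy Python model of the single-Class scheme and its limitation. This is a sketch under assumptions: Obj, ClassObj, send and class_new are hypothetical names, the field names are invented, and superclass traversal is left out for brevity.

```python
class Obj:
    def __init__(self, klass, **fields):
        self.klass = klass
        self.fields = fields


class ClassObj(Obj):
    def __init__(self, name, klass, field_names=(), methods=None):
        super().__init__(klass)
        self.name = name
        self.field_names = tuple(field_names)  # state describing future instances
        self.methodDict = dict(methods or {})  # methods for future instances


def send(receiver, selector, *args):
    """Look the selector up in the receiver's class only (no superclass chain)."""
    try:
        method = receiver.klass.methodDict[selector]
    except KeyError:
        raise AttributeError(selector) from None
    return method(receiver, *args)


def class_new(cls):
    """The single 'new': builds an instance from the data held by the class object."""
    return Obj(cls, **{name: None for name in cls.field_names})


# One metaclass for everything; Class is an instance of itself.
Class = ClassObj("Class", klass=None, methods={"new": class_new})
Class.klass = Class

Rectangle = ClassObj("Rectangle", Class, field_names=("origin", "corner"))
Date = ClassObj("Date", Class, field_names=("day",))

myRect = send(Rectangle, "new")
print(myRect.klass.name, sorted(myRect.fields))

# The limitation: a class method added for Date has to go into Class,
# so every class object answers to it, Rectangle included.
Class.methodDict["currentDate"] = lambda cls: "today"
print(send(Date, "currentDate"))
print(send(Rectangle, "currentDate"))  # works, but makes no sense
```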

In Smalltalk-80 I added a more flexible mechanism which we called metaclasses (we hadn't used that terminology previously for the single Class although it was a "metaclass"). The thing that everyone in the Smalltalk development team liked about the new metaclass mechanism at the time was that it didn't require any new basic principles for Smalltalk. It was all done using the same basic principles of Smalltalk listed above. The idea was to use subclassing to allow for different methods for different instances of Class. A "metaclass" simply became a subclass of Class. Each class object then ended up being a singleton instance (although the "singleton-ness" was not mandatory) of a metaclass (i.e., a subclass of Class). So class objects were no longer all instances of the same class (Class). Each was an instance of a corresponding subclass of Class -- that is to say, an instance of a metaclass.

The Smalltalk-80 class hierarchy looked like the following. (This is actually a simplification. The actual hierarchy has a little more factoring, and I changed the names for more clarity.)

First a digression on some terminology:
o a class is an object that can be instantiated
o a metaclass is a class such that, when it is instantiated, the instance is itself a class
o a plain-object is one that cannot be instantiated (I'm just making this term up)
o a plain-class is one that is a class but is not a metaclass (making this up, too)

In the list below, indentation indicates class hierarchy (superclass -- subclass).

plain-class

o Class
o Object isInstanceOf o ObjectMetaClass isInstanceOf MetaClass
o Class isInstanceOf o ClassMetaClass isInstanceOf MetaClass
o MetaClass isInstanceOf o MetaClassMetaClass isInstanceOf MetaClass
.
.
.
o Rectangle isInstanceOf o RectangleMetaClass isInstanceOf MetaClass
o SpecializedRectangle isInstanceOf o SpecializedRectangleMetaClass isInstanceOf MetaClass

All "metaclasses" are instances of MetaClass. All "plain-classes" (those that are not "metaclasses") are instances of a "metaclass". Because of this there are parallel class hierarchies between "plain-classes" and their corresponding "metaclasses". Note that MetaClass is a "plain-class" and not a "metaclass". Also note that MetaClass (being a "plain-class") is an instance of its corresponding "metaclass" MetaClassMetaClass. And MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass _is_ a "metaclass"). The MetaClass / MetaClassMetaClass class/instance relationship is circular.

An example. If you want a Rectangle class you first make a metaclass for it, RectangleMetaClass -- actually, the system does this for you automatically as part of the class creation method implementation (when you define the class Rectangle, for example). RectangleMetaClass is an instance of MetaClass so all the methods defined in MetaClass are available to it. RectangleMetaClass can also define its own methods now (because it is a class) which would be invoked on any (typically one) instance of RectangleMetaClass, which in this case is going to be class Rectangle. You then make your Rectangle class by making an instance of RectangleMetaClass (conceptually doing: Rectangle = RectangleMetaClass.new() ). Now you can make instances of Rectangle, doing: myRect = Rectangle.new() as before. This is not so different from the Smalltalk-76 mechanism. The main advantage is that you now have a specific class, RectangleMetaClass, that can have methods specific to the class Rectangle (the instance of RectangleMetaClass). So you could define a method like "newFromPointToPoint" for example and then do: myRect = Rectangle.newFromPointToPoint(point1,point2). The meaning is the same as always: take the variable "Rectangle", find out what it is pointing to. It is pointing to an instance of the RectangleMetaClass. Find the method "newFromPointToPoint" as part of the definition of RectangleMetaClass (it being a class object). Invoke this method on the Rectangle class object -- which then creates a Rectangle instance. The same would go for the other example: Date.currentDate().
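The same idea can be approximated with Python's own metaclass support. This is an analogy rather than a description of Smalltalk internals: it assumes Python 3 syntax, uses the built-in type where Smalltalk has Class, and invents the origin and corner attributes for the illustration.

```python
class RectangleMetaClass(type):
    """Holds methods that are invoked on the class object Rectangle."""

    def newFromPointToPoint(cls, point1, point2):
        rect = cls()  # plain instance creation, as with Rectangle.new()
        rect.origin, rect.corner = point1, point2
        return rect


class Rectangle(metaclass=RectangleMetaClass):
    """The (typically single) instance of RectangleMetaClass."""


class Date:
    """A class with a different metaclass; it does not see Rectangle's class methods."""


myRect = Rectangle.newFromPointToPoint((0, 0), (3, 4))

print(type(Rectangle) is RectangleMetaClass)   # Rectangle is an instance of its metaclass
print(isinstance(myRect, Rectangle))           # and myRect is an instance of Rectangle
print(hasattr(Date, "newFromPointToPoint"))    # the method is specific to Rectangle
print(hasattr(myRect, "newFromPointToPoint"))  # and is not visible from instances
```

Unlike Smalltalk-80, Python does not create a separate metaclass for each class automatically; here the metaclass is written out by hand.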

So the bottom line is (I think) that the Smalltalk method lookup mechanism doesn't have to resolve an ambiguity because all methods that get invoked on an object always come from the object's definition class (or superclass) and from no other place.

Hope this helps,

Jim