[Python-3000] pep 3124 plans
Phillip J. Eby pje at telecommunity.com
Mon Jul 30 21:45:33 CEST 2007
At 02:20 PM 7/30/2007 -0400, Jim Jewett wrote:
>On 7/21/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>> ... If you have to use @somegeneric.before and @somegeneric.after,
>> you can't decide on your own to add @somegeneric.debug.
>>
>> However, if it's @before(somegeneric...), then you can add @debug
>> and @authorize and @discount and whatever else you need for your
>> application, without needing to monkeypatch them in.
>
>I honestly don't see any difference here. @somegeneric.method implies
>that somegeneric is an existing object, and even that it already has
>rules for combining .before and .after; it can just as easily have a
>rule for combining arbitrary methods.
I don't understand what you're saying or how it relates to what I said above.
If you define a new kind of method qualifier (e.g. @discount), then all existing generic functions aren't suddenly going to grow a '.discount' attribute. That's what the above discussion is about -- how you access qualifier decorators.
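To make that concrete, here is a minimal, self-contained sketch of the standalone-decorator style -- the names (Generic, qualifier, discount, compute_price) are invented for illustration and are not PEP 3124's actual API:

    # Hypothetical sketch, not the PEP's real machinery: a qualifier is
    # just a decorator factory that takes the generic function as an
    # argument and registers into it, so any module can invent new
    # qualifiers without monkeypatching the generic's type.

    class Generic:
        def __init__(self):
            self.methods = {}                 # qualifier name -> list of functions
        def add(self, kind, func):
            self.methods.setdefault(kind, []).append(func)

    def qualifier(kind):
        def factory(generic):
            def decorate(func):
                generic.add(kind, func)
                return func
            return decorate
        return factory

    before = qualifier('before')              # the "standard" qualifiers...
    after = qualifier('after')
    discount = qualifier('discount')          # ...and one your application invents

    compute_price = Generic()

    @discount(compute_price)                  # no '.discount' attribute required
    def member_discount(order):
        return order * 0.9

The point is only that 'discount' is defined entirely by the application, yet registers into the same generic function as the "standard" qualifiers.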
>If you're saying that @discount could include its own combination
>rules, then each method needs to repeat the boilerplate to pick apart
>the current decision tree.
Still don't understand you. Method combination is done with a generic function called "combine_actions" which takes two arbitrary "method" objects and returns a new "method" representing their combination. There is no boilerplate or picking anything apart.
>The only compensating "advantage" I see is that the decision tree
>could be changed arbitrarily from anywhere, even as "good practice."
>(Since my new @thumpit decorator would take the generic as an
>argument, you won't see the name of the generic in my file; you might
>never see it if there was iteration involved.)
Decision trees are generated from a flat collection of rules; they're not directly manipulated. In the default implementation (based on Guido's prototype), the "tree" is just a big dictionary mapping tuples of types to "method" objects created by combining all the methods whose signatures are implied by that tuple of types. It's also sparse, in that it doesn't contain type combinations that haven't been looked up yet. So there isn't really any tree that you could "change" here.
There's just a collection of rules, where a rule consists of a predicate, a definition order, a "body" (function), and a method factory. A predicate is a collection of possible signatures (e.g. the sequence of applicable types) -- i.e., an OR of ANDs.
To actually build a tree, rules are turned into a set of "cases", where each case consists of one signature from the rule's predicate, plus a method instance created using the signature, body, and definition order. (Not all methods care about definition order, just ones like before/after.)
In the default engine (loosely based on Guido's prototype), these cases are merged by using combine_actions() on any cases with the same signature, and stored in a dictionary called the "registry". The registry is built up incrementally as you add methods.
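A simplified sketch of the data shapes involved (the real signatures are much richer than plain type tuples, and 'combine' here is just a stand-in for combine_actions):

    # Rough sketch: rules expand into (signature, action) cases; cases
    # with the same signature are merged as they're added, so the
    # registry stays a flat dict from signature to combined action.

    registry = {}     # signature (e.g. a tuple of types) -> combined action
    cache = {}        # looked-up type tuples -> derived methods, built lazily

    def add_case(signature, action, combine):
        if signature in registry:
            registry[signature] = combine(registry[signature], action)
        else:
            registry[signature] = action
        cache.clear()   # new methods invalidate previously derived entries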
When you call the function, a type tuple is built and looked up in the cache. If nothing is found in the cache, we loop over the entire registry, and build up a derived method, like this (actual code excerpt):
    try:
        f = cache[types]
    except KeyError:
        # guard against re-entrancy looking for the same thing...
        action = cache[types] = self.rules.default_action
        for sig in self.registry:
            if sig==types or implies(types, sig):
                action = combine_actions(action, self.registry[sig])
        f = cache[types] = action
    return f(*args)

The 'self.rules.default_action' is to method objects what zero is to numbers -- the start of the summing. Ordinarily, the default action is a NoMethodFound object -- a perfectly valid "method" implementation whose behavior is to raise an error. All other method types have higher combination precedence than NoMethodFound, so it always sinks to the end of any combination of methods.
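If it helps, a NoMethodFound stand-in can be as small as this (an illustrative guess, not the actual class):

    # A callable that is a perfectly valid "method" object; its only
    # behavior is to raise when it finally gets called.
    class NoMethodFound:
        def __call__(self, *args, **kw):
            raise TypeError("no applicable method for %r" % (args,))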
The relevant generic functions here are implies(), combine_actions(), and overrides() -- where combine_actions() calls overrides() to find out which action should override the other, and then returns overriding_action.override(overridden_action).
The overrides() relationship of two actions of the same type (e.g. two Around methods), is defined by the implies() relationship of the action signatures. For Before/After methods, the definition order is used to resolve any ambiguity in the implies().
The .override() of a method is usually a new instance of the same method type, but with a "tail" that points to the overridden method, so that next_method will do the right thing.
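Here's a loose, self-contained sketch of those three relationships and the tail-chaining; the real versions are themselves generic functions and handle far more cases than plain type tuples:

    def implies(types, sig):
        # does the concrete type tuple 'types' imply signature 'sig'?
        return len(types) == len(sig) and all(
            issubclass(t, s) for t, s in zip(types, sig))

    class Method:
        def __init__(self, sig, body, tail=None):
            self.sig, self.body, self.tail = sig, body, tail
        def override(self, other):
            # same method type, new instance, with the overridden method
            # as the tail -- so next_method ends up pointing at the right thing
            return self.__class__(self.sig, self.body, tail=other)
        def __call__(self, *args):
            return self.body(self.tail, *args)   # body takes next_method first

    def overrides(m1, m2):
        # for two methods of the same type, the more specific signature wins
        return implies(m1.sig, m2.sig)

    def combine_actions(m1, m2):
        if overrides(m1, m2):
            return m1.override(m2)
        return m2.override(m1)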
There are more details than this, of course, but the point is that method combination is 100% orthogonal to the dispatch tree mechanism. You can build any kind of dispatch engine you want, just by using combine_actions to combine the actions. The action types themselves only need to know how to .override() a lower precedence method and .merge() with a same-precedence method. And there needs to be an overrides() relationship defined between all pairs of method types, but in my current version of the implementation, overrides() is automatically transitive for any type-level relationship.
So if you define a type that overrides Around, then it also overrides anything that Around overrides. So, for the most part you just say what types you want to override (and/or be overridden by), and maybe add a rule for how to compare two methods of your type (if the default of comparing by the implies() of signatures isn't sufficient).
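For example (again only a sketch of the idea, with a hand-rolled precedence table rather than the implementation's actual mechanism):

    # Declaring that Debug overrides Around is enough: transitivity
    # makes it override everything Around (directly or indirectly) overrides.
    OVERRIDES = {
        'Around': {'Method'},
        'Method': {'NoMethodFound'},
        'Debug':  {'Around'},        # the single new declaration
    }

    def type_overrides(a, b):
        seen, stack = set(), [a]
        while stack:
            t = stack.pop()
            for o in OVERRIDES.get(t, ()):
                if o == b:
                    return True
                if o not in seen:
                    seen.add(o)
                    stack.append(o)
        return False

    assert type_overrides('Debug', 'NoMethodFound')   # via Around and Method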
The way that generic functions make this incredible orthogonality and flexibility possible is itself an argument for generic functions, IMO. Certainly, it's a hell of an argument for implementing generic functions in terms of other generic functions, which is why I did it. It beats the crap out of my previous implementation approaches, which had way too much coupling between method combination and tree-building and rules and cases and whatnot.
Separating these ideas into different functional/conceptual domains makes the whole thing easier to understand -- as long as you're not locked into procedural-implementation thinking. If you want to think step-by-step, it's potentially a vast increase in complication. On the other hand, it's like thinking about reference counting while writing Python code. Sure, you need to drop down to that level every now and then, but it's a waste of time to think about it 90% of the time. Being able to have a class of things that you don't think about is what makes Python a higher-level language than the C it's implemented with.
In the same way, generic functions are a higher-level version of OO -- you get to think in terms of a domain's abstract operations, like implication, overriding, and combination in this example.
The domain abstractions are not an "interface", nor are they methods or object types. They're more like "concepts", except that the term "concept" has been abused to refer to much lower-level things that can attach to only one object within an operation.
The concept of implication is that there are imply-ers and imply-ees -- a role for each argument, each of which is an implicit interface or abstract object type.
In traditional OO and even interfaces, there are considerable limits on your ability to specify such partial interfaces and the relationships between them, forcing you to choose an arbitrary, implementation-defined organization to put them in. You then have to force-fit objects to have the right methods, because you didn't define an x.is_implied_by(y) relationship, only an x.implies(y) relationship.
Thing is, a relationship doesn't belong to one side or the other -- it's a relationship. A third, independent thing. Like a GF method.
In any program, these relationships already exist, and you still have to understand them. They're just forced into whatever pattern the designer chose or had thrust upon them to make them fit the at-best-binary nature of OO methods, instead of being called out as explicit relationships that follow the form of the problem domain.
>I realize that subclasses are theoretically just as arbitrary, but
>they aren't in practice.
Right -- and neither are generic functions in normal usage. The only reason you think that subclasses aren't arbitrary is because you're used to the ways that things get force-fitted into those relationships. Whereas, with GF's, the program can simply model the application domain relationships, and you're going to know what patterns will follow because they'll reflect the application domain.
For example, if you see implies() and combine_actions() and overrides(), are you going to have any problems knowing when you see a type, whether these GF's might have methods for that type? You'll know when to look for such a method, because you know what roles the arguments play in each GF. If the type might play such a role, then you'll want to know how it plays that role in connection with specific collaborators or circumstances -- and you'll know what method implementations to look for.
It's ridiculously simple in practice, even though it sounds hard in theory. That's the very problem in fact -- in neither subclassing nor GF's can you solve such problems in theory. You can only solve them in practice, because it's only in the context of a specific program that you have any domain knowledge to apply -- i.e., knowledge about what general kinds of things the program is supposed to do and what general kinds of things it does them with.
If you have that general knowledge, it's just as easy to handle one organization as the other -- but the GF-based version gives you the option of having a module that defines lots of basic "kinds of things it's supposed to do" up front, so that you have an idea of how to understand the "things it does them with" when you encounter them.
>You can certainly say now that configuration specialization should be
>in one place, and that dispatching on parameter patterns like
>
>    (*             # ignored
>     , :int        # actual int subclass
>     , :Container  # meets the Container ABC
>     , 4<val<17.3  # value-specific rule
>    )
>
>is a bad idea
But I don't say that. What I say is that in practice, there are only a few natural places to put such a definition:
- near the definition of Container (or int, but that's a builtin in this case)
- near the definition of the generic function being overloaded
- in a "concern-based" grouping, e.g. an appropriate module that groups together matters for some application-domain concept. (For example, an "ordering_policy" module might contain overrides for a variety of generic functions that relate to inventory, shipping, and billing, within the context of placing orders.)
- in an application-designated catchall location
Which of these locations is "best" depends on the overall size of the program. A one-module program is certainly small enough to not need to pick one. As a system gets bigger, some of the other usage patterns become more applicable.
>-- but whenever I look at an application from the outside,
>well-organized configuration data is a rare exception.
That may be -- but one enormous advantage of generic functions is that you can always relocate your method definitions to a different module or different part of the same module without affecting the meaning of the program, as long as all the destination modules are imported by the time you execute any of the functions.
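To illustrate why relocation is safe, here's a toy with a hand-rolled 'when' helper (not PEP 3124's API): the method is registered against the function object itself, so the registration can live in any module that happens to be imported before the first call:

    def when(generic, types):                  # hypothetical registration decorator
        def decorate(func):
            generic.registry[types] = func
            return func
        return decorate

    def describe(obj):                         # the "generic function"
        for types, func in describe.registry.items():
            if isinstance(obj, types):
                return func(obj)
        return repr(obj)
    describe.registry = {}

    # --- this block could live in any other module imported before use ---
    @when(describe, (str,))
    def describe_str(obj):
        return "a string of length %d" % len(obj)

    print(describe("hello"))                   # prints: a string of length 5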
In other words, if a program is messy, you can clean it up -- heck, it's potentially safer to do with an automatic refactoring tool than other types of refactorings in Python. (For example, changing the signature of a 'foo()' method is difficult to do safely because you don't necessarily know whether two arbitrary methods named 'foo' are semantically the same, whereas generic functions are objects, not names.)