[Python-Dev] PEP 246, redux

Alex Martelli aleax at aleax.it
Tue Jan 11 10:34:08 CET 2005


The volume of these discussions is (as expected) growing beyond any reasonable bounds; I hope the BDFL can find time to read them but I'm starting to doubt he will. Since obviously we're not going to convince each other, and it seems to me we're at least getting close to pinpointing our differences, maybe we should try to jointly develop an "executive summary" of our differences and briefly stated pros and cons -- a PEP is supposed to have such a section, after all. Anyway, for now, here goes another LONG mail...:

On 2005 Jan 10, at 22:38, Phillip J. Eby wrote: ...

If interfaces can ensure against Liskov violations in instances of their subclasses, then they can follow the "case (a)" fast path, sure. Inheriting from an interface (in Guido's current proposal, as per his Artima blog) is a serious commitment from the inheritor's part; inheriting from an ordinary type, in real-world current practice, need not be -- too many cases of assumed covariance, for example, are around in the wild, to leave NO recourse in such cases and just assume compliance. I understand that, sure. But I don't understand why we should add complexity to PEP 246 to support not one but two bad practices: 1) implementing Liskov violations and 2) adapting to concrete classes. It is only if you are doing both of these that this extra feature is needed.

s/support/deal with/ . If I were designing a "greenfield" system, I'd love nothing better than making a serious split between concrete classes (directly instantiable), abstract classes (subclassable), and protocols (adaptable-to). Scott Meyers's suggestion, in one of his Effective C++ books, to never subclass a concrete class has much to recommend it, in particular. But in the real world people do want to subclass concrete classes, just as much as they want covariance AND, I'll bet, adapting to classes, too, not just to interfaces seen as a separate category from classes. I think PEP 246 should deal with the real world. If the BDFL thinks otherwise, and believes that in this case Python should impose best practices rather than pragmatically deal with the way people's minds (as opposed to type-system maths) appear to work, I'll be glad to recast the PEP in that light.

Contrary to what you state, it is not adaptation to concrete (instantiable) classes, as opposed to abstract (not directly instantiable) ones, that makes the mechanism necessary, by the way. Consider an abstract class such as:

    class Abstract(object):
        def tp1(self):
            ''' template method 1 (calls hook method 1) '''
        def tp2(self):
            ''' template method 2 (calls hook method 2) '''
        def hook1(self):
            raise NotImplementedError
        def hook2(self):
            raise NotImplementedError

One could subclass it just to get tp1...:

    class Dubious(Abstract):
        def hook1(self):
            ''' implementing just hook1 '''

Now, instantiating d=Dubious() is dubious practice, but, absent specific checks, it "works", as long as only d.hook1() and d.tp1() are ever called -- never d.hook2() nor d.tp2().

I would like adapt(d, Abstract) to fail. I'm not claiming that the ability to have a __conform__ method in Dubious to specifically block this adaptation is anywhere like a perfect solution, mind you -- it does require some change to the source of Dubious, for example. I'm just saying that I think it's better than nothing, and that is where we seem to disagree.
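Just to make this concrete, here's a minimal sketch of the kind of opt-out I have in mind, assuming __conform__ and LiskovViolation as specified in the PEP (the import location is made up):

    from adaptation import LiskovViolation   # hypothetical module name

    class Dubious(Abstract):
        def hook1(self):
            ''' implementing just hook1 '''
        def __conform__(self, protocol):
            # this inheritance is "private": we reuse tp1/hook1 only,
            # so refuse to be considered a full implementation of Abstract
            if protocol is Abstract:
                raise LiskovViolation
            return None   # otherwise, let adapt() apply its default logic

With this, adapt(d, Abstract) fails instead of silently handing back an object whose tp2/hook2 would blow up later.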

If it were to support some kind of backward compatibility, that would be understandable. However, in practice, I don't know of anybody using adapt(x,ConcreteClass), and even if they did, the person subclassing ConcreteClass will need to change their subclass to raise LiskovViolation, so why not just switch to delegation?

Because delegation doesn't give you easy access to Template Method design patterns, which may well be the one reason you're subclassing Abstract in the first place. Template Method hinges on a method calling some self.dothis(), self.dothat() hook methods; to get it via delegation rather than inheritance requires a more complicated arrangement where that 'self' actually belongs to a "private" concrete class which delegates some things back to the "real" class. In practice, inheritance as a means of code reuse (rather than as a pristine Liskovian assertion of purity) is quite popular because of that. C++ acknowledges this fact by allowing private inheritance, essentially meaning "I'm reusing that code but don't really mean to assert IS-A"; the effects of private inheritance could be simulated by delegation to a private auxiliary class, but the extra indirections and complications aren't negligible costs in terms of code complexity and maintainability. In Python, we don't distinguish between private inheritance (to just reuse code) and ordinary inheritance (assumed to imply Liskov substitutability), but that doesn't make the need go away. Having __conform__ raise LiskovViolation could be seen as a way to give the subclass the ability to say "this inheritance here is ``private'', an issue of implementation only and not Liskov compliant".
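For concreteness, here is roughly the delegation-based arrangement one would need to reuse Abstract.tp1 without inheriting from Abstract -- a sketch only, with made-up names, just to show the extra indirection:

    class _AbstractHelper(Abstract):
        # private concrete subclass: forwards the hook back to the "real" object
        def __init__(self, outer):
            self._outer = outer
        def hook1(self):
            return self._outer.hook1()

    class Reuser(object):
        ''' reuses Abstract.tp1 without asserting IS-A Abstract '''
        def __init__(self):
            self._helper = _AbstractHelper(self)
        def hook1(self):
            ''' the actual hook implementation '''
        def tp1(self):
            return self._helper.tp1()

Compare this with just writing "class Reuser(Abstract)" and overriding hook1: the helper class, the extra attribute, and the forwarding methods are all pure overhead.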

Maybe the ability to ``fake'' __class__ can help, but right now I don't see how, because setting __class__ isn't fake at all -- it really affects object behavior and type:

    >>> class A(object):
    ...     def x(self): print "A"
    ...
    >>> a = A()
    >>> class B(object):
    ...     def x(self): print "B"
    ...
    >>> a.__class__ = B
    >>> a.x()
    B
    >>> type(a)
    <class '__main__.B'>

So, it doesn't seem to offer a way to fake out isinstance only, without otherwise affecting behavior.
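(What proxies can do instead, if I understand the quirk correctly, is define __class__ as a property rather than assigning to it, so that isinstance is fooled while behavior and type() stay put -- continuing the session above, a sketch:)

    >>> class FakeA(object):
    ...     __class__ = property(lambda self: A)
    ...     def x(self): print "still FakeA"
    ...
    >>> f = FakeA()
    >>> isinstance(f, A)
    True
    >>> f.x()
    still FakeA
    >>> type(f)
    <class '__main__.FakeA'>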

Anyway, it seems to me a bad idea to add complexity to support this case. Do you have a more specific example of a situation in which a Liskov violation coupled to concrete class adaptation is a good idea? Or am I missing something here?

I can give no example at all in which adapting to a concrete class is a good idea, and I tried to indicate that in the PEP. I just believe that if adaptation does not offer the possibility of using concrete classes as protocols, but rather requires the usage as protocols of some specially blessed 'interface' objects or whatever, then PEP 246 will never fly, (a) because it would then require waiting for the interface thingies to appear, and (b) because people will find it pragmatically useful to just reuse the same classes as protocols too, and too limiting to have to design protocols specifically instead. So, I see the ability to adapt to concrete (or, for that matter, abstract) classes as a "practicality beats purity" idea, needed to deal with the real world and the way people's minds work.

In practice we need covariance at least until a perfect system of parameterized interfaces is in place, and you can see from the Artima discussion just how difficult that is. I want to reuse (say) DictMixin on my mappings which restrict keys to be strings, for example, even though such a mapping does not conform to an unrestricted, unparameterized Mapping protocol.

I need to add it to the reference implementation in the PEP. I'm reluctant to just get __conform__ from the object, though; it leads to all sorts of issues with a class conforming vs its instances, etc. Maybe Guido can Pronounce a little on this sub-issue... Actually, if you looked at the field-tested implementations of the old PEP 246, they actually have code that deals with this issue effectively, by recognizing TypeError when raised by attempting to invoke __adapt__ or __conform__ with the wrong number of arguments or argument types. (The traceback for such errors does not include a frame for the called method, versus a TypeError raised within the function, which does have such a frame. AFAIK, this technique should be compatible with any Python implementation that has traceback objects and does signature validation in its "native" code rather than in a new Python frame.)
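Roughly, the technique in question looks like this -- a sketch of my understanding of what PyProtocols and Zope do, not their exact code:

    import sys

    def _call_conform(conform, protocol):
        try:
            return conform(protocol)
        except TypeError:
            # A traceback with no frame beyond the call site means the
            # TypeError came from the call itself (wrong argument count or
            # types, i.e. we grabbed __conform__ from the wrong meta-level);
            # a TypeError raised inside __conform__ has an extra frame and
            # must propagate.
            if sys.exc_info()[2].tb_next is not None:
                raise
        return None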

I do not like the idea of making 'adapt' such a special case compared with other built-in functions which internally call special methods.
What I mean is, for example:

    >>> class H(object):
    ...     def __hash__(self): return 23
    ...
    >>> h = H()
    >>> h.__hash__ = lambda: 42
    >>> hash(h)
    23

For __hash__, and all kinds of other built-in functions and operations, it does not matter whether instance h has its own per-instance __hash__ -- H.__hash__ is what gets called anyway. Making adapt work differently gives me the shivers.

Moreover, the BDFL is thinking of removing the "unbound method" concept and having such accesses as Aclass.somemethod return just a plain function. The internal typechecks done by unbound methods, on which such techniques as you mention may depend, might therefore be about to go away; this doesn't make it look nice to depend on them in a reference implementation.

If Guido pronounces otherwise, I'll gladly change the reference implementation accordingly (or remove said reference implementation, as you appear to suggest elsewhere), but unless and until this happens, I'm not convinced.

I don't see the benefit of LiskovViolation, or of doing the exact type check vs. the loose check. What is the use case for these? Is it to allow subclasses to say, "Hey I'm not my superclass?" It's also a bit confusing to say that if the routines "raise any other exceptions" they're propagated. Are you saying that LiskovViolation is not propagated?

Indeed I am -- I thought that was very clearly expressed! The PEP just said that it would be raised by __conform__ or __adapt__, not that it would be caught by adapt() or that it would be used to control the behavior in that way. Re-reading, I see that you do mention it much farther down. But at the point where __conform__ and __adapt__ are explained, it has not been explained that adapt() should catch the error or do anything special with it. It is simply implied by the "to prevent this default behavior" at the end of the section. If this approach is accepted, the description should be made explicit, because for me at least it required a retroactive re-interpretation of the earlier part of the spec.

OK, I'll add more repetition to the specs, trying to make them more "sequentially readable", even though they were already criticized for repeating some aspects more than once.

The previous version treated TypeError specially, but I think (on the basis of just playing around a bit, admittedly) that offers no real added value and sometimes will hide bugs. See http://peak.telecommunity.com/protocolref/node9.html for an analysis of the old PEP 246 TypeError behavior, and the changes made by PyProtocols and Zope to deal with the situation better, while still respecting the fact that __conform__ and __adapt__ may be retrieved from the wrong "meta level" of descriptor.

I've read that, and I'm not convinced, see above.

Your new proposal does not actually fix this problem in the absence of tp_conform/tp_adapt slots; it merely substitutes possible confusion at the metaclass/class level for confusion at the class/instance level. The only way to actually fix this is to detect when you have called the wrong level, and that is what the PyProtocols and Zope implementations of "old PEP 246" do. (PyProtocols also introduces a special descriptor for methods defined on metaclasses, to help avoid creating this possible confusion in the first place, but that is a separate issue.)

Can you give an example of "confusion at metaclass/class level"? I can't see it.

This should either be fleshed out to a concrete proposal, or dropped. There are many details that would need to be answered, such as whether "type" includes subtypes and whether it really means type or class. (Note that isinstance() now uses __class__, allowing proxy objects to lie about their class; the adaptation system should support this too, and both the Zope and PyProtocols interface systems and PyProtocols' generic functions support it.)

I disagree: I think the strawman-level proposal as fleshed out in the pep's reference implementation is far better than nothing. I'm not proposing to flesh out the functionality, just the specification; it should not be necessary to read the reference implementation and try to infer intent from it. What part is implementation accident, and what is supposed to be the specification? That's all I'm talking about here. As currently written, the proposal is just, "we should have a registry", and is not precise enough to allow someone to implement it based strictly on the specification.

Wasn't Python supposed to be executable pseudocode, and isn't pseudocode an acceptable way to express specs?-) Ah well, I see your point, so that may well require more repetitious expansion, too.

I mention the issue of subtypes explicitly later, including why the pep does NOT do anything special with them -- the reference implementation deals with specific types. And I use type(X) consistently, explicitly mentioning in the reference implementation that old-style classes are not covered. As a practical matter, classic classes exist and are useful, and PEP 246 implementations already exist that work with them. Dropping that functionality is a major step backward for PEP 246, IMO.

I disagree that entirely new features of Python (as opposed to external third party add-ons) should add complications to deal with old-style classes. Heh, shades of the "metatype conflict removal" recipe discussion a couple months ago, right?-) But then that recipe WAS a "third-party add-on". If Python grew an intrinsic way to deal with metaclass conflicts, I'd be DELIGHTED if it didn't work for old-style classes, as long as this simplified it.

Basically, we both agree that adaptation must accept some complication to deal with practical real-world issues that are gonna stay around, we just disagree on what those issues are. You appear to think old-style classes will stay around and need to be supported by new core Python functionality, while I think they can be pensioned off; you appear to think that programmers' minds will miraculously shift into a mode where they don't need covariance or other Liskov violations, and programmers will happily extract the protocol-ish aspects of their classes into neat pristine protocol objects rather than trying to double-use the classes as protocols too, while I think human nature won't budge much on this respect in the near future.

I hope I have now amply clarified the roots of our disagreements, so we can wait for BDFL input before the needed PEP 246 rewrites. If his opinions are much closer to yours than to mine, then perhaps the best next step would be to add you as the first author of the PEP and let you perform the next rewrite -- would you be OK with that?

I didn't know about the "let the object lie" quirk in isinstance. If that quirk is indeed an intended design feature... It is; it's in one of the "what's new" feature highlights for either 2.3 or 2.4, I forget which. It was intended to allow proxy objects (like security proxies in Zope 3) to pretend to be an instance of the class they are proxying.

I just grepped through whatsnew23.tex and whatsnew24.tex and could not find it. Can you please help me find the exact spot? Thanks!

The issue isn't that adaptation isn't casting; why would casting a string to a file mean that you should open that filename?

Because, in most contexts, "casting" object X to type Y means calling Y(X). Ah; I had not seen that called "casting" in Python, at least not to my immediate recollection. However, if that is what you mean, then why not say it? :)

What have you seen called "casting" in Python?

Maybe we're using different definitions of "casting"? I'm most accustomed to the C and Java definitions of casting, so that's probably why I can't see how it relates at all. :)

Well, in C++ you can call (int)x or int(x) with the same semantics -- they're both casts. In C or Java you must use the former syntax, in Python the latter, but they still relate.

If I were going to say anything about that case, I'd say that adaptation should not be "lossy"; adapting from a designator to a file loses information like what mode the file should be opened in. (Similarly, I don't see adapting from float to int; if you want a cast to int, cast it.) Or to put it another way, adaptability should imply substitutability: a string may be used as a filename, a filename may be used to designate a file. But a filename cannot be used as a file; that makes no sense.

I don't understand this "other way" -- nor, to be honest, what you "would say" earlier, either. I think it's pretty normal for adaptation to be "lossy" -- to rely on some but not all of the information in the original object: that's the "facade" design pattern, after all. It doesn't mean that some info in the original object is lost forever, since the original object need not be altered; it just means that not ALL of the info that's in the original object gets used by the adapter -- and, what's wrong with that?! I think we're using different definitions of "lossy", too. I mean that defining an adaptation relationship between two types when there is more than one "sensible" way to get from one to the other is "lossy" of semantics/user choice. If I have a file designator (such as a filename), I can choose how to open it. If I adapt directly from string to file by way of filename, I lose this choice (it is "lossy" adaptation).

You could have specified some options (such as the mode) but they took their default value instead ('r' in this case). What's ``lossy'' about accepting defaults?!

The adjective "lossy" is overwhelmingly often used in describing compression, and in that context it means, can every bit of the original be recovered (then the compression is lossless) or not (then it's lossy). I can't easily find "lossy" used elsewhere than in compression, it's not even in American Heritage. Still, when you describe a transformation such as 12.3 -> 12 as "lossy", the analogy is quite clear to me. When you so describe the transformation 'foo.txt' -> file('foo.txt'), you've lost me completely: every bit of the original IS still there, as the .name attribute of the file object, so by no stretch of the imagination can I see the "lossiness" -- what bits of information are LOST?

I'm not just belaboring a term, I think the concept is very important, see later.

Here's a better way of phrasing it (I hope): adaptation should be unambiguous. There should only be one sensible way to interpret a thing as implementing a particular interface, otherwise, adaptation itself has no meaning. Whether an adaptation adds or subtracts behavior, it does not really change the underlying intended meaning of a thing, or else it is not really adaptation. Adapting 12.0 to 12 does not change the meaning of the value, but adapting from 12.1 to 12 does. Does that make more sense? I think that some people start using adaptation and want to use

Definitely more sense than 'lossy', but that's only because the latter didn't make any sense to me at all (when stretched to include, e.g., opening files). Again, see later.

it for all kinds of crazy things because it seems cool. However, it takes a while to see that adaptation is just about removing unnecessary accidents-of-incompatibility; it's not a license to transform arbitrary things into arbitrary things. There has to be some meaning to a particular adaptation, or the whole concept rapidly degenerates into an undifferentiated mess.

We agree, philosophically. Not sure how the PEP could be enriched to get this across. We still disagree, pragmatically, see later.

(Or else, you decide to "fix" it by disallowing transitive adaptation, which IMO is like cutting off your hand because it hurts when you punch a brick wall. Stop punching brick walls (i.e. using semantic-lossy adaptations), and the problem goes away. But I realize that I'm in the minority here with regards to this opinion.)

I'm not so sure about your being in the minority, having never read for example Guido's opinion in the matter. But, let's take an example of Facade. (Here's the 'later' I kept pointing to;-).

I have three data types / protocols: LotsOfInfo has a bazillion data fields, including personFirstName, personMiddleName, personLastName, ... PersonName has just two data fields, theFirstName and theLastName. FullName has three, itsFirst, itsMiddle, itsLast.

The adaptation between such types/protocols has meaning: drop/ignore redundant fields, rename relevant fields, make up missing ones by some convention (empty strings if they have to be strings, None to mean "I dunno" like SQL NULL, etc). But, this IS lossy in some cases, in the normal sense: through the facade (simplified interface) I can't access ALL of the bits in the original (information-richer).

Adapting LotsOfInfo -> PersonName is fine; so does LotsOfInfo -> FullName.

Adapting PersonName -> FullName is iffy, because I don't have the deuced middlename information. But that's what NULL aka None is for, so if that's allowed, I can survive.

But going from LotsOfInfo to FullName transitively, by way of PersonName, cannot give the same result as going directly -- the middle name info disappears, because there HAS been a "lossy" step.
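A toy sketch of exactly this accident (plain adapter functions, made-up field values, no registry involved -- just the composition):

    class LotsOfInfo(object):
        def __init__(self, first, middle, last):
            self.personFirstName = first
            self.personMiddleName = middle
            self.personLastName = last

    class PersonName(object):
        def __init__(self, first, last):
            self.theFirstName, self.theLastName = first, last

    class FullName(object):
        def __init__(self, first, middle, last):
            self.itsFirst, self.itsMiddle, self.itsLast = first, middle, last

    def lots_to_person(x):
        return PersonName(x.personFirstName, x.personLastName)

    def person_to_full(x):
        return FullName(x.theFirstName, None, x.theLastName)   # middle unknown -> NULL

    def lots_to_full(x):
        return FullName(x.personFirstName, x.personMiddleName, x.personLastName)

    info = LotsOfInfo('Jane', 'Quentin', 'Doe')
    print lots_to_full(info).itsMiddle                      # Quentin -- direct, nothing lost
    print person_to_full(lots_to_person(info)).itsMiddle    # None -- via PersonName, middle name gone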

So the issue of "lossy" DOES matter, and I think you muddy things up when you try to apply it to a string -> file adaptation ``by casting'' (opening the file thus named).

Forbidding lossy adaptation means forbidding facade here; not being allowed to get adaptation from a rich source of information when what's needed is a subset of that info with some renaming and perhaps mixing.
I would not like that AT ALL; I believe it's unacceptable.

Forbidding indications of "I don't know" comparable to SQL's NULL (thus forbidding the adaptation PersonName -> FullName) might make the whole scheme incompatible with the common use of relational databases and the like -- probably not acceptable, either.

Allowing both lossy adaptations, NULLs, and transitivity inevitably leads sooner or later to ACCIDENTAL info loss -- the proper adapter to go directly LotsOfInfo -> FullName was not registered, and instead of getting an exception to point out that error, your program limps along having accidentally dropped a piece of information, here the middle-name.

So, I'd like to disallow transitivity.

For example, say that I have some immutable "record" types. One, type Person, defined in some framework X, has a huge lot of immutable data fields, including firstName, middleName, lastName, and many, many others. Another, type Employee, defined in some separate framework Y (that has no knowledge of X, and vice versa), has fewer data fields, and in particular one called 'fullName' which is supposed to be a string such as 'Firstname M. Lastname'. I would like to register an adapter factory from type Person to protocol Employee. Since we said Person has many more data fields, adaptation will be "lossy" -- it will look upon Employee essentially as a "facade" (a simplified-interface) for Person. But it doesn't change the meaning. I realize that "meaning" is not an easy concept to pin down into a nice formal definition. I'm just saying that adaptation is about semantics-preserving transformations, otherwise you could just tack an arbitrary object on to something and call it an adapter. Adapters should be about exposing an object's existing semantics in terms of a different interface, whether the interface is a subset or superset of the original object's interface. However, they should not add or remove arbitrary semantics that are not part of the difference in interfaces.
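A sketch of the kind of adapter factory I mean for that Person/Employee case (the Person stub is made up so the snippet runs; the registration call at the end is purely hypothetical):

    class Person(object):
        # minimal stand-in for framework X's rich, immutable record type
        def __init__(self, firstName, middleName, lastName, **other):
            self.firstName, self.middleName, self.lastName = firstName, middleName, lastName
            self.__dict__.update(other)

    class PersonAsEmployee(object):
        ''' facade: expose a Person through the narrower Employee interface '''
        def __init__(self, person):
            self._p = person
        def _get_fullName(self):
            p, middle = self._p, ''
            if p.middleName:
                middle = p.middleName[0] + '.'
            return ' '.join([s for s in (p.firstName, middle, p.lastName) if s])
        fullName = property(_get_fullName)

    print PersonAsEmployee(Person('Jane', 'Quentin', 'Doe', shoeSize=7)).fullName   # Jane Q. Doe
    # register_adapter(Person, Employee, PersonAsEmployee)   # hypothetical registration API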

OK, but then 12.3 -> 12 should be OK, since the loss of the fractional part IS part of the difference in interfaces, right? And yet it doesn't SMELL like adaptation to me -- which is why I tried to push the issue away with the specific disclaimer about numbers.

For example, adding a "current position" to a string to get a StringIO is a difference that is reflected in the difference in interface: a StringIO is just a string of characters with a current position that can be used in place of slicing.

But interpreting a string as a file doesn't make sense because of added semantics that have to be "made up", and are not merely interpreting the string's semantics "as a" file. I suppose you could say that this is "noisy" adaptation rather than "lossy". That is, to treat a string as a file by using it as a filename, you have to make up things that aren't present in the string. (Versus the StringIO, where there's a sensible interpretation of a string "as a" StringIO.) IOW, adaptation is all about "as a" relationships from concrete objects to abstract roles, and between abstract roles. Although one may colloquially speak of using a screwdriver "as a" hammer, this is not the case in adaptation. One may use a screwdriver "as a" pounder-of-nails. The difference is that a hammer might also be usable "as a" remover-of-nails. Therefore, there is no general "as a" relationship between pounder-of-nails and remover-of-nails, even though a hammer is usable "as" either one. Thus, it does not make sense to say that a screwdriver is usable "as a" hammer, because this would imply it's also usable to remove nails.

I like the "as a" -- but it can't ignore Facade, I think.

This is why I don't believe it makes sense in the general case to adapt to concrete classes; such classes usually have many roles where they are usable. I think the main difference in your position and mine is that I think one should adapt primarily to interfaces, and

I fully agree here. I see the need to adapt to things that aren't protocols as an unpleasant reality we have to (heh) adapt to, not ideal by any means.

interface-to-interface adaptation should be reserved for non-lossy, non-noisy adapters.

No Facade, no NULLs? Yes, we disagree about this one: I believe adaptation that occurs by showing just a subset of the info, with renaming etc, is absolutely fine (Facade); and adaptation by using an allowed NULL (say None) to mean "missing information", when going to a "wider" interface, is not pleasant but is sometimes indispensable in the real world -- that's why SQL works in the real world, even though SQL beginners and a few purists hate NULLs with a vengeance.

Where if I understand the opposing position correctly, it is instead that one should avoid transitivity so that loss and noise do not accumulate too badly.

In a sense -- but that has nothing to do with concrete classes etc, in this context. All of the "records"-like datatypes I'm using around here may perfectly well be as interfacey as you please, as long as interfaces/protocols let you access attributes property-like, and if they don't just transliterate to getThis, getThat, getTheOther, no big deal.

The points are rather that adaptation that "loses" (actually "hides") some information is something we MUST have; and adaptation that supplies "I don't know" markers (NULL-like) for some missing information, where that's allowed, is really very desirable. Call this lossy and noisy if you wish, we still can't do without.

Transitivity is a nice convenience, IF it could be something that an adapter EXPLICITLY claims rather than something just happening by default. I might live with it, grudgingly, if it was the default with some nice easy way to turn it off; my problem with that is -- even if 90% of the cases could afford to be transitive, people will routinely forget to mark the other 10% and mysterious, hard-to-find bugs will result. The identical objection can be raised about the LiskovViolation mechanism, which is why I say it's not perfect by any stretch of the imagination, btw (I just think SOME mechanism to turn off the default is needed and can't think of a better one yet).
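Just to show the shape of what I could live with -- a purely hypothetical registration API, not anything in the PEP, where each adapter must explicitly claim it is safe to compose:

    # hypothetical sketch: transitivity is opt-in, per registered adapter
    _adapters = {}      # (from_protocol, to_protocol) -> (factory, transitive_ok)

    def register_adapter(from_proto, to_proto, factory, transitive=False):
        _adapters[from_proto, to_proto] = (factory, transitive)

    def chain_allowed(path):
        ''' a multi-step chain may be used only if EVERY link opted in '''
        for link in path:
            if not _adapters[link][1]:
                return False
        return True

Defaulting 'transitive' to False is exactly the bias I'd want; defaulting it to True would reintroduce the forget-to-mark-the-10% problem.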

In PyProtocols docs you specifically warn against adapting from an adapter... yet that's what transitivity intrinsically does!

So, can you please explain your objections to what I said about adapting vs casting in terms of this example? Do you think the example, or some variation thereof, should go in the PEP? I'm not sure I see how that helps. I think it might be more useful to say that adaptation is not conversion, which is not the same thing (IME) as casting. Casting in C and Java does not actually "convert" anything; it simply treats a value or object as if it were of a

Uh? (int)12.34 DOES "convert" to the integer 12, creating an entirely new object. SOME casting does convert, other doesn't (C++ clears up this mess by introducing many separate casts such as reinterpret_cast when you specifically want reinterpretation of bits, etc, etc, but for backwards compatibility keeps supporting the mess too).

different type. ISTM that bringing casting into the terminology just complicates the picture, because e.g. casting in Java actually corresponds to the subset of PEP 246 adaptation for cases where adapt() returns the original object or raises an error. (That is, if adapt() could only ever return the original object or raise an error, it would be precisely equivalent to Java casting, if I understand it correctly.) Thus, at least with regard to object casting in Java, adaptation is a superset, and saying that it's not casting is just confusing.

OK, I'll try to rephrase that part. Obviously "casting" is too overloaded.

Obviously, some changes would need to be made to implement your newly proposed functionality, but this one does support classic classes, modules, and functions, and it has neither the TypeError-hiding problem of the original PEP 246 nor the TypeError-raising problem of your new version.

...but it DOES break the normal semantics of the relationship between built-ins and special methods, as I exemplified above with hash and __hash__.

Transitivity of adaptation is in fact somewhat controversial, as is the relationship (if any) between adaptation and inheritance.

The issue is simply this: what is substitutability? If you say that interface B is substitutable for A, and C is substitutable for B, then C must be substitutable for A, or we have inadequately defined "substitutability". Not necessarily, depending on the pragmatics involved. In that case, I generally prefer to be explicit and use conversion rather than using adaptation. For example, if I really mean to truncate the fractional part of a number, I believe it's then appropriate to use 'int(someNumber)' and make it clear that I'm intentionally using a lossy conversion rather than simply treating a number "as an" integer without changing its meaning.

That's how it feels to me FOR NUMBERS, but I can't generalize the feeling to the general case of facade between "records" with many fields of information, see above.

Alex


