[Python-Dev] PEP 455 -- TransformDict

Raymond Hettinger raymond.hettinger at gmail.com
Thu May 14 16:29:55 CEST 2015

Before the Python 3.5 feature freeze, I should step up and formally reject PEP 455 for "Adding a key-transforming dictionary to collections".

I had completed an involved review effort a long time ago and I apologize for the delay in making the pronouncement.

What made it an interesting choice from the outset is that the idea of a "transformation" is an enticing concept that seems full of possibility. I spent a good deal of time exploring what could be done with it but found that it mostly fell short of its promise.

There were many issues. Here are some that were at the top:

* Most use cases don't need or want the reverse lookup feature (what is wanted is a set of one-way canonicalization functions). Those that do would want to have a choice of what is saved (first stored, last stored, n most recent, a set of all inputs, a list of all inputs, nothing, etc). In database terms, it models a many-to-one table (the canonicalization or transformation function) with the one being a primary key into another possibly surjective table of two columns (the key/value store). A surjection into another surjection isn't inherently reversible in a useful way, nor does it seem to be a common way to model data.
* People are creative at coming up with use cases for the TD but then find that the resulting code is less clear, slower, less intuitive, more memory intensive, and harder to debug than just using a plain dict with a function call before the lookup: d[func(key)]. It was challenging to find any existing code that would be made better by the availability of the TD.
* The TD seems to be all about combining data scrubbing (case-folding, Unicode canonicalization, type-folding, object identity, unit-conversion, or finding a canonical member of an equivalence class) with a mapping (looking up a value for a given key). Those two operations are conceptually orthogonal. The former doesn't get easier when hidden behind a mapping API and the latter loses the flexibility of choosing your preferred mapping (an ordereddict, a persistentdict, a chainmap, etc) and the flexibility of establishing your own rules for whether and how to do a reverse lookup.
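
As a minimal sketch of the plain-dict alternative described above, the scrubbing step and the mapping step can be kept separate, with the caller choosing both the mapping type and the reverse-lookup policy. The names canonical, prices, originals, and store are illustrative only and do not come from the PEP; the "first stored wins" policy is just one of the choices listed earlier.

```python
from collections import OrderedDict

def canonical(key):
    """One-way canonicalization: strip whitespace and case-fold."""
    return key.strip().casefold()

# Any mapping type works here; OrderedDict is only an example choice.
prices = OrderedDict()

# Optional reverse lookup lives in a second, ordinary dict with an
# explicit policy. Here the first spelling stored wins.
originals = {}

def store(key, value):
    k = canonical(key)
    prices[k] = value
    originals.setdefault(k, key)  # keep only the first original spelling

store("Apple ", 1.25)
store("APPLE", 1.50)

print(prices[canonical("apple")])           # 1.5
print(repr(originals[canonical("apple")]))  # 'Apple '
```

Switching to a "last stored" policy would only require replacing setdefault with a plain assignment, and the canonical key itself stays directly accessible.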

Raymond Hettinger

P.S. Besides the core conceptual issues listed above, there are a number of smaller issues with the TD that surfaced during design review sessions. In no particular order, here are a few of the observations:

* It seems to require above average skill to figure out what can be used as a transform function. It is more expert-friendly than beginner-friendly. It takes a little while to get used to it. It wasn't self-evident that transformations happen both when a key is stored and again when it is looked up (contrast this with key-functions for sorting which are called at most once per key).
* The name, TransformDict, suggests that it might transform the value instead of the key or that it might transform the dictionary into something else. The name TransformDict is so general that it would be hard to discover when faced with a specific problem. The name also limits perception of what could be done with it (e.g. a function that logs accesses but doesn't actually change the key).
* The tool doesn't describe itself well. Looking at the help(), or the repr(), or the tooltips did not provide much insight or clarity. The dir() shows many of the _abc implementation details rather than the API itself.
* The original key is stored and if you change it, the change isn't stored. The _original dict is private (perhaps to reduce the risk of putting the TD in an inconsistent state) but this limits access to the stored data.
* The TD is unsuitable for bijections because the API is inherently biased with a rich group of operators and methods for forward lookup but has only one method for reverse lookup.
* The reverse feature is hard to find (getitem vs __getitem__) and its output pair is surprising and a bit awkward to use. It provides only one accessor method rather than the full dict API that would be given by a second dictionary. The API hides the fact that there are two underlying dictionaries.
* It was surprising that when d[k] failed, it failed with a transformation exception rather than a KeyError, violating the expectations of the calling code (for example, if the transformation function is int(), the call d["12"] transforms to d[12] and either succeeds in returning a value or in raising a KeyError, but the call d["12.0"] fails with a ValueError). The latter issue limits its substitutability into existing code that expects real mappings and for exposing to end-users as if it were a normal dictionary.
* There were other issues with dict invariants as well and these affected substitutability in a sometimes subtle way. For example, the TD does not work with __missing__(). Also, "k in td" does not imply that "k in list(td.keys())".
* The API is at odds with wanting to access the transformations. You pay a transformation cost both when storing and when looking up, but you can't access the transformed value itself. For example, if the transformation is a function that scrubs hand-entered mailing addresses and puts them into a standard format with standard abbreviations, you have no way of getting back to the cleaned-up address.
* One design reviewer summarized her thoughts like this: "There is a learning curve to be climbed to figure out what it does, how to use it, and what the applications [are]. But, the [working out the same] examples with plain dicts requires only basic knowledge." -- Patricia
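
The substitutability point about exceptions can be illustrated with a small sketch. KeyTransformingDict below is a hypothetical stand-in written for this example, not the PEP 455 implementation; it only applies the transform on item assignment and item lookup. The lookup helper represents ordinary calling code that expects a real mapping and therefore handles only KeyError.

```python
class KeyTransformingDict(dict):
    """Hypothetical stand-in: transforms keys on store and on lookup."""

    def __init__(self, transform):
        super().__init__()
        self._transform = transform

    def __setitem__(self, key, value):
        super().__setitem__(self._transform(key), value)

    def __getitem__(self, key):
        # The transform runs again on every lookup and may itself raise.
        return super().__getitem__(self._transform(key))


def lookup(mapping, key):
    """Typical caller: assumes a failed lookup raises KeyError."""
    try:
        return mapping[key]
    except KeyError:
        return "missing"


td = KeyTransformingDict(int)
td["12"] = "twelve"

print(lookup(td, "12"))  # twelve
print(lookup(td, "13"))  # missing

try:
    lookup(td, "12.0")
    leaked = None
except ValueError as exc:
    # int("12.0") raises ValueError, which the caller never anticipated.
    leaked = type(exc).__name__

print(leaked)  # ValueError
```

With a plain dict and an explicit d[int(key)] call, the conversion error surfaces at the call site where the transform is visible, rather than from inside what looks like a normal dictionary lookup.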
