[Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True) (original) (raw)
Raymond Hettinger python@rcn.com
Mon, 25 Nov 2002 02:02:55 -0500
- Previous message: [Python-Dev] Problems installing Python
- Next message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Here's a write-up for a proposed dictionary update method. Comments are invited while I work on a patch, docs, tests, etc.
Raymond Hettinger
METHOD SPECIFICATION
def sequpdate(self, iterable, value=True):
for elem in iterable:
self[elem] = value
return self
USE CASES
Fast Membership testing
termwords = dict('End Quit Stop Abort'.split()) . . . if lexeme in termwords: sys.exit(0)
Removing duplicates
seq = dict().sequpdate(seq).keys()
Initializing or resetting mapped accumulators
absences = dict('Tom Dick Harry'.split(), 0) for name, date in classlog: absences[name] += 1 # Intentionally raises KeyError for invalid names
RATIONALE
Examining code in the library, the cookbook, and the vaults of parnassus, the most common practice for membership testing is a sequential search of a list, tuple, or string. Where the writers were sophisticated and cared about performance, they used in-line versions of the python fragment shown above. I have not found a single case where people took the time to "roll their own" pure python function as shown above. Even if they had, the construction time would take twice as long as with a C coded method. IOW, practices are tending toward slow or verbose approaches in the absence of a quick, convenient, standardized tool for moving sequences into dictionaries.
The problem of removing duplicates from a sequence is most frequently solved with inline use of a code fragment similar to the one shown above. Again, you almost never see people taking the time to build their own uniq function, factoring all the effort into a single, tested, reusable code fragment. The exception is Tim's wonderful, general purpose recipe for uniquifying anything -- unfortunately, I don't think you ever see his code used in practice.
Also, since all containers support iter, they can be fed to list(); however, in the absence of the above method, they cannot be fed to dict() without shenanigans for building item lists. The proposed method makes todict() as doable as tolist() and encourages switching to appropriate data structures.
OBJECTIONS TO THE PRIOR IDEA OF EXPANDING THE DICT CONSTRUCTOR (SF 575224)
- The constructor was already overloaded.
- Weird syntax was needed to fit with other constructor syntax.
- The sets module was going to meet all needs.
The first two comments led to this revision in method form.
The Sets module met several needs centering around set mathematics; however, for membership testing, it is so slow that it is almost always preferable to use dictionaries instead (even without this proposed method). The slowness is intrinsic because of the time to search for the contains method in the class and the time to setup a try/except to handle mutable elements. Another reason to prefer dictionaries is that there is one less thing to import and expect readers to understand. My experiences applying the Sets module indicates that it will never replace dictionaries for membership testing and will have only infrequent use for uniquification.
- Previous message: [Python-Dev] Problems installing Python
- Next message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]