[Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True) (original) (raw)

M.-A. Lemburg mal@lemburg.com
Mon, 25 Nov 2002 09:14:30 +0100

Previous message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
Next message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

aymond Hettinger wrote:

Here's a write-up for a proposed dictionary update method. Comments are invited while I work on a patch, docs, tests, etc.

Just as data point: you may want to have a look at the setdict() API in mxTools. It does pretty much what you are suggesting.

If you want to add this functionality as method, I'd also suggest to add invdict():

setdict(sequence,value=None) Constructs a dictionary from the given sequence. The sequence must contain hashable objects which are used as keys. The values are all set to value. Multiple keys are silently ignored. The function comes in handy whenever you need to work with a sequence in a set based context (e.g. to determine the set of used values).

invdict(dictionary) Constructs a new dictionary from the given one with inverted mappings. Keys become values and vice versa. Note that no exception is raised if the values are not unique. The result is undefined in this case (there is a value:key entry, but it is not defined which key gets used).

Other useful extensions would be extract() and iremove() as method on lists:

extract(object,indices[,defaults]) Builds a list with entries object[index] for each index in the sequence indices.

 If a lookup fails and the sequence defaults is given, then defaults[nth_index] is used, where nth_index is the

index of index in indices (confused ? it works as expected !). defaults should have the same length as indices.

 If you need the indices as well, try the irange function. The function raises an IndexError in case it can't find

an entry in indices or defaults.

iremove(object,indices) Removes the items indexed by indices from object.

 This changes the object in place and thus is only possible for mutable types.

 For sequences the index list must be sorted ascending; an IndexError will be raised otherwise (and the object left

in an undefined state).

(Note that the above work on all kinds of sequences/mappings, not just lists or dictionaries.)

Raymond Hettinger ------------------------------------------------------------------------ METHOD SPECIFICATION def sequpdate(self, iterable, value=True): for elem in iterable: self[elem] = value return self USE CASES # Fast Membership testing termwords = dict('End Quit Stop Abort'.split()) . . . if lexeme in termwords: sys.exit(0) # Removing duplicates seq = dict().sequpdate(seq).keys() # Initializing or resetting mapped accumulators absences = dict('Tom Dick Harry'.split(), 0) for name, date in classlog: absences[name] += 1 # Intentionally raises KeyError for invalid names RATIONALE Examining code in the library, the cookbook, and the vaults of parnassus, the most common practice for membership testing is a sequential search of a list, tuple, or string. Where the writers were sophisticated and cared about performance, they used in-line versions of the python fragment shown above. I have not found a single case where people took the time to "roll their own" pure python function as shown above. Even if they had, the construction time would take twice as long as with a C coded method. IOW, practices are tending toward slow or verbose approaches in the absence of a quick, convenient, standardized tool for moving sequences into dictionaries. The problem of removing duplicates from a sequence is most frequently solved with inline use of a code fragment similar to the one shown above. Again, you almost never see people taking the time to build their own uniq function, factoring all the effort into a single, tested, reusable code fragment. The exception is Tim's wonderful, general purpose recipe for uniquifying anything -- unfortunately, I don't think you ever see his code used in practice. Also, since all containers support iter, they can be fed to list(); however, in the absence of the above method, they cannot be fed to dict() without shenanigans for building item lists. The proposed method makes todict() as doable as tolist() and encourages switching to appropriate data structures. OBJECTIONS TO THE PRIOR IDEA OF EXPANDING THE DICT CONSTRUCTOR (SF 575224) 1. The constructor was already overloaded. 2. Weird syntax was needed to fit with other constructor syntax. 3. The sets module was going to meet all needs. The first two comments led to this revision in method form. The Sets module met several needs centering around set mathematics; however, for membership testing, it is so slow that it is almost always preferable to use dictionaries instead (even without this proposed method). The slowness is intrinsic because of the time to search for the contains method in the class and the time to setup a try/except to handle mutable elements. Another reason to prefer dictionaries is that there is one less thing to import and expect readers to understand. My experiences applying the Sets module indicates that it will never replace dictionaries for membership testing and will have only infrequent use for uniquification.

Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH

eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/

Previous message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
Next message: [Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]