[Python-Dev] collections module (original) (raw)

Martin v. Loewis martin at v.loewis.de
Sat Jan 10 07:07:51 EST 2004


Raymond Hettinger wrote:

I truly don't understand your answer. The bag and queue collections have been tried and true in many languages.

I'm not so sure about a queue class, but I do believe that bag classes got into libraries only for academic reasons such as symmetry. In particular, they got into Smalltalk because of that reason, and everybody copies it because Smalltalk has it. This has to stop :-)

Do I need to turn this into a PEP, take it to comp.lang.python, cheerlead for it, and make the total effort ten times harder than it needs to be?

If everybody else thinks bags are a great idea, go ahead. YAGNI.

Or am I missing something basic like it being detrimental to the language in some way?

They are perhaps detrimental because they violate the principle

There should be one-- and preferably only one --obvious way to do it.

For queues, this is debatable, as the obvious way to do it (list.append, list.pop(0)) is inefficient if the list is long.

In the specific case of bags, I think a different generalization would be more useful. The only use case of bags is when you want to count things, so the current pattern is

count them

d = {} for t in things: d[t] = d.get(t, 0) + 1

print them sorted by item

for k, v in sorted(d.iteritems()): print k, v

print them sorted by frequency

for k, v in sorted(d.iteritems(), key=operator.itemgetter(1)): print k,v

With a bag, this could be "simplified" to

d = Bag() for t in things: d.add(t)

print them sorted by item

for k, v in sorted(d.iteritems()): print k, v

print them sorted by frequency

for k, v in d.sortedByFrequency(): print k,v

I don't think this is reasonably simpler.

What's worse: this type does not support a very similar activity, namely collecting similar items

d = {} for t in things: k = category(d) # e.g. if t are files, one may collect them # by size class (empty, <1k, <10k, <100k, ...) try: d[k].append(t) except KeyError: d[k] = [t]

Both applications could be implemented if dictionaries had the notion of a default value, which would be generated by a callable (whether with or without argument is debatable)

count things

d = {} d.defaultmaker = lambda k:0 for t in things: d[t] += 1

sort things by category

d = {} d.defaultmaker = lambda k:[] for t in things: d[category(t)].append(t)

Having a key in the lambda is probably better, as it would also allow lazily-filled dictionaries.

Regards, Martin



More information about the Python-Dev mailing list