Issue 1517509: filter() implementation does not match docs (original) (raw)

Issue1517509

Created on 2006-07-05 13:09 by collinwinter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg29048 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-05 13:09
The docs for the built-in function filter() claim that "filter(function, list) is equivalent to [item for item in list if function(item)] if function is not None and [item for item in list if item] if function is None". >>> class infinite_str(str): ... def __getitem__(self, index): ... return "a" ... >>> filter(None, infinite_str("1234")) 'aaaa' Now, if we translate this to a listcomp according to the docs: >>> [x for x in infinite_str("1234") if x] The listcomp version proceeds to chew up memory until it exhausts the system resources or is killed by the user. If the docs are to be believed, the filter() version should do the same thing.
msg29049 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-07-05 14:58
Logged In: YES user_id=80475 Please take a larger view when reviewing the docs. The listcomp analogy is very helpful in explaining what filter() does and readers would not benefit by its removal. Throughtout the docs, the phrase "is equivalent to" does not mean "is identical to" or "exactly the same as". In this case, you have isolated a non-guaranteed implementation detail that is almost always irrelevant. When an object such as infinite_str lies about its length, the consequent behavior is undefined. It is not hard to produce weird results when objects violate basic invariants such as len(istr)!=len(list(istr)) or the expected relation between __eq__ and __hash__.
msg29050 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-05 15:16
Logged In: YES user_id=1344176 I wasn't just reviewing the docs; I came upon this while re-implementing filter() in pure Python for a compatibility module I'm working on. I discovered this particular implementation detail when my naive implementation (as a listcomp) didn't pass test_builtin. My question is why filter() even cares about __len__, ie, why it doesn't simply use the iterator protocol. This is especially curious since filter() actively ignores the iterator protocol when dealing with subclasses of str, unicode and tuple: >>> class badstr(str): ... def __len__(self): ... return 4 ... def __getitem__(self, index): ... return "a" ... def __iter__(self): ... for i in range(5): ... yield "b" ... >>> filter(None, badseq("cccc")) aaaa >>> [x for x in badseq("cccc") if x] ['b', 'b', 'b', 'b', 'b'] Ignore the difference in return types (which is expected), these two do not do the same thing; filter() uses a combination of __len__ and __getitem__ while the listcomp version uses the iterator protocol. Clearly, in this case, "is equivalent to" means "bears only a superficial relationship to".
msg29051 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-13 19:26
Logged In: YES user_id=1344176 I've posted a patch for the __len__ issue; it's SF #1522016.
msg29052 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-07-13 22:35
Logged In: YES user_id=80475 It's a speed hack. The result list for filter() is pre-allocated to the input length and then resized downward to the actual length. Stop posting patches for non-bugs.
History
Date User Action Args
2022-04-11 14:56:18 admin set github: 43616
2006-07-05 13:09:25 collinwinter create