msg343983
Author: Dale Visser (dwvisser)
Date: 2019-05-30 15:49
An alternative version of Filterer.filter(...) would look like this, which takes advantage of efficient Python 3.x built-ins and is immediately understandable:

    def _filter_callable(filter):
        return filter.filter if hasattr(filter, 'filter') else filter

    def filter(self, record):
        filters = map(_filter_callable, self.filters)
        return all(f(record) for f in filters)

I will add a tested pull request on GitHub.
|
|
msg343985
Author: Zachary Ware (zach.ware)
Date: 2019-05-30 16:01
This could simplify even further with the following:

    def filter(self, record):
        return all(getattr(f, 'filter', f)(record) for f in self.filters)
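[For context, the getattr fallback works because logging filters may be either objects exposing a filter() method or plain callables. A standalone sketch outside the logging module, with illustrative names (AcceptAll, reject_short, passes are made up for this example):

```python
class AcceptAll:
    def filter(self, record):
        # Filter-style object: exposes a .filter(record) method
        return True

def reject_short(record):
    # Bare callable registered directly as a filter
    return len(record) > 3

filters = [AcceptAll(), reject_short]

def passes(record):
    # getattr(f, 'filter', f) returns f.filter when present,
    # otherwise falls back to the callable f itself
    return all(getattr(f, 'filter', f)(record) for f in filters)

print(passes("hello"))  # True
print(passes("hi"))     # False
```
]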
|
|
msg343986
Author: Dale Visser (dwvisser)
Date: 2019-05-30 17:22
Great! My PR now uses getattr as @zach.ware has suggested. |
|
|
msg344045
Author: Vinay Sajip (vinay.sajip)
Date: 2019-05-31 09:20
I grant that your version is more compact, but I don't find it more readable in general (perhaps it is for very experienced Python developers, or developers who favour a functional style, but I personally don't think it improves readability). Since the rationale proposed for making this change is efficiency, I would like to see some real-world performance numbers to back this up. Can you please provide quantitative data on the amount of speedup achievable in real-world scenarios?
|
|
msg344075
Author: Dale Visser (dwvisser)
Date: 2019-05-31 13:26
It is with great surprise and sadness that I report that the performance measurements don't support merging this change. I'm attaching my performance test script. I ran it with and without the logging filters for 1000000 iterations, e.g.:

    ./python all 1000000

The results on my system were as follows, where the numbers may be interpreted as milliseconds per 1000 log messages:

                 No Filters    3 Filters
    master          17.9          19.8
    CBO-37101       19.6          23.1

I imagine I could add a guard 'if' statement that would restore the 'no filters' performance to match 'master', but my use of generators apparently costs significantly more cycles than the existing code.
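[The attached test script is not reproduced in this thread; as a rough illustration of this kind of measurement, a minimal harness along the following lines could be used. The logger names, filter counts, and iteration counts are made up, and absolute numbers will vary by machine:

```python
import io
import logging
import timeit

def build_logger(n_filters):
    # Fresh logger with a throwaway stream handler and n no-op filters
    logger = logging.getLogger(f'filter-perf-{n_filters}')
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.filters.clear()
    logger.addHandler(logging.StreamHandler(io.StringIO()))
    for _ in range(n_filters):
        logger.addFilter(lambda record: True)  # always-pass filter
    return logger

for n in (0, 3):
    logger = build_logger(n)
    seconds = timeit.timeit(lambda: logger.info('message'), number=10000)
    # Convert total seconds for 10000 messages to ms per 1000 messages
    print(f'{n} filters: {seconds * 100:.1f} ms / 1000 msgs')
```
]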
|
|
msg344076
Author: Dale Visser (dwvisser)
Date: 2019-05-31 13:45
Correction on that example of running the test script:

    ./python test_filter_perf.py all 1000000

I simply prepended this with the Linux `time` command for measuring.
|
|
msg344098
Author: Dale Visser (dwvisser)
Date: 2019-05-31 18:03
Adding the patch file associated with GitHub pull request #13683, as well as the performance test I just reported. |
|
|
msg344130
Author: Dale Visser (dwvisser)
Date: 2019-05-31 21:42
FWIW, when I tried this instead, I got 21 to 22 msec per 1000 log messages (still a 5-10% performance hit):

    def filter(self, record):
        rv = True
        if self.filters:
            rv = all(getattr(f, 'filter', f)(record) for f in self.filters)
        return rv

I don't see why getattr(...) should be less efficient than hasattr(...), so the generator expression is the likely culprit.
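[The suspected generator-expression overhead can be isolated from logging entirely. A rough micro-benchmark sketch; the helper names are made up, timings are machine-dependent, and no particular numbers are claimed:

```python
import timeit

data = [1, 2, 3]  # stands in for a short list of filters

def with_genexp():
    # Allocates and drives a generator object on every call
    return all(bool(x) for x in data)

def with_loop():
    # Plain loop: no generator allocation, same result
    for x in data:
        if not x:
            return False
    return True

t_gen = timeit.timeit(with_genexp, number=100000)
t_loop = timeit.timeit(with_loop, number=100000)
print(f'genexp: {t_gen:.3f}s  loop: {t_loop:.3f}s')
```
]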
|
|
msg344171
Author: Dale Visser (dwvisser)
Date: 2019-06-01 13:08
I've learned a lot about the performance trade-offs of generator expressions. The only way of shortening this code (readability is subjective) that I've found not to negatively impact performance is this:

    def filter(self, record):
        rv = True
        for f in self.filters:
            if not getattr(f, 'filter', f)(record):
                rv = False
                break
        return rv
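[As a standalone check that the loop version short-circuits the same way all() does, here is an illustrative sketch; filter_record and the two tracking filters below are made-up names, not logging-module code:

```python
def filter_record(filters, record):
    # Loop-based variant from above: stops at the first rejecting filter
    rv = True
    for f in filters:
        if not getattr(f, 'filter', f)(record):
            rv = False
            break
    return rv

calls = []

def reject(record):
    calls.append('reject')
    return False

def accept(record):
    calls.append('accept')
    return True

print(filter_record([reject, accept], 'msg'))  # False
print(calls)  # ['reject'] -- 'accept' was never invoked
```
]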
|
|