Fix footnote ordering by aeskildsen · Pull Request #1546 · Python-Markdown/markdown (original) (raw)

So I managed to do a bit of benchmarking.

I benchmarked the processing of two markdown documents using pyperf:

sample_doc.md: A document attempting to simulate a big chunk of realistic README-like markdown. Contains all the markdown from the top level of Python-Markdown's docs directory.
nested_doc.md: A document with 22 examples of nested structures which requires the 'extra' extension for footnotes, md_in_html, etc. It contains the loose list example @waylan provided, the contents of the aforementioned raw-html.txt test file, a bunch of AI-generated markdown with nested structures, and a single line containing 2000 footnote references (adapted from the original test file footnote_many_footnotes.txt), a few of which are nested within ** and __.

Let me know if you want to see the contents of these files.

The stack.pop(0) fix seems to have a very negligible impact on performance. I'm no statistician, but I would guess the difference is within the margin of error for the benchmarks I ran. collections.deque seems to show a very minor performance improvement in the case of the nested document, but again, likely not far from the margin of error.

I am no expert on how the inline processor works, but I think the reason we are not seeing big differences here is that inline nesting mostly does not go very deep. I believe this is what @waylan mentioned earlier. As I understand it, list.pop(0) is O(n) and deque.popleft() is O(1), but when the list is short the difference is probably negligible.

Data

sample_doc.md, default pyperf timeit benchmark

v3.8.2 with no changes: Mean +- std dev: 50.6 ms +- 0.9 ms
commit d8ecae6 (stack.pop(0) fix): Mean +- std dev: 51.4 ms +- 1.9 ms
commit d8ecae6 + collections.deque instead of list: Mean +- std dev: 51.9 ms +- 2.3 ms

nested_doc.md, default pyperf timeit benchmark

Here I used extensions=['extra'].

v3.8.2 with no changes: Mean +- std dev: 156 ms +- 3 ms
commit d8ecae6 (with stack.pop(0) fix): Mean +- std dev: 160 ms +- 5 ms
commit d8ecae6 + collections.deque instead of list: Mean +- std dev: 158 ms +- 3 ms

nested_doc.md, rigorous pyperf timeit benchmark

Here I used extensions=['extra'].

v3.8.2 with no changes: Mean +- std dev: 160 ms +- 9 ms
commit d8ecae6 (with stack.pop(0) fix): Mean +- std dev: 164 ms +- 11 ms
commit d8ecae6 + collections.deque instead of list: Mean +- std dev: 160 ms +- 7 ms

TLDR

There does not appear to be a noteworthy performance impact from the stack.pop(0) fix. I think it can be merged without major concerns about performance.

It is possible to use collections.deque as a drop-in replacement for the list, which could offer a slight improvement in documents with very complex nested inline structures. We could consider testing how it performs on a wider range of documents first, and so perhaps a different PR would be more appropriate for that.