toc: Unusual characters in heading ids not well supported · Issue #1493 · Python-Markdown/markdown

I noticed that toc encodes characters like * as \x0242\x03, that is, the decimal number 42 wrapped in the control characters \x02 and \x03, 42 being the code point of * in the ASCII table. This causes a discrepancy between the permalink of a heading and its link in the table of contents.
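To make the placeholder format concrete, here is a minimal standalone sketch of the encoding described above. It does not call Python-Markdown; the names `escape_char`, `unescape` and `PLACEHOLDER_RE` are hypothetical and only illustrate the round trip, assuming the placeholder is always the decimal code point between \x02 and \x03.

```python
import re

# Control characters used as delimiters around the decimal code point.
STX = "\x02"  # "start of text"
ETX = "\x03"  # "end of text"

# Hypothetical helper: matches one placeholder such as "\x0242\x03".
PLACEHOLDER_RE = re.compile(STX + r"(\d+)" + ETX)


def escape_char(ch: str) -> str:
    """Encode a single character as STX + decimal code point + ETX."""
    return f"{STX}{ord(ch)}{ETX}"


def unescape(text: str) -> str:
    """Replace every placeholder in text with the character it stands for."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


encoded = escape_char("*") + "Foo" + escape_char("*")
print(repr(encoded))      # '\x0242\x03Foo\x0242\x03'
print(unescape(encoded))  # *Foo*
```

If one of the two links is built from the still-encoded text and the other from the decoded text, they end up pointing at different ids, which would match the discrepancy I'm seeing.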

mkdir /tmp/toc
cd /tmp/toc
python -m venv .venv
. .venv/bin/activate
python -m pip install mkdocs
python -m mkdocs new .

index file:

Welcome

Demonstrating an issue with HTML ids and toc.

*Foo* { id="*Foo*" }

mkdocs config:

site_name: My Docs

markdown_extensions:

Serve and observe the behavior described in the index page.

I'm not saying this is a bug. I'm just curious whether this is expected, and whether there would be a way to improve support for headings with such "unusual" ids. This would help with the work I'm doing on mkdocstrings, where we are trying to expand our language support, and some languages might use uncommon characters in object identifiers. Not only would toc have to work, but also mkdocs-autorefs, which picks up ids from the table of contents when registering URLs and anchors to objects.

I believe HTML5 allows almost any character in ids (anything but whitespace). Some of them just cause a bit of pain, like . or #, because they then need to be escaped in CSS selectors.
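As a rough illustration of that escaping pain, here is a small sketch that turns an id into a CSS id selector by backslash-escaping ASCII punctuation. The function name `css_id_selector` is hypothetical, and this is a simplification: it does not handle ids that start with a digit or that contain control characters, which a full implementation would need to cover.

```python
def css_id_selector(ident: str) -> str:
    """Build a '#id' selector, backslash-escaping ASCII punctuation.

    Simplified sketch: ASCII letters, digits, '-' and '_' are kept as-is,
    non-ASCII characters are kept as-is, everything else gets a backslash.
    Leading digits and control characters are NOT handled here.
    """
    parts = []
    for ch in ident:
        if ch.isascii() and not (ch.isalnum() or ch in "-_"):
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "#" + "".join(parts)


print(css_id_selector("*Foo*"))      # #\*Foo\*
print(css_id_selector("pkg.mod#f"))  # #pkg\.mod\#f
```

So the ids themselves are fine in HTML; it is only the consumers (CSS selectors, and apparently toc) that need extra care.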