Titles in search index can contain HTML and escaped characters

Describe the bug

We seem to be passing escaped HTML to the search indexer from the parser.
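A title entry in the search index should be plain text. The sketch below shows the difference using only the standard library: it takes a rendered HTML title fragment, keeps only its text nodes and unescapes entities such as `&lt;` and `&gt;`. The sample fragment and the `html_title_to_text` helper are hypothetical illustrations, not Sphinx code, and this is not necessarily how the PR fixes the problem (taking the text from the doctree instead of from rendered HTML is another option).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) makes the parser unescape
        # entities such as &lt; and &gt; before calling handle_data().
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_title_to_text(fragment: str) -> str:
    """Hypothetical helper: turn a rendered HTML title into plain index text."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


# Hypothetical rendered title, similar in shape to what ends up in the index.
rendered = "<code>escaped</code> title with &lt; and &gt; in it"
print(html_title_to_text(rendered))  # escaped title with < and > in it
```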

You can see this in the Python docs' searchindex.js if you search it for `<code`, for example:

https://docs.python.org/3.14/searchindex.js
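To check a locally saved searchindex.js for affected titles, a small script like the one below could be used. It assumes the file has the shape `Search.setIndex({...})`, that the payload inside the parentheses is valid JSON, and that it contains a `"titles"` list; `suspicious_titles` is a hypothetical helper and the regex is only a heuristic for leftover tags and entities.

```python
import json
import re

# Assumption: searchindex.js looks like  Search.setIndex({...})  and the
# object inside the parentheses is valid JSON with a "titles" list.
_PAYLOAD = re.compile(r"^\s*Search\.setIndex\((.*)\)\s*;?\s*$", re.DOTALL)

# Heuristic: an HTML tag opener or a common escaped entity.
_SUSPICIOUS = re.compile(r"<[a-zA-Z/]|&(?:lt|gt|amp|quot|#\d+);")


def suspicious_titles(searchindex_js: str) -> list[str]:
    """Return index titles that still look like HTML or escaped HTML."""
    match = _PAYLOAD.match(searchindex_js)
    if match is None:
        raise ValueError("unexpected searchindex.js layout")
    index = json.loads(match.group(1))
    return [title for title in index.get("titles", []) if _SUSPICIOUS.search(title)]


# Synthetic sample; a real check would read the file from a build directory.
sample = (
    'Search.setIndex({"docnames":["a","b"],'
    '"titles":["<code>escaped</code> title with &lt; and &gt; in it","Plain title"]})'
)
print(suspicious_titles(sample))
```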

I have a PR that addresses this and will post it.

How to Reproduce

This rst file trivially reproduces the issue:

```rst
`escaped` title with < and > in it
==================================

this document has escaped content in the title but also the characters < and > in it
```

Environment Information

Sphinx `main` branch as of 17 February 2024

Sphinx extensions

Additional context

No response