Docutils To Do List

David Goodger (with input from many); open to all Docutils developers

Contact:

docutils-develop@lists.sourceforge.net

Date:

2025-04-29

Revision:

10113

Copyright:

This document has been placed in the public domain.

Priority items are marked with "@" symbols. The more @s, the higher the priority. Items in question form (containing "?") are ideas which require more thought and debate; they are potential to-do's.

Many of these items are awaiting champions. If you see something you'd like to tackle, please do! Please see also the Bugs document for a list of bugs in Docutils.

Minimum Requirements for Python Standard Library Candidacy

Below are action items that must be added and issues that must be addressed before Docutils can be considered suitable to be proposed for inclusion in the Python standard library.

Many of these are now handled by Sphinx

Repository

Move to a Git repository.

See feature requests #58 (with pointers to Sphinx issues and discussion).

Convert with reposurgeon?

If you are doing a full import rather than gatewaying, reposurgeon is probably what you want. It has been tested against a lot of large, old, nasty repositories and is thus known to be robust in the presence of repository malformations (a property regularly checked by a test suite that is a rogue's gallery of Subversion botches).

Git Wiki

The comprehensive Reposurgeon documentation comes with a guide to repository conversion as well as info about reading Subversion repositories. Converting from an SVN dump file is faster than from a checkout.

Adam Turner wrote a conversion Makefile and .lift scripts that download the repo from SF with rsync, convert it to an SVN mirror and finally to Git, splitting sandbox, prest, and web from docutils.

Sourceforge supports multiple Git repositories per project, so we can switch the version control system independent of the decision on an eventual switch of the host (cf. https://sourceforge.net/p/forge/documentation/Git/).

General

Miscellaneous

object numbering and object references

For equations, tables & figures.

These would be the equivalent of DocBook's "formal" elements.

In LaTeX, automatic counters are implemented for sections, equations and floats (figures, tables) (configurable via stylesheets or in the latex-preamble). Objects can be given reference names with the \label{<refname>} command; \ref{<refname>} inserts the corresponding number.

No such mechanism exists in HTML/CSS (there is "target-counter" for paged media but this is not supported by browsers as of 2024). Cf. https://stackoverflow.com/questions/16453488/ and https://stackoverflow.com/questions/9463523/.
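Because HTML/CSS offers no counterpart to \label and \ref, the numbers would have to be computed before the document reaches the writer. The following is only a minimal sketch of that bookkeeping in Python; the function names and the (kind, refname) input format are assumptions made for illustration and are not part of Docutils.

```python
from collections import defaultdict


def number_objects(objects):
    """Map each reference name to a (kind, number) pair.

    `objects` is a sequence of (kind, refname) pairs in document order.
    Each kind ("figure", "table", "equation", ...) has its own counter.
    """
    counters = defaultdict(int)
    numbers = {}
    for kind, refname in objects:
        counters[kind] += 1
        numbers[refname] = (kind, counters[kind])
    return numbers


def format_reference(numbers, refname):
    """Return the text that would replace a reference to `refname`."""
    kind, number = numbers[refname]
    return f"{kind.capitalize()} {number}"


# Hypothetical document content, in document order.
objects = [
    ("figure", "fig-setup"),
    ("table", "tab-results"),
    ("figure", "fig-plot"),
]
numbers = number_objects(objects)
print(format_reference(numbers, "fig-plot"))
```

Keeping one counter per kind mirrors the separate LaTeX counters for figures, tables and equations.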

Documentation

User Docs

Developer Docs

How-Tos

PEPs

Python Source Reader

General:

Miscellaneous ideas:

    ##
    # Constants
    # =========

    a = 1
    b = 2

    ##
    # Exception Classes
    # =================

    class MyException(Exception): pass

    # etc.

    #!/usr/bin/env python
    # :Author: Me
    # :Copyright: whatever

    """This is the public module docstring (__doc__)."""

    # More docs, in comments.
    # All comments at the beginning of a module could be
    # accumulated as docstrings.
    # We can't have another docstring here, because of the
    # __future__ statement.

    from __future__ import division

Using the JavaDoc convention of a doc-comment block beginning with ## is useful though. It allows doc-comments and implementation comments.
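As a rough illustration of how a reader might tell doc-comments from implementation comments, the sketch below collects comment blocks that open with a bare ## line. It is a line-based approximation written for this list, not existing Docutils code; the function name is hypothetical, and a real implementation would probably use the tokenize module so that # characters inside string literals are not misread.

```python
def doc_comment_blocks(source):
    """Return the text of each comment block that starts with a '##' line."""
    blocks = []
    current = None  # lines of the block being collected, or None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "##":
            current = []  # a bare '##' opens a new doc-comment block
            blocks.append(current)
        elif current is not None and stripped.startswith("#"):
            text = stripped[1:]
            # Drop at most one space after the comment marker.
            current.append(text[1:] if text.startswith(" ") else text)
        else:
            current = None  # any other line closes the block
    return ["\n".join(block) for block in blocks]


sample = """\
##
# Constants
# =========

a = 1  # implementation comment, not collected
b = 2

##
# Exception Classes
# =================

class MyException(Exception): pass
"""
for block in doc_comment_blocks(sample):
    print(block)
    print()
```

Ordinary comments that do not follow a ## line are ignored, which is what lets the two kinds of comment coexist.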

    # Docutils:setting=value

Could be used to turn on/off function parameter comment recognition & other marginal features. Could be used as a general mechanism to augment config files and command-line options (but which takes precedence?).
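A minimal sketch of how such per-module settings might be read, assuming they appear only in the initial comment block of the module. The function name, the regular expression, and the setting names in the sample are hypothetical; the question of precedence over config files and command-line options is left open here, as it is above.

```python
import re

# Assumed syntax: "# Docutils:setting=value", one setting per comment line.
SETTING_RE = re.compile(r"^#\s*Docutils:\s*([A-Za-z_][\w-]*)\s*=\s*(.*?)\s*$")


def module_settings(source):
    """Collect settings from the initial comment block of a module."""
    settings = {}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the initial block
        if not stripped.startswith("#"):
            break  # the first non-comment line ends the initial block
        match = SETTING_RE.match(stripped)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


sample_module = '''\
#!/usr/bin/env python
# Docutils:parameter-comments=on
# Docutils:tab-width=4

"""Module docstring."""
# Docutils:ignored=yes
'''
print(module_settings(sample_module))
```

Values are returned as strings; converting them to the proper types would be up to whatever validates the settings.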

reStructuredText Parser

Also see the ... Or Not To Do? list.

Misc

This ends up as a definition list. This is more of a usability issue.

  1. line one
  2. line two

Adaptable file extensions

Questions

Should Docutils support adaptable file extensions in hyperlinks?

In the rST source, sister documents are ".rst" files. If we're generating HTML, then ".html" is appropriate; if PDF, then ".pdf"; etc.

Handle documents only, or objects (images, etc.) also?

Different output formats support different sets of image formats (HTML supports ".svg" but not ".pdf", pdfLaTeX supports ".pdf" but not ".svg", LaTeX supports only ".eps").

This is less urgent in 2020 than it was in 2004, as pdflatex and lualatex are now standard and support most image formats. Also, a wrapper like rubber that provides on-the-fly image conversion depends on the "wrong" extension in the LaTeX source.

At what point should the extensions be substituted?

Transforms:

Fits well in the Reader → Transformer → Writer processing framework.

Pre- or post-processing:

Can be implemented independent of Docutils -- keeps Docutils simple.

... those who need more sophisticated filename extension tweaking can simply use regular expressions, which isn't too difficult due to the determinability of the writers. So there is no need to add a complex filename-extension-handling feature to Docutils.

Lea Wiemann in docutils-users 2004-06-04

Proposals

How about using ".*" to indicate "choose the most appropriate filename extension"? For example:

.. _Another Document: another.*
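To make the ".*" idea concrete, here is a small sketch of the substitution step. The extension table and the function name are assumptions for illustration only; Docutils has no such table, and which extension is "most appropriate" for each writer is exactly the open question.

```python
from urllib.parse import urldefrag

# Assumed mapping from output format to preferred extension.
EXTENSIONS = {"html": ".html", "latex": ".pdf", "odt": ".odt"}


def resolve_wildcard(refuri, output_format):
    """Replace a trailing '.*' in a link target with a concrete extension.

    A fragment identifier ("#section") is kept unchanged.
    """
    url, fragment = urldefrag(refuri)
    if url.endswith(".*"):
        url = url[:-2] + EXTENSIONS[output_format]
    return url + ("#" + fragment if fragment else "")


print(resolve_wildcard("another.*", "html"))
print(resolve_wildcard("another.*#intro", "latex"))
```

Targets without the ".*" suffix pass through untouched, so existing documents would not be affected.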

Chris Liechti suggests a new :link: role in more-universal links?:

    .. role:: link(rewrite)
       :transform: .rst|.html

and then to use it::

    for more information see :link:`README.rst`

it would be useful if it supported an additional option :format: html so that separate rules for each format can be defined. (like the "raw" role)

Idea from Jim Fulton: an external lookup table of targets:

I would like to specify the extension (e.g. .rst) [in the source, rather than filename.*], but tell the converter to change references to the files anticipating that the files will be converted too.

For example:

.. _Another Document: another.rst

rst2html --convert-links "another.rst bar.rst" foo.rst

That is, name the files for which extensions should be converted.

Note that I want to refer to original files in the original text (another.rst rather than another.*) because I want the unconverted text to stand on its own.

Note that in most cases, people will be able to use globs:

rst2html --convert-link-extensions-for "`echo *.rst`" foo.rst

It might be nice to be able to use multiple arguments, as in:

rst2html --convert-link-extensions-for *.rst -- foo.rst

> Handle documents only, or objects (images, etc.) also?

No, documents only, but there really is no need for guesswork. Just get the file names as command-line arguments. EIBTI [explicit is better than implicit].
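The explicit-list approach can be sketched in a few lines of Python. The function name and the default target extension are assumptions; the only point illustrated is that a link is rewritten when, and only when, its target was named explicitly.

```python
import posixpath


def convert_links(refuri, convert, new_ext=".html"):
    """Swap the extension only for targets explicitly named in `convert`."""
    if refuri in convert:
        root, _old_ext = posixpath.splitext(refuri)
        return root + new_ext
    return refuri


# File names as they might be given on the command line.
named = {"another.rst", "bar.rst"}
print(convert_links("another.rst", named))
print(convert_links("unlisted.rst", named))
```

A link to a file that was not named keeps its original extension, so no guessing is involved.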

In Patch #169 Hyperlink extension rewriting, John L. Clark suggests command line options that map to-be-changed file extensions, e.g.:

rst2html --map-extension rst html --map-extension jpg png \
    input-filename.rst

Specifying the mapping as regular expressions would make this approach more generic and easier to implement (use re.sub and refer to the "re" module's documentation instead of coding and documenting a home-grown extraction and mapping procedure).
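A sketch of the regular-expression variant, using only the standard "re" module. The function name and the rule format (a list of pattern/replacement pairs) are assumptions; how the rules would be passed on the command line is not decided here.

```python
import re


def map_extensions(refuri, rules):
    """Apply (pattern, replacement) pairs in order; the first match wins."""
    for pattern, replacement in rules:
        new_uri, count = re.subn(pattern, replacement, refuri)
        if count:
            return new_uri
    return refuri


# Equivalent of mapping "rst" to "html" and "jpg" to "png".
rules = [(r"\.rst$", ".html"), (r"\.jpg$", ".png")]
print(map_extensions("input-filename.rst", rules))
print(map_extensions("photo.jpg", rules))
```

Anchoring each pattern with "$" keeps it from touching an "rst" or "jpg" that appears elsewhere in the path.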

Math Markup

alternative input formats

Use a directive option to specify an alternative input format, e.g. (but not limited to):

MathML

Not for hand-written code but maybe useful when pasted in (or included from a file)

For an overview of MathML implementations and tests, see, e.g., the mathweb wiki or the ConTeXt MathML page.

A MathML to LaTeX XSLT sheet: https://github.com/davidcarlisle/web-xslt/tree/master/pmml2tex

ASCIIMath

Simple, ASCII based math input language (see also ASCIIMath tutorial).

Unicode Nearly Plain Text Encoding of Mathematics

format for lightly marked-up representation of mathematical expressions in Unicode.

(Unicode Technical Note. Sole responsibility for its contents rests with the author(s). Publication does not imply any endorsement by the Unicode Consortium.)

itex

See the culmination of a relevant discussion in 2003.

LaTeX output

Which equation environments should be supported by the math directive?

See http://www.math.uiuc.edu/~hildebr/tex/displays.html.

HTML output

There is no native math support in HTML4. HTML5 has built-in support for MathML. MathML has been supported by all major browsers since 2023.

For supported math output variants see the math-output setting.

MathML

Additional converters from LaTeX to MathML

HTML/CSS

format math in standard HTML enhanced by CSS rules (Examples and experiments). The math-output=html option uses the converter from eLyXer (included with Docutils).

Alternatives: LaTeX-math to HTML/CSS converters include

OpenOffice output

Directives

Directives below are often referred to as "module.directive", the directive function. The "module." is not part of the directive name when used in a document.

Interpreted Text

Interpreted text is entirely a reStructuredText markup construct, a way to get around built-in limitations of the medium. Some roles are intended to introduce new doctree elements, such as "title-reference". Others are merely convenience features, like "RFC".

All supported interpreted text roles must already be known to the Parser when they are encountered in a document. Whether pre-defined in core/client code, or in the document, doesn't matter; the roles just need to have already been declared. Adding a new role may involve adding a new element to the DTD and may require extensive support, therefore such additions should be well thought-out. There should be a limited number of roles.

The only place where no limit is placed on variation is at the start, at the Reader/Parser interface. Transforms are inserted by the Reader into the Transformer's queue, where non-standard elements are converted. Once past the Transformer, no variation from the standard Docutils doctree is possible.

An example is the Python Source Reader, which will use interpreted text extensively. The default role will be "Python identifier", which will be further interpreted by namespace context into <class>, <method>, <module>, <attribute>, etc. elements (see pysource.dtd), which will be transformed into standard hyperlink references, which will be processed by the various Writers. No Writer will need to have any knowledge of the Python-Reader origin of these elements.

Doctree pruning

[DG 2017-01-02: These are not definitive to-dos, just one developer's opinion. Added 2009-10-13 by Günter Milde, in r6178.] [Updated by GM 2017-02-04]

The number of doctree nodes can be reduced by "normalizing" some related nodes. This makes the document model and the writers somewhat simpler.