GitHub - mjdominus/mod: Variation on Perl's “pod” markup language, was used to write Higher-Order Perl (original) (raw)

This is very rough, because I haven't distributed it to anyone before. It's not quite finished. There are some changes I want to make to the internals, and the TeX and LaTeX output drivers aren't done. (The LaTeX one is barely begun.)

If I left anything out, let me know.

The conversion program is called 'mod'. To use it:

    mod -o output-format [ -c chapter-number ] input-file

'm2t' is the same as 'mod -o text' 'm2T' is the same as 'mod -o TeX' 'm2h' is the same as 'mod -o html' 'm2g' is the same as 'mod -o generic' 'm2c' is the same as 'mod -o crappy'

'crappy' is a trivial conversion to plain text. If you want to modify one of the drivers, you should probably have a look at Mod::Crappy first. 'generic' is the base class from which the others inherit. It contains the MOD parsing code, escape sequence translation, and so on. One day I will write API documentation for it.

If the input file is foo.mod, the output file is foo.tex, or foo.html, foo.txt, or foo.crap, as appropriate.

The -c option is used for numbering sections. (So that the sections in chapter 4 get numbered 4.1, 4.2, etc.) If you omit it, mod tries to guess the chapter number from the input filename.

MOD is Pod-like. Input is structured as paragraphs, separated by blank lines.

If a paragraph is indented, it's program source code; if not it's prose. Source code has one special feature: If the leftmost character in a source code line is a '', the '' is removed. The driver gets an array of source code lines and an array indicating which lines were starred. Starred lines are 'special'. In my book, this means that they are lines that have changed since the last time the reader saw this same code. The HTML driver sets the special lines in bold purple font. The text driver sets special lines with a '*' in the left margin. The TeX driver will set the special lines with a change par in the margin.

If a paragraph begins with an = sign, it's a command. MOD defines a syntax and a recommended command set, but commands are implemented by the various drivers, each in its own way. Commands are case-insensitive, but you should use all lowercase so that I can reserve the capitalized names for nonstandard drivers.

Commands that my drivers understand at present:

=startlisting listingname =endlisting listingname =listing listingname =inline listingname

All program listings between these two marks are considered to be part of the named program listing. The text driver will dump the code to a file named Programs/listingname and inline it. The HTML driver will inline the code, with a link to Programs/listingname.

Prose paragraphs between =startlisting and =endlisting are typeset normally, but are not included in the listing. So you can do:

    The following function takes an argument and returns the next
    largest prime number:

    =startlisting next_prime

            sub next_prime {
              my $n = shift;
              $n++;

    We need to increment C<$n> immediately, because the function
    is guaranteed to return a prime I<larger> than C<$n>.

              while (1) {
                return <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi><mi>i</mi><mi>f</mi><mi>i</mi><msub><mi>s</mi><mi>p</mi></msub><mi>r</mi><mi>i</mi><mi>m</mi><mi>e</mi><mo stretchy="false">(</mo></mrow><annotation encoding="application/x-tex">n if is_prime(</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em;"></span><span class="mord mathnormal">ni</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">i</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="mord mathnormal">im</span><span class="mord mathnormal">e</span><span class="mopen">(</span></span></span></span>n);

    We saw F<is_prime> on page R<is_prime|page>.

                $n++;    
              }
            }
    =endlisting next_prime

The code, and only the code, will be inserted into
Programs/next_prime.  

Note that =startlisting and =endlisting commands need not nest
correctly.  It is permissible to have listings overlap.  If you
have two listings with the same name, the later one will be
appended to the earlier.

=listing is synonymous with =startlisting.

=inline listingname inserts the code for a previously defined
listing:

    Recall the function F<next_prime>, which we saw back in
    Chapter R<next_prime|chapter>:

    =inline next_prime

    This could be made more flexible by...

=startlisting...=endlisting may also make an entry in the index,
and may also define a crossreference for the R<...> sequence.

=chapter chaptertitle
=section sectiontitle
=subsection subsectiontitle
=subsubsection subsubsectiontitle

Start a new chapter, section, etc.  Typesets the title
appropriately and adjusts the sections numbers.  The text file
driver makes an appropriate entry in the .toc file.  Drivers may
set up crossreference information for the R<...> sequence,
described below.

=startpicture picturename =endpicture picturename

The contents are ascii-art.  The text driver inserts this
art verbatim.  The HTML driver looks for an image file named
'picturename' and inlines that instead.  The TeX driver will
probably look for picturename.ps or something of the sort.

=note TEXT

 Ignored.   This is a comment.

=comment =endcomment

 Intervening text is discarded.  This may go away in the next
 version because =note was good enough.

=bulletedlist =endbulletedlist =numberedlist =endnumberedlist =item

 Start or close a list of the specified type.  =item typesets a
 list item.  Lists must be nested correctly; failure to do so will
 yield a warning.

=indented =endindented

  Prose text that is indented, as for example a block quotation.

  The existing drivers accept =maxim and =endmaxim as synonyms for
  =indented.  This will probably go away.

=stop

  End of the file; everything following is ignored.

Escape codes in prose:

In a prose paragraph, and in certain command texts, certain escape sequences are recognized. They are all podlike: X<text...> where 'X' is some single letter. For example:

B<...> Typeset text in boldface. (The text driver uses asterisks) I<...> Typeset text in italics. (The text driver uses underscores) C<...> Typeset text as code. (The text driver uses 'quotes')

Some replacements are carried out recursively. For example, B<foo I baz> generates "foo bar baz". But no recursion is done inside of C<...>. I don't have a spec for this yet, but I do think it is likely to do what you expect.

To put a literal < or > mark inside an escape sequence, backslash it. To put a literal \ inside an escape sequence, backslash it:

    To call a method, use C<$object-\>method(arguments...)>

The backslashes are mandatory.

The rest of the escape sequences are:

F<...> This is a function name. It is probably typeset like code. Some drivers may choose to append notation like '()' to the function name. Some drivers may automatically make a special index entry for the function. This may get moved out of the standard drivers into my own private drivers.

N<...> Typeset a footnote. The text driver just does [[Footnote: ...]]. The HTML driver generates a hotlinked footnote that appears at the bottom of the page. The teX driver will use \footnote{...}.

V<...> Typeset a mathematical variable name. This usually means italics or the equivalent. For example:

          The argument to F<next_prime> is the smallest prime number 
          V<p> which is larger than the specified argument, V<n>.

M<...> An extended math formula. The contents should be in TeX format. The TeX driver will insert this verbatim. The text driver maintains a database showing an approximate plain text representation of the formula. Formulas that are listed in the database have their equivalent translations inserted into the output. Formulas that are not listed in the database are appended to the 'badmath' file, a warning is printed, and the TeX version of the formula is inserted directly.

      After the run is over, you can run the program 'defmath',
      which reads the 'badmath' file and prompts you for additions
      to the database.

      The HTML driver presently does the same thing that the text
      driver does, but this might change.  It might have a
      database if inlinable gifs, or use TeX to generate the gifs
      if one is missing.

R generates a crossreference. This isn't fully implemented yet. The intent is that you'll be able to write

            R<carrots|section>

      and the driver will insert the section number of the section
      titled 'Carrots'; similarly C<Iterators|chapter> or
      C<next_prime|page> or whatever.  At present, there is no way
      to define a cross-reference.

T<...> Generate a reprensentation of an HTML tag. The text driver turns T into "". The HTML driver turns it into "<font>".

X<...> Generate an index entry. An index entry has four properties: The text that is actually typeset inline, the text that is typeset in the index, the alphabetizing key for the index entry, and the style that is used for the page number in the index. Support isn't complete yet. However, the following do work:

        X<fish>  typesets 'fish' inline and inserts it into the
                 index.  For example:

            It is not known whether X<fish> obey the X<Poisson
            distribution>.  

        X<fish|i>  is the same, but nothing is typeset inline.
                  (The 'i' is for 'invisible.)  For example:

            Some might say that our scaly friends of the deep are
            ignorant of statistics.X<fish|i> It is not known
            whether X<fish> obey the X<Poisson distribution>.
            X<Distribution, Poisson|i>

        X<fish|d> is the same, but the intent is that this is the
                  index entry for the *definition* of 'fish'.
                  'fish' is typeset inline in italics, and there
                  may be a special annotation in the index (an
                  italicized page number, for example) to indicate
                  that this reference is special.    Example:

            These water-dwelling vertebrates are known as X<fish|d>.

        X<fish|(> 
        X<fish|)>  Material pertaining to 'fish' starts/ends here.
                   '(' and ')' imply 'i'.  The index entry will
                   come out with a range of page numbers, as 
                   "fish, 273--284" 

        X<fish|K<foo>> like X<fish>, but alphabetize under 'foo'.

        X<fish|B> like X<fish>, but  the page number in the index
                  should be in bold face.
        X<fish|I> like X<fish>, but  the page number in the index
                  should be in italics.

        X<fish|T<see also ichthyes> inserts the note "see also
                  ichthyes" in the index under 'fish'
      
      The intent is that flags can be combined, so that you might
      write X<fish|K<ichtheys>id> to indicate that the current page is the
      point at which 'fish' is defined, but the word 'fish' is not
      actually inserted at that point on the page, and the entry
      for 'fish' should be alphabetized under 'ichthyes'.

      In general, if there is a | in the sequence, everything
      after the last | is taken to be a flag option.  Any other |s
      separate subentries.  For example,

            X<fruits|apples|i>

      inserts a subentry in the index under 'fruits':

            fruits
              apples, 357

      If you want to omit the 'i' here, the word 'fruits' will be
      inserted inline.  You have to say X<fruits|apples|>.  
      X<fruits|apples> won't work because mod will try to
      interpret the 'apples' as a set of flags.

      All of the drivers generate the inline text correctly, but
      none actually builds an index.  I'm planning to write
      another driver, Mod::Index, which extracts the index terms
      and builds an index data file, which the other drivers can
      then read and transform into a formatted index.

I think that's all, except for the API documentation.