leandrocp/mdex: Fast and Extensible Markdown for Elixir. Compliant with CommonMark spec. Formats to HTML, JSON, and XML. Built on top of comrak, ammonia, and autumnus.

MDEx


Fast and Extensible Markdown for Elixir.

License: MIT

Features

Foundation

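The library is built on top of comrak (Markdown parsing and rendering), ammonia (HTML sanitization), and autumnus (syntax highlighting).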

Installation

Add :mdex dependency:

    def deps do
      [
        {:mdex, "~> 0.7"}
      ]
    end

Usage

    Mix.install([{:mdex, "~> 0.7"}])

    iex> MDEx.to_html!("# Hello")
    "<h1>Hello</h1>"

    iex> MDEx.to_html!("# Hello :smile:", extension: [shortcodes: true])
    "<h1>Hello 😄</h1>"

Plugins

Req-like Pipeline

MDEx.Pipe provides a high-level API to manipulate a Markdown document and build plugins that can be attached to a pipeline:

    document = ~s|
    # Super Diagram

    ```mermaid
    graph TD
        A[Enter Chart Definition] --> B(Preview)
        B --> C{decide}
        C --> D[Keep]
        C --> E[Edit Definition]
        E --> B
        D --> F[Save Image and Code]
        F --> B
    ```
    |

    MDEx.new()
    |> MDExMermaid.attach(mermaid_version: "11")
    |> MDEx.to_html(document: document)

~MD Sigil

Convert and generate AST (MDEx.Document), Markdown (CommonMark), HTML, HEEx, JSON, and XML formats.

    iex> import MDEx.Sigil
    iex> ~MD|# Hello from `~MD` sigil|
    %MDEx.Document{
      nodes: [
        %MDEx.Heading{
          nodes: [
            %MDEx.Text{literal: "Hello from "},
            %MDEx.Code{num_backticks: 1, literal: "~MD"},
            %MDEx.Text{literal: " sigil"}
          ],
          level: 1,
          setext: false
        }
      ]
    }

    iex> import MDEx.Sigil
    iex> ~MD|`~MD` also converts to HTML format|HTML
    "<p><code>~MD</code> also converts to HTML format</p>"

    iex> import MDEx.Sigil
    iex> ~MD|and to XML as well|XML
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE document SYSTEM \"CommonMark.dtd\">\n<document xmlns=\"http://commonmark.org/xml/1.0\">\n  <paragraph>\n    <text xml:space=\"preserve\">and to XML as well</text>\n  </paragraph>\n</document>"

~MD also accepts an assigns map to pass variables to the document:

    iex> import MDEx.Sigil
    iex> assigns = %{lang: "Elixir"}
    iex> ~MD|Running <%= @lang %>|HTML
    "<p>Running Elixir</p>"

See more info at https://hexdocs.pm/mdex/MDEx.Sigil.html

Safety

For security reasons, every piece of raw HTML is omitted from the output by default:

    iex> MDEx.to_html!("<h1>Hello</h1>")
    "<!-- raw HTML omitted -->"

That's not very useful for most cases, but you have a few options:

Escape

The most basic option is to render raw HTML but escape it:

    iex> MDEx.to_html!("<h1>Hello</h1>", render: [escape: true])
    "&lt;h1&gt;Hello&lt;/h1&gt;"

Sanitize

But if the input is provided by external sources, it might be a good idea to sanitize it:

    iex> MDEx.to_html!("<a href=https://elixir-lang.org>Elixir</a>", render: [unsafe: true], sanitize: MDEx.default_sanitize_options())
    "<p><a href=\"https://elixir-lang.org\" rel=\"noopener noreferrer\">Elixir</a></p>"

Note that you must pass the unsafe: true option to first generate the raw HTML in order to sanitize it.

Sanitization cleans the HTML with a conservative set of defaults that works for most cases, but you can override those rules for further customization.

For example, to add "nofollow" to the rel attribute of links:

    iex> MDEx.to_html!("<a href=https://someexternallink.com>External</a>", render: [unsafe: true], sanitize: [link_rel: "nofollow noopener noreferrer"])
    "<p><a href=\"https://someexternallink.com\" rel=\"nofollow noopener noreferrer\">External</a></p>"

In this case the default rule set is still applied but the link_rel rule is overwritten.

Unsafe

If those rules are too strict and you really trust the input, or you really need to render raw HTML, then you can render it directly without escaping or sanitizing:

iex> MDEx.to_html!("", render: [unsafe: true]) ""

Parsing

Converts Markdown to an AST data structure that can be inspected and manipulated to change the content of the document programmatically.

The data structure format is inspired by Floki (with :attributes_as_maps = true), so both libraries keep similar APIs and the same mental model when working with Markdown or HTML documents. Each node is represented as a struct: the struct name is the node name, and its fields hold the node's attributes and children. For example:

    %MDEx.Heading{
      level: 1,
      nodes: [...]
    }

The parent node that represents the root of the document is the MDEx.Document struct, where you can find more information about the AST and what operations are available.

The complete list of nodes is available in the documentation, in the Document Nodes section.
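As a minimal sketch of inspecting and manipulating the AST, the snippet below parses a small document, upcases every text node, and renders the result. It assumes MDEx.traverse_and_update/2 is available in the installed version and that MDEx.to_html!/1 accepts a parsed document; the sample Markdown is made up for illustration.

```elixir
Mix.install([{:mdex, "~> 0.7"}])

# Parse Markdown into an %MDEx.Document{} AST.
{:ok, document} = MDEx.parse_document("# Hello\n\nSome *text*")

# Visit every node: upcase the literal of text nodes, return all others unchanged.
# (Assumes MDEx.traverse_and_update/2 exists in this version.)
updated =
  MDEx.traverse_and_update(document, fn
    %MDEx.Text{literal: literal} = node -> %{node | literal: String.upcase(literal)}
    node -> node
  end)

# Render the modified AST to HTML.
html = MDEx.to_html!(updated)
IO.puts(html)
```

Pattern matching on the struct name keeps the callback focused on one node type, which is the same approach you would use to target headings, links, or code blocks.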

Formatting

Formatting is the process of converting from one format to another, for example from AST or Markdown to HTML. Formatting to XML and to Markdown is also supported.

You can use MDEx.parse_document/2 to generate an AST or any of the to_* functions to convert to Markdown (CommonMark), HTML, JSON, or XML.
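The following sketch shows the same Markdown source going through several formatters. It assumes the bang variants MDEx.to_xml!/1 and MDEx.to_json!/1 exist alongside MDEx.to_html!/1, and that MDEx.to_html!/1 also accepts an already parsed document; the sample text is illustrative only.

```elixir
Mix.install([{:mdex, "~> 0.7"}])

markdown = "# Hello\n\nA *simple* example."

# Convert the Markdown source directly to each output format.
html = MDEx.to_html!(markdown)
xml = MDEx.to_xml!(markdown)
json = MDEx.to_json!(markdown)

# Alternatively, parse once and format the resulting AST.
{:ok, document} = MDEx.parse_document(markdown)
html_from_ast = MDEx.to_html!(document)

IO.puts(html)
IO.puts(xml)
IO.puts(json)
IO.puts(html_from_ast)
```

Parsing once and formatting the AST is useful when the document needs to be modified before rendering, or rendered to more than one format.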

Examples

GitHub Flavored Markdown with emojis

    MDEx.to_html!(~S"""
    # GitHub Flavored Markdown 🚀

    | Feature | Status |
    | ------- | ------ |
    | Fast | :white_check_mark: |
    | GFM | :white_check_mark: |

    Check out the spec at https://github.github.com/gfm/
    """,
    extension: [
      strikethrough: true,
      tagfilter: true,
      table: true,
      autolink: true,
      tasklist: true,
      footnotes: true,
      shortcodes: true,
    ],
    parse: [
      smart: true,
      relaxed_tasklist_matching: true,
      relaxed_autolinks: true
    ],
    render: [
      github_pre_lang: true,
      unsafe: true,
    ]) |> IO.puts()
    """
    <h1>GitHub Flavored Markdown 🚀</h1>
    <table>
    <thead>
    <tr>
    <th>Feature</th>
    <th>Status</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <td>Fast</td>
    <td>✅</td>
    </tr>
    <tr>
    <td>GFM</td>
    <td>✅</td>
    </tr>
    </tbody>
    </table>
    <p>Check out the spec at <a href="https://github.github.com/gfm/">https://github.github.com/gfm/</a></p>
    """

Code Syntax Highlighting

    MDEx.to_html!(~S"""
    ```elixir
    String.upcase("elixir")
    ```
    """,
    syntax_highlight: [
      formatter: {:html_inline, theme: "catppuccin_latte"}
    ]
    ) |> IO.puts()
    """

  
    String.upcase("elixir")
  

"""

Pre-compilation

Pre-compiled binaries are available for the following targets, so you don't need to have Rust installed to compile and use this library:

Note: The pre-compiled binaries for Linux are compiled using Ubuntu 22 on libc 2.35, which requires minimum Ubuntu 22, Debian Bookworm or a system with a compatible libc version. For older Linux systems, you'll need to compile manually.

Compile manually

If you need or want to compile it yourself, follow these steps:

  1. Install Rust
  2. Install a C compiler or build packages. This depends on your OS; for example, on Ubuntu you can install the build-essential package.
  3. Run:

    export MDEX_BUILD=1
    mix deps.get
    mix compile

Legacy CPUs

Modern CPU features are enabled by default but if your environment has an older CPU, you can use legacy artifacts by adding the following configuration to your config.exs:

config :mdex, use_legacy_artifacts: true

Used By

Are you using MDEx and want to list your project here? Please send a PR!

Motivation

MDEx was born out of the need to parse CommonMark files, to parse hundreds of files quickly, and to be easily extensible by consumers of the library.

Note that MDEx is the only one of the compared libraries that highlights syntax out of the box, which contributes to making it slower than cmark.

Comparison

Feature               MDEx    Earmark   md    cmark
Active                ✅      ✅        ✅    ❌
Pure Elixir           ❌      ✅        ✅    ❌
Extensible            ✅      ✅        ✅    ❌
Syntax Highlighting   ✅      ❌        ❌    ❌
AST                   ✅      ✅        ✅    ❌
AST to Markdown       ✅      ⚠️²       ❌    ❌
To HTML               ✅      ✅        ✅    ✅
To JSON               ✅      ❌        ❌    ❌
To XML                ✅      ❌        ❌    ✅
To Manpage            ❌      ❌        ❌    ✅
To LaTeX              ❌      ❌        ❌    ✅
Emoji                 ✅      ❌        ❌    ❌
GFM³                  ✅      ✅        ❌    ❌
GitLab⁴               ⚠️¹     ❌        ❌    ❌
Discord⁵              ⚠️¹     ❌        ❌    ❌

  1. Partial support
  2. Possible with earmark_reversal
  3. GitHub Flavored Markdown
  4. GitLab Flavored Markdown
  5. Discord Flavored Markdown

Benchmark

A simple script is available to compare existing libs:

Name              ips        average  deviation         median         99th %
cmark         22.82 K      0.0438 ms    ±16.24%      0.0429 ms      0.0598 ms
mdex           3.57 K        0.28 ms     ±9.79%        0.28 ms        0.33 ms
md             0.34 K        2.95 ms    ±10.56%        2.90 ms        3.62 ms
earmark        0.25 K        4.04 ms     ±4.50%        4.00 ms        4.44 ms

Comparison:
cmark         22.82 K
mdex           3.57 K - 6.39x slower +0.24 ms
md             0.34 K - 67.25x slower +2.90 ms
earmark        0.25 K - 92.19x slower +4.00 ms

To finish, a friendly reminder that all libs have their own strengths and trade-offs, so use the one that best suits your needs.

Acknowledgements