FSharp.Formatting


F# Formatting: Markdown parser

This page demonstrates how to use FSharp.Markdown.dll to parse a Markdown document, process the resulting document representation, and turn the document into nicely formatted HTML.

First, we need to load the assembly and open necessary namespaces:

#r "../../bin/FSharp.Formatting.Common.dll"
#r "../../bin/FSharp.Markdown.dll"
open FSharp.Markdown
open FSharp.Formatting.Common

Parsing documents

The F# Markdown parser recognizes the standard Markdown syntax and it is not the aim of this tutorial to fully document it. The following snippet creates a simple string containing a document with several elements and then parses it using the Markdown.Parse method:

let document = """
# F# Hello world
Hello world in [F#](http://fsharp.net) looks like this:

    printfn "Hello world!"

For more see [fsharp.org][fsorg].

  [fsorg]: http://fsharp.org "The F# organization." """

let parsed = Markdown.Parse(document)

The sample document consists of a first-level heading (written using one of the two alternative styles) followed by a paragraph with a direct link, code snippet and one more paragraph that includes an indirect link. The URLs of indirect links are defined by a separate block as demonstrated on the last line (and they can then be easily used repeatedly from multiple places in the document).

Working with parsed documents

The F# Markdown processor does not turn the document directly into HTML. Instead, it builds an F# data structure that we can use to analyze, transform and process the document. First of all, the DefinedLinks property returns all indirect link definitions:

parsed.DefinedLinks
val it : IDictionary<string,(string * string option)> =
  dict [("fsorg", ("http://fsharp.org", Some "The F# organization."))]
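Indexing DefinedLinks with a key that has no definition throws, so a lookup that tolerates unknown keys can be handy. The following is a minimal sketch, assuming the dictionary type shown in the output above; the names doc and tryFindLink are made up for this illustration and are not part of the library:

```fsharp
#r "../../bin/FSharp.Formatting.Common.dll"
#r "../../bin/FSharp.Markdown.dll"
open FSharp.Markdown

// A small document with one indirect link definition
let doc =
  Markdown.Parse("See [fsharp.org][fsorg].\n\n  [fsorg]: http://fsharp.org \"The F# organization.\"")

/// Hypothetical helper: returns the URL and title for a link key,
/// or None when the document does not define that key
let tryFindLink key =
  match doc.DefinedLinks.TryGetValue(key) with
  | true, (url, title) -> Some(url, defaultArg title "")
  | _ -> None
```

Calling tryFindLink "fsorg" should give the URL and title from the definition, while an unknown key gives None instead of an exception.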

The document content can be accessed using the Paragraphs property that returns a sequence of paragraphs or other first-level elements (headings, quotes, code snippets, etc.). The following snippet prints the heading of the document:

// Iterate over all the paragraph elements
for par in parsed.Paragraphs do
  match par with
  | Heading(size=1; body=[Literal(text=text)]) -> 
      // Recognize heading that has a simple content
      // containing just a literal (no other formatting)
      printfn "%s" text
  | _ -> ()

You can find more detailed information about the document structure and how to process it in the book F# Deep Dives.

Processing the document recursively

The library provides active patterns that can be used to easily process the Markdown document recursively. The example in this section shows how to extract all links from the document. To do that, we need to write two recursive functions: one that processes all paragraph-style elements and one that processes all inline formatting (inside paragraphs, headings, etc.).

To avoid pattern matching on every single kind of span and every single kind of paragraph, we can use active patterns from the Matching module. These can be used to recognize any paragraph or span that can contain child elements:

/// Returns all links in a specified span node
let rec collectSpanLinks span = seq {
  match span with
  | DirectLink(link=url) -> yield url
  | IndirectLink(key=key) -> yield fst (parsed.DefinedLinks.[key])
  | Matching.SpanLeaf _ -> ()
  | Matching.SpanNode(_, spans) ->
      for s in spans do yield! collectSpanLinks s }
      
/// Returns all links in the specified paragraph node
let rec collectParLinks par = seq {
  match par with
  | Matching.ParagraphLeaf _ -> ()
  | Matching.ParagraphNested(_, pars) -> 
      for ps in pars do 
        for p in ps do yield! collectParLinks p 
  | Matching.ParagraphSpans(_, spans) ->
      for s in spans do yield! collectSpanLinks s }

/// Collect links in the entire document
Seq.collect collectParLinks parsed.Paragraphs
val it : seq<string> =
  seq ["http://fsharp.net"; "http://fsharp.org"]

The collectSpanLinks function works on individual span elements that contain inline formatting (emphasis, strong) and also links. The DirectLink node represents an inline link like the one pointing to http://fsharp.net while IndirectLink represents a link that uses one of the link definitions. The function simply returns the URL associated with the link.

Some span nodes (like emphasis) can contain other formatting, so we need to recursively process their children. This is done by matching against Matching.SpanNode, which is an active pattern that recognizes any node with children. The library also provides a function named Matching.SpanNode that can be used to reconstruct the same node (when you want to transform the document). This is similar to how the ExprShape module for working with F# quotations works.
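To illustrate the reconstruction side, here is a minimal sketch of a transformation that upper-cases all literal text and rebuilds every node around it. It assumes that Literal carries a text and a range field in the library version used here; upperSpan is a name made up for this example:

```fsharp
#r "../../bin/FSharp.Formatting.Common.dll"
#r "../../bin/FSharp.Markdown.dll"
open FSharp.Markdown

/// Hypothetical helper: upper-cases every literal inside a span
let rec upperSpan span =
  match span with
  | Literal(text=text; range=range) ->
      // Assumes the two-field Literal case (text and range)
      Literal(text.ToUpper(), range)
  | Matching.SpanNode(info, spans) ->
      // As a pattern, SpanNode gives us the children; as a function,
      // it rebuilds the same kind of node from the transformed children
      Matching.SpanNode(info, List.map upperSpan spans)
  | other ->
      // Remaining leaf spans are returned unchanged
      other
```

The info value is opaque: it only carries what is needed to rebuild the original kind of node, so the transformation does not have to know whether it is looking at emphasis, strong text or a link.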

The function collectParLinks processes paragraphs. A paragraph cannot directly be a link, so we just need to process all spans. This time, there are three options. ParagraphLeaf represents a case where the paragraph does not contain any spans (a code block or, for example, a <hr> line); the ParagraphNested case is used for paragraphs that contain other paragraphs (such as a quotation); and ParagraphSpans is used for all other paragraphs that contain normal text. For these, we call collectSpanLinks on all nested spans.

Generating HTML output

Finally, the Markdown type also includes a method WriteHtml that can be used to generate an HTML document from the Markdown input. The following example shows how to call it:

let html = Markdown.WriteHtml(parsed)

In addition, you can also use Markdown.TransformHtml to directly turn an input document in the Markdown format into an HTML document (without the intermediate step).
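
The two routes can be compared side by side. The following is a small sketch, assuming the string and TextWriter overloads of TransformHtml and WriteHtml; the sample text and the names source, direct, writer and viaWriter are invented for this illustration:

```fsharp
#r "../../bin/FSharp.Formatting.Common.dll"
#r "../../bin/FSharp.Markdown.dll"
open System.IO
open FSharp.Markdown

let source = "# Title\n\nSome *emphasized* text."

// One step: Markdown text in, HTML string out
let direct = Markdown.TransformHtml(source)

// Two steps: parse first, then write the HTML into a TextWriter
let writer = new StringWriter()
Markdown.WriteHtml(Markdown.Parse(source), writer)
let viaWriter = writer.ToString()
```

The two-step route is the one to use when the parsed document needs to be inspected or transformed before the HTML is generated.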
