PrettyExpressive
Usage
Step 1: Implement a CostFactory.
PrettyExpressive provides a default implementation, but you can implement your own.
import PrettyExpressive as P
cf : P.CostFactory P.DefaultCostTuple
cf =
P.defaultCostFactory { pageWidth = 80, computationWidth = Nothing }
Step 2: Use Doc constructors to create your document.
document : P.Doc
document =
(P.text "hello" |> P.concat (P.text " world"))
For writing a pretty printer for a programming language, the essential kit comes down to five constructors and one combinator:
text -- the atomic unit. Every keyword, identifier, operator,
literal, and punctuation mark becomes a text node. Everything else is
just about how text nodes get arranged.
concat -- puts two documents next to each other. Every
compound structure (expressions, statements, declarations) is built by
concatenating smaller pieces. It's the only way to join things.
nl -- a line break that flattens to a space. This is the standard
separator for anything that could go on one line: between a function
name and its arguments, between items in a list, between then and its
expression. The printer automatically decides whether to use the space
or the newline.
hardNl -- a line break that cannot be flattened. Used wherever a
newline is mandatory regardless of width: between top-level definitions,
between statements in a do block, after a { that opens a multi-line
body.
nest -- indents the contents of a block. Every let, in,
where, of, function body, and case branch needs nest to push its
contents right by the appropriate number of spaces.
group -- the key decision point. group d says "try to fit d
on one line; if it doesn't fit, break it." Almost every place where the
layout is optional -- argument lists, record literals, if-then-else,
the arms of a case -- gets wrapped in group.
Everything else in the library is either derived from these six
(e.g. hardNlConcat a b = concat a (concat hardNl b), vcat = foldDoc hardNlConcat), or handles specialized situations (align for hanging
indentation, breakDoc when tokens should touch rather than have a space,
twoColumns for things like = alignment in record definitions).
group uses choice to pick the cheaper of two documents.
The choice constructor may also be useful in your own specialized situations.
Step 3: Render the document.
Use a printing function, usually prettyFormat, to render your
Doc as a string.
printed : Maybe String
printed =
P.prettyFormat cf document
The printing functions return a Maybe String because, technically, a document may fail to print. This can only happen if you construct the document incorrectly.
For any document built from the normal constructors without explicitly
embedding failDoc, prettyFormat will always return Just. The
Nothing case is only reachable if you deliberately construct an
unflattenable document and then use it as both branches of a choice,
which is a document construction error, not something the printer
decides. The Nothing return exists to make that error visible rather
than silently picking a wrong layout.
In practice, only one thing causes a document to fail: every layout
the printer could choose contains a failDoc, that is, every branch of
the choices involved contains a failDoc or evaluates to one.
Terminology
flattening
A document like group (text "foo" |> concat nl |> concat (text "bar")) represents two possible renderings:
foo bar = everything on one line
foo
bar = broken across lines
"Flat form" just means the one-line version. "Flatten" is the operation that transforms a document so that it only knows how to produce the one-line version.
What flatten actually does
It walks the document tree and replaces every newline node with its designated substitute string:
- nl => text " " (the space is the substitute)
- breakDoc => text "" (empty string is the substitute)
- hardNl => failDoc (no substitute exists; flattening fails)
Everything else -- concat, nest, align, choice -- is recursed into. The indentation-control wrappers (nest, align, reset) are simply dropped, because in a flat document there are no newlines and therefore indentation is irrelevant.
flatten is mostly an implementation detail of group:
group d = choice d (flatten d)
This says: "offer the printer a choice between the original document d
(which may have real newlines) and flatten d (which has all newlines
replaced with their substitutes)." The printer evaluates both and picks
whichever costs less -- typically the flat version if it fits on the
page, the broken version if it doesn't. You would rarely call flatten
directly. The main exceptions are flattenAlignedConcat and hcat,
where the left operand is explicitly flattened to guarantee it stays on
one line before being aligned-concatenated with the right operand.
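The flatten rules above can be sketched with a toy model. This is a minimal Python sketch, not the library's implementation: the tuple encoding, the names FAIL, NL, BREAK, HARD_NL and render_flat are all hypothetical, and only text, newline, concat and nest nodes are modeled.

```python
# Toy model of flatten; the tuple encoding is hypothetical, not the library's Doc type.
FAIL = ("fail",)

def text(s):
    return ("text", s)

def newline(flat):
    # flat is the substitute string, or None for a hard newline
    return ("newline", flat)

NL, BREAK, HARD_NL = newline(" "), newline(""), newline(None)

def concat(a, b):
    # Failure is contagious, mirroring the documented concat rule
    if a == FAIL or b == FAIL:
        return FAIL
    return ("concat", a, b)

def flatten(doc):
    tag = doc[0]
    if tag == "newline":
        return FAIL if doc[1] is None else text(doc[1])
    if tag == "concat":
        return concat(flatten(doc[1]), flatten(doc[2]))
    if tag == "nest":          # ("nest", n, d): the wrapper is dropped
        return flatten(doc[2])
    return doc                 # text and fail are already flat

def render_flat(doc):
    """Render a flattened document, or return None if flattening failed."""
    if doc == FAIL:
        return None
    if doc[0] == "text":
        return doc[1]
    return render_flat(doc[1]) + render_flat(doc[2])

foo_bar = concat(text("foo"), concat(NL, text("bar")))
mario = concat(text("Mario"), concat(BREAK, text("Bros")))
hard = concat(text("x"), concat(HARD_NL, text("y")))

print(render_flat(flatten(foo_bar)))  # foo bar
print(render_flat(flatten(mario)))    # MarioBros
print(render_flat(flatten(hard)))     # None
```

In this model a single hard newline anywhere in the tree makes the whole flattened document fail, which is why group d falls back to d when d contains a hardNl.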
The CostFactory
Types
A cost factory defines how printing costs are computed and compared.
Pass a value of this type to the rendering functions (prettyFormat etc.)
to control the optimization objective.
You can supply your own instance or use defaultCostFactory.
A cost factory is a record of functions rather than a type class, because Gren (like Elm) does not have type classes. Each field corresponds to one operation in the abstract cost algebra described in the paper.
Contracts that a valid cost factory must satisfy:
- le is a total ordering.
- If le a b and le c d then le (combine a c) (combine b d).
- If a <= b then le (text a l) (text b l).
- If a <= b then le (newline a) (newline b).
- text c (a+b) = combine (text c a) (text (c+a) b).
- combine is associative with identity text 0 0.
- text c 0 = text 0 0 for any c.
- If a <= b then le (twoColumnsOverflow a) (twoColumnsOverflow b).
- If a <= b then le (twoColumnsBias a) (twoColumnsBias b).
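When writing a custom cost factory, it can help to check the contracts mechanically on a small range of inputs. The Python sketch below does this for a hypothetical two-field cost (badness, height). The text formula is an assumption: it charges the growth of the squared overflow, which is one formula that satisfies the text-splitting contract, and it is not claimed to be what defaultCostFactory computes.

```python
# Hypothetical cost model used only to exercise the contracts; not the library's code.
from itertools import product

PAGE_WIDTH = 10  # small page so overflows are easy to trigger

def overflow(col):
    return max(col - PAGE_WIDTH, 0)

def text(c, l):
    # Cost of placing l characters starting at column c:
    # the growth of the squared overflow between c and c + l
    return (overflow(c + l) ** 2 - overflow(c) ** 2, 0)

def newline(indent):
    # One more line; indentation itself is free in this model
    return (0, 1)

def combine(x, y):
    return (x[0] + y[0], x[1] + y[1])

def le(x, y):
    # Lexicographic tuple comparison: badness first, then height
    return x <= y

values = range(0, 25)
for c, a, b in product(values, repeat=3):
    # text c (a+b) = combine (text c a) (text (c+a) b)
    assert text(c, a + b) == combine(text(c, a), text(c + a, b))
    if a <= b:
        # Here a and b are columns and c is the text length
        assert le(text(a, c), text(b, c))
        assert le(newline(a), newline(b))
for c in values:
    assert text(c, 0) == text(0, 0)
    assert combine(text(0, 0), text(c, 3)) == text(c, 3)

print("all sampled contracts hold")
```

A sampled check like this does not prove the contracts, but it tends to catch the common mistake of a text cost that depends on how a string is split into pieces.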
The cost record used by defaultCostFactory.
- badness -- sum of squared overflows past the page width. Squaring means one large overflow is penalized more heavily than several small ones, which steers the printer toward layouts where overflows are spread evenly when they cannot be avoided entirely.
- columnOverflow -- sum of overflows past column separators in twoColumns layouts.
- height -- total number of newlines in the layout (i.e. number of output lines minus one).
Functions
The default cost factory.
Parameters:
- pageWidth -- the page width limit. Text extending past this column incurs badness cost.
- computationWidth -- optional computation width limit. Defaults to floor(1.2 * pageWidth). When the current column or indentation level exceeds this during rendering, the result is marked Tainted and the printer stops maintaining the full candidate set.
Cost ordering: Costs are compared lexicographically -- badness is
minimized first, then columnOverflow, then height. The printer will
always prefer a layout that fits within the page width (even at the cost of
more newlines), and among equally bad layouts it prefers fewer column
overflows, then fewer newlines.
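The lexicographic ordering can be illustrated on already rendered strings. The Python sketch below is a simplified model, not the library's code: layout_cost is a hypothetical helper that measures a finished string, whereas the real printer computes costs incrementally, and columnOverflow is fixed at zero because no twoColumns layout is involved.

```python
def layout_cost(rendered, page_width):
    """(badness, columnOverflow, height) of a rendered string, in a simplified model."""
    lines = rendered.split("\n")
    badness = sum(max(len(line) - page_width, 0) ** 2 for line in lines)
    column_overflow = 0          # no twoColumns layouts in this sketch
    height = len(lines) - 1      # number of newlines
    return (badness, column_overflow, height)

candidates = [
    "function_name(first_argument, second_argument)",
    "function_name(\n    first_argument,\n    second_argument)",
]

for width in (80, 30):
    # Python compares tuples lexicographically, matching the documented ordering
    best = min(candidates, key=lambda s: layout_cost(s, width))
    print(width, layout_cost(best, width))
# 80 (0, 0, 0)
# 30 (0, 0, 2)
```

At width 80 the one-line candidate wins on height; at width 30 it overflows by 16 columns (badness 256), so the three-line candidate wins even though it is taller.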
twoColumnsOverflow: Charges columnOverflow = w and height = 1
when a left cell overshoots the separator by w columns. The extra height
penalty discourages the printer from choosing a separator so narrow that every
left cell overflows.
twoColumnsBias: Returns zero for all inputs in the default factory. The
preference for leftmost separators is achieved by the order in which loopLimit
tries candidates, not by a non-zero bias cost.
Documents
This section describes every Doc variant you can construct, what it renders to,
how it interacts with flatten, and when to use it.
Primitives
text s
Renders the string s literally at the current column position.
s must not contain a newline character.
text "hello" -> hello
Flattens to itself -- text is already "flat".
Use for any literal string content: keywords, identifiers, punctuation, numbers.
empty
Equivalent to text "". Renders nothing and advances the column by zero.
Useful as a neutral element when building documents with foldDoc or as an
explicit "nothing here" placeholder in conditional logic.
failDoc
A document that always fails to render.
On its own it causes rendering to return Nothing. Its power comes from pairing
it with choice: any branch containing failDoc is pruned, so the printer
falls back to the other branch automatically.
-- Always renders "Hades" because the left branch fails
choice (concat (text "Sea of Stars") failDoc) (text "Hades")
-- -> Hades
Use for marking layouts that are logically impossible or that you explicitly
want to forbid (e.g. the flattened form of hardNl).
Newlines
All newline variants insert a real \n followed by indentation spaces
(the number of spaces equals the current indentation level set by nest and
align). They differ only in what they do when flatten is applied.
nl
A newline that flattens to a single space " ".
-- flattened: the newline becomes a space
flatten (text "Fire Emblem" `concat` nl `concat` text "Awakening")
-- -> Fire Emblem Awakening
-- not flattened: the newline is kept
-- -> Fire Emblem
--    Awakening
Use for the common case of "a space when it fits, a newline when it doesn't" --
the bread-and-butter of group.
breakDoc
A newline that flattens to an empty string "".
flatten (text "Mario" `concat` breakDoc `concat` text "Bros")
-- -> MarioBros
Use for optional line breaks where no separator is wanted in the flat form, e.g. breaking a long identifier at a natural boundary without inserting a space.
hardNl
A newline that cannot be flattened. Applying flatten to any document
containing hardNl produces failDoc.
flatten (text "x" `concat` hardNl `concat` text "y")
-- -> failDoc (renders as Nothing)
Use for mandatory line breaks that must appear regardless of available width --
e.g. separating top-level definitions, or the newlines between rows built by vcat.
newline (Maybe String)
The general newline primitive that the three shorthands above are built from.
| Argument | Flattens to |
|---|---|
| Just s | text s |
| Nothing | failDoc |
nl is newline (Just " "), breakDoc is newline (Just ""), and
hardNl is newline Nothing.
Use for custom flat-form separators, e.g. newline (Just ", ") to get a
comma-space when the content fits on one line and a newline otherwise.
Concatenation
concat a b
Appends document b immediately after document a, without aligning b
to the column where a ended. The right document flows freely -- it is not
treated as a rigid box.
let left = concat (text "Splatoon") (concat nl (text "Nier"))
let right = concat (text "Automata") (concat nl (text "FEZ"))
concat left right
-- -> Splatoon
--    NierAutomata   <- "Nier" and "Automata" are on the same line
--    FEZ
This "unaligned" behaviour makes concat the right tool for C-style code where
curly braces and brackets should not create rigid indentation.
Use for joining any two documents. It is the fundamental building block -- all other combinators ultimately use it.
Layout choices
choice a b
Lets the printer choose between document a and document b, picking
whichever has the lower cost according to the cost factory.
-- With a wide page, "Chrono Trigger" fits:
choice (text "Chrono Trigger") (concat (text "Octopath") (concat nl (text "Traveler")))
-- -> Chrono Trigger
-- With a narrow page, the two-line version wins:
-- -> Octopath
-- Traveler
Both branches are evaluated; failDoc branches are pruned automatically.
When the two options have equal cost the first branch wins.
Use for any "try this layout first, fall back to that one" decision.
group d is the most common special case: choice d (flatten d).
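The behaviour of choice, including failDoc pruning and the first-branch tie-break, can be modeled by brute force. The Python sketch below enumerates every layout and keeps the cheapest. It is an illustration under simplifying assumptions: the real printer does not enumerate layouts this way, indentation is ignored, and the cost is a hypothetical (badness, height) pair.

```python
# Brute-force model of choice; all names here are hypothetical.
def text(s):
    return ("text", s)

NL = ("nl",)
FAIL = ("fail",)

def concat(a, b):
    return ("concat", a, b)

def choice(a, b):
    return ("choice", a, b)

def layouts(doc):
    """All strings the document can render to, in branch order."""
    tag = doc[0]
    if tag == "text":
        return [doc[1]]
    if tag == "nl":
        return ["\n"]
    if tag == "fail":
        return []                      # a failing branch contributes no layout
    if tag == "concat":
        return [l + r for l in layouts(doc[1]) for r in layouts(doc[2])]
    return layouts(doc[1]) + layouts(doc[2])   # choice: first branch listed first

def cost(rendered, page_width):
    lines = rendered.split("\n")
    badness = sum(max(len(line) - page_width, 0) ** 2 for line in lines)
    return (badness, len(lines) - 1)

def pretty(doc, page_width):
    options = layouts(doc)
    if not options:
        return None                    # every layout failed
    # min returns the first minimal element, so ties go to the first branch
    return min(options, key=lambda s: cost(s, page_width))

games = choice(text("Chrono Trigger"),
               concat(text("Octopath"), concat(NL, text("Traveler"))))
print(pretty(games, 80))   # Chrono Trigger
print(pretty(games, 8))    # Octopath, then Traveler on the next line
print(pretty(choice(concat(text("Sea of Stars"), FAIL), text("Hades")), 80))  # Hades
print(pretty(FAIL, 80))    # None
```

Enumerating layouts is exponential in the number of choices, so this is only useful for reasoning about small examples.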
Indentation control
nest n d
Increases the indentation level by n spaces for the duration of document d.
Only affects where newlines land -- it does not insert spaces immediately.
nest 4 (concat (text "if true:") (concat nl (text "pass")))
-- -> if true:
--        pass   <- 4 extra spaces of indent
nest has no effect on text, align, or reset nodes at the top level of d,
because those nodes manage their own indentation.
Use for block bodies, continuation lines, and any structure where child content should be indented relative to its parent.
align d
Sets the indentation level to the current column position for the duration
of d. This creates a "box" whose left edge is anchored to where it starts.
-- Printing starts at column 11 ("Languages: " is 11 chars):
concat (text "Languages: ")
    (align (concat (text "Racket")
        (concat nl
            (concat (text "OCaml")
                (concat nl
                    (text "Pyret"))))))
-- -> Languages: Racket
--               OCaml   <- lined up under "Racket"
--               Pyret
align has no effect on text, another align, or reset at the top level of d.
Use for hanging indentation, argument lists, and any structure that should remain visually aligned to its opening position.
reset d
Resets the indentation level to 0 for the duration of d.
nest 8 (concat (text "header:") (reset (concat nl (text "body"))))
-- -> header:
--    body   <- back at column 0 despite the nest
Like nest and align, reset has no effect on text, align, or another
reset at the top level of d.
Use for top-level declarations or other constructs that must always start at the left margin, even when they appear inside indented context.
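How nest, align and reset interact with newlines can be sketched with a small renderer for choice-free documents. This Python model is an illustration only: the tuple encoding and the render function are hypothetical, and it tracks just two pieces of state, the current column and the current indentation level.

```python
# Toy renderer for indentation control; not the library's resolver.
def text(s):
    return ("text", s)

NL = ("nl",)

def concat(*docs):
    # Right-nested concat of several docs, for brevity
    result = docs[-1]
    for d in reversed(docs[:-1]):
        result = ("concat", d, result)
    return result

def nest(n, d):
    return ("nest", n, d)

def align(d):
    return ("align", d)

def reset(d):
    return ("reset", d)

def render(doc, col=0, indent=0):
    """Return (output, end_column) for a choice-free document."""
    tag = doc[0]
    if tag == "text":
        return doc[1], col + len(doc[1])
    if tag == "nl":
        return "\n" + " " * indent, indent   # newline, then indentation spaces
    if tag == "concat":
        left, col = render(doc[1], col, indent)
        right, col = render(doc[2], col, indent)
        return left + right, col
    if tag == "nest":                        # indent grows by n
        return render(doc[2], col, indent + doc[1])
    if tag == "align":                       # indent becomes the current column
        return render(doc[1], col, col)
    if tag == "reset":                       # indent becomes 0
        return render(doc[1], col, 0)
    raise ValueError(f"unknown node: {tag}")

block = nest(4, concat(text("if true:"), NL, text("pass")))
langs = concat(text("Languages: "),
               align(concat(text("Racket"), NL, text("OCaml"), NL, text("Pyret"))))
header = nest(8, concat(text("header:"), reset(concat(NL, text("body")))))

for doc in (block, langs, header):
    print(render(doc)[0])
```

Only the nl case reads the indentation level, which matches the statement that these wrappers affect where newlines land and never insert spaces immediately.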
Cost annotation
addCost c d
Renders d normally but adds the extra cost c to whatever cost d incurs.
Does not change the layout in any way.
-- Penalise a particular layout choice by adding a large cost
addCost (1000, 0, 0) (text "verbose form")
Use for nudging the optimiser toward or away from specific layouts when the default cost model does not capture your preferences precisely.
Multi-column layout
twoColumns (Array { a : Doc, b : Doc })
Formats an array of { a, b } (left, right) pairs as an aligned two-column table.
The printer searches for the optimal column separator -- the column at which
the right side begins -- and picks the position that minimises the total cost.
twoColumns
    [ { a = text "empty", b = text " :: Doc" }
    , { a = text "nest", b = text " :: Int -> Doc -> Doc" }
    , { a = text "linebreak", b = text " :: Doc" }
    ]
-- Wide page:
-- empty     :: Doc
-- nest      :: Int -> Doc -> Doc
-- linebreak :: Doc
-- Narrow page (separator pushed right or rows allowed to overflow):
-- empty :: Doc
-- nest :: Int -> Doc -> Doc
-- linebreak :: Doc
Two cost-factory hooks govern the search:
- twoColumnsOverflow w -- charged when a left cell is wider than the chosen separator and overflows into the right column. Typically set higher than the normal overflow cost so the printer prefers going past the page width over clobbering the right column.
- twoColumnsBias w -- a small penalty proportional to how far right the separator sits, so that among equally good layouts the tightest (leftmost) separator wins.
A list of one pair degenerates to align (concat left right).
An empty list degenerates to empty.
Any pair containing failDoc causes the whole node to become failDoc.
Use for type-signature blocks, pattern-match tables, record fields, or any structure where vertical alignment across rows matters.
Derived combinators (not Doc constructors, but built from them)
These are not separate Doc variants -- they are functions that produce
ordinary Doc trees -- but they are listed here for completeness.
| Combinator | Definition | Use for |
|---|---|---|
| group d | choice d (flatten d) | Try flat; fall back to broken |
| flatten d | Replace newlines with flat alternatives | Force single-line rendering |
| alignedConcat a b | concat a (align b) | Aligned ("box") concatenation |
| hardNlConcat a b | concat a (concat hardNl b) | Mandatory line between two docs |
| flattenAlignedConcat a b | alignedConcat (flatten a) b | Aligned concat with flat left side |
| vcat ds | Fold with hardNlConcat | Stack documents vertically |
| hcat ds | Fold with flattenAlignedConcat | Stack documents horizontally |
| foldDoc f ds | Left-fold with f, identity empty | Custom fold over a doc list |
| nest n d | Increase indent by n | Block indentation |
| reset d | Set indent to 0 | Top-level declarations |
| addCost c d | Attach extra cost | Fine-tune cost model |
Types
The document type, parameterized by the cost type.
You can implement the cost to be any type. It is your CostFactory which calculates and compares costs. The PrettyExpressive library code doesn't use the cost directly.
A Doc cost is an abstract description of a piece of formatted text. It
does not contain a concrete string; instead it encodes choices, indentation
rules, and cost annotations that the renderer resolves into an optimal string.
Construct Doc values exclusively with the functions in this module (text,
concat, choice, nest, etc.). The constructors of the type are exposed so
that pattern matching is possible in callers (e.g. unit tests), but you should
not construct DocContext, DocEvaled, or DocBlank directly -- those are
internal nodes used by the renderer and twoColumns.
A pair of documents used as the payload of DocConcat, DocChoice, and
rows of DocTwoColumns. Named fields (a for the left/first document and b
for the right/second) are used instead of a plain tuple because Gren record
patterns are more readable in deep when expressions.
Functions
Attach an extra cost c to document d without changing its layout.
The additional cost is combined with d's ordinary layout cost using
CostFactory.combine. This lets you nudge the optimizer toward or away from
specific layouts when the default cost model does not capture your preferences
precisely.
addCost on DocFail returns DocFail unchanged, since a failing document
produces no layout to charge.
Set the indentation level to the current column position for d.
This "pins" d so that its internal newlines indent back to the column where
d starts, regardless of the surrounding nest context. It is the key
combinator for hanging indentation and argument-list alignment.
Pass-through cases are the same as nest: DocFail, DocAlign, DocReset,
DocText, and DocAddCost (floated outward).
Aligned concatenation: append b to a with b anchored to the column
where a ends.
Equivalent to concat a (align b). Also known as box concatenation -- b
is treated as a rigid box whose left edge is pinned to a's end column.
A newline that flattens to an empty string.
Use this when adjacent tokens should touch with no separator in the flat form, but may be placed on separate lines in the broken form. It produces an empty string only when it appears inside group (or an explicit flatten).
Choose between two document layouts; the printer picks the cheaper one.
Both branches are fully evaluated by the resolver. The resulting measure sets are merged with Pareto-dominance filtering: a layout from one branch is discarded if the other branch has a layout that is both cheaper and ends no further right.
If either branch is failDoc it is dropped and the other is returned
unconditionally, without creating a DocChoice node.
A comma. Equivalent to text ",".
Unaligned concatenation of two documents.
concat a b renders a followed immediately by b. The right document is
not treated as a rigid box: it does not inherit any special alignment from
where a ended. This "unaligned" semantics makes it straightforward to format
C-style code where braces, brackets, and parentheses should flow freely.
The smart constructor applies several normalization rules at build time to keep the document tree compact:
- Either operand being DocFail short-circuits to DocFail.
- Either operand being DocText { width = 0 } (i.e. empty) is dropped.
- Two adjacent DocText nodes are fused into one.
- A DocAddCost on either operand is floated outward, keeping cost annotations above structural nodes. This normalizes the tree shape so the resolver sees consistent node forms.
Concatenate an array of documents right-associatively.
concat_array [ a, b, c, d ]
== concat a (concat b (concat c d))
Right-associativity matches the natural reading order and avoids building a
left-leaning tree that would require deep recursion to resolve. An empty array
returns empty; a singleton array returns its sole element unchanged.
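The difference between the right-associative shape built here and a left fold can be made concrete with a small model. In the Python sketch below, concat just builds a tuple so the tree shape is visible; concat_array and fold_left are hypothetical stand-ins that follow the documented behaviour, not the library's code.

```python
from functools import reduce

EMPTY = ("text", "")

def concat(a, b):
    return ("concat", a, b)

def concat_array(docs):
    """Right-associative: [a, b, c, d] -> concat a (concat b (concat c d))."""
    if not docs:
        return EMPTY
    # Start from the last element and wrap it with each earlier one
    return reduce(lambda acc, d: concat(d, acc), reversed(docs[:-1]), docs[-1])

def fold_left(f, docs):
    """Left-associative, for contrast: [a, b, c] -> f (f a b) c."""
    if not docs:
        return EMPTY
    return reduce(f, docs)

print(concat_array(["a", "b", "c", "d"]))
# ('concat', 'a', ('concat', 'b', ('concat', 'c', 'd')))
print(fold_left(concat, ["a", "b", "c"]))
# ('concat', ('concat', 'a', 'b'), 'c')
```

Both shapes render to the same text; they differ only in which side of the tree grows deep.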
A double-quote character. Equivalent to text "\"".
The empty document. Equivalent to text "".
empty is the identity element of concat: concat empty d == d and
concat d empty == d. These equalities are enforced structurally in the
concat smart constructor, so no extra node is allocated.
A document that always fails to render.
On its own, rendering a failDoc returns Nothing. Its power comes from
pairing it with choice: the printer prunes any branch that contains a
failDoc, automatically falling back to the other branch.
Flatten a, then concatenate with b aligned to a's end column.
Equivalent to alignedConcat (flatten a) b. Useful when you want aligned
(box) concatenation but want to guarantee that the left operand never spans
multiple lines -- if flatten a fails, the whole expression fails and a
surrounding choice can select an alternative layout.
Fold an array of documents with a binary combinator.
The fold is left-associative: foldDoc f [a, b, c] is f (f a b) c.
Returns empty for an empty array.
Try the document as-is; fall back to its flattened form if that is cheaper.
group d is shorthand for choice d (flatten d). It is the standard
combinator for "break this if necessary, otherwise keep it on one line" -- the
core idiom of most pretty printers.
Because choice drops failDoc branches, if flatten d fails (e.g. because
d contains a hardNl) then group d is equivalent to d alone.
A newline that cannot be flattened.
flatten hardNl yields failDoc, causing any layout that tried to flatten
across a hardNl to be pruned. Use this wherever a line break is mandatory
regardless of available width -- for instance between top-level definitions.
Concatenate a and b with a mandatory line break between them.
Equivalent to concat a (concat hardNl b). Used by vcat to stack
documents vertically.
Concatenate documents horizontally with flatten-aligned concatenation.
hcat [a, b, c] places each document to the right of the previous one,
aligned to the column where the previous document ended, with each left
operand flattened first. If any left operand fails to flatten, the
surrounding choice can select an alternative.
Returns empty for an empty array.
A left curly brace. Equivalent to text "{".
A left square bracket. Equivalent to text "[".
A left parenthesis. Equivalent to text "(".
Increase the indentation level by n spaces for the duration of d.
nest only affects where newlines land -- it does not insert any spaces
immediately. If multiple nest calls are nested, their amounts accumulate.
The smart constructor has several pass-through cases where wrapping in
DocNest would have no effect:
- DocFail -- propagate failure.
- DocAlign -- align has already fixed the indentation to the current column; an outer nest cannot override that.
- DocReset -- reset fixes indentation to 0; same reasoning.
- DocText -- plain text carries no newlines, so indentation is irrelevant.
- DocAddCost -- float the cost annotation outward (same normalization as in concat).
The general newline primitive. All other newline combinators are built from this one.
- newline Nothing -- a hard newline: flatten turns it into failDoc.
- newline (Just s) -- a soft newline: flatten turns it into text s.
After the newline is emitted, indentation spaces are inserted according to
the current indentation level (as set by nest, align, and reset).
A newline that flattens to a single space.
This is the most common newline in practice: it means "break the line here if
necessary; otherwise substitute a space". It is the building block of group.
A right curly brace. Equivalent to text "}".
A right square bracket. Equivalent to text "]".
Reset the indentation level to 0 for d.
Any newlines inside d will indent to column 0, regardless of the surrounding
nest or align context. Useful for top-level declarations or other constructs
that must always begin at the left margin.
Pass-through cases match nest and align.
A right parenthesis. Equivalent to text ")".
A single space character. Equivalent to text " ".
A document for literal text content. The string must not contain a newline
character; use nl, hardNl, or newline for line breaks.
Unicode width caveat: The width field is currently set to
String.count s, which counts Unicode scalar values (code points). This is
incorrect for scripts such as Hindi or Thai where multiple code points combine
into a single rendered character cell (a grapheme cluster). Once a Gren
Unicode grapheme-cluster package is available this should be replaced with a
function that counts rendered column widths.
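The size of the discrepancy can be seen with a short Python experiment. Python's len also counts code points, so it stands in for the documented String.count here. The approx_cells helper is a hypothetical heuristic that skips nonspacing and enclosing marks; it is not a full grapheme-cluster or display-width algorithm.

```python
import unicodedata

def code_points(s):
    # len counts code points, like the width computation described above
    return len(s)

def approx_cells(s):
    """Rough column count: skip nonspacing and enclosing marks (categories Mn, Me)."""
    return sum(1 for ch in s if unicodedata.category(ch) not in ("Mn", "Me"))

thai = "\u0e17\u0e35\u0e48"   # Thai consonant + vowel sign + tone mark, one rendered cell
print(code_points(thai), approx_cells(thai))        # 3 1
print(code_points("hello"), approx_cells("hello"))  # 5 5
```

With a code-point width, a line of such text is measured as wider than it renders, so the printer may break lines earlier than necessary.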
Two-column table layout for an array of { a, b } document pairs.
The printer searches for the optimal column separator -- the column at which
the right (b) cells begin -- and picks the position that minimizes total cost
across all rows simultaneously.
Each row is rendered as:
concat left_cell (concat padding right_cell)
where padding is either blank spaces (if the left cell ended before the
separator) or a twoColumnsOverflow cost charge (if the left cell overshot
the separator).
Degenerate cases:
- Empty array -> empty.
- Single pair -> align (concat a b) (no separator search needed).
- Any pair where a or b is failDoc -> failDoc.
Cost-factory hooks:
- twoColumnsOverflow w -- charged when a left cell overshoots the separator by w columns. Setting this higher than the normal page-overflow cost makes the printer prefer exceeding the page width over clobbering the right column.
- twoColumnsBias w -- a small penalty proportional to w, the distance of the separator from the left edge. This breaks ties in favour of the leftmost (tightest) valid separator.
Stack documents vertically with hard newlines between them.
vcat [a, b, c] is equivalent to hardNlConcat a (hardNlConcat b c).
Returns empty for an empty array.
Printing
Types
Debugging information returned alongside a rendered string.
- isTainted -- True when the computation limit was exceeded during rendering. A tainted result is still valid, but the printer may not have found the globally optimal layout because it stopped tracking some candidate measures beyond CostFactory.limit.
- cost -- the cost of the chosen layout, in whatever cost type cost the chosen CostFactory uses.
Functions
Render a document to a string, starting at column 0.
Returns Nothing if the document fails to render.
Like prettyFormat, but printing begins at column initC.
Render a document to a string with debug information appended.
The debug output includes a column ruler, the rendered content with a |
marker at the page width, and the isTainted flag and cost of the chosen
layout. The exact format is determined by CostFactory.debugFormat; the
default implementation is makeDebugFormat.
Returns Nothing if the document fails to render.
Like prettyFormatDebug, but printing begins at column initC.
Render a document to a string, starting at column 0, and return both the rendered string and layout metadata.
Returns Nothing if the document fails to render (e.g. it is failDoc or
every layout branch fails).
Use prettyFormatInfoAt if the output will be appended after existing content
that starts at a non-zero column.
Like prettyFormatInfo, but printing begins at column initC.
Use this when the rendered output will be placed after content that is already
initC columns wide -- for example, when the document follows a label or
prompt on the same line.
Other Functions
You aren't likely to use these, except for testing or debugging.
Replace all newlines in a document with their "flat" alternative. That is, convert the document to a one-line version.
The mapping for each newline variant is:
- nl -> text " " (was DocNewline (Just " "))
- breakDoc -> text "" (was DocNewline (Just ""))
- hardNl -> failDoc (was DocNewline Nothing)
- newline (Just s) -> text s
- newline Nothing -> failDoc
For structural nodes, flatten recurses into the children:
- DocConcat, DocChoice -- recurse into both children.
- DocNest, DocAlign, DocReset -- these only affect indentation; in a flat document there are no newlines to indent to, so the wrapper is dropped and only the inner document is flattened.
- DocAddCost -- cost annotation is preserved; inner doc is flattened.
- DocTwoColumns -- a two-column layout always has at least two rows and therefore contains mandatory newlines between rows. Returns failDoc.
- DocContext, DocEvaled -- internal nodes that should not appear in user-constructed documents. Return failDoc defensively.
Produce a debug string from a rendered document.
The output has four parts:
- A column ruler: the digits 1-9, 0, 1-9, 0, ... up to pageWidth columns.
- The rendered content, with each line padded (or truncated) to pageWidth and a | marker appended. Lines longer than pageWidth have the marker inserted at the overflow point, with the overflow shown to the right.
- is_tainted: true or is_tainted: false.
- cost: <stringOfCost output>.
Example for pageWidth = 10:
1234567890
Hello!    |
World     |
is_tainted: false
cost: (0 0 1)
This function is used as the debugFormat field of defaultCostFactory but
can also be called directly or adapted for custom cost factories.
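For readers adapting the format to a custom cost factory, the four parts can be sketched in a few lines of Python. This is a model of the described output only: make_debug_format and its parameters are hypothetical names, the cost is passed in as an already formatted string, and the real function's signature may differ.

```python
def make_debug_format(rendered, page_width, is_tainted, cost_string):
    # Part 1: column ruler 1-9, 0, 1-9, 0, ... up to page_width columns
    ruler = "".join(str((i + 1) % 10) for i in range(page_width))
    # Part 2: pad short lines, and put the marker at the overflow point of long ones
    body = [
        line[:page_width].ljust(page_width) + "|" + line[page_width:]
        for line in rendered.split("\n")
    ]
    # Parts 3 and 4: taint flag and cost
    footer = [f"is_tainted: {str(is_tainted).lower()}", f"cost: {cost_string}"]
    return "\n".join([ruler] + body + footer)

print(make_debug_format("Hello!\nWorld", 10, False, "(0 0 1)"))
# 1234567890
# Hello!    |
# World     |
# is_tainted: false
# cost: (0 0 1)
```

A line longer than the page keeps its full text, with the | marker splitting it at the page width.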