Parser.Advanced

Parsers

type Parser

An advanced Parser gives two ways to improve your error messages:

  • problem — Instead of all errors being a String, you can create a custom type like type Problem = BadIndent | BadKeyword String and track problems much more precisely.
  • context — Error messages can be further improved when precise problems are paired with information about where you ran into trouble. By tracking the context, instead of saying “I found a bad keyword” you can say “I found a bad keyword when parsing a list” and give folks a better idea of what the parser thinks it is doing.

I recommend starting with the simpler Parser module though, and when you feel comfortable and want better error messages, you can create a type alias like this:

import Parser.Advanced

type alias MyParser a =
  Parser.Advanced.Parser Context Problem a

type Context = Definition String | List | Record

type Problem = BadIndent | BadKeyword String

All of the functions from Parser should exist in Parser.Advanced in some form, allowing you to switch over pretty easily.

run : Parser c x a -> String -> Result (Array (DeadEnd c x)) a

This works just like Parser.run. The only difference is that when it fails, it has much more precise information for each dead end.

type alias DeadEnd context problem =
  { row : Int
  , col : Int
  , problem : problem
  , contextStack : Array { row : Int, col : Int, context : context }
  }

Say you are parsing a function named viewHealthData that contains a list. You might get a DeadEnd like this:

{ row = 18
, col = 22
, problem = UnexpectedComma
, contextStack =
    [ { row = 14
      , col = 1
      , context = Definition "viewHealthData"
      }
    , { row = 15
      , col = 4
      , context = List
      }
    ]
}

We have a ton of information here! So in the error message, we can say that “I ran into an issue when parsing a list in the definition of viewHealthData. It looks like there is an extra comma.” Or maybe something even better!

Furthermore, many parsers just put a mark where the problem manifested. By tracking the row and col of the context, we can show a much larger region as a way of indicating “I thought I was parsing this thing that starts over here.” Otherwise a missing ], }, or ) can lead to a very confusing error message, like “I need more indentation” on something unrelated.
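Here is a minimal sketch of turning a DeadEnd into the kind of message described above. The names describeDeadEnd, contextToString, and problemToString are made up for this example, and the wording is only one possibility. The caller supplies how to print their own context and problem types, so the helper works with any custom types.

```gren
module DescribeDeadEnd exposing (describeDeadEnd)

import Array
import Parser.Advanced exposing (DeadEnd)
import String


-- Hypothetical helper: render one dead end as a few lines of text.
-- The caller decides how their own context and problem types are printed.
describeDeadEnd : (context -> String) -> (problem -> String) -> DeadEnd context problem -> String
describeDeadEnd contextToString problemToString deadEnd =
    let
        header =
            "Problem at row "
                ++ String.fromInt deadEnd.row
                ++ ", col "
                ++ String.fromInt deadEnd.col
                ++ ": "
                ++ problemToString deadEnd.problem

        -- One line per context, including where that context started,
        -- so a report can point at the whole region and not just one spot.
        contextLines =
            Array.map
                (\frame ->
                    "while parsing "
                        ++ contextToString frame.context
                        ++ " (starting at row "
                        ++ String.fromInt frame.row
                        ++ ", col "
                        ++ String.fromInt frame.col
                        ++ ")"
                )
                deadEnd.contextStack
    in
    String.join "\n" ([ header ] ++ contextLines)
```

In a real program you would pass functions that pattern match on your own Context and Problem types instead of plain strings.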

Note: Rows and columns are counted like a text editor. The beginning is row=1 and col=1. The col increments as characters are chomped. When a \n is chomped, row is incremented and col starts over again at 1.
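A small sketch to check this counting: the hypothetical parser below chomps everything before the first x and then reports its position. On the input "ab\ncx" it chomps a, b, the newline, and c, so it should end up at row 2, col 2.

```gren
module PositionDemo exposing (positionBeforeX)

import Parser.Advanced exposing (..)


-- Hypothetical example: chomp every character up to (not including)
-- the first 'x', then report where the parser ended up.
positionBeforeX : Parser {} String { row : Int, col : Int }
positionBeforeX =
    succeed identity
        |. chompWhile (\c -> c /= 'x')
        |= getPosition
```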

inContext : context -> Parser context x a -> Parser context x a

This is how you mark that you are in a certain context. For example, here is a rough outline of some code that uses inContext to mark when you are parsing a specific definition:

import Char
import Parser.Advanced exposing (..)
import Set

type Context
  = Definition String
  | List

definition : Parser Context Problem Expr
definition =
  functionName
    |> andThen definitionBody

definitionBody : String -> Parser Context Problem Expr
definitionBody name =
  inContext (Definition name) <|
    succeed (Function name)
      |= arguments
      |. symbol (Token "=" ExpectingEquals)
      |= expression

functionName : Parser c Problem String
functionName =
  variable
    { start = Char.isLower
    , inner = Char.isAlphaNum
    , reserved = Set.fromList ["let","in"]
    , expecting = ExpectingFunctionName
    }

First we parse the function name, and then we parse the rest of the definition. Importantly, we call inContext so that any dead end that occurs in definitionBody will get this extra context information. That way you can say things like, “I was expecting an equals sign in the view definition.” Context!

type Token x
  = Token String x

With the simpler Parser module, you could just say symbol "," and parse all the commas you wanted. But now that we have a custom type for our problems, we actually have to specify that as well. So anywhere you just used a String in the simpler module, you now use a Token Problem in the advanced module:

type Problem
  = ExpectingComma
  | ExpectingListEnd

comma : Token Problem
comma =
  Token "," ExpectingComma

listEnd : Token Problem
listEnd =
  Token "]" ExpectingListEnd

You can be creative with your custom type. Maybe you want a lot of detail. Maybe you want looser categories. It is a custom type. Do what makes sense for you!


Everything past here works just like in the Parser module, except that String arguments become Token arguments, and you need to provide a Problem for certain scenarios.


Building Blocks

int : x -> x -> Parser c x Int

Just like Parser.int, where you have to handle negation yourself. The only difference is that you provide two potential problems:

int : x -> x -> Parser c x Int
int expecting invalid =
  number
    { int = Ok identity
    , hex = Err invalid
    , octal = Err invalid
    , binary = Err invalid
    , float = Err invalid
    , invalid = invalid
    , expecting = expecting
    }

You can use problems like ExpectingInt and InvalidNumber.
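Since int does not handle negation, here is one sketch of doing it yourself with oneOf. The signedInt parser and the Problem constructors are made up for this example.

```gren
module SignedInt exposing (Problem(..), signedInt)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingMinus
    | ExpectingInt
    | InvalidNumber


-- Hypothetical parser: an optional leading "-" followed by an integer.
-- If the "-" is missing, the first branch fails without chomping,
-- so oneOf moves on to the plain int branch.
signedInt : Parser c Problem Int
signedInt =
    oneOf
        [ succeed negate
            |. symbol (Token "-" ExpectingMinus)
            |= int ExpectingInt InvalidNumber
        , int ExpectingInt InvalidNumber
        ]
```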

float : x -> x -> Parser c x Float

Just like Parser.float, where you have to handle negation yourself. The only difference is that you provide two potential problems:

float : x -> x -> Parser c x Float
float expecting invalid =
  number
    { int = Ok toFloat
    , hex = Err invalid
    , octal = Err invalid
    , binary = Err invalid
    , float = Ok identity
    , invalid = invalid
    , expecting = expecting
    }

You can use problems like ExpectingFloat and InvalidNumber.

number : { int : Result x (Int -> a), hex : Result x (Int -> a), octal : Result x (Int -> a), binary : Result x (Int -> a), float : Result x (Float -> a), invalid : x, expecting : x } -> Parser c x a

Just like Parser.number where you have to handle negation yourself. The only difference is that you provide all the potential problems.
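For example, here is a sketch of a literal parser that accepts decimal and hexadecimal integers but reports the other formats as invalid. The Literal type and the problem names are assumptions for this illustration.

```gren
module Literals exposing (Literal(..), Problem(..), literal)

import Parser.Advanced exposing (..)


type Literal
    = Decimal Int
    | Hexadecimal Int


type Problem
    = ExpectingNumber
    | InvalidNumber


-- Hypothetical parser: decimal and hex integers are accepted,
-- octal, binary, and floating point numbers are rejected.
literal : Parser c Problem Literal
literal =
    number
        { int = Ok Decimal
        , hex = Ok Hexadecimal
        , octal = Err InvalidNumber
        , binary = Err InvalidNumber
        , float = Err InvalidNumber
        , invalid = InvalidNumber
        , expecting = ExpectingNumber
        }
```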

symbol : Token x -> Parser c x {}

Just like Parser.symbol except you provide a Token to clearly indicate your custom type of problems:

comma : Parser Context Problem {}
comma =
  symbol (Token "," ExpectingComma)

keyword : Token x -> Parser c x {}

Just like Parser.keyword except you provide a Token to clearly indicate your custom type of problems:

let_ : Parser Context Problem {}
let_ =
  keyword (Token "let" ExpectingLet)

Note that this parser would fail on the input letter because the characters after let continue the word. Use token if you do not want that final check.
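A quick sketch contrasting the two. Both hypothetical parsers look for let, but only the keyword version refuses to match the start of letter.

```gren
module LetDemo exposing (Problem(..), letKeyword, letToken)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingLet


-- Matches "let" only when it is not the start of a longer word.
letKeyword : Parser c Problem {}
letKeyword =
    keyword (Token "let" ExpectingLet)


-- Matches the three characters "let" no matter what follows.
letToken : Parser c Problem {}
letToken =
    token (Token "let" ExpectingLet)
```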

variable : { start : Char -> Bool, inner : Char -> Bool, reserved : Set String, expecting : x } -> Parser c x String

Just like Parser.variable except you specify the problem yourself.

end : x -> Parser c x {}

Just like Parser.end except you provide the problem that arises when the parser is not at the end of the input.

Pipelines

succeed : a -> Parser c x a

Just like Parser.succeed

(|=) : Parser c x (a -> b) -> Parser c x a -> Parser c x b

Just like the (|=) from the Parser module.

(|.) : Parser c x keep -> Parser c x ignore -> Parser c x keep

Just like the (|.) from the Parser module.
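To see succeed, (|=), and (|.) working together, here is a sketch of a hypothetical point parser for input like ( 3, 4 ). Values produced by |= are passed to the function given to succeed, while the results of |. are checked and then dropped. The Problem constructors are assumptions for this example.

```gren
module Point exposing (Problem(..), point)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingOpen
    | ExpectingComma
    | ExpectingClose
    | ExpectingInt
    | InvalidNumber


-- Hypothetical parser for "(x, y)" with optional spaces around each part.
point : Parser c Problem { x : Int, y : Int }
point =
    succeed (\px py -> { x = px, y = py })
        |. symbol (Token "(" ExpectingOpen)
        |. spaces
        |= int ExpectingInt InvalidNumber
        |. spaces
        |. symbol (Token "," ExpectingComma)
        |. spaces
        |= int ExpectingInt InvalidNumber
        |. spaces
        |. symbol (Token ")" ExpectingClose)
```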

lazy : ({} -> Parser c x a) -> Parser c x a

Just like Parser.lazy

andThen : (a -> Parser c x b) -> Parser c x a -> Parser c x b

Just like Parser.andThen

problem : x -> Parser c x a

Just like Parser.problem except you provide a custom type for your problem.
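andThen and problem combine nicely for checks that the grammar alone cannot express. This hypothetical byte parser reads an integer and then fails with a custom problem if the value is outside 0 to 255. The ByteOutOfRange constructor is an assumption for this sketch.

```gren
module Byte exposing (Problem(..), byte)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingInt
    | InvalidNumber
    | ByteOutOfRange Int


-- Hypothetical parser: an integer that must fit in one byte.
byte : Parser c Problem Int
byte =
    int ExpectingInt InvalidNumber
        |> andThen
            (\n ->
                if n >= 0 && n <= 255 then
                    succeed n

                else
                    problem (ByteOutOfRange n)
            )
```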

Branches

oneOf : Array (Parser c x a) -> Parser c x a

Just like Parser.oneOf

map : (a -> b) -> Parser c x a -> Parser c x b

Just like Parser.map

backtrackable : Parser c x a -> Parser c x a

Just like Parser.backtrackable

commit : a -> Parser c x a

Just like Parser.commit

token : Token x -> Parser c x {}

Just like Parser.token except you provide a Token specifying your custom type of problems.

Loops

sequence : { start : Token x, separator : Token x, end : Token x, spaces : Parser c x {}, item : Parser c x a, trailing : Trailing } -> Parser c x (Array a)

Just like Parser.sequence except with a Token for the start, separator, and end. That way you can specify your custom type of problem for when something is not found.

type Trailing
  = Forbidden
  | Optional
  | Mandatory

What’s the deal with trailing commas? Are they Forbidden? Are they Optional? Are they Mandatory? Welcome to shapes club!
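For example, here is a sketch of a parser for arrays of integers like [ 1, 2, 3 ] that rejects a trailing comma. The intArray name and the problem constructors are assumptions; choosing Optional or Mandatory instead of Forbidden changes how a final , is handled.

```gren
module IntArray exposing (Problem(..), intArray)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingListStart
    | ExpectingComma
    | ExpectingListEnd
    | ExpectingInt
    | InvalidNumber


-- Hypothetical parser: "[", comma separated ints, "]".
intArray : Parser c Problem (Array Int)
intArray =
    sequence
        { start = Token "[" ExpectingListStart
        , separator = Token "," ExpectingComma
        , end = Token "]" ExpectingListEnd
        , spaces = spaces
        , item = int ExpectingInt InvalidNumber
        , trailing = Forbidden
        }
```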

loop : state -> (state -> Parser c x (Step state a)) -> Parser c x a

Just like Parser.loop

type Step state a
  = Loop state
  | Done a

Just like Parser.Step
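A sketch of loop in action: the hypothetical sumOfInts parser reads input like 1+2+3, keeping a running total as its state. It returns Loop while it finds another + and Done once it does not. The names and problems are made up for this example.

```gren
module Sum exposing (Problem(..), sumOfInts)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingPlus
    | ExpectingInt
    | InvalidNumber


-- Parse the first int, then use it as the starting state of the loop.
sumOfInts : Parser c Problem Int
sumOfInts =
    int ExpectingInt InvalidNumber
        |> andThen (\first -> loop first sumHelp)


-- One step: either "+" and another int (keep looping), or stop.
sumHelp : Int -> Parser c Problem (Step Int Int)
sumHelp total =
    oneOf
        [ succeed (\next -> Loop (total + next))
            |. symbol (Token "+" ExpectingPlus)
            |= int ExpectingInt InvalidNumber
        , succeed (Done total)
        ]
```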

Whitespace

spaces : Parser c x {}

Just like Parser.spaces

lineComment : Token x -> Parser c x {}

Just like Parser.lineComment except you provide a Token describing the starting symbol.

multiComment : Token x -> Token x -> Nestable -> Parser c x {}

Just like Parser.multiComment except with a Token for the open and close symbols.

type Nestable
  = NotNestable
  | Nestable

Works just like Parser.Nestable to help distinguish between unnestable /* */ comments like in JS and nestable {- -} comments like in Gren.
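One way to combine these pieces is to keep trying spaces, -- line comments, and nestable {- -} comments until a full pass makes no progress. In this sketch, the Problem names and the helper ifProgress are hypothetical, and getOffset is used to detect whether anything was chomped.

```gren
module Whitespace exposing (Problem(..), whitespace)

import Parser.Advanced exposing (..)


type Problem
    = ExpectingLineComment
    | ExpectingCommentStart
    | ExpectingCommentEnd
    | ExpectingEnd


-- Skip any mix of spaces, line comments, and nestable block comments.
whitespace : Parser c Problem {}
whitespace =
    loop 0 <|
        ifProgress <|
            oneOf
                [ lineComment (Token "--" ExpectingLineComment)
                , multiComment (Token "{-" ExpectingCommentStart) (Token "-}" ExpectingCommentEnd) Nestable
                , spaces
                ]


-- Run the parser once; keep looping only if the offset moved forward.
ifProgress : Parser c x {} -> Int -> Parser c x (Step Int {})
ifProgress parser offset =
    succeed identity
        |. parser
        |= getOffset
        |> map
            (\newOffset ->
                if offset == newOffset then
                    Done {}

                else
                    Loop newOffset
            )
```

The progress check matters because spaces succeeds even when it chomps nothing, so without it the loop would never stop.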

Chompers

getChompedString : Parser c x a -> Parser c x String

Just like Parser.getChompedString

chompIf : (Char -> Bool) -> x -> Parser c x {}

Just like Parser.chompIf except you provide a problem in case a character cannot be chomped.

chompWhile : (Char -> Bool) -> Parser c x {}

Just like Parser.chompWhile
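Chompers are usually paired with getChompedString. Here is a sketch of a hypothetical identifier parser: chompIf requires a lowercase first letter, chompWhile takes the rest, and getChompedString returns exactly the text that was chomped. The ExpectingLowercase problem is an assumption for this example.

```gren
module Identifier exposing (Problem(..), identifier)

import Char
import Parser.Advanced exposing (..)


type Problem
    = ExpectingLowercase


-- A lowercase letter followed by any letters, digits, or underscores.
identifier : Parser c Problem String
identifier =
    getChompedString <|
        succeed {}
            |. chompIf Char.isLower ExpectingLowercase
            |. chompWhile (\c -> Char.isAlphaNum c || c == '_')
```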

chompUntil : Token x -> Parser c x {}

Just like Parser.chompUntil except you provide a Token in case you chomp all the way to the end of the input without finding what you need.

chompUntilEndOr : String -> Parser c x {}

Just like Parser.chompUntilEndOr

mapChompedString : (String -> a -> b) -> Parser c x a -> Parser c x b

Just like Parser.mapChompedString

Indentation

withIndent : Int -> Parser c x a -> Parser c x a

Just like Parser.withIndent

getIndent : Parser c x Int

Just like Parser.getIndent
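getIndent does not check anything by itself; a common use is comparing it against getCol. This hypothetical checkIndent sketch fails with a made-up BadIndent problem unless the current column is greater than the indentation set by withIndent.

```gren
module Indent exposing (Problem(..), checkIndent)

import Parser.Advanced exposing (..)


type Problem
    = BadIndent


-- Succeed only if the current column is past the indentation level.
checkIndent : Parser c Problem {}
checkIndent =
    getIndent
        |> andThen
            (\indent ->
                getCol
                    |> andThen
                        (\col ->
                            if col > indent then
                                succeed {}

                            else
                                problem BadIndent
                        )
            )
```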

Positions

getPosition : Parser c x { row : Int, col : Int }

Just like Parser.getPosition

getRow : Parser c x Int

Just like Parser.getRow

getCol : Parser c x Int

Just like Parser.getCol

getOffset : Parser c x Int

Just like Parser.getOffset

getSource : Parser c x String

Just like Parser.getSource