Bytes.Encode

Functions for turning things into bytes.

type Encoder

Describes how to generate a sequence of bytes.

These encoders snap together with sequence so you can start with small building blocks and put them together into a more complex encoding.

encode : Encoder -> Bytes

Turn an Encoder into Bytes.

encode (unsignedInt8     7) -- <07>
encode (unsignedInt16 BE 7) -- <0007>
encode (unsignedInt16 LE 7) -- <0700>

The encode function is designed to minimize allocation. It figures out the exact width necessary to fit everything in Bytes and then generates that value directly. This is valuable when you are encoding more elaborate data:

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode

type alias Person =
  { age : Int
  , name : String
  }

toEncoder : Person -> Encode.Encoder
toEncoder person =
  Encode.sequence
    [ Encode.unsignedInt16 BE person.age
    , Encode.unsignedInt16 BE (Encode.getStringWidth person.name)
    , Encode.string person.name
    ]

-- encode (toEncoder ({ age = 33, name = "Tom" })) == <00210003546F6D>

Did you know it was going to be seven bytes? How about when you have a hundred people to serialize? And when some have Japanese and Norwegian names? Having this intermediate Encoder can help reduce allocation quite a lot!
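The "hundred people" case could be sketched as follows. The toPeopleEncoder name and the 32-bit count prefix are assumptions made for illustration, not part of this module; Person and toEncoder are repeated so the snippet stands alone.

```elm
module People exposing (..)

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode


type alias Person =
    { age : Int
    , name : String
    }


toEncoder : Person -> Encode.Encoder
toEncoder person =
    Encode.sequence
        [ Encode.unsignedInt16 BE person.age
        , Encode.unsignedInt16 BE (Encode.getStringWidth person.name)
        , Encode.string person.name
        ]


-- Hypothetical helper: write how many people follow, then each person.
-- The count format (unsigned 32-bit, big-endian) is an arbitrary choice here.
toPeopleEncoder : List Person -> Encode.Encoder
toPeopleEncoder people =
    Encode.sequence
        (Encode.unsignedInt32 BE (List.length people)
            :: List.map toEncoder people
        )


-- Encode.encode (toPeopleEncoder [ { age = 33, name = "Tom" } ])
--   == <00000001 0021 0003 546F6D>
```

Nothing is turned into Bytes until the final encode call, so the whole list still ends up in a single Bytes value, however many people it holds.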

sequence : List Encoder -> Encoder

Put together a bunch of encoders. So if you wanted to encode three Float values for the position of a ball in 3D space, you could say:

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode

type alias Ball = { x : Float, y : Float, z : Float }

ball : Ball -> Encode.Encoder
ball {x,y,z} =
  Encode.sequence
    [ Encode.float32 BE x
    , Encode.float32 BE y
    , Encode.float32 BE z
    ]
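Since sequence itself returns an Encoder, sequences can be nested inside other sequences. The sketch below is one way to check that; twoBalls, origin, and widthOfTwo are hypothetical names introduced only for this example.

```elm
module BallWidth exposing (..)

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode


type alias Ball =
    { x : Float, y : Float, z : Float }


ball : Ball -> Encode.Encoder
ball { x, y, z } =
    Encode.sequence
        [ Encode.float32 BE x
        , Encode.float32 BE y
        , Encode.float32 BE z
        ]


-- Hypothetical helper: a sequence whose elements are themselves sequences.
twoBalls : Ball -> Ball -> Encode.Encoder
twoBalls a b =
    Encode.sequence [ ball a, ball b ]


origin : Ball
origin =
    { x = 0, y = 0, z = 0 }


-- Two balls, three float32 values each, four bytes per value: 24 bytes.
widthOfTwo : Int
widthOfTwo =
    Bytes.width (Encode.encode (twoBalls origin origin))
```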

Integers

signedInt8 : Int -> Encoder

Encode integers from -128 to 127 in one byte.

signedInt16 : Endianness -> Int -> Encoder

Encode integers from -32768 to 32767 in two bytes.

signedInt32 : Endianness -> Int -> Encoder

Encode integers from -2147483648 to 2147483647 in four bytes.

unsignedInt8 : Int -> Encoder

Encode integers from 0 to 255 in one byte.

unsignedInt16 : Endianness -> Int -> Encoder

Encode integers from 0 to 65535 in two bytes.

unsignedInt32 : Endianness -> Int -> Encoder

Encode integers from 0 to 4294967295 in four bytes.
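A few concrete values may help when comparing the signed and unsigned encoders and the two byte orders. This is only an illustrative sketch; the definition names are made up for the example, and the signed results assume the usual two's complement layout.

```elm
module IntegerExamples exposing (..)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode


-- <FF>
minusOne : Bytes
minusOne =
    Encode.encode (Encode.signedInt8 (-1))


-- <FFFE>
minusTwoBE : Bytes
minusTwoBE =
    Encode.encode (Encode.signedInt16 BE (-2))


-- <FEFF> (same value, bytes in the opposite order)
minusTwoLE : Bytes
minusTwoLE =
    Encode.encode (Encode.signedInt16 LE (-2))


-- <00000100>
big256 : Bytes
big256 =
    Encode.encode (Encode.unsignedInt32 BE 256)


-- <00010000>
little256 : Bytes
little256 =
    Encode.encode (Encode.unsignedInt32 LE 256)
```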

Floats

float32 : Endianness -> Float -> Encoder

Encode 32-bit floating point numbers in four bytes.

float64 : Endianness -> Float -> Encoder

Encode 64-bit floating point numbers in eight bytes.
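For reference, here is a small sketch of what the float encoders produce for a couple of simple values, assuming the standard IEEE 754 layouts. The definition names are hypothetical.

```elm
module FloatExamples exposing (..)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode


-- <3F800000>
one32 : Bytes
one32 =
    Encode.encode (Encode.float32 BE 1.0)


-- <0000803F> (same value, little-endian)
one32LE : Bytes
one32LE =
    Encode.encode (Encode.float32 LE 1.0)


-- <C0000000>
minusTwo32 : Bytes
minusTwo32 =
    Encode.encode (Encode.float32 BE (-2.0))


-- <3FF0000000000000>
one64 : Bytes
one64 =
    Encode.encode (Encode.float64 BE 1.0)
```

Keep in mind that float32 holds only about seven significant decimal digits, so values that need more precision are better written with float64.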

Bytes

bytes : Bytes -> Encoder

Copy bytes directly into the new Bytes sequence. This does not record the width though! You usually want to say something like this:

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode

png : Bytes -> Encode.Encoder
png imageData =
  Encode.sequence
    [ Encode.unsignedInt32 BE (Bytes.width imageData)
    , Encode.bytes imageData
    ]

This allows you to represent the width however is necessary for your protocol. For example, you can use Base 128 Varints for ProtoBuf, Variable-Length Integers for SQLite, or whatever else they dream up.

Strings

string : String -> Encoder

Encode a String as a bunch of UTF-8 bytes.

encode (string "$20")   -- <24 32 30>
encode (string "£20")   -- <C2A3 32 30>
encode (string "€20")   -- <E282AC 32 30>
encode (string "bread") -- <62 72 65 61 64>
encode (string "brød")  -- <62 72 C3B8 64>

Some characters take one byte, while others can take up to four. Read more about UTF-8 to learn the details!

But if you just encode UTF-8 directly, how can you know when you get to the end of the string when you are decoding? So most protocols have an integer saying how many bytes follow, like this:

sizedString : String -> Encoder
sizedString str =
  sequence
    [ unsignedInt32 BE (getStringWidth str)
    , string str
    ]

You can choose whatever representation you want for the width, which is helpful because many protocols use different integer representations to save space. For example, ProtoBuf uses Base 128 Varints and SQLite uses Variable-Length Integers.

In both cases, small numbers can fit in just one byte, saving some space. (The SQLite encoding has the benefit that the first byte tells you how long the number is, making it faster to decode.) In both cases, it is sort of tricky to make negative numbers small.
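As a rough sketch of a variable-width length prefix, a Base 128 Varint in the ProtoBuf style might be built from unsignedInt8 like this. Both varint and varintString are hypothetical helpers, not part of Bytes.Encode, and the sketch assumes the input satisfies 0 <= n < 2^31.

```elm
module Varint exposing (..)

import Bytes.Encode as Encode


-- Hypothetical helper. Emits 7 bits per byte, lowest bits first, and sets
-- the high bit on every byte except the last one.
-- Assumes 0 <= n < 2^31; negative or larger inputs are not handled.
varint : Int -> Encode.Encoder
varint n =
    if n < 128 then
        Encode.unsignedInt8 n

    else
        Encode.sequence
            [ Encode.unsignedInt8 (128 + modBy 128 n)
            , varint (n // 128)
            ]


-- Hypothetical helper: a string prefixed with its width as a varint.
varintString : String -> Encode.Encoder
varintString str =
    Encode.sequence
        [ varint (Encode.getStringWidth str)
        , Encode.string str
        ]


-- Encode.encode (varint 7)              == <07>
-- Encode.encode (varint 300)            == <AC02>
-- Encode.encode (varintString "Tom")    == <03546F6D>
```

With this prefix, any string shorter than 128 bytes pays only one byte for its width, compared with four bytes for the unsignedInt32 version.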

getStringWidth : String -> Int

Get the width of a String in UTF-8 bytes.

getStringWidth "$20"   == 3
getStringWidth "£20"   == 4
getStringWidth "€20"   == 5
getStringWidth "bread" == 5
getStringWidth "brød"  == 5

Most protocols need this number to come directly before a chunk of UTF-8 bytes as a way to know where the string ends!
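One thing that may be worth checking in your own code: the width in bytes is generally not the same number as String.length. A minimal sketch, with made-up definition names:

```elm
module WidthVsLength exposing (..)

import Bytes.Encode as Encode


-- 4, because String.length does not count UTF-8 bytes
lengthOfBrod : Int
lengthOfBrod =
    String.length "brød"


-- 5, because "ø" takes two bytes in UTF-8
widthOfBrod : Int
widthOfBrod =
    Encode.getStringWidth "brød"


-- 3 versus 5, because "€" takes three bytes in UTF-8
lengthOfEuro : Int
lengthOfEuro =
    String.length "€20"


widthOfEuro : Int
widthOfEuro =
    Encode.getStringWidth "€20"
```

A length prefix written with String.length would therefore come up short for any string containing non-ASCII characters.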

Read more about how UTF-8 works to learn the details.