Bytes.Encode
Functions for turning things into bytes.
An Encoder describes how to generate a sequence of bytes. These encoders snap together with sequence, so you can start with small building blocks and put them together into a more complex encoding.
Turn an Encoder into Bytes.
encode (unsignedInt8 7) -- <07>
encode (unsignedInt16 BE 7) -- <0007>
encode (unsignedInt16 LE 7) -- <0700>
The encode function is designed to minimize allocation. It figures out the exact width necessary to fit everything in Bytes and then generates that value directly. This is valuable when you are encoding more elaborate data:
import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode
type alias Person =
    { age : Int
    , name : String
    }

toEncoder : Person -> Encode.Encoder
toEncoder person =
    Encode.sequence
        [ Encode.unsignedInt16 BE person.age
        , Encode.unsignedInt16 BE (Encode.getStringWidth person.name)
        , Encode.string person.name
        ]

-- Encode.encode (toEncoder { age = 33, name = "Tom" }) == <00210003546F6D>
Did you know it was going to be seven bytes? How about when you have a hundred people to serialize? And when some have Japanese and Norwegian names? Having this intermediate Encoder can help reduce allocation quite a lot!
Put together a bunch of encoders. So if you wanted to encode three Float values for the position of a ball in 3D space, you could say:
import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode
type alias Ball =
    { x : Float, y : Float, z : Float }

ball : Ball -> Encode.Encoder
ball { x, y, z } =
    Encode.sequence
        [ Encode.float32 BE x
        , Encode.float32 BE y
        , Encode.float32 BE z
        ]
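Because sequence takes an ordinary List of encoders, the same pattern extends to collections. The following is a minimal sketch, not part of the library: the balls helper and the choice of a 32-bit big-endian count prefix are assumptions made for illustration.

```elm
module BallList exposing (balls)

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode


type alias Ball =
    { x : Float, y : Float, z : Float }


ball : Ball -> Encode.Encoder
ball { x, y, z } =
    Encode.sequence
        [ Encode.float32 BE x
        , Encode.float32 BE y
        , Encode.float32 BE z
        ]


-- Hypothetical helper: a four-byte count followed by every ball in order.
-- The count prefix is an assumed convention, not something the library requires.
balls : List Ball -> Encode.Encoder
balls list =
    Encode.sequence
        (Encode.unsignedInt32 BE (List.length list)
            :: List.map ball list
        )
```

Each ball takes twelve bytes (three float32 values), so encoding a list of two balls this way should produce 4 + 2 * 12 = 28 bytes.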
Integers
signedInt8 encodes integers from -128 to 127 in one byte.
signedInt16 encodes integers from -32768 to 32767 in two bytes.
signedInt32 encodes integers from -2147483648 to 2147483647 in four bytes.
unsignedInt8 encodes integers from 0 to 255 in one byte.
unsignedInt16 encodes integers from 0 to 65535 in two bytes.
unsignedInt32 encodes integers from 0 to 4294967295 in four bytes.
Floats
float32 encodes 32-bit floating point numbers in four bytes.
float64 encodes 64-bit floating point numbers in eight bytes.
Bytes
Copy bytes directly into the new Bytes sequence. This does not record the width though! You usually want to say something like this:
import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode
png : Bytes -> Encode.Encoder
png imageData =
    Encode.sequence
        [ Encode.unsignedInt32 BE (Bytes.width imageData)
        , Encode.bytes imageData
        ]
This allows you to represent the width in whatever way your protocol requires. For example, you can use Base 128 Varints for ProtoBuf, Variable-Length Integers for SQLite, or whatever else a protocol calls for.
Strings
Encode a String as a bunch of UTF-8 bytes.
encode (string "$20") -- <24 32 30>
encode (string "£20") -- <C2A3 32 30>
encode (string "€20") -- <E282AC 32 30>
encode (string "bread") -- <62 72 65 61 64>
encode (string "brød") -- <62 72 C3B8 64>
Some characters take one byte, while others can take up to four. Read more about UTF-8 to learn the details!
But if you just encode UTF-8 directly, how can you know when you get to the end of the string when you are decoding? So most protocols have an integer saying how many bytes follow, like this:
sizedString : String -> Encoder
sizedString str =
    sequence
        [ unsignedInt32 BE (getStringWidth str)
        , string str
        ]
You can choose whatever representation you want for the width, which is helpful because many protocols use different integer representations to save space. For example:
- ProtoBuf uses Base 128 Varints
- SQLite uses Variable-Length Integers
In both cases, small numbers can fit in just one byte, saving some space. (The SQLite encoding has the benefit that the first byte tells you how long the number is, making it faster to decode.) In both cases, it is sort of tricky to make negative numbers small.
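To make the Base 128 Varint idea concrete, here is a minimal sketch of such an encoder built from unsignedInt8 and sequence. The varint and varintHelp functions are hypothetical helpers, not part of Bytes.Encode, and the sketch assumes the input is a non-negative Int that fits in 32 bits; negative numbers are deliberately out of scope.

```elm
module Varint exposing (sizedString, varint)

import Bitwise
import Bytes.Encode as Encode


-- Hypothetical helper, not part of Bytes.Encode.
-- Assumes n is non-negative and fits in 32 bits.
varint : Int -> Encode.Encoder
varint n =
    Encode.sequence (varintHelp n [])


varintHelp : Int -> List Encode.Encoder -> List Encode.Encoder
varintHelp n acc =
    if n < 128 then
        -- Final group: the high bit stays clear to mark the end.
        List.reverse (Encode.unsignedInt8 n :: acc)

    else
        -- Emit the low seven bits with the continuation bit (0x80) set,
        -- then continue with the remaining higher bits.
        varintHelp
            (Bitwise.shiftRightZfBy 7 n)
            (Encode.unsignedInt8 (Bitwise.or 0x80 (Bitwise.and 0x7F n)) :: acc)


-- The width prefix now takes one byte for strings shorter than 128 bytes.
sizedString : String -> Encode.Encoder
sizedString str =
    Encode.sequence
        [ varint (Encode.getStringWidth str)
        , Encode.string str
        ]
```

Under these assumptions, varint 1 encodes to <01> and varint 300 encodes to <AC02>, so short strings pay one byte for their width instead of the four that unsignedInt32 would use.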
Get the width of a String in UTF-8 bytes.
getStringWidth "$20" == 3
getStringWidth "£20" == 4
getStringWidth "€20" == 5
getStringWidth "bread" == 5
getStringWidth "brød" == 5
Most protocols need this number to come directly before a chunk of UTF-8 bytes as a way to know where the string ends!
Read more about UTF-8 to learn how the encoding works.
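To show why the width has to come first, the sketch below pairs a width-prefixed encoder with a matching decoder from the companion Bytes.Decode module in the same package. The module name and the encoder, decoder, and roundTrip names are chosen for this sketch only, and the 32-bit big-endian prefix is one possible choice among many.

```elm
module SizedString exposing (decoder, encoder, roundTrip)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode
import Bytes.Encode as Encode


-- Width in UTF-8 bytes first, then the bytes themselves.
encoder : String -> Encode.Encoder
encoder str =
    Encode.sequence
        [ Encode.unsignedInt32 BE (Encode.getStringWidth str)
        , Encode.string str
        ]


-- Read the width, then read exactly that many bytes as a String.
decoder : Decode.Decoder String
decoder =
    Decode.unsignedInt32 BE
        |> Decode.andThen Decode.string


-- Encode and immediately decode again, for checking the two agree.
roundTrip : String -> Maybe String
roundTrip str =
    Decode.decode decoder (Encode.encode (encoder str))
```

With this pairing, roundTrip "brød" should give Just "brød". The prefix holds 5 rather than 4 because getStringWidth counts UTF-8 bytes, not characters.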