Simple binary serialization¶

Simple binary serialization (SBS) features:

schema based
small schema language optimized for human readability
schema re-usability and organization into modules
small number of built in types
polymorphic data types
binary serialization
designed to enable simple and efficient implementation
optimized for small encoded data size

Example of SBS schema:

module Module

Entry(K, V) = Record {
    key: K
    value: V
}

Collection(K) = Choice {
    null: None
    bool: Entry(K, Boolean)
    int: Entry(K, Integer)
    float: Entry(K, Float)
    str: Entry(K, String)
    bytes: Entry(K, Bytes)
}

IntKeyCollection = Collection(Integer)

StrKeyCollection = Collection(String)

Schema definition¶

SBS shemas are written as UTF-8 encoded files with .sbs file extension. Characters ,, space, \t, \r and \n are considered white-space characters and are ignored. Characters (, ), {, }, :, = and white-space characters are used as delimiters between other identifiers. All other valid identifiers are defined by regex [A-Za-z][A-Za-z0-9_]*. Character # is used as start of comment that spans to the end of line.

Each file represents single SBS schema module. Name of SBS module is defined by module <name> directive where <name> represents user-defined module name. This directive is only mandatory part of each SBS schema and should be placed at the beginning of each .sbs file. Example of minimal valid SBS schema:

module ModuleName

Rest of .sbs files contains arbitrary number of user-defined types. Each type definition is written as <new_type>(<t1> <t2> ...) = <other_type> where:

<new_type>

Name of new user-defined type.
<t1>, <t2>, …

Identifiers representing parametric data type arguments used in <other_type> definition. If these arguments are not used, parenthesis can be omitted.
<other_type>

Other user defined or built in type which encoding should be used for encoding of <new_type>. User defined types are specified as <module_name>.<type_name>(<t1> <t2> ...). If <type_name> refers to type defined in same module, <module_name>. can be omitted. If user defined type is not parametric data type, parenthesis should be omitted.

Data types¶

Builtin data types include:

simple data types
- None
  
  Data type without value.
- Boolean
  
  Data type with two possible values representing true and false.
- Integer
  
  Unconstrained signed integer value.
- Float
  
  Floating point value that can be encoded with 8 bytes according to IEEE 754.
- String
  
  UTF-8 encoded string value.
- Bytes
  
  Array of byte values of arbitrary length.
composite data types
- Array(<t>)
  
  Parametric data type that defines arbitrary length Array where all elements are of type defined by <t>.
- Record { <entry1>: <t1>, <entry2>: <t2>, ... }
  
  Collection of user-defined entries where each entry has entry identifier (<entry1>, <entry2>, …) and entry type (<t1>, <t2>, …). Encoded data must contain all entries specified by type definition. Number of entries should be greather than zero.
- Choice { <entry1>: <t1>, <entry2>: <t2>, ... }
  
  Type that can represent one of types defined by <t1>, <t2>, … Encoded data must contain only single entry identified by entry identifier (<entry1>, <entry2>, …). Choice definition should contain at least one entry definition.
derived data types

These include predefined types that can be expressed as:
```
Optional(a) = Choice {
    none: None
    value: a
}
```

PEG grammar¶

Module          <- OWS 'module' MWS Identifier TypeDefinitions OWS EOF
TypeDefinitions <- (MWS TypeDefinition (MWS TypeDefinition)*)?
TypeDefinition  <- Identifier OWS ArgNames? OWS '=' OWS Type

Type            <- TSimple
                 / TArray
                 / TRecord
                 / TChoice
                 / TIdentifier
TSimple         <- 'None'
                 / 'Boolean'
                 / 'Integer'
                 / 'Float'
                 / 'String'
                 / 'Bytes'
TArray          <- 'Array' OWS '(' OWS Type OWS ')'
TRecord         <- 'Record' OWS '{' OWS Entries OWS '}'
TChoice         <- 'Choice' OWS '{' OWS Entries OWS '}'
TIdentifier     <- Identifier ('.' Identifier)? (OWS ArgTypes)?

Entries         <- Entry (MWS Entry)*
Entry           <- Identifier OWS ':' OWS Type

ArgNames        <- '(' OWS Identifiers? OWS ')'
ArgTypes        <- '(' OWS Types? OWS ')'
Identifiers     <- Identifier (MWS Identifier)*
Types           <- Type (MWS Type)*

Identifier      <- [A-Za-z][A-Za-z0-9_]*

# mandatory white-space
MWS             <- (WS / Comment)+
# optional white-space
OWS             <- (WS / Comment)*

Comment         <- '#' (!EOL .)* EOL
WS              <- ',' / ' ' / '\t' / EOL
EOL             <- '\r\n' / '\n' / '\r'
EOF             <- !.

Data encoding¶

None¶

None value is represented with empty byte array.

Boolean¶

Boolean value is encoded as single byte with value 0x01 as true and 0x00 as false.

Integer¶

Signed integer values are encoded as variable length byte array. Most significant bit in all bytes, except last one, is set to 0 (last bytes most significant bit is 1). Concatenation of other bits represent big-endian encoded two’s complement binary representation of integer value.

+-----------------+-------+-----------------+
|        0        |       |        m        |
| 7 6 5 4 3 2 1 0 |       | 7 6 5 4 3 2 1 0 |
+-----------------+  ...  +-----------------+
| 0 xn ... x(n-7) |       | 1   x6 ... x0   |
+-----------------+-------+-----------------+

Float¶

Floating point values are encoded according to IEEE 754 binary64 (double precision) format.

Bytes¶

Bytes array is encoded “as is” and prefixed with bytes count encoded as Integer.

String¶

String value is encoded as UTF-8 encoded Bytes.

Array¶

Array is encoded as sequential concatenation of each element encoding. This concatenated bytes are prefixed with array’s element count encoded as Integer.

Record¶

Record is encoded as sequential concatenation of record’s elements encoding according to elements order defined by schema.

Choice¶

Choice encodes single element prefixed with encoded element’s zero-based index as Integer.