Blosc2 Contiguous Frame Format

Blosc (as of version 2.0.0) has a contiguous frame format (cframe for short) that stores multiple Blosc data chunks contiguously, either in memory or on disk.

The frame is composed of a header, a chunks section, and a trailer:

+---------+--------+---------+
|  header | chunks | trailer |
+---------+--------+---------+

Each of the three parts of the frame is variable length. The header and the trailer are both stored using the msgpack format.

Note: Integers encoded in the msgpack format are stored in big endian. All other values are stored in little endian.
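The difference between the two byte orders can be illustrated with a short Python sketch. The value 0x01020304 is arbitrary, and treating a non-msgpack value as a 64-bit integer is an assumption made only for this illustration:

```python
import struct

# msgpack uint32: marker byte 0xce followed by the value in big-endian order.
msgpack_uint32 = b"\xce" + struct.pack(">I", 0x01020304)

# A value stored outside msgpack (for example a 64-bit chunk offset)
# is laid out in little-endian order.
raw_int64 = struct.pack("<q", 0x01020304)

print(msgpack_uint32.hex())  # ce01020304
print(raw_int64.hex())       # 0403020100000000
```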

Chunks

The chunks section is composed of one or more Blosc data chunks followed by an index chunk:

+========+========+========+========+===========+
| chunk0 | chunk1 |   ...  | chunkN | chunk idx |
+========+========+========+========+===========+

Chunks are stored contiguously, one after the other, and each follows the format described in the chunk format document.

The chunk idx is a Blosc2 chunk containing the offset of each chunk in this section, measured from the beginning of the header. Its data is a list of offsets, one per chunk (they can be 32-bit, 64-bit or more, see above; currently only 64-bit offsets are implemented). The index chunk follows the regular Blosc2 chunk format and can be compressed (the default).

Note: The offsets can take special values to represent chunks made of a run of equal values. The encoding for the offsets is as follows:
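As a minimal sketch of how such an index could be read, the Python snippet below splits an already decompressed index payload into 64-bit little-endian offsets. The helper name `parse_offsets` is hypothetical, decompression of the index chunk is assumed to have happened elsewhere, and reading the offsets as unsigned integers is a choice made here for convenience, not something the format mandates:

```python
import struct
from typing import List

def parse_offsets(index_payload: bytes) -> List[int]:
    """Split a decompressed index payload into 64-bit offsets.

    Assumes 8-byte little-endian entries, read here as unsigned integers.
    """
    if len(index_payload) % 8 != 0:
        raise ValueError("payload length must be a multiple of 8 bytes")
    count = len(index_payload) // 8
    return list(struct.unpack(f"<{count}Q", index_payload))

# Example payload with three made-up offsets, each relative to the
# beginning of the frame header.
payload = struct.pack("<3Q", 32, 1056, 2080)
offsets = parse_offsets(payload)
print(offsets)  # [32, 1056, 2080]
```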

+========+========+========+========+
| byte 0 | byte 1 |   ...  | byte N |
+========+========+========+========+
                               ^
                               |
                               +--> Byte for special values

If the most significant bit (7) of the most significant byte above (byte N, as little endian is used) is set, the offset represents a chunk made of a run of special values.

More specifically the byte for special values has this format:

bits 0, 1 and 2:

Indicate special values for the entire chunk.

0:

Reserved.

1:

A run of zeros.

2:

A run of NaN (Not-a-Number) floats (whether f32 or f64 depends on typesize).

3:

Reserved.

4:

Values that are not initialized.

5:

Reserved.

6:

Reserved.

7:

Reserved.

bit 3 (0x08):

Reserved.

bit 4 (0x10):

Reserved.

bit 5 (0x20):

Reserved.

bit 6 (0x40):

Reserved.

bit 7 (0x80):

Indicates a special value. If not set, the offset is a regular value.
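The bit layout above can be sketched as a small Python classifier. It assumes 64-bit offsets already read as unsigned integers, so that byte N is bits 56 to 63; the names `SPECIAL_KINDS` and `classify_offset` are hypothetical and not part of the Blosc2 API:

```python
# Codes taken from bits 0-2 of the special-values byte; all others are reserved.
SPECIAL_KINDS = {1: "zeros", 2: "nan", 4: "uninitialized"}

def classify_offset(offset: int):
    """Classify an unsigned 64-bit offset as regular or special."""
    top_byte = (offset >> 56) & 0xFF   # byte N, the most significant byte
    if not top_byte & 0x80:            # bit 7 clear: a regular offset
        return ("regular", offset)
    code = top_byte & 0x07             # bits 0, 1 and 2
    return ("special", SPECIAL_KINDS.get(code, "reserved"))

print(classify_offset(1056))        # ('regular', 1056)
print(classify_offset(0x81 << 56))  # ('special', 'zeros')
print(classify_offset(0x82 << 56))  # ('special', 'nan')
```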

Trailer

The trailer for the frame is encoded via msgpack and contains a user metadata chunk and a fingerprint:

|-0-|-1-|================|---|---------------|---|---|---------------|
| 9X| aX| vlmetalayers   | ce| trailer_len   | d8|fpt| fingerprint   |
|---|---|================|---|---------------|---|---|---------------|
  ^   ^   ^    ^           ^       ^           ^   ^
  |   |   |    |           |       |           |   +-- fingerprint type
  |   |   |    |           |       |           +--[msgpack] fixext 16
  |   |   |    |           |       +-- trailer length
  |   |   |    |           +--[msgpack] uint32 for trailer length
  |   |   |    +--Variable-length metalayers (See header metalayers)
  |   |   +---[msgpack] bin32 for vlmetalayers
  |   +------[msgpack] int8 for trailer version
  +---[msgpack] fixarray with X=4 elements

The vlmetalayers object, which stores the variable-length user metadata, can change in size during the lifetime of the frame. This is an important feature and the reason why the vlmetalayers are stored in the trailer and not in the header. However, the vlmetalayers follow the same format as the metalayers stored in the header.

trailer_len:

(uint32) Size of the trailer of the frame (including vlmetalayers chunk).

fpt:

(int8) Fingerprint type: 0 -> no fp; 1 -> 32-bit; 2 -> 64-bit; 3 -> 128-bit

fingerprint:

(uint128) Fixed storage space for the fingerprint (16 bytes), padded to the left.
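Because the last fields of the trailer have a fixed size, they can be read from the tail of a frame buffer. The Python sketch below assumes the trailer sits at the very end of the buffer and relies on the msgpack markers shown in the diagram (`ce` for uint32, `d8` for fixext 16). The helper `parse_trailer_tail` is hypothetical, and the sample frame is made up for illustration:

```python
import struct

# ce marker + trailer_len (4) + d8 marker + fpt (1) + fingerprint (16)
TAIL_SIZE = 1 + 4 + 1 + 1 + 16

def parse_trailer_tail(frame: bytes):
    """Return (trailer_len, fpt, fingerprint) from the end of a frame buffer."""
    if len(frame) < TAIL_SIZE:
        raise ValueError("buffer too short to hold the trailer tail")
    tail = frame[-TAIL_SIZE:]
    if tail[0] != 0xCE:
        raise ValueError("expected msgpack uint32 marker (0xce)")
    if tail[5] != 0xD8:
        raise ValueError("expected msgpack fixext 16 marker (0xd8)")
    (trailer_len,) = struct.unpack(">I", tail[1:5])  # msgpack integers are big endian
    (fpt,) = struct.unpack("b", tail[6:7])           # int8 fingerprint type
    fingerprint = tail[7:23]                         # 16 bytes, left-padded
    return trailer_len, fpt, fingerprint

# Made-up frame: filler bytes followed by a trailer tail with no fingerprint (fpt = 0).
sample_tail = b"\xce" + struct.pack(">I", 40) + b"\xd8" + b"\x00" + bytes(16)
sample_frame = b"\x00" * 64 + sample_tail
trailer_len, fpt, fingerprint = parse_trailer_tail(sample_frame)
print(trailer_len, fpt, len(fingerprint))  # 40 0 16
```

Under the same assumption, the trailer would start at `len(frame) - trailer_len`, since trailer_len covers the whole trailer including the vlmetalayers.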