BatchStore

Overview

BatchStore is a batch-oriented container for variable-length Python items backed by a single Blosc2 SChunk.

Each batch is stored in one compressed chunk:

  • batches contain one or more Python items

  • each chunk may contain one or more internal variable-length blocks

  • the store itself is indexed by batch

  • item-wise traversal is available via BatchStore.iter_items()

BatchStore is a good fit when data arrives naturally in batches and you want:

  • efficient batch append/update operations

  • persistent .b2b stores

  • item-level reads inside a batch

  • compact summary information about batches and internal blocks via .info

Serializer support

BatchStore currently supports two serializers:

  • "msgpack": the default and general-purpose choice for Python items

  • "arrow": optional and requires pyarrow; mainly useful when data is already Arrow-shaped before ingestion

Quick example

import blosc2

store = blosc2.BatchStore(urlpath="example_batch_store.b2b", mode="w", contiguous=True)
store.append([{"red": 1, "green": 2, "blue": 3}, {"red": 4, "green": 5, "blue": 6}])
store.append([{"red": 7, "green": 8, "blue": 9}])

print(store[0])  # first batch
print(store[0][1])  # second item in first batch
print(list(store.iter_items()))

reopened = blosc2.open("example_batch_store.b2b", mode="r")
print(type(reopened).__name__)
print(reopened.info)

Note

BatchStore is batch-oriented by design. store[i] returns a batch, not a single item. Use BatchStore.iter_items() for flat item-wise traversal.

class blosc2.BatchStore(max_blocksize: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any)[source]

A batched container for variable-length Python items.

BatchStore stores data as a sequence of batches, where each batch contains one or more Python items. Each batch is stored in one compressed chunk, and each chunk is internally split into one or more variable-length blocks for efficient item access.

The main abstraction is batch-oriented:

  • indexing the store returns batches

  • iterating the store yields batches

  • iter_items() provides flat item-wise traversal

BatchStore is a good fit when:

  • data arrives naturally in batches

  • batch-level append/update operations are important

  • occasional item-level reads are needed inside a batch

Parameters:
  • max_blocksize (int, optional) – Maximum number of items stored in each internal variable-length block. If not provided, a value is inferred from the first batch.

  • serializer ({"msgpack", "arrow"}, optional) – Serializer used for batch payloads. "msgpack" is the default and is the general-purpose choice for Python items. "arrow" is optional and requires pyarrow.

  • _from_schunk (blosc2.SChunk, optional) – Internal hook used when reopening an already-tagged BatchStore.

  • **kwargs – Storage, compression, and decompression arguments accepted by the constructor, such as urlpath, mode, contiguous, cparams, and dparams.

Attributes:
cbytes
contiguous
cparams
cratio
dparams
info

Return an info reporter with a compact summary of the store.

info_items

Return summary information as (name, value) pairs.

items
max_blocksize
meta
nbytes
serializer

Serializer name used for batch payloads.

typesize
urlpath
vlmeta

Methods

append(value)

Append one batch and return the new number of batches.

clear()

Remove all entries from the container.

copy(**kwargs)

Create a copy of the store with optional constructor overrides.

delete(index)

Delete the batch at index and return the new number of batches.

extend(values)

Append all batches from an iterable of batches.

insert(index, value)

Insert one batch at index and return the new number of batches.

iter_items()

Iterate over all items across all batches in order.

pop([index])

Remove and return the batch at index as a Python list.

to_cframe()

Serialize the full store to a Blosc2 cframe buffer.

Constructors

__init__(max_blocksize: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any) None[source]

Create a new BatchStore or reopen an existing one.

When a persistent urlpath points to an existing BatchStore and the mode is "r" or "a", the container is reopened automatically. Otherwise a new empty store is created.

Batch Interface

__getitem__(index: int | slice) Batch | list[Batch][source]

Return one batch or a list of batches.

__setitem__(index: int | slice, value: object) None[source]
__delitem__(index: int | slice) None[source]
__len__() int[source]

Return the number of batches stored in the container.

__iter__() Iterator[Batch][source]
iter_items() Iterator[Any][source]

Iterate over all items across all batches in order.

Mutation

append(value: object) int[source]

Append one batch and return the new number of batches.

extend(values: object) None[source]

Append all batches from an iterable of batches.

insert(index: int, value: object) int[source]

Insert one batch at index and return the new number of batches.

pop(index: int = -1) list[Any][source]

Remove and return the batch at index as a Python list.

delete(index: int | slice) int[source]

Delete the batch at index and return the new number of batches.

clear() None[source]

Remove all entries from the container.

copy(**kwargs: Any) BatchStore[source]

Create a copy of the store with optional constructor overrides.

Context Manager

__enter__() BatchStore[source]
__exit__(exc_type, exc_val, exc_tb) bool[source]

Public Members

to_cframe() bytes[source]

Serialize the full store to a Blosc2 cframe buffer.

class blosc2.Batch(parent: BatchStore, nbatch: int, lazybatch: bytes)[source]

A lazy sequence representing one batch in a BatchStore.

Batch provides sequence-style access to the items stored in a single batch. Integer indexing can use block-local reads when possible, while slicing materializes the full batch into Python items.

Batch instances are normally obtained via BatchStore indexing or iteration rather than constructed directly.

Attributes:
cbytes
cratio
lazybatch
nbytes

Methods

count(value)

Return the number of occurrences of value in the batch.

index(value, [start, [stop]])

Return the first index of value in the batch.

Raises ValueError if the value is not present.