BatchStore¶
Overview¶
BatchStore is a batch-oriented container for variable-length Python items
backed by a single Blosc2 SChunk.
Each batch is stored in one compressed chunk:
- batches contain one or more Python items
- each chunk may contain one or more internal variable-length blocks
- the store itself is indexed by batch
- item-wise traversal is available via BatchStore.iter_items()
BatchStore is a good fit when data arrives naturally in batches and you want:
- efficient batch append/update operations
- persistent .b2b stores
- item-level reads inside a batch
- compact summary information about batches and internal blocks via .info
Serializer support¶
BatchStore currently supports two serializers:
"msgpack": the default and general-purpose choice for Python items"arrow": optional and requirespyarrow; mainly useful when data is already Arrow-shaped before ingestion
For "msgpack", python-blosc2 supports both ordinary msgpack-safe Python
values and selected Blosc2 objects.
Blosc2 objects serialized by value via to_cframe() /
blosc2.from_cframe():
- NDArray
- SChunk
- VLArray
- BatchStore
- EmbedStore
Structured reference-style Blosc2 objects currently supported:
- C2Array
- LazyExpr
- LazyUDF backed by @blosc2.dsl_kernel
LazyExpr values and supported LazyUDF values preserve reference
semantics and are serialized as a recipe plus durable operand references.
Supported operands are:
- persistent local Blosc2 operands reopenable from urlpath
- remote C2Array operands
- DictStore members reopenable from (.b2d|.b2z, key)
Purely in-memory operands are intentionally not supported. Plain Python
LazyUDF callables are not serialized by msgpack.
Quick example¶
import blosc2
store = blosc2.BatchStore(urlpath="example_batch_store.b2b", mode="w", contiguous=True)
store.append([{"red": 1, "green": 2, "blue": 3}, {"red": 4, "green": 5, "blue": 6}])
store.append([{"red": 7, "green": 8, "blue": 9}])
print(store[0]) # first batch
print(store[0][1]) # second item in first batch
print(list(store.iter_items()))
reopened = blosc2.open("example_batch_store.b2b", mode="r")
print(type(reopened).__name__)
print(reopened.info)
Note
BatchStore is batch-oriented by design. store[i] returns a batch, not a
single item. Use BatchStore.iter_items() for flat item-wise traversal.
- class blosc2.BatchStore(items_per_block: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any)[source]¶
A batched container for variable-length Python items.
BatchStore stores data as a sequence of batches, where each batch contains one or more Python items. Each batch is stored in one compressed chunk, and each chunk is internally split into one or more variable-length blocks for efficient item access.
The main abstraction is batch-oriented:
- indexing the store returns batches
- iterating the store yields batches
- iter_items() provides flat item-wise traversal
BatchStore is a good fit when:
- data arrives naturally in batches
- batch-level append/update operations are important
- occasional item-level reads are needed inside a batch
- Parameters:
items_per_block¶ (int, optional) – Maximum number of items stored in each internal variable-length block. The last block in a batch may contain fewer items than this cap. If not provided, a value is inferred from the first batch.
serializer¶ ({"msgpack", "arrow"}, optional) – Serializer used for batch payloads.
"msgpack"is the default and is the general-purpose choice for Python items, including nested Blosc2 containers such asblosc2.NDArray,blosc2.SChunk,blosc2.VLArray,blosc2.BatchStore, andblosc2.EmbedStore, which are serialized transparently viato_cframe()/blosc2.from_cframe(). Msgpack also supports structured Blosc2 reference objects, currentlyblosc2.C2Array,blosc2.LazyExpr, andblosc2.LazyUDFbacked byblosc2.dsl_kernel(). These lazy objects preserve reference semantics, so only persistent local operands,blosc2.C2Arrayoperands, andblosc2.DictStoremembers are supported; purely in-memory operands are rejected. Plain Pythonblosc2.LazyUDFcallables are not serialized by msgpack."arrow"is optional and requirespyarrow._from_schunk¶ (blosc2.SChunk, optional) – Internal hook used when reopening an already-tagged BatchStore.
**kwargs¶ – Storage, compression, and decompression arguments accepted by the constructor.
- Attributes:
- cbytes
- contiguous
- cparams
- cratio
- dparams
- info – Return an info reporter with a compact summary of the store.
- info_items – Return summary information as (name, value) pairs.
- items
- items_per_block – Maximum number of items per internal block.
- meta
- nbytes
- serializer – Serializer name used for batch payloads.
- typesize
- urlpath
- vlmeta
Methods
- append(value) – Append one batch and return the new number of batches.
- clear() – Remove all entries from the container.
- copy(**kwargs) – Create a copy of the store with optional constructor overrides.
- delete(index) – Delete the batch at index and return the new number of batches.
- extend(values) – Append all batches from an iterable of batches.
- insert(index, value) – Insert one batch at index and return the new number of batches.
- iter_items() – Iterate over all items across all batches in order.
- pop([index]) – Remove and return the batch at index as a Python list.
- to_cframe() – Serialize the full store to a Blosc2 cframe buffer.
Constructors¶
- __init__(items_per_block: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any) None[source]¶
Create a new BatchStore or reopen an existing one.
When a persistent urlpath points to an existing BatchStore and the mode is "r" or "a", the container is reopened automatically. Otherwise a new empty store is created.
Batch Interface¶
Mutation¶
- insert(index: int, value: object) int[source]¶
Insert one batch at index and return the new number of batches.
- delete(index: int | slice) int[source]¶
Delete the batch at index and return the new number of batches.
- copy(**kwargs: Any) BatchStore[source]¶
Create a copy of the store with optional constructor overrides.
Context Manager¶
- __enter__() BatchStore[source]¶
Public Members¶
- class blosc2.Batch(parent: BatchStore, nbatch: int, lazybatch: bytes)[source]¶
A lazy sequence representing one batch in a BatchStore.
Batch provides sequence-style access to the items stored in a single batch. Integer indexing can use block-local reads when possible, while slicing materializes the full batch into Python items.
Batch instances are normally obtained via BatchStore indexing or iteration rather than constructed directly.
- Attributes:
- cbytes
- cratio
- lazybatch
- nbytes
Methods
- count(value)
- index(value[, start[, stop]]) – Raises ValueError if the value is not present.