BatchStore
Overview
BatchStore is a batch-oriented container for variable-length Python items
backed by a single Blosc2 SChunk.
Each batch is stored in one compressed chunk:
- batches contain one or more Python items
- each chunk may contain one or more internal variable-length blocks
- the store itself is indexed by batch
- item-wise traversal is available via BatchStore.iter_items()
BatchStore is a good fit when data arrives naturally in batches and you want:
- efficient batch append/update operations
- persistent .b2b stores
- item-level reads inside a batch
- compact summary information about batches and internal blocks via .info
Serializer support
BatchStore currently supports two serializers:
- "msgpack": the default and general-purpose choice for Python items
- "arrow": optional and requires pyarrow; mainly useful when data is already Arrow-shaped before ingestion
Quick example
import blosc2
store = blosc2.BatchStore(urlpath="example_batch_store.b2b", mode="w", contiguous=True)
store.append([{"red": 1, "green": 2, "blue": 3}, {"red": 4, "green": 5, "blue": 6}])
store.append([{"red": 7, "green": 8, "blue": 9}])
print(store[0]) # first batch
print(store[0][1]) # second item in first batch
print(list(store.iter_items()))
reopened = blosc2.open("example_batch_store.b2b", mode="r")
print(type(reopened).__name__)
print(reopened.info)
Note
BatchStore is batch-oriented by design. store[i] returns a batch, not a
single item. Use BatchStore.iter_items() for flat item-wise traversal.
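The difference between batch indexing and flat item traversal can be mimicked with plain nested lists. The sketch below is a toy model only and does not use BatchStore: `batches`, `batch_ends`, and `locate_flat_item` are hypothetical names, and it assumes each batch behaves like an ordinary Python list.

```python
from bisect import bisect_right
from itertools import accumulate, chain

# Toy stand-in for a store: a list of batches, each a list of items.
batches = [
    [{"red": 1, "green": 2, "blue": 3}, {"red": 4, "green": 5, "blue": 6}],
    [{"red": 7, "green": 8, "blue": 9}],
]

first_batch = batches[0]      # one index yields a whole batch
second_item = batches[0][1]   # a second index reaches a single item
flat_items = list(chain.from_iterable(batches))  # flat item-wise view

# Running totals of batch lengths: batch k ends just before flat
# position batch_ends[k].
batch_ends = list(accumulate(len(batch) for batch in batches))


def locate_flat_item(flat_index):
    """Map a flat item position to (batch number, offset inside the batch)."""
    if not 0 <= flat_index < batch_ends[-1]:
        raise IndexError("flat item index out of range")
    batch_no = bisect_right(batch_ends, flat_index)
    batch_start = batch_ends[batch_no - 1] if batch_no else 0
    return batch_no, flat_index - batch_start


batch_no, offset = locate_flat_item(2)
print(batch_no, offset, batches[batch_no][offset] == flat_items[2])
```

In this model the third item of the flat view is the first item of the second batch, which is the same relationship that holds between `store[i][j]` and the order produced by flat traversal.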
- class blosc2.BatchStore(max_blocksize: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any)
A batched container for variable-length Python items.
BatchStore stores data as a sequence of batches, where each batch contains one or more Python items. Each batch is stored in one compressed chunk, and each chunk is internally split into one or more variable-length blocks for efficient item access.
The main abstraction is batch-oriented:
- indexing the store returns batches
- iterating the store yields batches
- iter_items() provides flat item-wise traversal
BatchStore is a good fit when:
- data arrives naturally in batches
- batch-level append/update operations are important
- occasional item-level reads are needed inside a batch
- Parameters:
max_blocksize (int, optional) – Maximum number of items stored in each internal variable-length block. If not provided, a value is inferred from the first batch.
serializer ({"msgpack", "arrow"}, optional) – Serializer used for batch payloads. "msgpack" is the default and is the general-purpose choice for Python items. "arrow" is optional and requires pyarrow.
_from_schunk (blosc2.SChunk, optional) – Internal hook used when reopening an already-tagged BatchStore.
**kwargs – Storage, compression, and decompression arguments accepted by the constructor.
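To make the role of max_blocksize concrete, here is a minimal sketch of how a batch could be cut into blocks holding at most that many items, and how an item position maps to a block. It is an illustration under assumptions, not the library's internal layout: `split_into_blocks` and `locate_in_batch` are hypothetical helpers, and blocks are modeled as plain list slices.

```python
def split_into_blocks(batch, max_blocksize):
    """Split one batch (a list of items) into consecutive blocks."""
    if max_blocksize < 1:
        raise ValueError("max_blocksize must be a positive integer")
    return [
        batch[start:start + max_blocksize]
        for start in range(0, len(batch), max_blocksize)
    ]


def locate_in_batch(item_index, max_blocksize):
    """Map an item position in a batch to (block number, offset in block)."""
    return divmod(item_index, max_blocksize)


batch = [{"id": n} for n in range(7)]
blocks = split_into_blocks(batch, max_blocksize=3)
block_sizes = [len(block) for block in blocks]

# Reading item 5 only needs the block that contains it.
block_no, offset = locate_in_batch(5, max_blocksize=3)
print(block_sizes, block_no, offset, blocks[block_no][offset])
```

Under this model a smaller max_blocksize means less data to decode per single-item read, at the cost of more blocks per batch.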
- Attributes:
- cbytes
- contiguous
- cparams
- cratio
- dparams
- info – Return an info reporter with a compact summary of the store.
- info_items – Return summary information as (name, value) pairs.
- items
- max_blocksize
- meta
- nbytes
- serializer – Serializer name used for batch payloads.
- typesize
- urlpath
- vlmeta
Methods
append(value) – Append one batch and return the new number of batches.
clear() – Remove all entries from the container.
copy(**kwargs) – Create a copy of the store with optional constructor overrides.
delete(index) – Delete the batch at index and return the new number of batches.
extend(values) – Append all batches from an iterable of batches.
insert(index, value) – Insert one batch at index and return the new number of batches.
iter_items() – Iterate over all items across all batches in order.
pop([index]) – Remove and return the batch at index as a Python list.
Serialize the full store to a Blosc2 cframe buffer.
Constructors
- __init__(max_blocksize: int | None = None, serializer: str = 'msgpack', _from_schunk: SChunk | None = None, **kwargs: Any) -> None
Create a new BatchStore or reopen an existing one.
When a persistent urlpath points to an existing BatchStore and the mode is "r" or "a", the container is reopened automatically. Otherwise a new empty store is created.
Batch Interface
Mutation
- insert(index: int, value: object) -> int
Insert one batch at index and return the new number of batches.
- delete(index: int | slice) -> int
Delete the batch at index and return the new number of batches.
- copy(**kwargs: Any) -> BatchStore
Create a copy of the store with optional constructor overrides.
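The return conventions of the mutation methods (each returns the new number of batches) can be mirrored by a list-backed stand-in. This is a toy model for illustration, not BatchStore: `ToyBatchStore` is a hypothetical class that keeps batches in memory as Python lists and performs no serialization or compression.

```python
class ToyBatchStore:
    """List-backed stand-in that mirrors the documented return values."""

    def __init__(self):
        self._batches = []

    def append(self, value):
        """Add one batch at the end and return the new number of batches."""
        self._batches.append(list(value))
        return len(self._batches)

    def insert(self, index, value):
        """Add one batch at `index` and return the new number of batches."""
        self._batches.insert(index, list(value))
        return len(self._batches)

    def delete(self, index):
        """Remove the batch(es) at an int or slice and return the new count."""
        del self._batches[index]
        return len(self._batches)

    def __getitem__(self, index):
        return self._batches[index]


store = ToyBatchStore()
n_after_append = store.append(["a", "b"])       # one batch stored
n_after_insert = store.insert(0, ["c"])         # new batch goes first
n_after_delete = store.delete(slice(0, 1))      # drops the inserted batch
print(n_after_append, n_after_insert, n_after_delete, store[0])
```

Returning the batch count from each mutation lets callers track the store size without a separate length query.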
Context Manager
- __enter__() -> BatchStore
Public Members
- class blosc2.Batch(parent: BatchStore, nbatch: int, lazybatch: bytes)
A lazy sequence representing one batch in a BatchStore.
Batch provides sequence-style access to the items stored in a single batch. Integer indexing can use block-local reads when possible, while slicing materializes the full batch into Python items.
Batch instances are normally obtained via BatchStore indexing or iteration rather than constructed directly.
- Attributes:
- cbytes
- cratio
- lazybatch
- nbytes
Methods
count(value)
index(value, [start, [stop]]) – Raises ValueError if the value is not present.
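The access pattern described for Batch (integer indexing touches one block, slicing materializes everything) can be sketched with a small lazy sequence. This is a toy model under stated assumptions, not the library's implementation: `ToyLazyBatch` is hypothetical, json and zlib stand in for the real serializer and codec, and `blocks_decoded` exists only to make the cost visible. The sketch inherits count and index from collections.abc.Sequence; whether Batch obtains them the same way is an assumption.

```python
import json
import zlib
from collections.abc import Sequence


class ToyLazyBatch(Sequence):
    """Toy lazy batch: items are kept in separately compressed blocks."""

    def __init__(self, items, max_blocksize):
        self._len = len(items)
        self._max_blocksize = max_blocksize
        # json + zlib are stand-ins for the real serializer and codec.
        self._blocks = [
            zlib.compress(json.dumps(items[i:i + max_blocksize]).encode())
            for i in range(0, len(items), max_blocksize)
        ]
        self.blocks_decoded = 0  # instrumentation for this demo only

    def _decode(self, block_no):
        self.blocks_decoded += 1
        return json.loads(zlib.decompress(self._blocks[block_no]))

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing decodes every block, then slices the full list.
            full = [
                item
                for block_no in range(len(self._blocks))
                for item in self._decode(block_no)
            ]
            return full[key]
        if key < 0:
            key += self._len
        if not 0 <= key < self._len:
            raise IndexError("batch index out of range")
        # Integer access decodes only the block that holds the item.
        block_no, offset = divmod(key, self._max_blocksize)
        return self._decode(block_no)[offset]


batch = ToyLazyBatch([{"n": i} for i in range(10)], max_blocksize=4)
item = batch[5]
decoded_for_item = batch.blocks_decoded
part = batch[2:4]
decoded_for_slice = batch.blocks_decoded - decoded_for_item
print(item, decoded_for_item, part, decoded_for_slice)
```

In this model a single-item read decodes one block while a slice decodes all three, which is why item-level reads are cheap only when done by integer index.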