Save and load

`save`(array, urlpath[, contiguous])	Save an array to a file.
`open`(urlpath[, mode, offset])	Open a persistent SChunk, NDArray, a remote C2Array, a Proxy, a DictStore, EmbedStore, or TreeStore.
`load`(urlpath[, offset])	Load a persistent Blosc2 object into memory.
`save_array`(arr, urlpath[, chunksize])	Save a serialized NumPy array to a specified file path.
`load_array`(urlpath[, dparams])	Load a serialized NumPy array from a file.
`save_tensor`(tensor, urlpath[, chunksize])	Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.
`load_tensor`(urlpath[, dparams])	Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.
`from_cframe`(cframe[, copy])	Create an EmbedStore, NDArray, SChunk, BatchArray or ObjectArray instance from a contiguous frame buffer.

blosc2.save(array: NDArray, urlpath: str, contiguous=True, **kwargs: Any) → None

Save an array to a file.

Parameters:

array (NDArray) – The array to be saved.
urlpath (str) – The path to the file where the array will be saved.
contiguous (bool, optional) – Whether to store the array contiguously. Defaults to True.
kwargs (dict, optional) – Keyword arguments that are supported by the save() method.

Examples

>>> import blosc2
>>> import numpy as np
>>> # Create an array
>>> array = blosc2.arange(0, 100, dtype=np.int64, shape=(10, 10))
>>> # Save the array to a file
>>> blosc2.save(array, "array.b2", mode="w")

blosc2.open(urlpath, mode='r', offset=0, **kwargs)

Open a persistent SChunk, NDArray, remote C2Array, Proxy, DictStore, EmbedStore, or TreeStore.

See the Notes section for more info on opening Proxy objects.

Parameters:

urlpath (str | pathlib.Path | URLPath) – The path where the object is stored. For a remote array, a URLPath instance must be passed.
mode (str, optional) – Persistence mode: ‘r’ means read-only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists). Defaults to ‘r’ (read-only).
offset (int, optional) – An offset in the file where the super-chunk or array data is located (e.g. in a file containing several such objects).
kwargs (dict, optional) –

mmap_mode: str, optional
If set, the file will be memory-mapped instead of being accessed through the default I/O functions, and the mode argument will be ignored. For more info, see blosc2.Storage. Note that the w+ mode, which creates new files, is not supported here because only existing files can be opened; use SChunk.__init__ to create new files.

initial_mapping_size: int, optional
The initial size of the memory mapping. For more info, see blosc2.Storage.

cparams: dict
A dictionary with the compression parameters, the same as those accepted by the compress2() function. Typesize and blocksize cannot be changed.

dparams: dict
A dictionary with the decompression parameters, the same as those accepted by the decompress2() function.

Returns:

out – The object found in the path.

Return type:

SChunk, NDArray, C2Array, DictStore, EmbedStore, or TreeStore

Notes

This is only a ‘logical’ open, so there is no close() counterpart; currently there is no need for one.
If urlpath is a URLPath class instance, mode must be ‘r’, offset must be 0, and kwargs cannot be passed.
If the original object saved in urlpath is a Proxy, this function will only return a Proxy if its source is a local SChunk, NDArray, or remote C2Array. Otherwise, it returns the Python-Blosc2 container used to cache the data, which can be an SChunk or an NDArray and may not have all of its data initialized (e.g. if the user has not accessed it yet).
When opening a LazyExpr, keep in mind the note above regarding operands.

Examples

>>> import blosc2
>>> import numpy as np
>>> import os
>>> import tempfile
>>> tmpdirname = tempfile.mkdtemp()
>>> urlpath = os.path.join(tmpdirname, 'b2frame')
>>> storage = blosc2.Storage(contiguous=True, urlpath=urlpath, mode="w")
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), storage=storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=urlpath, mode="r")
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True

To open the same schunk memory-mapped, we simply need to pass the mmap_mode parameter:

>>> sc_open_mmap = blosc2.open(urlpath=urlpath, mode="r", mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> all(sc_open.decompress_chunk(i) == sc_open_mmap.decompress_chunk(i) for i in range(nchunks))
True

blosc2.load(urlpath: str | Path, offset: int = 0, **kwargs: dict)

Load a persistent Blosc2 object into memory.

This is the in-memory counterpart to open(). It opens urlpath in read-only mode and returns a standalone object that is not backed by the original file. For CTable, this dispatches to CTable.load(); for array-like containers it returns an in-memory copy.

Parameters:

urlpath (str | pathlib.Path) – Path to the persistent Blosc2 object.
offset (int, optional) – Offset in the file where the object is located. This is mainly useful for SChunk/NDArray objects embedded in a larger file.
kwargs (dict, optional) – Additional read-time keyword arguments passed to open(), such as dparams.

Returns:

out – A standalone in-memory Blosc2 object.

Raises:

TypeError – If the opened object cannot be loaded as a standalone in-memory object.

Examples

>>> import blosc2
>>> import numpy as np
>>> arr = blosc2.asarray(np.arange(10), urlpath="example.b2nd", mode="w")
>>> loaded = blosc2.load("example.b2nd")
>>> loaded.urlpath is None
True
>>> np.array_equal(loaded[:], arr[:])
True
>>> blosc2.remove_urlpath("example.b2nd")

blosc2.save_array(arr: ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) → int

Save a serialized NumPy array to a specified file path.

Parameters:

arr (np.ndarray) – The NumPy array to be saved.
urlpath (str) – The path for the file where the array will be saved.
chunksize (int, optional) – The size (in bytes) of the chunks used during compression. If not provided, it is computed automatically.
kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved array.

Return type:

int

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True

blosc2.load_array(urlpath: str, dparams: dict | None = None) → ndarray

Load a serialized NumPy array from a file.

Parameters:

urlpath (str) – The path to the file containing the serialized array.
dparams (dict, optional) – A dictionary with the decompression parameters, the same as those accepted by the decompress2() function.

Returns:

out – The deserialized NumPy array.

Return type:

np.ndarray

Raises:

TypeError – If urlpath is not in cframe format.
RuntimeError – If any other error is detected.

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
>>> a2 = blosc2.load_array("test.bl2")
>>> np.array_equal(a, a2)
True

blosc2.save_tensor(tensor: tensorflow.Tensor | torch.Tensor | np.ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) → int

Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.

Parameters:

tensor (tensorflow.Tensor, torch.Tensor, or np.ndarray) – The tensor or array to be saved.
urlpath (str) – The file path where the tensor or array will be saved.
chunksize (int, optional) – The size (in bytes) of the chunks used during compression. If not provided, it is computed automatically.
kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved tensor or array.

Return type:

int

Examples

>>> import os
>>> import blosc2
>>> import numpy as np
>>> th = np.arange(1e6, dtype=np.float32)
>>> serial_size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert serial_size < th.size * th.itemsize
...

blosc2.load_tensor(urlpath: str, dparams: dict | None = None) → tensorflow.Tensor | torch.Tensor | np.ndarray

Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.

Parameters:

urlpath (str) – The path to the file where the tensor or array is stored.
dparams (dict, optional) – A dictionary with the decompression parameters, which are the same as those used in the decompress2() function.

Returns:

out – The unpacked PyTorch or TensorFlow tensor or NumPy array.

Return type:

tensor or ndarray

Raises:

TypeError – If urlpath is not in cframe format.
RuntimeError – If some other problem is detected.

Examples

>>> import os
>>> import blosc2
>>> import numpy as np
>>> th = np.arange(1e6, dtype=np.float32)
>>> size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert size < th.size * th.itemsize
...
>>> th2 = blosc2.load_tensor("test.bl2")
>>> np.array_equal(th, th2)
True

See also

save_tensor(), pack_tensor()

blosc2.from_cframe(cframe, copy=True)

Create an EmbedStore, NDArray, SChunk, BatchArray or ObjectArray instance from a contiguous frame buffer.

Parameters:

cframe (bytes or str) – The bytes object containing the in-memory cframe.
copy (bool) – Whether to make an internal copy of cframe. Default is True, which is safer. Setting it to False avoids the copy and saves time and memory, but you are then responsible for keeping a reference to cframe so that it is not garbage collected while the returned object is still in use.

Returns:

out – A new instance of the appropriate type containing the data passed.

Return type:

EmbedStore, NDArray, SChunk, BatchArray, or ObjectArray

See also

from_cframe()