Save and load

save(array, urlpath[, contiguous])

Save an array to a file.

open(urlpath[, mode, offset])

Open a persistent SChunk, NDArray, a remote C2Array, a Proxy, a DictStore, EmbedStore, or TreeStore.

load(urlpath[, offset])

Load a persistent Blosc2 object into memory.

save_array(arr, urlpath[, chunksize])

Save a serialized NumPy array to a specified file path.

load_array(urlpath[, dparams])

Load a serialized NumPy array from a file.

save_tensor(tensor, urlpath[, chunksize])

Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.

load_tensor(urlpath[, dparams])

Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.

from_cframe(cframe[, copy])

Create an EmbedStore, NDArray, SChunk, BatchArray or ObjectArray instance from a contiguous frame buffer.

blosc2.save(array: NDArray, urlpath: str, contiguous=True, **kwargs: Any) -> None

Save an array to a file.

Parameters:
  • array (NDArray) – The array to be saved.

  • urlpath (str) – The path to the file where the array will be saved.

  • contiguous (bool, optional) – Whether to store the array contiguously.

  • kwargs (dict, optional) – Keyword arguments that are supported by the save() method.

Examples

>>> import blosc2
>>> import numpy as np
>>> # Create an array
>>> array = blosc2.arange(0, 100, dtype=np.int64, shape=(10, 10))
>>> # Save the array to a file
>>> blosc2.save(array, "array.b2", mode="w")
blosc2.open(urlpath: str | pathlib.Path | blosc2.c2array.URLPath, mode: str = <object object>, offset: int = 0, **kwargs: dict) -> SChunk | NDArray | BatchArray | ObjectArray | C2Array | LazyArray | Proxy | DictStore | TreeStore | EmbedStore

Open a persistent SChunk, NDArray, a remote C2Array, a Proxy, a DictStore, EmbedStore, or TreeStore.

See the Notes section for more info on opening Proxy objects.

Parameters:
  • urlpath (str | pathlib.Path | URLPath class) – The path where the SChunk (or NDArray) is stored. If it is a remote array, a URLPath class must be passed.

  • mode (str, optional) – Persistence mode: ‘r’ means read only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists). Defaults to ‘a’ for now, but will change to ‘r’ in a future release. Pass mode='a' explicitly to preserve writable behavior, or mode='r' for read-only access.

  • offset (int, optional) – An offset in the file where super-chunk or array data is located (e.g. in a file containing several such objects).

  • kwargs (dict, optional) –

    mmap_mode: str, optional

    If set, the file will be memory-mapped instead of using the default I/O functions and the mode argument will be ignored. For more info, see blosc2.Storage. Please note that the w+ mode, which can be used to create new files, is not supported here since only existing files can be opened. You can use SChunk.__init__ to create new files.

    initial_mapping_size: int, optional

    The initial size of the memory mapping. For more info, see blosc2.Storage.

    cparams: dict

    A dictionary with the compression parameters, which are the same as those that can be used in the compress2() function. Typesize and blocksize cannot be changed.

    dparams: dict

    A dictionary with the decompression parameters, which are the same as those that can be used in the decompress2() function.

Returns:

out – The object found in the path.

Return type:

SChunk, NDArray, C2Array, Proxy, DictStore, EmbedStore, or TreeStore

Notes

  • This is just a ‘logical’ open: there is no close() counterpart because there is currently no need for one.

  • If urlpath is a URLPath class instance, mode must be ‘r’, offset must be 0, and kwargs cannot be passed.

  • If the original object saved in urlpath is a Proxy, this function will only return a Proxy if its source is a local SChunk, NDArray or a remote C2Array. Otherwise, it will return the Python-Blosc2 container used to cache the data, which can be an SChunk or an NDArray and may not have all the data initialized (e.g. if the user has not accessed it yet).

  • When opening a LazyExpr, keep in mind the note above regarding operands.

Examples

>>> import blosc2
>>> import numpy as np
>>> import os
>>> import tempfile
>>> tmpdirname = tempfile.mkdtemp()
>>> urlpath = os.path.join(tmpdirname, 'b2frame')
>>> storage = blosc2.Storage(contiguous=True, urlpath=urlpath, mode="w")
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), storage=storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=urlpath, mode="r")
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True

To open the same schunk memory-mapped, we simply need to pass the mmap_mode parameter:

>>> sc_open_mmap = blosc2.open(urlpath=urlpath, mode="r", mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> all(sc_open.decompress_chunk(i, dest1) == sc_open_mmap.decompress_chunk(i, dest1) for i in range(nchunks))
True
blosc2.load(urlpath: str | Path, offset: int = 0, **kwargs: dict)

Load a persistent Blosc2 object into memory.

This is the in-memory counterpart to open(). It opens urlpath in read-only mode and returns a standalone object that is not backed by the original file. For CTable, this dispatches to CTable.load(); for array-like containers it returns an in-memory copy.

Parameters:
  • urlpath (str | pathlib.Path) – Path to the persistent Blosc2 object.

  • offset (int, optional) – Offset in the file where the object is located. This is mainly useful for SChunk/NDArray objects embedded in a larger file.

  • kwargs (dict, optional) – Additional read-time keyword arguments passed to open(), such as dparams.

Returns:

out – A standalone in-memory Blosc2 object.

Raises:

TypeError – If the opened object cannot be loaded as a standalone in-memory object.

Examples

>>> import blosc2
>>> import numpy as np
>>> arr = blosc2.asarray(np.arange(10), urlpath="example.b2nd", mode="w")
>>> loaded = blosc2.load("example.b2nd")
>>> loaded.urlpath is None
True
>>> np.array_equal(loaded[:], arr[:])
True
>>> blosc2.remove_urlpath("example.b2nd")
blosc2.save_array(arr: ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) -> int

Save a serialized NumPy array to a specified file path.

Parameters:
  • arr (np.ndarray) – The NumPy array to be saved.

  • urlpath (str) – The path for the file where the array will be saved.

  • chunksize (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.

  • kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved array.

Return type:

int

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
blosc2.load_array(urlpath: str, dparams: dict | None = None) -> ndarray

Load a serialized NumPy array from a file.

Parameters:
  • urlpath (str) – The path to the file containing the serialized array.

  • dparams (dict, optional) – A dictionary with the decompression parameters, which are the same as those used in the decompress2() function.

Returns:

out – The deserialized NumPy array.

Return type:

np.ndarray

Raises:
  • TypeError – If urlpath is not in cframe format.

  • RuntimeError – If any other error is detected.

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
>>> a2 = blosc2.load_array("test.bl2")
>>> np.array_equal(a, a2)
True
blosc2.save_tensor(tensor: tensorflow.Tensor | torch.Tensor | np.ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) -> int

Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.

Parameters:
  • tensor (tensorflow.Tensor, torch.Tensor, or np.ndarray) – The tensor or array to be saved.

  • urlpath (str) – The file path where the tensor or array will be saved.

  • chunksize (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.

  • kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved tensor or array.

Return type:

int

Examples

>>> import os
>>> import blosc2
>>> import numpy as np
>>> th = np.arange(1e6, dtype=np.float32)
>>> serial_size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert serial_size < th.size * th.itemsize
...
blosc2.load_tensor(urlpath: str, dparams: dict | None = None) -> tensorflow.Tensor | torch.Tensor | np.ndarray

Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.

Parameters:
  • urlpath (str) – The path to the file where the tensor or array is stored.

  • dparams (dict, optional) – A dictionary with the decompression parameters, which are the same as those used in the decompress2() function.

Returns:

out – The unpacked PyTorch or TensorFlow tensor or NumPy array.

Return type:

tensor or ndarray

Raises:
  • TypeError – If urlpath is not in cframe format.

  • RuntimeError – If some other problem is detected.

Examples

>>> import os
>>> import blosc2
>>> import numpy as np
>>> th = np.arange(1e6, dtype=np.float32)
>>> size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert size < th.size * th.itemsize
...
>>> th2 = blosc2.load_tensor("test.bl2")
>>> np.array_equal(th, th2)
True
blosc2.from_cframe(cframe: bytes | str, copy: bool = True) -> EmbedStore | NDArray | SChunk | ListArray | BatchArray | ObjectArray | C2Array

Create an EmbedStore, NDArray, SChunk, BatchArray or ObjectArray instance from a contiguous frame buffer.

Parameters:
  • cframe (bytes or str) – The bytes object containing the in-memory cframe.

  • copy (bool) – Whether to internally make a copy. If False, the user is responsible for keeping a reference to cframe. Default is True, which is safer. If you need to save time/memory, you can set it to False, but then you must ensure that the cframe is not garbage collected while the returned object is still in use.

Returns:

out – A new instance of the appropriate type containing the data passed.

Return type:

EmbedStore, NDArray, SChunk, BatchArray or ObjectArray

See also

from_cframe(), from_cframe(), from_cframe()