NDArray: multidimensional SChunk#

NDArray functions let users perform operations on NDArray arrays such as setting, copying or slicing them. In this section, we will see how to create and manipulate an NDArray in a simple way.

[26]:
import numpy as np

import blosc2

Creating an array#

First, we create an array. Zeros are used as the default value for any uninitialized portion of the array.

[27]:
array = blosc2.zeros((10000, 10000), dtype=np.int32)
print(array.info)
type    : NDArray
shape   : (10000, 10000)
chunks  : (25, 10000)
blocks  : (2, 10000)
dtype   : int32
cratio  : 32500.00
cparams : {'blocksize': 80000,
 'clevel': 1,
 'codec': <Codec.ZSTD: 5>,
 'codec_meta': 0,
 'filters': [<Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.SHUFFLE: 1>],
 'filters_meta': [0, 0, 0, 0, 0, 0],
 'nthreads': 4,
 'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
 'typesize': 4,
 'use_dict': 0}
dparams : {'nthreads': 4}

Note that all the compression and decompression parameters, as well as the chunks and blocks shapes, are set to their defaults.
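The blocksize reported in cparams is consistent with the blocks shape: in the output above it equals the number of items in one block times the typesize. The sketch below only redoes that arithmetic with the printed values. The helper names `block_nbytes` and `chunks_per_axis` are hypothetical and not part of blosc2.

```python
import math


def block_nbytes(blocks, typesize):
    """Uncompressed size in bytes of one block: items per block times item size."""
    return math.prod(blocks) * typesize


def chunks_per_axis(shape, chunks):
    """Number of chunks needed along each axis (the last one may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


# Values copied from the info output above
shape, chunks, blocks, typesize = (10000, 10000), (25, 10000), (2, 10000), 4

print(block_nbytes(blocks, typesize))  # 80000, the 'blocksize' shown above
print(chunks_per_axis(shape, chunks))  # (400, 1)
```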

Reading and writing data#

We can read and modify NDArray arrays using NumPy-style indexing. Values read back are returned as NumPy arrays.

[28]:
array[0, :] = np.arange(10000, dtype=array.dtype)
array[:, 0] = np.arange(10000, dtype=array.dtype)
[29]:
array[0, 0]
[29]:
array(0, dtype=int32)
[30]:
array[0, :]
[30]:
array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)
[31]:
array[:, 0]
[31]:
array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)

Persistent data#

As with the SChunk, when we create an NDArray we can specify where it will be stored, along with any of the compression and decompression parameters that a SChunk accepts. To store an array on disk, we only have to pass a urlpath for the new array.

[32]:
array = blosc2.full(
    (1000, 1000),
    fill_value=b"pepe",
    chunks=(100, 100),
    blocks=(50, 50),
    urlpath="ndarray_tutorial.b2nd",
    mode="w",
)
print(array.info)
type    : NDArray
shape   : (1000, 1000)
chunks  : (100, 100)
blocks  : (50, 50)
dtype   : |S4
cratio  : 1111.11
cparams : {'blocksize': 10000,
 'clevel': 1,
 'codec': <Codec.ZSTD: 5>,
 'codec_meta': 0,
 'filters': [<Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.SHUFFLE: 1>],
 'filters_meta': [0, 0, 0, 0, 0, 0],
 'nthreads': 4,
 'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
 'typesize': 4,
 'use_dict': 0}
dparams : {'nthreads': 4}

This time we even set the chunks and blocks shapes. You can now open it with modes w, a or r.

[33]:
array2 = blosc2.open("ndarray_tutorial.b2nd")
print(array2.info)
type    : NDArray
shape   : (1000, 1000)
chunks  : (100, 100)
blocks  : (50, 50)
dtype   : |S4
cratio  : 1111.11
cparams : {'blocksize': 10000,
 'clevel': 1,
 'codec': <Codec.ZSTD: 5>,
 'codec_meta': 0,
 'filters': [<Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.SHUFFLE: 1>],
 'filters_meta': [0, 0, 0, 0, 0, 0],
 'nthreads': 1,
 'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
 'typesize': 4,
 'use_dict': 0}
dparams : {'nthreads': 1}

Compression params#

When we make a copy of an NDArray, we can easily change its compression parameters.

[38]:
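# Use an explicit dtype so the buffer size matches dtype=np.int64 on every platform
b = np.arange(1000000, dtype=np.int64).tobytes()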
array1 = blosc2.frombuffer(b, shape=(1000, 1000), dtype=np.int64, chunks=(500, 10), blocks=(50, 10))
print(array1.info)
type    : NDArray
shape   : (1000, 1000)
chunks  : (500, 10)
blocks  : (50, 10)
dtype   : int64
cratio  : 7.45
cparams : {'blocksize': 4000,
 'clevel': 1,
 'codec': <Codec.ZSTD: 5>,
 'codec_meta': 0,
 'filters': [<Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.SHUFFLE: 1>],
 'filters_meta': [0, 0, 0, 0, 0, 0],
 'nthreads': 4,
 'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
 'typesize': 8,
 'use_dict': 0}
dparams : {'nthreads': 4}

[39]:
# Pass the new compression parameters as a plain dict
cparams = {
    "codec": blosc2.Codec.ZSTD,
    "clevel": 9,
    "filters": [blosc2.Filter.BITSHUFFLE],
    "filters_meta": [0],
}

array2 = array1.copy(chunks=(500, 10), blocks=(50, 10), cparams=cparams)
print(array2.info)

Metalayers and variable length metalayers#

We have seen that the NDArray constructors accept any compression or decompression parameter that a SChunk accepts. They also accept a metalayer dict. Metalayers are small pieces of metadata that describe the data stored in a container. As explained in the SChunk basics, there are two kinds. The first kind (meta) cannot be deleted, must be added at construction time, and can only be updated with values that have the same size in bytes as the old value. They are easy for users to access and edit:

[ ]:
meta = {"dtype": "i8", "coords": [5.14, 23.0]}
array = blosc2.zeros((1000, 1000), dtype=np.int16, chunks=(100, 100), blocks=(50, 50), meta=meta)

You can work with them as if they were a dictionary. To access them, use the schunk attribute of the NDArray.

[ ]:
array.schunk.meta
[23]:
array.schunk.meta.keys()
[23]:
['b2nd']

As you can see, Blosc2 itself uses a metalayer, b2nd, to store the shape, ndim, dtype, etc., and retrieves this data from it when needed.

[24]:
array.schunk.meta["b2nd"]
[24]:
[0, 2, [1000, 1000], [100, 100], [50, 50], 0, '|S4']
[ ]:
array.schunk.meta["coords"]

To add a metalayer after creation, or to store values whose size may change, use the variable length metalayers through the vlmeta accessor of the SChunk. Like meta, it works much like a dictionary.

[ ]:
print(array.schunk.vlmeta.getall())
array.schunk.vlmeta["info1"] = "This is an example"
array.schunk.vlmeta["info2"] = "of user meta handling"
array.schunk.vlmeta.getall()

You can update them with a value larger than the original one:

[ ]:
array.schunk.vlmeta["info1"] = "This is a larger example"
array.schunk.vlmeta.getall()

Creating a NDArray from a NumPy array#

Let’s create an NDArray from a NumPy array using the asarray constructor:

[ ]:
shape = (100, 100, 100)
dtype = np.float64
nparray = np.linspace(0, 100, np.prod(shape), dtype=dtype).reshape(shape)
b2ndarray = blosc2.asarray(nparray)
print(b2ndarray.info)

Building a NDArray from a buffer#

Furthermore, you can create an NDArray filled with data from a buffer:

[ ]:
rng = np.random.default_rng()
# Serialize one value of `dtype` per array element into a bytes buffer
buffer = rng.normal(size=np.prod(shape)).astype(dtype).tobytes()
b2ndarray = blosc2.frombuffer(buffer, shape, dtype=dtype)
print("Compression ratio:", b2ndarray.schunk.cratio)
b2ndarray[:5, :5, :5]

That’s all for now. There are more examples in the examples directory of the git repository for you to explore. Enjoy!