NDArray: multidimensional SChunk#
NDArray functions let users perform different operations on NDArray arrays, such as setting, copying or slicing them. In this section, we will see how to create and manipulate an NDArray array in a simple way.
[1]:
import blosc2
import numpy as np
Creating an array#
First, we create an array, with zeros being used as the default value for uninitialized portions of the array.
[2]:
array = blosc2.zeros((10000, 10000), dtype=np.int32)
print(array.info)
type : NDArray
shape : (10000, 10000)
chunks : (512, 1024)
blocks : (128, 256)
dtype : int32
cratio : 65536.00
cparams : {'blocksize': 131072,
'clevel': 1,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.SHUFFLE: 1>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 4,
'use_dict': 0}
dparams : {'nthreads': 6}
Note that all the compression and decompression parameters, as well as the chunk and block shapes, are set to their defaults.
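The numbers in the printout can be related to each other with a little arithmetic. The sketch below uses only the standard library and the values shown above. The rule that blocksize equals the number of items in a block times typesize is an observation that matches the printouts in this tutorial, not a statement taken from the Blosc2 documentation.

```python
import math

# Values copied from the info printout above.
shape = (10000, 10000)
chunks = (512, 1024)
blocks = (128, 256)
typesize = 4  # bytes per int32 item

# Chunks needed along each dimension (the last one may be only partly used).
chunks_per_dim = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
nchunks = math.prod(chunks_per_dim)

# Blocks that fit in one chunk along each dimension.
blocks_per_chunk = tuple(math.ceil(c / b) for c, b in zip(chunks, blocks))

# Uncompressed sizes in bytes.
blocksize = math.prod(blocks) * typesize
chunk_nbytes = math.prod(chunks) * typesize
array_nbytes = math.prod(shape) * typesize

print(chunks_per_dim, nchunks)      # (20, 10) 200
print(blocks_per_chunk)             # (4, 4)
print(blocksize)                    # 131072, the 'blocksize' shown in cparams
print(chunk_nbytes, array_nbytes)   # 2097152 400000000
```

So the 400 MB array is split into 200 chunks of 2 MiB each, and every chunk holds 16 blocks.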
Reading and writing data#
We can read and modify NDArray arrays using NumPy-style indexing, assigning NumPy arrays as values.
[3]:
array[0, :] = np.arange(10000, dtype=array.dtype)
array[:, 0] = np.arange(10000, dtype=array.dtype)
[4]:
array[0, 0]
[4]:
array(0, dtype=int32)
[5]:
array[0, :]
[5]:
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
[6]:
array[:, 0]
[6]:
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
Persistent data#
As with the SChunk, when we create an NDArray array we can specify where it will be stored. In fact, we can specify all the compression/decompression parameters that are available for a SChunk. So, as with the SChunk, to store an array on disk we only have to specify a urlpath where the new array will be stored.
[7]:
array = blosc2.full((1000, 1000), fill_value=b"pepe", chunks=(100, 100), blocks=(50, 50),
urlpath="ndarray_tutorial.b2nd", mode="w")
print(array.info)
type : NDArray
shape : (1000, 1000)
chunks : (100, 100)
blocks : (50, 50)
dtype : |S4
cratio : 1111.11
cparams : {'blocksize': 10000,
'clevel': 1,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.SHUFFLE: 1>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 4,
'use_dict': 0}
dparams : {'nthreads': 6}
This time we even set the chunks and blocks shapes. You can now open it with modes w, a or r.
[8]:
array2 = blosc2.open("ndarray_tutorial.b2nd")
print(array2.info)
type : NDArray
shape : (1000, 1000)
chunks : (100, 100)
blocks : (50, 50)
dtype : |S4
cratio : 1111.11
cparams : {'blocksize': 10000,
'clevel': 1,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.SHUFFLE: 1>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 4,
'use_dict': 0}
dparams : {'nthreads': 6}
Compression params#
Here we can see how easy it is to change the compression parameters when making a copy of an NDArray array.
[9]:
b = np.arange(1000000).tobytes()
array1 = blosc2.frombuffer(b, shape=(1000, 1000), dtype=np.int64, chunks=(500, 10), blocks=(50, 10))
print(array1.info)
type : NDArray
shape : (1000, 1000)
chunks : (500, 10)
blocks : (50, 10)
dtype : int64
cratio : 7.45
cparams : {'blocksize': 4000,
'clevel': 1,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.SHUFFLE: 1>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 8,
'use_dict': 0}
dparams : {'nthreads': 6}
[10]:
cparams = {
"codec": blosc2.Codec.ZSTD,
"clevel": 9,
"filters": [blosc2.Filter.BITSHUFFLE],
"filters_meta": [0]
}
array2 = array1.copy(chunks=(500, 10), blocks=(50, 10), cparams=cparams)
print(array2.info)
type : NDArray
shape : (1000, 1000)
chunks : (500, 10)
blocks : (50, 10)
dtype : int64
cratio : 13.94
cparams : {'blocksize': 4000,
'clevel': 9,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.BITSHUFFLE: 2>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 8,
'use_dict': 0}
dparams : {'nthreads': 6}
Metalayers and variable length metalayers#
We have seen that you can pass to the NDArray constructor any compression or decompression parameters that you may pass to a SChunk. You can also pass the metalayer dict. Metalayers are small pieces of metadata that describe properties of the data stored in a container. As explained in the SChunk basics, there are two kinds. The first kind (meta) cannot be deleted, must be added at construction time and can only be updated with values that have the same size in bytes as the old value. They are easy to access and edit by users:
[11]:
meta = {
"dtype": "i8",
"coords": [5.14, 23.]
}
array = blosc2.zeros((1000, 1000), dtype=np.int16, chunks=(100, 100), blocks=(50, 50), meta=meta)
You can work with them as if you were working with a dictionary. To access this dictionary, use the schunk attribute of the NDArray.
[12]:
array.schunk.meta
[12]:
<blosc2.schunk.Meta at 0x10cdff800>
[13]:
array.schunk.meta.keys()
[13]:
['b2nd', 'dtype', 'coords']
As you can see, Blosc2 internally uses these metalayers: the b2nd metalayer stores the shape, ndim, dtype, etc., and Blosc2 retrieves this data from it when needed.
[14]:
array.schunk.meta["b2nd"]
[14]:
[0, 2, [1000, 1000], [100, 100], [50, 50], 0, '<i2']
[15]:
array.schunk.meta["coords"]
[15]:
[5.14, 23.0]
To add a metalayer after creation, or to add a variable length metalayer, you can use the vlmeta accessor of the SChunk. Like meta, it works similarly to a dictionary.
[16]:
print(array.schunk.vlmeta.getall())
array.schunk.vlmeta["info1"] = "This is an example"
array.schunk.vlmeta["info2"] = "of user meta handling"
array.schunk.vlmeta.getall()
{}
[16]:
{b'info1': 'This is an example', b'info2': 'of user meta handling'}
You can update them with a value larger than the original one:
[17]:
array.schunk.vlmeta["info1"] = "This is a larger example"
array.schunk.vlmeta.getall()
[17]:
{b'info1': 'This is a larger example', b'info2': 'of user meta handling'}
Creating an NDArray from a NumPy array#
Let’s create an NDArray from a NumPy array using the asarray constructor:
[18]:
shape = (100, 100, 100)
dtype = np.float64
nparray = np.linspace(0, 100, np.prod(shape), dtype=dtype).reshape(shape)
b2ndarray = blosc2.asarray(nparray)
print(b2ndarray.info)
type : NDArray
shape : (100, 100, 100)
chunks : (64, 64, 100)
blocks : (32, 32, 32)
dtype : float64
cratio : 15.99
cparams : {'blocksize': 262144,
'clevel': 1,
'codec': <Codec.ZSTD: 5>,
'codec_meta': 0,
'filters': [<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.NOFILTER: 0>,
<Filter.SHUFFLE: 1>],
'filters_meta': [0, 0, 0, 0, 0, 0],
'nthreads': 6,
'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
'typesize': 8,
'use_dict': 0}
dparams : {'nthreads': 6}
Building an NDArray from a buffer#
Furthermore, you can create an NDArray filled with data from a buffer:
[19]:
buffer = bytes(np.random.normal(0, 1, np.prod(shape)) * 8)
b2ndarray = blosc2.frombuffer(buffer, shape, dtype=dtype)
print("Compression ratio:", b2ndarray.schunk.cratio)
b2ndarray[:5, :5, :5]
Compression ratio: 2.346534664543712
[19]:
array([[[ 5.42196142e+00, 2.73411248e-01, -8.16705224e-01,
1.37387920e+00, 4.67745267e+00],
[-9.74870871e+00, 5.84935129e+00, 9.58553390e+00,
-2.83529450e-01, 7.53172473e+00],
[ 1.49656577e+00, 5.17716640e+00, 7.88381029e+00,
6.98547347e-01, -4.22113557e+00],
[ 3.26899881e+00, -1.82539905e+00, -6.64803980e+00,
2.26750920e+00, -8.04904893e+00],
[ 1.25639643e+01, 6.13877785e+00, 8.36071977e-01,
4.61057570e+00, 1.48929362e+01]],
[[ 3.35584136e+00, 1.99526803e-01, -1.83173110e+01,
-9.23138847e+00, -1.16172733e+00],
[-5.03933967e+00, -1.12041458e+01, 4.03284196e+00,
1.00896486e+01, 1.66993503e+00],
[-1.18575679e+01, -4.75050150e+00, 2.18309491e+00,
7.96693815e+00, -1.08675195e+01],
[-8.88867651e+00, 2.61614522e+00, -1.21496391e+00,
-1.07405006e+01, -1.62225644e+01],
[-8.06054293e+00, 1.41019810e+01, 3.73009613e+00,
1.94280930e+00, -4.03920319e-01]],
[[ 7.11325574e+00, 1.81344216e+00, -1.31212523e+01,
7.53794442e+00, 6.05015875e+00],
[-3.72363480e+00, 1.51570884e+01, -2.04563128e-01,
2.48303234e+00, -2.40123746e+00],
[-6.54960604e+00, -9.95287318e+00, -5.29298162e+00,
8.24236836e+00, 7.44135682e+00],
[ 2.93987926e+00, -6.38440848e+00, -1.14590714e+00,
2.02831822e+00, 2.50627016e-03],
[ 4.39693638e+00, 7.14526714e+00, -1.83301102e+00,
8.41598861e+00, -4.57312873e+00]],
[[ 1.72690846e+01, 6.30828920e+00, -5.30917037e+00,
7.52455436e+00, 1.19643440e+01],
[ 9.12355405e+00, 1.67975018e+00, 2.93640941e+00,
-7.64215452e+00, 1.62410350e+00],
[-1.55437404e+00, 9.49132288e-01, 8.92834289e+00,
-1.37456729e+01, 9.86778010e+00],
[-7.21853497e+00, -4.47973496e+00, 3.25376041e+00,
-6.51526389e+00, 8.59162340e+00],
[-9.83341081e+00, 9.25969121e+00, -1.36367239e+01,
8.07390571e+00, 6.14360462e-01]],
[[ 4.65602528e+00, -1.48217159e+01, 7.67247150e+00,
-1.41809697e+01, 8.29187072e+00],
[-2.09188110e+01, -1.21744141e+01, -1.23980307e+00,
-1.67901253e+01, -1.11255548e+01],
[-1.71639719e+00, 8.41005260e+00, -9.16336234e+00,
-9.91380613e+00, -9.34633040e-01],
[-7.14082014e+00, -3.63309930e+00, 5.40634385e+00,
-1.65522254e+00, 5.61551645e+00],
[-6.91584808e-01, 1.46205820e+01, -6.56466213e+00,
-3.79375418e+00, 6.82807996e+00]]])
That’s all for now. There are more examples in the examples directory of the git repository for you to explore. Enjoy!