Expressions containing NDArray objects (and others)#

Python-Blosc2 provides a powerful way to operate on NDArray objects (and other array flavors). In this section, we will see how to do computations with NDArray arrays in a simple way.

[1]:
import numpy as np

import blosc2

A simple example#

First, let’s create a couple of NDArrays. We will use NumPy arrays to fill them.

[2]:
shape = (500, 1000)
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
npb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)

a = blosc2.asarray(npa, urlpath="a.b2nd", mode="w")
b = blosc2.asarray(npb, urlpath="b.b2nd", mode="w")

Now, let’s create an expression that involves a and b:

[3]:
c = a**2 + b**2 + 2 * a * b + 1
print(c.info)  # at this stage, the expression has not been evaluated yet
type       : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands   : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape      : (500, 1000)
dtype      : float64

We see that the outcome of the expression is a LazyExpr object. This object is a placeholder for the actual computation that will be done when we evaluate it. This is a very powerful feature because it allows us to build complex expressions without actually computing them until we really need the result.
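The idea of a placeholder that records operations and computes only on demand can be sketched in plain Python. The ToyLazy class below is purely hypothetical and is not how Blosc2 implements LazyExpr; it only mimics the operand naming (o0, o1) shown in the output above, and uses scalars instead of arrays to stay small.

```python
import operator


class ToyLazy:
    """Hypothetical toy node: records an operation now, computes it later."""

    def __init__(self, func, label):
        self._func = func    # zero-argument callable that produces the value
        self.label = label   # textual form of the expression built so far

    @classmethod
    def leaf(cls, value, name):
        # Wrap a concrete value (an operand or a constant) as a node.
        return cls(lambda: value, name)

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, ToyLazy) else ToyLazy.leaf(x, repr(x))

    def _binary(self, other, op, symbol):
        other = ToyLazy._wrap(other)
        # Nothing is computed here: we only build a new callable and label.
        return ToyLazy(
            lambda: op(self._func(), other._func()),
            f"({self.label} {symbol} {other.label})",
        )

    def __add__(self, other):
        return self._binary(other, operator.add, "+")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "*")

    def __rmul__(self, other):
        # Handles `2 * node`, where the plain number is on the left.
        return ToyLazy._wrap(other)._binary(self, operator.mul, "*")

    def __pow__(self, other):
        return self._binary(other, operator.pow, "**")

    def eval(self):
        # The whole chain of operations runs only at this point.
        return self._func()


o0 = ToyLazy.leaf(3.0, "o0")
o1 = ToyLazy.leaf(4.0, "o1")
expr = o0**2 + o1**2 + 2 * o0 * o1 + 1

print(expr.label)   # the recorded expression; no arithmetic has run yet
print(expr.eval())  # the computation happens here
```

Building expr only composes callables and strings; the arithmetic is deferred until eval() is called, which is the behavior the paragraph above describes for LazyExpr.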

Now, let’s evaluate it. LazyExpr objects follow the LazyArray interface, which provides several ways to perform the evaluation, depending on the kind of output we want.

First, let’s use the eval method. The result will be another NDArray array:

[4]:
d = c.eval()  # evaluate the expression
print(f"Class: {type(d)}")
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
Class: <class 'blosc2.ndarray.NDArray'>
Compression ratio: 1.89x
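As a cross-check on what the expression computes, here is a NumPy-only reference, assuming the same operand construction as above but with a smaller shape (an arbitrary choice to keep it quick). It does not use Blosc2 at all; it only illustrates that the expression is algebraically (a + b)**2 + 1 and that mixing float32 and float64 operands gives a float64 result, consistent with the dtype reported by c.info.

```python
import numpy as np

shape = (50, 100)  # smaller than the tutorial's (500, 1000); an assumption for speed
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
npb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)

# Same formula as the lazy expression, evaluated eagerly by NumPy.
ref = npa**2 + npb**2 + 2 * npa * npb + 1

# Algebraically equivalent closed form: (a + b)**2 + 1.
closed = (npa.astype(np.float64) + npb) ** 2 + 1

print(ref.dtype)                 # float32 combined with float64 promotes to float64
print(np.allclose(ref, closed))  # equal up to floating-point rounding
```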

We can specify different compression parameters for the result. For example, we can change the codec to zstd, use the bitshuffle filter, and set the compression level to 9:

[5]:
cparams = {"codec": blosc2.Codec.ZSTD, "filters": [blosc2.Filter.BITSHUFFLE], "clevel": 9}
d = c.eval(cparams=cparams)
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
Compression ratio: 2.10x

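Now, let’s evaluate the expression and store the result in a NumPy array. For this, we will use the __getitem__ method of the lazy expression, that is, plain slicing:

[6]:
npd = c[:]  # slicing the LazyExpr evaluates it and returns a NumPy array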
print(f"Class: {type(npd)}")
Class: <class 'numpy.ndarray'>

Saving expressions to disk#

You can also save expressions to disk. For this, use the save method of LazyArray objects. For example, let’s save the expression c to disk:

[7]:
c = a**2 + b**2 + 2 * a * b + 1
c.save(urlpath="expr.b2nd")

And you can load it back with the open function:

[8]:
c2 = blosc2.open("expr.b2nd")
print(c2.info)
type       : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands   : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape      : (500, 1000)
dtype      : float64

Now, you can evaluate it as before:

[9]:
d2 = c2.eval()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x")
Compression ratio: 1.89x

Reductions#

We can also perform reductions on NDArray arrays. Let’s see an example:

[10]:
c = (a + b).sum()
c
[10]:
999999.9999999471

As we can see, the result is a scalar: reductions in expressions are always computed immediately rather than lazily. We can also specify the axis for the reduction:

[11]:
c = (a + b).sum(axis=1)
print(f"Shape of c: {c.shape}")
# Show the first 4 elements of the result
c[:4]
Shape of c: (500,)
[11]:
array([1001.998004  , 1005.998012  , 1009.99802   , 1013.99802799])
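The effect of the axis argument on the result shape can be seen with a NumPy-only sketch. It assumes a tiny (4, 5) stand-in for the (500, 1000) arrays used above, and it shows NumPy semantics, which the reductions in this section appear to follow.

```python
import numpy as np

shape = (4, 5)  # tiny stand-in for the tutorial's (500, 1000)
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
npb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)

total = (npa + npb).sum()            # full reduction: a single scalar
row_sums = (npa + npb).sum(axis=1)   # collapse axis 1: one value per row
col_sums = (npa + npb).sum(axis=0)   # collapse axis 0: one value per column

print(total)
print(row_sums.shape, col_sums.shape)
```

The reduced axis disappears from the shape, which is why summing the (500, 1000) operands along axis=1 yields a result of shape (500,).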

Selections#

We can also perform selections on NDArray arrays with structured types. Let’s see an example. First, we will create a structured array:

[12]:
nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)
s = blosc2.asarray(nps, urlpath="s.b2nd", mode="w")
s[:]
[12]:
array([(1, 2. , b'Hello'), (2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

Now, we can select rows depending on the value of different fields:

[13]:
A = s.fields["A"]
B = s.fields["B"]
expr = s[A > B]
expr[:]
[13]:
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

We can do the same in a more compact way by using an expression in string form:

[14]:
expr = s["A > B"]
expr[:]
[14]:
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

The expression can also be more elaborate, combining several conditions:

[15]:
C = s.fields["C"]
expr = s[(A > B) & (C == b"World")]
expr[:]
[15]:
array([(2, 1., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

We can also do selections and extract a single field:

[16]:
C[A > B][:]
[16]:
array([b'World', b'World2'], dtype='|S10')
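For comparison, the same selections can be written with plain NumPy boolean masks on the original structured array nps. This is only a reference for the expected results, not a description of how Blosc2 evaluates them; the variable names mask, rows, both and names are introduced here for illustration.

```python
import numpy as np

nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)

mask = nps["A"] > nps["B"]                  # one boolean per row
rows = nps[mask]                            # rows where A > B
both = nps[mask & (nps["C"] == b"World")]   # combine two conditions with &
names = nps["C"][mask]                      # a single field of the selected rows

print(rows)
print(both)
print(names)
```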

Finally, we can do selections and perform reductions on them in one go by using the where() function. For example, let’s take, for each row, the larger of field A and field B, and sum the result over all rows:

[17]:
s[A > B].where(A, B).sum()
[17]:
8.0
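The value 8.0 can be reproduced with NumPy, under the assumption that the call above behaves like np.where(A > B, A, B) followed by a sum: each row contributes A when A > B and B otherwise.

```python
import numpy as np

nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)
A, B = nps["A"], nps["B"]

picked = np.where(A > B, A, B)  # per row: A if A > B, else B
total = picked.sum()

print(picked)
print(total)
```

With these three rows the picked values are 2.0 (from B), 2 (from A) and 4 (from A), so the condition A > B combined with where(A, B) amounts to an element-wise maximum.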

Combining the whole selection toolkit can make querying your data very effective. As the evaluation is lazy, all the operations are grouped and executed together for maximum performance; the only exception is reductions, which are evaluated eagerly but can still be part of more general expressions.

Broadcasting#

NumPy arrays support broadcasting, and so do NDArray arrays. Let’s see an example:

[18]:
b2 = b[0]  # take the first row of b
print(f"Shape of a: {a.shape}, shape of b2: {b2.shape}")
Shape of a: (500, 1000), shape of b2: (1000,)

We see that the shapes of a and b2 are different. However, we can still operate with them and the broadcasting will be done automatically (à la NumPy):

[19]:
c2 = a + b2
d2 = c2.eval()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x, shape: {d2.shape}")
Compression ratio: 32.63x, shape: (500, 1000)
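Whether two shapes are compatible, and what the resulting shape will be, can be checked up front with NumPy’s broadcasting rules. The sketch below uses np.broadcast_shapes (available in NumPy 1.20 and later) and assumes NDArray follows the same rules, as the text indicates.

```python
import numpy as np

# Shapes are aligned from the trailing dimension: (500, 1000) vs (1000,)
out_shape = np.broadcast_shapes((500, 1000), (1000,))
print(out_shape)

# A (500,) operand does not match the trailing dimension of size 1000.
try:
    np.broadcast_shapes((500, 1000), (500,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)

# Small concrete case: a single row is added to every row of the matrix.
m = np.arange(6).reshape(2, 3)
row = np.array([10, 20, 30])
print(m + row)
```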

The broadcasting feature is still experimental, and it may not work in all cases. If you find a bug, please report it to the Python-Blosc2 issue tracker.

Summary#

In this section, we have seen how to perform computations with NDArray arrays. We have seen how to create expressions, evaluate them, and save them to disk. We have also seen how to perform reductions, selections, and combinations of both. Finally, we have seen that expressions whose operands have different (but compatible) shapes can be evaluated too. Lazy expressions are a very powerful feature that lets you build and evaluate complex computations from operands that can be in memory, on disk, or on remote servers (C2Array), in a simple way and very effectively (see the benchmarks).