Expressions containing NDArray objects (and others)#
Python-Blosc2 implements a powerful way to operate with NDArray objects (and other array flavors) through lazy expressions. In this section, we will see how to do computations with NDArray arrays in a simple way.
[1]:
import numpy as np
import blosc2
A simple example#
First, let’s create a couple of NDArrays. We will use NumPy arrays to fill them.
[2]:
shape = (500, 1000)
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
npb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)
a = blosc2.asarray(npa, urlpath="a.b2nd", mode="w")
b = blosc2.asarray(npb, urlpath="b.b2nd", mode="w")
Now, let’s create an expression that involves a and b:
[3]:
c = a**2 + b**2 + 2 * a * b + 1
print(c.info) # at this stage, the expression has not been evaluated yet
type : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape : (500, 1000)
dtype : float64
We see that the outcome of the expression is a LazyExpr object. This object is a placeholder for the actual computation that will be done when we evaluate it. This is a very powerful feature because it allows us to build complex expressions without actually computing them until we really need the result.
Now, let’s evaluate it. LazyExpr objects follow the LazyArray interface, which provides several ways to perform the evaluation, depending on the kind of object we want as output.
First, let’s use the compute method. The result will be another NDArray array:
[4]:
d = c.compute() # evaluate the expression
print(f"Class: {type(d)}")
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
Class: <class 'blosc2.ndarray.NDArray'>
Compression ratio: 1.89x
We can specify different compression parameters for the result. For example, we can change the codec to zstd, use the bitshuffle filter, and set the compression level to 9:
[7]:
cparams = blosc2.CParams(
    codec=blosc2.Codec.ZSTD, filters=[blosc2.Filter.BITSHUFFLE], clevel=9, filters_meta=[0]
)
d = c.compute(cparams=cparams)
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
Compression ratio: 2.08x
Now, let’s evaluate the expression and store the result in a NumPy array. For this, we will use the __getitem__ method:
[8]:
npd = c[:]  # slicing the expression evaluates it and returns a NumPy array
print(f"Class: {type(npd)}")
Class: <class 'numpy.ndarray'>
Saving expressions to disk#
Finally, you can save expressions to disk. For this, use the save method of LazyArray objects. For example, let’s save the expression c to disk:
[9]:
c = a**2 + b**2 + 2 * a * b + 1
c.save(urlpath="expr.b2nd")
And you can load it back with the open function:
[10]:
c2 = blosc2.open("expr.b2nd")
print(c2.info)
type : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape : (500, 1000)
dtype : float64
Now, you can evaluate it as before:
[11]:
d2 = c2.compute()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x")
Compression ratio: 1.89x
Reductions#
We can also perform reductions on NDArray arrays. Let’s see an example:
[12]:
c = (a + b).sum()
c
[12]:
999999.9999999471
As we can see, the result is a scalar. That means that reductions in expressions always perform the computation immediately. We can also specify the axis for the reduction:
[11]:
c = (a + b).sum(axis=1)
print(f"Shape of c: {c.shape}")
# Show the first 4 elements of the result
c[:4]
Shape of c: (500,)
[11]:
array([1001.998004 , 1005.998012 , 1009.99802 , 1013.99802799])
Selections#
We can also perform selections on NDArray arrays with structured types. Let’s see an example. First, we will create a structured array:
[12]:
nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)
s = blosc2.asarray(nps, urlpath="s.b2nd", mode="w")
s[:]
[12]:
array([(1, 2. , b'Hello'), (2, 1. , b'World'), (4, 3.9, b'World2')],
dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
Now, we can select rows depending on the value of different fields:
[13]:
A = s.fields["A"]
B = s.fields["B"]
expr = s[A > B]
expr[:]
[13]:
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
We can do the same in a more compact way using an expression in string form:
[14]:
expr = s["A > B"]
expr[:]
[14]:
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
The expression can also be a complex one:
[15]:
C = s.fields["C"]
expr = s[(A > B) & (C == b"World")]
expr[:]
[15]:
array([(2, 1., b'World')],
dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
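For readers coming from NumPy, the selections above correspond to boolean-mask indexing on the structured array. The sketch below shows the NumPy-only equivalent on the same three rows. It illustrates the semantics and says nothing about how Blosc2 evaluates the selection internally.

```python
import numpy as np

nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)

# Rows where field A is greater than field B
simple = nps[nps["A"] > nps["B"]]

# Same condition, additionally requiring field C to equal b"World"
combined = nps[(nps["A"] > nps["B"]) & (nps["C"] == b"World")]

print(simple["A"].tolist())
print(combined["A"].tolist())
```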
We can also do selections and extract a single field:
[16]:
C[A > B][:]
[16]:
array([b'World', b'World2'], dtype='|S10')
Finally, we can do selections and perform reductions on them in one go by using the where() function. For example, let’s take, for each row, the maximum of field A and field B, and sum the results:
[17]:
s[A > B].where(A, B).sum()
[17]:
8.0
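The result can be cross-checked with NumPy. Assuming that `where(A, B)` picks the value of A where the condition `A > B` holds and the value of B elsewhere (a reading that is consistent with the 8.0 shown above), the NumPy equivalent is `np.where` followed by a sum:

```python
import numpy as np

nps = np.array(
    [(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")],
)
A, B = nps["A"], nps["B"]

# Per row: A where A > B, otherwise B, i.e. the larger of the two fields
per_row_max = np.where(A > B, A, B)
total = float(per_row_max.sum())
print(total)  # 8.0
```

The three rows contribute 2.0, 2 and 4 respectively, which adds up to 8.0.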
Combining all these selection tools can make querying your data very effective. As the evaluation is lazy, all the operations are grouped and executed together for maximum performance; the only exception is that a reduction is evaluated eagerly as soon as it is found, although its result can still be part of a more general expression.
Broadcasting#
NumPy arrays support broadcasting, and so do NDArray arrays. Let’s see an example:
[18]:
b2 = b[0] # take the first row of b
print(f"Shape of a: {a.shape}, shape of b2: {b2.shape}")
Shape of a: (500, 1000), shape of b2: (1000,)
We see that the shapes of a and b2 are different. However, we can still operate with them and the broadcasting will be done automatically (à la NumPy):
[19]:
c2 = a + b2
d2 = c2.compute()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x, shape: {d2.shape}")
Compression ratio: 32.63x, shape: (500, 1000)
The broadcasting feature is still experimental, and it may not work in all cases. If you find a bug, please report it to the Python-Blosc2 issue tracker.
Summary#
In this section, we have seen how to perform computations with NDArray arrays and, more specifically, how to create expressions, evaluate them, and save them to disk. We have also looked at performing reductions, selections, and combinations of both. Finally, we have seen how expressions containing operands with different (but compatible) shapes can be evaluated too. Lazy expressions are a very powerful feature that allows you to build and evaluate complex computations from operands that can be in memory, on disk or on remote servers (C2Array) in a simple and very effective way.