User Defined Functions

Python-Blosc2 provides a powerful way to operate on NDArray objects (and other array flavors). In this section, we will see how to run computations on NDArray arrays using functions we define ourselves, known as user-defined functions (UDFs).

[1]:
import numba as nb
import numpy as np

import blosc2

A simple example

First, let’s create a NumPy array, which we will use to create and fill an NDArray.

[2]:
shape = (500, 1000)
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)

Now, let’s define our function. It can be executed for each block or chunk and always receives three parameters. The first is the inputs tuple, which can hold any operands, such as an NDArray, a NumPy array or a Python scalar. The second is the output buffer to be filled. The third is an offset giving the position within the whole array where the chunk or block being filled starts.

[3]:
def add_one(inputs_tuple, output, offset):
    x = inputs_tuple[0]
    output[:] = x + 1

As you can see, this function will take the first input, add one and save the result in output.
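To make the role of the three parameters more concrete, here is a minimal NumPy-only sketch of how a function with this signature could be driven chunk by chunk. The apply_chunkwise helper is hypothetical and is not how Python-Blosc2 works internally; splitting along rows only and modeling the offset as a tuple of start coordinates are assumptions made for illustration.

```python
import numpy as np


def add_one(inputs_tuple, output, offset):
    # Same UDF as above: first operand plus one, written into `output`.
    x = inputs_tuple[0]
    output[:] = x + 1


def apply_chunkwise(udf, inputs, shape, dtype, chunk_rows):
    """Hypothetical driver: call `udf` once per group of rows."""
    result = np.empty(shape, dtype=dtype)
    offsets = []
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        # Slice array operands to the current chunk; pass scalars through as-is.
        chunk_inputs = tuple(
            op[start:stop] if isinstance(op, np.ndarray) else op for op in inputs
        )
        # Assumed convention: start coordinates of this chunk in the full array.
        offset = (start, 0)
        # `result[start:stop]` is a view, so the UDF fills `result` in place.
        udf(chunk_inputs, result[start:stop], offset)
        offsets.append(offset)
    return result, offsets


shape = (6, 4)
npa = np.linspace(0, 1, int(np.prod(shape)), dtype=np.float32).reshape(shape)
result, offsets = apply_chunkwise(add_one, (npa,), shape, npa.dtype, chunk_rows=4)
print(offsets)
print(np.array_equal(result, npa + 1))
```

Each call only sees its own piece of the operands and of the output buffer, which is why the function writes with output[:] = ... instead of returning a value.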

Now, to actually create a LazyUDF, we will use the lazyudf constructor function, passing it the function, the inputs tuple and the dtype of the result.

[4]:
b = blosc2.lazyudf(add_one, (npa,), npa.dtype)
print(f"Class: {type(b)}")
Class: <class 'blosc2.lazyexpr.LazyUDF'>

Next, to execute your function and get the result, you can choose between the __getitem__ and compute methods. The main difference is that the first returns the computed result as a NumPy array, whereas the second returns an NDArray. Let’s see __getitem__ first.

[5]:
%%time
npc = b[...]
print(f"Class: {type(npc)}")
Class: <class 'numpy.ndarray'>
CPU times: user 6.25 ms, sys: 6.59 ms, total: 12.8 ms
Wall time: 8.49 ms

Now, let’s use compute for the same purpose. The advantage of this method is that you can pass construction parameters for the resulting NDArray, such as the urlpath to store the resulting array on disk.

[6]:
c = b.compute(urlpath="res.b2nd", mode="w")
print(f"Class: {type(c)}")
print(c.info)
Class: <class 'blosc2.ndarray.NDArray'>
type    : NDArray
shape   : (500, 1000)
chunks  : (500, 1000)
blocks  : (20, 1000)
dtype   : float32
cratio  : 23.13
cparams : {'blocksize': 80000,
 'clevel': 1,
 'codec': <Codec.ZSTD: 5>,
 'codec_meta': 0,
 'filters': [<Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.NOFILTER: 0>,
             <Filter.SHUFFLE: 1>],
 'filters_meta': [0, 0, 0, 0, 0, 0],
 'nthreads': 7,
 'splitmode': <SplitMode.ALWAYS_SPLIT: 1>,
 'typesize': 4,
 'use_dict': 0}
dparams : {'nthreads': 7}

Comparison with Numba

In this section, we will compare the performance of Python-Blosc2 with that of Numba. To do so, we will run the same function, this time compiled with Numba.

[11]:
@nb.jit(nopython=True, parallel=True)
def add_one(inputs_tuple, output, offset):
    x = inputs_tuple[0]
    output[:] = x + 1

[12]:
%%time
out = np.empty(c.shape, dtype=c.dtype)
add_one((npa,), out, 0)
CPU times: user 188 ms, sys: 4.06 ms, total: 192 ms
Wall time: 191 ms

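As you can see, in this run Python-Blosc2 was much faster than Numba (about 8.5 ms versus 191 ms of wall time). Keep in mind that the first call to a Numba-jitted function also includes its compilation time.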
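When reading timings like these, it can help to separate the first call of a function from later ones, since a jitted function may spend part of its first call compiling. The sketch below shows one possible way to do that with the standard library; the time_call helper is hypothetical, and it is applied here to the plain Python add_one so that it runs without Numba installed.

```python
import time

import numpy as np


def add_one(inputs_tuple, output, offset):
    # Plain Python version of the UDF used in this section.
    output[:] = inputs_tuple[0] + 1


def time_call(func, *args, repeats=5):
    """Hypothetical helper: return (first_call_seconds, best_later_seconds)."""
    t0 = time.perf_counter()
    func(*args)
    first = time.perf_counter() - t0
    later = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        later.append(time.perf_counter() - t0)
    return first, min(later)


shape = (500, 1000)
npa = np.linspace(0, 1, int(np.prod(shape)), dtype=np.float32).reshape(shape)
out = np.empty(shape, dtype=npa.dtype)

first, best = time_call(add_one, (npa,), out, 0)
print(f"first call: {first * 1e3:.2f} ms, best of later calls: {best * 1e3:.2f} ms")
```

Passing a jitted function to the same helper would report its first call and its later calls separately, which makes the two libraries easier to compare on equal terms.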

Summary

In this section, we have seen how to execute a user-defined function and get the result as a NumPy array or an NDArray. We have also seen that, in this example, the Python-Blosc2 LazyUDF was faster than the Numba version at producing the same result.