User Defined Functions#
In this section, we will see how to do computations with NDArray and/or NumPy arrays using functions defined by ourselves (aka User-Defined-Functions).
[1]:
import numba as nb
import numpy as np
import blosc2
A simple example#
First, let’s create a NDArray, a NumPy array and regular scalar, which we will use to create a LazyArray.
[2]:
shape = (5_000, 2_000)
a = np.linspace(0, 1, np.prod(shape), dtype=np.int32).reshape(shape)
b = blosc2.arange(np.prod(shape), dtype=np.float32, shape=shape)
s = 2.1 # a regular scalar
Now, let’s define our function. This function can be executed for each chunk and will always receive 3 parameters. The first one is the inputs tuple to which we can pass any operand such as a NDArray, NumPy array or Python scalar. The second is the output buffer to be filled and the third is an offset corresponding to the starting point inside the array of the chunk being filled.
[3]:
def myudf(inputs_tuple, output, offset):
x, y, s = inputs_tuple # at this point, all are either numpy arrays or scalars
output[:] = x**3 + np.sin(y) + s + 1
As you can see, this function will take the first input, add one and save the result in output.
Now, to actually create a LazyUDF
object (which also follows the LazyArray interface) we will use its constructor lazyudf
.
[4]:
larray = blosc2.lazyudf(myudf, (a, b, s), a.dtype)
print(f"Type: {type(larray)}")
Type: <class 'blosc2.lazyexpr.LazyUDF'>
Next, to execute and get the result of your function you can choose between the __getitem__
and compute
methods. The main difference is that the first one will return the computed result as a NumPy array whereas the second one will return a NDArray. Let’s see __getitem__
first.
[5]:
%%time
npc = larray[:]
print(f"Type: {type(npc)}")
Type: <class 'numpy.ndarray'>
CPU times: user 182 ms, sys: 43.3 ms, total: 226 ms
Wall time: 190 ms
Now, let’s use compute
for the same purpose. The advantage of using this method is that you can pass some construction parameters for the resulting NDArray like the urlpath
to store the resulting array on-disk.
[6]:
%%time
c = larray.compute(urlpath="larray.b2nd", mode="w")
print(f"Type: {type(c)}")
print(c.info)
Type: <class 'blosc2.ndarray.NDArray'>
type : NDArray
shape : (5000, 2000)
chunks : (500, 2000)
blocks : (20, 2000)
dtype : int32
cratio : 258.64
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
: nthreads=11, blocksize=160000, splitmode=<SplitMode.AUTO_SPLIT: 3>,
: filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
: <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
: 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=11)
CPU times: user 211 ms, sys: 38.7 ms, total: 249 ms
Wall time: 181 ms
Using Numba#
Let’s see how Python-Blosc2 can use Numba as an UDF. For this, let’s decorate the same function with Numba.
[7]:
@nb.jit(nopython=True, parallel=True)
def myudf_numba(inputs_tuple, output, offset):
x, y, s = inputs_tuple
output[:] = x**3 + np.sin(y) + s + 1
[8]:
larray2 = blosc2.lazyudf(myudf_numba, (a, b, s), a.dtype)
Cool! We made our first Numba UDF function. Now, let’s evaluate it.
[9]:
%%time
npc2 = larray2[:]
CPU times: user 996 ms, sys: 122 ms, total: 1.12 s
Wall time: 917 ms
Incidentally, the pure Python version was faster than Numba. This is because Numba has large initialization overheads and the function is quite simple. For more complex functions, or larger arrays, the difference will be less noticeable or favorable to it.
Exercise#
Check which array size Numba UDF starts to be competitive. If you master Numba enough, you may also want to unroll loops in UDF and see whether you can make it faster.
Summary#
We have seen how to build new LazyArray objects coming from other NDArray or NumPy objects and use User Defined Functions (UDFs) to create the desired result. We have also demonstrated that integrating Numba in UDF is pretty easy.