Using the client APIs
To follow these instructions, make sure that you have started test Caterva2 services (see Launching Caterva2 services).
The top level client API
Let’s try Caterva2’s top level client API (fully described in Top level API). Run your Python interpreter and enter:
import caterva2
roots = caterva2.get_roots()
We just connected to the default subscriber at localhost:8002 (you may specify a different one as an argument) and asked about all roots known by the broker. If you print roots you’ll see a dictionary with a foo entry:
{'foo': {'name': 'foo', 'http': 'localhost:8001', 'subscribed': None}}
Besides its name, the entry contains the address of the publisher providing it, and an indication that we’re not subscribed to it. Because we are not subscribed yet, getting a list of datasets in that root with caterva2.get_list('foo') will fail with 404 Not Found. So let’s try again by first subscribing to it:
caterva2.subscribe('foo')
datasets = caterva2.get_list('foo')
If you print datasets you’ll see a list of datasets in the foo root:
['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']
(If you repeat the call to caterva2.get_roots() you’ll see that foo has subscribed=True now.)
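The subscribe-then-list sequence above can be wrapped in a small helper. The following is a minimal sketch and not part of Caterva2: list_datasets is a hypothetical name, and the code assumes only the three calls shown above (get_roots(), subscribe() and get_list()) and that the subscribed field stays falsy until the root is subscribed. The client is passed in as an argument so that the function can be tried against a stand-in object without running services.

```python
def list_datasets(client, root):
    """Return the dataset names in `root`, subscribing first if needed.

    `client` is any object exposing get_roots(), subscribe() and get_list()
    as used in this section, e.g. the `caterva2` module itself.
    (Hypothetical helper, not provided by Caterva2.)
    """
    roots = client.get_roots()
    if root not in roots:
        raise KeyError(f"unknown root: {root!r}")
    # Assumption: 'subscribed' is None (or False) until subscribe() is called.
    if not roots[root].get('subscribed'):
        client.subscribe(root)
    return client.get_list(root)
```

With the test services running, list_datasets(caterva2, 'foo') should return the same list as the two separate calls above.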
We can get some information about a dataset without downloading it:
metadata = caterva2.get_info('foo/dir1/ds-2d.b2nd')
Note how we identify the dataset by using a slash / to concatenate the root name with the dataset name in that root (which may contain slashes itself). The metadata dictionary contains assorted dataset attributes:
{'dtype': 'uint16',
 'ndim': 2,
 'shape': [10, 20],
 # ...
 'schunk': {# ...
            'cparams': {'codec': 5, # ...
                        },
            # ...
            },
 # ...
 'size': 400}
So foo/dir1/ds-2d.b2nd is a 10x20 dataset of 16-bit unsigned integers. With caterva2.fetch() we can get the whole dataset, or just a part of it, as a NumPy array. To select a part, pass as the slice_ argument a string with the slice as you would write it between brackets:
caterva2.fetch('foo/dir1/ds-2d.b2nd', slice_='0:2, 4:8')
This returns just the requested slice:
array([[ 4,  5,  6,  7],
       [24, 25, 26, 27]], dtype=uint16)
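To make the correspondence between the slice_ string and ordinary bracket notation concrete, here is a minimal pure-Python sketch. It is only an illustration and says nothing about how Caterva2 itself parses the string: parse_slice is a hypothetical helper, and the stand-in data assumes that the dataset holds the values 0 to 199 in row-major order, which is what the output above suggests.

```python
def parse_slice(text):
    """Turn a string such as '0:2, 4:8' into a tuple of slice/int objects."""
    def parse_dim(dim):
        parts = dim.strip().split(':')
        if len(parts) == 1:
            return int(parts[0])  # a plain index, e.g. '3'
        # Empty fields mean "no bound", as in Python's own slicing.
        return slice(*(int(p) if p.strip() else None for p in parts))
    return tuple(parse_dim(dim) for dim in text.split(','))


# Stand-in for the 10x20 dataset (assumed to hold 0..199 in row-major order).
rows = [[r * 20 + c for c in range(20)] for r in range(10)]

row_sel, col_sel = parse_slice('0:2, 4:8')
selection = [row[col_sel] for row in rows[row_sel]]
print(selection)  # [[4, 5, 6, 7], [24, 25, 26, 27]]
```

The nested lists stand in for the array only to keep the sketch free of dependencies; with a NumPy array the parsed tuple could be used directly as an index.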
If the dataset is big and well compressed, and Blosc2 is available at the client, including the prefer_schunk=True argument may save resources when transferring data between subscriber and client.
Finally, you may want to save the whole dataset locally:
caterva2.download('foo/dir1/ds-2d.b2nd')
The call downloads the dataset as a file and returns its local path, PosixPath('foo/dir1/ds-2d.b2nd'), which should resemble the dataset name.
The object-oriented client API
The top level client API is simple but not very Pythonic. Fortunately, Caterva2 also provides a light and concise object-oriented client API (fully described in Root class, File class and Dataset class), similar to that of h5py.
First, let’s create a caterva2.Root instance for the foo root (using the default subscriber; remember to start your Caterva2 services first):
foo = caterva2.Root('foo')
This also takes care of subscribing to foo if it hasn’t been done yet. To get the list of datasets in the root, just access foo.node_list:
['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']
Indexing the caterva2.Root instance with the name of the dataset results in a caterva2.Dataset instance (or caterva2.File, as we’ll see below). The instance offers easy access to its metadata via the meta attribute:
ds2d = foo['dir1/ds-2d.b2nd']
ds2d.meta
We get the dataset metadata:
{'dtype': 'uint16',
 'ndim': 2,
 'shape': [10, 20],
 # ...
 'size': 400}
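The metadata fields are consistent with each other, which can be checked without contacting any service. The sketch below copies the values shown above into a plain dictionary and assumes that size is the uncompressed size in bytes; the source does not state this, but 400 matches 10 × 20 elements of 2 bytes each.

```python
import math

import numpy as np

# Metadata fields as shown above (elided entries left out).
meta = {'dtype': 'uint16', 'ndim': 2, 'shape': [10, 20], 'size': 400}

itemsize = np.dtype(meta['dtype']).itemsize   # bytes per element
expected = math.prod(meta['shape']) * itemsize

# The number of entries in 'shape' should equal 'ndim'.
assert len(meta['shape']) == meta['ndim']
# Assumption: 'size' is the uncompressed size in bytes.
print(itemsize, expected, expected == meta['size'])  # 2 400 True
```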
Getting data from the dataset is very concise, as caterva2.Dataset instances support slicing notation, so this expression:
ds2d[0:2, 4:8]
results in the same slice as the (much more verbose) caterva2.fetch() call in the previous section:
array([[ 4,  5,  6,  7],
       [24, 25, 26, 27]], dtype=uint16)
Slicing like this automatically uses Blosc2 for the transfer when available. Finally, you may download the whole dataset like this, which also returns the path of the resulting local file:
ds2d.download() # -> PosixPath('foo/dir1/ds-2d.b2nd')
On datasets and files
The type of instance that you get from indexing a caterva2.Root instance depends on the kind of the named dataset: for datasets whose name ends in .b2nd (n-dimensional Blosc2 array) or .b2frame (byte string in a Blosc2 frame) you’ll get a caterva2.Dataset, while otherwise you’ll get a caterva2.File (non-Blosc2 data). Both classes support the same operations. The difference shows in slicing: for Blosc2 frames and other files, slicing supports only one dimension and always returns a byte string:
type(ds2d[0:2, 4:8]) # -> <class 'numpy.ndarray'>
type(foo['ds-hello.b2frame'][:10]) # -> <class 'bytes'>
type(foo['README.md'][:10]) # -> <class 'bytes'>
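The naming rule described above can be sketched as a small function that predicts, from the name alone, which class indexing a root should yield. This is an illustration of the rule as stated in the text, not Caterva2’s own logic: expected_class and DATASET_SUFFIXES are hypothetical names, and the library may decide differently in cases the text does not cover.

```python
from pathlib import PurePosixPath

# Suffixes that the text associates with caterva2.Dataset.
DATASET_SUFFIXES = {'.b2nd', '.b2frame'}


def expected_class(name):
    """Name of the class that indexing a Root with `name` should yield."""
    suffix = PurePosixPath(name).suffix
    return 'Dataset' if suffix in DATASET_SUFFIXES else 'File'


# Dataset names of the foo root, as listed earlier.
names = ['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
         'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']
for name in names:
    print(f'{name}: {expected_class(name)}')
```

For the foo root this yields File only for README.md and Dataset for every other entry.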