<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts by Luke Shaw)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/authors/luke-shaw.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Fri, 08 May 2026 07:30:58 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title> Cumulative reductions in Blosc2</title><link>https://blosc.org/posts/cumsum/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;As mentioned in previous blog posts (see &lt;a class="reference external" href="https://ironarray.io/blog/array-api"&gt;this blog&lt;/a&gt;) the maintainers of &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; are going all-in on Array API integration. This means adding new functions to bring the library up to the standard. Of course, integrating a given function may be more or less difficult for a given library which aspires to compatibility, depending on legacy code, design principles, and the overarching philosophy of the package. Since &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; uses chunked arrays, handling reductions and mapping between local chunk- and global array-indexing can be tricky. We had some help from Yang Kang Chua at UConn with this functionality - many thanks to him!&lt;/p&gt;
&lt;section id="cumulative-reductions"&gt;
&lt;h2&gt;Cumulative reductions&lt;/h2&gt;
&lt;p&gt;Consider an array &lt;code class="docutils literal"&gt;a&lt;/code&gt; of shape &lt;code class="docutils literal"&gt;(1000, 2000, 3000)&lt;/code&gt; and data type &lt;code class="docutils literal"&gt;float64&lt;/code&gt; (more on numerical precision later). The result of &lt;code class="docutils literal"&gt;sum(a, axis=0)&lt;/code&gt; would have shape &lt;code class="docutils literal"&gt;(2000, 3000)&lt;/code&gt; and that of &lt;code class="docutils literal"&gt;sum(a, axis=1)&lt;/code&gt; would have shape &lt;code class="docutils literal"&gt;(1000, 3000)&lt;/code&gt;. In general we can say that reductions &lt;em&gt;reduce&lt;/em&gt; the sizes of arrays. On the other hand, cumulative reductions store the intermediate reduction results along the reduction axis, so that the shape of the result is always the same as that of the input array: the result of &lt;code class="docutils literal"&gt;cumulative_sum(a, axis=ax)&lt;/code&gt; always has shape &lt;code class="docutils literal"&gt;(1000, 2000, 3000)&lt;/code&gt; for any (valid) value of &lt;code class="docutils literal"&gt;ax&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This has a couple of consequences. One is that memory consumption may be substantial: the array &lt;code class="docutils literal"&gt;a&lt;/code&gt; will occupy &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;math.prod((1000,&lt;/span&gt; 2000, &lt;span class="pre"&gt;3000))*8/(1024**3)&lt;/span&gt; = 44.7GB&lt;/code&gt;, but its sum along the first axis only &lt;code class="docutils literal"&gt;.0447GB&lt;/code&gt;. Thus we can easily store the final result of the sum in memory. Not so for the result of &lt;code class="docutils literal"&gt;cumulative_sum&lt;/code&gt;, which also occupies &lt;code class="docutils literal"&gt;44.7GB&lt;/code&gt;!&lt;/p&gt;
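&lt;p&gt;The shape and memory arithmetic above can be checked with a small NumPy sketch. The array here is deliberately tiny (&lt;code class="docutils literal"&gt;(10, 20, 30)&lt;/code&gt; is an illustrative stand-in, not the array from the text), and the last lines simply repeat the size calculation for the full-size shape:&lt;/p&gt;
&lt;pre class="literal-block"&gt;import math

import numpy as np

# Small stand-in array (illustrative shape, chosen to fit in memory)
a = np.ones((10, 20, 30), dtype=np.float64)

reduced = np.sum(a, axis=0)     # reduction: the summed axis disappears
running = np.cumsum(a, axis=0)  # cumulative reduction: shape is preserved

print(reduced.shape, reduced.nbytes)  # (20, 30) 4800
print(running.shape, running.nbytes)  # (10, 20, 30) 48000

# The same arithmetic for the shape discussed in the text (8 bytes per float64)
full_shape = (1000, 2000, 3000)
input_gb = math.prod(full_shape) * 8 / 1024**3
reduced_gb = math.prod(full_shape[1:]) * 8 / 1024**3
print(round(input_gb, 1), round(reduced_gb, 4))  # 44.7 0.0447&lt;/pre&gt;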
&lt;p&gt;The second consequence, for chunked array libraries, is that the order in which one loads chunks and calculates the result matters. Consider the following diagram, where we have a 1D array of three elements. To calculate the final sum, we may load the chunks in any order and do not require access to any previous value except the running total - loading the first, third and finally second chunks, we obtain the correct sum of 4. However, for the cumulative sum, each element of the result depends on the previous element (and from there the sum of all prior elements of the array). Consequently, we must ensure we load the chunks according to their order in memory - if not, we will end up with an incorrect final result. A minimal criterion is that the final element of the cumulative sum should be the same as the sum, which is not the case here!&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/ordermatters.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/ordermatters.png" style="width: 50%;"&gt;
&lt;/section&gt;
&lt;section id="consequences-for-numerical-precision"&gt;
&lt;h2&gt;Consequences for numerical precision&lt;/h2&gt;
&lt;p&gt;When calculating reductions, numerical precision is a common hiccup. For products, one can quickly overflow the data type - the product of &lt;code class="docutils literal"&gt;arange(1, 14)&lt;/code&gt; already overflows the maximum value of &lt;code class="docutils literal"&gt;int32&lt;/code&gt;. For sums, rounding errors incurred by adding small elements to a large running total can quickly become significant. For this reason, NumPy will try to use pairwise summation to calculate &lt;code class="docutils literal"&gt;sum(a)&lt;/code&gt; - this involves breaking the array into small parts, calculating the sum on each small part (i.e. simply successively adding elements to a running total), and then recursively summing pairs of sums until the final result is reached. Each recursive sum operation thus involves the sum of two numbers of similar size, reducing the rounding errors incurred when summing disparate numbers. This algorithm also has only a minimal additional overhead compared to the naive approach and is eminently parallelisable. And it has a natural recursive implementation, something which computer scientists always find appealing even if only for aesthetic reasons!&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/pairwise_sum.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/pairwise_sum.png" style="width: 50%;"&gt;
&lt;p&gt;Unfortunately, such an approach is not possible for cumulative sums since, as discussed above, order matters! One possibility is to use Kahan summation (the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Kahan_summation_algorithm"&gt;Wikipedia article is excellent&lt;/a&gt;), which does have additional costs (both in terms of FLOPS and memory consumption) although these are not prohibitive. One essentially keeps track of the rounding errors incurred with an auxiliary running total and uses this to correct the sum:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-1" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-1" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Kahan summation algorithm&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-2" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-2" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-3" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-3" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-4" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-4" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-4"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-5" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-5" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;corrected_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="c1"&gt;# nudge el with accumulated lost digits&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-6" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-6" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;corrected_el&lt;/span&gt; &lt;span class="c1"&gt;# lose last few digits of el&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-7" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-7" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;corrected_el&lt;/span&gt;  &lt;span class="c1"&gt;# store the lost digits of el&lt;/span&gt;
&lt;a id="rest_code_bc8b2b8623444a73817a0190eb8daf28-8" name="rest_code_bc8b2b8623444a73817a0190eb8daf28-8" href="https://blosc.org/posts/cumsum/#rest_code_bc8b2b8623444a73817a0190eb8daf28-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
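&lt;p&gt;To see the compensation at work, the loop can be wrapped in a function and compared with a naive running total. This is a self-contained sketch; the test data (one large value followed by many tiny ones) is chosen purely for illustration:&lt;/p&gt;
&lt;pre class="literal-block"&gt;import math


def naive_sum(values):
    tot = 0.0
    for el in values:
        tot += el
    return tot


def kahan_sum(values):
    tot = 0.0
    tracker = 0.0  # running estimate of the digits lost so far
    for el in values:
        corrected_el = el - tracker             # nudge el with accumulated lost digits
        temp = tot + corrected_el               # lose last few digits of el
        tracker = (temp - tot) - corrected_el   # store the lost digits of el
        tot = temp
    return tot


# One large value followed by many values too small to register individually
values = [1.0] + [1e-16] * 10_000

print(naive_sum(values))  # 1.0: every small addend is rounded away
print(kahan_sum(values))
print(math.fsum(values))  # correctly rounded reference&lt;/pre&gt;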
&lt;p&gt;In implementation, we calculate the cumulative sum on a decompressed chunk in order and then carry forward the last element of the cumulative sum (i.e. the sum of the whole chunk) to the next chunk, incrementing the result of the cumulative sum by this carried-over value to give the &lt;em&gt;global&lt;/em&gt; cumulative sum. Thus, we can use Kahan summation between the small(er) values of the local chunk cumulative sum and the large(r) carried-forward running total to try and conserve precision.&lt;/p&gt;
&lt;p&gt;Unfortunately, we still observe discrepancies with respect to the NumPy implementation of cumulative sum (which essentially sums element by element) - but this also differs from the results of &lt;code class="docutils literal"&gt;np.sum&lt;/code&gt; due to the latter's use of pairwise summation! Finite-precision arithmetic imposes an insuperable barrier: three different algorithms cannot guarantee agreement in every possible case. Since the Kahan sum approach has a slight overhead, we decided to junk it, as it did not improve precision sufficiently to justify its use.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="experiments"&gt;
&lt;h2&gt;Experiments&lt;/h2&gt;
&lt;p&gt;We performed some experiments comparing the new &lt;code class="docutils literal"&gt;blosc2.cumulative_sum&lt;/code&gt; function to NumPy's version for some large arrays (of size &lt;code class="docutils literal"&gt;(N, N, N)&lt;/code&gt; for various values of &lt;code class="docutils literal"&gt;N&lt;/code&gt;). Since the working set is double the size of the input array (input + output), we expect to see significant benefits from Blosc2 compression and exploitation of caching. Indeed, once the working set size starts to approach the available RAM (32 GB), NumPy begins to slow down rapidly, and when the working set exceeds memory and swap must be used, NumPy becomes vastly slower.&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/cumsumbench.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/cumsumbench.png" style="width: 50%;"&gt;
&lt;p&gt;The plot shows the average computation time for &lt;code class="docutils literal"&gt;cumulative_sum&lt;/code&gt; over the three different axes of the input array. The benchmark code may be found &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/cumsum_bench.py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Blosc2 achieves superior compression and enables computation on larger datasets by tightly integrating compression and computation and interleaving I/O and computation. The returns on such an approach are clear in an era of &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;increasingly expensive RAM&lt;/a&gt; and thus increasingly desirable memory efficiency. As an array library catering in a unique way to this growing need, bringing Blosc2 into greater alignment with the interlibrary array API standard is of utmost importance to ease its integration into users' workflows and applications. We are thus especially pleased that the performance of the freshly-implemented cumulative reduction operations mandated by the Array API standard only underlines the validity of chunkwise operations.&lt;/p&gt;
&lt;p&gt;The Blosc team isn't resting on our laurels either, as we continue to optimise the existing framework to accelerate computations further. The recent introduction of the &lt;code class="docutils literal"&gt;miniexpr&lt;/code&gt; library into the backend is the capstone to these efforts, and has made the compression/computation integration truly seamless, &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;bringing incredible speedups for memory-bound computations&lt;/a&gt;, justifying Blosc2's compression-first, cache-aware philosophy. This all allows Blosc2 to handle significantly larger working sets than other solutions, delivering high performance for both in-memory and on-disk datasets, even exceeding available RAM.&lt;/p&gt;
&lt;p&gt;If you find our work useful and valuable, we would be grateful if you could support us by &lt;a class="reference external" href="https://www.blosc.org/pages/donate/"&gt;making a donation&lt;/a&gt;. Your contribution will help us continue to develop and improve Blosc packages, making them more accessible and useful for everyone.  Our team is committed to creating high-quality and efficient software, and your support will help us to achieve this goal.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc array-api reductions computation</category><guid>https://blosc.org/posts/cumsum/</guid><pubDate>Mon, 16 Feb 2026 10:32:20 GMT</pubDate></item><item><title> OpenZL Plugin for Blosc2</title><link>https://blosc.org/posts/openzl-plugin/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;Blosc's philosophy of meta-compression is incredibly powerful - one is able to compose pipelines to optimally compress data (for speed or compression ratio), store information about the pipeline alongside the data in metadata, and then rely on a generic decompressor to read this and reverse the pipeline. The OpenZL team share our belief in the validity of this approach and have designed &lt;a class="reference external" href="https://openzl.org/"&gt;a graph-based formalisation with extensive support for all kinds of compression pipelines&lt;/a&gt; for all kinds of data.&lt;/p&gt;
&lt;p&gt;However, Blosc2 is now much more than just a compression library - it offers comprehensive indexing support (including fancy indexing via the python-blosc2 interface) as well as an increasingly rapid compute engine (see &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;this blog!&lt;/a&gt;). What if we could marry the incredibly comprehensive compression coverage of OpenZL with Blosc2's extended array manipulation functionality?&lt;/p&gt;
&lt;p&gt;Foreseeing precisely this sort of challenge, prior Blosc2 developers implemented a dynamic plugin register functionality (loading the plugin in C-Blosc2, which can be called via Python-Blosc2). This means that with some unintrusive, relatively concise interface code, one can link Blosc2 and OpenZL at runtime (without substantially modifying either) and offer Blosc2 arrays compressed and decompressed with OpenZL.&lt;/p&gt;
&lt;section id="the-openzl-plugin"&gt;
&lt;h2&gt;The OpenZL plugin&lt;/h2&gt;
&lt;p&gt;The source code for the plugin can be found &lt;a class="reference external" href="https://github.com/Blosc/blosc2-openzl"&gt;here&lt;/a&gt;. The minimal skeleton for the plugin layout is as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;├── CMakeLists.txt
├── blosc2_openzl
│   └── __init__.py
├── pyproject.toml
├── requirements-build.txt
└── src
    ├── CMakeLists.txt
    ├── blosc2_openzl.c
    └── blosc2_openzl.h&lt;/pre&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;blosc2_openzl.c&lt;/code&gt; must implement an encoder and decoder which are exported via an &lt;code class="docutils literal"&gt;info&lt;/code&gt; struct:&lt;/p&gt;
&lt;pre class="literal-block"&gt;#include "blosc2_openzl.h"

BLOSC2_OPENZL_EXPORT codec_info info = {
    .encoder=(char *)"blosc2_openzl_encoder",
    .decoder=(char *)"blosc2_openzl_decoder"
};

int blosc2_openzl_encoder(const uint8_t* src, uint8_t* dest,
                                  int32_t size, uint8_t meta,
                                  blosc2_cparams *cparams, uint8_t id) {
  // code
}


int blosc2_openzl_decoder(const uint8_t *input, int32_t input_len, uint8_t *output,
                            int32_t output_len, uint8_t meta, blosc2_dparams *dparams,
                            const void *chunk) {
  // code
}&lt;/pre&gt;
&lt;p&gt;The header &lt;code class="docutils literal"&gt;blosc2_openzl.h&lt;/code&gt; then makes the &lt;code class="docutils literal"&gt;info&lt;/code&gt; and &lt;code class="docutils literal"&gt;encoder/decoder&lt;/code&gt; functions available to Blosc2:&lt;/p&gt;
&lt;pre class="literal-block"&gt;#include "blosc2.h"
#include "blosc2/codecs-registry.h"
#include "openzl/openzl.h"

BLOSC2_OPENZL_EXPORT int blosc2_openzl_encoder(...);

BLOSC2_OPENZL_EXPORT int blosc2_openzl_decoder(...);

// Declare the info struct as extern
extern BLOSC2_OPENZL_EXPORT codec_info info;&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="pep-427-and-wheel-structure"&gt;
&lt;h2&gt;PEP 427 and wheel structure&lt;/h2&gt;
&lt;p&gt;In order for the plugin to dynamically link to Blosc2, it has to be able to find the Blosc2 library at runtime. This has historically been quite finicky since different platforms and package managers may store Python packages (and the associated &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;.so/.dylib/.dll&lt;/span&gt;&lt;/code&gt; library objects) differently. Consequently, PEP 427 recommends distributing the Python wheels for packages which depend on compiled objects such as Python-Blosc2 in the following way:&lt;/p&gt;
&lt;pre class="literal-block"&gt;blosc2
  ├── __init__.py
  ├── lib
  │   ├── libblosc2.so
  │   ├── cmake
  │   └── pkgconfig
  └── include
      └── blosc2.h&lt;/pre&gt;
&lt;p&gt;Finding the necessary &lt;code class="docutils literal"&gt;libblosc2.so&lt;/code&gt; object from the top-level &lt;code class="docutils literal"&gt;CMakeLists.txt&lt;/code&gt; file for the plugin is then as easy as:&lt;/p&gt;
&lt;pre class="literal-block"&gt;# Find blosc2 package location using Python
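execute_process(
    COMMAND "${Python_EXECUTABLE}" -c "import blosc2, pathlib; print(pathlib.Path(blosc2.__file__).parent)"
    OUTPUT_VARIABLE BLOSC2_PACKAGE_DIR
    # print() appends a newline; strip it so the paths below are well-formed
    OUTPUT_STRIP_TRAILING_WHITESPACE
)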
set(BLOSC2_INCLUDE_DIR "${BLOSC2_PACKAGE_DIR}/include")
set(BLOSC2_LIB_DIR "${BLOSC2_PACKAGE_DIR}/lib")&lt;/pre&gt;
&lt;p&gt;After building the plugin backend in &lt;code class="docutils literal"&gt;src/CMakeLists.txt&lt;/code&gt; one simply links the plugin to the backend (in this case &lt;code class="docutils literal"&gt;openzl&lt;/code&gt;) and installs like so:&lt;/p&gt;
&lt;pre class="literal-block"&gt;add_library(blosc2_openzl SHARED blosc2_openzl.c)
target_include_directories(blosc2_openzl PUBLIC ${BLOSC2_INCLUDE_DIR})
target_link_libraries(blosc2_openzl ${OPENZL_TARGET})
# Install
install(TARGETS blosc2_openzl
    RUNTIME DESTINATION blosc2_openzl
    LIBRARY DESTINATION blosc2_openzl
)&lt;/pre&gt;
&lt;p&gt;Note that it is not necessary to link &lt;code class="docutils literal"&gt;blosc2_openzl&lt;/code&gt; and &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; in &lt;code class="docutils literal"&gt;target_link_libraries&lt;/code&gt; as the former depends only on macros and structs defined in header files - and not functions. This makes the &lt;code class="docutils literal"&gt;libblosc2_openzl.so&lt;/code&gt; object especially light and robust, as blosc2 is not registered as an explicit dependency. In fact on Linux, even if the &lt;code class="docutils literal"&gt;blosc2_openzl.c&lt;/code&gt; were to include blosc2 functions, it is still not necessary to perform such linking!&lt;/p&gt;
&lt;p&gt;Following PEP 427 also allows one to add a safeguard in case the plugin fails to find blosc2, by setting the &lt;code class="docutils literal"&gt;INSTALL_RPATH&lt;/code&gt; property on the installed object:&lt;/p&gt;
&lt;pre class="literal-block"&gt;set_target_properties(blosc2_openzl PROPERTIES
    INSTALL_RPATH "$ORIGIN/../blosc2/lib"
)&lt;/pre&gt;
&lt;p&gt;It also allows one to easily find the plugin &lt;code class="docutils literal"&gt;.so&lt;/code&gt; object when calling via python - in the &lt;code class="docutils literal"&gt;blosc2_openzl/__init__.py&lt;/code&gt; file one can find the library path as easily as &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;os.path.abspath(Path(__file__).parent&lt;/span&gt; / libname)&lt;/code&gt; where &lt;code class="docutils literal"&gt;libname&lt;/code&gt; is the desired &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;.so/.dylib/.dll&lt;/span&gt;&lt;/code&gt; object (depending on platform). All these benefits have led us to update the wheel structure for &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; in the latest 4.0 release.&lt;/p&gt;
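&lt;p&gt;A minimal sketch of that lookup might look as follows. The helper name &lt;code class="docutils literal"&gt;plugin_library_path&lt;/code&gt; and the per-platform file names are assumptions for illustration (only &lt;code class="docutils literal"&gt;libblosc2_openzl.so&lt;/code&gt; is named above), so the real &lt;code class="docutils literal"&gt;__init__.py&lt;/code&gt; may well differ:&lt;/p&gt;
&lt;pre class="literal-block"&gt;import os
import platform
from pathlib import Path

# Assumed file names per platform; only the Linux name appears in the text
LIB_NAMES = {
    "Linux": "libblosc2_openzl.so",
    "Darwin": "libblosc2_openzl.dylib",
    "Windows": "blosc2_openzl.dll",
}


def plugin_library_path(package_dir, system=None):
    """Return the absolute path of the plugin library inside package_dir."""
    system = system or platform.system()
    libname = LIB_NAMES[system]
    return os.path.abspath(Path(package_dir) / libname)


# Inside the package one would pass Path(__file__).parent as package_dir
print(plugin_library_path("blosc2_openzl", system="Linux"))&lt;/pre&gt;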
&lt;/section&gt;
&lt;section id="using-openzl-from-python"&gt;
&lt;h2&gt;Using OpenZL from Python&lt;/h2&gt;
&lt;p&gt;Installing is then as simple as:&lt;/p&gt;
&lt;pre class="literal-block"&gt;pip install blosc2_openzl&lt;/pre&gt;
&lt;p&gt;One can also download the project and use the &lt;code class="docutils literal"&gt;cmake&lt;/code&gt; and &lt;code class="docutils literal"&gt;cmake &lt;span class="pre"&gt;--build&lt;/span&gt;&lt;/code&gt; commands to compile C-level tests or examples. But let's get compressing with &lt;code class="docutils literal"&gt;python&lt;/code&gt; straight away:&lt;/p&gt;
&lt;pre class="literal-block"&gt;import blosc2
import numpy as np
import blosc2_openzl
from blosc2_openzl import OpenZLProfile as OZLP
prof = OZLP.OZLPROF_SH_BD_LZ4
# Define the compression parameters for Blosc2
cparams = {'codec': blosc2.Codec.OPENZL, 'codec_meta': prof.value}

# Create (uncompressed) array
np_array = np.arange(1000).reshape((10,100))

# Compression with the OpenZL codec
bl_array = blosc2.asarray(np_array, cparams=cparams)
print(bl_array.cratio) # print compression ratio
&amp;gt;&amp;gt; 25.078369905956112&lt;/pre&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;The &lt;code class="docutils literal"&gt;OpenZLProfile&lt;/code&gt; enum contains the available profile pipelines that have been implemented in the plugin, which use the &lt;code class="docutils literal"&gt;codec_meta&lt;/code&gt; field (an 8-bit integer) to specify the desired transformation via codecs, filters and other nodes for the compression graph. Starting from the Least-Significant-Bit (LSB), setting the bits tells OpenZL how to build the graph:&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;CODEC | SHUFFLE | DELTA | SPLIT | CRC | x | x | x |&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CODEC - If set, use LZ4. Else ZSTD.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SHUFFLE - If set, use shuffle (outputs a stream for every byte of input data typesize)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DELTA - If set, apply a bytedelta (to all streams if necessary)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SPLIT - If set, do not recombine the byte streams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRC - If set, store a checksum during compression and check it during decompression&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The remaining bits may be used in the future.&lt;/p&gt;
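&lt;p&gt;The bit layout above can be modelled with a small flag enum. Note that this is a hypothetical illustration of the layout only: &lt;code class="docutils literal"&gt;MetaBit&lt;/code&gt; and &lt;code class="docutils literal"&gt;describe&lt;/code&gt; are made-up names, not part of &lt;code class="docutils literal"&gt;blosc2_openzl&lt;/code&gt;, and in practice one should use the members of &lt;code class="docutils literal"&gt;OpenZLProfile&lt;/code&gt; rather than assembling values by hand.&lt;/p&gt;
&lt;pre class="literal-block"&gt;from enum import IntFlag


class MetaBit(IntFlag):
    """Hypothetical names for the codec_meta bits, starting from the LSB."""
    CODEC = 1     # bit 0: LZ4 if set, else ZSTD
    SHUFFLE = 2   # bit 1: shuffle
    DELTA = 4     # bit 2: bytedelta
    SPLIT = 8     # bit 3: do not recombine the byte streams
    CRC = 16      # bit 4: checksum


def describe(meta):
    """Decode an 8-bit codec_meta value into a readable dictionary."""
    flags = MetaBit(meta)
    return {
        "codec": "LZ4" if MetaBit.CODEC in flags else "ZSTD",
        "shuffle": MetaBit.SHUFFLE in flags,
        "delta": MetaBit.DELTA in flags,
        "split": MetaBit.SPLIT in flags,
        "crc": MetaBit.CRC in flags,
    }


meta = MetaBit.CODEC | MetaBit.SHUFFLE | MetaBit.DELTA
print(int(meta))       # 7
print(describe(meta))
print(describe(0))     # no bits set: ZSTD with no extra nodes&lt;/pre&gt;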
&lt;p&gt;In the future it would be great to further expand the OpenZL functionalities that we can offer via the plugin, such as bespoke transformers trained via machine learning techniques - see the OpenZL page for a flavour of what can be done with the (still evolving) library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;C-Blosc2's ability to support dynamically loaded plugins allows the library to grow in features without increasing the size and complexity of the library itself. For more information about user-defined plugins, refer to this &lt;a class="reference external" href="https://www.blosc.org/posts/registering-plugins/"&gt;blog entry&lt;/a&gt;. We have put this to work to offer linkage with the rather complex OpenZL library with a relatively rapid turnaround from design to prototype to full release in around a month. This is thanks to prior hard work by open source contributors from Blosc but naturally also OpenZL - many thanks to all!&lt;/p&gt;
&lt;p&gt;If you find our work useful and valuable, we would be grateful if you could support us by &lt;a class="reference external" href="https://www.blosc.org/pages/donate/"&gt;making a donation&lt;/a&gt;. Your contribution will help us continue to develop and improve Blosc packages, making them more accessible and useful for everyone.  Our team is committed to creating high-quality and efficient software, and your support will help us to achieve this goal.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc plugins codecs openzl</category><guid>https://blosc.org/posts/openzl-plugin/</guid><pubDate>Fri, 30 Jan 2026 10:32:20 GMT</pubDate></item><item><title>Blosc2 Gets Fancy (Indexing)</title><link>https://blosc.org/posts/blosc2-fancy-indexing/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update (2025-08-26)&lt;/strong&gt;: After some further effort, the 1D fast path mentioned below has been extended to the multidimensional case, with consequent speedups in Blosc2 3.7.3! See below plot comparing maximum and minimum indexing times for the Blosc2-supported fancy indexing cases mentioned below.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/newfancybench.png" src="https://blosc.org/images/blosc2-fancy-indexing/newfancybench.png"&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;In response to requests from our users, the Blosc2 team has &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/release_notes/index.html"&gt;introduced a fancy indexing capability&lt;/a&gt; into the flagship Blosc2 &lt;code class="docutils literal"&gt;NDArray&lt;/code&gt; object. In the future, this could be extended to other classes within the Blosc2 library, such as &lt;code class="docutils literal"&gt;C2Array&lt;/code&gt; and &lt;code class="docutils literal"&gt;LazyArray&lt;/code&gt;.&lt;/p&gt;
&lt;section id="what-is-fancy-indexing"&gt;
&lt;h2&gt;What is Fancy Indexing?&lt;/h2&gt;
&lt;p&gt;In many array libraries, most famously &lt;code class="docutils literal"&gt;NumPy&lt;/code&gt;, &lt;em&gt;fancy indexing&lt;/em&gt; refers to a vectorized indexing format which allows for simultaneous selection and reshaping of arrays (see &lt;a class="reference external" href="https://jakevdp.github.io/PythonDataScienceHandbook/02.07-fancy-indexing.html"&gt;this excerpt&lt;/a&gt;). For example, one may wish to select three entries from a 1D array:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr = array([10, 11, 12])&lt;/pre&gt;
&lt;p&gt;which can be done like so:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[1,2,1]]
&amp;gt;&amp;gt; array([11, 12, 11])&lt;/pre&gt;
&lt;p&gt;Note that the order of the indices is arbitrary (i.e. the elements of the output may occur in a different order to the original array) and indices may be repeated. Moreover, if the array is multidimensional, for example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr = array([[10, 11],
             [12, 13],
             [14, 15]])&lt;/pre&gt;
&lt;p&gt;then the output consists of the relevant rows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[1,2,0]]
&amp;gt;&amp;gt; array([[12, 13],
          [14, 15],
          [10, 11]])&lt;/pre&gt;
&lt;p&gt;and so on for arbitrary numbers of dimensions.&lt;/p&gt;
&lt;p&gt;Indeed one can output arbitrary shapes, for example via:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[[1,2],[0,1]]]
&amp;gt;&amp;gt; array([[[12, 13],
          [14, 15]],

         [[10, 11],
          [12, 13]]])&lt;/pre&gt;
&lt;p&gt;NumPy supports many different kinds of fancy indexing, a flavour of which can be seen from the following examples, where &lt;code class="docutils literal"&gt;row&lt;/code&gt; and &lt;code class="docutils literal"&gt;col&lt;/code&gt; are integer array objects. If they are not of the same shape then broadcasting conventions will be applied to try to massage the index into an understandable format.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[row]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[[row,&lt;/span&gt; col]]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[row, col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[row[:,&lt;/span&gt; None], col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[1, col]&lt;/code&gt; or &lt;code class="docutils literal"&gt;arr[1:9, col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In addition, one may use a boolean mask, in combination with integer indices, slices, or integer arrays via&lt;/p&gt;
&lt;ol class="arabic simple" start="6"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[row[:,&lt;/span&gt; None], mask]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;where the &lt;code class="docutils literal"&gt;mask&lt;/code&gt; must have the same length as the indexed dimension(s).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="support-for-fancy-indexing-and-ndindex"&gt;
&lt;h2&gt;Support for Fancy Indexing and &lt;code class="docutils literal"&gt;ndindex&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Other libraries for management of large arrays such as &lt;code class="docutils literal"&gt;zarr&lt;/code&gt; and &lt;code class="docutils literal"&gt;h5py&lt;/code&gt; offer fancy indexing support but neither is as comprehensive as NumPy. &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;, which uses the HDF5 format, is quite limited in that one may only use one integer array, no repeated indices are allowed, and the array must be sorted in increasing order, although mixed slice and integer array indexing is possible.
&lt;code class="docutils literal"&gt;zarr&lt;/code&gt;, via its &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (for vectorized index), offers more support, but is rather limited when it comes to mixed indexing, as slices may not be used with integer arrays, and an integer array must be provided for every dimension of the array (i.e. &lt;code class="docutils literal"&gt;arr[row]&lt;/code&gt; fails on any non-1D &lt;code class="docutils literal"&gt;arr&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;This makes it difficult (in the case of &lt;code class="docutils literal"&gt;zarr&lt;/code&gt;) or impossible (in the case of &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;) to do the kind of reshaping we saw in the introduction (i.e. case 2 above, &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[[[1,2],[0,1]]]&lt;/span&gt;&lt;/code&gt;). This lack of support is due to a combination of: 1) the computational difficulty of many of these operations; and 2) the sometimes counter-intuitive behaviour of fancy indexing (see the end of this blog post for more details).&lt;/p&gt;
&lt;p&gt;When implementing fancy indexing for Blosc2 we strove to match the functionality of NumPy as closely as possible, and we have almost been able to do so: all six cases mentioned above are perfectly feasible with this new Blosc2 release! Only some minor edge cases are not supported (see Example 2 in the Addendum). This would not have been possible without the excellent &lt;a class="reference external" href="https://quansight-labs.github.io/ndindex/index.html"&gt;ndindex library&lt;/a&gt;, which offers many very useful, efficient functions for index conversion between different shapes and chunks. We can then call NumPy behind the scenes, chunk by chunk, and exploit its native support for fancy indexing, without having to load the entire array into memory.&lt;/p&gt;
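&lt;p&gt;To give a flavour of the chunk-by-chunk idea, here is a rough sketch in plain NumPy. It is not the Blosc2 implementation and does not use ndindex: the function &lt;code class="docutils literal"&gt;chunked_take&lt;/code&gt; and its &lt;code class="docutils literal"&gt;chunk_len&lt;/code&gt; parameter are hypothetical names, it only handles a 1D array of non-negative integers indexing the first axis, and a plain slice stands in for reading one chunk from storage.&lt;/p&gt;

```python
import numpy as np

def chunked_take(arr, idx, chunk_len):
    """Compute arr[idx] along axis 0 while touching one chunk at a time.

    Hypothetical helper for illustration; assumes non-negative indices.
    """
    idx = np.asarray(idx)
    out = np.empty((idx.size,) + arr.shape[1:], dtype=arr.dtype)
    chunk_id = idx // chunk_len              # chunk holding each requested row
    for c in np.unique(chunk_id):            # visit only the chunks that are needed
        sel = np.nonzero(chunk_id == c)[0]   # output positions served by chunk c
        start = int(c) * chunk_len
        chunk = arr[start:start + chunk_len]  # stand-in for loading a single chunk
        out[sel] = chunk[idx[sel] - start]    # global index converted to local index
    return out

arr = np.arange(40).reshape(10, 4)
idx = [7, 0, 7, 3]                            # unsorted, with a repeated index
result = chunked_take(arr, idx, chunk_len=3)
print(np.array_equal(result, arr[idx]))       # True
```

&lt;p&gt;The local fancy indexing inside each chunk is delegated to NumPy, so repeated and unsorted indices come for free; the only extra work is the bookkeeping between global and chunk-local positions.&lt;/p&gt;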
&lt;/section&gt;
&lt;section id="results-blosc2-zarr-h5py-and-numpy"&gt;
&lt;h2&gt;Results: Blosc2, Zarr, H5Py and NumPy&lt;/h2&gt;
&lt;p&gt;When averaging over the indexing cases above on 2D arrays of varying sizes, we observe only a minor slowdown for Blosc2 compared to NumPy when the array size is small compared to total memory (24 GB), suggesting a small chunking-and-indexing overhead. As expected, when the array grows to an appreciable fraction of memory (16 GB), loading the full NumPy array into memory starts to impact performance. The black error bars in the plots indicate the maximum and minimum times observed over the indexing cases (for which there is clearly a large variation).&lt;/p&gt;
&lt;p&gt;Note that for cases 4 and 6 with large &lt;code class="docutils literal"&gt;row&lt;/code&gt; or &lt;code class="docutils literal"&gt;col&lt;/code&gt; index arrays, broadcasting causes the resulting index (stored in memory) to be very large, and even for array sizes of 2GB computation is too slow. In the future, we would like to see if this can be improved.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc22D.png" src="https://blosc.org/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc22D.png"&gt;
&lt;p&gt;Blosc2 is also as fast as or faster than Zarr and HDF5, even for the limited use cases that the latter two libraries both support. HDF5 in particular is especially slow when the indexing array is very large.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc2ZarrHDF52D.png" src="https://blosc.org/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc2ZarrHDF52D.png"&gt;
&lt;p&gt;These plots were generated on a Mac mini with an Apple M4 Pro processor. The benchmark is available in the Blosc2 GitHub repo &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/fancy_index.py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Blosc2 offers powerful and flexible fancy indexing functionality that is more extensive than that of Zarr and H5Py, while also being able to handle large arrays on disk without loading them into memory. This makes it a great choice for applications that require complex indexing operations on large datasets.
Give it a try in your own projects! If you have questions, the Blosc2 community is here to help.&lt;/p&gt;
&lt;p&gt;If you appreciate what we're doing with Blosc2, please think about &lt;a class="reference external" href="https://www.blosc.org/pages/blosc-in-depth/#support-blosc/"&gt;supporting us&lt;/a&gt;. Your help lets us keep making these tools better.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="addendum-oindex-vindex-and-fancyindex-via-two-examples"&gt;
&lt;h2&gt;Addendum: Oindex, Vindex and FancyIndex via Two Examples&lt;/h2&gt;
&lt;p&gt;Zarr's implementation of fancy indexing is packaged as &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (vectorized indexing). It also offers another indexing functionality, called orthogonal indexing, via &lt;code class="docutils literal"&gt;oindex&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The reason for this dual support becomes clear when one considers a simple example.&lt;/p&gt;
&lt;section id="example-1"&gt;
&lt;h3&gt;Example 1&lt;/h3&gt;
&lt;p&gt;For a 2D array, we have seen that the fancy-indexing rules will cause the two index arrays below to be broadcast together:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[0, 1], [2, 3]] -&amp;gt; [arr[0,2], arr[1,3]]&lt;/pre&gt;
&lt;p&gt;giving an output of shape (2,), i.e. two elements. This is &lt;em&gt;vindexing&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;However, one could instead understand this indexing as selecting rows 0 and 1 of the array, and then their intersection with columns 2 and 3. This gives an output of shape (2, 2), i.e. &lt;em&gt;four&lt;/em&gt; elements:&lt;/p&gt;
&lt;pre class="literal-block"&gt;[[arr[0,2], arr[0,3]],
 [arr[1,2], arr[1,3]]]&lt;/pre&gt;
&lt;p&gt;This is &lt;em&gt;oindexing&lt;/em&gt;. Clearly, given the same index, the output is in general different; it is for this reason that the debate about fancy indexing can be quite polemical, and why there is a &lt;a class="reference external" href="https://NumPy.org/neps/nep-0021-advanced-indexing.html"&gt;movement&lt;/a&gt; to introduce the vindex/oindex duality in NumPy.&lt;/p&gt;
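&lt;p&gt;Both behaviours can be reproduced in plain NumPy, as in the short sketch below (the 3x4 test array is an arbitrary choice). Default fancy indexing gives the vindex result, while &lt;code class="docutils literal"&gt;np.ix_&lt;/code&gt; builds the open mesh needed for the oindex result:&lt;/p&gt;

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Vectorized indexing: the two index lists are broadcast together,
# picking the elements arr[0, 2] and arr[1, 3].
v = arr[[0, 1], [2, 3]]

# Orthogonal indexing: rows 0 and 1 crossed with columns 2 and 3.
o = arr[np.ix_([0, 1], [2, 3])]

print(v.shape, v.tolist())  # (2,) [2, 7]
print(o.shape, o.tolist())  # (2, 2) [[2, 3], [6, 7]]
```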
&lt;/section&gt;
&lt;section id="example-2"&gt;
&lt;h3&gt;Example 2&lt;/h3&gt;
&lt;p&gt;I have glossed over this until now, but vindex is &lt;em&gt;not&lt;/em&gt; the same as fancy indexing. For this reason Zarr, which only supports vindex, does not support all the functionality of fancy indexing. The most important distinction between the two is that vindex seeks to avoid certain unexpected fancy indexing behaviour, as can be seen by considering a 3D NumPy array of shape &lt;code class="docutils literal"&gt;(X, Y, Z)&lt;/code&gt; as in the &lt;a class="reference external" href="https://NumPy.org/neps/nep-0021-advanced-indexing.html#mixed-indexing"&gt;example here&lt;/a&gt;. Consider the unexpected behaviour of:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:10, :, [0,1]] has shape (10, Y, 2).

arr[0, :, [0, 1]] has shape (2, Y), not (Y, 2)!!&lt;/pre&gt;
&lt;p&gt;NumPy indexing treats non-slice indices differently, and will always put the axes introduced by the index array first, unless the non-slice indexes are consecutive, in which case it will try to massage the result to something intuitive (which normally coincides with the result of an &lt;code class="docutils literal"&gt;oindex&lt;/code&gt;) — hence &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[:,&lt;/span&gt; 0, [0, 1]]&lt;/code&gt; has shape &lt;code class="docutils literal"&gt;(X, 2)&lt;/code&gt;, not &lt;code class="docutils literal"&gt;(2, X)&lt;/code&gt;.&lt;/p&gt;
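&lt;p&gt;These shapes are easy to check directly in NumPy. In the sketch below the concrete sizes chosen for &lt;code class="docutils literal"&gt;X&lt;/code&gt;, &lt;code class="docutils literal"&gt;Y&lt;/code&gt; and &lt;code class="docutils literal"&gt;Z&lt;/code&gt; are arbitrary assumptions:&lt;/p&gt;

```python
import numpy as np

X, Y, Z = 12, 5, 4
arr = np.zeros((X, Y, Z))

# A single index array: its axis stays in place.
print(arr[:10, :, [0, 1]].shape)  # (10, 5, 2)

# Integer and index array separated by a slice: the broadcast axis moves to the front.
print(arr[0, :, [0, 1]].shape)    # (2, 5)

# Integer and index array next to each other: the broadcast axis stays in place.
print(arr[:, 0, [0, 1]].shape)    # (12, 2)
```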
&lt;p&gt;The hypothesised NumPy &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; would eliminate this transposition behaviour, and be internally consistent, always putting the axes introduced by the index array first. Unfortunately, this is difficult and costly, and so the alternative is to simply not allow such indexing and throw an error, or force the user to be very specific.&lt;/p&gt;
&lt;p&gt;Blosc2 will throw an error when one inserts a slice between array indices:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:, 0, [0, 1]] -&amp;gt; shape (X, 2)
arr.vindex[0, :, [0,1]] -&amp;gt; ERROR&lt;/pre&gt;
&lt;p&gt;Zarr's &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (called by &lt;code class="docutils literal"&gt;__getitem__&lt;/code&gt;), by requiring integer array indices for all dimensions, throws an error for all mixed indices of this type:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:, 0, [0, 1]] -&amp;gt; ERROR
arr[0, :, [0,1]] -&amp;gt; ERROR&lt;/pre&gt;
&lt;p&gt;Thus, to reproduce the Blosc2 result for the first case in Zarr, one must use explicit index arrays:&lt;/p&gt;
&lt;pre class="literal-block"&gt;idx = np.array([0,1]).reshape(1,-1)
arr[np.arange(X).reshape(-1,1), 0, idx] -&amp;gt; shape (X, 2)&lt;/pre&gt;
&lt;p&gt;For both Blosc2 and Zarr, one must use an explicit index array like so for the second case:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[0, np.arange(Y).reshape(-1,1), idx] -&amp;gt; shape (Y, 2)&lt;/pre&gt;
&lt;p&gt;Hopefully you now understand why fancy indexing can be so tricky, and why few libraries seek to support it to the same extent as NumPy - some would say it is perhaps not even desirable to do so!&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>blosc2 fancyindex performance</category><guid>https://blosc.org/posts/blosc2-fancy-indexing/</guid><pubDate>Wed, 16 Jul 2025 13:33:20 GMT</pubDate></item></channel></rss>