<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about posts)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/cat_posts.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 04 Mar 2026 11:43:33 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title> Cumulative reductions in Blosc2</title><link>https://blosc.org/posts/cumsum/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;As mentioned in previous blog posts (see &lt;a class="reference external" href="https://ironarray.io/blog/array-api"&gt;this blog&lt;/a&gt;) the maintainers of &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; are going all-in on Array API integration. This means adding new functions to bring the library up to the standard. Of course, integrating a given function may be more or less difficult for a given library which aspires to compatibility, depending on legacy code, design principles, and the overarching philosophy of the package. Since &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; uses chunked arrays, handling reductions and mapping between local chunk- and global array-indexing can be tricky. We had some help from Yang Kang Chua at UConn with this functionality - many thanks to him!&lt;/p&gt;
&lt;section id="cumulative-reductions"&gt;
&lt;h2&gt;Cumulative reductions&lt;/h2&gt;
&lt;p&gt;Consider an array &lt;code class="docutils literal"&gt;a&lt;/code&gt; of shape &lt;code class="docutils literal"&gt;(1000, 2000, 3000)&lt;/code&gt; and data type &lt;code class="docutils literal"&gt;float64&lt;/code&gt; (more on numerical precision later). The result of &lt;code class="docutils literal"&gt;sum(a, axis=0)&lt;/code&gt; would have shape &lt;code class="docutils literal"&gt;(2000, 3000)&lt;/code&gt; and that of &lt;code class="docutils literal"&gt;sum(a, axis=1)&lt;/code&gt; shape &lt;code class="docutils literal"&gt;(1000, 3000)&lt;/code&gt;. In general we can say that reductions &lt;em&gt;reduce&lt;/em&gt; the sizes of arrays. On the other hand, cumulative reductions store the intermediate reduction results along the reduction axis, so that the shape of the result is always the same as that of the input array: &lt;code class="docutils literal"&gt;cumulative_sum(a, axis=ax)&lt;/code&gt; always has shape &lt;code class="docutils literal"&gt;(1000, 2000, 3000)&lt;/code&gt; for any (valid) value of &lt;code class="docutils literal"&gt;ax&lt;/code&gt;.&lt;/p&gt;
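&lt;p&gt;As a quick illustration, a minimal NumPy sketch (using a scaled-down shape) of how reductions shrink the array while cumulative reductions preserve its shape:&lt;/p&gt;

```python
import numpy as np

# Scaled-down stand-in for the (1000, 2000, 3000) array from the text
a = np.zeros((10, 20, 30), dtype=np.float64)

print(np.sum(a, axis=0).shape)     # (20, 30): the reduced axis disappears
print(np.sum(a, axis=1).shape)     # (10, 30)
print(np.cumsum(a, axis=0).shape)  # (10, 20, 30): same shape as the input
print(np.cumsum(a, axis=2).shape)  # (10, 20, 30): true for any valid axis
```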
&lt;p&gt;This has a couple of consequences. One is that memory consumption may be substantial: the array &lt;code class="docutils literal"&gt;a&lt;/code&gt; will occupy &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;math.prod((1000,&lt;/span&gt; 2000, &lt;span class="pre"&gt;3000))*8/(1024**3)&lt;/span&gt; = 44.7GB&lt;/code&gt;, but its sum along the first axis only &lt;code class="docutils literal"&gt;0.0447GB&lt;/code&gt;. Thus we can easily store the final result in memory. Not so for the result of &lt;code class="docutils literal"&gt;cumulative_sum&lt;/code&gt;, which also occupies &lt;code class="docutils literal"&gt;44.7GB&lt;/code&gt;!&lt;/p&gt;
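&lt;p&gt;The arithmetic is easy to check in a few lines of Python:&lt;/p&gt;

```python
import math

shape = (1000, 2000, 3000)
itemsize = 8  # bytes per float64 element

# The full array; any cumulative reduction of it is exactly the same size
full_gb = math.prod(shape) * itemsize / 1024**3
# The (non-cumulative) sum over the first axis drops that axis entirely
reduced_gb = math.prod(shape[1:]) * itemsize / 1024**3

print(round(full_gb, 1), round(reduced_gb, 4))  # 44.7 0.0447
```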
&lt;p&gt;The second consequence, for chunked array libraries, is that the order in which one loads chunks and calculates the result matters. Consider the following diagram, where we have a 1D array of three elements. To calculate the final sum, we may load the chunks in any order and do not require access to any previous value except the running total - loading the first, third and finally second chunks, we obtain the correct sum of 4. However, for the cumulative sum, each element of the result depends on the previous element (and hence on the sum of all prior elements of the array). Consequently, we must load the chunks according to their order in memory - if not, we will end up with an incorrect final result. A minimal sanity check is that the final element of the cumulative sum should equal the sum, which is not the case here!&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/ordermatters.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/ordermatters.png" style="width: 50%;"&gt;
&lt;/section&gt;
&lt;section id="consequences-for-numerical-precision"&gt;
&lt;h2&gt;Consequences for numerical precision&lt;/h2&gt;
&lt;p&gt;When calculating reductions, numerical precision is a common hiccup. For products, one can quickly overflow the data type - the product of &lt;code class="docutils literal"&gt;arange(1, 14)&lt;/code&gt; already overflows the maximum value of &lt;code class="docutils literal"&gt;int32&lt;/code&gt;. For sums, rounding errors incurred by adding small elements to a large running total can quickly become significant. For this reason, NumPy will try to use pairwise summation to calculate &lt;code class="docutils literal"&gt;sum(a)&lt;/code&gt; - this involves breaking the array into small parts, calculating the sum of each small part (simply adding successive elements to a running total), and then recursively summing pairs of partial sums until the final result is reached. Each recursive addition thus combines two numbers of similar size, reducing the rounding errors incurred when summing disparate numbers. This algorithm has only minimal additional overhead compared to the naive approach and is eminently parallelisable. And it has a natural recursive implementation, something which computer scientists always find appealing even if only for aesthetic reasons!&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/pairwise_sum.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/pairwise_sum.png" style="width: 50%;"&gt;
&lt;p&gt;Unfortunately, such an approach is not possible for cumulative sums since, as discussed above, order matters! One possibility is to use Kahan summation (the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Kahan_summation_algorithm"&gt;Wikipedia article is excellent&lt;/a&gt;), which does have additional costs (both in terms of FLOPS and memory consumption) although these are not prohibitive. One essentially keeps track of the rounding errors incurred with an auxiliary running total and uses this to correct the sum:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-1" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-1" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Kahan summation algorithm&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-2" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-2" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-3" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-3" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-4" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-4" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-4"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-5" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-5" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;corrected_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="c1"&gt;# nudge el with accumulated lost digits&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-6" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-6" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;corrected_el&lt;/span&gt; &lt;span class="c1"&gt;# lose last few digits of el&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-7" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-7" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;corrected_el&lt;/span&gt;  &lt;span class="c1"&gt;# store the lost digits of el&lt;/span&gt;
&lt;a id="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-8" name="rest_code_ba32b3b5f6b34c2b89d42942da7487fd-8" href="https://blosc.org/posts/cumsum/#rest_code_ba32b3b5f6b34c2b89d42942da7487fd-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;tot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
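&lt;p&gt;A runnable version of the loop above, contrasted with plain left-to-right accumulation on a case where compensation matters:&lt;/p&gt;

```python
def kahan_sum(array):
    # `tracker` accumulates the low-order digits that each addition to
    # `tot` would otherwise round away, and feeds them back in.
    tot = 0.0
    tracker = 0.0
    for el in array:
        corrected_el = el - tracker
        temp = tot + corrected_el
        tracker = (temp - tot) - corrected_el
        tot = temp
    return tot

data = [1.0] + [1e-16] * 10
print(sum(data))        # 1.0: every tiny addend is rounded away against the total
print(kahan_sum(data))  # slightly above 1.0: the compensated sum retains them
```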
&lt;p&gt;In implementation, we calculate the cumulative sum on a decompressed chunk in order and then carry forward the last element of the cumulative sum (i.e. the sum of the whole chunk) to the next chunk, incrementing the result of the cumulative sum by this carried-over value to give the &lt;em&gt;global&lt;/em&gt; cumulative sum. Thus, we can use Kahan summation between the small(er) values of the local chunk cumulative sum and the large(r) carried-forward running total to try and conserve precision.&lt;/p&gt;
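&lt;p&gt;The carry-forward scheme can be sketched in a few lines of pure Python (the real implementation works on compressed n-dimensional chunks, of course):&lt;/p&gt;

```python
from itertools import accumulate

def chunked_cumulative_sum(chunks):
    # Chunks must be visited in storage order; `carry` is the running total
    # of all previously processed chunks.
    out = []
    carry = 0.0
    for chunk in chunks:
        local = list(accumulate(chunk))        # cumulative sum within the chunk
        out.extend(v + carry for v in local)   # promote to the global cumulative sum
        carry = out[-1]                        # last element = sum of everything so far
    return out

print(chunked_cumulative_sum([[1.0, 2.0], [3.0], [4.0, 5.0]]))
# [1.0, 3.0, 6.0, 10.0, 15.0]
```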
&lt;p&gt;Unfortunately, we still observe discrepancies with respect to the NumPy implementation of cumulative sum (which essentially sums element-by-element) - but this also differs from the results of &lt;code class="docutils literal"&gt;np.sum&lt;/code&gt; due to the latter's use of pairwise summation! Finite arithmetic imposes an insuperable barrier: three different algorithms cannot guarantee agreement in every possible case. Since the Kahan sum approach has a slight overhead, we decided to drop it, as it did not improve precision sufficiently to justify its use.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="experiments"&gt;
&lt;h2&gt;Experiments&lt;/h2&gt;
&lt;p&gt;We performed some experiments comparing the new &lt;code class="docutils literal"&gt;blosc2.cumulative_sum&lt;/code&gt; function to NumPy's version for some large arrays (of size &lt;code class="docutils literal"&gt;(N, N, N)&lt;/code&gt; for various values of &lt;code class="docutils literal"&gt;N&lt;/code&gt;). Since the working set is double the size of the input array (input + output), we expect to see significant benefits from Blosc2 compression and exploitation of caching. Indeed, once the working set size starts to approach the available RAM (32 GB), NumPy begins to slow down rapidly; when the working set exceeds memory and swap must be used, NumPy becomes vastly slower.&lt;/p&gt;
&lt;img alt="/images/cumulative_sumprod/cumsumbench.png" class="align-center" src="https://blosc.org/images/cumulative_sumprod/cumsumbench.png" style="width: 50%;"&gt;
&lt;p&gt;The plot shows the average computation time for &lt;code class="docutils literal"&gt;cumulative_sum&lt;/code&gt; over the three different axes of the input array. The benchmark code may be found &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/cumsum_bench.py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Blosc2 achieves superior compression and enables computation on larger datasets by tightly integrating compression and computation and interleaving I/O with calculation. The returns on such an approach are clear in an era of &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;increasingly expensive RAM&lt;/a&gt; and thus increasingly desirable memory efficiency. As an array library catering in a unique way to this growing need, bringing Blosc2 into greater alignment with the Array API standard is of utmost importance to ease its integration into users' workflows and applications. We are thus especially pleased that the performance of the freshly-implemented cumulative reduction operations mandated by the Array API standard only underlines the validity of chunkwise operations.&lt;/p&gt;
&lt;p&gt;The Blosc team isn't resting on its laurels either, as we continue to optimise the existing framework to accelerate computations further. The recent introduction of the &lt;code class="docutils literal"&gt;miniexpr&lt;/code&gt; library into the backend is the capstone to these efforts, and has made the compression/computation integration truly seamless, &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;bringing incredible speedups for memory-bound computations&lt;/a&gt; and justifying Blosc2's compression-first, cache-aware philosophy. This all allows Blosc2 to handle significantly larger working sets than other solutions, delivering high performance for both in-memory and on-disk datasets, even those exceeding available RAM.&lt;/p&gt;
&lt;p&gt;If you find our work useful and valuable, we would be grateful if you could support us by &lt;a class="reference external" href="https://www.blosc.org/pages/donate/"&gt;making a donation&lt;/a&gt;. Your contribution will help us continue to develop and improve Blosc packages, making them more accessible and useful for everyone.  Our team is committed to creating high-quality and efficient software, and your support will help us to achieve this goal.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc array-api reductions computation</category><guid>https://blosc.org/posts/cumsum/</guid><pubDate>Mon, 16 Feb 2026 10:32:20 GMT</pubDate></item><item><title> OpenZL Plugin for Blosc2</title><link>https://blosc.org/posts/openzl-plugin/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;Blosc's philosophy of meta-compression is incredibly powerful - one is able to compose pipelines to optimally compress data (for speed or compression ratio), store information about the pipeline alongside the data in metadata, and then rely on a generic decompressor to read this and reverse the pipeline. The OpenZL team share our belief in the validity of this approach and have designed &lt;a class="reference external" href="https://openzl.org/"&gt;a graph-based formalisation with extensive support for all kinds of compression pipelines&lt;/a&gt; for all kinds of data.&lt;/p&gt;
&lt;p&gt;However, Blosc2 is now much more than just a compression library - it offers comprehensive indexing support (including fancy indexing via the python-blosc2 interface) as well as an increasingly rapid compute engine (see &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;this blog!&lt;/a&gt;). What if we could marry the incredibly comprehensive compression coverage of OpenZL with Blosc2's extended array manipulation functionality?&lt;/p&gt;
&lt;p&gt;Foreseeing precisely this sort of challenge, prior Blosc2 developers implemented dynamic plugin registration functionality (the plugin is loaded in C-Blosc2 and can be called via Python-Blosc2). This means that with some unintrusive, relatively concise interface code, one can link Blosc2 and OpenZL at runtime (without substantially modifying either) and offer Blosc2 arrays compressed and decompressed with OpenZL.&lt;/p&gt;
&lt;section id="the-openzl-plugin"&gt;
&lt;h2&gt;The OpenZL plugin&lt;/h2&gt;
&lt;p&gt;The source code for the plugin can be found &lt;a class="reference external" href="https://github.com/Blosc/blosc2-openzl"&gt;here&lt;/a&gt;. The minimal skeleton of the plugin layout is as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;├── CMakeLists.txt
├── blosc2_openzl
│   └── __init__.py
├── pyproject.toml
├── requirements-build.txt
└── src
    ├── CMakeLists.txt
    ├── blosc2_openzl.c
    └── blosc2_openzl.h&lt;/pre&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;blosc2_openzl.c&lt;/code&gt; must implement an encoder and decoder which are exported via an &lt;code class="docutils literal"&gt;info&lt;/code&gt; struct:&lt;/p&gt;
&lt;pre class="literal-block"&gt;#include "blosc2_openzl.h"

BLOSC2_OPENZL_EXPORT codec_info info = {
    .encoder=(char *)"blosc2_openzl_encoder",
    .decoder=(char *)"blosc2_openzl_decoder"
};

int blosc2_openzl_encoder(const uint8_t* src, uint8_t* dest,
                                  int32_t size, uint8_t meta,
                                  blosc2_cparams *cparams, uint8_t id) {
  // code
}


int blosc2_openzl_decoder(const uint8_t *input, int32_t input_len, uint8_t *output,
                            int32_t output_len, uint8_t meta, blosc2_dparams *dparams,
                            const void *chunk) {
  // code
}&lt;/pre&gt;
&lt;p&gt;The header &lt;code class="docutils literal"&gt;blosc2_openzl.h&lt;/code&gt; then makes the &lt;code class="docutils literal"&gt;info&lt;/code&gt; and &lt;code class="docutils literal"&gt;encoder/decoder&lt;/code&gt; functions available to Blosc2:&lt;/p&gt;
&lt;pre class="literal-block"&gt;#include "blosc2.h"
#include "blosc2/codecs-registry.h"
#include "openzl/openzl.h"

BLOSC2_OPENZL_EXPORT int blosc2_openzl_encoder(...);

BLOSC2_OPENZL_EXPORT int blosc2_openzl_decoder(...);

// Declare the info struct as extern
extern BLOSC2_OPENZL_EXPORT codec_info info;&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="pep-427-and-wheel-structure"&gt;
&lt;h2&gt;PEP 427 and wheel structure&lt;/h2&gt;
&lt;p&gt;In order for the plugin to dynamically link to Blosc2, it has to be able to find the Blosc2 library at runtime. This has historically been quite finicky, since different platforms and package managers may store Python packages (and the associated &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;.so/.dylib/.dll&lt;/span&gt;&lt;/code&gt; library objects) differently. Consequently, PEP 427 recommends distributing the Python wheels for packages which depend on compiled objects, such as Python-Blosc2, in the following way:&lt;/p&gt;
&lt;pre class="literal-block"&gt;blosc2
  ├── __init__.py
  ├── lib
  │   ├── libblosc2.so
  │   ├── cmake
  │   └── pkgconfig
  └── include
      └── blosc2.h&lt;/pre&gt;
&lt;p&gt;Finding the necessary &lt;code class="docutils literal"&gt;libblosc2.so&lt;/code&gt; object from the top-level &lt;code class="docutils literal"&gt;CMakeLists.txt&lt;/code&gt; file for the plugin is then as easy as:&lt;/p&gt;
&lt;pre class="literal-block"&gt;# Find blosc2 package location using Python
execute_process(
    COMMAND "${Python_EXECUTABLE}" -c "import blosc2, pathlib; print(pathlib.Path(blosc2.__file__).parent)"
    OUTPUT_VARIABLE BLOSC2_PACKAGE_DIR
    OUTPUT_STRIP_TRAILING_WHITESPACE
)
set(BLOSC2_INCLUDE_DIR "${BLOSC2_PACKAGE_DIR}/include")
set(BLOSC2_LIB_DIR "${BLOSC2_PACKAGE_DIR}/lib")&lt;/pre&gt;
&lt;p&gt;After building the plugin backend in &lt;code class="docutils literal"&gt;src/CMakeLists.txt&lt;/code&gt; one simply links the plugin to the backend (in this case &lt;code class="docutils literal"&gt;openzl&lt;/code&gt;) and installs like so:&lt;/p&gt;
&lt;pre class="literal-block"&gt;add_library(blosc2_openzl SHARED blosc2_openzl.c)
target_include_directories(blosc2_openzl PUBLIC ${BLOSC2_INCLUDE_DIR})
target_link_libraries(blosc2_openzl ${OPENZL_TARGET})
# Install
install(TARGETS blosc2_openzl
    RUNTIME DESTINATION blosc2_openzl
    LIBRARY DESTINATION blosc2_openzl
)&lt;/pre&gt;
&lt;p&gt;Note that it is not necessary to link &lt;code class="docutils literal"&gt;blosc2_openzl&lt;/code&gt; and &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; in &lt;code class="docutils literal"&gt;target_link_libraries&lt;/code&gt;, as the former depends only on macros and structs defined in header files - and not functions. This makes the &lt;code class="docutils literal"&gt;libblosc2_openzl.so&lt;/code&gt; object especially light and robust, as blosc2 is not registered as an explicit dependency. In fact, on Linux, even if &lt;code class="docutils literal"&gt;blosc2_openzl.c&lt;/code&gt; were to call blosc2 functions, it would still not be necessary to perform such linking!&lt;/p&gt;
&lt;p&gt;Following PEP 427 allows one to add an additional safeguard in case the plugin fails to find blosc2, by setting the &lt;code class="docutils literal"&gt;INSTALL_RPATH&lt;/code&gt; property on the installed object:&lt;/p&gt;
&lt;pre class="literal-block"&gt;set_target_properties(blosc2_openzl PROPERTIES
    INSTALL_RPATH "$ORIGIN/../blosc2/lib"
)&lt;/pre&gt;
&lt;p&gt;It also allows one to easily find the plugin &lt;code class="docutils literal"&gt;.so&lt;/code&gt; object when calling from Python - in the &lt;code class="docutils literal"&gt;blosc2_openzl/__init__.py&lt;/code&gt; file one can find the library path as easily as &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;os.path.abspath(Path(__file__).parent&lt;/span&gt; / libname)&lt;/code&gt; where &lt;code class="docutils literal"&gt;libname&lt;/code&gt; is the desired &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;.so/.dylib/.dll&lt;/span&gt;&lt;/code&gt; object (depending on platform). All these benefits have led us to update the wheel structure for &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;python-blosc2&lt;/span&gt;&lt;/code&gt; in the latest 4.0 release.&lt;/p&gt;
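&lt;p&gt;A hypothetical sketch of that lookup (the helper name and the platform mapping are illustrative, not the plugin's actual code):&lt;/p&gt;

```python
def plugin_libname(system, name="blosc2_openzl"):
    # Map a platform name (as returned by platform.system()) to the
    # conventional shared-library file name on that platform.
    if system == "Windows":
        return name + ".dll"
    ext = ".dylib" if system == "Darwin" else ".so"
    return "lib" + name + ext

# In the package __init__.py one would then resolve something like:
#   os.path.abspath(Path(__file__).parent / plugin_libname(platform.system()))
print(plugin_libname("Linux"))  # libblosc2_openzl.so
```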
&lt;/section&gt;
&lt;section id="using-openzl-from-python"&gt;
&lt;h2&gt;Using OpenZL from Python&lt;/h2&gt;
&lt;p&gt;Installing is then as simple as:&lt;/p&gt;
&lt;pre class="literal-block"&gt;pip install blosc2_openzl&lt;/pre&gt;
&lt;p&gt;One can also download the project and use the &lt;code class="docutils literal"&gt;cmake&lt;/code&gt; and &lt;code class="docutils literal"&gt;cmake &lt;span class="pre"&gt;--build&lt;/span&gt;&lt;/code&gt; commands to compile C-level tests or examples. But let's get compressing with &lt;code class="docutils literal"&gt;python&lt;/code&gt; straight away:&lt;/p&gt;
&lt;pre class="literal-block"&gt;import blosc2
import numpy as np
import blosc2_openzl
from blosc2_openzl import OpenZLProfile as OZLP
prof = OZLP.OZLPROF_SH_BD_LZ4
# Define the compression parameters for Blosc2
cparams = {'codec': blosc2.Codec.OPENZL, 'codec_meta': prof.value}

# Create (uncompressed) array
np_array = np.arange(1000).reshape((10,100))

# Compression with the OpenZL codec
bl_array = blosc2.asarray(np_array, cparams=cparams)
print(bl_array.cratio) # print compression ratio
&amp;gt;&amp;gt; 25.078369905956112&lt;/pre&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;The &lt;code class="docutils literal"&gt;OpenZLProfile&lt;/code&gt; enum contains the available profile pipelines that have been implemented in the plugin, which use the &lt;code class="docutils literal"&gt;codec_meta&lt;/code&gt; field (an 8-bit integer) to specify the desired transformation via codecs, filters and other nodes for the compression graph. Starting from the Least-Significant-Bit (LSB), setting the bits tells OpenZL how to build the graph:&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;CODEC | SHUFFLE | DELTA | SPLIT | CRC | x | x | x |&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CODEC - If set, use LZ4. Else ZSTD.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SHUFFLE - If set, use shuffle (outputs a stream for every byte of input data typesize)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DELTA - If set, apply a bytedelta (to all streams if necessary)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SPLIT - If set, do not recombine the byte streams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRC - If set, store a checksum during compression and check it during decompression&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The remaining bits may be used in the future.&lt;/p&gt;
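&lt;p&gt;For experimentation, one can compose a &lt;code class="docutils literal"&gt;codec_meta&lt;/code&gt; value directly from the bit layout above (a hypothetical helper; the &lt;code class="docutils literal"&gt;OpenZLProfile&lt;/code&gt; enum already provides ready-made combinations):&lt;/p&gt;

```python
def make_codec_meta(lz4=False, shuffle=False, delta=False, split=False, crc=False):
    # Bit 0 = CODEC (LZ4 vs ZSTD), bit 1 = SHUFFLE, bit 2 = DELTA,
    # bit 3 = SPLIT, bit 4 = CRC; bits 5-7 are reserved for future use.
    meta = 0
    for bit, flag in enumerate((lz4, shuffle, delta, split, crc)):
        if flag:
            meta += 2 ** bit
    return meta

print(make_codec_meta(lz4=True, shuffle=True))  # 3
print(make_codec_meta(shuffle=True, crc=True))  # 18
```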
&lt;p&gt;In the future it would be great to further expand the OpenZL functionalities that we can offer via the plugin, such as bespoke transformers trained via machine learning techniques - see the OpenZL page for a flavour of what can be done with the (still evolving) library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;C-Blosc2's ability to support dynamically loaded plugins allows the library to grow in features without increasing the size and complexity of the library itself. For more information about user-defined plugins, refer to this &lt;a class="reference external" href="https://www.blosc.org/posts/registering-plugins/"&gt;blog entry&lt;/a&gt;. We have put this to work to offer linkage with the rather complex OpenZL library with a relatively rapid turnaround from design to prototype to full release in around a month. This is thanks to prior hard work by open source contributors from Blosc but naturally also OpenZL - many thanks to all!&lt;/p&gt;
&lt;p&gt;If you find our work useful and valuable, we would be grateful if you could support us by &lt;a class="reference external" href="https://www.blosc.org/pages/donate/"&gt;making a donation&lt;/a&gt;. Your contribution will help us continue to develop and improve Blosc packages, making them more accessible and useful for everyone.  Our team is committed to creating high-quality and efficient software, and your support will help us to achieve this goal.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc plugins codecs openzl</category><guid>https://blosc.org/posts/openzl-plugin/</guid><pubDate>Fri, 30 Jan 2026 10:32:20 GMT</pubDate></item><item><title>The Surprising Speed of Compressed Data: A Roofline Story</title><link>https://blosc.org/posts/roofline-analysis-blosc2/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;Can a library designed for computing with compressed data ever hope to outperform highly optimized numerical engines like NumPy and Numexpr? The answer is complex, and it hinges on the "memory wall" — a phenomenon which occurs when system memory limitations start to drag on CPU performance. This post uses Roofline analysis to explore this very question, dissecting the performance of Blosc2 and revealing the surprising scenarios where it can gain a competitive edge.&lt;/p&gt;
&lt;aside class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update on 2026-02-06:&lt;/strong&gt; We have published a follow-up post, &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;Python-Blosc2 4.0: Unleashing Compute Speed with miniexpr&lt;/a&gt;, which revisits this topic. This new post explains how the integration of miniexpr into Blosc2's compute engine has significantly improved performance—especially for in-memory operations—updating the conclusions drawn in this original analysis. We highly recommend reading the new post for the latest insights.&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="tl-dr"&gt;
&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;Before we dive in, here's what we discovered:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For in-memory tasks, Blosc2's overhead can make it slower than Numexpr, especially on x86 CPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This changes on Apple Silicon, where Blosc2's performance is much more competitive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For on-disk tasks, Blosc2 consistently outperforms NumPy/Numexpr on both platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The "memory wall" is real, and disk I/O is an even bigger one, which is where compression shines.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="a-trip-down-memory-lane"&gt;
&lt;h2&gt;A Trip Down Memory Lane&lt;/h2&gt;
&lt;p&gt;Let's rewind to 2008. NumPy 1.0 was just a toddler, and the computing world was buzzing with the arrival of multi-core CPUs and their shiny new SIMD instructions. On the &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YPX5PGM5WZXQAMQ5AZLLEU67D5RZBOVH/#YFX3G2RYHTIYMFDPCHKHED5F7CT4OTVK"&gt;NumPy mailing list&lt;/a&gt;, a group of us were brainstorming how to harness this new power to make Python's number-crunching faster.&lt;/p&gt;
&lt;p&gt;The idea seemed simple: trust newer compilers to use SIMD (and, possibly, data alignment) to perform operations on multiple data points at once. To test this, a &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/message/S2IEJV7U7TXHQLEMORGME6KIGRZTG33L/"&gt;simple benchmark&lt;/a&gt; was shared: multiply two large vectors element-wise. Developers from around the community ran the code and shared their results. What came back was a revelation.&lt;/p&gt;
&lt;p&gt;For small arrays that fit snugly into the CPU's high-speed cache, SIMD was quite good at accelerating computations. But as soon as the arrays grew larger, the performance boost vanished. Some of us were already suspicious about the new "memory wall" that had been growing lately, seemingly due to the widening gap between CPU speeds and memory bandwidth.  However, a conclusive answer (and solution) was still lacking.&lt;/p&gt;
&lt;p&gt;But amidst the confusion, a curious anomaly emerged. One machine, belonging to NumPy legend Charles Harris, was consistently outperforming the rest—even those with faster processors. It made no sense. We checked our code, our compilers, everything. Yet, his machine remained inexplicably faster. The answer, when it finally came, wasn't in the software at all. Charles, a hardware wizard, had &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/message/YFX3G2RYHTIYMFDPCHKHED5F7CT4OTVK/"&gt;tinkered with his BIOS to overclock his RAM&lt;/a&gt; from 667 MHz to a whopping 800 MHz.&lt;/p&gt;
&lt;p&gt;That was my lightbulb moment: for data-intensive tasks, raw CPU clock speed was not the limiting factor; memory bandwidth was what truly mattered.&lt;/p&gt;
&lt;p&gt;This led me to a wild idea: what if we could make memory &lt;em&gt;effectively&lt;/em&gt; faster? What if we could compress data in memory and decompress it on-the-fly, just in time for the CPU? This would &lt;a class="reference external" href="https://www.blosc.org/docs/StarvingCPUs-CISE-2010.pdf"&gt;slash the amount of data being moved&lt;/a&gt;, boosting our effective memory bandwidth. That idea became the seed for &lt;a class="reference external" href="https://www.blosc.org"&gt;Blosc&lt;/a&gt;, a project I started in 2010 that has been &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2"&gt;my passion ever since&lt;/a&gt;. Now, 15 years later, it is time to revisit that idea and see how well it holds up in today's computing landscape.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="roofline-model-understanding-the-memory-wall"&gt;
&lt;h2&gt;Roofline Model: Understanding the Memory Wall&lt;/h2&gt;
&lt;p&gt;Not all computations are equally affected by the memory wall - in general, performance can be either CPU-bound or memory-bound. To diagnose which resource is the limiting factor, the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Roofline_model"&gt;Roofline model&lt;/a&gt; provides an insightful analytical framework. This model &lt;a class="reference external" href="https://docs.nersc.gov/tools/performance/roofline/"&gt;plots computational performance against arithmetic intensity&lt;/a&gt; (i.e. floating-point operations performed per byte of data moved to and from memory) to visually determine whether a task is constrained by CPU speed or memory bandwidth.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-intro.avif" src="https://blosc.org/images/roofline-surprising-story/roofline-intro.avif"&gt;
&lt;p&gt;We will use Roofline plots to analyze Blosc2's performance, compared to that of NumPy and Numexpr. NumPy, with its highly optimized linear algebra backends, and Numexpr, with its efficient evaluation of element-wise expressions, together form a strong performance baseline for the full range of arithmetic intensities tested.&lt;/p&gt;
&lt;p&gt;To highlight the role of memory bandwidth, we will conduct our benchmarks on an AMD Ryzen 7800X3D CPU at two different memory speeds: the standard 4800 MT/s and an overclocked 6000 MT/s. This allows us to directly observe how memory frequency impacts computational performance.&lt;/p&gt;
&lt;p&gt;To cover a range of computational scenarios, our benchmarks include five operations with varying arithmetic intensities:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Very Low&lt;/strong&gt;: A simple element-wise addition (&lt;code class="docutils literal"&gt;a + b + c&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low&lt;/strong&gt;: A moderately complex element-wise expression (&lt;code class="docutils literal"&gt;sqrt(a + 2 * b + (c / 2)) ** 1.2&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Medium&lt;/strong&gt;: A highly complex element-wise calculation involving trigonometric and exponential functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High&lt;/strong&gt;: Matrix multiplication on small matrices (labeled matmul0).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Very High&lt;/strong&gt;: Matrix multiplication on large matrices (labeled matmul1 and matmul2).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
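&lt;p&gt;To make the first two workloads concrete, here is how they might be evaluated with plain NumPy on small arrays (the real benchmarks run them through the respective engines on far larger data):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.random(1_000) for _ in range(3))

# Very low intensity: one add per element, three operands read.
very_low = a + b + c

# Low intensity: a handful of flops per element.
low = np.sqrt(a + 2 * b + (c / 2)) ** 1.2

# Rough arithmetic-intensity estimate for the first expression:
# 2 flops per element, 4 float64 values moved (3 reads + 1 write).
intensity = 2 / (4 * 8)  # flops per byte
print(f"a + b + c intensity = {intensity:.4f} flops/byte")
```

&lt;p&gt;At well under one flop per byte, such expressions sit deep in the memory-bound region of the Roofline plot.&lt;/p&gt;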
&lt;img alt="/images/roofline-surprising-story/roofline-mem-speed-AMD-7800X3D.png" src="https://blosc.org/images/roofline-surprising-story/roofline-mem-speed-AMD-7800X3D.png"&gt;
&lt;p&gt;The Roofline plot confirms that increasing memory speed only benefits memory-bound operations (low arithmetic intensity), while CPU-bound tasks (high arithmetic intensity) are unaffected, as expected. Although this might suggest the "memory wall" is not a major obstacle, low-intensity operations like element-wise calculations, reductions, and selections are extremely common and often create performance bottlenecks. Therefore, optimizing for memory performance remains crucial.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-in-memory-surprise-why-wasn-t-compression-faster"&gt;
&lt;h2&gt;The In-Memory Surprise: Why Wasn't Compression Faster?&lt;/h2&gt;
&lt;p&gt;We benchmarked Blosc2 (both compressed and uncompressed) against NumPy and Numexpr. For this test, Blosc2 was configured with the LZ4 codec and shuffle filter, a setup known for its balance of speed and compression ratio. The benchmarks were executed on an AMD Ryzen 7800X3D CPU with memory speed set to 6000 MT/s, ensuring optimal memory bandwidth for the tests.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-7800X3D-mem-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-7800X3D-mem-def.png"&gt;
&lt;p&gt;The analysis reveals a surprising outcome: for memory-bound operations, Blosc2 is up to five times slower than Numexpr. Although operating on compressed data provides a marginal improvement over uncompressed Blosc2, it is not enough to overcome this performance gap. This result is unexpected because Blosc2 leverages Numexpr internally, and the reduced memory traffic that compression brings should theoretically lead to better performance in these scenarios.&lt;/p&gt;
&lt;p&gt;To understand this counter-intuitive result, we must examine Blosc2's core architecture. The key lies in its double partitioning scheme, which, while powerful, introduces an overhead that can negate the benefits of compression in memory-bound contexts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="unpacking-the-overhead-a-look-inside-blosc2-s-architecture"&gt;
&lt;h2&gt;Unpacking the Overhead: A Look Inside Blosc2's Architecture&lt;/h2&gt;
&lt;p&gt;The performance characteristics of Blosc2 are rooted in its double partitioning architecture, which organizes data into chunks and blocks.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/double-partition-b2nd.avif" src="https://blosc.org/images/roofline-surprising-story/double-partition-b2nd.avif"&gt;
&lt;p&gt;This design is crucial both for aligning with the CPU's memory hierarchy and for enabling efficient multidimensional array representation (important for operations such as n-dimensional slicing). However, this structure introduces an inherent overhead from additional indexing logic. In memory-bound scenarios, that overhead counteracts the performance gains from reduced memory traffic, explaining why Blosc2 does not surpass Numexpr.&lt;/p&gt;
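&lt;p&gt;The extra bookkeeping can be appreciated with a toy version of what double partitioning implies: every element access must first be translated into chunk, block and within-block coordinates. This is a simplified 1-D sketch, not Blosc2's actual implementation:&lt;/p&gt;

```python
def locate(index: int, chunk_size: int, block_size: int) -> tuple[int, int, int]:
    """Map a global element index to (chunk, block-within-chunk,
    element-within-block) for a 1-D doubly-partitioned array.
    A toy model of the indexing logic, not Blosc2's real code."""
    chunk, offset = divmod(index, chunk_size)
    block, elem = divmod(offset, block_size)
    return chunk, block, elem

# Element 2_500_000 of an array with 1_000_000-element chunks,
# each split into 100_000-element blocks:
print(locate(2_500_000, 1_000_000, 100_000))  # (2, 5, 0)
```

&lt;p&gt;In a real n-dimensional container this translation happens per dimension and per partition level, which is cheap per access but noticeable when the arithmetic itself is nearly free.&lt;/p&gt;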
&lt;p&gt;Conversely, as arithmetic intensity increases, the computational demands begin to dominate the total execution time. In these CPU-bound regimes, the partitioning overhead is effectively amortized, allowing Blosc2 to close the performance gap and eventually match NumPy's performance in tasks like large matrix multiplications.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="modern-arm-architectures"&gt;
&lt;h2&gt;Modern ARM Architectures&lt;/h2&gt;
&lt;p&gt;CPU architecture is a rapidly evolving field. To investigate how these changes impact performance, we extended our analysis to the Apple Silicon M4 Pro, a modern ARM-based processor.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-m4pro-mem-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-m4pro-mem-def.png"&gt;
&lt;p&gt;The results show that Blosc2 performs significantly better on this platform, narrowing the performance gap with NumPy/NumExpr, especially for operations on compressed data. While compute engines optimized for uncompressed data still hold an edge, these findings suggest that compression will play an increasingly important role in improving computational performance in the future.&lt;/p&gt;
&lt;p&gt;However, while the in-memory results are revealing, they don't tell the whole story. Blosc2 was designed not just to fight the memory wall, but to conquer an even greater bottleneck: disk I/O. In-memory compression does have the benefit of fitting more data into RAM (valuable in itself at a time when &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;RAM prices have skyrocketed&lt;/a&gt;), but its true power is unleashed when computations move off the motherboard. Now, let's shift the battlefield to the disk and see how Blosc2 performs in its native territory.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-different-battlefield-blosc2-shines-with-on-disk-data"&gt;
&lt;h2&gt;A Different Battlefield: Blosc2 Shines with On-Disk Data&lt;/h2&gt;
&lt;p&gt;Blosc2's architecture extends its computational engine to operate seamlessly on data stored on disk, a significant advantage for large-scale analysis.  This is particularly relevant in scenarios where datasets exceed available memory, necessitating out-of-core processing, as commonly encountered in data science, machine learning workflows or &lt;a class="reference external" href="https://ironarray.io/cat2cloud"&gt;cloud computing environments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our on-disk benchmarks were designed to use datasets larger than the system's available memory to prevent filesystem caching from influencing the results. To establish a baseline, we implemented an out-of-core solution for NumPy/NumExpr, leveraging memory-mapped files. Here Blosc2 has a performance edge, particularly for memory-bound operations on compressed data, as it can move data to and from disk faster than the memory-mapped NumPy arrays.&lt;/p&gt;
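&lt;p&gt;The memory-mapped baseline boils down to a chunk-by-chunk loop over on-disk operands. Here is a simplified sketch with tiny arrays and a plain NumPy add (the actual benchmark uses NumExpr and far larger operands):&lt;/p&gt;

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()
n, chunk = 1_000_000, 100_000

# Two on-disk operands and an on-disk output, all memory-mapped files.
a = np.memmap(os.path.join(tmp, "a.dat"), dtype=np.float64, mode="w+", shape=(n,))
b = np.memmap(os.path.join(tmp, "b.dat"), dtype=np.float64, mode="w+", shape=(n,))
out = np.memmap(os.path.join(tmp, "out.dat"), dtype=np.float64, mode="w+", shape=(n,))
a[:] = 1.0
b[:] = 2.0

# Out-of-core evaluation: only one chunk of each operand needs to be
# resident at a time; the OS pages file data in and out transparently.
for start in range(0, n, chunk):
    sl = slice(start, start + chunk)
    out[sl] = a[sl] + b[sl]
out.flush()

print(out[0], out[-1])  # 3.0 3.0
```

&lt;p&gt;Every byte here travels uncompressed, which is precisely the bandwidth that a compressed container like Blosc2 gets to save.&lt;/p&gt;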
&lt;p&gt;In this case, we've used high-performance NVMe SSDs (NVMe 4.0) to minimize the impact of disk speed on the results.  We also switched to the ZSTD codec for Blosc2, as its superior compression ratio over LZ4 further minimizes data transfer to and from the disk.&lt;/p&gt;
&lt;p&gt;First, let's see the results for the AMD Ryzen 7800X3D system:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-7800X3D-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-7800X3D-disk-def.png"&gt;
&lt;p&gt;The plots above show that Blosc2 outperforms both NumPy and Numexpr for all low-to-medium intensity operations. This is because the high latency of disk I/O amortizes the overhead of Blosc2's double partitioning scheme. Furthermore, the reduced bandwidth required for compressed data gives Blosc2 an additional performance advantage in this scenario.&lt;/p&gt;
&lt;p&gt;Now, let's see the results for the Apple Silicon M4 Pro system:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-m4pro-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-m4pro-disk-def.png"&gt;
&lt;p&gt;On the Apple Silicon M4 Pro system, Blosc2 again outperforms both NumPy and Numexpr for all on-disk operations, mirroring the results from the AMD system. However, the performance advantage is even more significant here, especially for memory-bound tasks. This is mainly because memory-mapped arrays are less efficient on Apple Silicon than on x86_64 systems, increasing the overhead for the NumPy/Numexpr baseline.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="roofline-plot-in-memory-vs-on-disk"&gt;
&lt;h2&gt;Roofline Plot: In-Memory vs On-Disk&lt;/h2&gt;
&lt;p&gt;To better understand the trade-offs between in-memory and on-disk processing with Blosc2, the following plot contrasts their performance characteristics for compressed data:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-mem-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-mem-disk-def.png"&gt;
&lt;p&gt;A notable finding for the AMD system is that Blosc2's on-disk operations are noticeably faster than its in-memory operations, especially for memory-bound tasks (low arithmetic intensity). This is likely due to two factors: first, the larger datasets used for on-disk tests allow Blosc2 to use more efficient internal partitions (chunks and blocks), and second, parallel reads of compressed data from disk both hide I/O latency and reduce the raw bandwidth required.&lt;/p&gt;
&lt;p&gt;In contrast, for CPU-bound tasks (high arithmetic intensity), on-disk performance is comparable to, albeit slightly slower than, in-memory performance. The analysis also reveals a specific weakness: small matrix multiplications (matmul0) are significantly slower on-disk, identifying a clear target for future optimization.&lt;/p&gt;
&lt;p&gt;In contrast to the AMD system, the Apple Silicon M4 Pro shows that Blosc2's on-disk operations are slower than in-memory, a difference that is most significant for memory-bound tasks. This performance disparity suggests that current on-disk optimizations may favor x86_64 architectures over ARM.&lt;/p&gt;
&lt;p&gt;As with the AMD platform, CPU-bound operations exhibit similar performance for both on-disk and in-memory contexts. The notable exception remains the small matrix multiplication (matmul0), which performs significantly worse on-disk. This recurring pattern pinpoints a clear opportunity for future optimization efforts.&lt;/p&gt;
&lt;p&gt;Finally, and in addition to its on-disk performance, Blosc2 offers a significant cost advantage. With the &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;recent rise in SSD prices&lt;/a&gt;, compressing data on disk becomes an economically attractive strategy, allowing you to store more data in less space and thereby reduce hardware expenses.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reproducibility"&gt;
&lt;h2&gt;Reproducibility&lt;/h2&gt;
&lt;p&gt;All the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/roofline-analysis.py"&gt;benchmarks&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/roofline-plot.py"&gt;plots&lt;/a&gt; presented in this blog post can be reproduced. You are invited to run the scripts on your own hardware to explore the performance characteristics of Blosc2 in different environments. In case you get interesting results, please consider sharing them with the community!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;In this blog post, we explored the Roofline model to analyze the performance of Blosc2, NumPy, and Numexpr. We've confirmed that memory-bound operations are significantly affected by the "memory wall", making data compression an attractive tool for maximizing performance. However, for in-memory operations, the overhead of Blosc2's double partitioning scheme can be a limiting factor, especially on x86_64 architectures. Encouragingly, this performance gap narrows considerably on modern ARM platforms like Apple Silicon, suggesting a promising future.&lt;/p&gt;
&lt;p&gt;The situation changes dramatically for on-disk operations. Here, Blosc2 consistently outperforms NumPy and Numexpr, as the high latency of disk I/O (even if we used SSDs here) amortizes its internal overhead. This makes Blosc2 a compelling choice for out-of-core computations, one of its primary use cases.&lt;/p&gt;
&lt;p&gt;Overall, this analysis has provided valuable insights, highlighting the importance of the memory hierarchy. It has also exposed specific areas for improvement, such as the performance of small matrix multiplications. As Blosc2 continues to evolve, I am confident we can address these points and further enhance its performance, making it an even more powerful tool for numerical computations in Python.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Read more about &lt;a class="reference external" href="https://ironarray.io"&gt;ironArray SLU&lt;/a&gt; — the company behind Blosc2, Caterva2, Numexpr and other high-performance data processing libraries.&lt;/p&gt;
&lt;p&gt;Compress Better, Compute Bigger!&lt;/p&gt;
&lt;/section&gt;</description><category>Blosc2</category><category>memory wall</category><category>numexpr</category><category>numpy</category><category>performance</category><category>roofline</category><guid>https://blosc.org/posts/roofline-analysis-blosc2/</guid><pubDate>Thu, 27 Nov 2025 08:05:21 GMT</pubDate></item><item><title>Blosc2: A Universal Lazy Engine for Array Operations</title><link>https://blosc.org/posts/tensordot-pure-persistent/</link><dc:creator>Francesc Alted, Luke Shaw</dc:creator><description>&lt;p&gt;While compression is often seen merely as a way to save storage, the Blosc development team has long viewed it as a foundational element for high-performance computing. This philosophy is at the heart of Blosc2, which is not just a compression library but a powerful framework for handling large datasets. This post will highlight one of Python-Blosc2's most exciting capabilities: its lazy evaluation engine for array operations.&lt;/p&gt;
&lt;p&gt;Libraries optimized for computation on large datasets that don't fit in memory - such as Dask or Spark - often use lazy evaluation of computation expressions. This typically speeds up evaluation, since one can build the full chain of computations and only execute it when the final result is needed. Consequently, Python-Blosc2's compute engine also uses the lazy imperative paradigm, which proves to be both &lt;a class="reference external" href="https://ironarray.io/blog/compute-bigger"&gt;powerful and efficient&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;An additional benefit of the engine is its ability to act as a universal backend. Python-Blosc2 has a native &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; format, but it can also easily execute lazy operations on arrays from other popular libraries like NumPy, HDF5, Zarr, Xarray or TileDB - basically any array object which complies with a minimal protocol.&lt;/p&gt;
&lt;p&gt;In the recent &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/releases"&gt;Python-Blosc2 3.10.x series&lt;/a&gt;, we added support for lazy evaluation of eager functions, expanding the capabilities of the compute engine, and making interaction with other formats easier. Let's explore how this works using an out-of-core &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/linalg.html#blosc2.linalg.tensordot"&gt;tensordot&lt;/a&gt; operation as an example.&lt;/p&gt;
&lt;section id="from-eager-to-lazy-with-blosc2-lazyexpr"&gt;
&lt;h2&gt;From Eager to Lazy with &lt;code class="docutils literal"&gt;blosc2.lazyexpr&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Functions which return a result with a different shape to the input operands - such as reductions or linear algebra operations - must be evaluated eagerly (computed and the result returned immediately). For example, &lt;code class="docutils literal"&gt;blosc2.tensordot()&lt;/code&gt; executes eagerly.&lt;/p&gt;
&lt;p&gt;Nevertheless, we can defer this computation by wrapping the call in a string and passing it to &lt;code class="docutils literal"&gt;blosc2.lazyexpr&lt;/code&gt;. This creates a &lt;code class="docutils literal"&gt;LazyExpr&lt;/code&gt; object that represents the operation without executing it.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-1" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-1" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Assume a and b are large, on-disk blosc2 arrays&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-2" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-2" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-3" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-3" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-4" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-4" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-4"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create a lazy expression object&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-5" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-5" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;lexpr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lazyexpr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tensordot(a, b, axes=(axis, axis))"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-6" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-6" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-7" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-7" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-7"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# The computation has not run yet.&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-8" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-8" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-8"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# To execute it and save the result to a new persistent array:&lt;/span&gt;
&lt;a id="rest_code_e8aec2e5ad384155a19b11cabe84224a-9" name="rest_code_e8aec2e5ad384155a19b11cabe84224a-9" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_e8aec2e5ad384155a19b11cabe84224a-9"&gt;&lt;/a&gt;&lt;span class="n"&gt;out_blosc2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lexpr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"out.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is useful, and highly efficient both in terms of computation time and memory usage, as we'll see later. But the real magic happens when we use this computation engine with other array formats.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="one-engine-many-backends"&gt;
&lt;h2&gt;One Engine, Many Backends&lt;/h2&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;blosc2.evaluate()&lt;/code&gt; function takes the same string expression but can operate on any array-like objects that follow the &lt;code class="docutils literal"&gt;blosc2.Array&lt;/code&gt; protocol. This protocol simply requires the object to have &lt;code class="docutils literal"&gt;shape&lt;/code&gt;, &lt;code class="docutils literal"&gt;dtype&lt;/code&gt;, &lt;code class="docutils literal"&gt;__getitem__&lt;/code&gt;, and &lt;code class="docutils literal"&gt;__setitem__&lt;/code&gt; attributes, which are standard in &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;, &lt;code class="docutils literal"&gt;zarr&lt;/code&gt;, &lt;code class="docutils literal"&gt;tiledb&lt;/code&gt;, &lt;code class="docutils literal"&gt;xarray&lt;/code&gt; and &lt;code class="docutils literal"&gt;numpy&lt;/code&gt; arrays.&lt;/p&gt;
&lt;p&gt;This means you can use Blosc2's efficient evaluation engine to perform out-of-core computations directly on your existing (HDF5, Zarr, etc.) datasets.&lt;/p&gt;
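&lt;p&gt;Since the protocol is so small, almost any container can be adapted to it. As a sketch, here is a hypothetical minimal wrapper exposing just the four required attributes, with plain NumPy underneath to show the surface area involved:&lt;/p&gt;

```python
import numpy as np

class MinimalArray:
    """A toy container exposing only the attributes the blosc2.Array
    protocol asks for: shape, dtype, __getitem__ and __setitem__."""

    def __init__(self, data: np.ndarray):
        self._data = data

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

arr = MinimalArray(np.arange(6, dtype=np.float64).reshape(2, 3))
print(arr.shape, arr.dtype)  # (2, 3) float64
arr[0, 0] = 42.0
print(arr[0, 0])             # 42.0
```

&lt;p&gt;Objects from &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;, &lt;code class="docutils literal"&gt;zarr&lt;/code&gt; and friends already expose this interface, which is why they plug into the engine with no adapter code at all.&lt;/p&gt;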
&lt;section id="example-with-hdf5"&gt;
&lt;h3&gt;Example with HDF5&lt;/h3&gt;
&lt;p&gt;Here, we instruct &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; to run the &lt;code class="docutils literal"&gt;tensordot&lt;/code&gt; operation on two &lt;code class="docutils literal"&gt;h5py&lt;/code&gt; datasets and store the result in a third one.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-1" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-1" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Open HDF5 datasets&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-2" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-2" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h5py&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"a_b_out.h5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-3" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-3" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-4" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-4" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-5" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-5" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;out_hdf5&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"out"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-6" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-6" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-7" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-7" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-7"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Use blosc2.evaluate() with HDF5 arrays&lt;/span&gt;
&lt;a id="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-8" name="rest_code_9725b4c2abdb467c8396e9df47f0c4ab-8" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_9725b4c2abdb467c8396e9df47f0c4ab-8"&gt;&lt;/a&gt;&lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tensordot(a, b, axes=(axis, axis))"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;out_hdf5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that the expression string is identical to the one we used before. &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; inspects the objects in the expression's namespace and computes with them, regardless of their underlying format.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="example-with-zarr"&gt;
&lt;h3&gt;Example with Zarr&lt;/h3&gt;
&lt;p&gt;The same principle applies to Zarr arrays.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-1" name="rest_code_57a303deff314380b2b76a8b748fca29-1" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Open Zarr arrays&lt;/span&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-2" name="rest_code_57a303deff314380b2b76a8b748fca29-2" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zarr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"a.zarr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-3" name="rest_code_57a303deff314380b2b76a8b748fca29-3" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zarr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"b.zarr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-4" name="rest_code_57a303deff314380b2b76a8b748fca29-4" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;zout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zarr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"out.zarr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-5" name="rest_code_57a303deff314380b2b76a8b748fca29-5" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-5"&gt;&lt;/a&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-6" name="rest_code_57a303deff314380b2b76a8b748fca29-6" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Use blosc2.evaluate() with Zarr arrays&lt;/span&gt;
&lt;a id="rest_code_57a303deff314380b2b76a8b748fca29-7" name="rest_code_57a303deff314380b2b76a8b748fca29-7" href="https://blosc.org/posts/tensordot-pure-persistent/#rest_code_57a303deff314380b2b76a8b748fca29-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tensordot(a, b, axes=(axis, axis))"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;zout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This makes &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; a powerful, backend-agnostic tool for out-of-core array computations.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="performance-comparison"&gt;
&lt;h2&gt;Performance Comparison&lt;/h2&gt;
&lt;p&gt;As well as offering smooth integration, &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; is highly performant. Python-Blosc2 uses a lazy evaluation engine that integrates tightly with the Blosc2 format. This means that the computation is performed on-the-fly, without any intermediate copies. This is a huge advantage for large datasets, as it allows us to perform computations on arrays that don't fit in memory.  In addition, it actively tries to leverage the hierarchical memory layout in modern CPUs, so that it can use both private and shared caches in the best way possible.&lt;/p&gt;
&lt;p&gt;We ran a &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/tensordot_pure_persistent.ipynb"&gt;benchmark&lt;/a&gt; performing a &lt;code class="docutils literal"&gt;tensordot&lt;/code&gt; operation (run over three different axis combinations) on two 3D arrays stored on disk; we then write the output to disk as well.
We consider five approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Blosc2 Native&lt;/strong&gt;: Using &lt;code class="docutils literal"&gt;blosc2.lazyexpr&lt;/code&gt; with &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Blosc2+HDF5&lt;/strong&gt;: Using &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; with HDF5 for storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Blosc2+Zarr&lt;/strong&gt;: Using &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; with Zarr for storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dask+HDF5&lt;/strong&gt;: The combination of Dask for computation and HDF5 for storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dask+Zarr&lt;/strong&gt;: The combination of Dask for computation and Zarr for storage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
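&lt;p&gt;As a quick refresher on the operation being benchmarked (a toy NumPy sketch, not the benchmark code itself): &lt;code class="docutils literal"&gt;tensordot&lt;/code&gt; contracts one axis of each operand, so contracting two 3D arrays yields a 4D result whose shape is made up of the non-contracted axes of both inputs.&lt;/p&gt;

```python
# Toy illustration of tensordot semantics; the benchmark arrays are far larger.
import numpy as np

a = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
b = np.arange(5 * 3 * 6, dtype=np.float64).reshape(5, 3, 6)

# Contract axis 1 of `a` with axis 1 of `b` (both of length 3).
c = np.tensordot(a, b, axes=(1, 1))
print(c.shape)  # remaining axes of a, then of b: (2, 4, 5, 6)
```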
&lt;p&gt;For each approach we plot the memory consumption vs. time for arrays of increasing size.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Results on two (600, 600, 600) float64 arrays (3 GB working set):&lt;/strong&gt;&lt;/p&gt;
&lt;img alt="/images/tensordot_pure_persistent/tensordot-600c-amd.png" src="https://blosc.org/images/tensordot_pure_persistent/tensordot-600c-amd.png" style="width: 100%;"&gt;
&lt;p&gt;&lt;strong&gt;Results on two (1200, 1200, 1200) float64 arrays (26 GB working set):&lt;/strong&gt;&lt;/p&gt;
&lt;img alt="/images/tensordot_pure_persistent/tensordot-1200c-amd.png" src="https://blosc.org/images/tensordot_pure_persistent/tensordot-1200c-amd.png" style="width: 100%;"&gt;
&lt;p&gt;&lt;strong&gt;Results on two (1500, 1500, 1500) float64 arrays (50 GB working set):&lt;/strong&gt;&lt;/p&gt;
&lt;img alt="/images/tensordot_pure_persistent/tensordot-1500c-amd.png" src="https://blosc.org/images/tensordot_pure_persistent/tensordot-1500c-amd.png" style="width: 100%;"&gt;
&lt;p&gt;As can be seen, the amount of memory required by the different approaches is very different, although none requires more than a small fraction of the total working set (which is 3, 26 and 50 GB, respectively). This is because all approaches are out-of-core, and only load small chunks of data into memory at any given time.&lt;/p&gt;
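&lt;p&gt;The out-of-core pattern underlying all of these approaches can be sketched in plain NumPy (a simplified illustration using a hypothetical temporary file, not code from any of the libraries): the operands live on disk, and only one chunk is resident in memory at a time.&lt;/p&gt;

```python
# Simplified illustration of out-of-core chunked processing (not library code):
# the full array stays on disk and only one chunk is read at a time.
import numpy as np
import tempfile, os

path = os.path.join(tempfile.mkdtemp(), "a.dat")
shape, chunk = (1_000, 1_000), 100  # rows per chunk

a = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
a[:] = 1.0  # in a real workload this would also be written chunk by chunk
a.flush()

# Reduce the on-disk array chunk by chunk; peak usage is one chunk's worth
# of data rather than the whole 8 MB array.
a = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
total = 0.0
for start in range(0, shape[0], chunk):
    total += a[start:start + chunk].sum()  # only this slice is paged in

print(total)  # 1000000.0
```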
&lt;p&gt;The benchmarks were executed on an AMD Ryzen 9800X3D CPU, with 16 logical cores and 64 GB of RAM, using Ubuntu Linux 25.04. We used the following versions of the libraries: python-blosc2 3.10.1, h5py 3.14.0, zarr 3.1.3, dask 2025.9.1, and numpy 2.3.3.  All backends use Blosc or Blosc2 as the compression backend, with the same codecs and filters, and the same number of threads for compression and decompression.&lt;/p&gt;
&lt;section id="analysis"&gt;
&lt;h3&gt;Analysis&lt;/h3&gt;
&lt;p&gt;The results are revealing:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Blosc2 native is fastest&lt;/strong&gt;: The tight integration between the Blosc2 compute engine and its native array format yields the best performance, making it the fastest solution by a significant margin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rapid computation time&lt;/strong&gt;: &lt;code class="docutils literal"&gt;blosc2.evaluate&lt;/code&gt; delivers impressive speed when operating directly on HDF5 and Zarr files, outperforming the more complex Dask+HDF5 and Dask+Zarr stack. This is great news for anyone with existing HDF5/Zarr datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low memory usage&lt;/strong&gt;: While the memory consumption for the Blosc2+HDF5 combination is a bit high (we are still analyzing why), the memory usage for the Blosc2 native approach is pretty low, making it suitable for systems with limited RAM and/or operands not fitting in memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not to say that Dask (or Spark) is an inferior choice for out-of-core computations. It is a great tool for large-scale data processing, especially on clusters; it is very flexible, offers a wide range of functions, and is certainly a first-class citizen in the PyData ecosystem. However, if your needs are more modest and you want a simple, efficient way to run computations on existing datasets, using a core of common functions and leveraging the full capabilities of modern multi-core systems, all without the overhead of a full Dask setup, &lt;code class="docutils literal"&gt;blosc2.evaluate()&lt;/code&gt; is a fantastic alternative.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Python-Blosc2 is more than just a compression library for storing data in &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; objects; it's a high-performance computing tool as well. Its lazy evaluation engine provides a simple yet powerful way to handle out-of-core operations. The computation engine is completely decoupled from the compression backend, and thus can easily work with many different array formats; however, the compute engine meshes most tightly with the Blosc2 native array format, achieving maximal performance (in terms of both computation time and memory usage).&lt;/p&gt;
&lt;p&gt;By adhering to the &lt;a class="reference external" href="https://data-apis.org/array-api/"&gt;Array API standard&lt;/a&gt;, it acts as a universal engine that can work with different storage backends; we already implement &lt;a class="reference external" href="https://ironarray.io/blog/array-api"&gt;more than 100 functions that are required by that standard&lt;/a&gt;, and the number will only grow in the future. If you have existing datasets in HDF5, Zarr, or TileDB (and we are always looking to support even more formats), and need a lightweight, efficient way to run computations on them, &lt;code class="docutils literal"&gt;blosc2.evaluate()&lt;/code&gt; is a fantastic tool to have in your arsenal. Of course, for maximum performance, the native Blosc2 format is a clear winner.&lt;/p&gt;
&lt;p&gt;Our work continues. We are committed to enhancing Python-Blosc2 by expanding its supported operations, improving performance across backends, and adding new ones. Stay tuned for more updates! If you found this post useful, please share it. For questions or comments, reach out to us on &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/discussions"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 hdf5 zarr tiledb dask numpy</category><guid>https://blosc.org/posts/tensordot-pure-persistent/</guid><pubDate>Wed, 15 Oct 2025 10:32:20 GMT</pubDate></item><item><title>TreeStore: Endowing Your Data With Hierarchical Structure</title><link>https://blosc.org/posts/new-treestore-blosc2/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;When working with large and complex datasets, having a way to organize your data efficiently is crucial. &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt; is a powerful feature in the &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; library that allows you to store and manage your compressed arrays in a hierarchical, tree-like structure, much like a filesystem. This container, typically saved with a &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; extension, can hold not only &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; or &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; objects but also metadata, making it a versatile tool for data organization.&lt;/p&gt;
&lt;section id="what-is-a-treestore"&gt;
&lt;h2&gt;What is a TreeStore?&lt;/h2&gt;
&lt;p&gt;A &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; lets you arrange your data into groups (like directories) and datasets (like files). Each dataset is a &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; or &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; instance, benefiting from Blosc2's high-performance compression. This structure is ideal for scenarios where data has a natural hierarchy, such as in scientific experiments, simulations, or any project with multiple related datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="basic-usage-creating-and-populating-a-treestore"&gt;
&lt;h2&gt;Basic Usage: Creating and Populating a TreeStore&lt;/h2&gt;
&lt;p&gt;Creating a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is straightforward. You can use a &lt;code class="docutils literal"&gt;with&lt;/code&gt; statement to ensure the store is properly managed. Inside the &lt;code class="docutils literal"&gt;with&lt;/code&gt; block, you can create groups and datasets using a path-like syntax.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-1" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-2" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-2"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-3" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-4" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-4"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create a new TreeStore&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-5" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-5"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-6" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-6"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can store numpy arrays, which are converted to blosc2.NDArray&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-7" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/dataset0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-8" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-9" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-9"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Create a group with a dataset that can be a blosc2 NDArray&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-10" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-10"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-11" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-12" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-12"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can also store blosc2 arrays directly (vlmeta included)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-13" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-13"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-14" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-14" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-14"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"desc"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dataset2 metadata"&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-15" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-15" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-15"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example, we created a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; in a file named &lt;code class="docutils literal"&gt;my_experiment.b2z&lt;/code&gt;.&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/tree-store-blog.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/tree-store-blog.png" style="width: 90%;"&gt;
&lt;p&gt;It contains two groups, &lt;code class="docutils literal"&gt;root&lt;/code&gt; and &lt;code class="docutils literal"&gt;group1&lt;/code&gt;, each holding datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reading-from-a-treestore"&gt;
&lt;h2&gt;Reading from a TreeStore&lt;/h2&gt;
&lt;p&gt;To access the data, you open the &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; in read mode (&lt;code class="docutils literal"&gt;'r'&lt;/code&gt;) and use the same path-like keys to retrieve your arrays.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-1" name="rest_code_0507044fc58946738f7db9fd6207b65b-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Open the TreeStore in read-only mode ('r')&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-2" name="rest_code_0507044fc58946738f7db9fd6207b65b-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-3" name="rest_code_0507044fc58946738f7db9fd6207b65b-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-3"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Access a dataset&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-4" name="rest_code_0507044fc58946738f7db9fd6207b65b-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-4"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-5" name="rest_code_0507044fc58946738f7db9fd6207b65b-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-5"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 1:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset1&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;  &lt;span class="c1"&gt;# Use [:] to decompress and get a NumPy array&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-6" name="rest_code_0507044fc58946738f7db9fd6207b65b-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-7" name="rest_code_0507044fc58946738f7db9fd6207b65b-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-7"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Access the external array that has been stored internally&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-8" name="rest_code_0507044fc58946738f7db9fd6207b65b-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-9" name="rest_code_0507044fc58946738f7db9fd6207b65b-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-9"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-10" name="rest_code_0507044fc58946738f7db9fd6207b65b-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-10"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2 metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-11" name="rest_code_0507044fc58946738f7db9fd6207b65b-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-12" name="rest_code_0507044fc58946738f7db9fd6207b65b-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-12"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# List all paths in the store&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-13" name="rest_code_0507044fc58946738f7db9fd6207b65b-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-13"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Paths in TreeStore:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1"&gt;&lt;/a&gt;Dataset 1: [0 1 2 3 4 5 6 7 8 9]
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2"&gt;&lt;/a&gt;Dataset 2 [0.0000000e+00 1.0001000e-04 2.0002000e-04 ... 9.9979997e-01 9.9989998e-01
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3"&gt;&lt;/a&gt; 1.0000000e+00]
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4"&gt;&lt;/a&gt;Dataset 2 metadata: {b'desc': 'dataset2 metadata'}
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5"&gt;&lt;/a&gt;Paths in TreeStore: ['/group1/dataset2', '/group2', '/group1', '/group2/another_dataset', '/group1/dataset1']
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="advanced-usage-metadata-and-subtrees"&gt;
&lt;h2&gt;Advanced Usage: Metadata and Subtrees&lt;/h2&gt;
&lt;p&gt;&lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; becomes even more powerful when you use metadata and interact with subtrees (groups).&lt;/p&gt;
&lt;section id="storing-metadata-with-vlmeta"&gt;
&lt;h3&gt;Storing Metadata with &lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;You can attach variable-length metadata (&lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt;) to any group or to the root of the tree. This is useful for storing information like author names, dates, or experiment parameters. &lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt; is essentially a dictionary where you can store your metadata.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-1" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Appending metadata to the TreeStore&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-2" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 'a' for append/modify&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-3" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-3"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Add metadata to the root&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-4" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-4"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The Blosc Team"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-5" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2025-08-17"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-6" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-7" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-7"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Add metadata to a group&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-8" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Data from the first run"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-9" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-10" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-10"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Reading metadata&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-11" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-11"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-12" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-12"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Root metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-13" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-13"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Group 1 metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1" name="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1"&gt;&lt;/a&gt;Root metadata: {'author': 'The Blosc Team', 'date': '2025-08-17'}
&lt;a id="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2" name="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2"&gt;&lt;/a&gt;Group 1 metadata: {'description': 'Data from the first run'}
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="working-with-subtrees-groups"&gt;
&lt;h3&gt;Working with Subtrees (Groups)&lt;/h3&gt;
&lt;p&gt;A group object can be retrieved from the &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; and treated as a smaller, independent &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt;. This capability is useful for better organizing your data access code.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-1" name="rest_code_743e986080d940c7a5d3a558b96d7817-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-2" name="rest_code_743e986080d940c7a5d3a558b96d7817-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-2"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Get the group as a subtree&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-3" name="rest_code_743e986080d940c7a5d3a558b96d7817-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-3"&gt;&lt;/a&gt;    &lt;span class="n"&gt;group1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-4" name="rest_code_743e986080d940c7a5d3a558b96d7817-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-5" name="rest_code_743e986080d940c7a5d3a558b96d7817-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-5"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Now you can access datasets relative to this group&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-6" name="rest_code_743e986080d940c7a5d3a558b96d7817-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;group1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-7" name="rest_code_743e986080d940c7a5d3a558b96d7817-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-7"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2 from group object:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-8" name="rest_code_743e986080d940c7a5d3a558b96d7817-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-9" name="rest_code_743e986080d940c7a5d3a558b96d7817-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-9"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can also list contents relative to the group&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-10" name="rest_code_743e986080d940c7a5d3a558b96d7817-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-10"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Contents of group1:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-1" name="rest_code_00726fb9fa04417c9a004d9202070667-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-1"&gt;&lt;/a&gt;Dataset 2 from group object: [0.0000000e+00 1.0001000e-04 2.0002000e-04 ... 9.9979997e-01 9.9989998e-01
&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-2" name="rest_code_00726fb9fa04417c9a004d9202070667-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-2"&gt;&lt;/a&gt; 1.0000000e+00]
&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-3" name="rest_code_00726fb9fa04417c9a004d9202070667-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-3"&gt;&lt;/a&gt;Contents of group1: ['/dataset2', '/dataset1']
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="iterating-through-a-treestore"&gt;
&lt;h2&gt;Iterating Through a TreeStore&lt;/h2&gt;
&lt;p&gt;You can easily iterate through all the nodes in a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; to inspect its contents.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-1" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-2" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-2"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-3" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-3"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NDArray&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-4" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-4"&gt;&lt;/a&gt;            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Found dataset at '&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' with shape &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-5" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-5"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# It's a group&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-6" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-6"&gt;&lt;/a&gt;            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Found group at '&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' with metadata: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-1" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-1"&gt;&lt;/a&gt;Found dataset at '/group1/dataset2' with shape (10000,)
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-2" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-2"&gt;&lt;/a&gt;Found group at '/group1' with metadata: {'description': 'Data from the first run'}
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-3" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-3"&gt;&lt;/a&gt;Found dataset at '/group1/dataset1' with shape (10,)
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-4" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-4"&gt;&lt;/a&gt;Found dataset at '/dataset0' with shape (100,)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That's it for this introduction to &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt;! You now know how to create, read, and manipulate a hierarchical data structure that can hold compressed datasets and metadata. You can find the source code for this example in the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/examples/tree-store-blog.py"&gt;blosc2 repository&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="some-benchmarks"&gt;
&lt;h2&gt;Some Benchmarks&lt;/h2&gt;
&lt;p&gt;&lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is based on powerful abstractions from the &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; library, so it is very fast. Here are some benchmarks comparing &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; to other data storage formats, like HDF5 and Zarr. We have used two different configurations: one with small arrays, where sizes follow a normal distribution centered at 10 MB each, and the other with larger arrays, where sizes follow a normal distribution centered at 1 GB each. We have compared the performance of &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; against HDF5 and Zarr for both small and large arrays, measuring the time taken to create and read datasets.  For comparing apples with apples, we have used the same compression codec (&lt;code class="docutils literal"&gt;zstd&lt;/code&gt;) and filter (&lt;code class="docutils literal"&gt;shuffle&lt;/code&gt;) for all three formats.&lt;/p&gt;
&lt;p&gt;To assess different platforms, we used a desktop with an Intel i9-13900K CPU and 32 GB of RAM, running Ubuntu 25.04, and a Mac mini with an Apple M4 Pro processor and 24 GB of RAM. The benchmarks were run using &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/large-tree-store.py"&gt;this script&lt;/a&gt;.&lt;/p&gt;
&lt;section id="results-for-the-intel-i9-13900k-desktop"&gt;
&lt;h3&gt;Results for the Intel i9-13900K desktop&lt;/h3&gt;
&lt;p&gt;100 small arrays (around 10 MB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-10M.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-10M.png" style="width: 75%;"&gt;
&lt;p&gt;For the small arrays scenario, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the fastest at creating datasets (thanks to multi-threading), but it is slower than HDF5 and Zarr when reading them. The reason is two-fold: first, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is designed to work with multiple threads, so it must set up the necessary threads at the beginning of each read operation, which takes some time; second, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; stores its data in NDArray objects, whose double partitioning scheme (chunks and blocks) adds some overhead when reading small slices of data. Regarding the space used, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the most efficient, very close to HDF5, and significantly more efficient than Zarr.&lt;/p&gt;
&lt;p&gt;100 large arrays (around 1 GB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-1G.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-1G.png" style="width: 75%;"&gt;
&lt;p&gt;When handling larger arrays, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; maintains its lead in creation and full-read performance. Although HDF5 and Zarr offer faster access to small data slices, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; compensates by being the most storage-efficient format, followed by HDF5, with Zarr being the most space-intensive.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="results-for-the-apple-m4-pro-mac-mini"&gt;
&lt;h3&gt;Results for the Apple M4 Pro Mac mini&lt;/h3&gt;
&lt;p&gt;100 small arrays (around 10 MB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-10M.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-10M.png" style="width: 75%;"&gt;
&lt;p&gt;100 large arrays (around 1 GB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-1G.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-1G.png" style="width: 75%;"&gt;
&lt;p&gt;Consistent with the previous results, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the most space-efficient format and the fastest for creating and reading datasets, particularly for larger arrays. It is slower than HDF5 and Zarr only when reading small data slices (access time). This can be improved by reducing the number of threads from the default of eight, which lessens the thread setup overhead. For more details, see these &lt;a class="reference external" href="https://www.blosc.org/docs/2025-EuroSciPy-Blosc2.pdf"&gt;slides comparing 8-thread vs 1-thread performance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Notably, the Apple M4 Pro processor shows competitive performance against the Intel i9-13900K CPU, a high-end desktop processor that consumes up to 8x more power. This result underscores the efficiency of the ARM architecture in general and Apple silicon in particular.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In summary, &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt; offers a straightforward yet potent solution for hierarchically organizing compressed datasets. By merging the high-performance compression of &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; and &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; with a flexible, filesystem-like structure and metadata support, it stands out as an excellent choice for managing complex data projects.&lt;/p&gt;
&lt;p&gt;As &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is currently in beta, we welcome feedback and suggestions for its improvement. For further details, please consult the official documentation for &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/tree_store.html#blosc2.TreeStore"&gt;blosc2.TreeStore&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;</description><category>treestore hierarchical structure performance</category><guid>https://blosc.org/posts/new-treestore-blosc2/</guid><pubDate>Sun, 17 Aug 2025 10:33:20 GMT</pubDate></item><item><title>Blosc2 Gets Fancy (Indexing)</title><link>https://blosc.org/posts/blosc2-fancy-indexing/</link><dc:creator>Luke Shaw</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update (2025-08-26)&lt;/strong&gt;: After some further effort, the 1D fast path mentioned below has been extended to the multidimensional case, with consequent speedups in Blosc2 3.7.3! See the plot below, which compares maximum and minimum indexing times for the Blosc2-supported fancy indexing cases discussed in this post.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/newfancybench.png" src="https://blosc.org/images/blosc2-fancy-indexing/newfancybench.png"&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;In response to requests from our users, the Blosc2 team has &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/release_notes/index.html"&gt;introduced a fancy indexing capability&lt;/a&gt; into the flagship Blosc2 &lt;code class="docutils literal"&gt;NDArray&lt;/code&gt; object. In the future, this could be extended to other classes within the Blosc2 library, such as &lt;code class="docutils literal"&gt;C2Array&lt;/code&gt; and &lt;code class="docutils literal"&gt;LazyArray&lt;/code&gt;.&lt;/p&gt;
&lt;section id="what-is-fancy-indexing"&gt;
&lt;h2&gt;What is Fancy Indexing?&lt;/h2&gt;
&lt;p&gt;In many array libraries, most famously &lt;code class="docutils literal"&gt;NumPy&lt;/code&gt;, &lt;em&gt;fancy indexing&lt;/em&gt; refers to a vectorized indexing format which allows for simultaneous selection and reshaping of arrays (see &lt;a class="reference external" href="https://jakevdp.github.io/PythonDataScienceHandbook/02.07-fancy-indexing.html"&gt;this excerpt&lt;/a&gt;). For example, one may wish to select three entries from a 1D array:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr = array([10, 11, 12])&lt;/pre&gt;
&lt;p&gt;which can be done like so:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[1,2,1]]
&amp;gt;&amp;gt; array([11, 12, 11])&lt;/pre&gt;
&lt;p&gt;Note that the order of the indices is arbitrary (i.e. the elements of the output may occur in a different order to the original array) and indices may be repeated. Moreover, if the array is multidimensional, for example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr = array([[10, 11],
             [12, 13],
             [14, 15]])&lt;/pre&gt;
&lt;p&gt;then the output consists of the relevant rows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[1,2,0]]
&amp;gt;&amp;gt; array([[12, 13],
          [14, 15],
          [10, 11]])&lt;/pre&gt;
&lt;p&gt;and so on for arbitrary numbers of dimensions.&lt;/p&gt;
&lt;p&gt;Indeed, one can produce outputs of arbitrary shape, for example via:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[[1,2],[0,1]]]
&amp;gt;&amp;gt; array([[[12, 13],
          [14, 15]],

         [[10, 11],
          [12, 13]]])&lt;/pre&gt;
&lt;p&gt;NumPy supports many different kinds of fancy indexing, a flavour of which can be seen from the following examples, where &lt;code class="docutils literal"&gt;row&lt;/code&gt; and &lt;code class="docutils literal"&gt;col&lt;/code&gt; are integer array objects. If they are not of the same shape then broadcasting conventions will be applied to try to massage the index into an understandable format.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[row]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[[row,&lt;/span&gt; col]]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[row, col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[row[:,&lt;/span&gt; None], col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;arr[1, col]&lt;/code&gt; or &lt;code class="docutils literal"&gt;arr[1:9, col]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In addition, one may use a boolean mask, in combination with integer indices, slices, or integer arrays via&lt;/p&gt;
&lt;ol class="arabic simple" start="6"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[row[:,&lt;/span&gt; None], mask]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;where the &lt;code class="docutils literal"&gt;mask&lt;/code&gt; must have the same length as the indexed dimension(s).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="support-for-fancy-indexing-and-ndindex"&gt;
&lt;h2&gt;Support for Fancy Indexing and &lt;code class="docutils literal"&gt;ndindex&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Other libraries for managing large arrays, such as &lt;code class="docutils literal"&gt;zarr&lt;/code&gt; and &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;, offer fancy indexing support, but neither is as comprehensive as NumPy's. &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;, which uses the HDF5 format, is quite limited: one may only use a single integer array, repeated indices are not allowed, and the indices must be sorted in increasing order, although mixed slice and integer array indexing is possible.
&lt;code class="docutils literal"&gt;zarr&lt;/code&gt;, via its &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (for vectorized index), offers more support, but is rather limited when it comes to mixed indexing: slices may not be used with integer arrays, and an integer array must be provided for every dimension of the array (i.e. &lt;code class="docutils literal"&gt;arr[row]&lt;/code&gt; fails on any non-1D &lt;code class="docutils literal"&gt;arr&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;This makes it difficult (in the case of &lt;code class="docutils literal"&gt;zarr&lt;/code&gt;) or impossible (in the case of &lt;code class="docutils literal"&gt;h5py&lt;/code&gt;) to do the kind of reshaping we saw in the introduction (i.e. case 2 above &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[[[1,2],[0,1]]]&lt;/span&gt;&lt;/code&gt;). This lack of support is due to a combination of: 1) the computational difficulty of many of these operations; and 2) the at times counter-intuitive behaviour of fancy indexing (see the end of this blog post for more details).&lt;/p&gt;
&lt;p&gt;When implementing fancy indexing for Blosc2 we strove to match the functionality of NumPy as closely as possible, and we have almost been able to do so — all the 6 cases mentioned above are perfectly feasible with this new Blosc2 release! There are only some minor edge cases which are not supported (see Example 2 in the Addendum). This would not have been possible without the excellent &lt;a class="reference external" href="https://quansight-labs.github.io/ndindex/index.html"&gt;ndindex library&lt;/a&gt;, which offers many very useful, efficient functions for index conversion between different shapes and chunks. We can then call NumPy behind-the-scenes, chunk-by-chunk, and exploit its native support for fancy indexing, without having to load the entire array into memory.&lt;/p&gt;
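&lt;p&gt;To give a feel for the chunk-by-chunk strategy, here is a deliberately simplified, NumPy-only sketch of the idea for the 1D case (the function name and structure are ours; the actual Blosc2 implementation is N-dimensional and relies on &lt;code class="docutils literal"&gt;ndindex&lt;/code&gt; for the index conversions):&lt;/p&gt;

```python
import numpy as np

def chunked_take(chunks, idx):
    """Gather arr[idx] from a list of equal-sized 1-D chunks without ever
    materializing the full array: each chunk is visited once, and NumPy's
    native fancy indexing does the per-chunk work."""
    idx = np.asarray(idx)
    csize = len(chunks[0])
    out = np.empty(idx.shape, dtype=chunks[0].dtype)
    for i, chunk in enumerate(chunks):
        # Which requested positions fall inside this chunk?
        mask = (idx >= i * csize) & (idx < (i + 1) * csize)
        if mask.any():
            # Translate global indices to chunk-local ones and delegate to NumPy.
            out[mask] = chunk[idx[mask] - i * csize]
    return out

arr = np.arange(10)
chunks = [arr[0:5], arr[5:10]]
print(chunked_take(chunks, [7, 2, 2, 9]))  # [7 2 2 9]
```

Note that repeated and out-of-order indices come for free here, since each chunk-local gather is ordinary NumPy fancy indexing.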
&lt;/section&gt;
&lt;section id="results-blosc2-zarr-h5py-and-numpy"&gt;
&lt;h2&gt;Results: Blosc2, Zarr, H5Py and NumPy&lt;/h2&gt;
&lt;p&gt;When averaging over the indexing cases above on 2D arrays of varying sizes, we observe only a minor slowdown for Blosc2 compared to NumPy when the array size is small relative to total memory (24 GB), suggesting a small chunking-and-indexing overhead. As expected, when the array grows to an appreciable fraction of memory (16 GB), loading the full NumPy array into memory starts to impact performance. The black error bars in the plots indicate the maximum and minimum times observed over the indexing cases (for which there is clearly a large variation).&lt;/p&gt;
&lt;p&gt;Note that for cases 4 and 6 with large &lt;code class="docutils literal"&gt;row&lt;/code&gt; or &lt;code class="docutils literal"&gt;col&lt;/code&gt; index arrays, broadcasting causes the resulting index (stored in memory) to be very large, so that even for array sizes of 2 GB the computation is too slow. In the future, we would like to see if this can be improved.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc22D.png" src="https://blosc.org/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc22D.png"&gt;
&lt;p&gt;Blosc2 is also as fast as, or faster than, Zarr and HDF5, even for the limited use cases that those two libraries support. HDF5 in particular is especially slow when the indexing array is very large.&lt;/p&gt;
&lt;img alt="/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc2ZarrHDF52D.png" src="https://blosc.org/images/blosc2-fancy-indexing/fancyIdxNumpyBlosc2ZarrHDF52D.png"&gt;
&lt;p&gt;These plots were generated on a Mac mini with an Apple M4 Pro processor. The benchmark is available in the Blosc2 GitHub repo &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/fancy_index.py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Blosc2 offers a powerful and flexible fancy indexing functionality that is more extensive than that of Zarr and H5Py, while also being able to handle large arrays on-disk without loading them into memory. This makes it a great choice for applications that require complex indexing operations on large datasets.
Give it a try in your own projects! If you have questions, the Blosc2 community is here to help.&lt;/p&gt;
&lt;p&gt;If you appreciate what we're doing with Blosc2, please think about &lt;a class="reference external" href="https://www.blosc.org/pages/blosc-in-depth/#support-blosc/"&gt;supporting us&lt;/a&gt;. Your help lets us keep making these tools better.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="addendum-oindex-vindex-and-fancyindex-via-two-examples"&gt;
&lt;h2&gt;Addendum: Oindex, Vindex and FancyIndex via Two Examples&lt;/h2&gt;
&lt;p&gt;Zarr's implementation of fancy indexing is packaged as &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (vectorized indexing). It also offers another indexing functionality, called orthogonal indexing, via &lt;code class="docutils literal"&gt;oindex&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The reason for this dual support becomes clear when one considers a simple example.&lt;/p&gt;
&lt;section id="example-1"&gt;
&lt;h3&gt;Example 1&lt;/h3&gt;
&lt;p&gt;For a 2D array, we have seen that the fancy-indexing rules will cause the two index arrays below to be broadcast together:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[[0, 1], [2, 3]] -&amp;gt; [arr[0,2], arr[1,3]]&lt;/pre&gt;
&lt;p&gt;giving an output with two elements of shape (2,). This is &lt;em&gt;vindexing&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;However, one could understand this indexing as selecting rows 0 and 1 in the array, and then their intersection with columns 2 and 3. This gives an output with &lt;em&gt;four&lt;/em&gt; elements of shape (2, 2), with elements:&lt;/p&gt;
&lt;pre class="literal-block"&gt;[[arr[0,2], arr[0,3]],
 [arr[1,2], arr[1,3]]]&lt;/pre&gt;
&lt;p&gt;This is &lt;em&gt;oindexing&lt;/em&gt;. Clearly, given the same index, the output is in general different; it is for this reason that the debate about fancy indexing can be quite polemical, and why there is a &lt;a class="reference external" href="https://NumPy.org/neps/nep-0021-advanced-indexing.html"&gt;movement&lt;/a&gt; to introduce the vindex/oindex duality in NumPy.&lt;/p&gt;
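&lt;p&gt;Both interpretations can be reproduced in plain NumPy, where default fancy indexing behaves like &lt;em&gt;vindexing&lt;/em&gt; and &lt;code class="docutils literal"&gt;np.ix_&lt;/code&gt; constructs the orthogonal (&lt;em&gt;oindex&lt;/em&gt;-style) selection:&lt;/p&gt;

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

# vindex-style: NumPy pairs the two index arrays elementwise.
vind = arr[[0, 1], [2, 3]]            # [arr[0, 2], arr[1, 3]]
assert vind.tolist() == [2, 8]

# oindex-style: np.ix_ turns the same indices into an outer product,
# selecting the full 2x2 intersection of rows {0, 1} and columns {2, 3}.
oind = arr[np.ix_([0, 1], [2, 3])]
assert oind.tolist() == [[2, 3], [7, 8]]
print("same index, two different results")
```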
&lt;/section&gt;
&lt;section id="example-2"&gt;
&lt;h3&gt;Example 2&lt;/h3&gt;
&lt;p&gt;I have glossed over this until now, but vindex is &lt;em&gt;not&lt;/em&gt; the same as fancy indexing. For this reason Zarr does not support all the functionality of fancy indexing, since it only supports vindex. The most important distinction between the two is that vindex seeks to avoid certain unexpected fancy indexing behaviour, as can be seen by considering a 3D NumPy array of shape &lt;code class="docutils literal"&gt;(X, Y, Z)&lt;/code&gt; as in the &lt;a class="reference external" href="https://NumPy.org/neps/nep-0021-advanced-indexing.html#mixed-indexing"&gt;example here&lt;/a&gt;. Consider the unexpected behaviour of:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:10, :, [0,1]] has shape (10, Y, 2).

arr[0, :, [0, 1]] has shape (2, Y), not (Y, 2)!!&lt;/pre&gt;
&lt;p&gt;NumPy indexing treats non-slice indices differently, and will always put the axes introduced by the index array first, unless the non-slice indices are consecutive, in which case it will try to massage the result into something intuitive (which normally coincides with the result of an &lt;code class="docutils literal"&gt;oindex&lt;/code&gt;) — hence &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;arr[:,&lt;/span&gt; 0, [0, 1]]&lt;/code&gt; has shape &lt;code class="docutils literal"&gt;(X, 2)&lt;/code&gt;, not &lt;code class="docutils literal"&gt;(2, X)&lt;/code&gt;.&lt;/p&gt;
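&lt;p&gt;These shape rules are easy to confirm in NumPy; the concrete sizes below are hypothetical stand-ins for &lt;code class="docutils literal"&gt;(X, Y, Z)&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

X, Y, Z = 10, 5, 3
arr = np.zeros((X, Y, Z))

# A single advanced index with slices elsewhere: its axis stays in place.
assert arr[:10, :, [0, 1]].shape == (10, Y, 2)

# Two advanced indices (0 and [0, 1]) separated by a slice: their broadcast
# shape (2,) is moved to the front of the result.
assert arr[0, :, [0, 1]].shape == (2, Y)

# Consecutive advanced indices (integer then array): kept in place.
assert arr[:, 0, [0, 1]].shape == (X, 2)
print("shapes match the NEP 21 example")
```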
&lt;p&gt;The hypothesised NumPy &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; would eliminate this transposition behaviour, and be internally consistent, always putting the axes introduced by the index array first. Unfortunately, this is difficult and costly, and so the alternative is to simply not allow such indexing and throw an error, or force the user to be very specific.&lt;/p&gt;
&lt;p&gt;Blosc2 will throw an error when a slice is inserted between non-slice (integer or array) indices:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:, 0, [0, 1]] -&amp;gt; shape (X, 2)
arr.vindex[0, :, [0,1]] -&amp;gt; ERROR&lt;/pre&gt;
&lt;p&gt;Zarr's &lt;code class="docutils literal"&gt;vindex&lt;/code&gt; (called by &lt;code class="docutils literal"&gt;__getitem__&lt;/code&gt;), by requiring integer array indices for all dimensions, throws an error for all mixed indices of this type:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[:, 0, [0, 1]] -&amp;gt; ERROR
arr[0, :, [0,1]] -&amp;gt; ERROR&lt;/pre&gt;
&lt;p&gt;Thus, to reproduce the Blosc2 result for the first case in Zarr, one must use explicit index arrays:&lt;/p&gt;
&lt;pre class="literal-block"&gt;idx = np.array([0,1]).reshape(1,-1)
arr[np.arange(X).reshape(-1,1), 0 , idx] -&amp;gt; shape (X, 2)&lt;/pre&gt;
&lt;p&gt;For both Blosc2 and Zarr, one must use an explicit index array like so for the second case:&lt;/p&gt;
&lt;pre class="literal-block"&gt;arr[0, np.arange(Y).reshape(-1,1), idx] -&amp;gt; shape (Y, 2)&lt;/pre&gt;
&lt;p&gt;Hopefully you now understand why fancy indexing can be so tricky, and why few libraries seek to support it to the same extent as NumPy; some would say it is perhaps not even desirable to do so!&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>blosc2 fancyindex performance</category><guid>https://blosc.org/posts/blosc2-fancy-indexing/</guid><pubDate>Wed, 16 Jul 2025 13:33:20 GMT</pubDate></item><item><title>Efficient array concatenation launched in Blosc2</title><link>https://blosc.org/posts/blosc2-new-concatenate/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update (2025-06-23):&lt;/strong&gt; Recently, Luke Shaw added a &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/pull/427#pullrequestreview-2948922546"&gt;stack() function in Blosc2&lt;/a&gt;, using the concatenate feature described here. The new function allows you to stack arrays along a new axis, which is particularly useful for creating higher-dimensional arrays from lower-dimensional ones.  We have added a section at the end of this post to show the usage and performance of this new function.&lt;/p&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;Blosc2 just got a cool new trick: super-efficient array concatenation! If you've ever needed to combine several arrays into one, especially when dealing with lots of data, this new feature is for you. It's built to be fast and use as little memory as possible. This is especially true if your array sizes line up nicely with Blosc2's internal "chunks" (think of these as the building blocks of your compressed data). When this alignment happens, concatenation is lightning-fast, making it perfect for demanding tasks.&lt;/p&gt;
&lt;p&gt;You can use this new concatenate feature whether you're &lt;a class="reference external" href="https://www.blosc.org/c-blosc2/reference/b2nd.html#c.b2nd_concatenate"&gt;coding in C&lt;/a&gt; or &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/autofiles/ndarray/blosc2.concatenate.html"&gt;Python&lt;/a&gt;, and it works with any Blosc2 NDArray (Blosc2's way of handling multi-dimensional arrays).&lt;/p&gt;
&lt;p&gt;Let's see how easy it is to use in Python. If you're familiar with NumPy, the &lt;cite&gt;blosc2.concatenate&lt;/cite&gt; function will feel very similar:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-1" name="rest_code_6b84ae8e67404befa8fe60f429021d30-1" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-2" name="rest_code_6b84ae8e67404befa8fe60f429021d30-2" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-2"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create some sample arrays&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-3" name="rest_code_6b84ae8e67404befa8fe60f429021d30-3" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayA.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-4" name="rest_code_6b84ae8e67404befa8fe60f429021d30-4" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayB.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-5" name="rest_code_6b84ae8e67404befa8fe60f429021d30-5" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayC.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-6" name="rest_code_6b84ae8e67404befa8fe60f429021d30-6" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Concatenate the arrays along the first axis&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-7" name="rest_code_6b84ae8e67404befa8fe60f429021d30-7" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"destination.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-8" name="rest_code_6b84ae8e67404befa8fe60f429021d30-8" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-8"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# The result is a new Blosc2 NDArray containing the concatenated data&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-9" name="rest_code_6b84ae8e67404befa8fe60f429021d30-9" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-9"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (30, 20)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-10" name="rest_code_6b84ae8e67404befa8fe60f429021d30-10" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-10"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# You can also concatenate along other axes&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-11" name="rest_code_6b84ae8e67404befa8fe60f429021d30-11" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-11"&gt;&lt;/a&gt;&lt;span class="n"&gt;result_axis1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"destination_axis1.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-12" name="rest_code_6b84ae8e67404befa8fe60f429021d30-12" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-12"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_axis1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (10, 60)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;cite&gt;blosc2.concatenate&lt;/cite&gt; function is pretty straightforward. You give it a list of the arrays you want to join together. You can also tell it which way to join them using the &lt;cite&gt;axis&lt;/cite&gt; parameter (like joining them end-to-end or side-by-side).&lt;/p&gt;
&lt;p&gt;A really handy feature is that you can use &lt;cite&gt;urlpath&lt;/cite&gt; and &lt;cite&gt;mode&lt;/cite&gt; to save the combined array directly to a file. This is great when you're working with huge datasets because you don't have to load everything into memory at once. What you get back is a brand new, persistent Blosc2 NDArray with all your data combined.&lt;/p&gt;
&lt;section id="aligned-versus-non-aligned-concatenation"&gt;
&lt;h2&gt;Aligned versus Non-Aligned Concatenation&lt;/h2&gt;
&lt;p&gt;Blosc2's concatenate function is smart. It processes your data in small pieces of compressed data (chunks). This has two consequences. First, you can join very large arrays stored on disk, chunk-by-chunk, without using up all your computer's memory. Second, if the chunk boundaries of the arrays to be concatenated line up neatly, the process is much faster. Why? Because Blosc2 can avoid a lot of extra work, chiefly decompressing and re-compressing the chunks.&lt;/p&gt;
&lt;p&gt;Let's look at some pictures to see what "aligned" and "unaligned" concatenation means. "Aligned" means that chunk boundaries of the arrays to be concatenated line up with each other. "Unaligned" means that this is not the case.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/concat-unaligned.png" src="https://blosc.org/images/blosc2-new-concatenate/concat-unaligned.png"&gt;
&lt;img alt="/images/blosc2-new-concatenate/concat-aligned.png" src="https://blosc.org/images/blosc2-new-concatenate/concat-aligned.png"&gt;
&lt;p&gt;The pictures show why "aligned" concatenation is faster. In Blosc2, all data pieces (chunks) inside an array must be the same size. So, if the chunks in the arrays you're joining match up ("aligned"), Blosc2 can combine them very quickly. It doesn't have to rearrange the data into new, same-sized chunks for the final array. This is a big deal for large arrays.&lt;/p&gt;
&lt;p&gt;If the arrays are "unaligned," Blosc2 has more work to do. It has to decompress and then re-compress the data to make the new chunks fit, which takes longer. There's one more small detail for this fast method to work: the first array's size needs to be a neat multiple of its chunk size along the direction you're joining.&lt;/p&gt;
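&lt;p&gt;The alignment condition described above can be sketched in plain Python. The helper below is purely illustrative (it is not part of the Blosc2 API): it generalizes the "neat multiple" rule to a list of arrays, checking that each array except the last ends exactly on a chunk boundary along the concatenation axis.&lt;/p&gt;

```python
# Hypothetical helper, not part of blosc2: checks whether the chunk
# boundaries of the arrays to be concatenated line up along `axis`.
def is_aligned(shapes, chunks, axis):
    chunk_len = chunks[axis]
    # Every array except the last must end exactly on a chunk boundary,
    # i.e. its extent along `axis` must be a multiple of the chunk extent.
    return all(shape[axis] % chunk_len == 0 for shape in shapes[:-1])

# Two arrays with chunks of (10, 10): 20 rows is a multiple of 10 -> aligned
print(is_aligned([(20, 10), (30, 10)], (10, 10), axis=0))  # True
# 25 rows is not a multiple of 10 -> the fast path cannot be used
print(is_aligned([(25, 10), (30, 10)], (10, 10), axis=0))  # False
```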
&lt;p&gt;A big plus with Blosc2 is that it always processes data in these small chunks. This means it can combine enormous arrays without ever needing to load everything into your computer's memory at once.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance"&gt;
&lt;h2&gt;Performance&lt;/h2&gt;
&lt;p&gt;To show you how much faster this new concatenate feature is, we did a speed test using LZ4 as the internal compressor in Blosc2. We compared it to the usual way of joining arrays with &lt;cite&gt;numpy.concatenate&lt;/cite&gt;.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/benchmark-lz4-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/benchmark-lz4-20k-i13900K.png"&gt;
&lt;p&gt;The speed tests show that Blosc2's new concatenate is rather slow for small arrays (like 1,000 x 1,000). This is because it has to do a lot of work to set up the concatenation. But when you use larger arrays (like 20,000 x 20,000) that start to exceed the memory limits of our test machine (32 GB of RAM), Blosc2's new concatenate performance is much better, approaching that of NumPy's &lt;cite&gt;concatenate&lt;/cite&gt; function.&lt;/p&gt;
&lt;p&gt;However, if your array sizes line up well with Blosc2's internal chunks ("aligned" arrays), Blosc2 becomes much faster: typically more than 10x faster than NumPy for large arrays. This is because it can skip a lot of the work of decompressing and re-compressing data, and the cost of copying compressed data is also lower, by roughly the achieved compression ratio (around 10x in this case).&lt;/p&gt;
&lt;p&gt;Using the Zstd compressor with Blosc2 can make joining "aligned" arrays even quicker, since Zstd is good at making data smaller.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/benchmark-zstd-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/benchmark-zstd-20k-i13900K.png"&gt;
&lt;p&gt;So, when arrays are aligned, there's less data to copy (compression ratios here are around 20x), which speeds things up. If arrays aren't aligned, Zstd is a bit slower than LZ4 because its decompression and re-compression is slower. Conclusion? Pick the compressor that works best for what you're doing!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="stacking-arrays"&gt;
&lt;h2&gt;Stacking Arrays&lt;/h2&gt;
&lt;p&gt;We've also added a new &lt;cite&gt;stack()&lt;/cite&gt; function in Blosc2 that uses the concatenate feature. This function lets you stack arrays along a new axis, which is super useful for creating higher-dimensional arrays from lower-dimensional ones. Here's how it works:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-1" name="rest_code_df21eceede234f6baf21c132ad12bd53-1" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-2" name="rest_code_df21eceede234f6baf21c132ad12bd53-2" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-2"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create some sample arrays&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-3" name="rest_code_df21eceede234f6baf21c132ad12bd53-3" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayA.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-4" name="rest_code_df21eceede234f6baf21c132ad12bd53-4" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayB.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-5" name="rest_code_df21eceede234f6baf21c132ad12bd53-5" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayC.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-6" name="rest_code_df21eceede234f6baf21c132ad12bd53-6" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Stack the arrays along a new axis&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-7" name="rest_code_df21eceede234f6baf21c132ad12bd53-7" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;stacked_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stacked_destination.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-8" name="rest_code_df21eceede234f6baf21c132ad12bd53-8" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-8"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stacked_result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (3, 10, 20)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-9" name="rest_code_df21eceede234f6baf21c132ad12bd53-9" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-9"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# You can also stack along other axes&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-10" name="rest_code_df21eceede234f6baf21c132ad12bd53-10" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-10"&gt;&lt;/a&gt;&lt;span class="n"&gt;stacked_result_axis1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stacked_destination_axis1.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-11" name="rest_code_df21eceede234f6baf21c132ad12bd53-11" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-11"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stacked_result_axis1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (10, 3, 20)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Benchmarks for the &lt;cite&gt;stack()&lt;/cite&gt; function show that it performs similarly to the &lt;cite&gt;concat()&lt;/cite&gt; function, especially when the input arrays are aligned.  Here are the results for the same data sizes and machine used in the previous benchmarks, and using the LZ4 compressor.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/stack-lz4-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/stack-lz4-20k-i13900K.png"&gt;
&lt;p&gt;And here are the results for the Zstd compressor.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/stack-zstd-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/stack-zstd-20k-i13900K.png"&gt;
&lt;p&gt;As can be seen, the &lt;cite&gt;stack()&lt;/cite&gt; function is also very fast when the input arrays are aligned, and it performs well even for large arrays that don't fit into memory. Incidentally, when stacking along the last dimension, &lt;cite&gt;blosc2.stack()&lt;/cite&gt; is slightly faster than &lt;cite&gt;numpy.stack()&lt;/cite&gt; even when the arrays are not aligned; we are not sure why this is the case, but the fact that we can reproduce this behaviour consistently is probably a sign that NumPy could optimize this use case better.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Blosc2's new concatenate and stack features are a great way to combine arrays quickly and without using too much memory. They are especially fast when your array sizes are an exact multiple of Blosc2's "chunks" (aligned arrays), making them perfect for big data jobs. They also work well for large arrays that don't fit into memory, since they process data in small chunks. Finally, they are supported in both C and Python, so you can use them in your favorite programming language.&lt;/p&gt;
&lt;p&gt;Give it a try in your own projects! If you have questions, the Blosc2 community is here to help.&lt;/p&gt;
&lt;p&gt;If you appreciate what we're doing with Blosc2, please think about &lt;a class="reference external" href="https://www.blosc.org/pages/blosc-in-depth/#support-blosc/"&gt;supporting us&lt;/a&gt;. Your help lets us keep making these tools better.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 concatenate performance</category><guid>https://blosc.org/posts/blosc2-new-concatenate/</guid><pubDate>Mon, 16 Jun 2025 13:33:20 GMT</pubDate></item><item><title>Make NDArray Transposition Fast (and Compressed!) within Blosc 2 </title><link>https://blosc.org/posts/optimizing-chunks-transpose/</link><dc:creator>Ricardo Sales Piquer</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update (2025-04-30):&lt;/strong&gt; The &lt;code class="docutils literal"&gt;transpose&lt;/code&gt; function is now officially deprecated and
replaced by the new &lt;code class="docutils literal"&gt;permute_dims&lt;/code&gt;. This transition follows the Python array
API standard v2022.12, aiming to make Blosc2 even more compatible with modern
Python libraries and workflows.&lt;/p&gt;
&lt;p&gt;In contrast with the previous &lt;code class="docutils literal"&gt;transpose&lt;/code&gt;, the new &lt;code class="docutils literal"&gt;permute_dims&lt;/code&gt; offers:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Support for arrays of any number of dimensions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full handling of arbitrary axis permutations, including support for
negative indices.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Moreover, I have found a new way to transpose matrices more efficiently for
Blosc2. This blog post contains updated plots and discussions.&lt;/p&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;Matrix transposition is more than a textbook exercise: it plays a key role in
memory-bound operations, where layout and access patterns can make or break
performance.&lt;/p&gt;
&lt;p&gt;When working with large datasets, efficient data transformation can significantly
improve both performance and compression ratios. In Blosc2, we recently implemented
a matrix transposition function, a fundamental operation that rearranges data by
swapping rows and columns. In this post, I'll share the design insights,
implementation details, performance considerations that went into this feature,
and an unexpected NumPy behaviour.&lt;/p&gt;
&lt;section id="what-was-the-old-behavior"&gt;
&lt;h2&gt;What was the old behavior?&lt;/h2&gt;
&lt;p&gt;Previously, calling &lt;code class="docutils literal"&gt;blosc2.transpose(A)&lt;/code&gt; would &lt;strong&gt;transpose the data within
each chunk&lt;/strong&gt;, and a new chunk shape would be chosen for the output array.
However, this new chunk shape was not necessarily aligned with the new memory
access patterns induced by the transpose. As a result, even though the output
looked correct, accessing data along the new axes still incurred a
significant overhead due to an increased number of I/O operations. This
led to performance bottlenecks, particularly in workloads that rely on
efficient memory access patterns.&lt;/p&gt;
&lt;img alt="Transposition explanation for old operation" class="align-center" src="https://blosc.org/images/blosc2-transpose/transpose2.png"&gt;
&lt;/section&gt;
&lt;section id="what-s-new"&gt;
&lt;h2&gt;What's new?&lt;/h2&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;permute_dims&lt;/code&gt; function in Blosc2 has been redesigned to greatly improve
performance when working with compressed, multidimensional arrays. The main
improvement lies in &lt;strong&gt;transposing the chunk layout alongside the array data&lt;/strong&gt;,
which eliminates the overhead of cross-chunk access patterns.&lt;/p&gt;
&lt;p&gt;The new implementation transposes the chunk layout along with the data.
For example, an array with &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;chunks=(2,&lt;/span&gt; 5)&lt;/code&gt; that is transposed with
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;axes=(1,&lt;/span&gt; 0)&lt;/code&gt; will result in an array with &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;chunks=(5,&lt;/span&gt; 2)&lt;/code&gt;. This ensures
that the output layout matches the new data order, making block access
contiguous and efficient.&lt;/p&gt;
&lt;p&gt;This logic generalizes to N-dimensional arrays and applies regardless of their
shape or chunk configuration.&lt;/p&gt;
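&lt;p&gt;The chunk-layout permutation itself is a one-liner. The sketch below is illustrative only (it is not the actual Blosc2 internals): it applies the same &lt;code class="docutils literal"&gt;axes&lt;/code&gt; permutation to the chunk shape that &lt;code class="docutils literal"&gt;permute_dims&lt;/code&gt; applies to the data:&lt;/p&gt;

```python
# Illustrative sketch, not the actual Blosc2 internals: permute the chunk
# shape with the same axes permutation that is applied to the data.
def permute_chunks(chunks, axes):
    return tuple(chunks[a] for a in axes)

print(permute_chunks((2, 5), (1, 0)))         # (5, 2), as in the example above
print(permute_chunks((4, 8, 16), (2, 0, 1)))  # generalizes to N dimensions
```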
&lt;img alt="Transposition explanation for new operation" class="align-center" src="https://blosc.org/images/blosc2-transpose/transpose3.png"&gt;
&lt;/section&gt;
&lt;section id="performance-benchmark-transposing-matrices-with-blosc2-vs-numpy"&gt;
&lt;h2&gt;Performance benchmark: Transposing matrices with Blosc2 vs NumPy&lt;/h2&gt;
&lt;p&gt;To evaluate the performance of the new matrix transposition implementation in
&lt;em&gt;Blosc2&lt;/em&gt;, I conducted a series of benchmarks comparing it to &lt;em&gt;NumPy&lt;/em&gt;, which
serves as the baseline due to its widespread use and high optimization level.
The goal was to observe how both approaches perform when handling matrices of
increasing size and to understand the impact of different chunk configurations
in Blosc2.&lt;/p&gt;
&lt;section id="benchmark-setup"&gt;
&lt;h3&gt;Benchmark setup&lt;/h3&gt;
&lt;p&gt;All tests were conducted using matrices filled with &lt;code class="docutils literal"&gt;float64&lt;/code&gt; values,
covering a wide range of sizes, starting from small &lt;code class="docutils literal"&gt;100×100&lt;/code&gt; matrices and
scaling up to very large matrices of size &lt;code class="docutils literal"&gt;17000×17000&lt;/code&gt;, covering data sizes
from just a few megabytes to over 2 GB. Each matrix was transposed using the
Blosc2 API under different chunking strategies.&lt;/p&gt;
&lt;p&gt;In the case of NumPy, I used the &lt;code class="docutils literal"&gt;.transpose()&lt;/code&gt; function followed by a
&lt;code class="docutils literal"&gt;.copy()&lt;/code&gt; to ensure that the operation was comparable to that of Blosc2. This
is because, by default, NumPy's transposition is a view operation that only
modifies the array's metadata, without actually rearranging the data in memory.
Adding &lt;code class="docutils literal"&gt;.copy()&lt;/code&gt; forces NumPy to perform a real memory reordering, making the
comparison with Blosc2 fair and accurate.&lt;/p&gt;
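&lt;p&gt;The view-versus-copy distinction is easy to verify directly in NumPy:&lt;/p&gt;

```python
import numpy as np

a = np.ones((3, 4), dtype=np.float64)  # small example matrix

view = a.transpose()  # metadata-only: shares the buffer of `a`
print(view.base is a)  # True, no data was moved

materialized = a.transpose().copy()  # forces a real memory reordering
print(materialized.base is None)           # True: owns its own buffer
print(materialized.flags["C_CONTIGUOUS"])  # True: data physically rearranged
```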
&lt;p&gt;For Blosc2, I tested the transposition function across several chunk
configurations. Specifically, I included:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Automatic chunking, where Blosc2 decides the optimal chunk size
internally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fixed chunk sizes: &lt;code class="docutils literal"&gt;(150, 300)&lt;/code&gt;, &lt;code class="docutils literal"&gt;(1000, 1000)&lt;/code&gt; and
&lt;code class="docutils literal"&gt;(5000, 5000)&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These chunk sizes were chosen to represent a mix of square and rectangular
blocks, allowing me to study how chunk geometry impacts performance, especially
for very large matrices.&lt;/p&gt;
&lt;p&gt;Each combination of library and configuration was tested across all matrix sizes,
and the time taken to perform the transposition was recorded in seconds. This
comprehensive setup makes it possible to compare not just raw performance, but
also how well each method scales with data size and structure.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="results-and-discussion"&gt;
&lt;h3&gt;Results and discussion&lt;/h3&gt;
&lt;p&gt;The chart below summarizes the benchmark results for matrix transposition using
NumPy and Blosc2, across various chunk shapes and matrix sizes.&lt;/p&gt;
&lt;img alt="Transposition performance for new method" class="align-center" src="https://blosc.org/images/blosc2-transpose/performance-new.png"&gt;
&lt;p&gt;While NumPy sets a strong performance baseline, the behaviour of Blosc2 becomes
particularly interesting when we dive into how different chunk configurations
affect transposition speed. The following observations highlight how crucial the
choice of chunk shape is to achieving optimal performance.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Large square chunks (e.g., &lt;code class="docutils literal"&gt;(4000, 4000)&lt;/code&gt;) showed the worst performance,
especially with large matrices. Despite having fewer chunks, their size
seems to hinder cache performance and introduces memory pressure that
degrades throughput. Execution times were consistently higher than other
configurations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Small rectangular chunks such as &lt;code class="docutils literal"&gt;(150, 300)&lt;/code&gt; also underperformed.
As matrix size grew, execution times increased significantly,
reaching nearly 3 seconds at around 2200 MB, likely due to poor cache
utilization and the overhead of managing many tiny chunks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mid-sized square chunks like &lt;code class="docutils literal"&gt;(1000, 1000)&lt;/code&gt; delivered consistently solid
results across all tested sizes. Their timings stay below ~1.2 s with
minimal variance, making them a reliable manual choice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatically selected chunks consistently achieved the best performance.
By adapting chunk layout to the data shape and size, the internal
heuristics outpaced all fixed configurations, even rivaling plain NumPy
transpose times.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
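&lt;p&gt;To make the chunked strategy concrete, here is a minimal NumPy sketch of a chunk-by-chunk transposition: chunk &lt;cite&gt;(i, j)&lt;/cite&gt; of the input lands, itself transposed, at position &lt;cite&gt;(j, i)&lt;/cite&gt; of the output. This is an illustration of the idea only, not the actual Blosc2 implementation.&lt;/p&gt;

```python
import numpy as np

def blocked_transpose(a, chunk):
    """Transpose a 2-D array chunk by chunk.

    Illustrative sketch: each input chunk is transposed and written to
    the mirrored position of the output.  Chunk geometry decides how
    cache-friendly these reads and writes are.
    """
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    ci, cj = chunk
    for i in range(0, n, ci):
        for j in range(0, m, cj):
            block = a[i:i + ci, j:j + cj]   # may be partial at the edges
            out[j:j + cj, i:i + ci] = block.T
    return out

a = np.arange(12.0).reshape(3, 4)
t = blocked_transpose(a, chunk=(2, 2))
```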
&lt;img alt="Blosc2 vs NumPy comparison" class="align-center" src="https://blosc.org/images/blosc2-transpose/Numpy-vs-Blosc2-new.png"&gt;
&lt;p&gt;The second plot provides a direct comparison between the standard NumPy
&lt;code class="docutils literal"&gt;transpose&lt;/code&gt; and the newly optimized Blosc2
version. It shows that Blosc2’s optimized implementation closely matches
NumPy's performance, even for larger matrices. The results confirm that with
good chunking strategies and proper memory handling, Blosc2 can achieve
performance on par with NumPy for transposition operations.&lt;/p&gt;
&lt;aside class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Across all chunk configurations, there is an anomalous latency spike around
the 1500–1600 MB range. This unexpected behavior suggests some low-level
effect (e.g., memory management thresholds, buffer alignment issues, or shifts
in cache access patterns) that is not directly tied to chunk size but rather to
the overall matrix magnitude in that specific region.&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The benchmarks highlight one key insight: Blosc2 is highly sensitive to chunk
shape, and its performance can range from excellent to poor depending on how it
is configured. With the right chunk size, Blosc2 can offer both high-speed
transpositions and advanced features like compression and out-of-core
processing. However, misconfigured chunks, especially those that are too big
or too small, can drastically reduce its effectiveness. This makes chunk tuning
an essential step for anyone seeking to get the most out of Blosc2 for
large-scale matrix operations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="appendix-a-unexpected-numpy-behaviour"&gt;
&lt;h2&gt;Appendix A: Unexpected NumPy behaviour&lt;/h2&gt;
&lt;p&gt;While running the benchmarks, three unusual spikes were consistently observed in
the performance of NumPy, around matrices of approximately &lt;strong&gt;500 MB&lt;/strong&gt;, &lt;strong&gt;1100 MB&lt;/strong&gt;
and &lt;strong&gt;2000 MB&lt;/strong&gt; in size. They can be clearly seen in the plot below:&lt;/p&gt;
&lt;img alt="NumPy transposition performance anomaly" class="align-center" src="https://blosc.org/images/blosc2-transpose/only-numpy.png"&gt;
&lt;p&gt;This sudden increase in transposition time is consistently reproducible and
does not follow the gradual growth expected from larger memory sizes. We have
also observed this behaviour on other machines, although at different sizes.&lt;/p&gt;
&lt;p&gt;This observation reinforces the importance of testing under realistic and
varied conditions, as performance is not always linear or intuitive.&lt;/p&gt;
&lt;aside class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;See NumPy's issue &lt;a class="reference external" href="https://github.com/numpy/numpy/issues/28711"&gt;#28711&lt;/a&gt; for
more details.&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;</description><category>blosc2 optimization matrix transposition compression numpy</category><guid>https://blosc.org/posts/optimizing-chunks-transpose/</guid><pubDate>Tue, 08 Apr 2025 09:00:00 GMT</pubDate></item><item><title>Optimizing chunks for matrix multiplication in Blosc2</title><link>https://blosc.org/posts/optimizing-chunks-blosc2/</link><dc:creator>Ricardo Sales Piquer</dc:creator><description>&lt;p&gt;As data volumes continue to grow in fields like machine learning and scientific computing,
optimizing fundamental operations like matrix multiplication becomes increasingly critical.
Blosc2's chunk-based approach offers a new path to efficiency in these scenarios.&lt;/p&gt;
&lt;section id="matrix-multiplication"&gt;
&lt;h2&gt;Matrix Multiplication&lt;/h2&gt;
&lt;p&gt;Matrix multiplication is a fundamental operation in many scientific and
engineering applications. With the introduction of matrix multiplication into
Blosc2, users can now perform this operation on compressed arrays efficiently.
The key advantages of having matrix multiplication in Blosc2 include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compressed matrices in memory:&lt;/strong&gt;
Blosc2 enables matrices to be stored in a compressed format without sacrificing
the ability to perform operations directly on them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficiency with chunks&lt;/strong&gt;:
In computation-intensive applications, matrix multiplication can be executed
without fully decompressing the data, operating on small blocks of data independently,
saving both time and memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Out-of-core computation:&lt;/strong&gt;
When matrices are too large to fit in main memory, Blosc2 facilitates out-of-core
processing. Data stored on disk is read and processed in optimized chunks,
allowing matrix multiplication operations without loading the entire dataset into
memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These features are especially valuable in big data environments and in scientific
or engineering applications where matrix sizes can be overwhelming, enabling
complex calculations efficiently.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The matrix multiplication functionality is implemented in the &lt;code class="docutils literal"&gt;matmul&lt;/code&gt;
function. It supports Blosc2 &lt;code class="docutils literal"&gt;NDArray&lt;/code&gt; objects and leverages chunked
operations to perform the multiplication efficiently.&lt;/p&gt;
&lt;img alt="How blocked matrix multiplication works" class="align-center" src="https://blosc.org/images/blosc2-matmul/blocked-gemm.png"&gt;
&lt;p&gt;The image illustrates a &lt;strong&gt;blocked matrix multiplication&lt;/strong&gt; approach. The key idea
is to divide matrices into smaller blocks (or chunks) to optimize memory
access and computational efficiency.&lt;/p&gt;
&lt;p&gt;In the image, matrix &lt;cite&gt;A (M x K)&lt;/cite&gt; and matrix &lt;cite&gt;B (K x N)&lt;/cite&gt;
are partitioned into chunks, and these in turn into blocks. The resulting
matrix &lt;cite&gt;C (M x N)&lt;/cite&gt; is computed as a sum of block-wise multiplications.&lt;/p&gt;
&lt;p&gt;This method significantly improves cache utilization by ensuring that only the
necessary parts of the matrices are loaded into memory at any given time. In
Blosc2, storing matrix blocks as compressed chunks reduces memory footprint and
enhances performance by enabling on-the-fly decompression.&lt;/p&gt;
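&lt;p&gt;The blocked scheme in the figure can be sketched in a few lines of NumPy. This is an illustration of the algorithm, not the Blosc2 implementation, which additionally keeps every chunk compressed and decompresses blocks on the fly.&lt;/p&gt;

```python
import numpy as np

def blocked_matmul(A, B, block):
    """Blocked (tiled) matrix multiplication, C = A @ B.

    Each (i, j) tile of C accumulates the products of the matching row
    tiles of A and column tiles of B, so only small pieces of the
    operands need to be resident at any given time.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must coincide"
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i in range(0, M, block):
        for j in range(0, N, block):
            for k in range(0, K, block):
                C[i:i + block, j:j + block] += (
                    A[i:i + block, k:k + block] @ B[k:k + block, j:j + block]
                )
    return C

rng = np.random.default_rng(1)
A = rng.random((6, 5))
B = rng.random((5, 4))
C = blocked_matmul(A, B, block=2)
```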
&lt;p&gt;Blosc2 also supports a wide range of data types. In addition to standard Python
types such as &lt;cite&gt;int&lt;/cite&gt;, &lt;cite&gt;float&lt;/cite&gt;, and &lt;cite&gt;complex&lt;/cite&gt;, it fully supports the most common
NumPy types. The currently supported NumPy types are:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int8&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int16&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int32&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.float32&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.float64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.complex64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.complex128&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This versatility allows compression and subsequent processing to be
applied across diverse scenarios, tailored to the specific needs of each
application.&lt;/p&gt;
&lt;p&gt;Together, these features make Blosc2 a flexible and adaptable tool for many
scenarios, and one that is especially well suited to handling large datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="benchmarks"&gt;
&lt;h2&gt;Benchmarks&lt;/h2&gt;
&lt;p&gt;The benchmarks have been designed to evaluate the performance of the &lt;code class="docutils literal"&gt;matmul&lt;/code&gt;
function under various conditions. Here are the key aspects of our
experimental setup and findings:&lt;/p&gt;
&lt;p&gt;Different matrix sizes were tested using both &lt;code class="docutils literal"&gt;float32&lt;/code&gt; and &lt;code class="docutils literal"&gt;float64&lt;/code&gt;
data types. All the matrices used for multiplication are square.
The variation in matrix sizes helps observe how the function scales and
how the overhead of chunk management impacts performance.&lt;/p&gt;
&lt;p&gt;The x-axis represents the size of the resulting matrix in megabytes (MB).
We used GFLOPS (Giga Floating-Point Operations per Second) to gauge the
computational throughput, allowing us to compare the efficiency of the
&lt;code class="docutils literal"&gt;matmul&lt;/code&gt; function relative to highly optimized libraries like NumPy.&lt;/p&gt;
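&lt;p&gt;For reference, a square matmul of two &lt;cite&gt;n x n&lt;/cite&gt; matrices performs roughly &lt;cite&gt;2 * n**3&lt;/cite&gt; floating-point operations (n multiplications plus n - 1 additions per output element), so the GFLOPS figure can be computed as in this small sketch:&lt;/p&gt;

```python
import time

import numpy as np

def matmul_gflops(n):
    """Measure matmul throughput in GFLOPS for two (n, n) float64 matrices.

    A square matmul performs roughly 2 * n**3 floating-point operations,
    so throughput is that count divided by the elapsed time, in units of
    1e9 operations per second.
    """
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 / elapsed / 1e9

gflops = matmul_gflops(256)
```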
&lt;p&gt;Blosc2 can also select chunk shapes automatically; this configuration is
labeled "Auto" in the benchmark plots.&lt;/p&gt;
&lt;img alt="Benchmark float32" class="align-center" src="https://blosc.org/images/blosc2-matmul/float32.png"&gt;
&lt;img alt="Benchmark float64" class="align-center" src="https://blosc.org/images/blosc2-matmul/float64.png"&gt;
&lt;p&gt;For smaller matrices, the overhead of managing chunks in Blosc2 can result in
lower GFLOPS than NumPy. As the matrix size increases, Blosc2 scales well, with
its performance approaching NumPy's.&lt;/p&gt;
&lt;p&gt;Each chunk shape exhibits peak performance when the matrix size matches, or
is a multiple of, the chunk size.&lt;/p&gt;
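&lt;p&gt;One reason for this is padding: when a dimension is not a multiple of the chunk size, the edge chunks are only partially filled but are still allocated and processed whole. A small sketch of the padding fraction (an illustration of the effect, not Blosc2 internals):&lt;/p&gt;

```python
import math

def chunk_overhead(shape, chunks):
    """Fraction of allocated chunk space that is padding.

    When an array dimension is not a multiple of the chunk size, the
    edge chunks are only partially filled, yet chunked containers still
    allocate and process them whole.
    """
    n_chunks = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    allocated = math.prod(n_chunks) * math.prod(chunks)
    used = math.prod(shape)
    return 1 - used / allocated

exact = chunk_overhead((4000, 4000), (1000, 1000))   # exact multiple: no padding
ragged = chunk_overhead((4100, 4100), (1000, 1000))  # edge chunks mostly empty
```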
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The new matrix multiplication feature in Blosc2 introduces efficient, chunked
computation for compressed arrays. This allows users to handle large datasets
both in memory and on disk without sacrificing performance. The implementation
supports a wide range of data types, making it versatile for various numerical
applications.&lt;/p&gt;
&lt;p&gt;Real-world applications, such as neural network training, demonstrate the
potential benefits in scenarios where memory constraints and large data sizes
are common. While there are some limitations, such as support only for 2D arrays
and the overhead of blocking, the outlook is promising, including potential
integration with deep learning frameworks.&lt;/p&gt;
&lt;p&gt;Overall, Blosc2 offers a compelling alternative for applications where the
advantages of compression and out-of-core computation are critical, paving
the way for more efficient processing of massive datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="getting-my-feet-wet-with-blosc2"&gt;
&lt;h2&gt;Getting my feet wet with Blosc2&lt;/h2&gt;
&lt;p&gt;In the initial phase of the project, my biggest challenge was understanding how
Blosc2 manages data internally. For matrix multiplication, it was critical to
grasp how to choose the right chunks, since the operation requires the chunk
ranges of both matrices to be properly aligned. After some consideration and a few
insightful conversations with Francesc, I finally understood the underlying
mechanics. This breakthrough allowed me to begin implementing the first versions
of my solution, adjusting the data partitioning so that each block was properly
aligned for correct computation.&lt;/p&gt;
&lt;p&gt;Another important aspect was adapting to the professional workflow of using Git
for version control. Embracing Git —with its branch creation, regular commits,
and conflict resolution— represented a significant shift in my development
approach. This experience not only improved the organization of my code and
facilitated collaboration, but also instilled a structured and disciplined
mindset in managing my projects. Git has proven to be an extremely valuable
tool.&lt;/p&gt;
&lt;p&gt;Finally, the moment when the function finally returned the correct result was
really exciting. After multiple iterations, the rigorous debugging process paid
off as everything fell into place. This breakthrough validated the robustness
of the implementation and boosted my confidence to further optimize and tackle
new challenges in data processing.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 optimization matrix multiplication matmul compression</category><guid>https://blosc.org/posts/optimizing-chunks-blosc2/</guid><pubDate>Wed, 12 Mar 2025 09:00:00 GMT</pubDate></item><item><title>Mastering Persistent, Dynamic Reductions and Lazy Expressions in Blosc2</title><link>https://blosc.org/posts/persistent-reductions/</link><dc:creator>Oumaima Ech Chdig, Francesc Alted</dc:creator><description>&lt;p&gt;Working with large volumes of data is challenging, but Blosc2 offers unique tools to facilitate processing.&lt;/p&gt;
&lt;p&gt;Blosc2 is a powerful data compression library designed to handle and process large datasets effectively. One standout feature is its support for &lt;strong&gt;lazy expressions&lt;/strong&gt; and &lt;strong&gt;persistent and dynamic reductions&lt;/strong&gt;. These tools make it possible to define complex calculations that execute only when necessary, reducing memory usage and optimizing processing time, which can be a game-changer when dealing with massive arrays.&lt;/p&gt;
&lt;p&gt;In this guide, we’ll break down how to use these features to streamline data manipulation and get better performance out of your workflows. We’ll also see how resizing operand arrays is automatically reflected in the results, highlighting the flexibility of lazy expressions.&lt;/p&gt;
&lt;section id="getting-started-with-arrays-and-broadcasting"&gt;
&lt;h2&gt;Getting Started with Arrays and Broadcasting&lt;/h2&gt;
&lt;p&gt;Blosc2 works smoothly with arrays of various shapes and dimensions, enabling users to perform calculations such as addition or multiplication across arrays of different sizes. This is where &lt;strong&gt;broadcasting&lt;/strong&gt; comes in. With broadcasting, Blosc2 automatically aligns the shapes of arrays for easy operations. This means you don’t need to manually adjust array dimensions to match, a huge time-saver when working with multidimensional data.&lt;/p&gt;
&lt;p&gt;For example, let's suppose we have an array representing a large dataset, &lt;cite&gt;a&lt;/cite&gt;, and another of smaller dimension, &lt;cite&gt;c&lt;/cite&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_46973db5a058442283d3c7f44274a199-1" name="rest_code_46973db5a058442283d3c7f44274a199-1" href="https://blosc.org/posts/persistent-reductions/#rest_code_46973db5a058442283d3c7f44274a199-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fill_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_46973db5a058442283d3c7f44274a199-2" name="rest_code_46973db5a058442283d3c7f44274a199-2" href="https://blosc.org/posts/persistent-reductions/#rest_code_46973db5a058442283d3c7f44274a199-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fill_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_46973db5a058442283d3c7f44274a199-3" name="rest_code_46973db5a058442283d3c7f44274a199-3" href="https://blosc.org/posts/persistent-reductions/#rest_code_46973db5a058442283d3c7f44274a199-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As seen above, broadcasting works automatically (and efficiently) with arrays of compressed data.  Also, the correct data type of the result will be inferred from the operands and the expression. Thanks to this mechanism, the interpreter automatically adjusts the dimensions and data types of the arrays involved in the operation, allowing calculations to be performed without the need for manual adjustments.&lt;/p&gt;
&lt;img alt="/images/blosc2-broadcast.png" src="https://blosc.org/images/blosc2-broadcast.png" style="width: 50%;"&gt;
&lt;p&gt;This approach is ideal for quick and simple data analysis, especially when working with large volumes of information that require frequent operations across different dimensions.&lt;/p&gt;
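&lt;p&gt;Since Blosc2 follows NumPy's broadcasting rules, the shapes in the snippet above can be checked with plain NumPy arrays:&lt;/p&gt;

```python
import numpy as np

# Mirror the Blosc2 snippet above with plain NumPy arrays; Blosc2
# follows NumPy's broadcasting rules, so the shapes behave the same.
a = np.full((1, 3, 2), 3)
c = np.full(2, 9, dtype=np.int8)
expr = a + c - 1  # (1, 3, 2) broadcast against (2,)

# Every element is 3 + 9 - 1 = 11, and the result keeps the larger shape.
```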
&lt;/section&gt;
&lt;section id="setting-up-and-saving-lazy-expressions"&gt;
&lt;h2&gt;Setting Up and Saving Lazy Expressions&lt;/h2&gt;
&lt;p&gt;Imagine you need to perform a calculation like &lt;cite&gt;sum(a, axis=0) + b * sin(c)&lt;/cite&gt;. Rather than immediately calculating this, Blosc2’s &lt;strong&gt;lazy expression&lt;/strong&gt; feature lets you store the expression for later. By using &lt;cite&gt;blosc2.lazyexpr&lt;/cite&gt;, you define complex mathematical formulas and only trigger their execution when required, and only for the part of the resulting array that you are interested in. This is highly advantageous for large computations that might not be needed right away or that may depend on evolving data.&lt;/p&gt;
&lt;p&gt;Let's see how that works with a little more complex expression:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-1" name="rest_code_6ef5f7e34ce042c290947658d286d42d-1" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create arrays with specific dimensions and values&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-2" name="rest_code_6ef5f7e34ce042c290947658d286d42d-2" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"a.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-3" name="rest_code_6ef5f7e34ce042c290947658d286d42d-3" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"b.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-4" name="rest_code_6ef5f7e34ce042c290947658d286d42d-4" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"c.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-5" name="rest_code_6ef5f7e34ce042c290947658d286d42d-5" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-5"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Define a lazy expression and the operands for later execution&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-6" name="rest_code_6ef5f7e34ce042c290947658d286d42d-6" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Note that we are using a string version of the expression here&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-7" name="rest_code_6ef5f7e34ce042c290947658d286d42d-7" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-7"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# so that it can be re-opened as-is later on&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-8" name="rest_code_6ef5f7e34ce042c290947658d286d42d-8" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-8"&gt;&lt;/a&gt;&lt;span class="n"&gt;expression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sum(a, axis=0) + b * sin(c)"&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-9" name="rest_code_6ef5f7e34ce042c290947658d286d42d-9" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-9"&gt;&lt;/a&gt;&lt;span class="n"&gt;lazy_expression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lazyexpr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6ef5f7e34ce042c290947658d286d42d-10" name="rest_code_6ef5f7e34ce042c290947658d286d42d-10" href="https://blosc.org/posts/persistent-reductions/#rest_code_6ef5f7e34ce042c290947658d286d42d-10"&gt;&lt;/a&gt;&lt;span class="n"&gt;lazy_expression&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"arrayResult.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this code, &lt;cite&gt;sum(a, axis=0) + b * sin(c)&lt;/cite&gt; is defined but not executed immediately. When you’re ready to use the result, you can call &lt;cite&gt;lazy_expression.compute()&lt;/cite&gt; (returns a Blosc2 array that is compressed by default) to run the calculation. Alternatively, you can specify the part of the result that you are interested in with &lt;cite&gt;lazy_expression[0, :]&lt;/cite&gt; (returns a NumPy array). This way, you save CPU and memory and only perform the computation when necessary.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dynamic-computation-reusing-and-updating-results"&gt;
&lt;h2&gt;Dynamic Computation: Reusing and Updating Results&lt;/h2&gt;
&lt;p&gt;Another big advantage of Blosc2 is its ability to compute persistent expressions that are &lt;strong&gt;dynamic&lt;/strong&gt;: when an operand is enlarged, Blosc2 re-adapts the expression to account for its new shape. This approach significantly speeds up processing time, especially when working with frequently updated or real-time data.&lt;/p&gt;
&lt;p&gt;For instance, if you have an expression stored, and only part of your dataset changes, Blosc2 can apply reductions dynamically to efficiently update the sum:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-1" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-1" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Resizing arrays and updating values&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-2" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-2" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-3" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-3" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-4" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-4" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-5" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-5" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-6" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-6" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Open the saved file&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-7" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-7" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;lazy_expression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-8" name="rest_code_6a74f440c47d4ebd9fba11fcabdb0098-8" href="https://blosc.org/posts/persistent-reductions/#rest_code_6a74f440c47d4ebd9fba11fcabdb0098-8"&gt;&lt;/a&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lazy_expression&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this case, the final &lt;cite&gt;result&lt;/cite&gt; will have a shape of &lt;cite&gt;(30, 40)&lt;/cite&gt; (instead of the previous &lt;cite&gt;(20, 40)&lt;/cite&gt;). This allows for quick adaptability, which is crucial in data environments where values evolve constantly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="why-persistent-reductions-and-lazy-expressions-matter"&gt;
&lt;h2&gt;Why Persistent Reductions and Lazy Expressions Matter&lt;/h2&gt;
&lt;p&gt;These features make Blosc2 a top choice for working with large datasets, as they allow for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Broadcasting&lt;/strong&gt; of in-memory, on-disk or network operands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficient use of CPU and memory&lt;/strong&gt; by only executing calculations when needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic expressions&lt;/strong&gt; that adapt to changing data in operands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced performance&lt;/strong&gt; due to streamlined, multi-threaded and pre-fetched calculations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, lazy expressions and persistent reductions provide a flexible, resource-efficient way to manage complex data processes. They’re perfect for real-time analysis, evolving datasets, or any high-performance computing tasks requiring dynamic data handling.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Blosc2’s features offer a way to make data processing smarter and faster. If you work with large arrays or require adaptable workflows, Blosc2 can help you make the most of your data processing resources.&lt;/p&gt;
&lt;p&gt;For more in-depth guidance, visit the &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/getting_started/tutorials/05.persistent-reductions.html"&gt;full tutorial on Blosc2&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;</description><category>data-processing</category><category>large-datasets</category><category>lazy-expressions</category><category>persistent-reduction</category><guid>https://blosc.org/posts/persistent-reductions/</guid><pubDate>Tue, 05 Nov 2024 12:58:20 GMT</pubDate></item></channel></rss>