<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about reductions)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/reductions.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 04 Mar 2026 11:43:33 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>N-dimensional reductions with Blosc2</title><link>https://blosc.org/posts/ndim-reductions/</link><dc:creator>Oumaima Ech Chdig, Francesc Alted</dc:creator><description>&lt;p&gt;NumPy is widely recognized for its ability to perform efficient computations and manipulations on multidimensional arrays. This library is fundamental for many aspects of data analysis and science due to its speed and flexibility in handling numerical data. However, when datasets reach considerable sizes, working with uncompressed data can result in prolonged access times and intensive memory usage, which can negatively impact overall performance.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/python-blosc2"&gt;Python-Blosc2&lt;/a&gt; leverages the power of NumPy to perform reductions on compressed multidimensional arrays. But, by compressing data with Blosc2, it is possible to reduce the memory and storage space required to store large datasets, while maintaining fast reduction times. This is especially beneficial for systems with memory constraints, as it allows for faster data access and operation.&lt;/p&gt;
&lt;p&gt;In this blog, we will explore how Python-Blosc2 can perform data reductions with in-memory &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/ndarray.html"&gt;NDArray&lt;/a&gt; objects (or any other object fulfilling the &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/lazyarray.html"&gt;LazyArray interface&lt;/a&gt;) and how the speed of these operations can be optimized by using different chunk shapes, compression levels and codecs. We will then compare the performance of Python-Blosc2 with NumPy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The code snippets shown in this blog are part of a &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/04.reductions.ipynb"&gt;Jupyter notebook&lt;/a&gt; that you can run on your own machine. For that, you will need to install a recent version of Python-Blosc2: &lt;cite&gt;pip install 'blosc2&amp;gt;=3.0.0b3'&lt;/cite&gt;; feel free to experiment with different parameters and share your results with us!&lt;/p&gt;
&lt;section id="the-3d-array"&gt;
&lt;h2&gt;The 3D array&lt;/h2&gt;
&lt;p&gt;We will use a 3D array of type float64 with shape (1000, 1000, 1000), filled with values from 0 to 1000. The goal will be to compute sum reductions along the X, Y, and Z axes (and along all axes at once), comparing Blosc2 performance (with and without compression) against NumPy.&lt;/p&gt;
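&lt;p&gt;As an illustration (a sketch, not necessarily the notebook's exact construction), such an array can be created with NumPy. The full (1000, 1000, 1000) cube takes about 8 GB, so the sketch below uses a smaller shape that follows the same logic:&lt;/p&gt;

```python
import numpy as np

# The post uses shape (1000, 1000, 1000) (~8 GB of float64 values);
# a smaller cube keeps this illustration cheap -- the logic is identical.
shape = (200, 200, 200)
a = np.linspace(0, 1000, num=shape[0] * shape[1] * shape[2]).reshape(shape)

print(a.dtype, a.shape)  # float64 (200, 200, 200)
```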
&lt;/section&gt;
&lt;section id="reducing-with-numpy"&gt;
&lt;h2&gt;Reducing with NumPy&lt;/h2&gt;
&lt;p&gt;We will start by performing different sum reductions using NumPy.  First, summing along the X, Y, and Z axes (and getting 2D arrays as result) and then summing along all axes (and getting a scalar as result).&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-1" name="rest_code_5915d57e050a4565985e2725fd28dfc0-1" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;axes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"X"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"all"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-2" name="rest_code_5915d57e050a4565985e2725fd28dfc0-2" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;meas_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"sum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-3" name="rest_code_5915d57e050a4565985e2725fd28dfc0-3" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-3"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-4" name="rest_code_5915d57e050a4565985e2725fd28dfc0-4" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-4"&gt;&lt;/a&gt;    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"all"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-5" name="rest_code_5915d57e050a4565985e2725fd28dfc0-5" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;t0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-6" name="rest_code_5915d57e050a4565985e2725fd28dfc0-6" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;meas_np&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sum"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-7" name="rest_code_5915d57e050a4565985e2725fd28dfc0-7" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t0&lt;/span&gt;
&lt;a id="rest_code_5915d57e050a4565985e2725fd28dfc0-8" name="rest_code_5915d57e050a4565985e2725fd28dfc0-8" href="https://blosc.org/posts/ndim-reductions/#rest_code_5915d57e050a4565985e2725fd28dfc0-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;meas_np&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="reducing-with-blosc2"&gt;
&lt;h2&gt;Reducing with Blosc2&lt;/h2&gt;
&lt;p&gt;Now let's create the Blosc2 array from the NumPy array.  First, let's define the parameters for Blosc2: number of threads, compression levels, codecs, and chunk sizes. We will exercise different combinations of these parameters (including no compression) to evaluate the performance of Python-Blosc2 in reducing data in 3D arrays.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_a58fe08dc88f4dc7b27658348dfa2906-1" name="rest_code_a58fe08dc88f4dc7b27658348dfa2906-1" href="https://blosc.org/posts/ndim-reductions/#rest_code_a58fe08dc88f4dc7b27658348dfa2906-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Params for Blosc2&lt;/span&gt;
&lt;a id="rest_code_a58fe08dc88f4dc7b27658348dfa2906-2" name="rest_code_a58fe08dc88f4dc7b27658348dfa2906-2" href="https://blosc.org/posts/ndim-reductions/#rest_code_a58fe08dc88f4dc7b27658348dfa2906-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;clevels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_a58fe08dc88f4dc7b27658348dfa2906-3" name="rest_code_a58fe08dc88f4dc7b27658348dfa2906-3" href="https://blosc.org/posts/ndim-reductions/#rest_code_a58fe08dc88f4dc7b27658348dfa2906-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;codecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Codec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LZ4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Codec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The function shown below is responsible for creating the different arrays and performing the reductions for each combination of parameters.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-1" name="rest_code_445d799d305e44f09176b32754fcac2f-1" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create a 3D array of type float64&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-2" name="rest_code_445d799d305e44f09176b32754fcac2f-2" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;measure_blosc2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-3" name="rest_code_445d799d305e44f09176b32754fcac2f-3" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-3"&gt;&lt;/a&gt;    &lt;span class="n"&gt;meas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-4" name="rest_code_445d799d305e44f09176b32754fcac2f-4" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-4"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;codec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;codecs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-5" name="rest_code_445d799d305e44f09176b32754fcac2f-5" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-5"&gt;&lt;/a&gt;        &lt;span class="n"&gt;meas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-6" name="rest_code_445d799d305e44f09176b32754fcac2f-6" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-6"&gt;&lt;/a&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;clevel&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;clevels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-7" name="rest_code_445d799d305e44f09176b32754fcac2f-7" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-7"&gt;&lt;/a&gt;            &lt;span class="n"&gt;meas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;clevel&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"sum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-8" name="rest_code_445d799d305e44f09176b32754fcac2f-8" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-8"&gt;&lt;/a&gt;            &lt;span class="n"&gt;cparams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"clevel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clevel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"codec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-9" name="rest_code_445d799d305e44f09176b32754fcac2f-9" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-9"&gt;&lt;/a&gt;            &lt;span class="n"&gt;a1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cparams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cparams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-10" name="rest_code_445d799d305e44f09176b32754fcac2f-10" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-10"&gt;&lt;/a&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;clevel&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-11" name="rest_code_445d799d305e44f09176b32754fcac2f-11" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-11"&gt;&lt;/a&gt;                &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"cratio for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; + SHUFFLE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schunk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cratio&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.1f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;x"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-12" name="rest_code_445d799d305e44f09176b32754fcac2f-12" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-12"&gt;&lt;/a&gt;            &lt;span class="c1"&gt;# Iterate on Blosc2 and NumPy arrays&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-13" name="rest_code_445d799d305e44f09176b32754fcac2f-13" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-13"&gt;&lt;/a&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-14" name="rest_code_445d799d305e44f09176b32754fcac2f-14" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-14"&gt;&lt;/a&gt;                &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"all"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-15" name="rest_code_445d799d305e44f09176b32754fcac2f-15" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-15"&gt;&lt;/a&gt;                &lt;span class="n"&gt;t0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-16" name="rest_code_445d799d305e44f09176b32754fcac2f-16" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-16"&gt;&lt;/a&gt;                &lt;span class="c1"&gt;# Perform the sum of the stripe (defined by the slice_)&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-17" name="rest_code_445d799d305e44f09176b32754fcac2f-17" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-17"&gt;&lt;/a&gt;                &lt;span class="n"&gt;meas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;clevel&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"sum"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-18" name="rest_code_445d799d305e44f09176b32754fcac2f-18" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-18"&gt;&lt;/a&gt;                &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t0&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-19" name="rest_code_445d799d305e44f09176b32754fcac2f-19" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-19"&gt;&lt;/a&gt;                &lt;span class="n"&gt;meas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;clevel&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-20" name="rest_code_445d799d305e44f09176b32754fcac2f-20" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-20"&gt;&lt;/a&gt;                &lt;span class="c1"&gt;# If interested, you can uncomment the following line to check the results&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-21" name="rest_code_445d799d305e44f09176b32754fcac2f-21" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-21"&gt;&lt;/a&gt;                &lt;span class="c1"&gt;#np.testing.assert_allclose(meas[codec][clevel]["sum"][axis],&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-22" name="rest_code_445d799d305e44f09176b32754fcac2f-22" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-22"&gt;&lt;/a&gt;                &lt;span class="c1"&gt;#                           meas_np["sum"][axis])&lt;/span&gt;
&lt;a id="rest_code_445d799d305e44f09176b32754fcac2f-23" name="rest_code_445d799d305e44f09176b32754fcac2f-23" href="https://blosc.org/posts/ndim-reductions/#rest_code_445d799d305e44f09176b32754fcac2f-23"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;meas&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;section id="automatic-chunking"&gt;
&lt;h3&gt;Automatic chunking&lt;/h3&gt;
&lt;p&gt;Let's plot the results for the X, Y, and Z axes, comparing the performance of Python-Blosc2 with different configurations against NumPy.&lt;/p&gt;
&lt;img alt="/images/ndim-reductions/plot_automatic_chunking.png" src="https://blosc.org/images/ndim-reductions/plot_automatic_chunking.png" style="width: 50%;"&gt;
&lt;p&gt;We can see that reduction along the X axis is much slower than along the Y and Z axes for the Blosc2 case. This is because the automatically computed chunk shape is (1, 1000, 1000), which makes the overhead of partial sums larger. In addition, we see that, with the exception of the X axis, Blosc2+LZ4+SHUFFLE actually achieves far better performance than NumPy.  Finally, when not using compression inside Blosc2, we never see an advantage. See below for a discussion of these results.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="manual-chunking"&gt;
&lt;h3&gt;Manual chunking&lt;/h3&gt;
&lt;p&gt;Let's try to improve the performance by manually setting the chunk size. In the next case, we want to make performance similar along the three axes, so we will set the chunk size to (100, 100, 100) (8 MB).&lt;/p&gt;
&lt;img alt="/images/ndim-reductions/plot_manual_chunking.png" src="https://blosc.org/images/ndim-reductions/plot_manual_chunking.png" style="width: 50%;"&gt;
&lt;p&gt;In this case, performance along the X axis is now faster than along the Y and Z axes for Blosc2. Interestingly, it is also faster than NumPy along the X axis, while being very similar along the Y and Z axes.&lt;/p&gt;
&lt;p&gt;We could proceed further and try to fine-tune the chunk size to get even better performance, but this is out of the scope of this blog (and more a task for &lt;a class="reference external" href="https://ironarray.io/btune"&gt;Btune&lt;/a&gt;). Instead, we will try to make some sense of the results above; see below.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="why-blosc2-can-be-faster-than-numpy"&gt;
&lt;h2&gt;Why can Blosc2 be faster than NumPy?&lt;/h2&gt;
&lt;p&gt;Since Blosc2 uses the NumPy machinery behind the scenes to compute reductions, why is it faster than NumPy in several of the cases above? The answer lies in the way Blosc2 and NumPy access data in memory.&lt;/p&gt;
&lt;p&gt;Blosc2 splits data into chunks and blocks to compress and decompress data efficiently. When accessing data, a full chunk is fetched from memory and decompressed by the CPU (as seen in the image below, left side). If the chunk size is small enough to fit in the CPU cache, the CPU can write the decompressed chunk faster, as it does not need to travel back to the main memory. Later, when NumPy is called to perform the reduction on the decompressed chunk, it can access the data faster, as it is already in the CPU cache (image below, right side).&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;img alt="/images/ndim-reductions/Blosc2-decompress.png" class="align-center" src="https://blosc.org/images/ndim-reductions/Blosc2-decompress.png" style="width: 75%;"&gt;
&lt;/td&gt;
&lt;td&gt;&lt;img alt="/images/ndim-reductions/Blosc2-NumPy.png" class="align-center" src="https://blosc.org/images/ndim-reductions/Blosc2-NumPy.png" style="width: 75%;"&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;But to allow NumPy to go faster, Blosc2 needs to decompress several chunks before NumPy performs the reduction operation. The decompressed chunks are stored in a queue, waiting for further processing; this is why Blosc2 needs to handle several (3 or 4) chunks simultaneously. In our case, the L3 cache of our CPU (Intel 13900K) is 36 MB, and Blosc2 has chosen an 8 MB chunk size, allowing up to 4 chunks to be stored in L3, which is near optimal.  Also, when we chose the chunk shape to be (100, 100, 100), the chunk size is still 8 MB, which remains a good fit.&lt;/p&gt;
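&lt;p&gt;This cache arithmetic is easy to verify, using the 36 MB L3 figure quoted above for the Intel 13900K:&lt;/p&gt;

```python
# A chunk of shape (1, 1000, 1000) float64 occupies 8 MB; four of them
# still fit in the 36 MB L3 cache of the CPU used in this post.
chunk_nbytes = 1 * 1000 * 1000 * 8     # 8_000_000 bytes (8 MB)
manual_chunk_nbytes = 100 * 100 * 100 * 8  # (100, 100, 100) is also 8 MB
l3_nbytes = 36 * 10**6                 # 36 MB, as quoted in the post

print(l3_nbytes // chunk_nbytes)       # 4 chunks fit simultaneously
```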
&lt;p&gt;All in all, it is not that Blosc2 is faster than NumPy, but rather that &lt;em&gt;it allows NumPy to leverage the CPU cache more efficiently&lt;/em&gt;.  Having said this, we still need to explain why performance can differ so much along the X, Y, and Z axes, especially for the first (automatic) chunk shape above.  Let's address this in the next section.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performing-reductions-on-3d-arrays"&gt;
&lt;h2&gt;Performing reductions on 3D arrays&lt;/h2&gt;
&lt;img alt="/images/ndim-reductions/3D-cube-plane.png" src="https://blosc.org/images/ndim-reductions/3D-cube-plane.png" style="width: 45%;"&gt;
&lt;p&gt;In a three-dimensional environment, like the one shown in the image, data is organized in a cubic space with three axes: X, Y, and Z. By default, Blosc2 chooses the chunk shape so that it fits comfortably in the CPU cache. On the other hand, it tries to follow the NumPy convention of storing data row-wise; this is why the default chunk shape has been chosen as (1, 1000, 1000).  In this case, it is clear that reduction times along the different axes are not going to be the same, as the sizes of the chunk along the different axes are far from uniform (there is a large asymmetry).&lt;/p&gt;
&lt;p&gt;The difference in cost while traversing data values can be visualized more easily on a 2D array:&lt;/p&gt;
&lt;img alt="/images/ndim-reductions/memory-access-2D-x.png" src="https://blosc.org/images/ndim-reductions/memory-access-2D-x.png" style="width: 70%;"&gt;
&lt;p&gt;Reduction along the X axis: when accessing a row (red line), the CPU can read these values (red points) from memory sequentially, but they need to be stored in an accumulator. The next row then needs to be fetched from memory and added to the accumulator. If the accumulator is large (in this case &lt;cite&gt;1000 * 1000 * 8 = 8 MB&lt;/cite&gt;), it does not fit in the low-level CPU caches, and the accumulation has to be performed in the relatively slow L3.&lt;/p&gt;
&lt;img alt="/images/ndim-reductions/memory-access-2D-y.png" src="https://blosc.org/images/ndim-reductions/memory-access-2D-y.png" style="width: 55%;"&gt;
&lt;p&gt;Reduction along the Y axis: when accessing a row (green line), the CPU can read these values (green points) from memory sequentially but, contrary to the case above, they do not even need an accumulator, and the sum of the row (marked as an &lt;cite&gt;*&lt;/cite&gt;) is final.  So, although the number of sum operations is the same as above, the required time is smaller because there is no need to update &lt;em&gt;all&lt;/em&gt; the values of the accumulator for every row, but only one at a time, which is faster on modern CPUs.&lt;/p&gt;
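&lt;p&gt;The two traversal patterns can be mimicked with a small 2D example (an illustration of the access pattern, not of how NumPy implements its reductions internally):&lt;/p&gt;

```python
import numpy as np

arr = np.arange(12.0).reshape(3, 4)

# X-axis reduction: a full-row accumulator is updated for every row,
# so all of its entries are touched on each pass.
acc = np.zeros(arr.shape[1])
for row in arr:
    acc += row

# Y-axis reduction: each row collapses to a single, final scalar,
# with no long-lived accumulator to keep updating.
out = np.array([row.sum() for row in arr])

print(acc)  # [12. 15. 18. 21.]
print(out)  # [ 6. 22. 38.]
```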
&lt;section id="tweaking-the-chunk-size"&gt;
&lt;h3&gt;Tweaking the chunk size&lt;/h3&gt;
&lt;img alt="/images/ndim-reductions/3D-cube.png" src="https://blosc.org/images/ndim-reductions/3D-cube.png" style="width: 40%;"&gt;
&lt;p&gt;However, when Blosc2 is instructed to create chunks of the same size along all the axes (chunks=(100, 100, 100)), the situation changes. In this case, an accumulator is needed for each chunk (sub-cube in the figure above), but as it is relatively small (&lt;cite&gt;100 * 100 * 8 = 80 KB&lt;/cite&gt;), it fits in L2, so accumulation along the X axis is faster than in the previous scenario (where the accumulation had to happen in L3).&lt;/p&gt;
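&lt;p&gt;The accumulator size is easy to check; note that the 2 MB L2 capacity below is an assumed, typical per-core figure for a modern desktop CPU, not one quoted in this post:&lt;/p&gt;

```python
# Per-chunk accumulator when reducing a (100, 100, 100) float64 chunk
# along X: one 100x100 plane of partial sums.
acc_nbytes = 100 * 100 * 8    # 80_000 bytes (80 KB)
l2_nbytes = 2 * 10**6         # ~2 MB per core: an assumed, typical figure

print(acc_nbytes, acc_nbytes < l2_nbytes)  # 80000 True
```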
&lt;p&gt;Incidentally, Blosc2 performance along the X axis is now even better than along the Y and Z axes, as the CPU can access the data more efficiently. Furthermore, Blosc2 is up to 1.5x faster than NumPy along the X axis (while being similar, or even a bit better, along the Y and Z axes), which is quite a remarkable feat.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="effect-of-using-different-codecs-in-python-blosc2"&gt;
&lt;h3&gt;Effect of using different codecs in Python-Blosc2&lt;/h3&gt;
&lt;p&gt;Compression and decompression consume CPU and memory resources. Differentiating between various codecs and configurations allows for evaluating how each option impacts the use of these resources, helping to choose the most efficient option for the operating environment. Finding the right balance between compression ratio and speed is crucial for optimizing performance.&lt;/p&gt;
&lt;p&gt;In the plots above, we can see how the LZ4 codec strikes such a balance: it achieves the best performance in general, even above the non-compressed scenario. This is because LZ4 is tuned towards speed, so the time to compress and decompress the data is very low. On the other hand, ZSTD is a codec optimized for compression ratio (although not shown, in this case it typically compresses between 2x and 3x more than LZ4), and hence it is a bit slower.  However, it is still faster than the non-compressed case, because compressed data requires less memory traffic, which compensates for the additional CPU time spent on compression and decompression.&lt;/p&gt;
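&lt;p&gt;This ratio-versus-speed trade-off can be felt with any codec. The sketch below uses the standard library's zlib at two effort levels as a loose stand-in for the fast-versus-dense contrast between LZ4 and ZSTD (zlib is not one of Blosc2's codecs, and the exact numbers will differ):&lt;/p&gt;

```python
import time
import zlib

import numpy as np

# Highly compressible data: small int64 values are mostly zero bytes.
data = np.arange(500_000, dtype=np.int64).tobytes()

for level in (1, 9):  # fast vs high-effort, loosely like LZ4 vs ZSTD
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(data) / len(compressed):.2f}x in {dt * 1e3:.1f} ms")
```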
&lt;p&gt;We have just scratched the surface of the compression parameters that can be tuned in Blosc2. You can use the &lt;cite&gt;cparams&lt;/cite&gt; dict with the different parameters in &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.compress2.html#blosc2"&gt;blosc2.compress2()&lt;/a&gt; to set the compression level, &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.Codec.html"&gt;codec&lt;/a&gt;, &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.Filter.html"&gt;filters&lt;/a&gt; and other parameters.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Understanding the balance between space savings and the additional time required to process the data is important. Testing different compression settings can help find the method that offers the best trade-off between reduced size and processing time. The fact that Blosc2 automatically chooses the chunk shape makes it easy for the user to get decently good performance without having to worry about the details of the CPU cache. In addition, as we have shown, we can fine-tune the chunk shape in case the default one does not fit our needs (e.g. when we need more uniform performance along all axes).&lt;/p&gt;
&lt;p&gt;Besides the sum() reduction exercised here, Blosc2 supports a fair range of reduction operators (mean, std, min, max, all, any, etc.), and you are invited to &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/reduction_functions.html"&gt;explore them&lt;/a&gt;.  Moreover, it is also possible to use reductions even for very large arrays that are stored on disk. This opens the door to a wide range of possibilities for data analysis and science, allowing for efficient reductions on large datasets that are compressed on-disk and with minimal memory usage. We will explore this in a forthcoming blog.&lt;/p&gt;
&lt;p&gt;We would like to thank &lt;a class="reference external" href="https://ironarray.io"&gt;ironArray&lt;/a&gt; for supporting the development of the computing capabilities of Blosc2.  We also thank NumFOCUS for recently providing a small grant that is helping us improve the documentation for the project.  Last but not least, we thank the Blosc community for providing so many valuable insights and feedback that have helped us improve the performance and usability of Blosc2.&lt;/p&gt;
&lt;/section&gt;</description><category>in-memory</category><category>ndim</category><category>reductions</category><guid>https://blosc.org/posts/ndim-reductions/</guid><pubDate>Wed, 28 Aug 2024 10:32:20 GMT</pubDate></item></channel></rss>