<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts by Francesc Alted)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/authors/francesc-alted.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 04 Mar 2026 11:43:34 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>The Surprising Speed of Compressed Data: A Roofline Story</title><link>https://blosc.org/posts/roofline-analysis-blosc2/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;Can a library designed for computing with compressed data ever hope to outperform highly optimized numerical engines like NumPy and Numexpr? The answer is complex, and it hinges on the "memory wall" — a phenomenon that occurs when limited memory bandwidth starts to drag on CPU performance. This post uses Roofline analysis to explore this very question, dissecting the performance of Blosc2 and revealing the surprising scenarios where it can gain a competitive edge.&lt;/p&gt;
&lt;aside class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update on 2026-02-06:&lt;/strong&gt; We have published a follow-up post, &lt;a class="reference external" href="https://ironarray.io/blog/miniexpr-powered-blosc2"&gt;Python-Blosc2 4.0: Unleashing Compute Speed with miniexpr&lt;/a&gt;, which revisits this topic. This new post explains how the integration of miniexpr into Blosc2's compute engine has significantly improved performance—especially for in-memory operations—updating the conclusions drawn in this original analysis. We highly recommend reading the new post for the latest insights.&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="tl-dr"&gt;
&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;Before we dive in, here's what we discovered:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For in-memory tasks, Blosc2's overhead can make it slower than Numexpr, especially on x86 CPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This changes on Apple Silicon, where Blosc2's performance is much more competitive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For on-disk tasks, Blosc2 consistently outperforms NumPy/Numexpr on both platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The "memory wall" is real, and disk I/O is an even bigger one, which is where compression shines.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="a-trip-down-memory-lane"&gt;
&lt;h2&gt;A Trip Down Memory Lane&lt;/h2&gt;
&lt;p&gt;Let's rewind to 2008. NumPy 1.0 was just a toddler, and the computing world was buzzing with the arrival of multi-core CPUs and their shiny new SIMD instructions. On the &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YPX5PGM5WZXQAMQ5AZLLEU67D5RZBOVH/#YFX3G2RYHTIYMFDPCHKHED5F7CT4OTVK"&gt;NumPy mailing list&lt;/a&gt;, a group of us were brainstorming how to harness this new power to make Python's number-crunching faster.&lt;/p&gt;
&lt;p&gt;The idea seemed simple: trust newer compilers to use SIMD (and, possibly, data alignment) to perform operations on multiple data points at once. To test this, a &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/message/S2IEJV7U7TXHQLEMORGME6KIGRZTG33L/"&gt;simple benchmark&lt;/a&gt; was shared: multiply two large vectors element-wise. Developers from around the community ran the code and shared their results. What came back was a revelation.&lt;/p&gt;
&lt;p&gt;For small arrays that fit snugly into the CPU's high-speed cache, SIMD was quite good at accelerating computations. But as soon as the arrays grew larger, the performance boost vanished. Some of us were already suspicious about the new "memory wall" that had been growing lately, seemingly due to the widening gap between CPU speeds and memory bandwidth.  However, a conclusive answer (and solution) was still lacking.&lt;/p&gt;
&lt;p&gt;But amidst the confusion, a curious anomaly emerged. One machine, belonging to NumPy legend Charles Harris, was consistently outperforming the rest—even those with faster processors. It made no sense. We checked our code, our compilers, everything. Yet, his machine remained inexplicably faster. The answer, when it finally came, wasn't in the software at all. Charles, a hardware wizard, had &lt;a class="reference external" href="https://mail.python.org/archives/list/numpy-discussion@python.org/message/YFX3G2RYHTIYMFDPCHKHED5F7CT4OTVK/"&gt;tinkered with his BIOS to overclock his RAM&lt;/a&gt; from 667 MHz to a whopping 800 MHz.&lt;/p&gt;
&lt;p&gt;That was my lightbulb moment: for data-intensive tasks, raw CPU clock speed was not the limiting factor; memory bandwidth was what truly mattered.&lt;/p&gt;
&lt;p&gt;This led me to a wild idea: what if we could make memory &lt;em&gt;effectively&lt;/em&gt; faster? What if we could compress data in memory and decompress it on-the-fly, just in time for the CPU? This would &lt;a class="reference external" href="https://www.blosc.org/docs/StarvingCPUs-CISE-2010.pdf"&gt;slash the amount of data being moved&lt;/a&gt;, boosting our effective memory bandwidth. That idea became the seed for &lt;a class="reference external" href="https://www.blosc.org"&gt;Blosc&lt;/a&gt;, a project I started in 2010 that has been &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2"&gt;my passion ever since&lt;/a&gt;. Now, 15 years later, it is time to revisit that idea and see how well it holds up in today's computing landscape.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="roofline-model-understanding-the-memory-wall"&gt;
&lt;h2&gt;Roofline Model: Understanding the Memory Wall&lt;/h2&gt;
&lt;p&gt;Not all computations are equally affected by the memory wall: in general, performance can be either CPU-bound or memory-bound. To diagnose which resource is the limiting factor, the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Roofline_model"&gt;Roofline model&lt;/a&gt; provides an insightful analytical framework. This model &lt;a class="reference external" href="https://docs.nersc.gov/tools/performance/roofline/"&gt;plots computational performance against arithmetic intensity&lt;/a&gt; (i.e. floating-point operations performed per byte of data moved to and from memory) to visually determine whether a task is constrained by CPU speed or memory bandwidth.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-intro.avif" src="https://blosc.org/images/roofline-surprising-story/roofline-intro.avif"&gt;
&lt;p&gt;We will use Roofline plots to analyze Blosc2's performance, compared to that of NumPy and Numexpr. NumPy, with its highly optimized linear algebra backends, and Numexpr, with its efficient evaluation of element-wise expressions, together form a strong performance baseline for the full range of arithmetic intensities tested.&lt;/p&gt;
&lt;p&gt;To highlight the role of memory bandwidth, we will conduct our benchmarks on an AMD Ryzen 7800X3D CPU at two different memory speeds: the standard 4800 MT/s and an overclocked 6000 MT/s. This allows us to directly observe how memory frequency impacts computational performance.&lt;/p&gt;
&lt;p&gt;To cover a range of computational scenarios, our benchmarks include five operations with varying arithmetic intensities:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Very Low&lt;/strong&gt;: A simple element-wise addition (a + b + c).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low&lt;/strong&gt;: A moderately complex element-wise expression (sqrt(a + 2 * b + (c / 2)) ^ 1.2).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Medium&lt;/strong&gt;: A highly complex element-wise calculation involving trigonometric and exponential functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High&lt;/strong&gt;: Matrix multiplication on small matrices (labeled matmul0).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Very High&lt;/strong&gt;: Matrix multiplication on large matrices (labeled matmul1 and matmul2).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
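&lt;p&gt;In plain NumPy, the element-wise and matmul kernels look roughly like this (a sketch for illustration only: the actual benchmarks evaluate these expressions through Numexpr and Blosc2's compute engine, on far larger arrays):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.random(1_000) for _ in range(3))

# Very low intensity: a couple of additions per element moved
very_low = a + b + c

# Low intensity: a few more FLOPs per byte of memory traffic
low = np.sqrt(a + 2 * b + (c / 2)) ** 1.2

# High intensity: matmul reuses each loaded element many times
m = rng.random((64, 64))
high = m @ m
```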
&lt;img alt="/images/roofline-surprising-story/roofline-mem-speed-AMD-7800X3D.png" src="https://blosc.org/images/roofline-surprising-story/roofline-mem-speed-AMD-7800X3D.png"&gt;
&lt;p&gt;The Roofline plot confirms that increasing memory speed only benefits memory-bound operations (low arithmetic intensity), while CPU-bound tasks (high arithmetic intensity) are unaffected, as expected. Although this might suggest the "memory wall" is not a major obstacle, low-intensity operations like element-wise calculations, reductions, and selections are extremely common and often create performance bottlenecks. Therefore, optimizing for memory performance remains crucial.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-in-memory-surprise-why-wasn-t-compression-faster"&gt;
&lt;h2&gt;The In-Memory Surprise: Why Wasn't Compression Faster?&lt;/h2&gt;
&lt;p&gt;We benchmarked Blosc2 (both compressed and uncompressed) against NumPy and Numexpr. For this test, Blosc2 was configured with the LZ4 codec and shuffle filter, a setup known for its balance of speed and compression ratio. The benchmarks were executed on an AMD Ryzen 7800X3D CPU with memory speed set to 6000 MT/s, ensuring optimal memory bandwidth for the tests.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-7800X3D-mem-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-7800X3D-mem-def.png"&gt;
&lt;p&gt;The analysis reveals a surprising outcome: for memory-bound operations, Blosc2 is up to five times slower than Numexpr. Although operating on compressed data provides a marginal improvement over uncompressed Blosc2, it is not enough to overcome this performance gap. This result is unexpected because Blosc2 leverages Numexpr internally, and the reduced memory traffic enabled by compression should theoretically lead to better performance in these scenarios.&lt;/p&gt;
&lt;p&gt;To understand this counter-intuitive result, we must examine Blosc2's core architecture. The key lies in its double partitioning scheme, which, while powerful, introduces an overhead that can negate the benefits of compression in memory-bound contexts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="unpacking-the-overhead-a-look-inside-blosc2-s-architecture"&gt;
&lt;h2&gt;Unpacking the Overhead: A Look Inside Blosc2's Architecture&lt;/h2&gt;
&lt;p&gt;The performance characteristics of Blosc2 are rooted in its double partitioning architecture, which organizes data into chunks and blocks.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/double-partition-b2nd.avif" src="https://blosc.org/images/roofline-surprising-story/double-partition-b2nd.avif"&gt;
&lt;p&gt;This design is crucial for both aligning with the CPU's memory hierarchy and enabling efficient multidimensional array representation (important for operations such as n-dimensional slicing). However, this structure introduces an inherent overhead from additional indexing logic. In memory-bound scenarios, this latency counteracts the performance gains from reduced memory traffic, explaining why Blosc2 does not surpass Numexpr.&lt;/p&gt;
&lt;p&gt;Conversely, as arithmetic intensity increases, the computational demands begin to dominate the total execution time. In these CPU-bound regimes, the partitioning overhead is effectively amortized, allowing Blosc2 to close the performance gap and eventually match NumPy's performance in tasks like large matrix multiplications.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="modern-arm-architectures"&gt;
&lt;h2&gt;Modern ARM Architectures&lt;/h2&gt;
&lt;p&gt;CPU architecture is a rapidly evolving field. To investigate how these changes impact performance, we extended our analysis to the Apple Silicon M4 Pro, a modern ARM-based processor.&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-m4pro-mem-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-m4pro-mem-def.png"&gt;
&lt;p&gt;The results show that Blosc2 performs significantly better on this platform, narrowing the performance gap with NumPy/NumExpr, especially for operations on compressed data. While compute engines optimized for uncompressed data still hold an edge, these findings suggest that compression will play an increasingly important role in improving computational performance in the future.&lt;/p&gt;
&lt;p&gt;However, while the in-memory results are revealing, they don't tell the whole story. Blosc2 was designed not just to fight the memory wall, but to conquer an even greater bottleneck: disk I/O. Although compression has the benefit of fitting more data into RAM when used in-memory (which is in itself extremely attractive at a time when &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;RAM prices have skyrocketed&lt;/a&gt;), its true power is unleashed when computations move off-motherboard. Now, let's shift the battlefield to the disk and see how Blosc2 performs in its native territory.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-different-battlefield-blosc2-shines-with-on-disk-data"&gt;
&lt;h2&gt;A Different Battlefield: Blosc2 Shines with On-Disk Data&lt;/h2&gt;
&lt;p&gt;Blosc2's architecture extends its computational engine to operate seamlessly on data stored on disk, a significant advantage for large-scale analysis.  This is particularly relevant in scenarios where datasets exceed available memory, necessitating out-of-core processing, as commonly encountered in data science, machine learning workflows or &lt;a class="reference external" href="https://ironarray.io/cat2cloud"&gt;cloud computing environments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our on-disk benchmarks were designed to use datasets larger than the system's available memory to prevent filesystem caching from influencing the results. To establish a baseline, we implemented an out-of-core solution for NumPy/NumExpr, leveraging memory-mapped files. Here Blosc2 has a performance edge, particularly for memory-bound operations on compressed data, as it can move data to and from disk faster than memory-mapped NumPy arrays.&lt;/p&gt;
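&lt;p&gt;A minimal sketch of such a memory-mapped baseline (the file name and sizes here are illustrative; the real benchmarks use datasets larger than RAM):&lt;/p&gt;

```python
import os
import tempfile
import numpy as np

n = 1_000_000
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "operand.npy")

# Write the operand to disk, then map it instead of loading it into RAM
np.save(path, np.arange(n, dtype=np.float64))
a = np.load(path, mmap_mode="r")  # pages are faulted in lazily from disk

# Out-of-core style evaluation: stream the array through in slices
total = 0.0
for start in range(0, n, 100_000):
    total += (a[start:start + 100_000] * 2.0).sum()
```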
&lt;p&gt;In this case, we've used high-performance NVMe SSDs (NVMe 4.0) to minimize the impact of disk speed on the results.  We also switched to the ZSTD codec for Blosc2, as its superior compression ratio over LZ4 further minimizes data transfer to and from the disk.&lt;/p&gt;
&lt;p&gt;First, let's see the results for the AMD Ryzen 7800X3D system:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-7800X3D-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-7800X3D-disk-def.png"&gt;
&lt;p&gt;The plots above show that Blosc2 outperforms both NumPy and Numexpr for all low-to-medium intensity operations. This is because the high latency of disk I/O amortizes the overhead of Blosc2's double partitioning scheme. Furthermore, the reduced bandwidth required for compressed data gives Blosc2 an additional performance advantage in this scenario.&lt;/p&gt;
&lt;p&gt;Now, let's see the results for the Apple Silicon M4 Pro system:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-m4pro-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-m4pro-disk-def.png"&gt;
&lt;p&gt;On the Apple Silicon M4 Pro system, Blosc2 again outperforms both NumPy and Numexpr for all on-disk operations, mirroring the results from the AMD system. However, the performance advantage is even more significant here, especially for memory-bound tasks. This is mainly because memory-mapped arrays are less efficient on Apple Silicon than on x86_64 systems, increasing the overhead for the NumPy/Numexpr baseline.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="roofline-plot-in-memory-vs-on-disk"&gt;
&lt;h2&gt;Roofline Plot: In-Memory vs On-Disk&lt;/h2&gt;
&lt;p&gt;To better understand the trade-offs between in-memory and on-disk processing with Blosc2, the following plot contrasts their performance characteristics for compressed data:&lt;/p&gt;
&lt;img alt="/images/roofline-surprising-story/roofline-mem-disk-def.png" src="https://blosc.org/images/roofline-surprising-story/roofline-mem-disk-def.png"&gt;
&lt;p&gt;A notable finding for the AMD system is that Blosc2's on-disk operations are noticeably faster than its in-memory operations, especially for memory-bound tasks (low arithmetic intensity). This is likely due to two factors: first, the larger datasets used for on-disk tests allow Blosc2 to use more efficient internal partitions (chunks and blocks), and second, parallel data reads from disk further reduce bandwidth requirements.&lt;/p&gt;
&lt;p&gt;In contrast, for CPU-bound tasks (high arithmetic intensity), on-disk performance is comparable to, albeit slightly slower than, in-memory performance. The analysis also reveals a specific weakness: small matrix multiplications (matmul0) are significantly slower on-disk, identifying a clear target for future optimization.&lt;/p&gt;
&lt;p&gt;Unlike the AMD system, the Apple Silicon M4 Pro shows Blosc2's on-disk operations running slower than their in-memory counterparts, a difference that is most pronounced for memory-bound tasks. This performance disparity suggests that current on-disk optimizations may favor x86_64 architectures over ARM.&lt;/p&gt;
&lt;p&gt;As with the AMD platform, CPU-bound operations exhibit similar performance for both on-disk and in-memory contexts. The notable exception remains the small matrix multiplication (matmul0), which performs significantly worse on-disk. This recurring pattern pinpoints a clear opportunity for future optimization efforts.&lt;/p&gt;
&lt;p&gt;Finally, and in addition to its on-disk performance, Blosc2 offers a significant cost advantage. With the &lt;a class="reference external" href="https://arstechnica.com/gadgets/2025/11/spiking-memory-prices-mean-that-it-is-once-again-a-horrible-time-to-build-a-pc/"&gt;recent rise in SSD prices&lt;/a&gt;, compressing data on disk becomes an economically attractive strategy, allowing you to store more data in less space and thereby reduce hardware expenses.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reproducibility"&gt;
&lt;h2&gt;Reproducibility&lt;/h2&gt;
&lt;p&gt;All the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/roofline-analysis.py"&gt;benchmarks&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/roofline-plot.py"&gt;plots&lt;/a&gt; presented in this blog post can be reproduced. You are invited to run the scripts on your own hardware to explore the performance characteristics of Blosc2 in different environments. In case you get interesting results, please consider sharing them with the community!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;In this blog post, we used the Roofline model to analyze the performance of Blosc2, NumPy, and Numexpr. We've confirmed that memory-bound operations are significantly affected by the "memory wall", making data compression an appealing strategy for maximizing performance. However, for in-memory operations, the overhead of Blosc2's double partitioning scheme can be a limiting factor, especially on x86_64 architectures. Encouragingly, this performance gap narrows considerably on modern ARM platforms like Apple Silicon, suggesting a promising future.&lt;/p&gt;
&lt;p&gt;The situation changes dramatically for on-disk operations. Here, Blosc2 consistently outperforms NumPy and Numexpr, as the high latency of disk I/O (even if we used SSDs here) amortizes its internal overhead. This makes Blosc2 a compelling choice for out-of-core computations, one of its primary use cases.&lt;/p&gt;
&lt;p&gt;Overall, this analysis has provided valuable insights, highlighting the importance of the memory hierarchy. It has also exposed specific areas for improvement, such as the performance of small matrix multiplications. As Blosc2 continues to evolve, I am confident we can address these points and further enhance its performance, making it an even more powerful tool for numerical computations in Python.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Read more about &lt;a class="reference external" href="https://ironarray.io"&gt;ironArray SLU&lt;/a&gt; — the company behind Blosc2, Caterva2, Numexpr and other high-performance data processing libraries.&lt;/p&gt;
&lt;p&gt;Compress Better, Compute Bigger!&lt;/p&gt;
&lt;/section&gt;</description><category>Blosc2</category><category>memory wall</category><category>numexpr</category><category>numpy</category><category>performance</category><category>roofline</category><guid>https://blosc.org/posts/roofline-analysis-blosc2/</guid><pubDate>Thu, 27 Nov 2025 08:05:21 GMT</pubDate></item><item><title>TreeStore: Endowing Your Data With Hierarchical Structure</title><link>https://blosc.org/posts/new-treestore-blosc2/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;When working with large and complex datasets, having a way to organize your data efficiently is crucial. &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt; is a powerful feature in the &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; library that allows you to store and manage your compressed arrays in a hierarchical, tree-like structure, much like a filesystem. This container, typically saved with a &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; extension, can hold not only &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; or &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; objects but also metadata, making it a versatile tool for data organization.&lt;/p&gt;
&lt;section id="what-is-a-treestore"&gt;
&lt;h2&gt;What is a TreeStore?&lt;/h2&gt;
&lt;p&gt;A &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; lets you arrange your data into groups (like directories) and datasets (like files). Each dataset is a &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; or &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; instance, benefiting from Blosc2's high-performance compression. This structure is ideal for scenarios where data has a natural hierarchy, such as in scientific experiments, simulations, or any project with multiple related datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="basic-usage-creating-and-populating-a-treestore"&gt;
&lt;h2&gt;Basic Usage: Creating and Populating a TreeStore&lt;/h2&gt;
&lt;p&gt;Creating a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is straightforward. You can use a &lt;code class="docutils literal"&gt;with&lt;/code&gt; statement to ensure the store is properly managed. Inside the &lt;code class="docutils literal"&gt;with&lt;/code&gt; block, you can create groups and datasets using a path-like syntax.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-1" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-2" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-2"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-3" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-4" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-4"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create a new TreeStore&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-5" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-5"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-6" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-6"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can store numpy arrays, which are converted to blosc2.NDArray&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-7" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/dataset0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-8" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-9" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-9"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Create a group with a dataset that can be a blosc2 NDArray&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-10" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-10"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-11" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-12" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-12"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can also store blosc2 arrays directly (vlmeta included)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-13" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-13"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-14" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-14" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-14"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"desc"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dataset2 metadata"&lt;/span&gt;
&lt;a id="rest_code_a57c3fc754e643f1a7493822fed3c0ec-15" name="rest_code_a57c3fc754e643f1a7493822fed3c0ec-15" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_a57c3fc754e643f1a7493822fed3c0ec-15"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example, we created a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; in a file named &lt;code class="docutils literal"&gt;my_experiment.b2z&lt;/code&gt;.&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/tree-store-blog.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/tree-store-blog.png" style="width: 90%;"&gt;
&lt;p&gt;It contains two groups, &lt;code class="docutils literal"&gt;root&lt;/code&gt; and &lt;code class="docutils literal"&gt;group1&lt;/code&gt;, each holding datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reading-from-a-treestore"&gt;
&lt;h2&gt;Reading from a TreeStore&lt;/h2&gt;
&lt;p&gt;To access the data, you open the &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; in read mode (&lt;code class="docutils literal"&gt;'r'&lt;/code&gt;) and use the same path-like keys to retrieve your arrays.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-1" name="rest_code_0507044fc58946738f7db9fd6207b65b-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Open the TreeStore in read-only mode ('r')&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-2" name="rest_code_0507044fc58946738f7db9fd6207b65b-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-3" name="rest_code_0507044fc58946738f7db9fd6207b65b-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-3"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Access a dataset&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-4" name="rest_code_0507044fc58946738f7db9fd6207b65b-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-4"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-5" name="rest_code_0507044fc58946738f7db9fd6207b65b-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-5"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 1:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset1&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;  &lt;span class="c1"&gt;# Use [:] to decompress and get a NumPy array&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-6" name="rest_code_0507044fc58946738f7db9fd6207b65b-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-7" name="rest_code_0507044fc58946738f7db9fd6207b65b-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-7"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Access the external array that has been stored internally&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-8" name="rest_code_0507044fc58946738f7db9fd6207b65b-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1/dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-9" name="rest_code_0507044fc58946738f7db9fd6207b65b-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-9"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-10" name="rest_code_0507044fc58946738f7db9fd6207b65b-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-10"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2 metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-11" name="rest_code_0507044fc58946738f7db9fd6207b65b-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-11"&gt;&lt;/a&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-12" name="rest_code_0507044fc58946738f7db9fd6207b65b-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-12"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# List all paths in the store&lt;/span&gt;
&lt;a id="rest_code_0507044fc58946738f7db9fd6207b65b-13" name="rest_code_0507044fc58946738f7db9fd6207b65b-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0507044fc58946738f7db9fd6207b65b-13"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Paths in TreeStore:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-1"&gt;&lt;/a&gt;Dataset 1: [0 1 2 3 4 5 6 7 8 9]
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-2"&gt;&lt;/a&gt;Dataset 2 [0.0000000e+00 1.0001000e-04 2.0002000e-04 ... 9.9979997e-01 9.9989998e-01
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-3"&gt;&lt;/a&gt; 1.0000000e+00]
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-4"&gt;&lt;/a&gt;Dataset 2 metadata: {b'desc': 'dataset2 metadata'}
&lt;a id="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5" name="rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9cd32d9d6eb945aea6c0ed7e37b35882-5"&gt;&lt;/a&gt;Paths in TreeStore: ['/group1/dataset2', '/group2', '/group1', '/group2/another_dataset', '/group1/dataset1']
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="advanced-usage-metadata-and-subtrees"&gt;
&lt;h2&gt;Advanced Usage: Metadata and Subtrees&lt;/h2&gt;
&lt;p&gt;&lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; becomes even more powerful when you use metadata and interact with subtrees (groups).&lt;/p&gt;
&lt;section id="storing-metadata-with-vlmeta"&gt;
&lt;h3&gt;Storing Metadata with &lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;You can attach variable-length metadata (&lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt;) to any group or to the root of the tree. This is useful for storing information like author names, dates, or experiment parameters. &lt;code class="docutils literal"&gt;vlmeta&lt;/code&gt; is essentially a dictionary where you can store your metadata.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-1" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-1"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Appending metadata to the TreeStore&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-2" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 'a' for append/modify&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-3" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-3"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Add metadata to the root&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-4" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-4"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The Blosc Team"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-5" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2025-08-17"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-6" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-7" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-7"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Add metadata to a group&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-8" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Data from the first run"&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-9" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-10" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-10"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Reading metadata&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-11" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-11" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-11"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-12" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-12" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-12"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Root metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_9b505a43ee3a45cebcff02e2e8253770-13" name="rest_code_9b505a43ee3a45cebcff02e2e8253770-13" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_9b505a43ee3a45cebcff02e2e8253770-13"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Group 1 metadata:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1" name="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0a4960b41dd74dfe9394b993bab8dbb0-1"&gt;&lt;/a&gt;Root metadata: {'author': 'The Blosc Team', 'date': '2025-08-17'}
&lt;a id="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2" name="rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_0a4960b41dd74dfe9394b993bab8dbb0-2"&gt;&lt;/a&gt;Group 1 metadata: {'description': 'Data from the first run'}
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="working-with-subtrees-groups"&gt;
&lt;h3&gt;Working with Subtrees (Groups)&lt;/h3&gt;
&lt;p&gt;A group object can be retrieved from the &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; and treated as a smaller, independent &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt;. This capability is useful for better organizing your data access code.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-1" name="rest_code_743e986080d940c7a5d3a558b96d7817-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-2" name="rest_code_743e986080d940c7a5d3a558b96d7817-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-2"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Get the group as a subtree&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-3" name="rest_code_743e986080d940c7a5d3a558b96d7817-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-3"&gt;&lt;/a&gt;    &lt;span class="n"&gt;group1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/group1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-4" name="rest_code_743e986080d940c7a5d3a558b96d7817-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-5" name="rest_code_743e986080d940c7a5d3a558b96d7817-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-5"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Now you can access datasets relative to this group&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-6" name="rest_code_743e986080d940c7a5d3a558b96d7817-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dataset2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;group1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dataset2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-7" name="rest_code_743e986080d940c7a5d3a558b96d7817-7" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-7"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Dataset 2 from group object:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset2&lt;/span&gt;&lt;span class="p"&gt;[:])&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-8" name="rest_code_743e986080d940c7a5d3a558b96d7817-8" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-9" name="rest_code_743e986080d940c7a5d3a558b96d7817-9" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-9"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# You can also list contents relative to the group&lt;/span&gt;
&lt;a id="rest_code_743e986080d940c7a5d3a558b96d7817-10" name="rest_code_743e986080d940c7a5d3a558b96d7817-10" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_743e986080d940c7a5d3a558b96d7817-10"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Contents of group1:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-1" name="rest_code_00726fb9fa04417c9a004d9202070667-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-1"&gt;&lt;/a&gt;Dataset 2 from group object: [0.0000000e+00 1.0001000e-04 2.0002000e-04 ... 9.9979997e-01 9.9989998e-01
&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-2" name="rest_code_00726fb9fa04417c9a004d9202070667-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-2"&gt;&lt;/a&gt; 1.0000000e+00]
&lt;a id="rest_code_00726fb9fa04417c9a004d9202070667-3" name="rest_code_00726fb9fa04417c9a004d9202070667-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_00726fb9fa04417c9a004d9202070667-3"&gt;&lt;/a&gt;Contents of group1: ['/dataset2', '/dataset1']
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="iterating-through-a-treestore"&gt;
&lt;h2&gt;Iterating Through a TreeStore&lt;/h2&gt;
&lt;p&gt;You can easily iterate through all the nodes in a &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; to inspect its contents.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-1" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TreeStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"my_experiment.b2z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-2" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-2"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-3" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-3"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NDArray&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-4" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-4"&gt;&lt;/a&gt;            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Found dataset at '&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' with shape &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-5" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-5" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-5"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# It's a group&lt;/span&gt;
&lt;a id="rest_code_4361a1b365db459fa0bf6050f21c2a7b-6" name="rest_code_4361a1b365db459fa0bf6050f21c2a7b-6" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_4361a1b365db459fa0bf6050f21c2a7b-6"&gt;&lt;/a&gt;            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Found group at '&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' with metadata: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlmeta&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="code"&gt;&lt;pre class="code text"&gt;&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-1" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-1" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-1"&gt;&lt;/a&gt;Found dataset at '/group1/dataset2' with shape (10000,)
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-2" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-2" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-2"&gt;&lt;/a&gt;Found group at '/group1' with metadata: {'description': 'Data from the first run'}
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-3" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-3" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-3"&gt;&lt;/a&gt;Found dataset at '/group1/dataset1' with shape (10,)
&lt;a id="rest_code_2cecd0bf87a74a458107344c21bf1e65-4" name="rest_code_2cecd0bf87a74a458107344c21bf1e65-4" href="https://blosc.org/posts/new-treestore-blosc2/#rest_code_2cecd0bf87a74a458107344c21bf1e65-4"&gt;&lt;/a&gt;Found dataset at '/dataset0' with shape (100,)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That's it for this introduction to &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt;! You now know how to create, read, and manipulate a hierarchical data structure that can hold compressed datasets and metadata. You can find the source code for this example in the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/examples/tree-store-blog.py"&gt;blosc2 repository&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="some-benchmarks"&gt;
&lt;h2&gt;Some Benchmarks&lt;/h2&gt;
&lt;p&gt;&lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is based on powerful abstractions from the &lt;code class="docutils literal"&gt;blosc2&lt;/code&gt; library, so it is very fast. Here are some benchmarks comparing &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; to other data storage formats, like HDF5 and Zarr. We have used two different configurations: one with small arrays, where sizes follow a normal distribution centered at 10 MB each, and the other with larger arrays, where sizes follow a normal distribution centered at 1 GB each. We have compared the performance of &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; against HDF5 and Zarr for both small and large arrays, measuring the time taken to create and read datasets.  For comparing apples with apples, we have used the same compression codec (&lt;code class="docutils literal"&gt;zstd&lt;/code&gt;) and filter (&lt;code class="docutils literal"&gt;shuffle&lt;/code&gt;) for all three formats.&lt;/p&gt;
&lt;p&gt;For assessing different platforms, we have used a desktop with an Intel i9-13900K CPU and 32 GB of RAM, running Ubuntu 25.04, and also a Mac mini with an Apple M4 Pro processor and 24 GB of RAM. The benchmarks were run using &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/large-tree-store.py"&gt;this script&lt;/a&gt;.&lt;/p&gt;
&lt;section id="results-for-the-intel-i9-13900k-desktop"&gt;
&lt;h3&gt;Results for the Intel i9-13900K desktop&lt;/h3&gt;
&lt;p&gt;100 small arrays (around 10 MB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-10M.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-10M.png" style="width: 75%;"&gt;
&lt;p&gt;For the small arrays scenario, we can see that &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the fastest at creating datasets (thanks to its use of multi-threading), but it is slower than HDF5 and Zarr when reading them.  The reason for this is two-fold: first, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is designed to work with multiple threads, so it must set up the necessary threads at the beginning of each read operation, which takes some time; second, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; uses NDArray objects internally, whose double partitioning scheme (chunks and blocks) adds some overhead when reading small slices of data. Regarding the space used, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the most efficient, very close to HDF5, and significantly more efficient than Zarr.&lt;/p&gt;
&lt;p&gt;100 large arrays (around 1 GB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-1G.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-i13900K-1G.png" style="width: 75%;"&gt;
&lt;p&gt;When handling larger arrays, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; maintains its lead in creation and full-read performance. Although HDF5 and Zarr offer faster access to small data slices, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; compensates by being the most storage-efficient format, followed by HDF5, with Zarr being the most space-intensive.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="results-for-the-apple-m4-pro-mac-mini"&gt;
&lt;h3&gt;Results for the Apple M4 Pro Mac mini&lt;/h3&gt;
&lt;p&gt;100 small arrays (around 10 MB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-10M.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-10M.png" style="width: 75%;"&gt;
&lt;p&gt;100 large arrays (around 1 GB each) scenario:&lt;/p&gt;
&lt;img alt="/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-1G.png" class="align-center" src="https://blosc.org/images/new-treestore-blosc2/benchmark_comparison_b2z-MacM4-1G.png" style="width: 75%;"&gt;
&lt;p&gt;Consistent with the previous results, &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is the most space-efficient format and the fastest for creating and reading datasets, particularly for larger arrays. Its performance is slower than HDF5 and Zarr only when reading small data slices (access time). This can be improved by reducing the number of threads from the default of eight, which lessens the thread setup overhead. For more details on this, see these &lt;a class="reference external" href="https://www.blosc.org/docs/2025-EuroSciPy-Blosc2.pdf"&gt;slides comparing 8-thread vs 1-thread performance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Notably, the Apple M4 Pro processor shows competitive performance against the Intel i9-13900K CPU, a high-end desktop processor that consumes up to 8x more power. This result underscores the efficiency of the ARM architecture in general and Apple silicon in particular.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In summary, &lt;code class="docutils literal"&gt;blosc2.TreeStore&lt;/code&gt; offers a straightforward yet potent solution for hierarchically organizing compressed datasets. By merging the high-performance compression of &lt;code class="docutils literal"&gt;blosc2.NDArray&lt;/code&gt; and &lt;code class="docutils literal"&gt;blosc2.SChunk&lt;/code&gt; with a flexible, filesystem-like structure and metadata support, it stands out as an excellent choice for managing complex data projects.&lt;/p&gt;
&lt;p&gt;As &lt;code class="docutils literal"&gt;TreeStore&lt;/code&gt; is currently in beta, we welcome feedback and suggestions for its improvement. For further details, please consult the official documentation for &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/tree_store.html#blosc2.TreeStore"&gt;blosc2.TreeStore&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;</description><category>treestore hierarchical structure performance</category><guid>https://blosc.org/posts/new-treestore-blosc2/</guid><pubDate>Sun, 17 Aug 2025 10:33:20 GMT</pubDate></item><item><title>Efficient array concatenation launched in Blosc2</title><link>https://blosc.org/posts/blosc2-new-concatenate/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;&lt;strong&gt;Update (2025-06-23):&lt;/strong&gt; Recently, Luke Shaw added a &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/pull/427#pullrequestreview-2948922546"&gt;stack() function in Blosc2&lt;/a&gt;, using the concatenate feature described here. The new function allows you to stack arrays along a new axis, which is particularly useful for creating higher-dimensional arrays from lower-dimensional ones.  We have added a section at the end of this post to show the usage and performance of this new function.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Blosc2 just got a cool new trick: super-efficient array concatenation! If you've ever needed to combine several arrays into one, especially when dealing with lots of data, this new feature is for you. It's built to be fast and use as little memory as possible. This is especially true if your array sizes line up nicely with Blosc2's internal "chunks" (think of these as the building blocks of your compressed data). When this alignment happens, concatenation is lightning-fast, making it perfect for demanding tasks.&lt;/p&gt;
&lt;p&gt;You can use this new concatenate feature whether you're &lt;a class="reference external" href="https://www.blosc.org/c-blosc2/reference/b2nd.html#c.b2nd_concatenate"&gt;coding in C&lt;/a&gt; or &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/reference/autofiles/ndarray/blosc2.concatenate.html"&gt;Python&lt;/a&gt;, and it works with any Blosc2 NDArray (Blosc2's way of handling multi-dimensional arrays).&lt;/p&gt;
&lt;p&gt;Let's see how easy it is to use in Python. If you're familiar with NumPy's &lt;cite&gt;concatenate&lt;/cite&gt;, the &lt;cite&gt;blosc2.concat&lt;/cite&gt; function will feel very similar:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-1" name="rest_code_6b84ae8e67404befa8fe60f429021d30-1" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-2" name="rest_code_6b84ae8e67404befa8fe60f429021d30-2" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-2"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create some sample arrays&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-3" name="rest_code_6b84ae8e67404befa8fe60f429021d30-3" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayA.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-4" name="rest_code_6b84ae8e67404befa8fe60f429021d30-4" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayB.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-5" name="rest_code_6b84ae8e67404befa8fe60f429021d30-5" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayC.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-6" name="rest_code_6b84ae8e67404befa8fe60f429021d30-6" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Concatenate the arrays along the first axis&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-7" name="rest_code_6b84ae8e67404befa8fe60f429021d30-7" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"destination.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-8" name="rest_code_6b84ae8e67404befa8fe60f429021d30-8" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-8"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# The result is a new Blosc2 NDArray containing the concatenated data&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-9" name="rest_code_6b84ae8e67404befa8fe60f429021d30-9" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-9"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (30, 20)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-10" name="rest_code_6b84ae8e67404befa8fe60f429021d30-10" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-10"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# You can also concatenate along other axes&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-11" name="rest_code_6b84ae8e67404befa8fe60f429021d30-11" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-11"&gt;&lt;/a&gt;&lt;span class="n"&gt;result_axis1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"destination_axis1.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b84ae8e67404befa8fe60f429021d30-12" name="rest_code_6b84ae8e67404befa8fe60f429021d30-12" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_6b84ae8e67404befa8fe60f429021d30-12"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_axis1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (10, 60)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;cite&gt;blosc2.concat&lt;/cite&gt; function is pretty straightforward: you give it a list of the arrays you want to join, and the &lt;cite&gt;axis&lt;/cite&gt; parameter tells it which way to join them (end-to-end or side-by-side).&lt;/p&gt;
&lt;p&gt;A really handy feature is that you can use &lt;cite&gt;urlpath&lt;/cite&gt; and &lt;cite&gt;mode&lt;/cite&gt; to save the combined array directly to a file. This is great when you're working with huge datasets, because you don't have to load everything into memory at once. What you get back is a brand new, persistent Blosc2 NDArray with all your data combined.&lt;/p&gt;
&lt;section id="aligned-versus-non-aligned-concatenation"&gt;
&lt;h2&gt;Aligned versus Non-Aligned Concatenation&lt;/h2&gt;
&lt;p&gt;Blosc2's concatenate function is smart: it processes your data in small pieces of compressed data (chunks). This has two consequences. First, you can join very large arrays stored on disk chunk-by-chunk, without using up all your computer's memory. Second, if the chunk boundaries of the arrays to be concatenated line up, the process is much faster. Why? Because Blosc2 can avoid a lot of extra work, chiefly decompressing and re-compressing the chunks.&lt;/p&gt;
&lt;p&gt;Let's look at some pictures to see what "aligned" and "unaligned" concatenation means. "Aligned" means that chunk boundaries of the arrays to be concatenated line up with each other. "Unaligned" means that this is not the case.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/concat-unaligned.png" src="https://blosc.org/images/blosc2-new-concatenate/concat-unaligned.png"&gt;
&lt;img alt="/images/blosc2-new-concatenate/concat-aligned.png" src="https://blosc.org/images/blosc2-new-concatenate/concat-aligned.png"&gt;
&lt;p&gt;The pictures show why "aligned" concatenation is faster. In Blosc2, all data pieces (chunks) inside an array must be the same size. So, if the chunks in the arrays you're joining match up ("aligned"), Blosc2 can combine them very quickly. It doesn't have to rearrange the data into new, same-sized chunks for the final array. This is a big deal for large arrays.&lt;/p&gt;
&lt;p&gt;If the arrays are "unaligned," Blosc2 has more work to do. It has to decompress and then re-compress the data to make the new chunks fit, which takes longer. There's one more small detail for this fast method to work: the first array's size needs to be a neat multiple of its chunk size along the direction you're joining.&lt;/p&gt;
&lt;p&gt;A big plus with Blosc2 is that it always processes data in these small chunks. This means it can combine enormous arrays without ever needing to load everything into your computer's memory at once.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance"&gt;
&lt;h2&gt;Performance&lt;/h2&gt;
&lt;p&gt;To show you how much faster this new concatenate feature is, we did a speed test using LZ4 as the internal compressor in Blosc2. We compared it to the usual way of joining arrays with &lt;cite&gt;numpy.concatenate&lt;/cite&gt;.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/benchmark-lz4-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/benchmark-lz4-20k-i13900K.png"&gt;
&lt;p&gt;The speed tests show that Blosc2's new concatenate is rather slow for small arrays (like 1,000 x 1,000), because it has to do a fair amount of work to set up the concatenation. But when you use larger arrays (like 20,000 x 20,000) that start to exceed the memory limits of our test machine (32 GB of RAM), Blosc2's new concatenate performs much better, approaching the performance of NumPy's &lt;cite&gt;concatenate&lt;/cite&gt; function.&lt;/p&gt;
&lt;p&gt;However, if your array sizes line up well with Blosc2's internal chunks ("aligned" arrays), Blosc2 becomes much faster—typically more than 10x faster than NumPy for large arrays. This is because it can skip a lot of the work of decompressing and re-compressing data, and the cost of copying compressed data is also lower (by as much as the achieved compression ratio, which in this case is around 10x).&lt;/p&gt;
&lt;p&gt;Using the Zstd compressor with Blosc2 can make joining "aligned" arrays even quicker, since Zstd achieves higher compression ratios.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/benchmark-zstd-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/benchmark-zstd-20k-i13900K.png"&gt;
&lt;p&gt;So, when arrays are aligned, there's less compressed data to copy (compression ratios here are around 20x), which speeds things up. If arrays aren't aligned, Zstd is a bit slower than the previous compressor (LZ4), because its decompression and re-compression are slower. Conclusion? Pick the compressor that works best for what you're doing!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="stacking-arrays"&gt;
&lt;h2&gt;Stacking Arrays&lt;/h2&gt;
&lt;p&gt;We've also added a new &lt;cite&gt;stack()&lt;/cite&gt; function in Blosc2 that uses the concatenate feature. This function lets you stack arrays along a new axis, which is super useful for creating higher-dimensional arrays from lower-dimensional ones. Here's how it works:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-1" name="rest_code_df21eceede234f6baf21c132ad12bd53-1" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blosc2&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-2" name="rest_code_df21eceede234f6baf21c132ad12bd53-2" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-2"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Create some sample arrays&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-3" name="rest_code_df21eceede234f6baf21c132ad12bd53-3" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayA.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-4" name="rest_code_df21eceede234f6baf21c132ad12bd53-4" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayB.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-5" name="rest_code_df21eceede234f6baf21c132ad12bd53-5" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arrayC.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-6" name="rest_code_df21eceede234f6baf21c132ad12bd53-6" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-6"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Stack the arrays along a new axis&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-7" name="rest_code_df21eceede234f6baf21c132ad12bd53-7" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;stacked_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stacked_destination.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-8" name="rest_code_df21eceede234f6baf21c132ad12bd53-8" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-8"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stacked_result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (3, 10, 20)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-9" name="rest_code_df21eceede234f6baf21c132ad12bd53-9" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-9"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# You can also stack along other axes&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-10" name="rest_code_df21eceede234f6baf21c132ad12bd53-10" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-10"&gt;&lt;/a&gt;&lt;span class="n"&gt;stacked_result_axis1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blosc2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urlpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stacked_destination_axis1.b2nd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_df21eceede234f6baf21c132ad12bd53-11" name="rest_code_df21eceede234f6baf21c132ad12bd53-11" href="https://blosc.org/posts/blosc2-new-concatenate/#rest_code_df21eceede234f6baf21c132ad12bd53-11"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stacked_result_axis1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: (10, 3, 20)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Benchmarks for the &lt;cite&gt;stack()&lt;/cite&gt; function show that it performs similarly to the &lt;cite&gt;concat()&lt;/cite&gt; function, especially when the input arrays are aligned.  Here are the results for the same data sizes and machine used in the previous benchmarks, and using the LZ4 compressor.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/stack-lz4-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/stack-lz4-20k-i13900K.png"&gt;
&lt;p&gt;And here are the results for the Zstd compressor.&lt;/p&gt;
&lt;img alt="/images/blosc2-new-concatenate/stack-zstd-20k-i13900K.png" src="https://blosc.org/images/blosc2-new-concatenate/stack-zstd-20k-i13900K.png"&gt;
&lt;p&gt;As can be seen, the &lt;cite&gt;stack()&lt;/cite&gt; function is also very fast when the input arrays are aligned, and it performs well even for large arrays that don't fit into memory. Incidentally, when stacking along the last dimension, &lt;cite&gt;blosc2.stack()&lt;/cite&gt; is slightly faster than &lt;cite&gt;numpy.stack()&lt;/cite&gt; even when the arrays are not aligned; we are not sure why this is the case, but the fact that the behaviour is reproducible is probably a sign that NumPy could optimize this use case better.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Blosc2's new concatenate and stack features are a great way to combine arrays quickly and without using too much memory. They are especially fast when your array sizes are an exact multiple of Blosc2's "chunks" (aligned arrays), making them perfect for big data jobs. They also work well for large arrays that don't fit into memory, since they process data in small chunks. Finally, they are supported in both C and Python, so you can use them in your favorite programming language.&lt;/p&gt;
&lt;p&gt;Give it a try in your own projects! If you have questions, the Blosc2 community is here to help.&lt;/p&gt;
&lt;p&gt;If you appreciate what we're doing with Blosc2, please think about &lt;a class="reference external" href="https://www.blosc.org/pages/blosc-in-depth/#support-blosc/"&gt;supporting us&lt;/a&gt;. Your help lets us keep making these tools better.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 concatenate performance</category><guid>https://blosc.org/posts/blosc2-new-concatenate/</guid><pubDate>Mon, 16 Jun 2025 13:33:20 GMT</pubDate></item><item><title>Exploring lossy compression with Blosc2</title><link>https://blosc.org/posts/blosc2-lossy-compression/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;In the realm of data compression, efficiency is key. Whether you're dealing with massive datasets or simply aiming to optimize storage space and transmission speeds, the choice of compression algorithm can make a significant difference.  In this blog post, we'll delve into the world of lossy compression using Blosc2, exploring its capabilities, advantages, and potential applications.&lt;/p&gt;
&lt;section id="understanding-lossy-compression"&gt;
&lt;h2&gt;Understanding lossy compression&lt;/h2&gt;
&lt;p&gt;Unlike lossless compression, where the original data can be perfectly reconstructed from the compressed version, lossy compression involves discarding some information to achieve higher compression ratios. While this inevitably results in a loss of fidelity, the trade-off is often justified by the significant reduction in storage size.&lt;/p&gt;
&lt;p&gt;Lossy compression techniques are commonly employed in scenarios where minor degradation in quality is acceptable, such as multimedia applications (e.g., images, audio, and video) and scientific data analysis. By intelligently discarding less crucial information, lossy compression algorithms can achieve substantial compression ratios while maintaining perceptual quality within acceptable bounds.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="lossy-codecs-in-blosc2"&gt;
&lt;h2&gt;Lossy codecs in Blosc2&lt;/h2&gt;
&lt;p&gt;In the context of Blosc2, lossy compression can be achieved either through a combination of traditional compression algorithms and filters that selectively discard less critical data, or by using codecs specifically designed for that purpose.&lt;/p&gt;
&lt;section id="filters-for-truncating-precision"&gt;
&lt;h3&gt;Filters for truncating precision&lt;/h3&gt;
&lt;p&gt;Since its inception, Blosc2 has featured the &lt;a class="reference external" href="https://www.blosc.org/c-blosc2/reference/utility_variables.html#c.BLOSC_TRUNC_PREC"&gt;TRUNC_PREC filter&lt;/a&gt;, which is meant to discard the least significant bits from floating-point values (be they float32 or float64). This filter operates by zeroing out the designated bits slated for removal, resulting in enhanced compression. To see the impact on compression ratio and speed, have a look at the illustrative &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/examples/compress2_decompress2.py"&gt;example here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A particularly useful application of the &lt;cite&gt;TRUNC_PREC&lt;/cite&gt; filter is truncating the precision of float32/float64 values to 8 or 16 bits; this is a quick and dirty way to ‘fake’ the float8 or float16 types that are so widely used in AI nowadays, and to rein in storage needs.&lt;/p&gt;
&lt;p&gt;In that vein, we recently implemented the &lt;a class="reference external" href="https://www.blosc.org/c-blosc2/reference/utility_variables.html#c.BLOSC_FILTER_INT_TRUNC"&gt;INT_TRUNC filter&lt;/a&gt;, which does the same as &lt;cite&gt;TRUNC_PREC&lt;/cite&gt;, but for integers (int8, int16, int32 and int64, and their unsigned counterparts).  With both &lt;cite&gt;TRUNC_PREC&lt;/cite&gt; and &lt;cite&gt;INT_TRUNC&lt;/cite&gt;, you can specify an acceptable precision for most numerical data types.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="codecs-for-ndim-datasets"&gt;
&lt;h3&gt;Codecs for NDim datasets&lt;/h3&gt;
&lt;p&gt;Blosc2 has support for &lt;a class="reference external" href="https://zfp.readthedocs.io/"&gt;ZFP&lt;/a&gt;, another codec that is very useful for compressing multidimensional datasets.  Although ZFP itself supports both lossless and lossy compression, Blosc2 makes use of its lossy capabilities only (the lossless ones are supposed to be already covered by other codecs in Blosc2).  See this &lt;a class="reference external" href="https://www.blosc.org/posts/support-lossy-zfp/"&gt;blog post&lt;/a&gt; for more info on the kind of lossy compression that can be achieved with ZFP.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="codecs-for-images"&gt;
&lt;h3&gt;Codecs for images&lt;/h3&gt;
&lt;p&gt;In addition, we recently included support for a couple of codecs that support the JPEG 2000 standard. One is &lt;a class="reference external" href="https://github.com/Blosc/blosc2_openhtj2k"&gt;OpenHTJ2K&lt;/a&gt;, and the other is &lt;a class="reference external" href="https://github.com/Blosc/blosc2_grok"&gt;grok&lt;/a&gt;.  Both are good, high-quality JPEG 2000 implementations, but grok is a bit more advanced and supports 16-bit gray images; we have &lt;a class="reference external" href="https://www.blosc.org/posts/blosc2-grok-release"&gt;blogged about it&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="experimental-filters"&gt;
&lt;h3&gt;Experimental filters&lt;/h3&gt;
&lt;p&gt;Finally, you may want to experiment with some filters and codecs that were mainly designed to be a learning tool for people wanting to implement their own.  Among them you can find:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/tree/main/plugins/filters/ndcell"&gt;NDCELL&lt;/a&gt;: A filter that groups data in multidimensional cells, reordering them so that the codec can find better repetition patterns on a cell-by-cell basis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/tree/main/plugins/filters/ndmean"&gt;NDMEAN&lt;/a&gt;: A multidimensional filter for lossy compression in multidimensional cells, replacing all elements in a cell by the mean of the cell.  This allows for better compression by the actual compression codec (e.g. NDLZ).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/tree/main/plugins/codecs/ndlz"&gt;NDLZ&lt;/a&gt;: A compressor based on the Lempel-Ziv algorithm for 2-dim datasets.  Although this is a lossless compressor, it is actually meant to be used in combination with NDCELL and NDMEAN above, providing lossy compression in the latter case.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Again, the codecs and filters in this section are not especially efficient, but they can be used for learning about the compression pipeline in Blosc2.  For more info on how to implement (and register) your own filters, see &lt;a class="reference external" href="https://www.blosc.org/posts/registering-plugins/"&gt;this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="applications-and-use-cases"&gt;
&lt;h2&gt;Applications and use cases&lt;/h2&gt;
&lt;p&gt;The versatility of Blosc2's lossy compression capabilities opens up a myriad of applications across different domains. In scientific computing, for example, where large volumes of data are generated and analyzed, lossy compression can significantly reduce storage requirements without significantly impacting the accuracy of results.&lt;/p&gt;
&lt;p&gt;Similarly, in multimedia applications, such as image and video processing, lossy compression can help minimize bandwidth usage and storage costs while maintaining perceptual quality within acceptable limits.&lt;/p&gt;
&lt;section id="compressing-images-with-jpeg-2000-and-with-int-trunc"&gt;
&lt;h3&gt;Compressing images with JPEG 2000 and with INT_TRUNC&lt;/h3&gt;
&lt;p&gt;As an illustration, a recent study involved the compression of substantial volumes of 16-bit grayscale images sourced from different &lt;a class="reference external" href="https://www.leaps-innov.eu/"&gt;synchrotron facilities in Europe&lt;/a&gt;. While achieving efficient compression ratios necessitates the use of lossy compression techniques, it is essential to exercise caution to preserve key features for clear visual examination and accurate numerical analysis. Below, we provide an overview of how Blosc2 can employ various codecs and quality settings within filters to accomplish this task.&lt;/p&gt;
&lt;img alt="Lossy compression (quality)" src="https://blosc.org/images/blosc2-lossy-compression/SSIM-cratio-MacOS-M1.png" style="width: 50%;"&gt;
&lt;p&gt;The SSIM index, derived from the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Structural_similarity"&gt;Structural Similarity Measure&lt;/a&gt;, gauges the perceived quality of an image, with values closer to 1 indicating higher fidelity. You can appreciate the varying levels of fidelity achievable with different filters and codecs.&lt;/p&gt;
&lt;p&gt;In terms of performance, each of these compression methods also showcases significantly varied speeds (tested on a MacBook Air with an M1 processor):&lt;/p&gt;
&lt;img alt="Lossy compression (speed)" src="https://blosc.org/images/blosc2-lossy-compression/speed-cratio-MacOS-M1.png" style="width: 100%;"&gt;
&lt;p&gt;A pivotal benefit of Blosc2's strategy for lossy compression lies in its adaptability and configurability. This enables tailoring to unique needs and limitations, guaranteeing optimal performance across various scenarios.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="using-blosc2-within-hdf5"&gt;
&lt;h3&gt;Using Blosc2 within HDF5&lt;/h3&gt;
&lt;p&gt;HDF5 is a widely used data format, and both major Python wrappers, h5py (via hdf5plugin) and PyTables, offer basic support for Blosc2. However, accessing the full capabilities of the Blosc2 compression pipeline is somewhat restricted because the current &lt;a class="reference external" href="https://github.com/PyTables/PyTables/tree/master/hdf5-blosc2/src"&gt;hdf5-blosc2 filter&lt;/a&gt;, available in PyTables (and used by hdf5plugin), is not yet equipped to transmit all the necessary parameters to the HDF5 data pipeline.&lt;/p&gt;
&lt;p&gt;Thankfully, HDF5 includes support for the &lt;a class="reference external" href="https://docs.hdfgroup.org/archive/support/HDF5/doc1.8/Advanced/DirectChunkWrite/UsingDirectChunkWrite.pdf"&gt;direct chunking mechanism&lt;/a&gt;, which enables the direct transmission of pre-compressed chunks to HDF5, bypassing its standard data pipeline. Since h5py also offers this functionality, it's entirely feasible to leverage all the advanced features of Blosc2, including lossy compression. Below are a couple of examples illustrating how this process operates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/Blosc/blosc2_grok/blob/main/bench/encode-hdf5.ipynb"&gt;https://github.com/Blosc/blosc2_grok/blob/main/bench/encode-hdf5.ipynb&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/t20100/80960ec46abd3a863e85876c013834bb"&gt;https://gist.github.com/t20100/80960ec46abd3a863e85876c013834bb&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
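&lt;p&gt;In a nutshell, direct chunking with h5py looks like the sketch below (using gzip instead of Blosc2 for brevity, since gzip ships with HDF5; the notebooks above follow the same pattern with Blosc2-compressed chunks):&lt;/p&gt;

```python
import zlib
import numpy as np
import h5py

a = np.arange(64 * 64, dtype=np.int64).reshape(64, 64)

with h5py.File("direct-chunk.h5", "w") as f:
    # One chunk covering the whole array, tagged with a filter HDF5 knows,
    # so that readers can decompress it through the standard pipeline.
    dset = f.create_dataset("data", shape=a.shape, dtype=a.dtype,
                            chunks=a.shape, compression="gzip")
    # Compress outside HDF5 and hand the ready-made chunk over verbatim,
    # bypassing the HDF5 filter pipeline entirely.
    dset.id.write_direct_chunk((0, 0), zlib.compress(a.tobytes(), 9))

with h5py.File("direct-chunk.h5") as f:
    b = f["data"][:]  # read back through the regular filter pipeline
```

&lt;p&gt;With Blosc2, one would instead create the dataset with the registered Blosc2 filter ID (e.g. via hdf5plugin) and write Blosc2-compressed frames; the write_direct_chunk call itself stays the same.&lt;/p&gt;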
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Lossy compression is a powerful tool for optimizing storage space, reducing bandwidth usage, and improving overall efficiency in data handling. With Blosc2, developers have access to a robust and flexible compression library for both lossless and lossy compression modes.&lt;/p&gt;
&lt;p&gt;With its advanced compression methodologies and adept memory management, Blosc2 empowers users to strike a harmonious balance between compression ratio, speed, and fidelity. This attribute renders it especially suitable for scenarios where resource limitations or performance considerations hold significant weight.&lt;/p&gt;
&lt;p&gt;Finally, there are ongoing efforts towards integrating fidelity into our &lt;a class="reference external" href="https://blosc.org/btune"&gt;BTune AI tool&lt;/a&gt;. This enhancement will empower the tool to autonomously identify the most suitable codecs and filters, balancing compression level, precision, and &lt;strong&gt;fidelity&lt;/strong&gt; according to user-defined preferences. Keep an eye out for updates!&lt;/p&gt;
&lt;p&gt;Whether you're working with scientific data, multimedia content, or large-scale datasets, Blosc2 offers a comprehensive solution for efficient data compression and handling.&lt;/p&gt;
&lt;section id="special-thanks-to-sponsors-and-developers"&gt;
&lt;h3&gt;Special thanks to sponsors and developers&lt;/h3&gt;
&lt;p&gt;Gratitude goes out to our sponsors over the years, with special recognition to the &lt;a class="reference external" href="https://www.leaps-innov.eu/"&gt;LEAPS collaboration&lt;/a&gt; and &lt;a class="reference external" href="https://numfocus.org"&gt;NumFOCUS&lt;/a&gt;, whose support has been instrumental in advancing the lossy compression capabilities within Blosc2.&lt;/p&gt;
&lt;p&gt;The Blosc2 project is the outcome of the work of &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/graphs/contributors"&gt;many developers&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>blosc2 lossy compression</category><guid>https://blosc.org/posts/blosc2-lossy-compression/</guid><pubDate>Tue, 13 Feb 2024 01:32:20 GMT</pubDate></item><item><title>Bytedelta: Enhance Your Compression Toolset</title><link>https://blosc.org/posts/bytedelta-enhance-compression-toolset/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;&lt;cite&gt;Bytedelta&lt;/cite&gt; is a new filter that calculates the difference between bytes
in a data stream.  Combined with the shuffle filter, it can improve compression
for some datasets.  Bytedelta is based on &lt;a class="reference external" href="https://aras-p.info/blog/2023/03/01/Float-Compression-7-More-Filtering-Optimization/"&gt;initial work by Aras Pranckevičius&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: We have a brief introduction to bytedelta in the 3rd section of
&lt;a class="reference external" href="https://www.blosc.org/docs/Blosc2-WP7-LEAPS-Innov-2023.pdf"&gt;this presentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The basic concept is simple: after applying the shuffle filter,&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/shuffle-filter.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/shuffle-filter.png" style="width: 75%;"&gt;
&lt;p&gt;then compute the difference for each byte in the byte streams (also called splits in Blosc terminology):&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/bytedelta-filter.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/bytedelta-filter.png" style="width: 75%;"&gt;
&lt;p&gt;The key insight enabling the bytedelta algorithm lies in its implementation, especially the use of SIMD on Intel/AMD and ARM NEON CPUs, making the filter overhead minimal.&lt;/p&gt;
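&lt;p&gt;For illustration purposes only, here is a (deliberately naive, non-SIMD) pure Python sketch of the two steps, applied to a smooth float32 ramp; the real filters in C-Blosc2 are vectorized versions of the same idea:&lt;/p&gt;

```python
import struct
import zlib

def shuffle(data: bytes, itemsize: int) -> bytes:
    # Group the i-th byte of every item together: one stream per byte position.
    return b"".join(data[i::itemsize] for i in range(itemsize))

def bytedelta(shuffled: bytes, itemsize: int) -> bytes:
    # Within each byte stream, replace every byte by its difference
    # with the previous byte (modulo 256).
    n = len(shuffled) // itemsize
    out = bytearray()
    for s in range(itemsize):
        prev = 0
        for b in shuffled[s * n:(s + 1) * n]:
            out.append((b - prev) % 256)
            prev = b
    return bytes(out)

# A slowly increasing float32 ramp, a pattern common in instrument data.
raw = struct.pack("10000f", *(1000.0 + 0.1 * i for i in range(10000)))
plain = zlib.compress(raw, 9)
shuf = zlib.compress(shuffle(raw, 4), 9)
delta = zlib.compress(bytedelta(shuffle(raw, 4), 4), 9)
print(len(plain), len(shuf), len(delta))
```

&lt;p&gt;On a ramp like this, the exponent stream becomes nearly constant after shuffle, and the mantissa streams become near-constant after bytedelta, which is exactly the kind of input a generic codec loves.&lt;/p&gt;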
&lt;p&gt;Although Aras's original code implemented shuffle and bytedelta together, it was limited to a specific item size (4 bytes), and making it more general would require significant effort.  Instead, for Blosc2 we built on the existing shuffle filter and created a new one that just does bytedelta. When we insert both into the &lt;a class="reference external" href="https://www.blosc.org/docs/Blosc2-Intro-PyData-Global-2021.pdf"&gt;Blosc2 filter pipeline&lt;/a&gt; (which supports up to 6 chained filters), we get a completely general filter that works for any type size supported by the existing shuffle filter.&lt;/p&gt;
&lt;p&gt;That said, the &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/pull/456"&gt;implementation of the bytedelta filter&lt;/a&gt; has been a breeze thanks to the &lt;a class="reference external" href="https://www.blosc.org/posts/registering-plugins/"&gt;plugin support in C-Blosc2&lt;/a&gt;. You can implement your own filters and codecs too, or, if you are too busy, &lt;a class="reference external" href="mailto:contact@blosc.org"&gt;we will be happy to assist you&lt;/a&gt;.&lt;/p&gt;
&lt;section id="compressing-era5-datasets"&gt;
&lt;h2&gt;Compressing ERA5 datasets&lt;/h2&gt;
&lt;p&gt;The best approach to evaluate a new filter is to apply it to real data. For this, we will use some of the &lt;a class="reference external" href="https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5"&gt;ERA5 datasets&lt;/a&gt;, representing different measurements and labeled as "wind", "snow", "flux", "pressure" and "precip". They all contain floating point data (float32), and we will use a full month of each one, amounting to 2.8 GB per dataset.&lt;/p&gt;
&lt;p&gt;The datasets exhibit rather dissimilar complexity, which proves advantageous for testing different compression scenarios. For instance, the wind dataset appears as follows:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/wind-colormap.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/wind-colormap.png" style="width: 100%;"&gt;
&lt;p&gt;The image shows the intricate network of winds across the globe on October 1, 1987. The South American continent is visible on the right side of the map.&lt;/p&gt;
&lt;p&gt;Another example is the snow dataset:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/snow-colormap.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/snow-colormap.png" style="width: 100%;"&gt;
&lt;p&gt;This time the image is quite flat. Here one can spot Antarctica, Greenland, North America and of course, Siberia, which was pretty full of snow by 1987-10-01 23:00:00 already.&lt;/p&gt;
&lt;p&gt;Let's see how the new bytedelta filter performs when compressing these datasets.  All the plots below have been made on a box with an Intel i9-13900K processor, 32 GB of RAM and running Clear Linux.&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-filter.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-filter.png" style="width: 100%;"&gt;
&lt;p&gt;In the box plot above, we summarized the compression ratios for all datasets using different codecs (BLOSCLZ, LZ4, LZ4HC and ZSTD). The main takeaway is that using bytedelta yields the best median compression ratio: bytedelta achieves a median of 5.86x, compared to 5.62x for bitshuffle, 5.1x for shuffle, and 3.86x for codecs without filters.  Overall, bytedelta seems to improve compression ratios here, which is good news.&lt;/p&gt;
&lt;p&gt;While the compression ratio is a useful metric for evaluating the new bytedelta filter, there is more to consider. For instance, does the filter work better on some data sets than others? How does it impact the performance of different codecs? If you're interested in learning more, read on.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="effects-on-various-datasets"&gt;
&lt;h2&gt;Effects on various datasets&lt;/h2&gt;
&lt;p&gt;Let's see how different filters behave on various datasets:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-dset.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-dset.png" style="width: 100%;"&gt;
&lt;p&gt;Here we see that, for datasets that compress easily (precip, snow), the behavior is quite different from those that are less compressible. For precip, bytedelta actually worsens results, whereas for snow, it slightly improves them. For less compressible datasets, the trend is more apparent, as can be seen in this zoomed in image:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-dset-zoom.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-dset-zoom.png" style="width: 100%;"&gt;
&lt;p&gt;In these cases, bytedelta clearly provides a better compression ratio, most notably on the pressure dataset, where using bytedelta increases the compression ratio by 25% over the second best, bitshuffle (5.0x vs 4.0x, using ZSTD clevel 9). Overall, only one dataset (precip) shows an actual decrease. This is good news for bytedelta indeed.&lt;/p&gt;
&lt;p&gt;Furthermore, Blosc2 supports another compression parameter for splitting the compressed streams into bytes with the same significance. Normally, this leads to better speed but a lower compression ratio, so it is automatically activated for faster codecs and disabled for slower ones. However, it turns out that, when we activate splitting for all the codecs, we find a welcome surprise: bytedelta enables ZSTD to find significantly better compression paths, resulting in higher compression ratios.&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-dset-always-split-zoom.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-dset-always-split-zoom.png" style="width: 100%;"&gt;
&lt;p&gt;As can be seen, in general ZSTD + bytedelta can compress these datasets better. For the pressure dataset in particular, it goes up to 5.7x, 37% more than the second best, bitshuffle (5.7x vs 4.1x, using ZSTD clevel 9).  Note also that this new high is 14% more than without splitting (the default).&lt;/p&gt;
&lt;p&gt;This shows that when compressing, you cannot just trust your intuition for setting compression parameters - there is no substitute for experimentation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="effects-on-different-codecs"&gt;
&lt;h2&gt;Effects on different codecs&lt;/h2&gt;
&lt;p&gt;Now, let's see how bytedelta affects performance for different codecs and compression levels.&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-codec.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-codec.png" style="width: 100%;"&gt;
&lt;p&gt;Interestingly, on average bytedelta proves most useful for ZSTD and higher compression levels of ZLIB (Blosc2 comes with &lt;a class="reference external" href="https://github.com/zlib-ng/zlib-ng"&gt;ZLIB-NG&lt;/a&gt;). On the other hand, the fastest codecs (LZ4, BLOSCLZ) seem to benefit more from bitshuffle instead.&lt;/p&gt;
&lt;p&gt;Regarding compression speed, in general we can see that bytedelta has little effect on performance:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec.png" style="width: 100%;"&gt;
&lt;p&gt;As we can see, compression algorithms like BLOSCLZ, LZ4 and ZSTD can achieve extremely high speeds. LZ4 reaches and surpasses speeds of 30 GB/s, even when using bytedelta. BLOSCLZ and ZSTD can also exceed 20 GB/s, which is quite impressive.&lt;/p&gt;
&lt;p&gt;Let’s see the compression speed grouped by compression levels:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec-clevel.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec-clevel.png" style="width: 100%;"&gt;
&lt;p&gt;Here one can see that, to achieve the highest compression rates when combined with shuffle and bytedelta, the codecs require significant CPU resources; this is especially noticeable in the zoomed-in view:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec-clevel-zoom.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cspeed-vs-codec-clevel-zoom.png" style="width: 100%;"&gt;
&lt;p&gt;where capable compressors like ZSTD do require up to 2x more time to compress when using bytedelta, especially for high compression levels (6 and 9).&lt;/p&gt;
&lt;p&gt;Now, let us examine decompression speeds:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/dspeed-vs-codec.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/dspeed-vs-codec.png" style="width: 100%;"&gt;
&lt;p&gt;In general, decompression is faster than compression. BLOSCLZ, LZ4 and LZ4HC can achieve over 100 GB/s. BLOSCLZ reaches nearly 180 GB/s using no filters on the snow dataset (lowest complexity).&lt;/p&gt;
&lt;p&gt;Let’s see the decompression speed grouped by compression levels:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/dspeed-vs-codec-clevel.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/dspeed-vs-codec-clevel.png" style="width: 100%;"&gt;
&lt;p&gt;The bytedelta filter noticeably reduces decompression speed for most codecs, by 20% or more in some cases.  ZSTD performance is less impacted.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="achieving-a-balance-between-compression-ratio-and-speed"&gt;
&lt;h2&gt;Achieving a balance between compression ratio and speed&lt;/h2&gt;
&lt;p&gt;Often, you want to achieve a good balance of compression and speed, rather than extreme values of either. We will conclude by showing plots depicting a combination of both metrics and how bytedelta influences them.&lt;/p&gt;
&lt;p&gt;Let's first represent the compression ratio versus compression speed:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-cspeed.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-cspeed.png" style="width: 100%;"&gt;
&lt;p&gt;As we can see, the shuffle filter is typically found on the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Pareto_front"&gt;Pareto frontier&lt;/a&gt; (in this case, the point furthest to the right and top), with bytedelta coming next.  In contrast, not using a filter at all sits on the opposite side.  This is the typical pattern for real-world numerical datasets.&lt;/p&gt;
&lt;p&gt;Let's now group by filter and dataset, and calculate the mean across all codecs of a combined metric: the product of the compression ratio and the compression speed.&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cspeed-vs-filter.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cspeed-vs-filter.png" style="width: 100%;"&gt;
&lt;p&gt;As can be seen, bytedelta works best with the wind dataset (which is quite complex), while bitshuffle does a good job in general for the others. The shuffle filter wins on the snow dataset (low complexity).&lt;/p&gt;
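&lt;p&gt;The combined score behind these plots is simply the product of the two metrics; with purely hypothetical numbers (not the measured ERA5 results), the ranking logic reads:&lt;/p&gt;

```python
# Hypothetical (cratio, cspeed in GB/s) pairs -- NOT the measured ERA5 results.
results = {
    "nofilter": (3.9, 30.0),
    "shuffle": (5.1, 27.0),
    "bitshuffle": (5.6, 18.0),
    "shuffle+bytedelta": (5.9, 20.0),
}

# Combined score: compression ratio times compression speed.
scores = {name: cr * speed for name, (cr, speed) in results.items()}
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # a filter can lose on ratio yet win on the balance
```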
&lt;p&gt;If we group by compression level:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed-vs-codec-clevel.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed-vs-codec-clevel.png" style="width: 100%;"&gt;
&lt;p&gt;We see that bytedelta works well with LZ4 here, and also with ZSTD at the lowest compression level (1).&lt;/p&gt;
&lt;p&gt;Let's revise the compression ratio versus decompression speed comparison:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio-vs-dspeed.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio-vs-dspeed.png" style="width: 100%;"&gt;
&lt;p&gt;Let's group together the datasets and calculate the mean for all codecs:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio_x_dspeed-vs-filter-dset.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio_x_dspeed-vs-filter-dset.png" style="width: 100%;"&gt;
&lt;p&gt;In this case, shuffle generally prevails, with bitshuffle also doing reasonably well, winning on precip and pressure datasets.&lt;/p&gt;
&lt;p&gt;Also, let’s group the data by compression level:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio_x_dspeed-vs-codec-clevel.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio_x_dspeed-vs-codec-clevel.png" style="width: 100%;"&gt;
&lt;p&gt;We find that bytedelta compression does not outperform shuffle compression in any scenario. This is unsurprising since decompression is typically fast, and bytedelta's extra processing can decrease performance more easily. We also see that LZ4HC (clevel 6 and 9) + shuffle strikes the best balance in this scenario.&lt;/p&gt;
&lt;p&gt;Finally, let's consider the balance between compression ratio, compression speed, and decompression speed:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed_dspeed-vs-dset.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed_dspeed-vs-dset.png" style="width: 100%;"&gt;
&lt;p&gt;Here the winners are shuffle and bitshuffle, depending on the data set, but bytedelta never wins.&lt;/p&gt;
&lt;p&gt;If we group by compression levels:&lt;/p&gt;
&lt;img alt="/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed_dspeed-vs-codec-clevel.png" class="align-center" src="https://blosc.org/images/bytedelta-enhance-compression-toolset/cratio_x_cspeed_dspeed-vs-codec-clevel.png" style="width: 100%;"&gt;
&lt;p&gt;Overall, we see LZ4 as the clear winner at any level, especially when combined with shuffle. On the other hand, bytedelta did not win in any scenario here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="benchmarks-for-other-computers"&gt;
&lt;h2&gt;Benchmarks for other computers&lt;/h2&gt;
&lt;p&gt;We have run the benchmarks presented here in an assortment of different boxes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-m1.html"&gt;MacBook Air with M1 processor and 8 GB RAM. MacOSX 13.1.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-m1.html"&gt;AMD Ryzen 9 5950X processor and 32 GB RAM. Debian 22.04.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-i10k.html"&gt;Intel i9-10940X processor and 64 GB RAM. Debian 22.04.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-i13k.html"&gt;Intel i9-13900K processor and 32 GB RAM. Clear Linux.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also, find here a couple of runs using the i9-13900K box above, but with the always split and never split settings:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-i13k-always-split.html"&gt;Intel i9-13900K. Always Split.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/docs/era5-pds/plot_transcode_data-i13k-never-split.html"&gt;Intel i9-13900K. Never Split.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reproducing the benchmarks is straightforward. First, &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/download_data.py"&gt;download the data&lt;/a&gt;; the downloaded files will be in the new &lt;cite&gt;era5_pds/&lt;/cite&gt; directory.  Then perform &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/transcode_data.py"&gt;the series of benchmarks&lt;/a&gt;; this takes time, so grab a coffee and wait anywhere from 30 min (fast workstations) to 6 hours (slow laptops).  Finally, run the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/plot_transcode_data.ipynb"&gt;plotting Jupyter notebook&lt;/a&gt; to explore your results.  If you wish to share your results with the &lt;a class="reference external" href="mailto:contact@blosc.org"&gt;Blosc development team&lt;/a&gt;, we will appreciate hearing from you!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Bytedelta can achieve higher compression ratios on most datasets, especially in combination with capable codecs like ZSTD, with a maximum gain of 37% (pressure) over the next best filter; only in one case (precip) does the compression ratio decrease. By compressing data more efficiently, bytedelta can reduce file sizes even more, accelerating transfer and storage.&lt;/p&gt;
&lt;p&gt;On the other hand, while bytedelta excels at achieving high compression ratios, this requires more computing power. We have found that for striking a good balance between high compression and fast compression/decompression, other filters, particularly shuffle, are superior overall.&lt;/p&gt;
&lt;p&gt;We've learned that no single codec/filter combination is best for all datasets:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ZSTD (clevel 9) + bytedelta achieves the best absolute compression ratio for most of the datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LZ4 + shuffle is well-balanced for all metrics (compression ratio, speed, decompression speed).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LZ4 (clevel 6) and ZSTD (clevel 1) + shuffle strike a good balance of compression ratio and speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LZ4HC (clevel 6 and 9) + shuffle balances compression ratio and decompression speed well.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BLOSCLZ without filters achieves the best decompression speed (at least in one instance).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In summary, the optimal choice depends on your priorities.&lt;/p&gt;
&lt;p&gt;As a final note, the Blosc development team is working on BTune, a new deep learning tuner for Blosc2. BTune can be trained to automatically recognize different kinds of datasets and choose the optimal codec and filters to achieve the best balance, based on the user's needs. This would create a much more intelligent compressor that can adapt itself to your data faster, without requiring time-consuming manual tuning. If interested, &lt;a class="reference external" href="mailto:contact@blosc.org"&gt;contact us&lt;/a&gt;; we are looking for beta testers!&lt;/p&gt;
&lt;/section&gt;</description><category>Blosc2</category><category>bytedelta</category><category>filter</category><guid>https://blosc.org/posts/bytedelta-enhance-compression-toolset/</guid><pubDate>Fri, 24 Mar 2023 11:32:20 GMT</pubDate></item><item><title>100 Trillion Rows Baby</title><link>https://blosc.org/posts/100-trillion-baby/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;In the recently released PyTables 3.8.0, we added support for an optimized path for writing and reading Table instances with Blosc2 cooperating with the HDF5 machinery.  In the &lt;a class="reference external" href="https://www.blosc.org/posts/blosc2-pytables-perf"&gt;blog post describing its implementation&lt;/a&gt; we showed how it collaborates with the HDF5 library so as to get top-class I/O performance.&lt;/p&gt;
&lt;p&gt;Since then, we have been aware (thanks to &lt;a class="reference external" href="https://github.com/PyTables/PyTables/issues/991"&gt;Mark Kittisopikul&lt;/a&gt;) of the introduction of the &lt;cite&gt;H5Dchunk_iter&lt;/cite&gt; function in the HDF5 1.14 series. This supersedes the functionality of &lt;cite&gt;H5Dget_chunk_info&lt;/cite&gt;, and makes retrieving the offsets of the chunks in the HDF5 file far more efficient, especially on files with a large number of chunks: iterating over n chunks with H5Dchunk_iter costs O(n), whereas doing so with repeated H5Dget_chunk_info calls costs O(n^2).&lt;/p&gt;
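&lt;p&gt;The difference between the two cost classes is easy to model with a toy example (this illustrates the asymptotics only, not HDF5's actual internals):&lt;/p&gt;

```python
# Toy chunk index: fetching record i scans from the start (like repeated
# O(i) lookups), while a single iterator pass visits each record once.
chunk_offsets = list(range(0, 2_000 * 512, 512))  # 2,000 fake chunk offsets

def get_chunk_info(i):
    # O(i) per call: walk the sequence from the beginning every time.
    for j, offset in enumerate(chunk_offsets):
        if j == i:
            return offset

def chunk_iter(callback):
    # O(n) total: one pass over all chunk records.
    for offset in chunk_offsets:
        callback(offset)

slow = [get_chunk_info(i) for i in range(len(chunk_offsets))]  # ~n^2/2 steps
fast = []
chunk_iter(fast.append)
print(slow == fast)  # same answer, wildly different cost as n grows
```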
&lt;p&gt;As we decided to implement support for &lt;cite&gt;H5Dchunk_iter&lt;/cite&gt; in PyTables, we were curious about the sort of boost this could provide when reading tables created from real data.  Keep reading for the experiments we've conducted on this.&lt;/p&gt;
&lt;section id="effect-on-relatively-small-datasets"&gt;
&lt;h2&gt;Effect on (relatively small) datasets&lt;/h2&gt;
&lt;p&gt;We start by reading a table with real data coming from our usual &lt;a class="reference external" href="https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5"&gt;ERA5 database&lt;/a&gt;.  We fetched one year (2000 to be specific) of data from five different ERA5 datasets with the same shape and the same coordinates (latitude, longitude and time). This data has been stored in a table with 8 columns, 32 bytes per row and 9 billion rows (for a grand total of 270 GB); the number of chunks is about 8K.&lt;/p&gt;
&lt;p&gt;When using compression, the size is typically reduced between a factor of 6x (LZ4 + shuffle) and  9x (Zstd + bitshuffle); in any case, the resulting file size is larger than the RAM available in our box (32 GB), so we can safely exclude OS filesystem caching effects here. Let's have a look at the results on reading this dataset inside PyTables (using shuffle only; for bitshuffle results are just a bit slower):&lt;/p&gt;
&lt;img alt="/images/100-trillion-baby/real-data-9Grow-seq.png" src="https://blosc.org/images/100-trillion-baby/real-data-9Grow-seq.png" style="width: 50%;"&gt;
&lt;img alt="/images/100-trillion-baby/real-data-9Grow-rand.png" src="https://blosc.org/images/100-trillion-baby/real-data-9Grow-rand.png" style="width: 50%;"&gt;
&lt;p&gt;We see how the improvement when using HDF5 1.14 (and hence H5Dchunk_iter) for reading data sequentially (via a PyTables query) is not that noticeable, but for random queries, the speedup is way more apparent. For comparison purposes, we added the figures for Blosc1+LZ4; one can notice the great job of Blosc2, especially in terms of random reads, thanks to the double partitioning and the HDF5 pipeline replacement.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-trillion-rows-table"&gt;
&lt;h2&gt;A trillion rows table&lt;/h2&gt;
&lt;p&gt;But 8K chunks is not such a large figure, and we are interested in datasets with a much larger number of chunks. As it is very time consuming to download large amounts of real data for benchmarking purposes, we have decided to use synthetic data (basically, a bunch of zeros) just to explore how the new H5Dchunk_iter function scales when handling extremely large datasets in HDF5.&lt;/p&gt;
&lt;p&gt;Now we will be creating a large table with 1 trillion rows, with the same 8 fields as in the previous section, but whose values are zeros (remember, we are trying to push HDF5 / Blosc2 to their limits, so data content is not important here).  With that, we get a table with 845K chunks, which is about 100x more than in the previous section.&lt;/p&gt;
&lt;p&gt;With this, let's have a look at the plots for the read speed:&lt;/p&gt;
&lt;img alt="/images/100-trillion-baby/synth-data-9Grow-seq.png" src="https://blosc.org/images/100-trillion-baby/synth-data-9Grow-seq.png" style="width: 50%;"&gt;
&lt;img alt="/images/100-trillion-baby/synth-data-9Grow-rand.png" src="https://blosc.org/images/100-trillion-baby/synth-data-9Grow-rand.png" style="width: 50%;"&gt;
&lt;p&gt;As expected, we are getting significantly better results when using HDF5 1.14 (with H5Dchunk_iter) in both the sequential and random cases.  For comparison purposes, we have added Blosc1-Zstd, which does not make use of the new functionality. In particular, note how Blosc1 gets better results for random reads than Blosc2 with HDF5 1.12; as this is somewhat unexpected, if you have an explanation, please chime in.&lt;/p&gt;
&lt;p&gt;It is worth noting that even though the data are made of zeros, Blosc2 still needs to compress/decompress the full 32 TB.  And the same goes for numexpr, which is used internally to perform the computations for the query in the sequential read case.  This is a testament to the optimization efforts in the data flow (i.e. avoiding as many memory copies as possible) inside PyTables.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="trillion-rows-baby"&gt;
&lt;h2&gt;100 trillion rows baby&lt;/h2&gt;
&lt;p&gt;As a final exercise, we took the previous experiment to the limit and made a table with 100 trillion (that’s a 1 followed by 14 zeros!) rows and measured different interesting aspects.  It is worth noting that the total size for this case is 2.8 PB (&lt;strong&gt;petabyte&lt;/strong&gt;), and the number of chunks is around 85 million (finally, large enough to fully demonstrate the scalability of the new H5Dchunk_iter functionality).&lt;/p&gt;
&lt;p&gt;Here is the speed of random and sequential reads:&lt;/p&gt;
&lt;img alt="/images/100-trillion-baby/synth-data-100Trow-seq.png" src="https://blosc.org/images/100-trillion-baby/synth-data-100Trow-seq.png" style="width: 50%;"&gt;
&lt;img alt="/images/100-trillion-baby/synth-data-100Trow-rand.png" src="https://blosc.org/images/100-trillion-baby/synth-data-100Trow-rand.png" style="width: 50%;"&gt;
&lt;p&gt;As we can see, despite the large number of chunks, the sequential read speed actually improved, exceeding 75 GB/s.  The random read latency increased to 60 µs; this is actually not too bad, as in real life the latency of random reads in such large files is determined by the storage media, which is no less than 100 µs even for the fastest SSDs nowadays.&lt;/p&gt;
&lt;p&gt;The script that creates the table and reads it can be found at &lt;a class="reference external" href="https://github.com/PyTables/PyTables/blob/master/bench/100-trillion-baby.py"&gt;bench/100-trillion-baby.py&lt;/a&gt;.  For the curious: it took about 24 hours to run on a Linux box with an Intel 13900K CPU and 32 GB of RAM. Memory consumption during writing was about 110 MB, whereas during reading it stayed steady at 1.7 GB (pretty good for a multi-petabyte table).  The final file size was 17 GB, for a compression ratio of more than 175000x.&lt;/p&gt;
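Taking the rounded figures above at face value, a quick back-of-the-envelope check lands in the same ballpark as the quoted numbers (binary units for PB/GB are an assumption on our part, and the rounded inputs only pin the results to within a few percent):

```python
# Back-of-the-envelope check of the figures quoted above.
# Binary units (PiB/GiB) assumed; inputs are rounded, results approximate.
total_bytes = 2.8 * 2**50      # ~2.8 PB logical table size
file_bytes = 17 * 2**30        # ~17 GB on-disk file
chunks = 85 * 10**6            # ~85 million chunks

ratio = total_bytes / file_bytes          # overall compression ratio
chunk_mib = total_bytes / chunks / 2**20  # uncompressed size per chunk
print(f"compression ratio ~{ratio:,.0f}x")
print(f"~{chunk_mib:.0f} MiB per chunk (uncompressed)")
```

The per-chunk figure in the tens of MiB also explains why 85 million chunks were needed to cover the whole table.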
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;As we have seen, the H5Dchunk_iter function recently introduced in HDF5 1.14 is confirmed to be a big help in performing reads more efficiently.  We have also demonstrated that scalability is excellent, reaching phenomenal sequential speeds (exceeding 75 GB/s with synthetic data) that cannot easily be matched by even the most modern I/O subsystems, hence avoiding unnecessary bottlenecks.&lt;/p&gt;
&lt;p&gt;Indeed, the HDF5 / Blosc2 combo is able to handle monster-sized tables (in the petabyte ballpark) without becoming a significant performance bottleneck.  You may not need to handle such a sheer amount of data anytime soon, but it is always reassuring to use a tool that is not going to take a step back in daunting scenarios like this.&lt;/p&gt;
&lt;p&gt;If you regularly store and process large datasets and need advice on partitioning your data, choosing the best combination of codec, filters, chunk and block sizes, or many other aspects of compression, do not hesitate to contact the Blosc team at &lt;cite&gt;contact (at) blosc.org&lt;/cite&gt;.  We have more than 30 years of accumulated experience in storage systems like HDF5, Blosc and efficient I/O in general; but most importantly, we have the ability to integrate these innovative technologies quickly into your products, enabling faster access to these innovations.&lt;/p&gt;
&lt;/section&gt;</description><category>pytables blosc2 hdf5</category><guid>https://blosc.org/posts/100-trillion-baby/</guid><pubDate>Fri, 10 Feb 2023 10:32:20 GMT</pubDate></item><item><title>20 years of PyTables</title><link>https://blosc.org/posts/pytables-20years/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;Back in October 2002 the first version of &lt;a class="reference external" href="https://www.pytables.org"&gt;PyTables&lt;/a&gt; was released.  It was an attempt to store a large amount of tabular data while being able to provide a hierarchical structure around it.  Here is my first public announcement:&lt;/p&gt;
&lt;pre class="literal-block"&gt;Hi!,

PyTables is a Python package which allows dealing with HDF5 tables.
Such a table is defined as a collection of records whose values are
stored in fixed-length fields.  PyTables is intended to be easy-to-use,
and tried to be a high-performance interface to HDF5.  To achieve this,
the newest improvements in Python 2.2 (like generators or slots and
metaclasses in brand-new classes) has been used.  Python creation
extension tool has been chosen to access the HDF5 library.

This package should be platform independent, but until now I’ve tested
it only with Linux.  It’s the first public release (v 0.1), and it is
in alpha state.&lt;/pre&gt;
&lt;p&gt;As noted, PyTables was an early adopter of the generators and metaclasses that were introduced in the (then new) Python 2.2.  Generators proved to be an excellent tool in many libraries related to data science. Also, the adoption of Pyrex (which had been released just a &lt;a class="reference external" href="http://blog.behnel.de/posts/cython-is-20/"&gt;few months before&lt;/a&gt;) greatly simplified the wrapping of native C libraries like HDF5.&lt;/p&gt;
&lt;p&gt;At that time there were not that many Python libraries for persisting tabular data in a format that allowed on-the-fly compression, and that gave PyTables a chance to be considered a good option.  Some months later, PyCon 2003 accepted our &lt;a class="reference external" href="http://www.pytables.org/docs/pycon2003.pdf"&gt;first talk about PyTables&lt;/a&gt;.  Since then, we (mainly me, with support from Scott Prater on the documentation side) gave several presentations at different international conferences, like SciPy or EuroSciPy, and its popularity grew considerably.&lt;/p&gt;
&lt;section id="carabos-coop-v"&gt;
&lt;h2&gt;Cárabos Coop. V.&lt;/h2&gt;
&lt;p&gt;In 2005, after receiving some good feedback on PyTables from customers (including &lt;a class="reference external" href="https://www.hdfgroup.org"&gt;The HDF Group&lt;/a&gt;), we decided to try to make a living out of PyTables development, and together with Vicent Mas and &lt;a class="reference external" href="https://elvil.net"&gt;Ivan Vilata&lt;/a&gt;, we set out to create a cooperative called Cárabos Coop. V.  Unfortunately, after 3 years of enthusiastic (and hard) work, we did not succeed in making the project profitable, and we had to close it down in 2008.&lt;/p&gt;
&lt;p&gt;During this period we managed to make a professional version of PyTables that used out-of-core indexes (aka OPSI) as well as a GUI called &lt;a class="reference external" href="https://vitables.org"&gt;ViTables&lt;/a&gt;.  After closing Cárabos we open-sourced both technologies, and we are happy to say that they are still in good use, most especially &lt;a class="reference external" href="https://www.pytables.org/docs/OPSI-indexes.pdf"&gt;OPSI indexes&lt;/a&gt;, which are meant to &lt;a class="reference external" href="http://www.pytables.org/usersguide/optimization.html#indexed-searches"&gt;perform fast queries in very large datasets&lt;/a&gt;; OPSI can still be used straight from pandas.&lt;/p&gt;
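The actual OPSI design is considerably more elaborate (partially sorted, chunked and out-of-core), but its kernel idea — keep a sorted view of a column so that range queries become binary searches instead of full scans — can be sketched with the standard library alone (the names below are ours, not PyTables API):

```python
import bisect
import random

# Toy "index" over one column: sorted (value, row_number) pairs.
# OPSI keeps such structures chunked on disk; here everything is in memory.
column = [random.randrange(10**6) for _ in range(100_000)]
index = sorted((v, row) for row, v in enumerate(column))
keys = [v for v, _ in index]

def query_range(lo, hi):
    """Return row numbers with lo <= value < hi via two binary searches."""
    left = bisect.bisect_left(keys, lo)
    right = bisect.bisect_left(keys, hi)
    return [row for _, row in index[left:right]]

rows = query_range(1000, 2000)
# Every hit satisfies the predicate, without scanning the full column.
assert all(1000 <= column[r] < 2000 for r in rows)
```

The payoff is the same as with OPSI: two O(log n) probes replace an O(n) scan, at the cost of maintaining the sorted view.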
&lt;/section&gt;
&lt;section id="crew-renewal"&gt;
&lt;h2&gt;Crew renewal&lt;/h2&gt;
&lt;p&gt;After Cárabos' closure, I (Francesc Alted) continued to maintain PyTables for a while, but in 2010 I expressed my desire to hand over the project, and shortly after, a new gang of people, including Anthony Scopatz and Antonio Valentino, with Andrea Bedini joining shortly after, stepped up and took on the challenge.  This is where open source is strong: whenever a project faces difficulties, there are always people eager to jump on the wagon and keep providing traction for it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="attempt-to-merge-with-h5py"&gt;
&lt;h2&gt;Attempt to merge with h5py&lt;/h2&gt;
&lt;p&gt;Meanwhile, the &lt;a class="reference external" href="http://www.h5py.org"&gt;h5py package&lt;/a&gt; was seeing great adoption, especially from the community that valued multidimensional arrays more than the tabular side of things.  There was a feeling that we were duplicating efforts, and by 2016 Andrea Bedini, with the help of Anthony Scopatz, organized a &lt;a class="reference external" href="https://curtinic.github.io/python-and-hdf5-hackfest/"&gt;HackFest in Perth, Australia&lt;/a&gt;, where developers of h5py and PyTables gathered to attempt a merge of the two projects.  After the initial work there, we continued this effort with a grant from NumFOCUS.&lt;/p&gt;
&lt;p&gt;Unfortunately, the effort proved to be rather complex, and we could not finish it properly (for the sake of curiosity, the attempt &lt;a class="reference external" href="https://github.com/PyTables/PyTables/pull/634"&gt;is still available&lt;/a&gt;).  At any rate, we actively encourage people to use both packages depending on their needs; see, for example, the &lt;a class="reference external" href="https://github.com/tomkooij/scipy2017"&gt;tutorial on h5py/PyTables&lt;/a&gt; that Tom Kooij taught at SciPy 2017.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="satellite-projects-blosc-and-numexpr"&gt;
&lt;h2&gt;Satellite Projects: Blosc and numexpr&lt;/h2&gt;
&lt;p&gt;Like many other open source libraries, PyTables stands on the shoulders of giants, and makes use of amazing libraries like HDF5 or NumPy for doing its magic.  In addition, to allow PyTables to push against the hardware I/O and computational limits, it leverages two high-performance packages: &lt;a class="reference external" href="https://www.blosc.org"&gt;Blosc&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/pydata/numexpr"&gt;numexpr&lt;/a&gt;.  Blosc is in charge of compressing data efficiently and at very high speeds to overcome the limits imposed by the I/O subsystem, while numexpr squeezes maximum performance out of the CPU when computing queries on large tables.  Both projects have been substantially improved by the PyTables crew, and they are actually quite popular on their own.&lt;/p&gt;
&lt;p&gt;Specifically, the Blosc compressor, although born out of the needs of PyTables, spun off as a standalone compressor (or meta-compressor, as it can use several codecs internally) meant to &lt;a class="reference external" href="https://www.blosc.org/pages/blosc-in-depth/"&gt;accelerate not just disk I/O, but also memory access in general&lt;/a&gt;.  In an unexpected twist, &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2"&gt;Blosc2&lt;/a&gt; has developed its own &lt;a class="reference external" href="https://www.blosc.org/posts/blosc2-ndim-intro/"&gt;multi-level data partitioning system&lt;/a&gt;, which goes beyond the single-level partitions in HDF5, and is &lt;a class="reference external" href="https://www.blosc.org/posts/blosc2-pytables-perf/"&gt;currently helping PyTables&lt;/a&gt; to reach new performance heights. By teaming up with the HDF5 library (and hence PyTables), Blosc2 is allowing PyTables to &lt;a class="reference external" href="https://www.blosc.org/posts/100-trillion-baby/"&gt;query 100 trillion rows in human timeframes&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="thank-you"&gt;
&lt;h2&gt;Thank you!&lt;/h2&gt;
&lt;p&gt;It has been a long journey since PyTables started 20 years ago.  We are happy to have helped provide a useful framework for the data storage and querying needs of many people along the way.&lt;/p&gt;
&lt;p&gt;Many thanks to all maintainers and contributors (whether with code or donations) to the project; they are too numerous to mention all here, but if you are reading this and are among them, you should be proud to have contributed to PyTables. In hindsight, the road was certainly bumpy, but it worked out and many difficulties were overcome; such is the magic and grace of Open Source!&lt;/p&gt;
&lt;/section&gt;</description><category>pytables 20years</category><guid>https://blosc.org/posts/pytables-20years/</guid><pubDate>Sat, 31 Dec 2022 12:32:20 GMT</pubDate></item><item><title>C-Blosc2 Ready for General Review</title><link>https://blosc.org/posts/blosc2-ready-general-review/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;On behalf of the Blosc team, we are happy to announce the &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/releases/tag/v2.0.0.rc1"&gt;first C-Blosc2
release (Release Candidate 1)&lt;/a&gt;
that is meant to be reviewed by users.  As of now we are declaring both the API
and the format frozen, and we are seeking feedback from the community so as to
better check the library and declare it fit for use in production.&lt;/p&gt;
&lt;section id="some-history"&gt;
&lt;h2&gt;Some history&lt;/h2&gt;
&lt;p&gt;The next generation Blosc (aka Blosc2) started back in 2015 as a way
to overcome some limitations of the Blosc compressor, mainly the 2 GB limit
on the size of the data to be compressed.  But it turned out that I wanted
to make things a bit more complete, and provide native serialization too.
During that process Google awarded my contributions to Blosc with the
&lt;a class="reference external" href="https://www.blosc.org/posts/prize-push-Blosc2/"&gt;Open Source Peer Bonus Program&lt;/a&gt; in 2017.
This award was a big emotional push for me to persist in the efforts
towards producing a stable release.&lt;/p&gt;
&lt;p&gt;Back in 2018, Zeeman Wang from Huawei invited me to their central headquarters in Shenzhen to meet
a series of developers who were trying to use compression in various scenarios.
During two weeks we had a series of productive meetings, and I became aware of the many
possibilities that compression is opening up in industry: from making phones with
limited hardware work faster to accelerating computations on high-end computers.
That was also a great opportunity for me to get to know a millennia-old culture; I was
genuinely interested to see how people live, eat and socialize in China.&lt;/p&gt;
&lt;p&gt;In 2020, &lt;a class="reference external" href="https://www.blosc.org/posts/blosc-donation/"&gt;Huawei graciously offered a grant to the Blosc project&lt;/a&gt; to complete it.  Since then,
we have received donations from several other sources (NumFOCUS, the Python Software Foundation
and ESRF among them).  Lately, &lt;a class="reference external" href="https://ironarray.io"&gt;ironArray&lt;/a&gt; has been sponsoring
two of us (Aleix Alcacer and myself) to work part-time on Blosc related projects.&lt;/p&gt;
&lt;p&gt;Thanks to all this support, the Blosc development team has been able to grow quite a lot (we are currently 5 people in the core team), and we
have been able to work hard at producing a series of improvements in different projects under the Blosc umbrella, in particular &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2"&gt;C-Blosc2&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/Blosc/python-blosc2"&gt;Python-Blosc2&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/Blosc/caterva"&gt;Caterva&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/Blosc/cat4py"&gt;cat4py&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you can see, there is a lot of development going on around C-Blosc2 beyond C-Blosc2 itself.  In this installment I am going to focus just on the main features that C-Blosc2 brings, but hopefully all the other projects in the ecosystem will complement its functionality.  When all these projects are ready, we hope that users will be able to store big amounts of data in a way that is efficient, easy to use and, most importantly, adapted to their needs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-features-of-c-blosc2"&gt;
&lt;h2&gt;New features of C-Blosc2&lt;/h2&gt;
&lt;p&gt;Here is the list of the main features that we are releasing today:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;64-bit containers:&lt;/strong&gt; the first-class container in C-Blosc2 is the &lt;cite&gt;super-chunk&lt;/cite&gt; or, for brevity, &lt;cite&gt;schunk&lt;/cite&gt;, which is made of smaller chunks that are essentially C-Blosc1 32-bit containers.  The super-chunk can optionally be backed by another container called a &lt;cite&gt;frame&lt;/cite&gt; (see later).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More filters:&lt;/strong&gt; besides &lt;cite&gt;shuffle&lt;/cite&gt; and &lt;cite&gt;bitshuffle&lt;/cite&gt; already present in C-Blosc1, C-Blosc2 already implements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;delta&lt;/cite&gt;: the stored blocks inside a chunk are diff'ed with respect to the first block in the chunk.  The idea is that, in some situations, the diff will have more zeros than the original data, leading to better compression.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;trunc_prec&lt;/cite&gt;: it zeroes the least significant bits of the mantissa of float32 and float64 types.  When combined with the &lt;cite&gt;shuffle&lt;/cite&gt; or &lt;cite&gt;bitshuffle&lt;/cite&gt; filter, this leads to more contiguous zeros, which are compressed better.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A filter pipeline:&lt;/strong&gt; the different filters can be pipelined so that the output of one becomes the input of the next.  A possible example is a &lt;cite&gt;delta&lt;/cite&gt; followed by &lt;cite&gt;shuffle&lt;/cite&gt;, or, as described above, &lt;cite&gt;trunc_prec&lt;/cite&gt; followed by &lt;cite&gt;bitshuffle&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prefilters:&lt;/strong&gt; allow applying user-defined C callbacks &lt;strong&gt;prior to&lt;/strong&gt; the filter pipeline during compression.  See &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/blob/master/tests/test_prefilter.c"&gt;test_prefilter.c&lt;/a&gt; for an example of use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Postfilters:&lt;/strong&gt; allow applying user-defined C callbacks &lt;strong&gt;after&lt;/strong&gt; the filter pipeline during decompression. The combination of prefilters and postfilters could be interesting for supporting e.g. encryption (via prefilters) and decryption (via postfilters).  Also, a postfilter alone can be used to produce on-the-fly computations based on existing data (or other metadata, like e.g. coordinates). See &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/blob/master/tests/test_postfilter.c"&gt;test_postfilter.c&lt;/a&gt; for an example of use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SIMD support for ARM (NEON):&lt;/strong&gt; this allows for faster operation on ARM architectures.  Only &lt;cite&gt;shuffle&lt;/cite&gt; is supported right now, but the idea is to implement &lt;cite&gt;bitshuffle&lt;/cite&gt; for NEON too.  Thanks to Lucian Marc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SIMD support for PowerPC (ALTIVEC):&lt;/strong&gt; this allows for faster operation on PowerPC architectures.  Both &lt;cite&gt;shuffle&lt;/cite&gt;  and &lt;cite&gt;bitshuffle&lt;/cite&gt; are supported; however, this has been done via a transparent mapping from SSE2 into ALTIVEC emulation in GCC 8, so performance could be better (but still, it is already a nice improvement over native C code; see PR &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/pull/59"&gt;https://github.com/Blosc/c-blosc2/pull/59&lt;/a&gt; for details).  Thanks to Jerome Kieffer and &lt;a class="reference external" href="https://www.esrf.fr"&gt;ESRF&lt;/a&gt; for sponsoring the Blosc team in helping him in this task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dictionaries:&lt;/strong&gt; when a block is going to be compressed, C-Blosc2 can use a previously made dictionary (stored in the header of the super-chunk) for compressing all the blocks that are part of the chunks.  This usually improves the compression ratio, as well as the decompression speed, at the expense of a (small) overhead in compression speed.  Currently it is only supported in the &lt;cite&gt;zstd&lt;/cite&gt; codec, but it would be nice to extend it to &lt;cite&gt;lz4&lt;/cite&gt; and &lt;cite&gt;blosclz&lt;/cite&gt; at least.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Contiguous frames:&lt;/strong&gt; allow storing super-chunks contiguously, either on-disk or in-memory.  When a super-chunk is backed by a frame, instead of storing all the chunks sparsely in-memory, they are serialized inside the frame container.  The frame can be stored on-disk too, meaning that persistence of super-chunks is supported.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sparse frames (on-disk):&lt;/strong&gt; each chunk in a super-chunk is stored in a separate file, as is the metadata.  This is the counterpart of the in-memory super-chunk, and allows for more efficient updates than contiguous frames (i.e. avoiding 'holes' in monolithic files).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partial chunk reads:&lt;/strong&gt; there is support for reading just part of a chunk, avoiding reading the whole chunk and then discarding the unnecessary data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallel chunk reads:&lt;/strong&gt; when several blocks of a chunk are to be read, this is done in parallel by the decompressing machinery.  That means that every thread is responsible for reading, post-filtering and decompressing a block by itself, leading to an efficient overlap of I/O and CPU usage that optimizes reads to the maximum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Meta-layers:&lt;/strong&gt; optionally, the user can add meta-data for different uses and in different layers.  For example, one may think of providing a meta-layer for &lt;a class="reference external" href="http://www.numpy.org"&gt;NumPy&lt;/a&gt; so that most of its meta-data is stored in a meta-layer; then, one can place another meta-layer on top of the latter for adding more high-level info if desired (e.g. geo-spatial, meteorological...).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Variable-length meta-layers:&lt;/strong&gt; the user may want to add variable-length meta information that can be potentially very large (up to 2 GB). The regular meta-layer described above is very quick to read, but is meant to store fixed-length and relatively small meta information.  Variable-length meta-layers are stored in the trailer of a frame, whereas regular meta-layers are in the header.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficient support for special values:&lt;/strong&gt; large sequences of repeated values can be represented with an efficient, simple and fast run-length representation, without the need to use regular codecs.  With that, chunks or super-chunks with values that are the same (zeros, NaNs or any value in general) can be built in constant time, regardless of the size.  This can be useful in situations where a lot of zeros (or NaNs) need to be stored (e.g. sparse matrices).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nice markup for documentation:&lt;/strong&gt; we are currently using a combination of Sphinx + Doxygen + Breathe for documenting the C-API.  See &lt;a class="reference external" href="https://c-blosc2.readthedocs.io"&gt;https://c-blosc2.readthedocs.io&lt;/a&gt;.  Thanks to Alberto Sabater and Aleix Alcacer for contributing the support for this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plugin capabilities for filters and codecs:&lt;/strong&gt; we have a plugin registration capability in place so that the info about new filters and codecs can be persisted and transmitted to different machines.  Thanks to the NumFOCUS foundation for providing a grant for doing this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pluggable tuning capabilities:&lt;/strong&gt; this will allow users with different needs to define an interface so as to better tune different parameters like the codec, the compression level, the filters to use, the blocksize or the shuffle size.  Thanks to ironArray for sponsoring us in doing this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Support for I/O plugins:&lt;/strong&gt; so that users can extend the I/O capabilities beyond the current filesystem support.  Things like using databases or S3 interfaces should be possible by implementing these interfaces.  Thanks to ironArray for sponsoring us in doing this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python wrapper:&lt;/strong&gt;  we have a preliminary wrapper in the works.  You can have a look at our ongoing efforts in the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2"&gt;python-blosc2 repo&lt;/a&gt;.  Thanks to the Python Software Foundation for providing a grant for doing this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security:&lt;/strong&gt; we are actively using &lt;a class="reference external" href="https://github.com/google/oss-fuzz"&gt;OSS-Fuzz&lt;/a&gt; and &lt;a class="reference external" href="https://oss-fuzz.com"&gt;ClusterFuzz&lt;/a&gt; for uncovering programming errors in C-Blosc2.  Thanks to Google for sponsoring us in doing this.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
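As an aside, the reason filters like shuffle pay off is easy to demonstrate. The sketch below is a pure-Python rendition of byte shuffling (the real C-Blosc2 implementation is SIMD-accelerated C, and zlib merely stands in for the codec): grouping the mostly-zero high bytes of small integers together gives the codec longer runs to work with.

```python
import struct
import zlib

def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item together, then byte 1, and so on."""
    return bytes(data[j] for i in range(itemsize)
                 for j in range(i, len(data), itemsize))

# Small int32 values: the three high bytes of each item are mostly zero,
# but interleaved with the varying low bytes when stored naturally.
values = struct.pack('<1000i', *range(1000))
plain = len(zlib.compress(values))
shuffled = len(zlib.compress(shuffle(values, 4)))
print(plain, shuffled)  # shuffling typically yields the smaller size
```

The same reasoning, taken down to the bit level, is what `bitshuffle` does.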
&lt;p&gt;As you can see, the list is long and hopefully you will find features compelling enough for your own needs.  Blosc2 is not only about speed, but also about providing new functionality for storing and handling compressed data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="tasks-to-be-done"&gt;
&lt;h2&gt;Tasks to be done&lt;/h2&gt;
&lt;p&gt;Even if the list of features above is long, we still have things to do in Blosc2, and the plan is to continue development, always respecting the existing API and format.  Here are some of the things on our TODO list:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Centralized plugin repository:&lt;/strong&gt; we have got a grant from NumFOCUS to implement a centralized repository so that people can send their plugins (using the existing machinery) to the Blosc2 team.  If the plugins fulfill a series of requirements, they will be officially accepted and distributed within the library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Improve the safety of the library:&lt;/strong&gt; although this is always a work in progress, we have come a long way in improving our safety, mainly thanks to the efforts of Nathan Moinvaziri.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Support for lossy compression codecs:&lt;/strong&gt; although we already support the &lt;cite&gt;trunc_prec&lt;/cite&gt; filter, it is only valid for floating point data; we should come up with lossy codecs that work for any data type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Checksums:&lt;/strong&gt; the frame can benefit from having a checksum per chunk/index/meta-layer.  This will provide more safety against frames that are damaged for whatever reason, and would also provide better feedback when trying to determine which parts of a frame are corrupted.  Candidates for the checksum are xxhash32 or xxhash64, depending on the goals (to be decided).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; utterly important for attracting new users and making the life easier for existing ones.  Important points to have in mind here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality of API docstrings:&lt;/strong&gt; is the mission of the functions or data structures clearly and succinctly explained?  Are all the parameters explained?  Is the return value explained?  What are the possible errors that can be returned?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tutorials/book:&lt;/strong&gt; besides the API docstrings, more documentation materials should be provided, like tutorials or a book about Blosc (or at least, the beginnings of it).  Due to its adoption in GitHub and Jupyter notebooks, one of the most extended and useful markup systems is Markdown, so this should also be the first candidate to use here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lock support for super-chunks:&lt;/strong&gt; when different processes access super-chunks concurrently, make them sync properly by using locks, either on-disk (frame-backed super-chunks) or in-memory. Such lock support would be configured at build time, so it could be disabled with a cmake flag.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If some of these features (or a new one) sound useful to you, it would be nice if you could help us by providing either code or sponsorship.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;It has been a long road since 2015 to get C-Blosc2 so featureful and well tested.
But hopefully the journey will continue, because as &lt;a class="reference external" href="https://www.poetryfoundation.org/poems/51296/ithaka-56d22eef917ec"&gt;Kavafis said&lt;/a&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;As you set out for Ithaka
hope your road is a long one,
full of adventure, full of discovery.&lt;/pre&gt;
&lt;p&gt;Let me thank again all the people and sponsors that we have had during the life of the Blosc project; without them we would not be where we are now.  We do hope that C-Blosc2 will have a long life, and we as a team will put our soul into making that trip last as long as possible.&lt;/p&gt;
&lt;p&gt;Now it is your turn.  We encourage you to start testing the library as much as possible and report back.  With your help we can hopefully get C-Blosc2 to the production stage very soon.  Thanks in advance!&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 release candidate</category><guid>https://blosc.org/posts/blosc2-ready-general-review/</guid><pubDate>Thu, 06 May 2021 10:32:20 GMT</pubDate></item><item><title>Mid 2020 Progress Report</title><link>https://blosc.org/posts/mid-2020-progress-report/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;2020 has been a year in which the Blosc projects have received important donations, totalling $55,000 USD so far.  In this report we list the most important tasks that have been carried out from January 2020 to August 2020.  Most of these tasks are related to the fastest-paced projects under development: C-Blosc2 and Caterva (including its cat4py wrapper).  Having said that, the Blosc development team has been active in other projects too (C-Blosc, python-blosc), although mainly for maintenance purposes.&lt;/p&gt;
&lt;p&gt;Besides, we also list the roadmap for the C-Blosc2, Caterva and cat4py projects that we plan to tackle during the next few months.&lt;/p&gt;
&lt;section id="c-blosc2"&gt;
&lt;h2&gt;C-Blosc2&lt;/h2&gt;
&lt;p&gt;C-Blosc2 adds new data containers, called superchunks, that are essentially a set of compressed chunks in memory that can be accessed randomly and enlarged during their lifetime.  Also, a new frame serialization layer has been added, so that superchunks can be persisted on disk while keeping the same properties as superchunks in memory.  Finally, a metalayer capability allows higher-level containers to be created on top of superchunks/frames.&lt;/p&gt;
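Conceptually, a superchunk is just a growable sequence of independently compressed chunks plus some bookkeeping, which is what makes random access and enlargement cheap. A toy sketch follows (the class and names are ours, not C-Blosc2 API; zlib stands in for Blosc2's codecs, and the dict stands in for metalayers):

```python
import zlib

class ToySuperChunk:
    """A list of independently compressed chunks, enlargeable at any time."""
    def __init__(self):
        self.chunks = []   # compressed chunks
        self.meta = {}     # stand-in for Blosc2 metalayers

    def append(self, data: bytes):
        self.chunks.append(zlib.compress(data))

    def decompress_chunk(self, i: int) -> bytes:
        # Random access: only chunk i is decompressed, never the whole set.
        return zlib.decompress(self.chunks[i])

sc = ToySuperChunk()
sc.meta['dtype'] = 'uint8'        # a higher-level container could live here
for i in range(10):
    sc.append(bytes([i]) * 4096)  # ten 4 KiB chunks
assert sc.decompress_chunk(7) == bytes([7]) * 4096
```

A frame would serialize `self.chunks` plus `self.meta` into one buffer or file; that is the part the new serialization layer adds.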
&lt;section id="highligths"&gt;
&lt;h3&gt;Highlights&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Maskout functionality.  This allows selectively choosing the blocks of a chunk that are going to be decompressed.  This paves the road for faster multidimensional slicing in Caterva (see below in the Caterva section).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prefilters introduced and declared stable.  Prefilters allow the user to pass C functions for performing arbitrary computations on a chunk prior to the filter/codec pipeline.  In addition, the C function can even have access to more chunks than just the one being compressed.  This opens the door to operating with different super-chunks and producing a new one very efficiently. See &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/blob/master/tests/test_prefilter.c"&gt;https://github.com/Blosc/c-blosc2/blob/master/tests/test_prefilter.c&lt;/a&gt; for some examples of use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for PowerPC/Altivec.  We added support for PowerPC SIMD (Altivec/VSX) instructions for faster operation of shuffle and bitshuffle filters.  For details, see &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/pull/98"&gt;https://github.com/Blosc/c-blosc2/pull/98&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improvements in compression ratio for LZ4/BloscLZ.  New processors keep increasing the amount of memory in their caches.  In recent C-Blosc and C-Blosc2 releases we increased the size of the internal blocks so that the LZ4/BloscLZ codecs have better opportunities for finding duplicates and hence increasing their compression ratios.  And thanks to those larger caches, performance has stayed close to the original fast speeds.  For some benchmarks, see &lt;a class="reference external" href="https://blosc.org/posts/beast-release/"&gt;https://blosc.org/posts/beast-release/&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New entropy probing method for BloscLZ.  BloscLZ is a native codec for Blosc whose mission is to compress synthetic data efficiently.  Synthetic data appears in many situations, so having a codec that compresses/decompresses it quickly and with high compression ratios is important.  The new entropy probing method included in the recent BloscLZ 2.3 (introduced in both C-Blosc and C-Blosc2) allows for even better compression ratios on highly compressible data, while giving up early on blocks that are hard to compress at all.  For details, see &lt;a class="reference external" href="https://blosc.org/posts/beast-release/"&gt;https://blosc.org/posts/beast-release/&lt;/a&gt; as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
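&lt;p&gt;To illustrate where a prefilter sits relative to the filter/codec pipeline, here is a conceptual Python sketch (zlib stands in for the codec and the function names are ours for illustration, not the C API):&lt;/p&gt;

```python
import zlib

def byte_shuffle(data: bytes, typesize: int) -> bytes:
    # Blosc-style shuffle filter: group the i-th byte of every item together,
    # which tends to expose long runs of similar bytes to the codec.
    return b''.join(data[i::typesize] for i in range(typesize))

def compress_chunk(chunk: bytes, typesize: int = 4, prefilter=None) -> bytes:
    # A prefilter is an arbitrary user computation applied to the chunk
    # *before* the shuffle filter and the codec run.
    if prefilter is not None:
        chunk = prefilter(chunk)
    return zlib.compress(byte_shuffle(chunk, typesize))
```

&lt;p&gt;In the real library the prefilter is a C callback and may read from other chunks too, which is what enables building a new superchunk out of computations on existing ones.&lt;/p&gt;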
&lt;/section&gt;
&lt;section id="roadmap-for-c-blosc2"&gt;
&lt;h3&gt;Roadmap for C-Blosc2&lt;/h3&gt;
&lt;p&gt;During the next few months, we plan to tackle the following tasks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Postfilters.  In the same way that prefilters allow user-defined computations prior to the compression pipeline, postfilters would allow the same &lt;em&gt;after&lt;/em&gt; the decompression pipeline.  This could be useful for e.g. creating superchunks out of functions taking simple data as input (for example, a [min, max] range of values).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finalize the frame implementation.  Although the frame specification is almost complete (bar small modifications/additions), some features included in the specification are not implemented yet.  One example is the fingerprint support at the end of frames.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk insertion.  Right now only chunk appends are supported.  It should be possible to insert chunks at any position, not only at the end of a superchunk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security.  Although we have already started improving the safety of the package with tools like OSS-Fuzz, this is a perpetual work in progress, and we plan to keep improving it in the future.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wheels.  We would like to deliver wheels on every release soon.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
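&lt;p&gt;As a toy illustration of that [min, max] example (a hypothetical Python sketch, not the planned C API), a postfilter could expand a tiny stored payload into a full array on read:&lt;/p&gt;

```python
import struct

def linspace_postfilter(raw: bytes, nitems: int) -> list:
    # Hypothetical postfilter: the stored chunk holds just (min, max) as two
    # little-endian doubles; after decompression the postfilter expands them
    # into `nitems` evenly spaced values, so almost no data needs storing.
    lo, hi = struct.unpack('<2d', raw)
    step = (hi - lo) / (nitems - 1)
    return [lo + i * step for i in range(nitems)]
```

&lt;p&gt;The point of the design is symmetry: prefilters inject computation before compression, postfilters after decompression, so data can be generated lazily on read.&lt;/p&gt;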
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="caterva-cat4py"&gt;
&lt;h2&gt;Caterva/cat4py&lt;/h2&gt;
&lt;p&gt;Caterva is a multidimensional container on top of C-Blosc2 containers.  It uses the metalayer capabilities present in superchunks/frames in order to store the multidimensionality information necessary to define arrays with up to 8 dimensions and up to 2^63 elements.  Besides being able to create such arrays, Caterva provides functionality to get (multidimensional) slices of the arrays easily and efficiently.  cat4py is the Python wrapper for Caterva.&lt;/p&gt;
&lt;section id="highligths-1"&gt;
&lt;h3&gt;Highlights&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Multidimensional blocks.  Chunks inside superchunk containers are endowed with a multidimensional structure so as to enable efficient slicing.  However, there is often a tension between defining large chunks, so as to reduce the amount of indexing needed to find them, and smaller ones, so as to avoid reading data that falls outside of a slice.  To ease this tension, we endowed the blocks inside chunks with a multidimensional structure too, so that the user has two parameters (chunkshape and blockshape) to play with in order to optimize I/O for their use case.  For an example of the kind of performance enhancements you can expect, see &lt;a class="reference external" href="https://htmlpreview.github.io/?https://github.com/Blosc/cat4py/blob/269270695d7f6e27e6796541709e98e2f67434fd/notebooks/slicing-performance.html"&gt;https://htmlpreview.github.io/?https://github.com/Blosc/cat4py/blob/269270695d7f6e27e6796541709e98e2f67434fd/notebooks/slicing-performance.html&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API refactoring.  Caterva is a relatively young project, and its API grew organically and hence in a rather disorganized manner.  We recognized that and proceeded with a big API refactoring, bringing more coherence to the naming scheme of the functions, as well as providing a minimal set of C structs that allows for a simpler and better API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved documentation.  A nice API is useless if it is not well documented, so we decided to put a significant amount of effort into creating high-quality documentation and examples so that users can quickly figure out how to create and access Caterva containers with their own data.  Although this is still a work in progress, we are pretty happy with how the docs are shaping up.  See &lt;a class="reference external" href="https://caterva.readthedocs.io/"&gt;https://caterva.readthedocs.io/&lt;/a&gt; and &lt;a class="reference external" href="https://cat4py.readthedocs.io/"&gt;https://cat4py.readthedocs.io/&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better Python integration (cat4py).  Python, especially thanks to the NumPy project, is a major player in handling multidimensional datasets, so we have greatly improved the integration of cat4py, our Python wrapper for Caterva, with NumPy.  In particular, we implemented support for the NumPy array protocol in cat4py containers, as well as an improved NumPy-esque API in the cat4py package.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
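&lt;p&gt;The arithmetic behind chunkshape/blockshape tuning is simple to sketch.  The following Python snippet (our own illustration, not the Caterva API) computes which chunks a multidimensional slice touches; the same formula applies one level down with blockshape inside each chunk:&lt;/p&gt;

```python
from itertools import product

def touched_chunks(start, stop, chunkshape):
    # For each dimension, find the range of chunk indices that the
    # half-open slice [start, stop) intersects, then take the cross
    # product.  Larger chunks mean fewer index entries to walk, but
    # more data read outside the slice; multidimensional blocks let
    # you tune both levels independently.
    ranges = [range(lo // size, (hi - 1) // size + 1)
              for lo, hi, size in zip(start, stop, chunkshape)]
    return list(product(*ranges))
```

&lt;p&gt;For example, a 10x10 slice over 8x8 chunks touches four chunks, even though only a quarter of each of three of them is needed; this is exactly the overread that multidimensional blocks reduce.&lt;/p&gt;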
&lt;/section&gt;
&lt;section id="roadmap-for-caterva-cat4py"&gt;
&lt;h3&gt;Roadmap for Caterva / cat4py&lt;/h3&gt;
&lt;p&gt;During the next few months, we plan to tackle the following tasks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Append chunks in any order. This will make it easier for the user to create arrays, since they will not be forced to use a row-wise order.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update array elements. With this, users will be able to update their arrays without having to make a copy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resize array dimensions. This feature will allow Caterva to increase or decrease in size any dimension of the arrays.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wheels.  Once Caterva/cat4py reach the beta stage, we plan to deliver wheels on every release.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="final-thoughts"&gt;
&lt;h2&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;We are very grateful to our sponsors in 2020; they allowed us to implement what we think will be nice features for the whole Blosc ecosystem.  However, although we made a lot of progress towards making C-Blosc2 and Caterva as feature-complete and stable as possible, there is still work to do before both projects are stable enough to be used in production.  Our expectation is to publish a 2.0.0 (final) release of C-Blosc2 by the end of the year, whereas Caterva (and cat4py) should be declared stable during 2021.&lt;/p&gt;
&lt;p&gt;Also, we are happy to have welcomed new members to the Blosc crew: Óscar Griñón, who proved instrumental in implementing the multidimensional blocks in Caterva, and Nathan Moinvaziri, who is making great strides in making C-Blosc and C-Blosc2 more secure.  Thanks guys!&lt;/p&gt;
&lt;p&gt;Hopefully 2021 will also be a good year for seeing the Blosc ecosystem evolve.  If you are interested in what we are building and want to help, we are open to any kind of contribution, including &lt;a class="reference external" href="https://blosc.org/pages/donate/"&gt;donations&lt;/a&gt;.  Thank you for your interest!&lt;/p&gt;
&lt;/section&gt;</description><category>blosc progress report grants</category><guid>https://blosc.org/posts/mid-2020-progress-report/</guid><pubDate>Thu, 27 Aug 2020 12:32:20 GMT</pubDate></item><item><title>C-Blosc Beast Release</title><link>https://blosc.org/posts/beast-release/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;&lt;strong&gt;TL;DR;&lt;/strong&gt; The improvements in new CPUs allow for more cores and (much) larger caches. Latest C-Blosc release leverages these facts so as to allow better compression ratios, while keeping the speed on par with previous releases.&lt;/p&gt;
&lt;p&gt;During the past two months we have been working hard at increasing the efficiency of Blosc for the new processors that are coming with more cores than ever before (8 can be considered quite normal, even for laptops, and 16 is not that unusual for rigs).  Furthermore, their caches are growing beyond limits that seemed unthinkable just a few years ago (for example, AMD is putting 64 MB of L3 in its mid-range Ryzen2 39x0 processors).  This is mainly a consequence of the recent introduction of the 7nm process for both ARM and AMD64 architectures.  It turns out that compression ratios are quite dependent on the sizes of the streams to compress, so with access to more cores and significantly larger caches, it was clear that Blosc was in pressing need of catching up and fine-tuning its performance for these new 'beasts'.&lt;/p&gt;
&lt;p&gt;So, the version released today (&lt;a class="reference external" href="https://github.com/Blosc/c-blosc/releases/tag/v1.20.0"&gt;C-Blosc 1.20.0&lt;/a&gt;) has been carefully fine-tuned to make the most of recent CPUs, especially for fast codecs, where even if speed matters more than compression ratio, the latter is still a very important parameter.  With that in mind, we decided to increase the maximum size of each compressed stream in a block from 64 KB to 256 KB (most CPUs nowadays have this amount of private L2 cache, or even more).  It is also important to guarantee every thread a minimum share of L3 cache so that threads do not have to compete for resources, so a new restriction has been added: no thread has to deal with streams larger than 1 MB (both old and modern CPUs seem to provide at least this amount of L3 per thread).&lt;/p&gt;
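&lt;p&gt;As a rough illustration of this parametrization, the stream-size rules could be expressed like this (a hypothetical Python sketch capturing the constraints described above, not the actual C-Blosc tuning code; the L2 parameter is our own addition for illustration):&lt;/p&gt;

```python
KB, MB = 1024, 1024 * 1024

def pick_stream_size(l2_per_core, l3_total, nthreads):
    # Target up to 256 KB so a stream fits in a modern private L2 cache,
    # but never exceed 1 MB nor this thread's fair share of the shared L3,
    # so that threads do not compete for cache resources.
    size = min(256 * KB, l2_per_core)
    return min(size, 1 * MB, l3_total // nthreads)
```

&lt;p&gt;With the 3900X numbers from this post (512 KB of L2 per core, 64 MB of L3, 12 threads), such a rule would settle on the full 256 KB streams.&lt;/p&gt;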
&lt;p&gt;Below you will find the net effects of this new fine-tuning of fast codecs like LZ4 and BloscLZ on our AMD 3900X box (12 physical cores, 64 MB L3).  Here we will be comparing results from C-Blosc 1.18.1 and C-Blosc 1.20.0 (we will skip the comparison against 1.19.x because it can be considered an intermediate release in this pursuit).  Spoiler: you will see an important boost in compression ratios, while the high speed of the LZ4 and BloscLZ codecs is largely kept.&lt;/p&gt;
&lt;p&gt;On the plots below, the left side shows the performance of the 1.18.1 release, whereas the right side shows the performance of the new 1.20.0 release.&lt;/p&gt;
&lt;section id="effects-in-lz4"&gt;
&lt;h2&gt;Effects in LZ4&lt;/h2&gt;
&lt;p&gt;Let's start by looking at how the new fine tuning affected &lt;em&gt;compression&lt;/em&gt; performance:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-c-before" src="https://blosc.org/images/beast-release/ryzen12-lz4-1.18.1-c.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-c-after" src="https://blosc.org/images/beast-release/ryzen12-lz4-1.20.0-c.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Look at how much the compression ratio has improved.  This is mainly a consequence of using compression streams of up to 256 KB instead of the previous 64 KB.  Incidentally, this is just for this synthetic data, but real data is clearly going to benefit as well; besides, synthetic data appears frequently in data science (e.g. a uniformly spaced array of values).  One can also see that compression speed has not dropped in general, which is great considering that we now allow for much better compression ratios.&lt;/p&gt;
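&lt;p&gt;The effect of stream size on compression ratio is easy to reproduce with stock tools.  The sketch below uses Python's zlib as a stand-in codec on synthetic data whose redundancy only shows up across long distances; the exact numbers have nothing to do with Blosc's, but the trend is the same:&lt;/p&gt;

```python
import random
import zlib

def ratio(data, piece_size):
    # Compress `data` as independent pieces, as a block-based codec would.
    comp = sum(len(zlib.compress(data[i:i + piece_size]))
               for i in range(0, len(data), piece_size))
    return len(data) / comp

# Synthetic data: a 2 KB pseudo-random pattern repeated 128 times (256 KB),
# so the redundancy is only visible to pieces larger than the pattern.
random.seed(42)
pattern = bytes(random.randrange(256) for _ in range(2048))
data = pattern * 128

small = ratio(data, 2048)     # each piece sees the pattern only once
large = ratio(data, 65536)    # each piece sees it 32 times
```

&lt;p&gt;With 2 KB pieces each piece looks like noise and barely compresses, whereas 64 KB pieces expose the repeats and the ratio jumps by an order of magnitude; larger streams give the codec more history to find duplicates in.&lt;/p&gt;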
&lt;p&gt;Regarding decompression we can see a similar pattern:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-d-before" src="https://blosc.org/images/beast-release/ryzen12-lz4-1.18.1-d.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-d-after" src="https://blosc.org/images/beast-release/ryzen12-lz4-1.20.0-d.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So the decompression speed is generally the same, even for data that can be compressed with high compression ratios.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="effects-in-blosclz"&gt;
&lt;h2&gt;Effects in BloscLZ&lt;/h2&gt;
&lt;p&gt;Now it is BloscLZ's turn.  Similarly to LZ4, this codec is also meant for speed, but another reason for its existence is that it usually achieves better compression ratios than LZ4 on synthetic data.  In that sense, BloscLZ complements LZ4 well: the latter can be used for real data, whereas BloscLZ is usually a better bet for highly repetitive synthetic data.  The new C-Blosc ships BloscLZ 2.3.0, which brings a brand new entropy detector that disables compression early when entropy is high, allowing CPU cycles to be spent selectively where there are more low-hanging data compression opportunities.&lt;/p&gt;
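&lt;p&gt;To give a flavor of how an entropy probe can work (a conceptual Python sketch of the general technique, not BloscLZ's actual detector), one can estimate the byte entropy of a small sample and bail out early when the block looks near-random:&lt;/p&gt;

```python
import math
import random
from collections import Counter

def byte_entropy(sample: bytes) -> float:
    # Shannon entropy of the byte histogram, in bits per byte (max 8.0).
    n = len(sample)
    return -sum(c / n * math.log2(c / n) for c in Counter(sample).values())

def worth_compressing(block, probe_size=1024, threshold=7.5):
    # Probe only a small sample: if it already looks near-random, give up
    # early and spend the CPU cycles on more compressible blocks instead.
    return byte_entropy(block[:probe_size]) < threshold
```

&lt;p&gt;A histogram probe like this is deliberately cheap: it cannot see sequential patterns, but it reliably flags noise-like blocks on which any LZ codec would waste cycles for nothing.&lt;/p&gt;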
&lt;p&gt;Here is how performance changes for &lt;em&gt;compression&lt;/em&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="blosclz-c-before" src="https://blosc.org/images/beast-release/ryzen12-blosclz-1.18.1-c.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="blosclz-c-after" src="https://blosc.org/images/beast-release/ryzen12-blosclz-1.20.0-c.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this case the compression ratio has improved a lot too, and even if compression speed suffers a bit at low compression levels, it is still on par with the original speed at higher compression levels (compressing at more than 30 GB/s while reaching large compression ratios is a big achievement indeed).&lt;/p&gt;
&lt;p&gt;Regarding decompression we have this:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="blosclz-d-before" src="https://blosc.org/images/beast-release/ryzen12-blosclz-1.18.1-d.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="blosclz-d-after" src="https://blosc.org/images/beast-release/ryzen12-blosclz-1.20.0-d.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As usual for the new release, the decompression speed is generally the same, and performance can still exceed 80 GB/s across the whole range of compression levels.  Also noticeable is the fact that single-thread speed is pretty competitive with a regular &lt;cite&gt;memcpy()&lt;/cite&gt;.  Again, the Ryzen2 architecture is showing its muscle here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="final-thoughts"&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Due to technological reasons, CPUs are evolving towards more cores and larger caches.  Hence compressors, and especially Blosc, have to adapt to the new status quo.  With the new parametrization and the new algorithms (early entropy detection) introduced today, we can achieve much better results.  In the new Blosc you can expect a good bump in compression ratios with fast codecs (LZ4, BloscLZ) while keeping speed as good as always.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="appendix-hardware-and-software-used"&gt;
&lt;h2&gt;Appendix: Hardware and Software Used&lt;/h2&gt;
&lt;p&gt;For reference, here is the hardware and software used for this blog entry:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hardware&lt;/strong&gt;: AMD Ryzen2 3900X, 12 physical cores, 64 MB L3, 32 GB RAM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OS&lt;/strong&gt;: Ubuntu 20.04&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compiler&lt;/strong&gt;: Clang 10.0.0&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;C-Blosc&lt;/strong&gt;: 1.18.1 (2020-03-29) and 1.20.0 (2020-07-25)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Enjoy Data!&lt;/strong&gt;&lt;/p&gt;
&lt;/section&gt;</description><category>blosc performance tuning</category><guid>https://blosc.org/posts/beast-release/</guid><pubDate>Sat, 25 Jul 2020 14:32:20 GMT</pubDate></item></channel></rss>