<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about blosc2 beta)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/blosc2-beta.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 04 Mar 2026 11:43:34 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>C-Blosc2 Enters Beta Stage</title><link>https://blosc.org/posts/blosc2-first-beta/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;The first beta version of C-Blosc2 has been released today.  C-Blosc2 is the new iteration of the C-Blosc 1.x series, adding more features and better documentation; it is the outcome of more than 4 years of slow but steady development.  This blog entry describes the main features that you will see in the next generation of C-Blosc, and gives an overview of what is on our roadmap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note 1&lt;/strong&gt;: C-Blosc2 is currently in beta stage, so it is not ready to be used in production yet.  That said, being in beta means that the API has been declared frozen, so there is a guarantee that your programs will continue to work with future versions of the library.  If you want to collaborate in this development, you are welcome: have a look at our roadmap below and contribute PRs, or just go to the &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/issues"&gt;open issues&lt;/a&gt; and help us with them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note 2&lt;/strong&gt;: the term &lt;cite&gt;C-Blosc1&lt;/cite&gt; will be used instead of the official &lt;cite&gt;C-Blosc&lt;/cite&gt; name for referring to the 1.x series of the library.  This is to make the distinction between the C-Blosc 2.x series and C-Blosc 1.x series more explicit.&lt;/p&gt;
&lt;section id="main-features-in-c-blosc2"&gt;
&lt;h2&gt;Main features in C-Blosc2&lt;/h2&gt;
&lt;section id="new-64-bit-containers"&gt;
&lt;h3&gt;New 64-bit containers&lt;/h3&gt;
&lt;p&gt;The main container in C-Blosc2 is the &lt;cite&gt;super-chunk&lt;/cite&gt; or, for brevity, &lt;cite&gt;schunk&lt;/cite&gt;, which is made up of smaller containers that are essentially C-Blosc1 32-bit containers.  The &lt;cite&gt;super-chunk&lt;/cite&gt; can be backed (or not) by another container which is called a &lt;cite&gt;frame&lt;/cite&gt;.  If a &lt;cite&gt;schunk&lt;/cite&gt; is not backed by a &lt;cite&gt;frame&lt;/cite&gt; (the default), the different chunks will be stored sparsely in memory.&lt;/p&gt;
&lt;p&gt;The &lt;cite&gt;frame&lt;/cite&gt; object allows super-chunks to be stored contiguously, either on disk or in memory.  When a super-chunk is backed by a frame, instead of storing all the chunks sparsely in memory, they are serialized inside the frame container.  The frame can be stored on disk too, meaning that persistence of super-chunks is supported and that data can be accessed using the same API regardless of where it is stored, memory or disk.&lt;/p&gt;
&lt;p&gt;Finally, the user can add metadata to frames for different uses and in different layers.  For example, one may think of providing a meta-layer for &lt;a class="reference external" href="http://www.numpy.org"&gt;NumPy&lt;/a&gt; so that most of its metadata is stored there; then, one can place another meta-layer on top of it to add more high-level info (e.g. geo-spatial, meteorological...), if desired.&lt;/p&gt;
&lt;p&gt;When taken together, these features represent a pretty powerful way to store and retrieve compressed data that goes well beyond the contiguous, 32-bit limited compressed buffer of C-Blosc1.&lt;/p&gt;
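&lt;p&gt;To make the difference between a sparse super-chunk and a frame-backed one more tangible, here is a minimal Python sketch.  It is not the C-Blosc2 API and it does not follow the real frame format: the helper names (&lt;cite&gt;make_chunks&lt;/cite&gt;, &lt;cite&gt;to_frame&lt;/cite&gt;, &lt;cite&gt;from_frame&lt;/cite&gt;), the use of zlib as the codec, and the toy layout (a 64-bit chunk count, 64-bit chunk lengths, then the chunk bodies) are all assumptions made for illustration.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
import struct
import zlib


def make_chunks(data, chunksize):
    """Split data into chunks and compress each one independently."""
    return [zlib.compress(data[i:i + chunksize])
            for i in range(0, len(data), chunksize)]


def to_frame(chunks):
    """Serialize chunks contiguously: chunk count, chunk lengths, chunk bodies."""
    header = struct.pack("!Q", len(chunks))
    index = b"".join(struct.pack("!Q", len(c)) for c in chunks)
    return header + index + b"".join(chunks)


def from_frame(frame):
    """Rebuild the list of compressed chunks from a serialized frame."""
    (nchunks,) = struct.unpack_from("!Q", frame, 0)
    lengths = struct.unpack_from("!%dQ" % nchunks, frame, 8)
    chunks, pos = [], 8 + 8 * nchunks
    for length in lengths:
        chunks.append(frame[pos:pos + length])
        pos += length
    return chunks


data = bytes(range(256)) * 64
sparse = make_chunks(data, 4096)   # separate objects scattered in memory
frame = to_frame(sparse)           # one contiguous buffer, could go to disk
restored = b"".join(zlib.decompress(c) for c in from_frame(frame))
print(len(sparse), len(frame), restored == data)
```
&lt;/pre&gt;
&lt;p&gt;In this toy model the same compressed chunks come back from the list and from the contiguous buffer; only the storage layout differs, which is the property that lets one API serve both cases.&lt;/p&gt;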
&lt;/section&gt;
&lt;section id="new-filters-and-filters-pipeline"&gt;
&lt;h3&gt;New filters and filters pipeline&lt;/h3&gt;
&lt;p&gt;Besides &lt;cite&gt;shuffle&lt;/cite&gt; and &lt;cite&gt;bitshuffle&lt;/cite&gt;, already present in C-Blosc1, C-Blosc2 implements:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;delta&lt;/cite&gt;: the stored blocks inside a chunk are diff'ed with respect to the first block in the chunk.  The basic idea here is that, in some situations, the diff will have more zeros than the original data, leading to better compression.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;trunc_prec&lt;/cite&gt;: it zeroes the least significant bits of the mantissa of float32 and float64 types.  When combined with the &lt;cite&gt;shuffle&lt;/cite&gt; or &lt;cite&gt;bitshuffle&lt;/cite&gt; filter, this leads to more contiguous zeros, which are compressed better and faster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
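&lt;p&gt;The idea behind the &lt;cite&gt;delta&lt;/cite&gt; filter can be sketched in a few lines of Python.  This is only an illustration, not the C-Blosc2 implementation: the function names are hypothetical, and using a byte-wise XOR as the diff operation is an assumption of this sketch.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
def delta_encode(chunk, blocksize):
    """Keep the first block verbatim and XOR every later block against it."""
    ref = chunk[:blocksize]
    out = bytearray(ref)
    for start in range(blocksize, len(chunk), blocksize):
        block = chunk[start:start + blocksize]
        out += bytes(b ^ r for b, r in zip(block, ref))
    return bytes(out)


def delta_decode(encoded, blocksize):
    """XOR is its own inverse, so decoding reapplies the same operation."""
    return delta_encode(encoded, blocksize)


blocksize = 1024
ref = bytes((i * 37 + 11) % 251 for i in range(blocksize))
blocks = [ref]
for k in range(1, 8):
    block = bytearray(ref)
    block[k] ^= 0xFF   # each later block differs from the first in one byte
    blocks.append(bytes(block))
chunk = b"".join(blocks)

encoded = delta_encode(chunk, blocksize)
decoded = delta_decode(encoded, blocksize)
print(chunk.count(0), encoded.count(0), decoded == chunk)
```
&lt;/pre&gt;
&lt;p&gt;When the blocks of a chunk resemble the first one, as in this synthetic data, almost everything after the first block turns into zeros, which is what a codec can then exploit.&lt;/p&gt;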
&lt;p&gt;Also, a new filter pipeline has been implemented.  With it, the different filters can be pipelined so that the output of one filter becomes the input for the next; this happens at the block level, thus minimizing the size of temporary buffers and hence accelerating the process.  Possible examples of pipelines are a &lt;cite&gt;delta&lt;/cite&gt; filter followed by &lt;cite&gt;shuffle&lt;/cite&gt;, or a &lt;cite&gt;trunc_prec&lt;/cite&gt; followed by &lt;cite&gt;bitshuffle&lt;/cite&gt;.  Up to 6 filters can be pipelined, so there is plenty of room for upcoming new filters to work together.&lt;/p&gt;
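&lt;p&gt;As a rough illustration of a block-level pipeline, the following Python sketch chains a precision-truncation step with a byte shuffle on float32 data.  None of this is the C-Blosc2 API: the function names, the &lt;cite&gt;keep_bits&lt;/cite&gt; parameter, the block size and the use of zlib as the final codec are assumptions chosen for the example.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
import struct
import zlib


def trunc_prec(block, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of each float32 in the block."""
    drop = 2 ** (23 - keep_bits)
    count = len(block) // 4
    words = struct.unpack("=%dI" % count, block)
    return struct.pack("=%dI" % count, *(w - w % drop for w in words))


def shuffle(block, typesize):
    """Group the first bytes of all items together, then the second bytes, etc."""
    return b"".join(block[i::typesize] for i in range(typesize))


def run_pipeline(chunk, blocksize, filters):
    """Run the filters in order on one block at a time; each output feeds the next."""
    out = bytearray()
    for start in range(0, len(chunk), blocksize):
        block = chunk[start:start + blocksize]
        for func in filters:
            block = func(block)
        out += block
    return bytes(out)


values = [1.0 + i / 1000.0 for i in range(4096)]
chunk = struct.pack("=%df" % len(values), *values)
filters = [lambda b: trunc_prec(b, 10), lambda b: shuffle(b, 4)]
filtered = run_pipeline(chunk, 4096, filters)
print(len(zlib.compress(chunk)), len(zlib.compress(filtered)))
```
&lt;/pre&gt;
&lt;p&gt;Because each block goes through every filter before the next block is touched, the only temporary storage needed is one block, not a full copy of the chunk per filter.&lt;/p&gt;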
&lt;/section&gt;
&lt;section id="more-simd-support-for-arm-and-powerpc"&gt;
&lt;h3&gt;More SIMD support for ARM and PowerPC&lt;/h3&gt;
&lt;p&gt;There is new SIMD support for ARM (NEON), allowing for faster operation on ARM architectures.  Only &lt;cite&gt;shuffle&lt;/cite&gt; is supported right now, but the idea is to implement &lt;cite&gt;bitshuffle&lt;/cite&gt; for NEON too.&lt;/p&gt;
&lt;p&gt;Also, SIMD support for PowerPC (ALTIVEC) is here, and both &lt;cite&gt;shuffle&lt;/cite&gt;  and &lt;cite&gt;bitshuffle&lt;/cite&gt; are supported.  However, this has been done via a transparent mapping from SSE2 into ALTIVEC emulation in GCC 8, so performance could be better (but still, it is already a nice improvement over native C code; see PR &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/pull/59"&gt;https://github.com/Blosc/c-blosc2/pull/59&lt;/a&gt; for details).  Thanks to Jerome Kieffer.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-codecs"&gt;
&lt;h3&gt;New codecs&lt;/h3&gt;
&lt;p&gt;There is a new &lt;a class="reference external" href="https://github.com/inikep/lizard"&gt;Lizard codec&lt;/a&gt;, which is an efficient compressor with very fast decompression.  It achieves a compression ratio comparable to &lt;cite&gt;zip/zlib&lt;/cite&gt; and &lt;cite&gt;zstd/brotli&lt;/cite&gt; (at low and medium compression levels) while attaining decompression speeds of 1 GB/s or more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-dictionary-support-for-better-compression-ratio"&gt;
&lt;h3&gt;New dictionary support for better compression ratio&lt;/h3&gt;
&lt;p&gt;Dictionaries allow for better discovery of data duplicates among different blocks: when a block is going to be compressed, C-Blosc2 can use a previously made dictionary (stored in the header of the super-chunk) for compressing all the blocks that are part of the chunks.  This usually improves the compression ratio, as well as the decompression speed, at the expense of a (small) overhead in compression speed.  Currently, this is only supported in the &lt;cite&gt;zstd&lt;/cite&gt; codec, but it would be nice to extend it to &lt;cite&gt;lz4&lt;/cite&gt; and &lt;cite&gt;blosclz&lt;/cite&gt; at least.&lt;/p&gt;
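&lt;p&gt;The effect of a shared dictionary on small blocks can be tried out with Python's standard library.  Note that this sketch swaps in zlib's preset-dictionary feature instead of the zstd dictionaries that C-Blosc2 uses, and that the hand-written dictionary, the sample records and the helper names are all made up for the example.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
import zlib


def compress_block(block, zdict=None):
    """Compress one block, optionally priming the codec with a shared dictionary."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return comp.compress(block) + comp.flush()


def decompress_block(payload, zdict):
    """Decompress a block that was compressed with the same shared dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(payload) + decomp.flush()


# Content that the blocks have in common; stored once, outside the blocks.
shared = b'{"station": "", "temperature": , "humidity": , "pressure": }'
blocks = [
    b'{"station": "north", "temperature": 21.5, "humidity": 40, "pressure": 1013}',
    b'{"station": "south", "temperature": 24.1, "humidity": 35, "pressure": 1009}',
]

plain = [compress_block(b) for b in blocks]
with_dict = [compress_block(b, shared) for b in blocks]
restored = [decompress_block(p, shared) for p in with_dict]
print([len(p) for p in plain], [len(p) for p in with_dict])
```
&lt;/pre&gt;
&lt;p&gt;Each block is too small to contain much redundancy on its own, so the gain comes from matches against the dictionary, which has to be available again at decompression time.&lt;/p&gt;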
&lt;/section&gt;
&lt;section id="much-improved-documentation-mark-up"&gt;
&lt;h3&gt;Much improved documentation mark-up&lt;/h3&gt;
&lt;p&gt;We are currently using a combination of Sphinx + Doxygen + Breathe for documenting the &lt;a class="reference external" href="https://blosc-doc.readthedocs.io"&gt;C API for C-Blosc2&lt;/a&gt;.  This is a huge step forward compared with the documentation of C-Blosc1, where the developer needed to go to the &lt;a class="reference external" href="https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h"&gt;blosc.h&lt;/a&gt; header to read the docstrings there.  Thanks to Alberto Sabater for contributing the support for this.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="support-for-intel-ipp-integrated-performance-primitives"&gt;
&lt;h3&gt;Support for Intel IPP (Integrated Performance Primitives)&lt;/h3&gt;
&lt;p&gt;Intel is producing a series of optimizations in their &lt;a class="reference external" href="https://software.intel.com/en-us/ipp"&gt;IPP library&lt;/a&gt;, among them an &lt;a class="reference external" href="https://software.intel.com/en-us/ipp-dev-reference-lz4-compression-functions"&gt;accelerated version of the LZ4 codec&lt;/a&gt;.  Due to its excellent compression capabilities and speed, LZ4 is probably the most used codec in Blosc, so enabling even a bit more optimization of LZ4 is always good news.  And judging by the plots below, the Intel guys seem to have done an excellent job:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-no-ipp" src="https://blosc.org/images/blosc2-first-beta/Blosc2-4MB-LZ4-NO-IPP-Shuffle.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-ipp" src="https://blosc.org/images/blosc2-first-beta/Blosc2-4MB-LZ4-IPP-Shuffle.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the plots above we see a couple of things: 1) the IPP/LZ4 functions can compress &lt;em&gt;more&lt;/em&gt; than regular LZ4, and 2) they are quite a bit &lt;em&gt;faster&lt;/em&gt; than regular LZ4.  As always, take these plots with a grain of salt, as actual datasets will likely show smaller differences in compression ratio and speed (but still, the difference can be significant).  Of course, IPP/LZ4 should generate LZ4 chunks that are completely compatible with the original LZ4 library (but in case you detect any incompatibility, please shout!).&lt;/p&gt;
&lt;p&gt;C-Blosc2 beta.1 comes with support for LZ4/IPP out-of-the-box, that is, if IPP is detected in the system, its optimized LZ4 functions are automatically linked and used with the Blosc2 library.  If, for portability or other reasons, you don't want to create a Blosc2 library that is linked with Intel IPP, you can disable support for it by passing &lt;cite&gt;-DDEACTIVATE_IPP=ON&lt;/cite&gt; to cmake.  In the future, we may well add support for other optimized codecs in IPP too (Zstd would be an excellent candidate).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="roadmap"&gt;
&lt;h2&gt;Roadmap&lt;/h2&gt;
&lt;p&gt;Of course, C-Blosc2 is not done yet, and there are many interesting enhancements that we would like to tackle sooner or later.  Here is a more or less comprehensive list of what is on our roadmap:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Lock support for &lt;cite&gt;super-chunks&lt;/cite&gt;: when different processes access super-chunks concurrently, make them sync properly by using locks, either on disk (frame-backed super-chunks) or in memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Checksums: the frame can benefit from having a checksum for every chunk/index/metalayer.  This will provide more safety against frames that are damaged for whatever reason.  Also, this would provide better feedback when trying to determine the parts of the frame that are corrupted.  Candidates for checksums can be the xxhash32 or xxhash64, depending on the goals (to be decided).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation: utterly important for attracting new users and making life easier for existing ones.  Important points to keep in mind here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Quality of API docstrings: is the mission of the functions or data structures clearly and succinctly explained? Are all the parameters explained?  Is the return value explained?  What are the possible errors that can be returned?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tutorials/book: besides the API docstrings, more documentation materials should be provided, like tutorials or a book about Blosc (or at least, the beginnings of it).  Due to its adoption in GitHub and Jupyter notebooks, one of the most widespread and useful markup systems is Markdown, so this should also be the first candidate to use here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrappers for other languages: Python and Java are the most obvious candidates, but others like R or Julia would be nice to have.  We are still not sure whether these should be produced and maintained by the Blosc development team, or left to interested third parties.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would be nice to use &lt;a class="reference external" href="https://lgtm.com"&gt;LGTM&lt;/a&gt;, a CI-friendly analyzer for security.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adding support for &lt;a class="reference external" href="https://buildkite.com"&gt;buildkite&lt;/a&gt; as another CI would be handy because it allows the use of on-premise machines, potentially speeding up the builds, and also makes it possible to set up pipelines with more complex dependencies and analyzers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
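&lt;p&gt;To give an idea of how the per-chunk checksums mentioned in the roadmap could help to locate damage, here is a small Python sketch.  It uses CRC32 from the standard library as a stand-in for xxhash32/xxhash64, and the &lt;cite&gt;seal&lt;/cite&gt; and &lt;cite&gt;find_corrupted&lt;/cite&gt; helpers are hypothetical names, not part of any Blosc API.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
import zlib


def seal(chunks):
    """Pair every chunk with a CRC32 computed at the time the chunk is stored."""
    return [(chunk, zlib.crc32(chunk)) for chunk in chunks]


def find_corrupted(sealed):
    """Return the indices of chunks whose CRC32 no longer matches the stored one."""
    return [i for i, (chunk, crc) in enumerate(sealed)
            if zlib.crc32(chunk) != crc]


chunks = [zlib.compress(bytes([i]) * 1000) for i in range(4)]
sealed = seal(chunks)

damaged = bytearray(sealed[2][0])
damaged[5] ^= 0x01                           # flip a single bit in chunk 2
sealed[2] = (bytes(damaged), sealed[2][1])   # the stored checksum is unchanged

print(find_corrupted(sealed))                # prints [2]
```
&lt;/pre&gt;
&lt;p&gt;With one checksum per chunk, a damaged frame can report exactly which chunks are affected, so the rest of the data remains usable.&lt;/p&gt;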
&lt;p&gt;The implementation of these features will require the help of people, either by contributing code (see &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/blob/master/DEVELOPING-GUIDE.rst"&gt;our developing guidelines&lt;/a&gt;) or, since &lt;a class="reference external" href="https://numfocus.org/project/blosc"&gt;Blosc is a project sponsored by NumFOCUS&lt;/a&gt;, you may want to &lt;a class="reference external" href="https://numfocus.org/donate-to-blosc"&gt;make a donation to the project&lt;/a&gt;.  If you plan to contribute in any way, thanks so much in the name of the community!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="addendum-special-thanks-to-developers"&gt;
&lt;h2&gt;Addendum: Special thanks to developers&lt;/h2&gt;
&lt;p&gt;C-Blosc2 is the outcome of the work of &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/graphs/contributors"&gt;many developers&lt;/a&gt; who worked not only on C-Blosc2 itself, but also on C-Blosc1, from which C-Blosc2 inherits a lot of features.  I am very grateful to Jack Pappas, who contributed important portability enhancements, especially runtime and cross-platform detection of SSE2/AVX2 (with the help of Julian Taylor), as well as high precision timers (HPET), which are essential for benchmarking purposes.  Lucian Marc also contributed the support for ARM/NEON for the shuffle filter.  Jerome Kieffer contributed support for PowerPC/ALTIVEC.  Thanks also to Alberto Sabater for his great efforts in producing really nice Blosc2 docs, among other things.  And last but not least, to Valentin Haenel for general support, bug fixes and other enhancements through the years.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Enjoy Data!&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/section&gt;</description><category>blosc2 beta</category><guid>https://blosc.org/posts/blosc2-first-beta/</guid><pubDate>Tue, 13 Aug 2019 01:32:20 GMT</pubDate></item></channel></rss>