<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about blosc2 optimization matrix multiplication matmul compression)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/blosc2-optimization-matrix-multiplication-matmul-compression.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 10 Jun 2026 17:44:33 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Optimizing chunks for matrix multiplication in Blosc2</title><link>https://blosc.org/posts/optimizing-chunks-blosc2/</link><dc:creator>Ricardo Sales Piquer</dc:creator><description>&lt;p&gt;As data volumes continue to grow in fields like machine learning and scientific computing,
optimizing fundamental operations like matrix multiplication becomes increasingly critical.
Blosc2's chunk-based approach offers a new path to efficiency in these scenarios.&lt;/p&gt;
&lt;section id="matrix-multiplication"&gt;
&lt;h2&gt;Matrix Multiplication&lt;/h2&gt;
&lt;p&gt;Matrix multiplication is a fundamental operation in many scientific and
engineering applications. With the introduction of matrix multiplication into
Blosc2, users can now perform this operation on compressed arrays efficiently.
The key advantages of having matrix multiplication in Blosc2 include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compressed matrices in memory:&lt;/strong&gt;
Blosc2 keeps matrices in a compressed format while still allowing operations on
them; only the parts needed at each step are decompressed, on the fly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficiency with chunks:&lt;/strong&gt;
In computation-intensive applications, matrix multiplication can be executed
without fully decompressing the data, operating independently on small blocks of data
and saving both time and memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Out-of-core computation:&lt;/strong&gt;
When matrices are too large to fit in main memory, Blosc2 facilitates out-of-core
processing. Data stored on disk is read and processed in optimized chunks,
allowing matrix multiplication operations without loading the entire dataset into
memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These features are especially valuable in big data environments and in scientific
or engineering applications where matrix sizes can be overwhelming, because they
make it possible to run such calculations efficiently within limited memory.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The matrix multiplication functionality is implemented in the &lt;code class="docutils literal"&gt;matmul&lt;/code&gt;
function. It supports Blosc2 &lt;code class="docutils literal"&gt;NDArray&lt;/code&gt; objects and leverages chunked
operations to perform the multiplication efficiently.&lt;/p&gt;
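&lt;p&gt;A minimal usage sketch follows. It assumes python-blosc2 3.x, where the function is
exposed as &lt;code class="docutils literal"&gt;blosc2.matmul&lt;/code&gt;, together with NumPy. The array
shapes and the &lt;code class="docutils literal"&gt;chunks&lt;/code&gt;/&lt;code class="docutils literal"&gt;blocks&lt;/code&gt; values are arbitrary
choices for illustration, not tuned recommendations.&lt;/p&gt;
&lt;pre class="code python literal-block"&gt;
import numpy as np
import blosc2

# Two float32 operands (illustrative shapes only).
rng = np.random.default_rng(0)
a_np = rng.random((1000, 800), dtype=np.float32)
b_np = rng.random((800, 600), dtype=np.float32)

# Wrap them as compressed, chunked NDArray objects.
# The chunk and block shapes are illustrative assumptions.
a = blosc2.asarray(a_np, chunks=(200, 200), blocks=(50, 50))
b = blosc2.asarray(b_np, chunks=(200, 200), blocks=(50, 50))

# Chunked matrix multiplication; the result is another NDArray.
c = blosc2.matmul(a, b)

# Decompress the result with [:] and compare it with NumPy's own product.
np.testing.assert_allclose(c[:], a_np @ b_np, rtol=1e-4)
print(c.shape, c.chunks)
&lt;/pre&gt;
&lt;p&gt;Indexing with &lt;code class="docutils literal"&gt;[:]&lt;/code&gt; decompresses an &lt;code class="docutils literal"&gt;NDArray&lt;/code&gt; into a
NumPy array. That step is done here only to verify the result.&lt;/p&gt;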
&lt;img alt="How blocked matrix multiplication works" class="align-center" src="https://blosc.org/images/blosc2-matmul/blocked-gemm.png"&gt;
&lt;p&gt;The image illustrates a &lt;strong&gt;blocked matrix multiplication&lt;/strong&gt; approach. The key idea
is to divide matrices into smaller blocks (or chunks) to optimize memory
access and computational efficiency.&lt;/p&gt;
&lt;p&gt;In the image, matrices &lt;cite&gt;A (M x K)&lt;/cite&gt; and &lt;cite&gt;B (K x N)&lt;/cite&gt;
are partitioned into chunks, which are in turn partitioned into blocks. Each block of the
resulting matrix &lt;cite&gt;C (M x N)&lt;/cite&gt; is computed as a sum of block-wise products along
the shared dimension K.&lt;/p&gt;
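&lt;p&gt;The block-wise sum can be sketched in plain Python. This is a teaching sketch of the
general blocked algorithm, not Blosc2's implementation. Here the tiles are slices of Python
lists, whereas Blosc2 works on compressed chunks and blocks. Each tile of C accumulates the
products of a row of A tiles with a column of B tiles, i.e. C&lt;sub&gt;ij&lt;/sub&gt; is the sum over k
of A&lt;sub&gt;ik&lt;/sub&gt; B&lt;sub&gt;kj&lt;/sub&gt;.&lt;/p&gt;
&lt;pre class="code python literal-block"&gt;
def blocked_matmul(A, B, bm, bk, bn):
    """Compute C = A @ B tile by tile, with matrices as lists of rows.

    bm, bk and bn are the tile sizes along M, K and N; edge tiles may be smaller.
    """
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    if K != K2:
        raise ValueError("inner dimensions do not match")
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, bm):              # tile row of C
        i1 = min(i0 + bm, M)
        for j0 in range(0, N, bn):          # tile column of C
            j1 = min(j0 + bn, N)
            for k0 in range(0, K, bk):      # accumulate over the shared dimension K
                k1 = min(k0 + bk, K)
                # C[i0:i1, j0:j1] += A[i0:i1, k0:k1] @ B[k0:k1, j0:j1]
                for i in range(i0, i1):
                    row_a, row_c = A[i], C[i]
                    for k in range(k0, k1):
                        a_ik, row_b = row_a[k], B[k]
                        for j in range(j0, j1):
                            row_c[j] += a_ik * row_b[j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(A, B, bm=1, bk=2, bn=1))  # [[58, 64], [139, 154]]
&lt;/pre&gt;
&lt;p&gt;The tile sizes do not change the result, only the order in which the partial products are
accumulated. That freedom is what lets a chunked library pick tile shapes that fit in
cache.&lt;/p&gt;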
&lt;p&gt;This method significantly improves cache utilization, because only the parts of
the matrices needed at each step are loaded into memory at any given time. In
Blosc2, storing the matrices as compressed chunks reduces the memory footprint, while
on-the-fly decompression ensures that each chunk is expanded only when it is actually
needed.&lt;/p&gt;
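&lt;p&gt;The following minimal sketch illustrates keeping tiles compressed and decompressing only the one currently
needed. It uses the standard-library &lt;code class="docutils literal"&gt;zlib&lt;/code&gt; and &lt;code class="docutils literal"&gt;array&lt;/code&gt;
modules as stand-ins. It does not reproduce Blosc2's chunk and block format, codecs or filters.
The helper names are hypothetical.&lt;/p&gt;
&lt;pre class="code python literal-block"&gt;
import array
import zlib


def compress_tiles(A, bm, bk):
    """Split a list-of-rows matrix into bm x bk tiles and compress each one.

    Each tile is stored as a zlib-compressed float64 buffer (zlib is only a
    stand-in for the codecs Blosc2 actually uses).
    """
    tiles = {}
    for i0 in range(0, len(A), bm):
        for k0 in range(0, len(A[0]), bk):
            rows = [r[k0:k0 + bk] for r in A[i0:i0 + bm]]
            buf = array.array("d", [x for r in rows for x in r])
            tiles[(i0, k0)] = (len(rows), len(rows[0]), zlib.compress(buf.tobytes()))
    return tiles


def load_tile(tiles, key):
    """Decompress a single tile on demand and return it as a list of rows."""
    nrows, ncols, blob = tiles[key]
    flat = array.array("d")
    flat.frombytes(zlib.decompress(blob))
    return [list(flat[r * ncols:(r + 1) * ncols]) for r in range(nrows)]


# A 64 x 64 matrix in which only every fourth row is non-zero compresses well.
A = [[float(i % 4 == 0) for _ in range(64)] for i in range(64)]
tiles = compress_tiles(A, 32, 32)
raw = 64 * 64 * 8
compressed = sum(len(blob) for _, _, blob in tiles.values())
print(raw, compressed)

# Only the tile that is needed right now gets decompressed.
tile = load_tile(tiles, (32, 0))
&lt;/pre&gt;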
&lt;p&gt;Blosc2 also supports a wide range of data types. In addition to standard Python
types such as &lt;cite&gt;int&lt;/cite&gt;, &lt;cite&gt;float&lt;/cite&gt; and &lt;cite&gt;complex&lt;/cite&gt;, it fully supports several NumPy
types. The currently supported types include:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int8&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int16&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int32&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.int64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.float32&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.float64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.complex64&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;np.complex128&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This versatility lets users pick the data type, and therefore the precision and
memory cost, that best suits each application, and then compress and process the
data accordingly.&lt;/p&gt;
&lt;p&gt;Together, these features make Blosc2 a flexible tool for many scenarios, and one
especially well suited to handling large datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="benchmarks"&gt;
&lt;h2&gt;Benchmarks&lt;/h2&gt;
&lt;p&gt;The benchmarks have been designed to evaluate the performance of the &lt;code class="docutils literal"&gt;matmul&lt;/code&gt;
function under various conditions. Here are the key aspects of our
experimental setup and findings:&lt;/p&gt;
&lt;p&gt;Different matrix sizes were tested using both &lt;code class="docutils literal"&gt;float32&lt;/code&gt; and &lt;code class="docutils literal"&gt;float64&lt;/code&gt;
data types. All the matrices used for multiplication are square.
Varying the matrix size shows how the function scales and how the overhead
of chunk management affects performance.&lt;/p&gt;
&lt;p&gt;In the plots, the x-axis represents the size of the resulting matrix in megabytes (MB).
We used GFLOPS (billions of floating-point operations per second) to gauge
computational throughput, which lets us compare the efficiency of the
&lt;code class="docutils literal"&gt;matmul&lt;/code&gt; function with highly optimized libraries such as NumPy.&lt;/p&gt;
&lt;p&gt;Blosc2 can also select chunk shapes automatically; this option appears in the
benchmarks as "Auto".&lt;/p&gt;
&lt;img alt="Benchmark float32" class="align-center" src="https://blosc.org/images/blosc2-matmul/float32.png"&gt;
&lt;img alt="Benchmark float64" class="align-center" src="https://blosc.org/images/blosc2-matmul/float64.png"&gt;
&lt;p&gt;For smaller matrices, the overhead of managing chunks in Blosc2 can result in
lower GFLOPS than NumPy. As the matrix size increases, Blosc2 scales
well and its performance approaches that of NumPy.&lt;/p&gt;
&lt;p&gt;Each chunk shape reaches its peak performance when the matrix shape matches the
chunk shape or is a multiple of it.&lt;/p&gt;
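&lt;p&gt;One simple way to quantify how well a matrix shape fits a chunk shape is to measure the
share of the chunk grid that falls outside the matrix. That share is zero only when every
dimension is a multiple of the chunk dimension, so no edge chunk is left partially filled.
The helper below is a hypothetical illustration, not a Blosc2 function. It measures only
this alignment and is not a performance model of the benchmarks above.&lt;/p&gt;
&lt;pre class="code python literal-block"&gt;
def chunk_padding_fraction(shape, chunks):
    """Fraction of the chunk grid that lies outside the matrix (0.0 = perfect fit)."""
    grid = 1  # elements covered by the whole grid of chunks
    used = 1  # elements that actually belong to the matrix
    for n, c in zip(shape, chunks):
        nchunks = (n + c - 1) // c  # chunks needed along this axis (ceiling division)
        grid *= nchunks * c
        used *= n
    return 1 - used / grid


for chunks in [(500, 500), (250, 250), (300, 300)]:
    print(chunks, round(chunk_padding_fraction((1000, 1000), chunks), 3))
# (500, 500) 0.0
# (250, 250) 0.0
# (300, 300) 0.306
&lt;/pre&gt;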
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The new matrix multiplication feature in Blosc2 brings efficient, chunked
computation to compressed arrays. It allows users to handle large datasets
both in memory and on disk, with performance that approaches NumPy's as the
matrices grow. The implementation supports a wide range of data types, making
it versatile for various numerical applications.&lt;/p&gt;
&lt;p&gt;Real-world applications such as neural network training, where memory constraints
and large data sizes are common, could benefit from this approach. Although there
are some limitations, such as support for 2D arrays only and the overhead of
blocking, the outlook is promising, including potential integration with deep
learning frameworks.&lt;/p&gt;
&lt;p&gt;Overall, Blosc2 offers a compelling alternative for applications where the
advantages of compression and out-of-core computation are critical, paving
the way for more efficient processing of massive datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="getting-my-feet-wet-with-blosc2"&gt;
&lt;h2&gt;Getting my feet wet with Blosc2&lt;/h2&gt;
&lt;p&gt;In the initial phase of the project, my biggest challenge was understanding how
Blosc2 manages data internally. For matrix multiplication, it was critical to
grasp how to choose the right chunks, since the operation requires the partitions
of both matrices to coincide along their shared dimension. After some thought and
a few insightful conversations with Francesc, I finally understood the underlying
mechanics. This allowed me to begin implementing the first versions of my
solution, adjusting the data partitioning so that each block was properly
aligned for a correct computation.&lt;/p&gt;
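&lt;p&gt;The alignment requirement can be expressed as a small check. This is a hypothetical helper
for illustration, not a Blosc2 function. It assumes regular chunk grids and the blocked scheme
illustrated earlier, in which chunk (i, k) of A is multiplied by chunk (k, j) of B. Both
matrices must therefore split K at the same positions.&lt;/p&gt;
&lt;pre class="code python literal-block"&gt;
def check_matmul_chunks(a_shape, a_chunks, b_shape, b_chunks):
    """Check that A (M x K) and B (K x N) can be multiplied chunk by chunk.

    Returns the chunk shape that the result C = A @ B would naturally get.
    """
    if a_shape[1] != b_shape[0]:
        raise ValueError(f"inner dimensions differ: {a_shape[1]} != {b_shape[0]}")
    if a_chunks[1] != b_chunks[0]:
        raise ValueError(f"chunks along K differ: {a_chunks[1]} != {b_chunks[0]}")
    # Rows come from A's chunking, columns from B's chunking.
    return (a_chunks[0], b_chunks[1])


print(check_matmul_chunks((1000, 800), (200, 100), (800, 600), (100, 300)))  # (200, 300)
&lt;/pre&gt;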
&lt;p&gt;Another important aspect was adapting to the professional workflow of using Git
for version control. Embracing Git, with its branches, regular commits and
conflict resolution, represented a significant shift in my development approach.
It not only improved the organization of my code and facilitated collaboration,
but also instilled a more structured and disciplined way of managing my
projects. Git has proven to be an extremely valuable tool.&lt;/p&gt;
&lt;p&gt;The moment when the function finally returned the correct result was
really exciting. After multiple iterations, the rigorous debugging paid
off as everything fell into place. This milestone validated the implementation
and boosted my confidence to further optimize it and tackle new challenges in
data processing.&lt;/p&gt;
&lt;/section&gt;</description><category>blosc2 optimization matrix multiplication matmul compression</category><guid>https://blosc.org/posts/optimizing-chunks-blosc2/</guid><pubDate>Wed, 12 Mar 2025 09:00:00 GMT</pubDate></item></channel></rss>