<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about PGO)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/pgo.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Wed, 04 Mar 2026 11:43:34 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Testing PGO with LZ4 and Zstd codecs</title><link>https://blosc.org/posts/codecs-pgo/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;In &lt;a class="reference external" href="http://blosc.org/posts/blosclz-tuning/"&gt;last week's post&lt;/a&gt; I showed how the PGO (&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Profile-guided_optimization"&gt;Profile-Guided Optimization&lt;/a&gt;) capability in modern compilers gave a good boost to the performance of the BloscLZ codec.  Today I'd like to test how PGO affects the speed of the two other most widely used codecs in Blosc, &lt;a class="reference external" href="http://lz4.github.io/lz4/"&gt;LZ4&lt;/a&gt; and &lt;a class="reference external" href="http://facebook.github.io/zstd/"&gt;Zstd&lt;/a&gt;, using the same &lt;a class="reference external" href="https://github.com/Blosc/c-blosc2/blob/master/bench/bench.c"&gt;synthetic benchmark&lt;/a&gt; that comes with C-Blosc2.&lt;/p&gt;
&lt;section id="lz4-1"&gt;
&lt;h2&gt;LZ4&lt;/h2&gt;
&lt;p&gt;First, for GCC without PGO:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-old-c" src="https://blosc.org/images/codecs-pgo/lz4-comp-gcc-6.3.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-old-d" src="https://blosc.org/images/codecs-pgo/lz4-decomp-gcc-6.3.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Now with PGO enabled:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-pgo-c" src="https://blosc.org/images/codecs-pgo/lz4-comp-pgo.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="lz4-pgo-d" src="https://blosc.org/images/codecs-pgo/lz4-decomp-pgo.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As with BloscLZ, compression speed has not improved significantly, but decompression now reaches up to 30 GB/s, and up to 20 GB/s at high compression levels, which is pretty good.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="zstd-1"&gt;
&lt;h2&gt;Zstd&lt;/h2&gt;
&lt;p&gt;First, for GCC without PGO:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="zstd-old-c" src="https://blosc.org/images/codecs-pgo/zstd-comp-gcc-6.3.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="zstd-old-d" src="https://blosc.org/images/codecs-pgo/zstd-decomp-gcc-6.3.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Now with PGO enabled:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;img alt="zstd-pgo-c" src="https://blosc.org/images/codecs-pgo/zstd-comp-pgo.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;img alt="zstd-pgo-d" src="https://blosc.org/images/codecs-pgo/zstd-decomp-pgo.png"&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Wow, in this case we &lt;em&gt;really&lt;/em&gt; can see important speedups for both compression and decompression.  Decompression is especially interesting: at the higher compression levels Zstd can exceed 20 GB/s (whereas without PGO it could not go beyond 12 GB/s), which seems a bit crazy given the excellent compression ratios that Zstd achieves.  Without a doubt, for Write Once Read Many scenarios Zstd has no real competitor here, especially when PGO is used.&lt;/p&gt;
&lt;p&gt;This confirms once again that, when performance is critical for your applications, PGO should be part of your everyday weaponry.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="appendix-hardware-and-software-used"&gt;
&lt;h2&gt;Appendix: Hardware and software used&lt;/h2&gt;
&lt;p&gt;For reference, here is the configuration I used to produce the plots in this blog entry.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPU: Intel Xeon E3-1245 v5 @ 3.50GHz (4 physical cores with hyper-threading)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OS:  Ubuntu 16.04&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compiler: GCC 6.3.0 (with and without PGO)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;C-Blosc2: 2.0.0a4.dev (2017-07-11)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LZ4 codec: 1.7.5&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Zstd codec: 1.3.0&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;</description><category>LZ4</category><category>PGO</category><category>Zstandard</category><guid>https://blosc.org/posts/codecs-pgo/</guid><pubDate>Wed, 19 Jul 2017 11:32:20 GMT</pubDate></item></channel></rss>