<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blosc Home Page  (Posts about ctable b2z parquet queries tabular indexing compression)</title><link>https://blosc.org/</link><description></description><atom:link href="https://blosc.org/categories/ctable-b2z-parquet-queries-tabular-indexing-compression.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:blosc@blosc.org"&gt;The Blosc Developers&lt;/a&gt; </copyright><lastBuildDate>Thu, 11 Jun 2026 12:25:19 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>CTable and .b2z: Querying Tabular Data, the Blosc Way</title><link>https://blosc.org/posts/ctable-b2z-queries/</link><dc:creator>Francesc Alted</dc:creator><description>&lt;p&gt;Here is a question we have been chasing, in one form or another, for more than fifteen years: &lt;em&gt;how much work can you avoid doing if your data is stored the right way?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post we put that question to a concrete test: one selective query against 24.3 million Chicago taxi trips, stored on disk in two formats — Parquet and the new Blosc2 &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; — and answered by five different tools: DuckDB, PyArrow, pandas, polars, and Blosc2's own &lt;a class="reference external" href="https://blosc.org/python-blosc2/reference/ctable.html"&gt;CTable&lt;/a&gt;. But the numbers will make more sense if we first tell you how we got here, because CTable did not appear out of thin air: it is the fourth floor of a building whose foundations were laid in 2009.&lt;/p&gt;
&lt;section id="from-a-turbo-charged-compressor"&gt;
&lt;h2&gt;From a turbo-charged compressor...&lt;/h2&gt;
&lt;p&gt;Blosc was born inside PyTables with a single, then-heretical idea: that compression could make data access &lt;em&gt;faster&lt;/em&gt;, not slower. CPUs were (and are) starving — they can crunch numbers far faster than memory can feed them — so if you split data into blocks that fit in CPU caches, shuffle the bytes so that similar ones sit together, and decompress with all your cores, the time spent decompressing can be smaller than the time saved moving fewer bytes. "Compress faster than &lt;code class="docutils literal"&gt;memcpy&lt;/code&gt;" was the provocative benchmark slogan of the time.&lt;/p&gt;
&lt;p&gt;That first Blosc was deliberately humble: a blocked, multithreaded meta-compressor for binary buffers. No containers, no files, no types. Just speed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="to-containers-arrays-and-a-compute-engine"&gt;
&lt;h2&gt;...to containers, arrays, and a compute engine&lt;/h2&gt;
&lt;p&gt;The next decade taught us that a fast compressor alone is not enough; data needs a &lt;em&gt;home&lt;/em&gt;. C-Blosc2 (2.0 released in 2021) gave it one: 64-bit super-chunks, persistent frames, a richer filter pipeline, modern codecs like Zstd, and a plugin system. On the Python side, this matured into &lt;a class="reference external" href="https://www.blosc.org/python-blosc2/"&gt;python-blosc2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then came &lt;strong&gt;NDArray&lt;/strong&gt; (2023): a compressed, n-dimensional array for Python, with a two-level partitioning scheme — chunks, divided into blocks — where the &lt;em&gt;block&lt;/em&gt; is the unit of decompression, sized to fit comfortably in CPU caches. Slicing an NDArray decompresses only the blocks that the slice touches. Keep that sentence in mind; it is the seed of everything below.&lt;/p&gt;
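&lt;p&gt;To make "only the blocks that the slice touches" concrete, here is a minimal sketch for the one-dimensional case. It assumes fixed-size blocks and ignores the chunk level and extra dimensions that a real NDArray has; &lt;code class="docutils literal"&gt;blocks_touched&lt;/code&gt; is a hypothetical helper for illustration, not part of python-blosc2:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;def blocks_touched(start, stop, block_len):
    """Indices of the fixed-size blocks overlapped by rows [start, stop)."""
    if stop &amp;lt;= start:
        return range(0)  # empty slice: nothing to decompress
    first = start // block_len
    last = (stop - 1) // block_len  # block holding the last requested row
    return range(first, last + 1)


# Hypothetical layout: 1,000 rows per block, slicing rows 2,500 to 4,199.
touched = blocks_touched(2_500, 4_200, block_len=1_000)
print(list(touched))  # prints [2, 3, 4]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every block outside that range stays compressed, however large the array is.&lt;/p&gt;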
&lt;p&gt;On top of that, python-blosc2 3.0 (early 2025) added a &lt;strong&gt;compute engine&lt;/strong&gt;: lazy expressions like &lt;code class="docutils literal"&gt;a + b * 2&lt;/code&gt; that evaluate block by block, straight over compressed (possibly larger-than-RAM) operands, and return NumPy arrays. The engine never materializes whole arrays; it streams cache-sized blocks through the CPU. At this point we had fast compressed storage &lt;em&gt;and&lt;/em&gt; fast compute over it — what we were missing was a way to talk about &lt;em&gt;tables&lt;/em&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="ctable-a-columnar-table-on-blosc2-foundations"&gt;
&lt;h2&gt;CTable: a columnar table on Blosc2 foundations&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/posts/ctable-blosc2-columnar-table/"&gt;CTable&lt;/a&gt; (introduced in May 2026) is exactly that: a columnar table where each column is an NDArray (or a ListArray for variable-length data), with typed schemas, nullable columns, and a &lt;code class="docutils literal"&gt;where()&lt;/code&gt; method that accepts plain Python expressions and is executed by the compute engine.&lt;/p&gt;
&lt;p&gt;Because columns are NDArrays, every column inherits the block structure, and this is where the design clicks together. CTable can build a small &lt;strong&gt;SUMMARY index&lt;/strong&gt; per column: min/max statistics kept at &lt;em&gt;block&lt;/em&gt; granularity. When a query like &lt;code class="docutils literal"&gt;t.payment.tips &amp;gt; 100&lt;/code&gt; arrives, blocks whose maximum tip does not exceed 100 are never read and never decompressed. The index granularity is exactly aligned with the unit of work it avoids.&lt;/p&gt;
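&lt;p&gt;The pruning idea can be sketched in plain NumPy. This is an illustration under simplifying assumptions, not CTable's implementation: &lt;code class="docutils literal"&gt;block_maxima&lt;/code&gt; and &lt;code class="docutils literal"&gt;query_greater_than&lt;/code&gt; are hypothetical helper names, the data is made up, and the column is an ordinary in-memory array instead of a compressed NDArray. It keeps one maximum per block and scans only the blocks whose maximum exceeds the threshold:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;import numpy as np


def block_maxima(column, block_len):
    """One maximum per block: a toy stand-in for a per-column SUMMARY index."""
    starts = np.arange(0, len(column), block_len)
    return np.maximum.reduceat(column, starts)


def query_greater_than(column, block_max, block_len, threshold):
    """Rows where column &amp;gt; threshold, scanning only blocks that can match."""
    candidates = np.flatnonzero(block_max &amp;gt; threshold)
    hits = []
    for b in candidates:
        start = b * block_len
        block = column[start:start + block_len]  # the only data we touch
        hits.append(start + np.flatnonzero(block &amp;gt; threshold))
    rows = np.concatenate(hits) if hits else np.empty(0, dtype=np.int64)
    return rows, len(candidates)


# Hypothetical data: 1,000 tips, all small except two outliers.
tips = np.full(1000, 5.0)
tips[[123, 987]] = [150.0, 250.0]
block_max = block_maxima(tips, block_len=100)
rows, scanned = query_greater_than(tips, block_max, 100, threshold=100)
print(rows.tolist(), f"scanned {scanned} of {len(block_max)} blocks")
# prints: [123, 987] scanned 2 of 10 blocks
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy case eight of the ten blocks are skipped without being looked at; with compressed, on-disk blocks, each skipped block is also a read and a decompression that never happen.&lt;/p&gt;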
&lt;p&gt;A CTable persists inside a &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; file: the single-file, zip-based flavor of &lt;a class="reference external" href="https://www.blosc.org/posts/new-treestore-blosc2/"&gt;TreeStore&lt;/a&gt; that holds all columns, indexes and metadata in one compact, openable-anywhere container. Like Parquet, the data stays compressed on disk; unlike Parquet, you can open it and immediately get NumPy-addressable columns, no engine in between.&lt;/p&gt;
&lt;p&gt;So: does the fourth floor hold the weight? Time to measure.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-contest-one-selective-query-five-tools"&gt;
&lt;h2&gt;The contest: one selective query, five tools&lt;/h2&gt;
&lt;p&gt;The dataset is the classic &lt;a class="reference external" href="https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew"&gt;Chicago Taxi trips&lt;/a&gt; table: 24.3 million rows, 14 columns (floats, timestamps, dictionary-encoded strings, and even a variable-length GPS path per trip). The query is a needle-in-a-haystack filter with projection and sort:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code sql"&gt;&lt;a id="rest_code_d8479c5ce26546d5bf2fa2e59faac631-1" name="rest_code_d8479c5ce26546d5bf2fa2e59faac631-1" href="https://blosc.org/posts/ctable-b2z-queries/#rest_code_d8479c5ce26546d5bf2fa2e59faac631-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;km&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;
&lt;a id="rest_code_d8479c5ce26546d5bf2fa2e59faac631-2" name="rest_code_d8479c5ce26546d5bf2fa2e59faac631-2" href="https://blosc.org/posts/ctable-b2z-queries/#rest_code_d8479c5ce26546d5bf2fa2e59faac631-2"&gt;&lt;/a&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tips&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;km&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;a id="rest_code_d8479c5ce26546d5bf2fa2e59faac631-3" name="rest_code_d8479c5ce26546d5bf2fa2e59faac631-3" href="https://blosc.org/posts/ctable-b2z-queries/#rest_code_d8479c5ce26546d5bf2fa2e59faac631-3"&gt;&lt;/a&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sec&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Only &lt;strong&gt;67 of 24.3 million rows&lt;/strong&gt; match: a highly selective query, which is precisely the regime where storage-level pruning can shine (more on this caveat later).&lt;/p&gt;
&lt;p&gt;The contenders: &lt;strong&gt;DuckDB&lt;/strong&gt;, &lt;strong&gt;PyArrow&lt;/strong&gt;, &lt;strong&gt;pandas&lt;/strong&gt; and &lt;strong&gt;polars&lt;/strong&gt; querying the Parquet file, and &lt;strong&gt;Blosc2's CTable&lt;/strong&gt; querying the &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt;. Every tool reads from disk on demand; nothing is preloaded. Each engine runs in a fresh subprocess under &lt;code class="docutils literal"&gt;/usr/bin/time&lt;/code&gt;, and we report the &lt;em&gt;query time&lt;/em&gt; each script measures internally (open + compute + print), which excludes interpreter and import overhead. Cold-cache runs happen right after flushing the OS file cache (&lt;code class="docutils literal"&gt;sudo purge&lt;/code&gt;); warm runs are best-of-7. The machine is a Mac mini (Apple M4 Pro). The full, reproducible notebook is &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/chicago-taxi/compare-query-methods.ipynb"&gt;in the python-blosc2 repository&lt;/a&gt;.&lt;/p&gt;
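&lt;p&gt;For readers who want to time their own queries in the same spirit, a best-of-N measurement can be sketched as follows. This is a minimal sketch, not the benchmark driver from the repository: &lt;code class="docutils literal"&gt;best_of&lt;/code&gt; is a hypothetical helper, the timed function is a placeholder, and it covers neither the fresh subprocess per engine nor the cache flush used for cold runs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;import time


def best_of(fn, repeats=7):
    """Call fn() `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - t0)
    return best, result


# Placeholder workload standing in for "open + compute + print" of one engine.
seconds, total = best_of(lambda: sum(range(100_000)))
print(f"best of 7: {seconds:.6f} s")
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Taking the minimum rather than the mean filters out interference from other processes, which is why it suits warm-cache comparisons.&lt;/p&gt;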
&lt;p&gt;First, the storage footprint — because a fair query race starts with files of comparable size:&lt;/p&gt;
&lt;img alt="File sizes: parquet 654.0 MB vs b2z 670.3 MB" class="align-center" src="https://blosc.org/images/ctable-b2z-queries/compare-size.png" style="width: 60%;"&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; lands at 670 MB versus Parquet's 654 MB — a 2% premium. Those extra bytes are mostly the block-level indexes; remember them, they are about to earn their keep.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="cold-cache-reading-less-wins"&gt;
&lt;h2&gt;Cold cache: reading less wins&lt;/h2&gt;
&lt;p&gt;The cold run is the scenario we care most about: you have a large file on disk, it is &lt;em&gt;not&lt;/em&gt; in the OS cache, and you want one answer, now.&lt;/p&gt;
&lt;img alt="Cold query times: blosc2 0.056s, duckdb 0.107s, arrow 0.137s, polars 0.298s, pandas 0.534s" class="align-center" src="https://blosc.org/images/ctable-b2z-queries/compare-query-time-cold.png" style="width: 75%;"&gt;
&lt;p&gt;CTable answers in &lt;strong&gt;0.056 s&lt;/strong&gt; — about &lt;strong&gt;1.9x faster than DuckDB&lt;/strong&gt; (0.107 s), 2.4x faster than PyArrow (0.137 s), 5x faster than polars (0.298 s) and 9.5x faster than pandas (0.534 s).&lt;/p&gt;
&lt;p&gt;Let us be clear about why, because it is not magic and it is not a faster CPU loop. On a cold cache, the dominant cost is bytes coming off the disk. The SUMMARY indexes let CTable prune roughly &lt;strong&gt;89% of the blocks&lt;/strong&gt; for this query: those blocks are neither read nor decompressed. Pruning pays twice — less I/O &lt;em&gt;and&lt;/em&gt; less CPU — and on a first-touch query the I/O half is the whole ballgame.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="warm-cache-a-dead-heat-with-a-real-database"&gt;
&lt;h2&gt;Warm cache: a dead heat with a real database&lt;/h2&gt;
&lt;p&gt;Once the file is fully cached in RAM, I/O is nearly free and raw engine throughput takes over. This is DuckDB's home turf — a vectorized, multithreaded analytical SQL engine with filter pushdown and late materialization.&lt;/p&gt;
&lt;img alt="Warm query times: blosc2 0.031s and duckdb 0.034s in a dead heat, ahead of arrow, polars, pandas" class="align-center" src="https://blosc.org/images/ctable-b2z-queries/compare-query-time-warm.png" style="width: 75%;"&gt;
&lt;p&gt;CTable finishes in 0.031 s, DuckDB in 0.034 s — a &lt;strong&gt;dead heat&lt;/strong&gt; (the two trade places within run-to-run noise), with both about 2.6x ahead of PyArrow, 7x ahead of polars, and 16x ahead of pandas. We find this result remarkable not because CTable "beats" anything here (it does not), but because of what is &lt;em&gt;absent&lt;/em&gt;: there is no SQL engine in the Blosc2 process. A storage container holding the tie with a purpose-built database, purely on the strength of skipping work, tells us the layout is doing the heavy lifting.&lt;/p&gt;
&lt;p&gt;Memory tells a similar story:&lt;/p&gt;
&lt;img alt="Peak memory: duckdb ~60 MB, blosc2 ~85 MB, arrow ~210 MB, polars ~410 MB, pandas ~1.6 GB" class="align-center" src="https://blosc.org/images/ctable-b2z-queries/compare-query-mem-warm.png" style="width: 75%;"&gt;
&lt;p&gt;DuckDB (~60 MB) and CTable (~85 MB) are the two leanest by a wide margin — an order of magnitude below pandas (~1.6 GB), which materializes full columns before filtering. CTable never holds more than the blocks it could not prune, plus the 67 matching rows.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="why-pruning-wins-granularity"&gt;
&lt;h2&gt;Why pruning wins: granularity&lt;/h2&gt;
&lt;p&gt;Parquet also carries min/max statistics — at &lt;strong&gt;row-group&lt;/strong&gt; granularity, here ~970,000 rows per group. CTable keeps them at &lt;strong&gt;block&lt;/strong&gt; granularity, ~27,000 rows per block: roughly 36x finer. For this query the difference is binary: every one of Parquet's 25 row groups contains &lt;em&gt;some&lt;/em&gt; trip with &lt;code class="docutils literal"&gt;tips &amp;gt; 100&lt;/code&gt;, so row-group statistics prune &lt;strong&gt;nothing&lt;/strong&gt;, and every Parquet reader must stream most of the file. The block-level SUMMARY index prunes ~809 of 906 blocks.&lt;/p&gt;
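&lt;p&gt;The figures in the previous paragraph can be cross-checked with a few lines of arithmetic. The inputs below are the rounded values quoted in this post, so the results are approximate:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code python"&gt;# Rounded figures quoted in the text above.
total_rows = 24_300_000
row_groups = 25
rows_per_block = 27_000
blocks_total = 906
blocks_pruned = 809

rows_per_group = total_rows / row_groups
granularity_ratio = rows_per_group / rows_per_block
pruned_fraction = blocks_pruned / blocks_total

print(f"rows per row group: {rows_per_group:,.0f}")   # prints 972,000
print(f"granularity ratio: {granularity_ratio:.0f}x")  # prints 36x
print(f"blocks pruned: {pruned_fraction:.1%}")         # prints 89.3%
&lt;/pre&gt;&lt;/div&gt;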
&lt;p&gt;The deeper point is architectural. A Blosc2 block is &lt;em&gt;the unit of decompression&lt;/em&gt; — the same cache-sized block the compute engine streams. An index at that granularity skips exactly the work the query would otherwise do. An index at a coarser granularity than the I/O unit can only skip work in big, lucky lumps.&lt;/p&gt;
&lt;p&gt;And the honest caveat: this advantage rides on &lt;strong&gt;selectivity&lt;/strong&gt;, not on any general superiority. &lt;code class="docutils literal"&gt;tips &amp;gt; 100&lt;/code&gt; is rare enough that most 27,000-row blocks contain no match. A predicate that matches everywhere prunes nothing at any granularity, and on data sorted or clustered by the filter column, even Parquet's coarse row groups would start pruning effectively. Benchmarks are stories with a point of view; this one is about selective, first-touch queries.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="seasoned-conclusions"&gt;
&lt;h2&gt;Seasoned conclusions&lt;/h2&gt;
&lt;p&gt;What do we think these numbers support — and not support?&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For selective cold queries on large tabular files, CTable/.b2z is genuinely fast&lt;/strong&gt; — the fastest of the five tools here, on a query and dataset it was not specially tuned for. If your workload looks like "open a big file, fetch a small subset, move on", the block-level indexing earns its 2% of disk many times over.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Warm, it ties — it does not dethrone.&lt;/strong&gt; DuckDB remains an excellent engine, and on cached data it matches CTable while speaking full SQL with joins and aggregations that CTable does not attempt. If your problems are relational, use a relational engine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The result is arrays, not a result set.&lt;/strong&gt; &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;t.where(...)&lt;/span&gt;&lt;/code&gt; hands back NumPy-addressable columns with their original dtypes — no &lt;code class="docutils literal"&gt;.to_numpy()&lt;/code&gt; hop, no DataFrame conversion tax. For NumPy-centric pipelines, that removes a whole impedance layer. And since columns are NDArrays, a CTable column can even be n-dimensional, or hold variable-length data (this dataset stores a GPS trace per row).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parquet is not going anywhere.&lt;/strong&gt; It is slightly smaller here, and it remains the lingua franca of the data ecosystem, readable by everything. &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; is young and its natural habitat is the Python/NumPy world. What this experiment shows is that the trade is real and the price is modest: a couple of percent of disk for a selective first-touch query that ran 1.9x to 9.5x faster than the Parquet readers tested here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sixteen years after asking whether compression could be faster than &lt;code class="docutils literal"&gt;memcpy&lt;/code&gt;, the question has scaled up but kept its shape: the fastest byte is still the one you never touch. Blocks sized for caches made decompression cheap; the compute engine made math over blocks cheap; and CTable's block-level indexes now make &lt;em&gt;not touching&lt;/em&gt; most of a table cheap, too. The fourth floor stands on the first.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reproduce-it-yourself"&gt;
&lt;h2&gt;Reproduce it yourself&lt;/h2&gt;
&lt;p&gt;Everything in this post lives in &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/tree/main/bench/chicago-taxi"&gt;bench/chicago-taxi&lt;/a&gt; in the python-blosc2 repository: the &lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/blob/main/bench/chicago-taxi/compare-query-methods.ipynb"&gt;notebook&lt;/a&gt;, the driver, the five per-engine query scripts, and a README with the details. The notebook downloads the dataset on first run and builds the &lt;code class="docutils literal"&gt;.b2z&lt;/code&gt; from it, so the whole thing is two commands away:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code console"&gt;&lt;a id="rest_code_605fdea71a034b01a1de53b569a7e7f1-1" name="rest_code_605fdea71a034b01a1de53b569a7e7f1-1" href="https://blosc.org/posts/ctable-b2z-queries/#rest_code_605fdea71a034b01a1de53b569a7e7f1-1"&gt;&lt;/a&gt;&lt;span class="go"&gt;pip install "blosc2&amp;gt;=4.4.3" pyarrow duckdb polars pandas matplotlib jupyter&lt;/span&gt;
&lt;a id="rest_code_605fdea71a034b01a1de53b569a7e7f1-2" name="rest_code_605fdea71a034b01a1de53b569a7e7f1-2" href="https://blosc.org/posts/ctable-b2z-queries/#rest_code_605fdea71a034b01a1de53b569a7e7f1-2"&gt;&lt;/a&gt;&lt;span class="go"&gt;jupyter lab compare-query-methods.ipynb   # then: Run All&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;One practical tip if you chase the cold-cache numbers: flushing the OS file cache is necessary but not sufficient. After a flush and a few idle seconds, the &lt;em&gt;first&lt;/em&gt; disk read also pays the drive's idle-state exit latency (tens of ms on power-managed NVMe), and it lands on whichever process touches the disk first — we learned this the hard way while preparing this post. The driver's &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;--purge&lt;/span&gt;&lt;/code&gt; flag handles both the flush and the disk wake-up for you; the README explains the manual route.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="more-info"&gt;
&lt;h2&gt;More info&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.blosc.org/posts/ctable-blosc2-columnar-table/"&gt;Introducing CTable&lt;/a&gt; — the design and feature tour&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blosc.org/python-blosc2/getting_started/tutorials/13.ctable-basics.html"&gt;Getting started with CTable&lt;/a&gt; and &lt;a class="reference external" href="https://blosc.org/python-blosc2/getting_started/tutorials/15.indexing-ctables.html"&gt;Indexing CTables&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/Blosc/python-blosc2/tree/main/bench/chicago-taxi"&gt;The benchmark directory&lt;/a&gt; — notebook, driver, per-engine scripts and README&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blosc.org/python-blosc2/reference/ctable.html"&gt;CTable API reference&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Enjoy data!&lt;/p&gt;
&lt;/section&gt;</description><category>ctable b2z parquet queries tabular indexing compression</category><guid>https://blosc.org/posts/ctable-b2z-queries/</guid><pubDate>Thu, 11 Jun 2026 10:00:00 GMT</pubDate></item></channel></rss>