CTable

A columnar compressed table backed by one NDArray per column. Each column is stored, compressed, and queried independently; rows are never materialized in their entirety unless you explicitly call to_arrow() or iterate with __iter__().

class blosc2.CTable(row_type: type[RowT], new_data=None, *, urlpath: str | None = None, mode: str = 'a', expected_size: int | None = None, compact: bool = False, validate: bool = True, cparams: dict[str, Any] | None = None, dparams: dict[str, Any] | None = None)[source]
Attributes:
cbytes

Total compressed size in bytes (all columns + valid_rows mask).

computed_columns

Read-only view of the computed-column definitions.

cratio

Compression ratio for the whole table payload.

indexes

Return a list of CTableIndex handles for all active indexes.

info

Get information about this table.

info_items

Structured summary items used by info().

nbytes

Total uncompressed size in bytes (all columns + valid_rows mask).

ncols

Total number of columns, including computed (virtual) columns.

nrows
schema

The compiled schema that drives this table’s columns and validation.

Methods

add_column(name, spec, default, *[, cparams])

Add a new column filled with default for every existing live row.

add_computed_column(name, expr, *[, dtype])

Add a read-only virtual column whose values are computed from other columns.

close()

Close any persistent backing store held by this table.

column_schema(name)

Return the CompiledColumn descriptor for name.

compact_index([col_name, expression, name])

Compact an index, merging any incremental append runs.

cov()

Return the covariance matrix as a numpy array.

create_index([col_name, field, expression, ...])

Build and register an index for a stored column or table expression.

describe()

Print a per-column statistical summary.

drop_column(name)

Remove a column from the table.

drop_computed_column(name)

Remove a computed column from the table.

drop_index([col_name, expression, name])

Remove an index and delete any sidecar files.

from_arrow(arrow_table)

Build a CTable from a pyarrow.Table.

from_csv(path, row_cls, *[, header, sep])

Build a CTable from a CSV file.

index([col_name, expression, name])

Return the index handle for a stored-column or expression target.

load(urlpath)

Load a persistent table from urlpath into RAM.

materialize_computed_column(name, *[, ...])

Materialize a computed column into a new stored snapshot column.

open(urlpath, *[, mode])

Open a persistent CTable from urlpath.

rebuild_index([col_name, expression, name])

Drop and recreate an index with the same parameters.

rename_column(old, new)

Rename a column.

sample(n, *[, seed])

Return a read-only view of n randomly chosen live rows.

save(urlpath, *[, overwrite])

Copy this (in-memory) table to disk at urlpath.

schema_dict()

Return a JSON-compatible dict describing this table's schema.

select(cols)

Return a column-projection view exposing only cols.

sort_by(cols[, ascending, inplace])

Return a copy of the table sorted by one or more columns.

to_arrow()

Convert all live rows to a pyarrow.Table.

to_csv(path, *[, header, sep])

Write all live rows to a CSV file.

append

compact

delete

extend

head

tail

view

where

Special methods

CTable.__len__()

CTable.__iter__()

CTable.__getitem__(s)

CTable.__repr__()

Return repr(self).

CTable.__str__()

Return str(self).

__len__()[source]
__iter__()[source]
__getitem__(s: str)[source]
__repr__() str[source]

Return repr(self).

__str__() str[source]

Return str(self).

classmethod from_arrow(arrow_table) CTable[source]

Build a CTable from a pyarrow.Table.

Schema is inferred from the Arrow field types. String columns (pa.string(), pa.large_string()) are stored with max_length set to the longest value found in the data.

Parameters:

arrow_table – A pyarrow.Table instance.

Returns:

A new in-memory CTable containing all rows from arrow_table.

Return type:

CTable

Raises:
  • ImportError – If pyarrow is not installed.

  • TypeError – If an Arrow field type has no corresponding blosc2 spec.

classmethod from_csv(path: str, row_cls, *, header: bool = True, sep: str = ',') CTable[source]

Build a CTable from a CSV file.

Schema comes from row_cls (a dataclass) — CTable is always typed. All rows are read in a single pass into per-column Python lists, then each column is bulk-written into a pre-allocated NDArray (one slice assignment per column, no extend()).

Parameters:
  • path – Source CSV file path.

  • row_cls – A dataclass whose fields define the column names and types.

  • header – If True (default), the first row is treated as a header and skipped. Column order in the file must match row_cls field order regardless.

  • sep – Field delimiter. Defaults to ","; use "\t" for TSV.

Returns:

A new in-memory CTable containing all rows from the CSV file.

Return type:

CTable

Raises:
  • TypeError – If row_cls is not a dataclass.

  • ValueError – If a row has a different number of fields than the schema.
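The single-pass, column-wise load described above can be sketched with the stdlib csv module and numpy arrays standing in for the pre-allocated NDArrays. The Point dataclass and load_csv_columns helper are made-up illustrations, not part of the blosc2 API:

```python
import csv
import io
from dataclasses import dataclass, fields

import numpy as np

@dataclass
class Point:          # hypothetical row type
    x: int
    y: float

def load_csv_columns(text: str, row_cls, *, header: bool = True, sep: str = ","):
    """One pass into per-column lists, then one bulk write per column."""
    names = [f.name for f in fields(row_cls)]
    cols = {name: [] for name in names}
    rows = csv.reader(io.StringIO(text), delimiter=sep)
    if header:
        next(rows)                          # skip the header row
    n = 0
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match schema")
        for name, value in zip(names, row):
            cols[name].append(value)
        n += 1
    out = {}
    for f in fields(row_cls):
        arr = np.empty(n, dtype=f.type)     # stand-in for the pre-allocated NDArray
        arr[:] = np.asarray(cols[f.name], dtype=arr.dtype)  # one slice assignment
        out[f.name] = arr
    return out

cols = load_csv_columns("x,y\n1,2.5\n3,4.0\n", Point)
```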

classmethod load(urlpath: str) CTable[source]

Load a persistent table from urlpath into RAM.

The schema is read from the table’s metadata — the original Python dataclass is not required. The returned table is fully in-memory and read/write.

Parameters:

urlpath – Path to the table root directory.

Raises:
  • FileNotFoundError – If urlpath does not contain a CTable.

  • ValueError – If the metadata at urlpath does not identify a CTable.

classmethod open(urlpath: str, *, mode: str = 'r') CTable[source]

Open a persistent CTable from urlpath.

Parameters:
  • urlpath – Path to the table root directory (created by passing urlpath to CTable).

  • mode – 'r' (default): read-only; 'a': read/write.

Raises:
  • FileNotFoundError – If urlpath does not contain a CTable.

  • ValueError – If the metadata at urlpath does not identify a CTable.

add_column(name: str, spec: SchemaSpec, default, *, cparams: dict | None = None) None[source]

Add a new column filled with default for every existing live row.

Parameters:
  • name – Column name. Must follow the same naming rules as schema fields.

  • spec – A schema descriptor such as b2.int64(ge=0) or b2.string().

  • default – Value written to every existing live row. Must be coercible to spec’s dtype.

  • cparams – Optional compression parameters for this column’s NDArray.

Raises:
  • ValueError – If the table is read-only, is a view, or the column already exists.

  • TypeError – If default cannot be coerced to spec’s dtype.

add_computed_column(name: str, expr, *, dtype: dtype | None = None) None[source]

Add a read-only virtual column whose values are computed from other columns.

The column stores no data — it is evaluated on-the-fly when read. It participates in display, filtering, sorting, export (to_arrow / to_csv), and aggregates, but cannot be written to, indexed, or included in append / extend inputs.

Parameters:
  • name – Column name. Must not collide with any existing stored or computed column and must satisfy the usual naming rules.

  • expr – Either a callable (cols: dict[str, NDArray]) -> LazyExpr or an expression string (e.g. "price * qty") where column names are referenced directly and resolved from stored columns.

  • dtype – Override the inferred result dtype. When omitted the dtype is taken from the blosc2.LazyExpr.

Raises:
  • ValueError – If called on a view, the table is read-only, name already exists, or an operand is not a stored column of this table.

  • TypeError – If expr is not a callable or string, or does not return a blosc2.LazyExpr.

close() None[source]

Close any persistent backing store held by this table.

column_schema(name: str) CompiledColumn[source]

Return the CompiledColumn descriptor for name.

Raises:

KeyError – If name is not a column in this table.

compact_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]

Compact an index, merging any incremental append runs.

cov() ndarray[source]

Return the covariance matrix as a numpy array.

Only int, float, and bool columns are supported. Bool columns are cast to int (0/1) before computation. Complex columns raise TypeError.

Returns:

Shape (ncols, ncols). Column order matches col_names.

Return type:

numpy.ndarray

Raises:
  • TypeError – If any column has an unsupported dtype (complex, string, …).

  • ValueError – If the table has fewer than 2 live rows (covariance undefined).
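A sketch of the documented behaviour (bool columns cast to 0/1, complex rejected, output ordered like the input columns) using numpy.cov. Note that np.cov defaults to sample normalization (ddof=1); whether CTable.cov() uses sample or population normalization is not stated above, so treat that as an assumption:

```python
import numpy as np

def table_cov(cols: dict[str, np.ndarray]) -> np.ndarray:
    """Covariance matrix over numeric/bool columns; bools become 0/1."""
    stacked = []
    for name, col in cols.items():
        if np.issubdtype(col.dtype, np.complexfloating):
            raise TypeError(f"complex column not supported: {name!r}")
        if col.dtype == np.bool_:
            col = col.astype(np.int64)      # documented bool -> 0/1 cast
        elif not np.issubdtype(col.dtype, np.number):
            raise TypeError(f"unsupported dtype: {name!r} ({col.dtype})")
        stacked.append(col)
    data = np.vstack(stacked)               # one row per column
    if data.shape[1] < 2:
        raise ValueError("covariance needs at least two live rows")
    return np.cov(data)                     # (ncols, ncols), order preserved

m = table_cov({
    "a": np.array([1.0, 2.0, 3.0]),
    "b": np.array([2.0, 4.0, 6.0]),
    "flag": np.array([True, False, True]),
})
```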

create_index(col_name: str | None = None, *, field: str | None = None, expression: str | None = None, operands: dict | None = None, kind: IndexKind = IndexKind.BUCKET, optlevel: int = 5, name: str | None = None, build: str = 'auto', tmpdir: str | None = None, **kwargs) CTableIndex[source]

Build and register an index for a stored column or table expression.

describe() None[source]

Print a per-column statistical summary.

Numeric columns (int, float): count, mean, std, min, max. Bool columns: count, true-count, true-%. String columns: count, min (lex), max (lex), n-unique.

drop_column(name: str) None[source]

Remove a column from the table.

For on-disk tables, the corresponding persisted column leaf is deleted.

Raises:
  • ValueError – If the table is read-only, is a view, or name is the last column.

  • KeyError – If name does not exist.

drop_computed_column(name: str) None[source]

Remove a computed column from the table.

Parameters:

name – Name of the computed column to remove.

Raises:
  • KeyError – If name is not a computed column.

  • ValueError – If called on a view.

drop_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) None[source]

Remove an index and delete any sidecar files.

index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]

Return the index handle for a stored-column or expression target.

materialize_computed_column(name: str, *, new_name: str | None = None, dtype: dtype | None = None, cparams: dict | CParams | None = None) None[source]

Materialize a computed column into a new stored snapshot column.

Parameters:
  • name – Existing computed column to materialize.

  • new_name – Name of the new stored column. Defaults to f"{name}_stored".

  • dtype – Optional target dtype for the stored column. Defaults to the computed column dtype.

  • cparams – Optional compression parameters for the new stored column.

Raises:
  • ValueError – If called on a view, on a read-only table, or if the target name collides with an existing stored or computed column.

  • KeyError – If name is not a computed column.

  • TypeError – If dtype is incompatible with the computed values.

rebuild_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]

Drop and recreate an index with the same parameters.

rename_column(old: str, new: str) None[source]

Rename a column.

For on-disk tables, the corresponding persisted column leaf is renamed.

Raises:
  • ValueError – If the table is read-only, is a view, or new already exists.

  • KeyError – If old does not exist.

sample(n: int, *, seed: int | None = None) CTable[source]

Return a read-only view of n randomly chosen live rows.

Parameters:
  • n – Number of rows to sample. If n >= number of live rows, returns a view of the whole table.

  • seed – Optional random seed for reproducibility.

Returns:

A read-only view sharing columns with this table.

Return type:

CTable

save(urlpath: str, *, overwrite: bool = False) None[source]

Copy this (in-memory) table to disk at urlpath.

Only live rows are written — the on-disk table is always compacted.

Parameters:
  • urlpath – Destination directory path.

  • overwrite – If False (default), raise ValueError when urlpath already exists. Set to True to replace an existing table.

Raises:

ValueError – If urlpath already exists and overwrite=False, or if called on a view.

schema_dict() dict[str, Any][source]

Return a JSON-compatible dict describing this table’s schema.

select(cols: list[str]) CTable[source]

Return a column-projection view exposing only cols.

The returned object shares the underlying NDArrays with this table (no data is copied). Row filtering and value writes work as usual; structural mutations (add/drop/rename column, append, …) are blocked.

Parameters:

cols – Ordered list of column names to keep.

Raises:
  • KeyError – If any name in cols is not a column of this table.

  • ValueError – If cols is empty.
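The zero-copy sharing described above can be sketched with plain dicts of numpy arrays. ProjectionView is an illustrative stand-in, not the actual view class:

```python
import numpy as np

class ProjectionView:
    """Zero-copy column projection: shares the base table's arrays."""

    def __init__(self, base_cols: dict[str, np.ndarray], cols: list[str]):
        if not cols:
            raise ValueError("cols must not be empty")
        for c in cols:
            if c not in base_cols:
                raise KeyError(c)
        # Keep references, not copies, so value writes flow through.
        self._cols = {c: base_cols[c] for c in cols}

    def __getitem__(self, name: str) -> np.ndarray:
        return self._cols[name]

base = {"price": np.array([1.0, 2.0]), "qty": np.array([3, 4])}
view = ProjectionView(base, ["price"])
view["price"][0] = 9.0          # a value write through the view is seen by the base
```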

sort_by(cols: str | list[str], ascending: bool | list[bool] = True, *, inplace: bool = False) CTable[source]

Return a copy of the table sorted by one or more columns.

Parameters:
  • cols – Column name or list of column names to sort by. When multiple columns are given, the first is the primary key, the second is the tiebreaker, and so on.

  • ascending – Sort direction. A single bool applies to all keys; a list must have the same length as cols.

  • inplace – If True, rewrite the physical data in place and return self (like compact() but sorted). If False (default), return a new in-memory CTable leaving this one untouched.

Raises:
  • ValueError – If called on a view or a read-only table when inplace=True.

  • KeyError – If any column name is not found.

  • TypeError – If a column used as a sort key does not support ordering (e.g. complex numbers).
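The multi-key ordering above (first column primary, later columns tiebreakers) maps naturally onto numpy.lexsort, which takes its keys in reverse priority order. A sketch for numeric keys; the descending-by-negation trick assumes numeric dtypes and is only an illustration of the semantics:

```python
import numpy as np

def sort_order(cols: dict[str, np.ndarray],
               by: list[str],
               ascending: list[bool]) -> np.ndarray:
    """Row permutation sorting by by[0] first, by[1] as tiebreaker, and so on."""
    keys = []
    for name, asc in zip(by, ascending):
        col = cols[name]
        keys.append(col if asc else -col)   # numeric descending via negation
    # lexsort treats its *last* key as primary, so reverse the list.
    return np.lexsort(keys[::-1])

cols = {"grp": np.array([2, 1, 2, 1]), "val": np.array([5, 7, 3, 9])}
order = sort_order(cols, ["grp", "val"], [True, False])  # grp asc, val desc
```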

to_arrow()[source]

Convert all live rows to a pyarrow.Table.

Each column is materialized via col[:] and wrapped in a pyarrow.array. String columns are emitted as pa.string() (variable-length UTF-8); bytes columns as pa.large_binary().

Raises:

ImportError – If pyarrow is not installed.

to_csv(path: str, *, header: bool = True, sep: str = ',') None[source]

Write all live rows to a CSV file.

Uses Python’s stdlib csv module — no extra dependency required. Each column is materialized once via col[:]; rows are then written one at a time.

Parameters:
  • path – Destination file path. Created or overwritten.

  • header – If True (default), write column names as the first row.

  • sep – Field delimiter. Defaults to ","; use "\t" for TSV.

property cbytes: int

Total compressed size in bytes (all columns + valid_rows mask).

property computed_columns: dict[str, dict]

Read-only view of the computed-column definitions.

Each value is a dict with keys expression, col_deps, lazy (blosc2.LazyExpr), and dtype.

property cratio: float

Compression ratio for the whole table payload.

property indexes: list[CTableIndex]

Return a list of CTableIndex handles for all active indexes.

property info: _CTableInfoReporter

Get information about this table.

Examples

>>> print(t.info)
>>> t.info()
property info_items: list[tuple[str, object]]

Structured summary items used by info().

property nbytes: int

Total uncompressed size in bytes (all columns + valid_rows mask).

property ncols: int

Total number of columns, including computed (virtual) columns.

property schema: CompiledSchema

The compiled schema that drives this table’s columns and validation.

Construction

CTable.__init__(row_type[, new_data, ...])

CTable.open(urlpath, *[, mode])

Open a persistent CTable from urlpath.

CTable.load(urlpath)

Load a persistent table from urlpath into RAM.

CTable.from_arrow(arrow_table)

Build a CTable from a pyarrow.Table.

CTable.from_csv(path, row_cls, *[, header, sep])

Build a CTable from a CSV file.


Attributes

CTable.col_names

CTable.computed_columns

Read-only view of the computed-column definitions.

CTable.nrows

CTable.ncols

Total number of columns, including computed (virtual) columns.

CTable.cbytes

Total compressed size in bytes (all columns + valid_rows mask).

CTable.nbytes

Total uncompressed size in bytes (all columns + valid_rows mask).

CTable.schema

The compiled schema that drives this table's columns and validation.

property CTable.nrows: int
property CTable.nrows: int

Inserting data

CTable.append(data)

CTable.extend(data, *[, validate])

CTable.append(data: list | void | ndarray) None[source]
CTable.extend(data: list | CTable | Any, *, validate: bool | None = None) None[source]

Querying

CTable.where(**kwargs)

CTable.select(cols)

Return a column-projection view exposing only cols.

CTable.head([N])

CTable.tail([N])

CTable.sample(n, *[, seed])

Return a read-only view of n randomly chosen live rows.

CTable.sort_by(cols[, ascending, inplace])

Return a copy of the table sorted by one or more columns.

CTable.where(**kwargs)[source]

CTable.head(N: int = 5) CTable[source]
CTable.tail(N: int = 5) CTable[source]


Aggregates & statistics

CTable.describe()

Print a per-column statistical summary.

CTable.cov()

Return the covariance matrix as a numpy array.


Mutations

In addition to physical schema changes such as CTable.add_column(), CTables can host computed columns backed by a lazy expression over stored columns. Computed columns are read-only, use no extra storage, participate in display, filtering, sorting, and aggregates, and are persisted across CTable.save(), CTable.load(), and CTable.open().

When a computed result should become a normal stored column, use CTable.materialize_computed_column(). The materialized column is a stored snapshot that can be indexed like any other stored column. New rows inserted later via CTable.append() or CTable.extend() auto-fill omitted materialized-column values from the recorded expression metadata.

CTable indexes can also target direct expressions over stored columns via create_index(expression=...). This lets queries reuse indexes for derived predicates without adding either a computed column or a materialized stored one. A matching FULL direct-expression index can also be reused by ordering paths such as CTable.sort_by() when sorting by a computed column backed by the same expression.
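The computed-column life-cycle described above (a virtual column evaluated on read, later frozen into a stored snapshot) can be sketched in plain Python with numpy arrays standing in for stored columns. All names here are illustrative, not the blosc2 API:

```python
import numpy as np

stored = {"price": np.array([2.0, 3.0]), "qty": np.array([4, 5])}
computed = {"total": lambda c: c["price"] * c["qty"]}   # evaluated on read, no storage

def read_column(name: str) -> np.ndarray:
    if name in computed:
        return computed[name](stored)       # lazy: recomputed from current values
    return stored[name]

# Materialize: snapshot the current computed values into a stored column.
stored["total_stored"] = read_column("total").copy()

stored["price"][0] = 10.0                   # a later edit changes the computed
                                            # view but not the stored snapshot
```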

CTable.delete(ind)

CTable.compact()

CTable.add_column(name, spec, default, *[, ...])

Add a new column filled with default for every existing live row.

CTable.add_computed_column(name, expr, *[, ...])

Add a read-only virtual column whose values are computed from other columns.

CTable.materialize_computed_column(name, *)

Materialize a computed column into a new stored snapshot column.

CTable.drop_computed_column(name)

Remove a computed column from the table.

CTable.drop_column(name)

Remove a column from the table.

CTable.rename_column(old, new)

Rename a column.

CTable.delete(ind: int | slice | str | Iterable) None[source]
CTable.compact()[source]

Persistence

CTable.save(urlpath, *[, overwrite])

Copy this (in-memory) table to disk at urlpath.

CTable.to_csv(path, *[, header, sep])

Write all live rows to a CSV file.

CTable.to_arrow()

Convert all live rows to a pyarrow.Table.


Inspection

CTable.info

Get information about this table.

CTable.schema_dict()

Return a JSON-compatible dict describing this table's schema.

CTable.column_schema(name)

Return the CompiledColumn descriptor for name.



Column

A lazy column accessor returned by table["col_name"] or table.col_name. All index operations and aggregates apply the table’s tombstone mask (_valid_rows) so deleted rows are silently excluded.
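The tombstone-mask behaviour can be sketched with a boolean valid-rows array; deleted rows simply drop out of every read and aggregate:

```python
import numpy as np

values = np.array([10, 20, 30, 40])
valid_rows = np.array([True, False, True, True])   # row 1 is deleted (tombstoned)

live = values[valid_rows]       # every read goes through the mask
mean = live.mean()              # aggregates see only live rows
second = live[1]                # logical index 1 maps to physical row 2
```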

class blosc2.Column(table: CTable, col_name: str, mask=None)[source]
Attributes:
dtype
is_computed

True if this column is a virtual computed column (read-only).

null_value

The sentinel value that represents NULL for this column, or None.

view

Return a ColumnViewIndexer for creating logical sub-views.

Methods

all()

Return True if every live, non-null value is True.

any()

Return True if at least one live, non-null value is True.

assign(data)

Replace all live values in this column with data.

is_null()

Return a boolean array True where the live value is the null sentinel.

iter_chunks([size])

Iterate over live column values in chunks of size rows.

max()

Maximum live, non-null value.

mean()

Arithmetic mean of all live, non-null values.

min()

Minimum live, non-null value.

notnull()

Return a boolean array True where the live value is not the null sentinel.

null_count()

Return the number of live rows whose value equals the null sentinel.

std([ddof])

Standard deviation of all live, non-null values (single-pass, Welford's algorithm).

sum()

Sum of all live, non-null values.

unique()

Return sorted array of unique live, non-null values.

value_counts()

Return a {value: count} dict sorted by count descending.

Special methods

Column.__len__()

Column.__iter__()

Column.__getitem__(key)

Return values for the given logical index.

Column.__setitem__(key, value)

__len__()[source]
__iter__()[source]
__getitem__(key: int | slice | list | ndarray)[source]

Return values for the given logical index.

  • int → scalar

  • slice → numpy.ndarray

  • list / np.ndarray → numpy.ndarray

  • bool np.ndarray → numpy.ndarray

For a writable logical sub-view use view.

__setitem__(key: int | slice | list | ndarray, value)[source]
all() bool[source]

Return True if every live, non-null value is True.

Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first False found.

any() bool[source]

Return True if at least one live, non-null value is True.

Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first True found.

assign(data) None[source]

Replace all live values in this column with data.

Works on both full tables and views — on a view, only the rows visible through the view’s mask are overwritten.

Parameters:

data – List, numpy array, or any iterable. Must have exactly as many elements as there are live rows in this column. Values are coerced to the column’s dtype if possible.

Raises:
  • ValueError – If len(data) does not match the number of live rows, or the table is opened read-only.

  • TypeError – If values cannot be coerced to the column’s dtype.

is_null() ndarray[source]

Return a boolean array True where the live value is the null sentinel.

iter_chunks(size: int = 65536)[source]

Iterate over live column values in chunks of size rows.

Yields numpy arrays of at most size elements each, skipping deleted rows. The last chunk may be smaller than size.

Parameters:

size – Number of live rows per yielded chunk. Defaults to 65 536.

Yields:

numpy.ndarray – A 1-D array of up to size live values with this column’s dtype.

Examples

>>> for chunk in t["score"].iter_chunks(size=100_000):
...     process(chunk)
max()[source]

Maximum live, non-null value.

Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.

mean() float[source]

Arithmetic mean of all live, non-null values.

Supported dtypes: bool, int, uint, float. Null sentinel values are skipped. Always returns a Python float.

min()[source]

Minimum live, non-null value.

Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.

notnull() ndarray[source]

Return a boolean array True where the live value is not the null sentinel.

null_count() int[source]

Return the number of live rows whose value equals the null sentinel.

Returns 0 in O(1) if no null_value is configured for this column.
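The three nullable helpers reduce to comparisons against the sentinel. A NumPy-only sketch, assuming a hypothetical column whose null_value is -1:

```python
import numpy as np

null_value = -1                       # hypothetical sentinel for this column
values = np.array([3, -1, 7, -1, 2])  # live values, two nulls

is_null = values == null_value   # what is_null() returns
notnull = ~is_null               # what notnull() returns
null_count = int(is_null.sum())  # what null_count() returns
```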

std(ddof: int = 0) float[source]

Standard deviation of all live, non-null values (single-pass, Welford’s algorithm).

Parameters:
  • ddof – Delta degrees of freedom. 0 (default) gives the population std; 1 gives the sample std (divides by N-1).

Supported dtypes: bool, int, uint, float. Null sentinel values are skipped. Always returns a Python float.
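The single-pass Welford update mentioned above can be sketched in pure Python. This is an illustrative reimplementation of the algorithm, not the library's code:

```python
def welford_std(values, ddof=0):
    """One-pass standard deviation (Welford); divides by n - ddof."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return (m2 / (n - ddof)) ** 0.5

pop = welford_std([1.0, 2.0, 3.0, 4.0])           # population std (ddof=0)
samp = welford_std([1.0, 2.0, 3.0, 4.0], ddof=1)  # sample std (divides by n-1)
```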

sum()[source]

Sum of all live, non-null values.

Supported dtypes: bool, int, uint, float, complex. Bool values are counted as 0 / 1. Null sentinel values are skipped.

unique() ndarray[source]

Return sorted array of unique live, non-null values.

Null sentinel values are excluded. Processes data in chunks — never loads the full column at once.
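The chunked strategy can be sketched with NumPy alone: take per-chunk uniques, then a final unique over the merged partials. A sketch of the approach, not the actual implementation:

```python
import numpy as np

def chunked_unique(chunks):
    """Sorted unique values without holding the full column in memory."""
    partials = np.concatenate([np.unique(c) for c in chunks])
    return np.unique(partials)

chunks = [np.array([3, 1, 3]), np.array([2, 1]), np.array([5, 2])]
chunked_unique(chunks)  # array([1, 2, 3, 5])
```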

value_counts() dict[source]

Return a {value: count} dict sorted by count descending.

Null sentinel values are excluded. Processes data in chunks — never loads the full column at once.

Example

>>> t["active"].value_counts()
{True: 8432, False: 1568}
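The chunk-at-a-time counting can be sketched with a collections.Counter accumulated per chunk, then sorted by count descending (illustrative only):

```python
from collections import Counter

def chunked_value_counts(chunks):
    """{value: count} over all chunks, sorted by count descending."""
    counts = Counter()
    for chunk in chunks:
        counts.update(chunk)
    return dict(counts.most_common())

chunked_value_counts([[True, True, False], [True]])  # {True: 3, False: 1}
```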
property is_computed: bool

True if this column is a virtual computed column (read-only).

property null_value

The sentinel value that represents NULL for this column, or None.

property view: ColumnViewIndexer

Return a ColumnViewIndexer for creating logical sub-views.

Examples

Read a sub-view for chained aggregates:

sub = t.price.view[2:10]
sub.sum()

Bulk write through a sub-view:

t.price.view[0:5][:] = np.zeros(5)

Attributes

Column.dtype

Column.null_value

The sentinel value that represents NULL for this column, or None.

Data access

Column.view

Return a ColumnViewIndexer for creating logical sub-views.

Column.iter_chunks([size])

Iterate over live column values in chunks of size rows.

Column.assign(data)

Replace all live values in this column with data.

Nullable helpers

Column.is_null()

Return a boolean array True where the live value is the null sentinel.

Column.notnull()

Return a boolean array True where the live value is not the null sentinel.

Column.null_count()

Return the number of live rows whose value equals the null sentinel.

Unique values

Column.unique()

Return sorted array of unique live, non-null values.

Column.value_counts()

Return a {value: count} dict sorted by count descending.

Aggregates

Null sentinel values are automatically excluded from all aggregates.

Column.sum()

Sum of all live, non-null values.

Column.min()

Minimum live, non-null value.

Column.max()

Maximum live, non-null value.

Column.mean()

Arithmetic mean of all live, non-null values.

Column.std([ddof])

Standard deviation of all live, non-null values (single-pass, Welford's algorithm).

Column.any()

Return True if at least one live, non-null value is True.

Column.all()

Return True if every live, non-null value is True.

Schema Specs

Schema specs are passed to field() to declare a column’s type, storage constraints, and optional null sentinel. They are also available directly in the blosc2 namespace (e.g. blosc2.int64).

blosc2.field(spec: SchemaSpec, *, default=MISSING, cparams: dict[str, Any] | None = None, dparams: dict[str, Any] | None = None, chunks: tuple[int, ...] | None = None, blocks: tuple[int, ...] | None = None) Field[source]

Attach a Blosc2 schema spec and per-column storage options to a dataclass field.

Parameters:
  • spec – A schema descriptor such as b2.int64(ge=0) or b2.float64().

  • default – Default value for the field. Omit for required fields.

  • cparams – Compression parameters for this column’s NDArray.

  • dparams – Decompression parameters for this column’s NDArray.

  • chunks – Chunk shape for this column’s NDArray.

  • blocks – Block shape for this column’s NDArray.

Examples

>>> from dataclasses import dataclass
>>> import blosc2 as b2
>>> @dataclass
... class Row:
...     id: int = b2.field(b2.int64(ge=0))
...     score: float = b2.field(b2.float64(ge=0, le=100))
...     active: bool = b2.field(b2.bool(), default=True)

Numeric

int8(*[, ge, gt, le, lt, null_value])

8-bit signed integer column (−128 … 127).

int16(*[, ge, gt, le, lt, null_value])

16-bit signed integer column (−32 768 … 32 767).

int32(*[, ge, gt, le, lt, null_value])

32-bit signed integer column (−2 147 483 648 … 2 147 483 647).

int64(*[, ge, gt, le, lt, null_value])

64-bit signed integer column.

uint8(*[, ge, gt, le, lt, null_value])

8-bit unsigned integer column (0 … 255).

uint16(*[, ge, gt, le, lt, null_value])

16-bit unsigned integer column (0 … 65 535).

uint32(*[, ge, gt, le, lt, null_value])

32-bit unsigned integer column (0 … 4 294 967 295).

uint64(*[, ge, gt, le, lt, null_value])

64-bit unsigned integer column.

float32(*[, ge, gt, le, lt, null_value])

32-bit floating-point column (single precision).

float64(*[, ge, gt, le, lt, null_value])

64-bit floating-point column (double precision).

class blosc2.int8(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

8-bit signed integer column (−128 … 127).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of int8

class blosc2.int16(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

16-bit signed integer column (−32 768 … 32 767).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of int16

class blosc2.int32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

32-bit signed integer column (−2 147 483 648 … 2 147 483 647).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of int32

class blosc2.int64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

64-bit signed integer column.

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of int64

class blosc2.uint8(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

8-bit unsigned integer column (0 … 255).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of uint8

class blosc2.uint16(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

16-bit unsigned integer column (0 … 65 535).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of uint16

class blosc2.uint32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

32-bit unsigned integer column (0 … 4 294 967 295).

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of uint32

class blosc2.uint64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

64-bit unsigned integer column.

Methods

python_type

alias of int

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of uint64

class blosc2.float32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

32-bit floating-point column (single precision).

Methods

python_type

alias of float

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of float32

class blosc2.float64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]

64-bit floating-point column (double precision).

Methods

python_type

alias of float

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of float64

Complex

complex64()

64-bit complex number column (two 32-bit floats).

complex128()

128-bit complex number column (two 64-bit floats).

class blosc2.complex64[source]

64-bit complex number column (two 32-bit floats).

Methods

python_type

alias of complex

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of complex64

class blosc2.complex128[source]

128-bit complex number column (two 64-bit floats).

Methods

python_type

alias of complex

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of complex128

Boolean

bool()

Boolean column.

class blosc2.bool[source]

Boolean column.

Methods

python_type

alias of bool

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

type

alias of bool

Text & binary

string(*[, min_length, max_length, pattern, ...])

Fixed-width Unicode string column.

bytes(*[, min_length, max_length, null_value])

Fixed-width bytes column.

class blosc2.string(*, min_length=None, max_length=None, pattern=None, null_value=None)[source]

Fixed-width Unicode string column.

Parameters:
  • max_length – Maximum number of characters. Determines the NumPy U<n> dtype. Defaults to 32 if not specified.

  • min_length – Minimum number of characters (validation only, no effect on dtype).

  • pattern – Regex pattern the value must match (validation only).
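The effect of max_length on the resulting U<n> dtype can be observed with NumPy directly. This only demonstrates NumPy's fixed-width string behavior, not the string() spec itself (which validates lengths rather than silently truncating):

```python
import numpy as np

arr = np.zeros(3, dtype="U4")  # max_length=4 -> fixed-width '<U4' storage
arr[0] = "abcdef"              # NumPy truncates values longer than 4 chars
arr[0]                         # 'abcd'
arr.itemsize                   # 16 bytes: 4 chars x 4 bytes each (UCS-4)
```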

Methods

python_type

alias of str

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.

class blosc2.bytes(*, min_length=None, max_length=None, null_value=None)[source]

Fixed-width bytes column.

Parameters:
  • max_length – Maximum number of bytes. Determines the NumPy S<n> dtype. Defaults to 32 if not specified.

  • min_length – Minimum number of bytes (validation only, no effect on dtype).

Methods

python_type

alias of bytes

to_metadata_dict()

Return a JSON-compatible dict for schema serialization.

to_pydantic_kwargs()

Return kwargs for building a Pydantic field annotation.