CTable¶
A columnar compressed table backed by one NDArray per column.
Each column is stored, compressed, and queried independently; rows are never
materialised in their entirety unless you explicitly call to_arrow()
or iterate with __iter__().
- class blosc2.CTable(row_type: type[RowT], new_data=None, *, urlpath: str | None = None, mode: str = 'a', expected_size: int | None = None, compact: bool = False, validate: bool = True, cparams: dict[str, Any] | None = None, dparams: dict[str, Any] | None = None)[source]¶
- Attributes:
cbytes: Total compressed size in bytes (all columns + valid_rows mask).
computed_columns: Read-only view of the computed-column definitions.
cratio: Compression ratio for the whole table payload.
indexes: Return a list of CTableIndex handles for all active indexes.
info: Get information about this table.
info_items: Structured summary items used by info().
nbytes: Total uncompressed size in bytes (all columns + valid_rows mask).
ncols: Total number of columns, including computed (virtual) columns.
nrows
schema: The compiled schema that drives this table’s columns and validation.
Methods
add_column(name, spec, default, *[, cparams]): Add a new column filled with default for every existing live row.
add_computed_column(name, expr, *[, dtype]): Add a read-only virtual column whose values are computed from other columns.
close(): Close any persistent backing store held by this table.
column_schema(name): Return the CompiledColumn descriptor for name.
compact_index([col_name, expression, name]): Compact an index, merging any incremental append runs.
cov(): Return the covariance matrix as a numpy array.
create_index([col_name, field, expression, ...]): Build and register an index for a stored column or table expression.
describe(): Print a per-column statistical summary.
drop_column(name): Remove a column from the table.
drop_computed_column(name): Remove a computed column from the table.
drop_index([col_name, expression, name]): Remove an index and delete any sidecar files.
from_arrow(arrow_table): Build a CTable from a pyarrow.Table.
from_csv(path, row_cls, *[, header, sep]): Build a CTable from a CSV file.
index([col_name, expression, name]): Return the index handle for a stored-column or expression target.
load(urlpath): Load a persistent table from urlpath into RAM.
materialize_computed_column(name, *[, ...]): Materialize a computed column into a new stored snapshot column.
open(urlpath, *[, mode]): Open a persistent CTable from urlpath.
rebuild_index([col_name, expression, name]): Drop and recreate an index with the same parameters.
rename_column(old, new): Rename a column.
sample(n, *[, seed]): Return a read-only view of n randomly chosen live rows.
save(urlpath, *[, overwrite]): Copy this (in-memory) table to disk at urlpath.
schema_dict(): Return a JSON-compatible dict describing this table's schema.
select(cols): Return a column-projection view exposing only cols.
sort_by(cols[, ascending, inplace]): Return a copy of the table sorted by one or more columns.
to_arrow(): Convert all live rows to a pyarrow.Table.
to_csv(path, *[, header, sep]): Write all live rows to a CSV file.
append
compact
delete
extend
head
tail
view
where
Special methods
__repr__(): Return repr(self).
__str__(): Return str(self).
- classmethod from_arrow(arrow_table) CTable[source]¶
Build a CTable from a pyarrow.Table.
Schema is inferred from the Arrow field types. String columns (pa.string(), pa.large_string()) are stored with max_length set to the longest value found in the data.
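The max_length inference described above can be sketched in plain Python. This is a hypothetical helper, not the library's code, and measuring length in UTF-8 bytes (rather than characters) is an assumption:

```python
# Sketch: pick the stored max_length for a string column the way the
# docs describe -- the longest value found in the data.
def infer_max_length(values: list[str]) -> int:
    # Length measured in encoded UTF-8 bytes (assumption, not confirmed
    # by the docs; character count would differ for non-ASCII data).
    return max((len(v.encode("utf-8")) for v in values), default=0)

print(infer_max_length(["ab", "hello", "é"]))  # "hello" -> 5
```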
- classmethod from_csv(path: str, row_cls, *, header: bool = True, sep: str = ',') CTable[source]¶
Build a CTable from a CSV file.
Schema comes from row_cls (a dataclass) — CTable is always typed. All rows are read in a single pass into per-column Python lists, then each column is bulk-written into a pre-allocated NDArray (one slice assignment per column, no extend()).
- Parameters:
path¶ – Source CSV file path.
row_cls¶ – A dataclass whose fields define the column names and types.
header¶ – If True (default), the first row is treated as a header and skipped. Column order in the file must match row_cls field order regardless.
sep¶ – Field delimiter. Defaults to ","; use "\t" for TSV.
- Returns:
A new in-memory CTable containing all rows from the CSV file.
- Return type:
CTable
- Raises:
TypeError – If row_cls is not a dataclass.
ValueError – If a row has a different number of fields than the schema.
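The single-pass columnization strategy described above can be sketched with the stdlib csv module. This is illustrative only: Row and columnize_csv are hypothetical names, and the real implementation bulk-writes each list into a pre-allocated NDArray rather than keeping plain lists:

```python
import csv
import io
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Row:            # hypothetical row type, for illustration only
    id: int
    score: float

def columnize_csv(fileobj, row_cls, *, header=True, sep=","):
    """Read all CSV rows in one pass into per-column Python lists."""
    if not is_dataclass(row_cls):
        raise TypeError("row_cls must be a dataclass")
    specs = fields(row_cls)
    cols = {f.name: [] for f in specs}
    reader = csv.reader(fileobj, delimiter=sep)
    if header:
        next(reader, None)   # skip header; column order must match anyway
    for row in reader:
        if len(row) != len(specs):
            raise ValueError("row has a different number of fields than the schema")
        for spec, raw in zip(specs, row):
            cols[spec.name].append(spec.type(raw))  # coerce to the field's type
    return cols

data = io.StringIO("id,score\n1,2.5\n2,3.5\n")
print(columnize_csv(data, Row))  # {'id': [1, 2], 'score': [2.5, 3.5]}
```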
- classmethod load(urlpath: str) CTable[source]¶
Load a persistent table from urlpath into RAM.
The schema is read from the table’s metadata — the original Python dataclass is not required. The returned table is fully in-memory and read/write.
- Parameters:
urlpath¶ – Path to the table root directory.
- Raises:
FileNotFoundError – If urlpath does not contain a CTable.
ValueError – If the metadata at urlpath does not identify a CTable.
- classmethod open(urlpath: str, *, mode: str = 'r') CTable[source]¶
Open a persistent CTable from urlpath.
- add_column(name: str, spec: SchemaSpec, default, *, cparams: dict | None = None) None[source]¶
Add a new column filled with default for every existing live row.
- Parameters:
- Raises:
ValueError – If the table is read-only, is a view, or the column already exists.
TypeError – If default cannot be coerced to spec’s dtype.
- add_computed_column(name: str, expr, *, dtype: dtype | None = None) None[source]¶
Add a read-only virtual column whose values are computed from other columns.
The column stores no data — it is evaluated on-the-fly when read. It participates in display, filtering, sorting, export (to_arrow / to_csv), and aggregates, but cannot be written to, indexed, or included in append / extend inputs.
- Parameters:
name¶ – Column name. Must not collide with any existing stored or computed column and must satisfy the usual naming rules.
expr¶ – Either a callable (cols: dict[str, NDArray]) -> LazyExpr or an expression string (e.g. "price * qty") where column names are referenced directly and resolved from stored columns.
dtype¶ – Override the inferred result dtype. When omitted the dtype is taken from the blosc2.LazyExpr.
- Raises:
ValueError – If called on a view, the table is read-only, name already exists, or an operand is not a stored column of this table.
TypeError – If expr is not a callable or string, or does not return a blosc2.LazyExpr.
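How an expression string such as "price * qty" resolves column names can be illustrated with a plain-Python sketch. The real implementation builds a blosc2.LazyExpr over NDArrays rather than doing a per-row eval, so this is only a model of the semantics:

```python
# Sketch: resolve an expression string against stored columns by
# exposing each column name as a variable (illustration only).
def eval_expression(expr: str, columns: dict[str, list[float]]) -> list[float]:
    n = len(next(iter(columns.values())))
    # Evaluate row by row with builtins disabled for safety.
    return [eval(expr, {"__builtins__": {}}, {k: v[i] for k, v in columns.items()})
            for i in range(n)]

cols = {"price": [2.0, 3.0], "qty": [4.0, 5.0]}
print(eval_expression("price * qty", cols))  # [8.0, 15.0]
```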
- column_schema(name: str) CompiledColumn[source]¶
Return the CompiledColumn descriptor for name.
- Raises:
KeyError – If name is not a column in this table.
- compact_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]¶
Compact an index, merging any incremental append runs.
- cov() ndarray[source]¶
Return the covariance matrix as a numpy array.
Only int, float, and bool columns are supported. Bool columns are cast to int (0/1) before computation. Complex columns raise TypeError.
- Returns:
Shape (ncols, ncols). Column order matches col_names.
- Return type:
numpy.ndarray
- Raises:
TypeError – If any column has an unsupported dtype (complex, string, …).
ValueError – If the table has fewer than 2 live rows (covariance undefined).
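The documented cov() semantics (bools cast to 0/1, complex rejected, an ncols × ncols result in column order) can be sketched without numpy. Whether the library normalizes by n − 1 as below is an assumption:

```python
# Sketch of cov() semantics: cast bools, reject complex, build a full
# symmetric matrix. Uses sample covariance (ddof=1) -- an assumption.
def cov_matrix(columns: dict[str, list]) -> list[list[float]]:
    names = list(columns)
    data = []
    for name in names:
        vals = columns[name]
        if any(isinstance(v, complex) for v in vals):
            raise TypeError(f"unsupported dtype for column {name!r}")
        data.append([float(v) for v in vals])   # bool -> 0.0 / 1.0
    n = len(data[0])
    if n < 2:
        raise ValueError("covariance undefined for fewer than 2 rows")
    means = [sum(col) / n for col in data]
    return [[sum((data[i][k] - means[i]) * (data[j][k] - means[j])
                 for k in range(n)) / (n - 1)
             for j in range(len(names))]
            for i in range(len(names))]

m = cov_matrix({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})
print(m[0][1])  # 2.0: x and y co-vary perfectly
```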
- create_index(col_name: str | None = None, *, field: str | None = None, expression: str | None = None, operands: dict | None = None, kind: IndexKind = IndexKind.BUCKET, optlevel: int = 5, name: str | None = None, build: str = 'auto', tmpdir: str | None = None, **kwargs) CTableIndex[source]¶
Build and register an index for a stored column or table expression.
- describe() None[source]¶
Print a per-column statistical summary.
Numeric columns (int, float): count, mean, std, min, max. Bool columns: count, true-count, true-%. String columns: count, min (lex), max (lex), n-unique.
- drop_column(name: str) None[source]¶
Remove a column from the table.
For on-disk tables, the corresponding persisted column leaf is deleted.
- Raises:
ValueError – If the table is read-only, is a view, or name is the last column.
KeyError – If name does not exist.
- drop_computed_column(name: str) None[source]¶
Remove a computed column from the table.
- Parameters:
name¶ – Name of the computed column to remove.
- Raises:
KeyError – If name is not a computed column.
ValueError – If called on a view.
- drop_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) None[source]¶
Remove an index and delete any sidecar files.
- index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]¶
Return the index handle for a stored-column or expression target.
- materialize_computed_column(name: str, *, new_name: str | None = None, dtype: dtype | None = None, cparams: dict | CParams | None = None) None[source]¶
Materialize a computed column into a new stored snapshot column.
- Parameters:
- Raises:
ValueError – If called on a view, on a read-only table, or if the target name collides with an existing stored or computed column.
KeyError – If name is not a computed column.
TypeError – If dtype is incompatible with the computed values.
- rebuild_index(col_name: str | None = None, *, expression: str | None = None, name: str | None = None) CTableIndex[source]¶
Drop and recreate an index with the same parameters.
- rename_column(old: str, new: str) None[source]¶
Rename a column.
For on-disk tables, the corresponding persisted column leaf is renamed.
- Raises:
ValueError – If the table is read-only, is a view, or new already exists.
KeyError – If old does not exist.
- sample(n: int, *, seed: int | None = None) CTable[source]¶
Return a read-only view of n randomly chosen live rows.
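The seeded-sampling behaviour can be sketched over a validity mask. sample_live_rows is a hypothetical helper, and the reproducible-when-seeded semantics are assumed:

```python
import random

# Sketch of sample(n, seed=...): choose n distinct *live* row positions,
# reproducibly when a seed is given.
def sample_live_rows(valid_rows: list[bool], n: int, *, seed=None) -> list[int]:
    live = [i for i, ok in enumerate(valid_rows) if ok]   # skip tombstones
    rng = random.Random(seed)
    return sorted(rng.sample(live, n))

picked = sample_live_rows([True, False, True, True, True], 3, seed=42)
print(picked)  # three distinct live positions, never index 1
```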
- save(urlpath: str, *, overwrite: bool = False) None[source]¶
Copy this (in-memory) table to disk at urlpath.
Only live rows are written — the on-disk table is always compacted.
- select(cols: list[str]) CTable[source]¶
Return a column-projection view exposing only cols.
The returned object shares the underlying NDArrays with this table (no data is copied). Row filtering and value writes work as usual; structural mutations (add/drop/rename column, append, …) are blocked.
- Parameters:
cols¶ – Ordered list of column names to keep.
- Raises:
KeyError – If any name in cols is not a column of this table.
ValueError – If cols is empty.
- sort_by(cols: str | list[str], ascending: bool | list[bool] = True, *, inplace: bool = False) CTable[source]¶
Return a copy of the table sorted by one or more columns.
- Parameters:
cols¶ – Column name or list of column names to sort by. When multiple columns are given, the first is the primary key, the second is the tiebreaker, and so on.
ascending¶ – Sort direction. A single bool applies to all keys; a list must have the same length as cols.
inplace¶ – If True, rewrite the physical data in place and return self (like compact() but sorted). If False (default), return a new in-memory CTable leaving this one untouched.
- Raises:
ValueError – If called on a view or a read-only table when inplace=True.
KeyError – If any column name is not found.
TypeError – If a column used as a sort key does not support ordering (e.g. complex numbers).
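Multi-key ordering with a per-key direction list can be sketched with Python's stable sort: sort by the last key first, then by each earlier key, so the first column ends up as the primary key. This operates on plain dict rows, not a CTable, and is illustration only:

```python
# Sketch of sort_by(cols, ascending=...): repeated stable sorts from the
# last key to the first give the documented primary/tiebreaker order.
def sort_rows(rows, keys, ascending=True):
    if isinstance(ascending, bool):
        ascending = [ascending] * len(keys)
    if len(ascending) != len(keys):
        raise ValueError("ascending must match the number of sort keys")
    out = list(rows)
    for key, asc in reversed(list(zip(keys, ascending))):
        out.sort(key=lambda r: r[key], reverse=not asc)
    return out

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 1}, {"a": 0, "b": 9}]
print(sort_rows(rows, ["a", "b"], [True, False]))
# a ascending as primary key, b descending as tiebreaker
```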
- to_arrow()[source]¶
Convert all live rows to a pyarrow.Table.
Each column is materialized via col[:] and wrapped in a pyarrow.array. String columns are emitted as pa.string() (variable-length UTF-8); bytes columns as pa.large_binary().
- Raises:
ImportError – If pyarrow is not installed.
- to_csv(path: str, *, header: bool = True, sep: str = ',') None[source]¶
Write all live rows to a CSV file.
Uses Python’s stdlib csv module — no extra dependency required. Each column is materialised once via col[:]; rows are then written one at a time.
- property cbytes: int¶
Total compressed size in bytes (all columns + valid_rows mask).
- property computed_columns: dict[str, dict]¶
Read-only view of the computed-column definitions.
Each value is a dict with keys expression, col_deps, lazy (blosc2.LazyExpr), and dtype.
- property cratio: float¶
Compression ratio for the whole table payload.
- property indexes: list[CTableIndex]¶
Return a list of CTableIndex handles for all active indexes.
- property info: _CTableInfoReporter¶
Get information about this table.
Examples
>>> print(t.info)
>>> t.info()
- property nbytes: int¶
Total uncompressed size in bytes (all columns + valid_rows mask).
- property ncols: int¶
Total number of columns, including computed (virtual) columns.
- property schema: CompiledSchema¶
The compiled schema that drives this table’s columns and validation.
Construction¶
CTable.open(urlpath, *[, mode]): Open a persistent CTable from urlpath.
CTable.load(urlpath): Load a persistent table from urlpath into RAM.
CTable.from_arrow(arrow_table): Build a CTable from a pyarrow.Table.
CTable.from_csv(path, row_cls, *[, header, sep]): Build a CTable from a CSV file.
- CTable.__init__(row_type: type[RowT], new_data=None, *, urlpath: str | None = None, mode: str = 'a', expected_size: int | None = None, compact: bool = False, validate: bool = True, cparams: dict[str, Any] | None = None, dparams: dict[str, Any] | None = None) None[source]¶
- classmethod CTable.open(urlpath: str, *, mode: str = 'r') CTable[source]¶
Open a persistent CTable from urlpath.
- classmethod CTable.load(urlpath: str) CTable[source]¶
Load a persistent table from urlpath into RAM.
The schema is read from the table’s metadata — the original Python dataclass is not required. The returned table is fully in-memory and read/write.
- Parameters:
urlpath¶ – Path to the table root directory.
- Raises:
FileNotFoundError – If urlpath does not contain a CTable.
ValueError – If the metadata at urlpath does not identify a CTable.
- classmethod CTable.from_arrow(arrow_table) CTable[source]¶
Build a CTable from a pyarrow.Table.
Schema is inferred from the Arrow field types. String columns (pa.string(), pa.large_string()) are stored with max_length set to the longest value found in the data.
- classmethod CTable.from_csv(path: str, row_cls, *, header: bool = True, sep: str = ',') CTable[source]¶
Build a CTable from a CSV file.
Schema comes from row_cls (a dataclass) — CTable is always typed. All rows are read in a single pass into per-column Python lists, then each column is bulk-written into a pre-allocated NDArray (one slice assignment per column, no extend()).
- Parameters:
path¶ – Source CSV file path.
row_cls¶ – A dataclass whose fields define the column names and types.
header¶ – If True (default), the first row is treated as a header and skipped. Column order in the file must match row_cls field order regardless.
sep¶ – Field delimiter. Defaults to ","; use "\t" for TSV.
- Returns:
A new in-memory CTable containing all rows from the CSV file.
- Return type:
CTable
- Raises:
TypeError – If row_cls is not a dataclass.
ValueError – If a row has a different number of fields than the schema.
Attributes¶
CTable.computed_columns: Read-only view of the computed-column definitions.
CTable.nrows
CTable.ncols: Total number of columns, including computed (virtual) columns.
CTable.cbytes: Total compressed size in bytes (all columns + valid_rows mask).
CTable.nbytes: Total uncompressed size in bytes (all columns + valid_rows mask).
CTable.schema: The compiled schema that drives this table's columns and validation.
- property CTable.computed_columns: dict[str, dict]¶
Read-only view of the computed-column definitions.
Each value is a dict with keys expression, col_deps, lazy (blosc2.LazyExpr), and dtype.
- property CTable.nrows: int¶
- property CTable.ncols: int¶
Total number of columns, including computed (virtual) columns.
- property CTable.cbytes: int¶
Total compressed size in bytes (all columns + valid_rows mask).
- property CTable.nbytes: int¶
Total uncompressed size in bytes (all columns + valid_rows mask).
- property CTable.schema: CompiledSchema¶
The compiled schema that drives this table’s columns and validation.
Inserting data¶
Querying¶
CTable.select(cols): Return a column-projection view exposing only cols.
CTable.sample(n, *[, seed]): Return a read-only view of n randomly chosen live rows.
CTable.sort_by(cols[, ascending, inplace]): Return a copy of the table sorted by one or more columns.
- CTable.select(cols: list[str]) CTable[source]¶
Return a column-projection view exposing only cols.
The returned object shares the underlying NDArrays with this table (no data is copied). Row filtering and value writes work as usual; structural mutations (add/drop/rename column, append, …) are blocked.
- Parameters:
cols¶ – Ordered list of column names to keep.
- Raises:
KeyError – If any name in cols is not a column of this table.
ValueError – If cols is empty.
- CTable.sample(n: int, *, seed: int | None = None) CTable[source]¶
Return a read-only view of n randomly chosen live rows.
- CTable.sort_by(cols: str | list[str], ascending: bool | list[bool] = True, *, inplace: bool = False) CTable[source]¶
Return a copy of the table sorted by one or more columns.
- Parameters:
cols¶ – Column name or list of column names to sort by. When multiple columns are given, the first is the primary key, the second is the tiebreaker, and so on.
ascending¶ – Sort direction. A single bool applies to all keys; a list must have the same length as cols.
inplace¶ – If True, rewrite the physical data in place and return self (like compact() but sorted). If False (default), return a new in-memory CTable leaving this one untouched.
- Raises:
ValueError – If called on a view or a read-only table when inplace=True.
KeyError – If any column name is not found.
TypeError – If a column used as a sort key does not support ordering (e.g. complex numbers).
Aggregates & statistics¶
CTable.describe(): Print a per-column statistical summary.
CTable.cov(): Return the covariance matrix as a numpy array.
- CTable.describe() None[source]¶
Print a per-column statistical summary.
Numeric columns (int, float): count, mean, std, min, max. Bool columns: count, true-count, true-%. String columns: count, min (lex), max (lex), n-unique.
- CTable.cov() ndarray[source]¶
Return the covariance matrix as a numpy array.
Only int, float, and bool columns are supported. Bool columns are cast to int (0/1) before computation. Complex columns raise TypeError.
- Returns:
Shape (ncols, ncols). Column order matches col_names.
- Return type:
numpy.ndarray
- Raises:
TypeError – If any column has an unsupported dtype (complex, string, …).
ValueError – If the table has fewer than 2 live rows (covariance undefined).
Mutations¶
In addition to physical schema changes such as CTable.add_column(),
CTables can host computed columns backed by a lazy expression over stored
columns. Computed columns are read-only, use no extra storage, participate in
display, filtering, sorting, and aggregates, and are persisted across
CTable.save(), CTable.load(), and CTable.open().
When a computed result should become a normal stored column, use
CTable.materialize_computed_column(). The materialized column is a stored
snapshot that can be indexed like any other stored column. New rows inserted
later via CTable.append() or CTable.extend() auto-fill omitted
materialized-column values from the recorded expression metadata.
CTable indexes can also target direct expressions over stored columns via
create_index(expression=...). This lets queries reuse indexes for derived
predicates without adding either a computed column or a materialized stored one.
A matching FULL direct-expression index can also be reused by ordering paths
such as CTable.sort_by() when sorting by a computed column backed by the
same expression.
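The append-time auto-fill of omitted materialized-column values can be sketched as follows. This is a hypothetical helper over plain column lists; the real table records expression metadata on the column and evaluates it through the blosc2 lazy-expression machinery:

```python
# Sketch: when a materialized column's value is omitted from an appended
# row, recompute it from the recorded expression (names illustrative).
def append_row(table, row, materialized):
    """table: dict of column lists; materialized: {col: expression string}."""
    for col, expr in materialized.items():
        if col not in row:
            # Resolve the expression against the other values in this row.
            row[col] = eval(expr, {"__builtins__": {}}, dict(row))
    for col, values in table.items():
        values.append(row[col])

t = {"price": [], "qty": [], "total": []}
append_row(t, {"price": 3.0, "qty": 4}, {"total": "price * qty"})
print(t["total"])  # [12.0]: filled from the expression
```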
CTable.add_column(name, spec, default): Add a new column filled with default for every existing live row.
CTable.add_computed_column(name, expr): Add a read-only virtual column whose values are computed from other columns.
CTable.materialize_computed_column(name): Materialize a computed column into a new stored snapshot column.
CTable.drop_computed_column(name): Remove a computed column from the table.
CTable.drop_column(name): Remove a column from the table.
CTable.rename_column(old, new): Rename a column.
- CTable.add_column(name: str, spec: SchemaSpec, default, *, cparams: dict | None = None) None[source]¶
Add a new column filled with default for every existing live row.
- Parameters:
- Raises:
ValueError – If the table is read-only, is a view, or the column already exists.
TypeError – If default cannot be coerced to spec’s dtype.
- CTable.add_computed_column(name: str, expr, *, dtype: dtype | None = None) None[source]¶
Add a read-only virtual column whose values are computed from other columns.
The column stores no data — it is evaluated on-the-fly when read. It participates in display, filtering, sorting, export (to_arrow / to_csv), and aggregates, but cannot be written to, indexed, or included in append / extend inputs.
- Parameters:
name¶ – Column name. Must not collide with any existing stored or computed column and must satisfy the usual naming rules.
expr¶ – Either a callable (cols: dict[str, NDArray]) -> LazyExpr or an expression string (e.g. "price * qty") where column names are referenced directly and resolved from stored columns.
dtype¶ – Override the inferred result dtype. When omitted the dtype is taken from the blosc2.LazyExpr.
- Raises:
ValueError – If called on a view, the table is read-only, name already exists, or an operand is not a stored column of this table.
TypeError – If expr is not a callable or string, or does not return a blosc2.LazyExpr.
- CTable.materialize_computed_column(name: str, *, new_name: str | None = None, dtype: dtype | None = None, cparams: dict | CParams | None = None) None[source]¶
Materialize a computed column into a new stored snapshot column.
- Parameters:
- Raises:
ValueError – If called on a view, on a read-only table, or if the target name collides with an existing stored or computed column.
KeyError – If name is not a computed column.
TypeError – If dtype is incompatible with the computed values.
- CTable.drop_computed_column(name: str) None[source]¶
Remove a computed column from the table.
- Parameters:
name¶ – Name of the computed column to remove.
- Raises:
KeyError – If name is not a computed column.
ValueError – If called on a view.
Persistence¶
CTable.save(urlpath, *[, overwrite]): Copy this (in-memory) table to disk at urlpath.
CTable.to_csv(path, *[, header, sep]): Write all live rows to a CSV file.
CTable.to_arrow(): Convert all live rows to a pyarrow.Table.
- CTable.save(urlpath: str, *, overwrite: bool = False) None[source]¶
Copy this (in-memory) table to disk at urlpath.
Only live rows are written — the on-disk table is always compacted.
Inspection¶
CTable.info: Get information about this table.
CTable.schema_dict(): Return a JSON-compatible dict describing this table's schema.
CTable.column_schema(name): Return the CompiledColumn descriptor for name.
- CTable.info()¶
Get information about this table.
Examples
>>> print(t.info)
>>> t.info()
- CTable.schema_dict() dict[str, Any][source]¶
Return a JSON-compatible dict describing this table’s schema.
- CTable.column_schema(name: str) CompiledColumn[source]¶
Return the CompiledColumn descriptor for name.
- Raises:
KeyError – If name is not a column in this table.
Column¶
A lazy column accessor returned by table["col_name"] or table.col_name.
All index operations and aggregates apply the table’s tombstone mask
(_valid_rows) so deleted rows are silently excluded.
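The tombstone-mask semantics described above can be sketched in a few lines. This is an illustrative helper over plain lists; the real accessor works on compressed NDArrays:

```python
# Sketch: logical index i refers to the i-th *live* row, so rows whose
# _valid_rows entry is False are silently excluded.
def logical_get(values, valid_rows, i):
    live = [v for v, ok in zip(values, valid_rows) if ok]
    return live[i]

vals = [10, 20, 30, 40]
mask = [True, False, True, True]     # row 1 has been deleted
print(logical_get(vals, mask, 1))    # 30: the second *live* value
```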
- class blosc2.Column(table: CTable, col_name: str, mask=None)[source]¶
- Attributes:
- dtype
is_computed: True if this column is a virtual computed column (read-only).
null_value: The sentinel value that represents NULL for this column, or None.
view: Return a ColumnViewIndexer for creating logical sub-views.
Methods
all(): Return True if every live, non-null value is True.
any(): Return True if at least one live, non-null value is True.
assign(data): Replace all live values in this column with data.
is_null(): Return a boolean array True where the live value is the null sentinel.
iter_chunks([size]): Iterate over live column values in chunks of size rows.
max(): Maximum live, non-null value.
mean(): Arithmetic mean of all live, non-null values.
min(): Minimum live, non-null value.
notnull(): Return a boolean array True where the live value is not the null sentinel.
null_count(): Return the number of live rows whose value equals the null sentinel.
std([ddof]): Standard deviation of all live, non-null values (single-pass, Welford's algorithm).
sum(): Sum of all live, non-null values.
unique(): Return sorted array of unique live, non-null values.
value_counts(): Return a {value: count} dict sorted by count descending.
Special methods
Column.__getitem__(key): Return values for the given logical index.
Column.__setitem__(key, value)
- __getitem__(key: int | slice | list | ndarray)[source]¶
Return values for the given logical index.
int → scalar
slice → numpy.ndarray
list / np.ndarray → numpy.ndarray
bool np.ndarray → numpy.ndarray
For a writable logical sub-view use view.
- all() bool[source]¶
Return True if every live, non-null value is True.
Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first False found.
- any() bool[source]¶
Return True if at least one live, non-null value is True.
Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first True found.
- assign(data) None[source]¶
Replace all live values in this column with data.
Works on both full tables and views — on a view, only the rows visible through the view’s mask are overwritten.
- Parameters:
data¶ – List, numpy array, or any iterable. Must have exactly as many elements as there are live rows in this column. Values are coerced to the column’s dtype if possible.
- Raises:
ValueError – If len(data) does not match the number of live rows, or the table is opened read-only.
TypeError – If values cannot be coerced to the column’s dtype.
- iter_chunks(size: int = 65536)[source]¶
Iterate over live column values in chunks of size rows.
Yields numpy arrays of at most size elements each, skipping deleted rows. The last chunk may be smaller than size.
- Parameters:
size¶ – Number of live rows per yielded chunk. Defaults to 65 536.
- Yields:
numpy.ndarray – A 1-D array of up to size live values with this column’s dtype.
Examples
>>> for chunk in t["score"].iter_chunks(size=100_000):
...     process(chunk)
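The chunking contract (at most size live values per chunk, last chunk possibly smaller, deleted rows skipped) can be sketched as a generator over plain lists instead of numpy arrays:

```python
# Sketch of iter_chunks semantics: accumulate live values and yield
# runs of at most `size`, with a possibly smaller final chunk.
def iter_chunks(values, valid_rows, size=65536):
    chunk = []
    for v, ok in zip(values, valid_rows):
        if not ok:
            continue                 # skip deleted (tombstoned) rows
        chunk.append(v)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

chunks = list(iter_chunks([1, 2, 3, 4, 5], [True, True, False, True, True], size=2))
print(chunks)  # [[1, 2], [4, 5]]
```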
- max()[source]¶
Maximum live, non-null value.
Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.
- mean() float[source]¶
Arithmetic mean of all live, non-null values.
Supported dtypes: bool, int, uint, float. Null sentinel values are skipped. Always returns a Python float.
- min()[source]¶
Minimum live, non-null value.
Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.
- notnull() ndarray[source]¶
Return a boolean array True where the live value is not the null sentinel.
- null_count() int[source]¶
Return the number of live rows whose value equals the null sentinel.
Returns 0 in O(1) if no null_value is configured for this column.
- std(ddof: int = 0) float[source]¶
Standard deviation of all live, non-null values (single-pass, Welford’s algorithm).
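The single-pass Welford recurrence referenced above can be sketched directly, using the conventional running-mean and M2-accumulator formulation (the library's exact implementation is not shown in these docs):

```python
import math

# Welford's online algorithm: one pass, numerically stable.
def welford_std(values, ddof=0):
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)     # note: uses the *updated* mean
    return math.sqrt(m2 / (count - ddof))

print(welford_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # 2.0
```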
- sum()[source]¶
Sum of all live, non-null values.
Supported dtypes: bool, int, uint, float, complex. Bool values are counted as 0 / 1. Null sentinel values are skipped.
- unique() ndarray[source]¶
Return sorted array of unique live, non-null values.
Null sentinel values are excluded. Processes data in chunks — never loads the full column at once.
- value_counts() dict[source]¶
Return a {value: count} dict sorted by count descending.
Null sentinel values are excluded. Processes data in chunks — never loads the full column at once.
Example
>>> t["active"].value_counts()
{True: 8432, False: 1568}
- property null_value¶
The sentinel value that represents NULL for this column, or None.
- property view: ColumnViewIndexer¶
Return a ColumnViewIndexer for creating logical sub-views.
Examples
Read a sub-view for chained aggregates:
sub = t.price.view[2:10]
sub.sum()
Bulk write through a sub-view:
t.price.view[0:5][:] = np.zeros(5)
Attributes¶
Column.dtype
Column.null_value: The sentinel value that represents NULL for this column, or None.
- property Column.dtype¶
- property Column.null_value¶
The sentinel value that represents NULL for this column, or
None.
Data access¶
Column.view: Return a ColumnViewIndexer for creating logical sub-views.
Column.iter_chunks([size]): Iterate over live column values in chunks of size rows.
Column.assign(data): Replace all live values in this column with data.
- property Column.view: ColumnViewIndexer¶
Return a ColumnViewIndexer for creating logical sub-views.
Examples
Read a sub-view for chained aggregates:
sub = t.price.view[2:10]
sub.sum()
Bulk write through a sub-view:
t.price.view[0:5][:] = np.zeros(5)
- Column.iter_chunks(size: int = 65536)[source]¶
Iterate over live column values in chunks of size rows.
Yields numpy arrays of at most size elements each, skipping deleted rows. The last chunk may be smaller than size.
- Parameters:
size¶ – Number of live rows per yielded chunk. Defaults to 65 536.
- Yields:
numpy.ndarray – A 1-D array of up to size live values with this column’s dtype.
Examples
>>> for chunk in t["score"].iter_chunks(size=100_000):
...     process(chunk)
- Column.assign(data) None[source]¶
Replace all live values in this column with data.
Works on both full tables and views — on a view, only the rows visible through the view’s mask are overwritten.
- Parameters:
data¶ – List, numpy array, or any iterable. Must have exactly as many elements as there are live rows in this column. Values are coerced to the column’s dtype if possible.
- Raises:
ValueError – If len(data) does not match the number of live rows, or the table is opened read-only.
TypeError – If values cannot be coerced to the column’s dtype.
Nullable helpers¶
Column.is_null(): Return a boolean array True where the live value is the null sentinel.
Column.notnull(): Return a boolean array True where the live value is not the null sentinel.
Column.null_count(): Return the number of live rows whose value equals the null sentinel.
- Column.is_null() ndarray[source]¶
Return a boolean array True where the live value is the null sentinel.
Unique values¶
Column.unique(): Return sorted array of unique live, non-null values.
Column.value_counts(): Return a {value: count} dict sorted by count descending.
Aggregates¶
Null sentinel values are automatically excluded from all aggregates.
Column.sum(): Sum of all live, non-null values.
Column.min(): Minimum live, non-null value.
Column.max(): Maximum live, non-null value.
Column.mean(): Arithmetic mean of all live, non-null values.
Column.std([ddof]): Standard deviation of all live, non-null values (single-pass, Welford's algorithm).
Column.any(): Return True if at least one live, non-null value is True.
Column.all(): Return True if every live, non-null value is True.
- Column.sum()[source]¶
Sum of all live, non-null values.
Supported dtypes: bool, int, uint, float, complex. Bool values are counted as 0 / 1. Null sentinel values are skipped.
- Column.min()[source]¶
Minimum live, non-null value.
Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.
- Column.max()[source]¶
Maximum live, non-null value.
Supported dtypes: bool, int, uint, float, string, bytes. Strings are compared lexicographically. Null sentinel values are skipped.
- Column.mean() float[source]¶
Arithmetic mean of all live, non-null values.
Supported dtypes: bool, int, uint, float. Null sentinel values are skipped. Always returns a Python float.
- Column.std(ddof: int = 0) float[source]¶
Standard deviation of all live, non-null values (single-pass, Welford’s algorithm).
- Column.any() bool[source]¶
Return True if at least one live, non-null value is True.
Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first True found.
- Column.all() bool[source]¶
Return True if every live, non-null value is True.
Supported dtypes: bool. Null sentinel values are skipped. Short-circuits on the first False found.
Schema Specs¶
Schema specs are passed to field() to declare a column’s type,
storage constraints, and optional null sentinel. They are also
available directly in the blosc2 namespace (e.g. blosc2.int64).
- blosc2.field(spec: ~blosc2.schema.SchemaSpec, *, default=<dataclasses._MISSING_TYPE object>, cparams: dict[str, ~typing.Any] | None = None, dparams: dict[str, ~typing.Any] | None = None, chunks: tuple[int, ...] | None = None, blocks: tuple[int, ...] | None = None) Field[source]¶
Attach a Blosc2 schema spec and per-column storage options to a dataclass field.
- Parameters:
spec¶ – A schema descriptor such as b2.int64(ge=0) or b2.float64().
default¶ – Default value for the field. Omit for required fields.
cparams¶ – Compression parameters for this column’s NDArray.
dparams¶ – Decompression parameters for this column’s NDArray.
chunks¶ – Chunk shape for this column’s NDArray.
blocks¶ – Block shape for this column’s NDArray.
Examples
>>> from dataclasses import dataclass
>>> import blosc2 as b2
>>> @dataclass
... class Row:
...     id: int = b2.field(b2.int64(ge=0))
...     score: float = b2.field(b2.float64(ge=0, le=100))
...     active: bool = b2.field(b2.bool(), default=True)
Numeric¶
int8 | 8-bit signed integer column (−128 … 127).
int16 | 16-bit signed integer column (−32 768 … 32 767).
int32 | 32-bit signed integer column (−2 147 483 648 … 2 147 483 647).
int64 | 64-bit signed integer column.
uint8 | 8-bit unsigned integer column (0 … 255).
uint16 | 16-bit unsigned integer column (0 … 65 535).
uint32 | 32-bit unsigned integer column (0 … 4 294 967 295).
uint64 | 64-bit unsigned integer column.
float32 | 32-bit floating-point column (single precision).
float64 | 64-bit floating-point column (double precision).
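The integer ranges in the table above follow directly from the bit widths (two's-complement for signed types); a quick sanity check in plain Python, independent of blosc2:

```python
def signed_range(bits):
    # Two's-complement range for an n-bit signed integer
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    # Range for an n-bit unsigned integer
    return 0, 2 ** bits - 1

assert signed_range(8) == (-128, 127)
assert signed_range(32) == (-2_147_483_648, 2_147_483_647)
assert unsigned_range(16) == (0, 65_535)
assert unsigned_range(32) == (0, 4_294_967_295)
```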
- class blosc2.int8(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
8-bit signed integer column (−128 … 127).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of int8
- class blosc2.int16(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
16-bit signed integer column (−32 768 … 32 767).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of int16
- class blosc2.int32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
32-bit signed integer column (−2 147 483 648 … 2 147 483 647).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of int32
- class blosc2.int64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
64-bit signed integer column.
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of int64
- class blosc2.uint8(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
8-bit unsigned integer column (0 … 255).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of uint8
- class blosc2.uint16(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
16-bit unsigned integer column (0 … 65 535).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of uint16
- class blosc2.uint32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
32-bit unsigned integer column (0 … 4 294 967 295).
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of uint32
- class blosc2.uint64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
64-bit unsigned integer column.
Methods
alias of int
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of uint64
- class blosc2.float32(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
32-bit floating-point column (single precision).
Methods
alias of float
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of float32
- class blosc2.float64(*, ge=None, gt=None, le=None, lt=None, null_value=None)[source]¶
64-bit floating-point column (double precision).
Methods
alias of float
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of float64
Complex¶
complex64 | 64-bit complex number column (two 32-bit floats).
complex128 | 128-bit complex number column (two 64-bit floats).
- class blosc2.complex64[source]¶
64-bit complex number column (two 32-bit floats).
Methods
alias of complex
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
alias of complex64
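A complex64 value occupies eight bytes, stored as two 32-bit floats (real then imaginary part). A quick illustration of that layout with the standard struct module; this is not blosc2's storage code, just the bit-level arithmetic:

```python
import struct

z = 1.5 + 2.5j
# Pack real and imaginary parts as two little-endian float32 values
packed = struct.pack("<2f", z.real, z.imag)
assert len(packed) == 8  # 64 bits total, hence the name complex64

# 1.5 and 2.5 are exactly representable in float32, so the
# round-trip is lossless here
real, imag = struct.unpack("<2f", packed)
assert complex(real, imag) == z
```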
Boolean¶
bool | Boolean column.
Text & binary¶
string | Fixed-width Unicode string column.
bytes | Fixed-width bytes column.
- class blosc2.string(*, min_length=None, max_length=None, pattern=None, null_value=None)[source]¶
Fixed-width Unicode string column.
- Parameters:
Methods
alias of str
to_metadata_dict() | Return a JSON-compatible dict for schema serialization.
to_pydantic_kwargs() | Return kwargs for building a Pydantic field annotation.
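The min_length, max_length, pattern, and null_value constraints in the string spec signature can be modelled as a simple validator. This is a sketch of the plausible semantics only (parameter names are taken from the signature above; the exact matching rule, e.g. full match versus prefix match for pattern, is an assumption):

```python
import re

def validate_string(value, *, min_length=None, max_length=None,
                    pattern=None, null_value=None):
    """Check a value against string-spec constraints.

    The null sentinel, if configured, bypasses the constraints
    entirely, matching how aggregations skip sentinel values.
    """
    if null_value is not None and value == null_value:
        return True
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    # Assumes the whole value must match the pattern
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    return True

assert validate_string("abc", min_length=1, max_length=8, pattern=r"[a-z]+")
assert not validate_string("toolongvalue", max_length=8)
```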