Table

Instances of class Table are handles to Pixeltable tables and views/snapshots.

Use this handle to query and update the table and to add and drop columns.

Tables are created by calling pxt.create_table. Views and snapshots are created by calling pxt.create_view (snapshots require is_snapshot=True).

To get a handle to an existing table/view/snapshot, call pxt.get_table.

Overview

Column Operations

  • add_column: Add a column to the table or view
  • drop_column: Remove a column from the table or view
  • rename_column: Rename a column

Data Operations

  • insert: Insert rows into table
  • update: Update rows in table or view
  • delete: Delete rows from table

Indexing Operations

  • add_embedding_index: Add embedding index on column
  • drop_embedding_index: Drop embedding index from column
  • drop_index: Drop index from column

Versioning

  • revert: Revert the last change

pixeltable.Table

Table(id: UUID, dir_id: UUID, name: str, tbl_version_path: TableVersionPath)

Base class for table objects (base tables, views, snapshots).

add_column

add_column(
    *,
    type: Optional[ColumnType] = None,
    stored: Optional[bool] = None,
    print_stats: bool = False,
    **kwargs: Union[ColumnType, Expr, Callable]
) -> UpdateStatus

Adds a column to the table.

Parameters:

  • kwargs (Union[ColumnType, Expr, Callable], default: {} ) –

    Exactly one keyword argument of the form column-name=type|value-expression.

  • type (Optional[ColumnType], default: None ) –

    The type of the column. Only valid and required if value-expression is a Callable.

  • stored (Optional[bool], default: None ) –

    Whether the column is materialized and stored or computed on demand. Only valid for image columns.

  • print_stats (bool, default: False ) –

    If True, print execution metrics.

Returns:

  • UpdateStatus

    execution status

Raises:

  • Error

    If the column name is invalid or already exists.

Examples:

Add an int column with None values:

>>> tbl.add_column(new_col=IntType())

Alternatively, this can also be expressed as:

>>> tbl['new_col'] = IntType()

For a table with int column int_col, add a column that is the factorial of int_col. The names of the parameters of the Callable must correspond to existing column names (the column values are then passed as arguments to the Callable). In this case, the column type needs to be specified explicitly:

>>> tbl.add_column(factorial=lambda int_col: math.factorial(int_col), type=IntType())

Alternatively, this can also be expressed as:

>>> tbl['factorial'] = {'value': lambda int_col: math.factorial(int_col), 'type': IntType()}
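The name-matching rule described above (parameter names of the Callable are matched against column names) can be illustrated with a small pure-Python model. This is only a sketch of the idea, not Pixeltable's implementation: apply_to_row and the rows list are hypothetical names invented for this illustration, and a table row is modeled as a plain dictionary.

```python
import inspect
import math

def apply_to_row(fn, row):
    """Call fn with the row values whose column names match fn's parameter names.

    Hypothetical helper for illustration only; not part of Pixeltable.
    """
    param_names = list(inspect.signature(fn).parameters)
    missing = [name for name in param_names if name not in row]
    if missing:
        raise KeyError(f"no such column(s): {missing}")
    return fn(**{name: row[name] for name in param_names})

# Hypothetical rows of a table with an int column named int_col.
rows = [{"int_col": 3}, {"int_col": 5}]
factorials = [apply_to_row(lambda int_col: math.factorial(int_col), row) for row in rows]
print(factorials)  # [6, 120]
```

In this model, a parameter name that matches no column is an error, which mirrors the requirement that the parameter names correspond to existing columns.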

For a table with an image column frame, add an image column rotated that rotates the image by 90 degrees. In this case, the column type is inferred from the expression. Also, the column is not stored (by default, computed image columns are not stored but recomputed on demand):

>>> tbl.add_column(rotated=tbl.frame.rotate(90))

Alternatively, this can also be expressed as:

>>> tbl['rotated'] = tbl.frame.rotate(90)

Do the same, but now the column is stored:

>>> tbl.add_column(rotated=tbl.frame.rotate(90), stored=True)

Alternatively, this can also be expressed as:

>>> tbl['rotated'] = {'value': tbl.frame.rotate(90), 'stored': True}

add_embedding_index

add_embedding_index(
    col_name: str,
    *,
    idx_name: Optional[str] = None,
    string_embed: Optional[Function] = None,
    image_embed: Optional[Function] = None,
    metric: str = "cosine"
) -> None

Add an embedding index to the table.

Parameters:

  • col_name (str) –

    name of column to index

  • idx_name (Optional[str], default: None ) –

    name of index, which needs to be unique for the table; if not provided, a name will be generated

  • string_embed (Optional[Function], default: None ) –

    function to embed text; required if the column is a text column

  • image_embed (Optional[Function], default: None ) –

    function to embed images; required if the column is an image column

  • metric (str, default: 'cosine' ) –

    distance metric to use for the index; one of 'cosine', 'ip', 'l2'; default is 'cosine'

Raises:

  • Error

    If an index with that name already exists for the table or if the column does not exist.

Examples:

Add an index to the img column:

>>> tbl.add_embedding_index('img', image_embed=...)

Add another index to the img column, using the inner product as the distance metric, and with a specific name; string_embed is also specified in order to search with text:

>>> tbl.add_embedding_index(
    'img', idx_name='clip_idx', image_embed=..., string_embed=..., metric='ip')
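For reference, the three metric names correspond to standard vector measures. The sketch below computes them in plain Python on two small vectors. It assumes that 'l2' denotes Euclidean distance, which this page does not spell out, and it says nothing about how Pixeltable ranks or normalizes results internally. The function names are chosen for this illustration only.

```python
import math

def inner_product(u, v):
    """Inner (dot) product, the measure behind metric='ip'."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product of the two vectors after scaling each to unit length."""
    return inner_product(u, v) / (math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v)))

def l2_distance(u, v):
    """Euclidean distance (assumed meaning of metric='l2')."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u, v = [1.0, 0.0], [3.0, 4.0]
print(inner_product(u, v))      # 3.0
print(cosine_similarity(u, v))  # 0.6
print(l2_distance(u, v))        # about 4.472
```

Cosine similarity ignores vector length, while the inner product and Euclidean distance do not, so the choice matters mainly when the embedding function does not return unit-length vectors.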

batch_update

batch_update(
    rows: Iterable[dict[str, Any]],
    cascade: bool = True,
    if_not_exists: Literal["error", "ignore", "insert"] = "error",
) -> UpdateStatus

Update rows in this table. Each row to update is identified by its primary key values.

Parameters:

  • rows (Iterable[dict[str, Any]]) –

    an Iterable of dictionaries containing values for the updated columns plus values for the primary key columns.

  • cascade (bool, default: True ) –

    if True, also update all computed columns that transitively depend on the updated columns.

  • if_not_exists (Literal['error', 'ignore', 'insert'], default: 'error' ) –

    Specifies the behavior if a row to update does not exist:

    • 'error': Raise an error.
    • 'ignore': Skip the row silently.
    • 'insert': Insert the row.

Examples:

Update the name and age columns for the rows with ids 1 and 2 (assuming id is the primary key). If either row does not exist, this raises an error:

>>> tbl.batch_update([{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 40}])

Update the name and age columns for the row with id 1 (assuming id is the primary key) and insert the row with new id 3 (assuming this key does not exist):

>>> tbl.batch_update(
    [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Bob', 'age': 40}],
    if_not_exists='insert')
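The three if_not_exists modes can be modeled on an in-memory dictionary keyed by primary key. This is a sketch of the documented semantics only, not Pixeltable code: batch_update_model and its key parameter are hypothetical, and the choice to check for missing keys before changing anything in 'error' mode is an assumption of this model.

```python
def batch_update_model(table, rows, key="id", if_not_exists="error"):
    """Apply keyed updates to `table`, a dict mapping key value -> row dict.

    Hypothetical model for illustration; not part of Pixeltable.
    """
    if if_not_exists not in ("error", "ignore", "insert"):
        raise ValueError(f"invalid if_not_exists: {if_not_exists!r}")
    rows = list(rows)  # allow any iterable, since we may scan it twice
    if if_not_exists == "error":
        # Assumption of this model: fail before modifying anything.
        missing = [row[key] for row in rows if row[key] not in table]
        if missing:
            raise KeyError(f"no row with {key} in {missing}")
    for row in rows:
        if row[key] in table:
            table[row[key]].update(row)   # overwrite only the given columns
        elif if_not_exists == "insert":
            table[row[key]] = dict(row)   # create the missing row
        # 'ignore': skip rows whose key does not exist
    return table

table = {1: {"id": 1, "name": "Al", "age": 29}}
updates = [{"id": 1, "name": "Alice", "age": 30}, {"id": 3, "name": "Bob", "age": 40}]

ignored = batch_update_model({k: dict(v) for k, v in table.items()}, updates, if_not_exists="ignore")
inserted = batch_update_model({k: dict(v) for k, v in table.items()}, updates, if_not_exists="insert")
print(sorted(ignored))   # [1]
print(sorted(inserted))  # [1, 3]
```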

collect

collect() -> 'pixeltable.dataframe.DataFrameResultSet'

Return rows from this table.

count

count() -> int

Return the number of rows in this table.

delete

delete(where: Optional['pixeltable.exprs.Expr'] = None) -> UpdateStatus

Delete rows in this table.

Parameters:

  • where (Optional['pixeltable.exprs.Expr'], default: None ) –

    a predicate to filter rows to delete.

Examples:

Delete all rows in a table:

>>> tbl.delete()

Delete all rows in a table where column a is greater than 5:

>>> tbl.delete(tbl.a > 5)

describe

describe() -> None

Print the table schema.

drop_column

drop_column(name: str) -> None

Drop a column from the table.

Parameters:

  • name (str) –

    The name of the column to drop.

Raises:

  • Error

    If the column does not exist or if it is referenced by a computed column.

Examples:

Drop column factorial:

>>> tbl.drop_column('factorial')

drop_embedding_index

drop_embedding_index(
    *, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None

Drop an embedding index from the table.

Parameters:

  • column_name (Optional[str], default: None ) –

    The name of the column whose embedding index to drop. Invalid if the column has multiple embedding indices.

  • idx_name (Optional[str], default: None ) –

    The name of the index to drop.

Raises:

  • Error

    If the index does not exist.

Examples:

Drop embedding index on the img column:

>>> tbl.drop_embedding_index(column_name='img')

drop_index

drop_index(
    *, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None

Drop an index from the table.

Parameters:

  • column_name (Optional[str], default: None ) –

    The name of the column whose index to drop. Invalid if the column has multiple indices.

  • idx_name (Optional[str], default: None ) –

    The name of the index to drop.

Raises:

  • Error

    If the index does not exist.

Examples:

Drop index on the img column:

>>> tbl.drop_index(column_name='img')

group_by

group_by(*items: 'exprs.Expr') -> 'pixeltable.DataFrame'

Return a DataFrame for this table, grouped by the given expressions.

head

head(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'

Return the first n rows inserted into this table.

insert (abstractmethod)

insert(
    rows: Optional[Iterable[dict[str, Any]]] = None,
    /,
    *,
    print_stats: bool = False,
    fail_on_exception: bool = True,
    **kwargs: Any,
) -> UpdateStatus

Inserts rows into this table. There are two mutually exclusive call patterns:

To insert multiple rows at a time: insert(rows: Iterable[dict[str, Any]], /, *, print_stats: bool = False, fail_on_exception: bool = True)

To insert just a single row, you can use the more convenient syntax: insert(*, print_stats: bool = False, fail_on_exception: bool = True, **kwargs: Any)
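The positional-only marker (/) in the signature is what keeps the two call patterns apart. The sketch below models that argument handling in plain Python. normalize_insert_args is a hypothetical helper, and the exception types it raises are assumptions; this page only states that the two patterns are mutually exclusive.

```python
def normalize_insert_args(rows=None, /, **kwargs):
    """Turn either call pattern into a list of row dictionaries.

    Hypothetical helper for illustration; not part of Pixeltable.
    """
    if rows is not None and kwargs:
        raise TypeError("pass either a list of rows or keyword arguments, not both")
    if rows is not None:
        return [dict(row) for row in rows]
    if not kwargs:
        raise TypeError("no rows given")
    return [kwargs]  # a single row built from column=value pairs

many = normalize_insert_args([{"a": 1, "b": 1, "c": 1}, {"a": 2, "b": 2}])
single = normalize_insert_args(a=1, b=1, c=1)
# Because `rows` is positional-only, a keyword named rows is treated as a column value.
column_named_rows = normalize_insert_args(rows=7)
print(single)             # [{'a': 1, 'b': 1, 'c': 1}]
print(column_named_rows)  # [{'rows': 7}]
```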

Parameters:

  • rows (Optional[Iterable[dict[str, Any]]], default: None ) –

    (if inserting multiple rows) A list of rows to insert, each of which is a dictionary mapping column names to values.

  • kwargs (Any, default: {} ) –

    (if inserting a single row) Keyword-argument pairs representing column names and values.

  • print_stats (bool, default: False ) –

    If True, print statistics about the cost of computed columns.

  • fail_on_exception (bool, default: True ) –

    Determines how exceptions in computed columns and invalid media files (e.g., corrupt images) are handled. If False, store error information (accessible as column properties 'errortype' and 'errormsg') for those cases, but continue inserting rows. If True, raise an exception that aborts the insert.

Returns:

  • UpdateStatus

    execution status

Raises:

  • Error

    if a row does not match the table schema or contains values for computed columns

Examples:

Insert two rows into a table with three int columns a, b, and c. Column c is nullable.

>>> tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])

Insert a single row into a table with three int columns a, b, and c.

>>> tbl.insert(a=1, b=1, c=1)
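The effect of fail_on_exception described above can be pictured with a small in-memory model. This is a sketch under simplifying assumptions, not Pixeltable's implementation: insert_model is hypothetical, there is a single computed column called computed, and the error information is stored as plain dictionary keys rather than as column properties.

```python
def insert_model(rows, compute, fail_on_exception=True):
    """Insert rows and evaluate one computed column per row.

    Hypothetical model for illustration; not part of Pixeltable.
    """
    stored = []
    for row in rows:
        new_row = dict(row, computed=None, errortype=None, errormsg=None)
        try:
            new_row["computed"] = compute(row)
        except Exception as exc:
            if fail_on_exception:
                raise  # abort the whole insert; nothing is returned
            # Record the failure and keep going with the next row.
            new_row["errortype"] = type(exc).__name__
            new_row["errormsg"] = str(exc)
        stored.append(new_row)
    return stored

rows = [{"a": 2}, {"a": 0}]
result = insert_model(rows, lambda row: 10 // row["a"], fail_on_exception=False)
print(result[0]["computed"])   # 5
print(result[1]["errortype"])  # ZeroDivisionError
```

With fail_on_exception=True, the same input raises ZeroDivisionError on the second row and the model returns no rows at all.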

list_views

list_views(*, recursive: bool = True) -> list[str]

Returns a list of all views and snapshots of this Table.

Parameters:

  • recursive (bool, default: True ) –

    If False, returns only the immediate successor views of this Table. If True, returns all sub-views (including views of views, etc.)
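The difference between the two settings of recursive can be sketched as a walk over a tree of views. The model below is illustrative only: list_views_model, the children mapping, and the view names are hypothetical, and the depth-first ordering of the result is an assumption, since this page does not specify the order of the returned names.

```python
def list_views_model(children, name, recursive=True):
    """Return the views below `name`, given a mapping of table -> direct views.

    Hypothetical model for illustration; not part of Pixeltable.
    """
    direct = list(children.get(name, []))
    if not recursive:
        return direct  # immediate successor views only
    result = []
    for view in direct:
        result.append(view)
        # Descend into views of views.
        result.extend(list_views_model(children, view, recursive=True))
    return result

children = {
    "films": ["films_view", "films_snap"],
    "films_view": ["films_view_sub"],
}
print(list_views_model(children, "films", recursive=False))  # ['films_view', 'films_snap']
print(list_views_model(children, "films", recursive=True))   # ['films_view', 'films_view_sub', 'films_snap']
```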

order_by

order_by(*items: 'exprs.Expr', asc: bool = True) -> 'pixeltable.DataFrame'

Return a DataFrame for this table, ordered by the given expressions (ascending if asc is True, descending otherwise).

rename_column

rename_column(old_name: str, new_name: str) -> None

Rename a column.

Parameters:

  • old_name (str) –

    The current name of the column.

  • new_name (str) –

    The new name of the column.

Raises:

  • Error

    If the column does not exist or if the new name is invalid or already exists.

Examples:

Rename column factorial to fac:

>>> tbl.rename_column('factorial', 'fac')

revert

revert() -> None

Reverts the table to the previous version.

Warning: This operation is irreversible.

select

select(*items: Any, **named_items: Any) -> 'pixeltable.DataFrame'

Return a DataFrame for this table that selects the given items.

show

show(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'

Return rows from this table.

sync

sync(
    stores: Optional[str | list[str]] = None,
    *,
    export_data: bool = True,
    import_data: bool = True
) -> "pixeltable.io.SyncStatus"

Synchronizes this table with its linked external stores.

Parameters:

  • stores (Optional[str | list[str]], default: None ) –

    If specified, will synchronize only the specified named store or list of stores. If not specified, will synchronize all of this table's external stores.

  • export_data (bool, default: True ) –

    If True, data from this table will be exported to the external stores during synchronization.

  • import_data (bool, default: True ) –

    If True, data from the external stores will be imported to this table during synchronization.

tail

tail(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'

Return the last n rows inserted into this table.

to_coco_dataset

to_coco_dataset() -> Path

Return the path to a COCO JSON file for this table. See DataFrame.to_coco_dataset().

to_pytorch_dataset

to_pytorch_dataset(
    image_format: str = "pt",
) -> "torch.utils.data.IterableDataset"

Return a PyTorch Dataset for this table. See DataFrame.to_pytorch_dataset().

unlink_external_stores

unlink_external_stores(
    stores: Optional[str | list[str]] = None,
    *,
    delete_external_data: bool = False,
    ignore_errors: bool = False
) -> None

Unlinks this table's external stores.

Parameters:

  • stores (Optional[str | list[str]], default: None ) –

    If specified, will unlink only the specified named store or list of stores. If not specified, will unlink all of this table's external stores.

  • ignore_errors (bool, default: False ) –

    If True, no exception will be thrown if a specified store is not linked to this table.

  • delete_external_data (bool, default: False ) –

    If True, then the external data store will also be deleted. WARNING: This is a destructive operation that will delete data outside Pixeltable, and cannot be undone.

update

update(
    value_spec: dict[str, Any],
    where: Optional["pixeltable.exprs.Expr"] = None,
    cascade: bool = True,
) -> UpdateStatus

Update rows in this table.

Parameters:

  • value_spec (dict[str, Any]) –

    a dictionary mapping column names to literal values or Pixeltable expressions.

  • where (Optional['pixeltable.exprs.Expr'], default: None ) –

    a predicate to filter rows to update.

  • cascade (bool, default: True ) –

    if True, also update all computed columns that transitively depend on the updated columns.

Examples:

Set column int_col to 1 for all rows:

>>> tbl.update({'int_col': 1})

Set column int_col to 1 for all rows where int_col is 0:

>>> tbl.update({'int_col': 1}, where=tbl.int_col == 0)

Set int_col to the value of other_int_col + 1:

>>> tbl.update({'int_col': tbl.other_int_col + 1})

Increment int_col by 1 for all rows where int_col is 0:

>>> tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
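The cascade parameter refers to computed columns that depend on the updated columns transitively, that is, directly or through other computed columns. A minimal sketch of how such a set can be found with a breadth-first search follows. It models the idea only and is not how Pixeltable tracks dependencies: dependent_columns, the depends_on mapping, and the column names doubled, doubled_plus_one, and label are hypothetical.

```python
from collections import deque

def dependent_columns(depends_on, updated):
    """Return the computed columns that transitively depend on any column in `updated`.

    `depends_on` maps each computed column to the columns it reads.
    Hypothetical helper for illustration; not part of Pixeltable.
    """
    # Invert the mapping: column -> computed columns that read it directly.
    dependents = {}
    for column, inputs in depends_on.items():
        for source in inputs:
            dependents.setdefault(source, set()).add(column)

    seen = set()
    queue = deque(updated)
    while queue:
        column = queue.popleft()
        for dependent in dependents.get(column, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)  # follow the chain further
    return seen

depends_on = {
    "doubled": ["int_col"],
    "doubled_plus_one": ["doubled"],
    "label": ["name"],
}
print(sorted(dependent_columns(depends_on, ["int_col"])))  # ['doubled', 'doubled_plus_one']
```

In this model, updating int_col with cascade=True would recompute doubled and doubled_plus_one, while label is left alone because it only reads name.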

where

where(pred: 'exprs.Expr') -> 'pixeltable.DataFrame'

Return a DataFrame for this table, filtered by the predicate pred.