Skip to content

Table

Instances of class Table are handles to Pixeltable tables and views/snapshots.

Use this handle to query and update the table and to add and drop columns.

Tables are created by calling pxt.create_table. Views are created by calling pxt.create_view, and snapshots by calling pxt.create_snapshot.

To get a handle to an existing table/view/snapshot, call pxt.get_table.

Overview

Column Operations
add_column Add a column to the table or view
drop_column Remove a column from the table or view
rename_column Rename a column
Data Operations
insert Insert rows into table
update Update rows in table or view
delete Delete rows from table
Indexing Operations
add_embedding_index Add embedding index on column
drop_embedding_index Drop embedding index from column
drop_index Drop index from column
Versioning
revert Revert the last change

pixeltable.Table

Table(id: UUID, dir_id: UUID, name: str, tbl_version_path: TableVersionPath)

A handle to a table, view, or snapshot. This class is the primary interface through which table operations (queries, insertions, updates, etc.) are performed in Pixeltable.

columns property

columns: list[str]

Return the names of the columns in this table.

add_column

add_column(
    *,
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
    **kwargs: Union[ColumnType, type, _GenericAlias, Expr]
) -> UpdateStatus

Adds an ordinary (non-computed) column to the table.

Parameters:

  • kwargs (Union[ColumnType, type, _GenericAlias, Expr], default: {} ) –

    Exactly one keyword argument of the form col_name=col_type.

  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Determines the behavior if the column already exists. Must be one of the following:

    • 'error': an exception will be raised.
    • 'ignore': do nothing and return.
    • 'replace' or 'replace_force': drop the existing column and add the new column, if it has no dependents.

Returns:

  • UpdateStatus

    Information about the execution status of the operation.

Raises:

  • Error

    If the column name is invalid, or already exists and if_exists='erorr', or if_exists='replace*' but the column has dependents or is a basetable column.

Examples:

Add an int column:

>>> tbl.add_column(new_col=pxt.Int)

Alternatively, this can also be expressed as:

>>> tbl.add_columns({'new_col': pxt.Int})

add_columns

add_columns(
    schema: dict[str, Union[ColumnType, type, _GenericAlias]],
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
) -> UpdateStatus

Adds multiple columns to the table. The columns must be concrete (non-computed) columns; to add computed columns, use add_computed_column() instead.

The format of the schema argument is identical to the format of the schema in a call to create_table().

Parameters:

  • schema (dict[str, Union[ColumnType, type, _GenericAlias]]) –

    A dictionary mapping column names to types.

  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Determines the behavior if a column already exists. Must be one of the following:

    • 'error': an exception will be raised.
    • 'ignore': do nothing and return.
    • 'replace' or 'replace_force': drop the existing column and add the new column, if it has no dependents.

    Note that the if_exists parameter is applied to all columns in the schema. To apply different behaviors to different columns, please use add_column() for each column.

Returns:

  • UpdateStatus

    Information about the execution status of the operation.

Raises:

  • Error

    If any column name is invalid, or already exists and if_exists='error', or if_exists='replace*' but the column has dependents or is a basetable column.

Examples:

Add multiple columns to the table my_table:

>>> tbl = pxt.get_table('my_table')
... schema = {
...     'new_col_1': pxt.Int,
...     'new_col_2': pxt.String,
... }
... tbl.add_columns(schema)

add_computed_column

add_computed_column(
    *,
    stored: Optional[bool] = None,
    print_stats: bool = False,
    on_error: Literal["abort", "ignore"] = "abort",
    if_exists: Literal["error", "ignore", "replace"] = "error",
    **kwargs: Expr
) -> UpdateStatus

Adds a computed column to the table.

Parameters:

  • kwargs (Expr, default: {} ) –

    Exactly one keyword argument of the form col_name=expression.

  • stored (Optional[bool], default: None ) –

    Whether the column is materialized and stored or computed on demand.

  • print_stats (bool, default: False ) –

    If True, print execution metrics during evaluation.

  • on_error (Literal['abort', 'ignore'], default: 'abort' ) –

    Determines the behavior if an error occurs while evaluating the column expression for at least one row.

    • 'abort': an exception will be raised and the column will not be added.
    • 'ignore': execution will continue and the column will be added. Any rows with errors will have a None value for the column, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
  • if_exists (Literal['error', 'ignore', 'replace'], default: 'error' ) –

    Determines the behavior if the column already exists. Must be one of the following:

    • 'error': an exception will be raised.
    • 'ignore': do nothing and return.
    • 'replace' or 'replace_force': drop the existing column and add the new column, iff it has no dependents.

Returns:

  • UpdateStatus

    Information about the execution status of the operation.

Raises:

  • Error

    If the column name is invalid or already exists and if_exists='error', or if_exists='replace*' but the column has dependents or is a basetable column.

Examples:

For a table with an image column frame, add an image column rotated that rotates the image by 90 degrees:

>>> tbl.add_computed_column(rotated=tbl.frame.rotate(90))

Do the same, but now the column is unstored:

>>> tbl.add_computed_column(rotated=tbl.frame.rotate(90), stored=False)

add_embedding_index

add_embedding_index(
    column: Union[str, ColumnRef],
    *,
    idx_name: Optional[str] = None,
    embedding: Optional[Function] = None,
    string_embed: Optional[Function] = None,
    image_embed: Optional[Function] = None,
    metric: str = "cosine",
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> None

Add an embedding index to the table. Once the index is created, it will be automatically kept up-to-date as new rows are inserted into the table.

To add an embedding index, one must specify, at minimum, the column to be indexed and an embedding UDF. Only String and Image columns are currently supported. Here's an example that uses a CLIP embedding to index an image column:

from pixeltable.functions.huggingface import clip ... embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32') ... tbl.add_embedding_index(tbl.img, embedding=embedding_fn)

Once the index is created, similiarity lookups can be performed using the similarity pseudo-function.

reference_img = PIL.Image.open('my_image.jpg') ... sim = tbl.img.similarity(reference_img) ... tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)

If the embedding UDF is a multimodal embedding (supporting more than one data type), then lookups may be performed using any of its supported types. In our example, CLIP supports both text and images, so we can also search for images using a text description:

sim = tbl.img.similarity('a picture of a train') ... tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)

Parameters:

  • column (Union[str, ColumnRef]) –

    The name of, or reference to, the column to be indexed; must be a String or Image column.

  • idx_name (Optional[str], default: None ) –

    An optional name for the index. If not specified, a name such as 'idx0' will be generated automatically. If specified, the name must be unique for this table and a valid pixeltable column name.

  • embedding (Optional[Function], default: None ) –

    The UDF to use for the embedding. Must be a UDF that accepts a single argument of type String or Image (as appropriate for the column being indexed) and returns a fixed-size 1-dimensional array of floats.

  • string_embed (Optional[Function], default: None ) –

    An optional UDF to use for the string embedding component of this index. Can be used in conjunction with image_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.

  • image_embed (Optional[Function], default: None ) –

    An optional UDF to use for the image embedding component of this index. Can be used in conjunction with string_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.

  • metric (str, default: 'cosine' ) –

    Distance metric to use for the index; one of 'cosine', 'ip', or 'l2'. The default is 'cosine'.

  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Directive for handling an existing index with the same name. Must be one of the following:

    • 'error': raise an error if an index with the same name already exists.
    • 'ignore': do nothing if an index with the same name already exists.
    • 'replace' or 'replace_force': replace the existing index with the new one.

Raises:

  • Error

    If an index with the specified name already exists for the table and if_exists='error', or if the specified column does not exist.

Examples:

Add an index to the img column of the table my_table:

>>> from pixeltable.functions.huggingface import clip
... tbl = pxt.get_table('my_table')
... embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
... tbl.add_embedding_index(tbl.img, embedding=embedding_fn)

Alternatively, the img column may be specified by name:

>>> tbl.add_embedding_index('img', embedding=embedding_fn)

Add a second index to the img column, using the inner product as the distance metric, and with a specific name:

>>> tbl.add_embedding_index(
...     tbl.img,
...     idx_name='ip_idx',
...     embedding=embedding_fn,
...     metric='ip'
... )

Add an index using separately specified string and image embeddings:

>>> tbl.add_embedding_index(
...     tbl.img,
...     string_embed=string_embedding_fn,
...     image_embed=image_embedding_fn
... )

batch_update

batch_update(
    rows: Iterable[dict[str, Any]],
    cascade: bool = True,
    if_not_exists: Literal["error", "ignore", "insert"] = "error",
) -> UpdateStatus

Update rows in this table.

Parameters:

  • rows (Iterable[dict[str, Any]]) –

    an Iterable of dictionaries containing values for the updated columns plus values for the primary key columns.

  • cascade (bool, default: True ) –

    if True, also update all computed columns that transitively depend on the updated columns.

  • if_not_exists (Literal['error', 'ignore', 'insert'], default: 'error' ) –

    Specifies the behavior if a row to update does not exist:

    • 'error': Raise an error.
    • 'ignore': Skip the row silently.
    • 'insert': Insert the row.

Examples:

Update the name and age columns for the rows with ids 1 and 2 (assuming id is the primary key). If either row does not exist, this raises an error:

>>> tbl.update([{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 40}])

Update the name and age columns for the row with id 1 (assuming id is the primary key) and insert the row with new id 3 (assuming this key does not exist):

>>> tbl.update(
    [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Bob', 'age': 40}],
    if_not_exists='insert')

collect

collect() -> 'pxt.dataframe.DataFrameResultSet'

Return rows from this table.

count

count() -> int

Return the number of rows in this table.

delete

delete(where: Optional['pxt.exprs.Expr'] = None) -> UpdateStatus

Delete rows in this table.

Parameters:

  • where (Optional['pxt.exprs.Expr'], default: None ) –

    a predicate to filter rows to delete.

Examples:

Delete all rows in a table:

>>> tbl.delete()

Delete all rows in a table where column a is greater than 5:

>>> tbl.delete(tbl.a > 5)

describe

describe() -> None

Print the table schema.

drop_column

drop_column(
    column: Union[str, ColumnRef],
    if_not_exists: Literal["error", "ignore"] = "error",
) -> None

Drop a column from the table.

Parameters:

  • column (Union[str, ColumnRef]) –

    The name or reference of the column to drop.

  • if_not_exists (Literal['error', 'ignore'], default: 'error' ) –

    Directive for handling a non-existent column. Must be one of the following:

    • 'error': raise an error if the column does not exist.
    • 'ignore': do nothing if the column does not exist.

Raises:

  • Error

    If the column does not exist and if_exists='error', or if it is referenced by a dependent computed column.

Examples:

Drop the column col from the table my_table by column name:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_column('col')

Drop the column col from the table my_table by column reference:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_column(tbl.col)

Drop the column col from the table my_table if it exists, otherwise do nothing:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_col(tbl.col, if_not_exists='ignore')

drop_embedding_index

drop_embedding_index(
    *,
    column: Union[str, ColumnRef, None] = None,
    idx_name: Optional[str] = None,
    if_not_exists: Literal["error", "ignore"] = "error"
) -> None

Drop an embedding index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one embedding index; otherwise the specific index name must be provided instead.

Parameters:

  • column (Union[str, ColumnRef, None], default: None ) –

    The name of, or reference to, the column from which to drop the index. The column must have only one embedding index.

  • idx_name (Optional[str], default: None ) –

    The name of the index to drop.

  • if_not_exists (Literal['error', 'ignore'], default: 'error' ) –

    Directive for handling a non-existent index. Must be one of the following:

    • 'error': raise an error if the index does not exist.
    • 'ignore': do nothing if the index does not exist.

    Note that if_not_exists parameter is only applicable when an idx_name is specified and it does not exist, or when column is specified and it has no index. if_not_exists does not apply to non-exisitng column.

Raises:

  • Error

    If column is specified, but the column does not exist, or it contains no embedding indices and if_not_exists='error', or the column has multiple embedding indices.

  • Error

    If idx_name is specified, but the index is not an embedding index, or the index does not exist and if_not_exists='error'.

Examples:

Drop the embedding index on the img column of the table my_table by column name:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(column='img')

Drop the embedding index on the img column of the table my_table by column reference:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(column=tbl.img)

Drop the embedding index idx1 of the table my_table by index name:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(idx_name='idx1')

Drop the embedding index idx1 of the table my_table by index name, if it exists, otherwise do nothing:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(idx_name='idx1', if_not_exists='ignore')

drop_index

drop_index(
    *,
    column: Union[str, ColumnRef, None] = None,
    idx_name: Optional[str] = None,
    if_not_exists: Literal["error", "ignore"] = "error"
) -> None

Drop an index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one index; otherwise the specific index name must be provided instead.

Parameters:

  • column (Union[str, ColumnRef, None], default: None ) –

    The name of, or reference to, the column from which to drop the index. The column must have only one embedding index.

  • idx_name (Optional[str], default: None ) –

    The name of the index to drop.

  • if_not_exists (Literal['error', 'ignore'], default: 'error' ) –

    Directive for handling a non-existent index. Must be one of the following:

    • 'error': raise an error if the index does not exist.
    • 'ignore': do nothing if the index does not exist.

    Note that if_not_exists parameter is only applicable when an idx_name is specified and it does not exist, or when column is specified and it has no index. if_not_exists does not apply to non-exisitng column.

Raises:

  • Error

    If column is specified, but the column does not exist, or it contains no indices or multiple indices.

  • Error

    If idx_name is specified, but the index does not exist.

Examples:

Drop the index on the img column of the table my_table by column name:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(column_name='img')

Drop the index on the img column of the table my_table by column reference:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(tbl.img)

Drop the index idx1 of the table my_table by index name:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(idx_name='idx1')

Drop the index idx1 of the table my_table by index name, if it exists, otherwise do nothing:

>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(idx_name='idx1', if_not_exists='ignore')

get_metadata

get_metadata() -> dict[str, Any]

Retrieves metadata associated with this table.

Returns:

  • dict[str, Any]

    A dictionary containing the metadata, in the following format:

    {
        'base': None,  # If this is a view or snapshot, will contain the name of its base table
        'schema': {
            'col1': StringType(),
            'col2': IntType(),
        },
        'version': 22,
        'schema_version': 1,
        'comment': '',
        'num_retained_versions': 10,
        'is_view': False,
        'is_snapshot': False,
        'media_validation': 'on_write',
    }
    

group_by

group_by(*items: 'exprs.Expr') -> 'pxt.DataFrame'

Group the rows of this table based on the expression.

See DataFrame.group_by for more details.

head

head(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'

Return the first n rows inserted into this table.

insert abstractmethod

insert(
    source: Optional[TableDataSource] = None,
    /,
    *,
    source_format: Optional[Literal["csv", "excel", "parquet", "json"]] = None,
    schema_overrides: Optional[dict[str, ColumnType]] = None,
    on_error: Literal["abort", "ignore"] = "abort",
    print_stats: bool = False,
    **kwargs: Any,
) -> UpdateStatus

Inserts rows into this table. There are two mutually exclusive call patterns:

To insert multiple rows at a time:

python insert( source: TableSourceDataType, /, *, on_error: Literal['abort', 'ignore'] = 'abort', print_stats: bool = False, **kwargs: Any, )

To insert just a single row, you can use the more concise syntax:

python insert( *, on_error: Literal['abort', 'ignore'] = 'abort', print_stats: bool = False, **kwargs: Any )

Parameters:

  • source (Optional[TableDataSource], default: None ) –

    A data source from which data can be imported.

  • kwargs (Any, default: {} ) –

    (if inserting a single row) Keyword-argument pairs representing column names and values. (if inserting multiple rows) Additional keyword arguments are passed to the data source.

  • source_format (Optional[Literal['csv', 'excel', 'parquet', 'json']], default: None ) –

    A hint about the format of the source data

  • schema_overrides (Optional[dict[str, ColumnType]], default: None ) –

    If specified, then columns in schema_overrides will be given the specified types

  • on_error (Literal['abort', 'ignore'], default: 'abort' ) –

    Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.

    • If on_error='abort', then an exception will be raised and the rows will not be inserted.
    • If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value for that cell, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
  • print_stats (bool, default: False ) –

    If True, print statistics about the cost of computed columns.

Returns:

Raises:

  • Error

    If one of the following conditions occurs:

    • The table is a view or snapshot.
    • The table has been dropped.
    • One of the rows being inserted does not conform to the table schema.
    • An error occurs during processing of computed columns, and on_error='ignore'.
    • An error occurs while importing data from a source, and on_error='abort'.

Examples:

Insert two rows into the table my_table with three int columns a, b, and c. Column c is nullable:

>>> tbl = pxt.get_table('my_table')
... tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])

Insert a single row using the alternative syntax:

>>> tbl.insert(a=3, b=3, c=3)

Insert rows from a CSV file:

>>> tbl.insert(source='path/to/file.csv')

join

join(
    other: "Table",
    *,
    on: Optional["exprs.Expr"] = None,
    how: "pixeltable.plan.JoinType.LiteralType" = "inner"
) -> "pxt.DataFrame"

Join this table with another table.

list_views

list_views(*, recursive: bool = True) -> list[str]

Returns a list of all views and snapshots of this Table.

Parameters:

  • recursive (bool, default: True ) –

    If False, returns only the immediate successor views of this Table. If True, returns all sub-views (including views of views, etc.)

Returns:

  • list[str]

    A list of view paths.

order_by

order_by(*items: 'exprs.Expr', asc: bool = True) -> 'pxt.DataFrame'

Order the rows of this table based on the expression.

See DataFrame.order_by for more details.

rename_column

rename_column(old_name: str, new_name: str) -> None

Rename a column.

Parameters:

  • old_name (str) –

    The current name of the column.

  • new_name (str) –

    The new name of the column.

Raises:

  • Error

    If the column does not exist, or if the new name is invalid or already exists.

Examples:

Rename the column col1 to col2 of the table my_table:

>>> tbl = pxt.get_table('my_table')
... tbl.rename_column('col1', 'col2')

revert

revert() -> None

Reverts the table to the previous version.

.. warning:: This operation is irreversible.

select

select(*items: Any, **named_items: Any) -> 'pxt.DataFrame'

Select columns or expressions from this table.

See DataFrame.select for more details.

show

show(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'

Return rows from this table.

sync

sync(
    stores: Optional[str | list[str]] = None,
    *,
    export_data: bool = True,
    import_data: bool = True
) -> "pxt.io.SyncStatus"

Synchronizes this table with its linked external stores.

Parameters:

  • stores (Optional[str | list[str]], default: None ) –

    If specified, will synchronize only the specified named store or list of stores. If not specified, will synchronize all of this table's external stores.

  • export_data (bool, default: True ) –

    If True, data from this table will be exported to the external stores during synchronization.

  • import_data (bool, default: True ) –

    If True, data from the external stores will be imported to this table during synchronization.

tail

tail(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'

Return the last n rows inserted into this table.

to_coco_dataset

to_coco_dataset() -> Path

Return the path to a COCO json file for this table. See DataFrame.to_coco_dataset()

to_pytorch_dataset

to_pytorch_dataset(
    image_format: str = "pt",
) -> "torch.utils.data.IterableDataset"

Return a PyTorch Dataset for this table. See DataFrame.to_pytorch_dataset()

unlink_external_stores(
    stores: Optional[str | list[str]] = None,
    *,
    delete_external_data: bool = False,
    ignore_errors: bool = False
) -> None

Unlinks this table's external stores.

Parameters:

  • stores (Optional[str | list[str]], default: None ) –

    If specified, will unlink only the specified named store or list of stores. If not specified, will unlink all of this table's external stores.

  • ignore_errors (bool, default: False ) –

    If True, no exception will be thrown if a specified store is not linked to this table.

  • delete_external_data (bool, default: False ) –

    If True, then the external data store will also be deleted. WARNING: This is a destructive operation that will delete data outside Pixeltable, and cannot be undone.

update

update(
    value_spec: dict[str, Any],
    where: Optional["pxt.exprs.Expr"] = None,
    cascade: bool = True,
) -> UpdateStatus

Update rows in this table.

Parameters:

  • value_spec (dict[str, Any]) –

    a dictionary mapping column names to literal values or Pixeltable expressions.

  • where (Optional['pxt.exprs.Expr'], default: None ) –

    a predicate to filter rows to update.

  • cascade (bool, default: True ) –

    if True, also update all computed columns that transitively depend on the updated columns.

Examples:

Set column int_col to 1 for all rows:

>>> tbl.update({'int_col': 1})

Set column int_col to 1 for all rows where int_col is 0:

>>> tbl.update({'int_col': 1}, where=tbl.int_col == 0)

Set int_col to the value of other_int_col + 1:

>>> tbl.update({'int_col': tbl.other_int_col + 1})

Increment int_col by 1 for all rows where int_col is 0:

>>> tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)

validate_column_name classmethod

validate_column_name(name: str) -> None

Check that a name is usable as a pixeltalbe column name

where

where(pred: 'exprs.Expr') -> 'pxt.DataFrame'

Filter rows from this table based on the expression.

See DataFrame.where for more details.