Table
Instances of class Table are handles to Pixeltable tables, views, and snapshots. Use a handle to query and update the table and to add and drop columns.
Tables are created by calling pxt.create_table. Views are created by calling pxt.create_view, and snapshots by calling pxt.create_snapshot. To get a handle to an existing table, view, or snapshot, call pxt.get_table.
Overview
Column Operations | |
---|---|
add_column | Add a column to the table or view |
drop_column | Remove a column from the table or view |
rename_column | Rename a column |

Data Operations | |
---|---|
insert | Insert rows into table |
update | Update rows in table or view |
delete | Delete rows from table |

Indexing Operations | |
---|---|
add_embedding_index | Add embedding index on column |
drop_embedding_index | Drop embedding index from column |
drop_index | Drop index from column |

Versioning | |
---|---|
revert | Revert the last change |
pixeltable.Table
Table(id: UUID, dir_id: UUID, name: str, tbl_version_path: TableVersionPath)
A handle to a table, view, or snapshot. This class is the primary interface through which table operations (queries, insertions, updates, etc.) are performed in Pixeltable.
columns
property
columns: list[str]
Return the names of the columns in this table.
add_column
add_column(
*,
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
**kwargs: Union[ColumnType, type, _GenericAlias, Expr]
) -> UpdateStatus
Adds an ordinary (non-computed) column to the table.
Parameters:
- kwargs (Union[ColumnType, type, _GenericAlias, Expr], default: {}) – Exactly one keyword argument of the form col_name=col_type.
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Determines the behavior if the column already exists. Must be one of the following:
  - 'error': an exception will be raised.
  - 'ignore': do nothing and return.
  - 'replace' or 'replace_force': drop the existing column and add the new column, if it has no dependents.
Returns:
- UpdateStatus – Information about the execution status of the operation.
Raises:
- Error – If the column name is invalid, or already exists and if_exists='error', or if_exists='replace*' but the column has dependents or is a base table column.
Examples:
Add an int column:
>>> tbl.add_column(new_col=pxt.Int)
Alternatively, this can also be expressed as:
>>> tbl.add_columns({'new_col': pxt.Int})
add_columns
add_columns(
schema: dict[str, Union[ColumnType, type, _GenericAlias]],
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
) -> UpdateStatus
Adds multiple columns to the table. The columns must be concrete (non-computed) columns; to add computed columns, use add_computed_column() instead.
The format of the schema argument is identical to the format of the schema in a call to create_table().
Parameters:
- schema (dict[str, Union[ColumnType, type, _GenericAlias]]) – A dictionary mapping column names to types.
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Determines the behavior if a column already exists. Must be one of the following:
  - 'error': an exception will be raised.
  - 'ignore': do nothing and return.
  - 'replace' or 'replace_force': drop the existing column and add the new column, if it has no dependents.
  Note that the if_exists parameter is applied to all columns in the schema. To apply different behaviors to different columns, please use add_column() for each column.
Returns:
- UpdateStatus – Information about the execution status of the operation.
Raises:
- Error – If any column name is invalid, or already exists and if_exists='error', or if_exists='replace*' but the column has dependents or is a base table column.
Examples:
Add multiple columns to the table my_table
:
>>> tbl = pxt.get_table('my_table')
... schema = {
... 'new_col_1': pxt.Int,
... 'new_col_2': pxt.String,
... }
... tbl.add_columns(schema)
add_computed_column
add_computed_column(
*,
stored: Optional[bool] = None,
print_stats: bool = False,
on_error: Literal["abort", "ignore"] = "abort",
if_exists: Literal["error", "ignore", "replace"] = "error",
**kwargs: Expr
) -> UpdateStatus
Adds a computed column to the table.
Parameters:
- kwargs (Expr, default: {}) – Exactly one keyword argument of the form col_name=expression.
- stored (Optional[bool], default: None) – Whether the column is materialized and stored or computed on demand.
- print_stats (bool, default: False) – If True, print execution metrics during evaluation.
- on_error (Literal['abort', 'ignore'], default: 'abort') – Determines the behavior if an error occurs while evaluating the column expression for at least one row.
  - 'abort': an exception will be raised and the column will not be added.
  - 'ignore': execution will continue and the column will be added. Any rows with errors will have a None value for the column, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
- if_exists (Literal['error', 'ignore', 'replace'], default: 'error') – Determines the behavior if the column already exists. Must be one of the following:
  - 'error': an exception will be raised.
  - 'ignore': do nothing and return.
  - 'replace': drop the existing column and add the new column, if it has no dependents.
Returns:
- UpdateStatus – Information about the execution status of the operation.
Raises:
- Error – If the column name is invalid or already exists and if_exists='error', or if_exists='replace' but the column has dependents or is a base table column.
Examples:
For a table with an image column frame
, add an image column rotated
that rotates the image by
90 degrees:
>>> tbl.add_computed_column(rotated=tbl.frame.rotate(90))
Do the same, but now the column is unstored:
>>> tbl.add_computed_column(rotated=tbl.frame.rotate(90), stored=False)
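Where occasional row-level failures are acceptable, a minimal sketch (assuming frame is an image column and that the computation may fail for some rows) of adding the column with on_error='ignore' and then inspecting the per-row error fields:
>>> tbl.add_computed_column(rotated=tbl.frame.rotate(90), on_error='ignore')
... # rows whose computation failed hold None; error details are stored alongside the column
... tbl.select(tbl.rotated, tbl.rotated.errortype, tbl.rotated.errormsg).collect()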
add_embedding_index
add_embedding_index(
column: Union[str, ColumnRef],
*,
idx_name: Optional[str] = None,
embedding: Optional[Function] = None,
string_embed: Optional[Function] = None,
image_embed: Optional[Function] = None,
metric: str = "cosine",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> None
Add an embedding index to the table. Once the index is created, it will be automatically kept up-to-date as new rows are inserted into the table.
To add an embedding index, one must specify, at minimum, the column to be indexed and an embedding UDF.
Only String
and Image
columns are currently supported. Here's an example that uses a
CLIP embedding to index an image column:
>>> from pixeltable.functions.huggingface import clip
... embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
... tbl.add_embedding_index(tbl.img, embedding=embedding_fn)
Once the index is created, similarity lookups can be performed using the similarity pseudo-function.
>>> reference_img = PIL.Image.open('my_image.jpg')
... sim = tbl.img.similarity(reference_img)
... tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)
If the embedding UDF is a multimodal embedding (supporting more than one data type), then lookups may be performed using any of its supported types. In our example, CLIP supports both text and images, so we can also search for images using a text description:
>>> sim = tbl.img.similarity('a picture of a train')
... tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)
Parameters:
- column (Union[str, ColumnRef]) – The name of, or reference to, the column to be indexed; must be a String or Image column.
- idx_name (Optional[str], default: None) – An optional name for the index. If not specified, a name such as 'idx0' will be generated automatically. If specified, the name must be unique for this table and a valid pixeltable column name.
- embedding (Optional[Function], default: None) – The UDF to use for the embedding. Must be a UDF that accepts a single argument of type String or Image (as appropriate for the column being indexed) and returns a fixed-size 1-dimensional array of floats.
- string_embed (Optional[Function], default: None) – An optional UDF to use for the string embedding component of this index. Can be used in conjunction with image_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.
- image_embed (Optional[Function], default: None) – An optional UDF to use for the image embedding component of this index. Can be used in conjunction with string_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.
- metric (str, default: 'cosine') – Distance metric to use for the index; one of 'cosine', 'ip', or 'l2'. The default is 'cosine'.
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Directive for handling an existing index with the same name. Must be one of the following:
  - 'error': raise an error if an index with the same name already exists.
  - 'ignore': do nothing if an index with the same name already exists.
  - 'replace' or 'replace_force': replace the existing index with the new one.
Raises:
- Error – If an index with the specified name already exists for the table and if_exists='error', or if the specified column does not exist.
Examples:
Add an index to the img
column of the table my_table
:
>>> from pixeltable.functions.huggingface import clip
... tbl = pxt.get_table('my_table')
... embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
... tbl.add_embedding_index(tbl.img, embedding=embedding_fn)
Alternatively, the img
column may be specified by name:
>>> tbl.add_embedding_index('img', embedding=embedding_fn)
Add a second index to the img
column, using the inner product as the distance metric,
and with a specific name:
>>> tbl.add_embedding_index(
... tbl.img,
... idx_name='ip_idx',
... embedding=embedding_fn,
... metric='ip'
... )
Add an index using separately specified string and image embeddings:
>>> tbl.add_embedding_index(
... tbl.img,
... string_embed=string_embedding_fn,
... image_embed=image_embedding_fn
... )
batch_update
batch_update(
rows: Iterable[dict[str, Any]],
cascade: bool = True,
if_not_exists: Literal["error", "ignore", "insert"] = "error",
) -> UpdateStatus
Update rows in this table.
Parameters:
- rows (Iterable[dict[str, Any]]) – an Iterable of dictionaries containing values for the updated columns plus values for the primary key columns.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
- if_not_exists (Literal['error', 'ignore', 'insert'], default: 'error') – Specifies the behavior if a row to update does not exist:
  - 'error': Raise an error.
  - 'ignore': Skip the row silently.
  - 'insert': Insert the row.
Examples:
Update the name and age columns for the rows with ids 1 and 2 (assuming id is the primary key).
If either row does not exist, this raises an error:
>>> tbl.batch_update([{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 40}])
Update the name and age columns for the row with id 1 (assuming id is the primary key) and insert
the row with new id 3 (assuming this key does not exist):
>>> tbl.batch_update(
...     [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Bob', 'age': 40}],
...     if_not_exists='insert')
collect
collect() -> 'pxt.dataframe.DataFrameResultSet'
Return rows from this table.
count
count() -> int
Return the number of rows in this table.
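A minimal sketch combining these row-access helpers (assuming a table named 'my_table' already exists):
>>> tbl = pxt.get_table('my_table')
... n = tbl.count()       # number of rows
... rows = tbl.collect()  # all rows as a DataFrameResultSet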
delete
delete(where: Optional['pxt.exprs.Expr'] = None) -> UpdateStatus
Delete rows in this table.
Parameters:
- where (Optional['pxt.exprs.Expr'], default: None) – a predicate to filter rows to delete.
Examples:
Delete all rows in a table:
>>> tbl.delete()
Delete all rows in a table where column a
is greater than 5:
>>> tbl.delete(tbl.a > 5)
describe
describe() -> None
Print the table schema.
drop_column
drop_column(
column: Union[str, ColumnRef],
if_not_exists: Literal["error", "ignore"] = "error",
) -> None
Drop a column from the table.
Parameters:
- column (Union[str, ColumnRef]) – The name or reference of the column to drop.
- if_not_exists (Literal['error', 'ignore'], default: 'error') – Directive for handling a non-existent column. Must be one of the following:
  - 'error': raise an error if the column does not exist.
  - 'ignore': do nothing if the column does not exist.
Raises:
- Error – If the column does not exist and if_not_exists='error', or if it is referenced by a dependent computed column.
Examples:
Drop the column col
from the table my_table
by column name:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_column('col')
Drop the column col
from the table my_table
by column reference:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_column(tbl.col)
Drop the column col from the table my_table if it exists, otherwise do nothing:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_column(tbl.col, if_not_exists='ignore')
drop_embedding_index
drop_embedding_index(
*,
column: Union[str, ColumnRef, None] = None,
idx_name: Optional[str] = None,
if_not_exists: Literal["error", "ignore"] = "error"
) -> None
Drop an embedding index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one embedding index; otherwise the specific index name must be provided instead.
Parameters:
- column (Union[str, ColumnRef, None], default: None) – The name of, or reference to, the column from which to drop the index. The column must have only one embedding index.
- idx_name (Optional[str], default: None) – The name of the index to drop.
- if_not_exists (Literal['error', 'ignore'], default: 'error') – Directive for handling a non-existent index. Must be one of the following:
  - 'error': raise an error if the index does not exist.
  - 'ignore': do nothing if the index does not exist.
  Note that the if_not_exists parameter only applies when an idx_name is specified and it does not exist, or when column is specified and it has no index. It does not apply to a non-existent column.
Raises:
- Error – If column is specified, but the column does not exist, or it contains no embedding indices and if_not_exists='error', or the column has multiple embedding indices.
- Error – If idx_name is specified, but the index is not an embedding index, or the index does not exist and if_not_exists='error'.
Examples:
Drop the embedding index on the img
column of the table my_table
by column name:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(column='img')
Drop the embedding index on the img
column of the table my_table
by column reference:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(column=tbl.img)
Drop the embedding index idx1
of the table my_table
by index name:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(idx_name='idx1')
Drop the embedding index idx1
of the table my_table
by index name, if it exists, otherwise do nothing:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_embedding_index(idx_name='idx1', if_not_exists='ignore')
drop_index
drop_index(
*,
column: Union[str, ColumnRef, None] = None,
idx_name: Optional[str] = None,
if_not_exists: Literal["error", "ignore"] = "error"
) -> None
Drop an index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one index; otherwise the specific index name must be provided instead.
Parameters:
- column (Union[str, ColumnRef, None], default: None) – The name of, or reference to, the column from which to drop the index. The column must have only one index.
- idx_name (Optional[str], default: None) – The name of the index to drop.
- if_not_exists (Literal['error', 'ignore'], default: 'error') – Directive for handling a non-existent index. Must be one of the following:
  - 'error': raise an error if the index does not exist.
  - 'ignore': do nothing if the index does not exist.
  Note that the if_not_exists parameter only applies when an idx_name is specified and it does not exist, or when column is specified and it has no index. It does not apply to a non-existent column.
Raises:
- Error – If column is specified, but the column does not exist, or it contains no indices or multiple indices.
- Error – If idx_name is specified, but the index does not exist.
Examples:
Drop the index on the img
column of the table my_table
by column name:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(column='img')
Drop the index on the img
column of the table my_table
by column reference:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(column=tbl.img)
Drop the index idx1
of the table my_table
by index name:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(idx_name='idx1')
Drop the index idx1
of the table my_table
by index name, if it exists, otherwise do nothing:
>>> tbl = pxt.get_table('my_table')
... tbl.drop_index(idx_name='idx1', if_not_exists='ignore')
get_metadata
get_metadata() -> dict[str, Any]
Retrieves metadata associated with this table.
Returns:
- dict[str, Any] – A dictionary containing the metadata, in the following format:
  {
      'base': None,  # if this is a view or snapshot, contains the name of its base table
      'schema': {'col1': StringType(), 'col2': IntType()},
      'version': 22,
      'schema_version': 1,
      'comment': '',
      'num_retained_versions': 10,
      'is_view': False,
      'is_snapshot': False,
      'media_validation': 'on_write',
  }
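A minimal usage sketch (assuming a table named 'my_table' exists):
>>> tbl = pxt.get_table('my_table')
... md = tbl.get_metadata()
... md['version'], md['is_view'], list(md['schema'].keys())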
group_by
group_by(*items: 'exprs.Expr') -> 'pxt.DataFrame'
Group the rows of this table based on the expression.
See DataFrame.group_by
for more details.
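A minimal sketch, assuming the table has category and amount columns and using pxt.functions.sum as the aggregate:
>>> import pixeltable.functions as pxtf
... tbl.group_by(tbl.category).select(tbl.category, total=pxtf.sum(tbl.amount)).collect()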
head
head(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'
Return the first n rows inserted into this table.
insert
abstractmethod
insert(
source: Optional[TableDataSource] = None,
/,
*,
source_format: Optional[Literal["csv", "excel", "parquet", "json"]] = None,
schema_overrides: Optional[dict[str, ColumnType]] = None,
on_error: Literal["abort", "ignore"] = "abort",
print_stats: bool = False,
**kwargs: Any,
) -> UpdateStatus
Inserts rows into this table. There are two mutually exclusive call patterns:
To insert multiple rows at a time:
insert(
source: TableSourceDataType,
/,
*,
on_error: Literal['abort', 'ignore'] = 'abort',
print_stats: bool = False,
**kwargs: Any,
)
To insert just a single row, you can use the more concise syntax:
insert(
*,
on_error: Literal['abort', 'ignore'] = 'abort',
print_stats: bool = False,
**kwargs: Any
)
Parameters:
- source (Optional[TableDataSource], default: None) – A data source from which data can be imported.
- kwargs (Any, default: {}) – (if inserting a single row) Keyword-argument pairs representing column names and values. (if inserting multiple rows) Additional keyword arguments are passed to the data source.
- source_format (Optional[Literal['csv', 'excel', 'parquet', 'json']], default: None) – A hint about the format of the source data.
- schema_overrides (Optional[dict[str, ColumnType]], default: None) – If specified, then columns in schema_overrides will be given the specified types.
- on_error (Literal['abort', 'ignore'], default: 'abort') – Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.
  - If on_error='abort', then an exception will be raised and the rows will not be inserted.
  - If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value for that cell, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
- print_stats (bool, default: False) – If True, print statistics about the cost of computed columns.
Returns:
- UpdateStatus – An UpdateStatus object containing information about the update.
Raises:
- Error – If one of the following conditions occurs:
  - The table is a view or snapshot.
  - The table has been dropped.
  - One of the rows being inserted does not conform to the table schema.
  - An error occurs during processing of computed columns, and on_error='ignore'.
  - An error occurs while importing data from a source, and on_error='abort'.
Examples:
Insert two rows into the table my_table
with three int columns a
, b
, and c
.
Column c
is nullable:
>>> tbl = pxt.get_table('my_table')
... tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])
Insert a single row using the alternative syntax:
>>> tbl.insert(a=3, b=3, c=3)
Insert rows from a CSV file:
>>> tbl.insert(source='path/to/file.csv')
join
join(
other: "Table",
*,
on: Optional["exprs.Expr"] = None,
how: "pixeltable.plan.JoinType.LiteralType" = "inner"
) -> "pxt.DataFrame"
Join this table with another table.
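A minimal sketch of an inner join followed by a select; the table names, column names, and the customer_id/id join key are hypothetical:
>>> orders = pxt.get_table('orders')
... customers = pxt.get_table('customers')
... joined = orders.join(customers, on=orders.customer_id == customers.id, how='inner')
... joined.select(orders.order_id, customers.name).collect()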
list_views
list_views(*, recursive: bool = True) -> list[str]
Returns a list of all views and snapshots of this Table.
Parameters:
- recursive (bool, default: True) – If False, returns only the immediate successor views of this Table. If True, returns all sub-views (including views of views, etc.).
Returns:
- list[str] – A list of view paths.
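A minimal sketch (the view names shown in the comments are hypothetical):
>>> tbl = pxt.get_table('my_table')
... tbl.list_views()                  # e.g. ['my_view', 'my_view.sub_view']
... tbl.list_views(recursive=False)   # immediate views only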
order_by
order_by(*items: 'exprs.Expr', asc: bool = True) -> 'pxt.DataFrame'
Order the rows of this table based on the expression.
See DataFrame.order_by
for more details.
rename_column
rename_column(old_name: str, new_name: str) -> None
Rename a column.
Parameters:
- old_name (str) – The current name of the column.
- new_name (str) – The new name of the column.
Raises:
- Error – If the column does not exist, or if the new name is invalid or already exists.
Examples:
Rename the column col1
to col2
of the table my_table
:
>>> tbl = pxt.get_table('my_table')
... tbl.rename_column('col1', 'col2')
revert
revert() -> None
Reverts the table to the previous version.
Warning: This operation is irreversible.
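A minimal sketch, assuming the most recent change was an accidental single-row insert:
>>> tbl.insert(a=1, b=2)   # unintended insert
... tbl.revert()           # undoes the insert; the revert itself cannot be undone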
select
select(*items: Any, **named_items: Any) -> 'pxt.DataFrame'
Select columns or expressions from this table.
See DataFrame.select
for more details.
show
show(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'
Return rows from this table.
sync
sync(
stores: Optional[str | list[str]] = None,
*,
export_data: bool = True,
import_data: bool = True
) -> "pxt.io.SyncStatus"
Synchronizes this table with its linked external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will synchronize only the specified named store or list of stores. If not specified, will synchronize all of this table's external stores.
- export_data (bool, default: True) – If True, data from this table will be exported to the external stores during synchronization.
- import_data (bool, default: True) – If True, data from the external stores will be imported to this table during synchronization.
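A minimal sketch (the store name 'my_label_studio' is hypothetical):
>>> tbl.sync()                                       # sync all linked stores in both directions
... tbl.sync('my_label_studio', import_data=False)   # export-only sync of a single store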
tail
tail(*args, **kwargs) -> 'pxt.dataframe.DataFrameResultSet'
Return the last n rows inserted into this table.
to_coco_dataset
to_coco_dataset() -> Path
Return the path to a COCO json file for this table. See DataFrame.to_coco_dataset()
to_pytorch_dataset
to_pytorch_dataset(
image_format: str = "pt",
) -> "torch.utils.data.IterableDataset"
Return a PyTorch Dataset for this table. See DataFrame.to_pytorch_dataset()
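A minimal consumption sketch, assuming PyTorch is installed (the batch size is arbitrary):
>>> import torch
... ds = tbl.to_pytorch_dataset(image_format='pt')
... loader = torch.utils.data.DataLoader(ds, batch_size=32)
... batch = next(iter(loader))   # inspect one batch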
unlink_external_stores
unlink_external_stores(
stores: Optional[str | list[str]] = None,
*,
delete_external_data: bool = False,
ignore_errors: bool = False
) -> None
Unlinks this table's external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will unlink only the specified named store or list of stores. If not specified, will unlink all of this table's external stores.
- ignore_errors (bool, default: False) – If True, no exception will be thrown if a specified store is not linked to this table.
- delete_external_data (bool, default: False) – If True, then the external data store will also be deleted. WARNING: This is a destructive operation that will delete data outside Pixeltable, and cannot be undone.
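A minimal sketch (the store name 'my_store' is hypothetical):
>>> tbl.unlink_external_stores()                                 # unlink all stores, keep external data
... tbl.unlink_external_stores('my_store', ignore_errors=True)   # no error if 'my_store' is not linked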
update
update(
value_spec: dict[str, Any],
where: Optional["pxt.exprs.Expr"] = None,
cascade: bool = True,
) -> UpdateStatus
Update rows in this table.
Parameters:
- value_spec (dict[str, Any]) – a dictionary mapping column names to literal values or Pixeltable expressions.
- where (Optional['pxt.exprs.Expr'], default: None) – a predicate to filter rows to update.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
Examples:
Set column int_col
to 1 for all rows:
>>> tbl.update({'int_col': 1})
Set column int_col
to 1 for all rows where int_col
is 0:
>>> tbl.update({'int_col': 1}, where=tbl.int_col == 0)
Set int_col
to the value of other_int_col
+ 1:
>>> tbl.update({'int_col': tbl.other_int_col + 1})
Increment int_col
by 1 for all rows where int_col
is 0:
>>> tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
validate_column_name
classmethod
validate_column_name(name: str) -> None
Check that a name is usable as a pixeltable column name.
where
where(pred: 'exprs.Expr') -> 'pxt.DataFrame'
Filter rows from this table based on the expression.
See DataFrame.where
for more details.
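Putting where, select, and order_by together, a minimal query sketch (assuming the table has columns a and b):
>>> tbl.where(tbl.a > 5).select(tbl.a, tbl.b).order_by(tbl.a, asc=False).limit(10).collect()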