Table
Instances of class Table are handles to Pixeltable tables and views/snapshots.
Use this handle to query and update the table and to add and drop columns.
Tables are created by calling pxt.create_table.
Views and snapshots are created by calling pxt.create_view (snapshots require is_snapshot=True).
To get a handle to an existing table/view/snapshot, call pxt.get_table.
Overview
Column Operations | |
---|---|
add_column | Add a column to the table or view |
drop_column | Remove a column from the table or view |
rename_column | Rename a column |

Data Operations | |
---|---|
insert | Insert rows into table |
update | Update rows in table or view |
delete | Delete rows from table |

Indexing Operations | |
---|---|
add_embedding_index | Add embedding index on column |
drop_embedding_index | Drop embedding index from column |
drop_index | Drop index from column |

Versioning | |
---|---|
revert | Revert the last change |
pixeltable.Table
Table(id: UUID, dir_id: UUID, name: str, tbl_version_path: TableVersionPath)
Base class for table objects (base tables, views, snapshots).
add_column
add_column(
*,
type: Optional[ColumnType] = None,
stored: Optional[bool] = None,
print_stats: bool = False,
**kwargs: Union[ColumnType, Expr, Callable]
) -> UpdateStatus
Adds a column to the table.
Parameters:
- kwargs (Union[ColumnType, Expr, Callable], default: {}) – Exactly one keyword argument of the form column-name=type|value-expression.
- type (Optional[ColumnType], default: None) – The type of the column. Only valid and required if value-expression is a Callable.
- stored (Optional[bool], default: None) – Whether the column is materialized and stored or computed on demand. Only valid for image columns.
- print_stats (bool, default: False) – If True, print execution metrics.
Returns:
- UpdateStatus – execution status
Raises:
- Error – If the column name is invalid or already exists.
Examples:
Add an int column with None values:
>>> tbl.add_column(new_col=IntType())
Alternatively, this can also be expressed as:
>>> tbl['new_col'] = IntType()
For a table with int column int_col, add a column that is the factorial of int_col. The names of the parameters of the Callable must correspond to existing column names (the column values are then passed as arguments to the Callable). In this case, the column type needs to be specified explicitly:
>>> tbl.add_column(factorial=lambda int_col: math.factorial(int_col), type=IntType())
Alternatively, this can also be expressed as:
>>> tbl['factorial'] = {'value': lambda int_col: math.factorial(int_col), 'type': IntType()}
For a table with an image column frame, add an image column rotated that rotates the image by 90 degrees. In this case, the column type is inferred from the expression. Also, the column is not stored (by default, computed image columns are not stored but recomputed on demand):
>>> tbl.add_column(rotated=tbl.frame.rotate(90))
Alternatively, this can also be expressed as:
>>> tbl['rotated'] = tbl.frame.rotate(90)
Do the same, but now the column is stored:
>>> tbl.add_column(rotated=tbl.frame.rotate(90), stored=True)
Alternatively, this can also be expressed as:
>>> tbl['rotated'] = {'value': tbl.frame.rotate(90), 'stored': True}
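The rule that a Callable's parameter names must match existing column names can be illustrated with a small stand-alone sketch. This is not Pixeltable's implementation: `apply_to_row` is a hypothetical helper, and rows are modeled as plain dictionaries rather than table rows.

```python
import inspect
import math


def apply_to_row(fn, row):
    """Call fn with the row values whose column names match fn's parameter names.

    Hypothetical helper for illustration only; a row is a plain dict here.
    """
    param_names = list(inspect.signature(fn).parameters)
    missing = [name for name in param_names if name not in row]
    if missing:
        # A parameter that names no existing column cannot be bound.
        raise KeyError(f'no such column(s): {missing}')
    return fn(**{name: row[name] for name in param_names})


rows = [{'int_col': 3}, {'int_col': 5}]
factorials = [apply_to_row(lambda int_col: math.factorial(int_col), r) for r in rows]
```

Because a plain Python function carries no return-type information that the sketch could rely on, it also hints at why the column type has to be given explicitly in the Callable case.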
add_embedding_index
add_embedding_index(
col_name: str,
*,
idx_name: Optional[str] = None,
string_embed: Optional[Function] = None,
image_embed: Optional[Function] = None,
metric: str = "cosine"
) -> None
Add an index to the table.
Parameters:
- col_name (str) – name of column to index
- idx_name (Optional[str], default: None) – name of index, which needs to be unique for the table; if not provided, a name will be generated
- string_embed (Optional[Function], default: None) – function to embed text; required if the column is a text column
- image_embed (Optional[Function], default: None) – function to embed images; required if the column is an image column
- metric (str, default: 'cosine') – distance metric to use for the index; one of 'cosine', 'ip', 'l2'; default is 'cosine'
Raises:
- Error – If an index with that name already exists for the table or if the column does not exist.
Examples:
Add an index to the img column:
>>> tbl.add_embedding_index('img', image_embed=...)
Add another index to the img column, using the inner product as the distance metric, and with a specific name; string_embed is also specified in order to search with text:
>>> tbl.add_embedding_index(
...     'img', idx_name='clip_idx', image_embed=..., string_embed=..., metric='ip')
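For orientation, the three metric names can be related to their usual textbook definitions. The sketch below assumes that 'ip' is the inner product (as the example above states), 'l2' is the Euclidean distance, and 'cosine' is based on cosine similarity; how Pixeltable turns these into index distances is not specified here, and the function names are illustrative only.

```python
import math


def inner_product(u, v):
    # Sum of element-wise products.
    return sum(a * b for a, b in zip(u, v))


def l2_distance(u, v):
    # Euclidean distance between the two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    # Inner product of the vectors after scaling each to unit length.
    return inner_product(u, v) / (math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v)))


u, v = [1.0, 0.0], [3.0, 4.0]
scores = {
    'ip': inner_product(u, v),
    'l2': l2_distance(u, v),
    'cosine': cosine_similarity(u, v),
}
```

Note that the inner product depends on vector length while cosine similarity does not, which is one reason to pick one metric over the other for a given embedding function.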
batch_update
batch_update(
rows: Iterable[dict[str, Any]],
cascade: bool = True,
if_not_exists: Literal["error", "ignore", "insert"] = "error",
) -> UpdateStatus
Update rows in this table.
Parameters:
- rows (Iterable[dict[str, Any]]) – an Iterable of dictionaries containing values for the updated columns plus values for the primary key columns.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
- if_not_exists (Literal['error', 'ignore', 'insert'], default: 'error') – Specifies the behavior if a row to update does not exist:
  - 'error': Raise an error.
  - 'ignore': Skip the row silently.
  - 'insert': Insert the row.
Examples:
Update the name and age columns for the rows with ids 1 and 2 (assuming id is the primary key). If either row does not exist, this raises an error:
>>> tbl.batch_update([{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 40}])
Update the name and age columns for the row with id 1 (assuming id is the primary key) and insert the row with new id 3 (assuming this key does not exist):
>>> tbl.batch_update(
...     [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Bob', 'age': 40}],
...     if_not_exists='insert')
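The three if_not_exists options can be pictured with a toy model in which the table is a plain dict keyed by primary key. `batch_update_model` is a hypothetical function written for this illustration; it does not model computed columns, cascade, or any transactional behavior of the real method.

```python
def batch_update_model(table, rows, pk='id', if_not_exists='error'):
    """Toy model of the if_not_exists options; table is a dict keyed by primary key."""
    if if_not_exists not in ('error', 'ignore', 'insert'):
        raise ValueError(f'invalid if_not_exists: {if_not_exists!r}')
    for row in rows:
        key = row[pk]
        if key in table:
            table[key].update(row)  # overwrite only the supplied columns
        elif if_not_exists == 'error':
            raise KeyError(f'no row with {pk}={key!r}')
        elif if_not_exists == 'insert':
            table[key] = dict(row)
        # 'ignore': fall through and skip the row silently


table = {1: {'id': 1, 'name': 'Al', 'age': 29}}
batch_update_model(
    table,
    [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Bob', 'age': 40}],
    if_not_exists='insert',
)
```

After the call, the existing row 1 has been updated in place and row 3 has been added; with the default 'error', the same call would have raised on the unknown key 3.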
collect
collect() -> 'pixeltable.dataframe.DataFrameResultSet'
Return rows from this table.
count
count() -> int
Return the number of rows in this table.
delete
delete(where: Optional['pixeltable.exprs.Expr'] = None) -> UpdateStatus
Delete rows in this table.
Parameters:
- where (Optional['pixeltable.exprs.Expr'], default: None) – a predicate to filter rows to delete.
Examples:
Delete all rows in a table:
>>> tbl.delete()
Delete all rows in a table where column a is greater than 5:
>>> tbl.delete(tbl.a > 5)
describe
describe() -> None
Print the table schema.
drop_column
drop_column(name: str) -> None
Drop a column from the table.
Parameters:
- name (str) – The name of the column to drop.
Raises:
- Error – If the column does not exist or if it is referenced by a computed column.
Examples:
Drop column factorial:
>>> tbl.drop_column('factorial')
drop_embedding_index
drop_embedding_index(
*, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None
Drop an embedding index from the table.
Parameters:
- column_name (Optional[str], default: None) – The name of the column whose embedding index to drop. Invalid if the column has multiple embedding indices.
- idx_name (Optional[str], default: None) – The name of the index to drop.
Raises:
- Error – If the index does not exist.
Examples:
Drop embedding index on the img column:
>>> tbl.drop_embedding_index(column_name='img')
drop_index
drop_index(
*, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None
Drop an index from the table.
Parameters:
- column_name (Optional[str], default: None) – The name of the column whose index to drop. Invalid if the column has multiple indices.
- idx_name (Optional[str], default: None) – The name of the index to drop.
Raises:
- Error – If the index does not exist.
Examples:
Drop index on the img column:
>>> tbl.drop_index(column_name='img')
group_by
group_by(*items: 'exprs.Expr') -> 'pixeltable.DataFrame'
Return a DataFrame for this table.
head
head(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return the first n rows inserted into this table.
insert
abstractmethod
insert(
rows: Optional[Iterable[dict[str, Any]]] = None,
/,
*,
print_stats: bool = False,
fail_on_exception: bool = True,
**kwargs: Any,
) -> UpdateStatus
Inserts rows into this table. There are two mutually exclusive call patterns:
To insert multiple rows at a time:
insert(rows: Iterable[dict[str, Any]], /, *, print_stats: bool = False, fail_on_exception: bool = True)
To insert just a single row, you can use the more convenient syntax:
insert(*, print_stats: bool = False, fail_on_exception: bool = True, **kwargs: Any)
Parameters:
- rows (Optional[Iterable[dict[str, Any]]], default: None) – (if inserting multiple rows) A list of rows to insert, each of which is a dictionary mapping column names to values.
- kwargs (Any, default: {}) – (if inserting a single row) Keyword-argument pairs representing column names and values.
- print_stats (bool, default: False) – If True, print statistics about the cost of computed columns.
- fail_on_exception (bool, default: True) – Determines how exceptions in computed columns and invalid media files (e.g., corrupt images) are handled. If False, store error information (accessible as column properties 'errortype' and 'errormsg') for those cases, but continue inserting rows. If True, raise an exception that aborts the insert.
Returns:
- UpdateStatus – execution status
Raises:
- Error – if a row does not match the table schema or contains values for computed columns
Examples:
Insert two rows into a table with three int columns a, b, and c. Column c is nullable.
>>> tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])
Insert a single row into a table with three int columns a, b, and c.
>>> tbl.insert(a=1, b=1, c=1)
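The two mutually exclusive call patterns can be sketched as a small argument-normalization step. `normalize_insert_args` is a hypothetical helper, not part of Pixeltable; it only shows how a positional list of rows and single-row keyword arguments could be reduced to one list of row dictionaries, and it leaves out print_stats and fail_on_exception.

```python
from typing import Any, Iterable, Optional


def normalize_insert_args(
    rows: Optional[Iterable[dict[str, Any]]] = None, /, **kwargs: Any
) -> list[dict[str, Any]]:
    """Hypothetical helper: reduce both call patterns to a list of row dicts."""
    if rows is not None and kwargs:
        raise ValueError('pass either a list of rows or keyword arguments, not both')
    if rows is None:
        return [dict(kwargs)]  # single-row form: insert(a=1, b=1, c=1)
    return [dict(r) for r in rows]  # multi-row form: insert([{...}, {...}])


multi = normalize_insert_args([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])
single = normalize_insert_args(a=1, b=1, c=1)
```

The `/` in the signature makes rows positional-only, which in Python means a keyword argument named rows is still collected into kwargs and can therefore refer to a column of that name.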
list_views
list_views(*, recursive: bool = True) -> list[str]
Returns a list of all views and snapshots of this Table.
Parameters:
- recursive (bool, default: True) – If False, returns only the immediate successor views of this Table. If True, returns all sub-views (including views of views, etc.)
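The difference between the two settings of recursive can be shown on a toy view hierarchy. In this sketch, `list_views_model` and the `children` mapping are hypothetical stand-ins; the order in which the real method returns names is not specified here.

```python
def list_views_model(children, name, recursive=True):
    """Toy model: children maps a table name to the names of its immediate views."""
    result = []
    for child in children.get(name, []):
        result.append(child)
        if recursive:
            # Descend into views of views.
            result.extend(list_views_model(children, child, recursive=True))
    return result


children = {'tbl': ['view_a', 'view_b'], 'view_a': ['view_a1']}
immediate = list_views_model(children, 'tbl', recursive=False)
all_views = list_views_model(children, 'tbl', recursive=True)
```

Here `immediate` contains only view_a and view_b, while `all_views` additionally contains view_a1, the view defined on top of view_a.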
order_by
order_by(*items: 'exprs.Expr', asc: bool = True) -> 'pixeltable.DataFrame'
Return a DataFrame for this table.
rename_column
rename_column(old_name: str, new_name: str) -> None
Rename a column.
Parameters:
- old_name (str) – The current name of the column.
- new_name (str) – The new name of the column.
Raises:
- Error – If the column does not exist or if the new name is invalid or already exists.
Examples:
Rename column factorial to fac:
>>> tbl.rename_column('factorial', 'fac')
revert
revert() -> None
Reverts the table to the previous version.
Warning: This operation is irreversible.
select
select(*items: Any, **named_items: Any) -> 'pixeltable.DataFrame'
Return a DataFrame for this table.
show
show(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return rows from this table.
sync
sync(
stores: Optional[str | list[str]] = None,
*,
export_data: bool = True,
import_data: bool = True
) -> "pixeltable.io.SyncStatus"
Synchronizes this table with its linked external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will synchronize only the specified named store or list of stores. If not specified, will synchronize all of this table's external stores.
- export_data (bool, default: True) – If True, data from this table will be exported to the external stores during synchronization.
- import_data (bool, default: True) – If True, data from the external stores will be imported to this table during synchronization.
tail
tail(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return the last n rows inserted into this table.
to_coco_dataset
to_coco_dataset() -> Path
Return the path to a COCO json file for this table. See DataFrame.to_coco_dataset()
to_pytorch_dataset
to_pytorch_dataset(
image_format: str = "pt",
) -> "torch.utils.data.IterableDataset"
Return a PyTorch Dataset for this table. See DataFrame.to_pytorch_dataset()
unlink_external_stores
unlink_external_stores(
stores: Optional[str | list[str]] = None,
*,
delete_external_data: bool = False,
ignore_errors: bool = False
) -> None
Unlinks this table's external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will unlink only the specified named store or list of stores. If not specified, will unlink all of this table's external stores.
- ignore_errors (bool, default: False) – If True, no exception will be thrown if a specified store is not linked to this table.
- delete_external_data (bool, default: False) – If True, then the external data store will also be deleted. WARNING: This is a destructive operation that will delete data outside Pixeltable, and cannot be undone.
update
update(
value_spec: dict[str, Any],
where: Optional["pixeltable.exprs.Expr"] = None,
cascade: bool = True,
) -> UpdateStatus
Update rows in this table.
Parameters:
- value_spec (dict[str, Any]) – a dictionary mapping column names to literal values or Pixeltable expressions.
- where (Optional['pixeltable.exprs.Expr'], default: None) – a predicate to filter rows to update.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
Examples:
Set column int_col to 1 for all rows:
>>> tbl.update({'int_col': 1})
Set column int_col to 1 for all rows where int_col is 0:
>>> tbl.update({'int_col': 1}, where=tbl.int_col == 0)
Set int_col to the value of other_int_col + 1:
>>> tbl.update({'int_col': tbl.other_int_col + 1})
Increment int_col by 1 for all rows where int_col is 0:
>>> tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
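How value_spec, where, and cascade interact can be pictured with a toy model over a list of dictionaries. Everything in this sketch is an assumption made for illustration: `update_model` is hypothetical, plain functions of the row stand in for Pixeltable expressions, and for simplicity every computed column is recomputed when cascade is True rather than only the ones that depend on the updated columns.

```python
def update_model(rows, value_spec, where=None, cascade=True, computed=None):
    """Toy model of update(); returns the number of rows changed.

    value_spec values are literals or functions of the row. computed maps a
    column name to a function of the row, re-evaluated only when cascade is True.
    """
    computed = computed or {}
    count = 0
    for row in rows:
        if where is not None and not where(row):
            continue  # row filtered out by the predicate
        # Evaluate every new value against the old row before assigning any.
        new_values = {c: (v(row) if callable(v) else v) for c, v in value_spec.items()}
        row.update(new_values)
        if cascade:
            for col, fn in computed.items():
                row[col] = fn(row)
        count += 1
    return count


rows = [{'int_col': 0, 'doubled': 0}, {'int_col': 7, 'doubled': 14}]
n = update_model(
    rows,
    {'int_col': lambda r: r['int_col'] + 1},
    where=lambda r: r['int_col'] == 0,
    computed={'doubled': lambda r: r['int_col'] * 2},
)
```

Only the first row matches the predicate, so it is incremented and its computed column is refreshed, while the second row is left untouched.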