Table
Instances of class Table are handles to Pixeltable tables and views/snapshots.
Use this handle to query and update the table and to add and drop columns.
Tables are created by calling pxt.create_table.
Views and snapshots are created by calling pxt.create_view (snapshots require is_snapshot=True).
To get a handle to an existing table/view/snapshot, call pxt.get_table.
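As a rough sketch of how these calls fit together, the helper below creates a table, a view, and a snapshot, then looks the table up again by name. The pxt parameter stands for the imported pixeltable module. The helper name, the object names ('demo_tbl' and so on), and the positional argument order (name first, then schema or base table) are assumptions made for illustration.

```python
def build_handles(pxt, schema):
    """Create a table, a view, and a snapshot, and return their handles.

    `pxt` is assumed to be the imported pixeltable module
    (`import pixeltable as pxt`); all object names are hypothetical.
    """
    tbl = pxt.create_table('demo_tbl', schema)
    # A view is derived from an existing table handle.
    view = pxt.create_view('demo_view', tbl)
    # A snapshot is created through create_view with is_snapshot=True.
    snap = pxt.create_view('demo_snap', tbl, is_snapshot=True)
    # Any existing table/view/snapshot can be looked up again by name.
    same_tbl = pxt.get_table('demo_tbl')
    return tbl, view, snap, same_tbl
```

Passing the module in as a parameter keeps the sketch importable without Pixeltable installed; in real code you would call pxt.create_table and the other functions directly.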
Overview
| Column Operations | |
|---|---|
| add_column | Add a column to the table or view |
| drop_column | Remove a column from the table or view |
| rename_column | Rename a column |

| Data Operations | |
|---|---|
| insert | Insert rows into table |
| update | Update rows in table or view |
| delete | Delete rows from table |

| Indexing Operations | |
|---|---|
| add_embedding_index | Add embedding index on column |
| drop_embedding_index | Drop embedding index from column |
| drop_index | Drop index from column |

| Versioning | |
|---|---|
| revert | Revert the last change |
pixeltable.Table
Table(id: UUID, dir_id: UUID, name: str, tbl_version_path: TableVersionPath)
Base class for all tabular SchemaObjects.
base
property
base: Optional['Table']
The base table of this Table. If this table is a view, returns the Table from which it was derived. Otherwise, returns None.
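Because base is None for anything that is not a view, following it repeatedly leads from a view back to the table it was ultimately derived from. The two helpers below are a minimal sketch of that walk; their names are hypothetical and they rely only on the base property described above.

```python
def ancestry(tbl):
    """Return [tbl, tbl.base, tbl.base.base, ...], ending at the root table."""
    chain = [tbl]
    # Stop once we reach an object whose base is None (not a view).
    while chain[-1].base is not None:
        chain.append(chain[-1].base)
    return chain


def root_table(tbl):
    """Return the table at the end of the chain of base links."""
    return ancestry(tbl)[-1]
```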
__getattr__
__getattr__(
name: str,
) -> Union[
"pixeltable.exprs.ColumnRef", "pixeltable.func.QueryTemplateFunction"
]
Return a ColumnRef or QueryTemplateFunction for the given name.
__getitem__
__getitem__(
index: object,
) -> Union[
"pixeltable.func.QueryTemplateFunction",
"pixeltable.exprs.ColumnRef",
"pixeltable.dataframe.DataFrame",
]
Return a ColumnRef or QueryTemplateFunction for the given name, or a DataFrame for the given slice.
__setitem__
__setitem__(
column_name: str, value: Union[ColumnType, Expr, Callable, dict]
) -> None
Adds a column to the table.
Args:
- column_name: the name of the new column
- value: column type or value expression or column specification dictionary:
  - column type: a Pixeltable column type (if the table already contains rows, it must be nullable)
  - value expression: a Pixeltable expression that computes the column values
  - column specification: a dictionary with possible keys 'type', 'value', 'stored'
Examples:
Add an int column with None values:
>>> tbl['new_col'] = IntType(nullable=True)
For a table with int column int_col, add a column that is the factorial of int_col. The names of the parameters of the Callable must correspond to existing column names (the column values are then passed as arguments to the Callable). In this case, the return type cannot be inferred and needs to be specified explicitly:
>>> tbl['factorial'] = {'value': lambda int_col: math.factorial(int_col), 'type': IntType()}
For a table with an image column frame, add an image column rotated that rotates the image by 90 degrees. In this case, the column type is inferred from the expression. Also, the column is not stored (by default, computed image columns are not stored but recomputed on demand):
>>> tbl['rotated'] = tbl.frame.rotate(90)
Do the same, but now the column is stored:
>>> tbl['rotated'] = {'value': tbl.frame.rotate(90), 'stored': True}
add_column
add_column(
*,
type: Optional[ColumnType] = None,
stored: Optional[bool] = None,
print_stats: bool = False,
**kwargs: Any
) -> UpdateStatus
Adds a column to the table.
Parameters:
- kwargs (Any, default: {}) – Exactly one keyword argument of the form column-name=type|value-expression.
- type (Optional[ColumnType], default: None) – The type of the column. Only valid and required if value-expression is a Callable.
- stored (Optional[bool], default: None) – Whether the column is materialized and stored or computed on demand. Only valid for image columns.
- print_stats (bool, default: False) – If True, print execution metrics.
Returns:
- UpdateStatus – execution status
Raises:
- Error – If the column name is invalid or already exists.
Examples:
Add an int column with None values (the type must be nullable if the table already contains rows):
>>> tbl.add_column(new_col=IntType(nullable=True))
Alternatively, this can also be expressed as:
>>> tbl['new_col'] = IntType(nullable=True)
For a table with int column int_col, add a column that is the factorial of int_col. The names of the parameters of the Callable must correspond to existing column names (the column values are then passed as arguments to the Callable). In this case, the column type needs to be specified explicitly:
>>> tbl.add_column(factorial=lambda int_col: math.factorial(int_col), type=IntType())
Alternatively, this can also be expressed as:
>>> tbl['factorial'] = {'value': lambda int_col: math.factorial(int_col), 'type': IntType()}
For a table with an image column frame, add an image column rotated that rotates the image by 90 degrees. In this case, the column type is inferred from the expression. Also, the column is not stored (by default, computed image columns are not stored but recomputed on demand):
>>> tbl.add_column(rotated=tbl.frame.rotate(90))
Alternatively, this can also be expressed as:
>>> tbl['rotated'] = tbl.frame.rotate(90)
Do the same, but now the column is stored:
>>> tbl.add_column(rotated=tbl.frame.rotate(90), stored=True)
Alternatively, this can also be expressed as:
>>> tbl['rotated'] = {'value': tbl.frame.rotate(90), 'stored': True}
add_embedding_index
add_embedding_index(
col_name: str,
*,
idx_name: Optional[str] = None,
text_embed: Optional[Function] = None,
img_embed: Optional[Function] = None,
metric: str = "cosine"
) -> None
Add an index to the table.
Parameters:
- col_name (str) – name of column to index
- idx_name (Optional[str], default: None) – name of index, which needs to be unique for the table; if not provided, a name will be generated
- text_embed (Optional[Function], default: None) – function to embed text; required if the column is a text column
- img_embed (Optional[Function], default: None) – function to embed images; required if the column is an image column
- metric (str, default: 'cosine') – distance metric to use for the index; one of 'cosine', 'ip', 'l2'
Raises:
- Error – If an index with that name already exists for the table or if the column does not exist.
Examples:
Add an index to the img column:
>>> tbl.add_embedding_index('img', img_embed=...)
Add another index to the img column, using the inner product as the distance metric, and with a specific name; text_embed is also specified in order to search with text:
>>> tbl.add_embedding_index('img', idx_name='clip_idx', img_embed=..., text_embed=..., metric='ip')
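Since metric only accepts 'cosine', 'ip', or 'l2', a thin wrapper can reject a mistyped metric before the call reaches the table. The following is a minimal sketch: the wrapper name and the VALID_METRICS constant are hypothetical, and the embedding function is left entirely to the caller.

```python
VALID_METRICS = ('cosine', 'ip', 'l2')  # the metrics listed for add_embedding_index


def add_image_index(tbl, col_name, img_embed, *, metric='cosine', idx_name=None):
    """Add an embedding index on an image column after checking the metric.

    `img_embed` is whatever embedding function the caller supplies; this
    sketch does not assume any particular one.
    """
    if metric not in VALID_METRICS:
        raise ValueError(f'metric must be one of {VALID_METRICS}, got {metric!r}')
    tbl.add_embedding_index(
        col_name, idx_name=idx_name, img_embed=img_embed, metric=metric
    )
```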
batch_update
batch_update(
rows: Iterable[dict[str, Any]], cascade: bool = True
) -> UpdateStatus
Update rows in this table.
Parameters:
- rows (Iterable[dict[str, Any]]) – an Iterable of dictionaries containing values for the updated columns plus values for the primary key columns.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
Examples:
Update the 'name' and 'age' columns for the rows with ids 1 and 2 (assuming 'id' is the primary key):
>>> tbl.batch_update([{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 40}])
collect
collect() -> 'pixeltable.dataframe.DataFrameResultSet'
Return rows from this table.
column_names
column_names() -> list[str]
Return the names of the columns in this table.
column_types
column_types() -> dict[str, ColumnType]
Return a dictionary that maps the name of each column in this table to its type.
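column_names() and column_types() can be combined to produce a compact listing of a table's schema. The helper below is a small sketch with a hypothetical name; it assumes that every name returned by column_names() also appears as a key in the dictionary returned by column_types().

```python
def schema_summary(tbl):
    """Return one 'name: type' string per column, in column_names() order."""
    types = tbl.column_types()
    # Assumption: column_types() has an entry for every column name.
    return [f'{name}: {types[name]}' for name in tbl.column_names()]
```

A typical use would be print('\n'.join(schema_summary(tbl))).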
count
count() -> int
Return the number of rows in this table.
delete
abstractmethod
delete(where: Optional['pixeltable.exprs.Predicate'] = None) -> UpdateStatus
Delete rows in this table.
Parameters:
- where (Optional['pixeltable.exprs.Predicate'], default: None) – a Predicate to filter rows to delete.
Examples:
Delete all rows in a table:
>>> tbl.delete()
Delete all rows in a table where column a is greater than 5:
>>> tbl.delete(tbl.a > 5)
describe
describe() -> None
Print the table schema.
df
df() -> 'pixeltable.dataframe.DataFrame'
Return a DataFrame for this table.
display_name
abstractmethod
classmethod
display_name() -> str
Return name displayed in error messages.
drop_column
drop_column(name: str) -> None
Drop a column from the table.
Parameters:
- name (str) – The name of the column to drop.
Raises:
- Error – If the column does not exist or if it is referenced by a computed column.
Examples:
Drop column factorial:
>>> tbl.drop_column('factorial')
drop_embedding_index
drop_embedding_index(
*, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None
Drop an embedding index from the table.
Parameters:
- column_name (Optional[str], default: None) – The name of the column whose embedding index to drop. Invalid if the column has multiple embedding indices.
- idx_name (Optional[str], default: None) – The name of the index to drop.
Raises:
- Error – If the index does not exist.
Examples:
Drop embedding index on the img column:
>>> tbl.drop_embedding_index(column_name='img')
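When a column carries more than one embedding index, column_name is not accepted and the index has to be named explicitly. The helper below sketches one way to keep the two call styles apart. Its name is hypothetical, and requiring exactly one of the two identifiers is the helper's own policy (an assumption), not a documented rule of drop_embedding_index.

```python
def drop_embedding_index_by(tbl, *, column_name=None, idx_name=None):
    """Drop an embedding index identified by column name or by index name."""
    # Helper policy (assumption): callers must pass exactly one identifier.
    if (column_name is None) == (idx_name is None):
        raise ValueError('pass exactly one of column_name or idx_name')
    if idx_name is not None:
        # Needed when the column has more than one embedding index.
        tbl.drop_embedding_index(idx_name=idx_name)
    else:
        tbl.drop_embedding_index(column_name=column_name)
```

For example, drop_embedding_index_by(tbl, idx_name='clip_idx') would remove only the index named 'clip_idx' and leave any other index on the same column in place.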
drop_index
drop_index(
*, column_name: Optional[str] = None, idx_name: Optional[str] = None
) -> None
Drop an index from the table.
Parameters:
- column_name (Optional[str], default: None) – The name of the column whose index to drop. Invalid if the column has multiple indices.
- idx_name (Optional[str], default: None) – The name of the index to drop.
Raises:
- Error – If the index does not exist.
Examples:
Drop index on the img column:
>>> tbl.drop_index(column_name='img')
get_views
get_views(*, recursive: bool = False) -> list['Table']
All views and snapshots of this Table.
group_by
group_by(*items: 'exprs.Expr') -> 'pixeltable.dataframe.DataFrame'
Return a DataFrame for this table, grouped by the given expressions.
head
head(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return the first n rows inserted into this table.
insert
abstractmethod
insert(
rows: Optional[Iterable[dict[str, Any]]] = None,
/,
*,
print_stats: bool = False,
fail_on_exception: bool = True,
**kwargs: Any,
) -> UpdateStatus
Inserts rows into this table. There are two mutually exclusive call patterns:
To insert multiple rows at a time:
insert(rows: Iterable[dict[str, Any]], /, *, print_stats: bool = False, fail_on_exception: bool = True)
To insert just a single row, you can use the more convenient syntax:
insert(*, print_stats: bool = False, fail_on_exception: bool = True, **kwargs: Any)
Parameters:
- rows (Optional[Iterable[dict[str, Any]]], default: None) – (if inserting multiple rows) A list of rows to insert, each of which is a dictionary mapping column names to values.
- kwargs (Any, default: {}) – (if inserting a single row) Keyword-argument pairs representing column names and values.
- print_stats (bool, default: False) – If True, print statistics about the cost of computed columns.
- fail_on_exception (bool, default: True) – Determines how exceptions in computed columns and invalid media files (e.g., corrupt images) are handled. If False, store error information (accessible as column properties 'errortype' and 'errormsg') for those cases, but continue inserting rows. If True, raise an exception that aborts the insert.
Returns:
- UpdateStatus – execution status
Raises:
- Error – if a row does not match the table schema or contains values for computed columns
Examples:
Insert two rows into a table with three int columns a, b, and c. Column c is nullable.
>>> tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])
Insert a single row into a table with three int columns a, b, and c.
>>> tbl.insert(a=1, b=1, c=1)
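With fail_on_exception=False, the insert keeps going and records error details on the affected column instead of raising. The sketch below shows one way to insert and then read those details back. The helper name and the computed_col argument are hypothetical, and selecting the errortype and errormsg properties through a query is an assumption about how those column properties are used.

```python
def insert_tolerantly(tbl, rows, computed_col):
    """Insert rows without aborting on computed-column or media errors.

    Returns the insert status and a result set holding the error type and
    error message of `computed_col` for each row.
    """
    status = tbl.insert(rows, fail_on_exception=False)
    col = tbl[computed_col]  # column reference looked up by name
    # Assumption: the error properties can be selected like regular columns.
    errors = tbl.select(col.errortype, col.errormsg).collect()
    return status, errors
```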
order_by
order_by(
*items: "exprs.Expr", asc: bool = True
) -> "pixeltable.dataframe.DataFrame"
Return a DataFrame for this table, ordered by the given expressions (ascending if asc is True, descending otherwise).
query_names
query_names() -> list[str]
Return the names of the registered queries for this table.
rename_column
rename_column(old_name: str, new_name: str) -> None
Rename a column.
Parameters:
- old_name (str) – The current name of the column.
- new_name (str) – The new name of the column.
Raises:
- Error – If the column does not exist or if the new name is invalid or already exists.
Examples:
Rename column factorial to fac:
>>> tbl.rename_column('factorial', 'fac')
revert
revert() -> None
Reverts the table to the previous version.
Warning: This operation is irreversible.
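Because a revert cannot be undone, it can help to record the table version on either side of the call. The helper below is a minimal sketch with a hypothetical name; it uses only revert() and version() and makes no assumption about how version numbers change.

```python
def revert_and_report(tbl):
    """Revert the most recent change; return (version_before, version_after)."""
    before = tbl.version()
    tbl.revert()  # irreversible: the reverted change cannot be restored
    return before, tbl.version()
```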
select
select(*items: Any, **named_items: Any) -> 'pixeltable.dataframe.DataFrame'
Return a DataFrame for this table that selects the given items.
show
show(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return rows from this table.
sync
sync(
stores: Optional[str | list[str]] = None,
*,
export_data: bool = True,
import_data: bool = True
) -> "pixeltable.io.SyncStatus"
Synchronizes this table with its linked external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will synchronize only the specified named store or list of stores. If not specified, will synchronize all of this table's external stores.
- export_data (bool, default: True) – If True, data from this table will be exported to the external stores during synchronization.
- import_data (bool, default: True) – If True, data from the external stores will be imported to this table during synchronization.
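The two flags make one-directional synchronization possible. The helpers below sketch an export-only and an import-only sync; the helper names are hypothetical, and they simply forward to sync() with the flags set accordingly.

```python
def push_to_stores(tbl, stores=None):
    """Export this table's data to its external stores without importing."""
    return tbl.sync(stores, export_data=True, import_data=False)


def pull_from_stores(tbl, stores=None):
    """Import data from the external stores without exporting."""
    return tbl.sync(stores, export_data=False, import_data=True)
```

Leaving stores as None synchronizes all linked stores; passing a name or a list of names restricts the operation to those stores.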
tail
tail(*args, **kwargs) -> 'pixeltable.dataframe.DataFrameResultSet'
Return the last n rows inserted into this table.
to_coco_dataset
to_coco_dataset() -> Path
Return the path to a COCO JSON file for this table. See DataFrame.to_coco_dataset().
to_pytorch_dataset
to_pytorch_dataset(
image_format: str = "pt",
) -> "torch.utils.data.IterableDataset"
Return a PyTorch Dataset for this table. See DataFrame.to_pytorch_dataset().
unlink_external_stores
unlink_external_stores(
stores: Optional[str | list[str]] = None,
*,
delete_external_data: bool = False,
ignore_errors: bool = False
) -> None
Unlinks this table's external stores.
Parameters:
- stores (Optional[str | list[str]], default: None) – If specified, will unlink only the specified named store or list of stores. If not specified, will unlink all of this table's external stores.
- ignore_errors (bool, default: False) – If True, no exception will be thrown if a specified store is not linked to this table.
- delete_external_data (bool, default: False) – If True, then the external data store will also be deleted. WARNING: This is a destructive operation that will delete data outside Pixeltable, and cannot be undone.
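A cautious way to use this method is to make the destructive option an explicit opt-in. The helper below is a sketch with hypothetical names: it unlinks a single named store, tolerates the store not being linked, and only deletes external data when the caller asks for it.

```python
def unlink_store(tbl, store_name, *, also_delete_remote=False):
    """Unlink one named store, tolerating the case where it is not linked."""
    tbl.unlink_external_stores(
        store_name,
        delete_external_data=also_delete_remote,  # destructive when True
        ignore_errors=True,
    )
```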
update
update(
value_spec: dict[str, Any],
where: Optional["pixeltable.exprs.Predicate"] = None,
cascade: bool = True,
) -> UpdateStatus
Update rows in this table.
Parameters:
- value_spec (dict[str, Any]) – a dictionary mapping column names to literal values or Pixeltable expressions.
- where (Optional['pixeltable.exprs.Predicate'], default: None) – a Predicate to filter rows to update.
- cascade (bool, default: True) – if True, also update all computed columns that transitively depend on the updated columns.
Examples:
Set column int_col to 1 for all rows:
>>> tbl.update({'int_col': 1})
Set column int_col to 1 for all rows where int_col is 0:
>>> tbl.update({'int_col': 1}, where=tbl.int_col == 0)
Set int_col to the value of other_int_col + 1:
>>> tbl.update({'int_col': tbl.other_int_col + 1})
Increment int_col by 1 for all rows where int_col is 0:
>>> tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
version
version() -> int
Return the version of this table. Used by tests to ascertain version changes.
where
where(pred: 'exprs.Predicate') -> 'pixeltable.dataframe.DataFrame'
Return a DataFrame for this table, filtered by the given predicate.
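The query methods where, select, and order_by each return a DataFrame, so a query can be built step by step and then executed. The helper below is a sketch under the assumption that the returned DataFrame offers the same select, order_by, and collect methods as Table; the helper name and its parameters are hypothetical.

```python
def rows_above(tbl, column_name, threshold):
    """Return the values of one column that exceed `threshold`, largest first."""
    col = tbl[column_name]  # column reference looked up by name
    return (
        tbl.where(col > threshold)   # keep only the matching rows
        .select(col)                 # return just this column
        .order_by(col, asc=False)    # sort in descending order
        .collect()                   # execute the query
    )
```

For a table with an int column a, rows_above(tbl, 'a', 5) corresponds to filtering on tbl.a > 5 and sorting the result by a in descending order.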