Pixeltable
Import conventions:
import pixeltable as pxt
Insertable tables, views, and snapshots all have a tabular interface and are generically referred to as "tables" below.
Overview
Table Operations | |
---|---|
pxt.create_table | Create a new (insertable) table |
pxt.create_view | Create a new view |
pxt.create_snapshot | Create a new snapshot |
pxt.drop_table | Delete a table |
pxt.get_table | Get a handle to a table |
pxt.list_tables | List the tables in a directory |
Directory Operations | |
---|---|
pxt.create_dir | Create a directory |
pxt.list_dirs | List the directories in a directory |
pxt.drop_dir | Remove a directory |
Misc | |
---|---|
pxt.configure_logging | Configure logging |
pxt.init | Initialize Pixeltable runtime now (if not already initialized) |
pxt.move | Move a schema object to a new directory and/or rename a schema object |
pixeltable
configure_logging
configure_logging(
*,
to_stdout: Optional[bool] = None,
level: Optional[int] = None,
add: Optional[str] = None,
remove: Optional[str] = None
) -> None
Configure logging.
Parameters:
- to_stdout (Optional[bool], default: None) – If True, also log to stdout.
- level (Optional[int], default: None) – Default log level.
- add (Optional[str], default: None) – Comma-separated list of 'module name:log level' pairs, e.g. add='video:10'.
- remove (Optional[str], default: None) – Comma-separated list of module names.
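The add parameter is a plain string, so it can be assembled from a mapping of module names to log levels. The sketch below is only an illustration: build_add_spec is a hypothetical helper (not part of Pixeltable), and the module name 'image' is made up. The levels come from the standard logging module, where logging.DEBUG is 10 and logging.INFO is 20.

```python
import logging


def build_add_spec(levels: dict[str, int]) -> str:
    """Render {'module': level} as the 'module:level,...' string expected by `add`."""
    return ','.join(f'{module}:{level}' for module, level in levels.items())


# 'video' appears in the reference above; 'image' is a made-up module name.
spec = build_add_spec({'video': logging.DEBUG, 'image': logging.INFO})
print(spec)  # video:10,image:20

# Assumed usage with Pixeltable installed:
#   import pixeltable as pxt
#   pxt.configure_logging(to_stdout=True, add=spec)
```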
create_dir
create_dir(
path: str,
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
parents: bool = False,
) -> Optional[Dir]
Create a directory.
Parameters:
- path (str) – Path to the directory.
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Directive for handling the case where the path already exists. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return the existing directory handle
  - 'replace': if the existing directory is empty, drop it and create a new one
  - 'replace_force': drop the existing directory and all its children, and create a new one
- parents (bool, default: False) – Create missing parent directories.
Returns:
- Optional[Dir] – A handle to the newly created directory, or to an already existing directory at the path when if_exists='ignore'. Please note the existing directory may not be empty.
Raises:
- Error – If
  - the path is invalid, or
  - the path already exists and if_exists='error', or
  - the path already exists and is not a directory, or
  - an error occurs while attempting to create the directory.
Examples:
>>> pxt.create_dir('my_dir')
Create a subdirectory:
>>> pxt.create_dir('my_dir.sub_dir')
Create a subdirectory only if it does not already exist, otherwise do nothing:
>>> pxt.create_dir('my_dir.sub_dir', if_exists='ignore')
Create a directory and replace if it already exists:
>>> pxt.create_dir('my_dir', if_exists='replace_force')
Create a subdirectory along with its ancestors:
>>> pxt.create_dir('parent1.parent2.sub_dir', parents=True)
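Directory paths in the examples above are dot-separated, so the ancestors that parents=True may need to create are the proper prefixes of the path. The following sketch lists them; missing_ancestors is a hypothetical helper, and the existing set is supplied by hand here rather than read from Pixeltable.

```python
def missing_ancestors(path: str, existing: set[str]) -> list[str]:
    """List ancestor directories of a dotted `path` that are not in `existing`."""
    parts = path.split('.')
    # Ancestors are every proper prefix of the path: 'a', 'a.b', ...
    ancestors = ['.'.join(parts[:i]) for i in range(1, len(parts))]
    return [a for a in ancestors if a not in existing]


print(missing_ancestors('parent1.parent2.sub_dir', existing={'parent1'}))
# ['parent1.parent2']
```

A top-level path such as 'my_dir' has no ancestors, so the helper returns an empty list for it.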
create_snapshot
create_snapshot(
path_str: str,
base: Union[Table, DataFrame],
*,
additional_columns: Optional[dict[str, Any]] = None,
iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]
Create a snapshot of an existing table object (which itself can be a view or a snapshot or a base table).
Parameters:
- path_str (str) – A name for the snapshot; can be either a simple name such as my_snapshot, or a pathname such as dir1.my_snapshot.
- base (Union[Table, DataFrame]) – The table, view, or snapshot (or a DataFrame over one) to snapshot.
- additional_columns (Optional[dict[str, Any]], default: None) – If specified, will add these columns to the snapshot once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.
- iterator (Optional[tuple[type[ComponentIterator], dict[str, Any]]], default: None) – The iterator to use for this snapshot. If specified, then this snapshot will be a one-to-many view of the base table.
- num_retained_versions (int, default: 10) – Number of versions of the snapshot to retain.
- comment (str, default: '') – Optional comment for the snapshot.
- media_validation (Literal['on_read', 'on_write'], default: 'on_write') – Media validation policy for the snapshot.
  - 'on_read': validate media files at query time
  - 'on_write': validate media files during insert/update operations
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Directive for handling the case where the path already exists. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return the existing snapshot handle
  - 'replace': if the existing snapshot has no dependents, drop and replace it with a new one
  - 'replace_force': drop the existing snapshot and all its dependents, and create a new one
Returns:
- Optional[Table] – A handle to the Table representing the newly created snapshot. If the path already exists and if_exists='ignore', returns a handle to the existing snapshot. Please note the schema or base of the existing snapshot may not match those provided in the call.
Raises:
- Error – If
  - the path is invalid, or
  - the path already exists and if_exists='error', or
  - the path already exists and is not a snapshot, or
  - an error occurs while attempting to create the snapshot.
Examples:
Create a snapshot my_snapshot of a table my_table:
>>> tbl = pxt.get_table('my_table')
>>> snapshot = pxt.create_snapshot('my_snapshot', tbl)
Create a snapshot my_snapshot of a view my_view with an additional int column col3, if my_snapshot does not already exist:
>>> view = pxt.get_table('my_view')
>>> snapshot = pxt.create_snapshot(
...     'my_snapshot', view, additional_columns={'col3': pxt.Int}, if_exists='ignore'
... )
Create a snapshot my_snapshot of a table my_table, and replace any existing snapshot named my_snapshot:
>>> tbl = pxt.get_table('my_table')
>>> snapshot = pxt.create_snapshot('my_snapshot', tbl, if_exists='replace_force')
create_table
create_table(
path_str: str,
schema: Optional[dict[str, Any]] = None,
*,
source: Optional[TableDataSource] = None,
source_format: Optional[Literal["csv", "excel", "parquet", "json"]] = None,
schema_overrides: Optional[dict[str, Any]] = None,
on_error: Literal["abort", "ignore"] = "abort",
primary_key: Optional[Union[str, list[str]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
extra_args: Optional[dict[str, Any]] = None
) -> Table
Create a new base table.
Parameters:
- path_str (str) – Path to the table.
- schema (Optional[dict[str, Any]], default: None) – A dictionary that maps column names to column types.
- source (Optional[TableDataSource], default: None) – A data source from which a table schema can be inferred and data imported.
- source_format (Optional[Literal['csv', 'excel', 'parquet', 'json']], default: None) – A hint about the format of the source data.
- schema_overrides (Optional[dict[str, Any]], default: None) – If specified, then columns in schema_overrides will be given the specified types.
- on_error (Literal['abort', 'ignore'], default: 'abort') – Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.
  - If on_error='abort', then an exception will be raised and the rows will not be inserted.
  - If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
- primary_key (Optional[Union[str, list[str]]], default: None) – An optional column name or list of column names to use as the primary key(s) of the table.
- num_retained_versions (int, default: 10) – Number of versions of the table to retain.
- comment (str, default: '') – An optional comment; its meaning is user-defined.
- media_validation (Literal['on_read', 'on_write'], default: 'on_write') – Media validation policy for the table.
  - 'on_read': validate media files at query time
  - 'on_write': validate media files during insert/update operations
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Directive for handling the case where the path already exists. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return the existing table handle
  - 'replace': if the existing table has no views, drop and replace it with a new one
  - 'replace_force': drop the existing table and all its views, and create a new one
- extra_args (Optional[dict[str, Any]], default: None) – Additional arguments to pass to the source data provider.
Returns:
- Table – A handle to the newly created table, or to an already existing table at the path when if_exists='ignore'. Please note the schema of the existing table may not match the schema provided in the call.
Raises:
- Error – If
  - the path is invalid, or
  - the path already exists and if_exists='error', or
  - the path already exists and is not a table, or
  - an error occurs while attempting to create the table, or
  - an error occurs while attempting to import data from the source.
Examples:
Create a table with an int and a string column:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String})
Create a table from a select statement over an existing table orig_table (this will create a new table containing the exact contents of the query):
>>> tbl1 = pxt.get_table('orig_table')
>>> tbl2 = pxt.create_table('new_table', source=tbl1.where(tbl1.col1 < 10).select(tbl1.col2))
Create a table if it does not already exist, otherwise get the existing table:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String}, if_exists='ignore')
Create a table with an int and a float column, and replace any existing table:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.Float}, if_exists='replace')
Create a table from a CSV file:
>>> tbl = pxt.create_table('my_table', source='data.csv')
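The primary_key parameter accepts either a single column name or a list of names. As a rough illustration, a pre-flight check can normalize both forms and confirm that each key column appears in the schema before create_table is called. check_primary_key is a hypothetical helper; whether and how Pixeltable itself validates this is not stated in this reference.

```python
from typing import Any, Union


def check_primary_key(schema: dict[str, Any], primary_key: Union[str, list[str]]) -> list[str]:
    """Normalize `primary_key` to a list and verify each name is a column in `schema`."""
    keys = [primary_key] if isinstance(primary_key, str) else list(primary_key)
    unknown = [k for k in keys if k not in schema]
    if unknown:
        raise ValueError(f'primary key columns not in schema: {unknown}')
    return keys


# Placeholder strings stand in for column types such as pxt.Int and pxt.String.
schema = {'col1': 'Int', 'col2': 'String'}
print(check_primary_key(schema, 'col1'))            # ['col1']
print(check_primary_key(schema, ['col1', 'col2']))  # ['col1', 'col2']

# Assumed usage with Pixeltable installed (real column types instead of placeholders):
#   import pixeltable as pxt
#   schema = {'col1': pxt.Int, 'col2': pxt.String}
#   tbl = pxt.create_table('my_table', schema=schema,
#                          primary_key=check_primary_key(schema, 'col1'))
```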
create_view
create_view(
path: str,
base: Union[Table, DataFrame],
*,
additional_columns: Optional[dict[str, Any]] = None,
is_snapshot: bool = False,
iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]
Create a view of an existing table object (which itself can be a view or a snapshot or a base table).
Parameters:
- path (str) – A name for the view; can be either a simple name such as my_view, or a pathname such as dir1.my_view.
- base (Union[Table, DataFrame]) – The table, view, or snapshot (or a DataFrame over one) to base the view on.
- additional_columns (Optional[dict[str, Any]], default: None) – If specified, will add these columns to the view once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.
- is_snapshot (bool, default: False) – Whether the view is a snapshot. Setting this to True is equivalent to calling create_snapshot.
- iterator (Optional[tuple[type[ComponentIterator], dict[str, Any]]], default: None) – The iterator to use for this view. If specified, then this view will be a one-to-many view of the base table.
- num_retained_versions (int, default: 10) – Number of versions of the view to retain.
- comment (str, default: '') – Optional comment for the view.
- media_validation (Literal['on_read', 'on_write'], default: 'on_write') – Media validation policy for the view.
  - 'on_read': validate media files at query time
  - 'on_write': validate media files during insert/update operations
- if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error') – Directive for handling the case where the path already exists. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return the existing view handle
  - 'replace': if the existing view has no dependents, drop and replace it with a new one
  - 'replace_force': drop the existing view and all its dependents, and create a new one
Returns:
- Optional[Table] – A handle to the Table representing the newly created view. If the path already exists and if_exists='ignore', returns a handle to the existing view. Please note the schema or the base of the existing view may not match those provided in the call.
Raises:
- Error – If
  - the path is invalid, or
  - the path already exists and if_exists='error', or
  - the path already exists and is not a view, or
  - an error occurs while attempting to create the view.
Examples:
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10:
>>> tbl = pxt.get_table('my_table')
>>> view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10))
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10, if it does not already exist. Otherwise, get the existing view named my_view:
>>> tbl = pxt.get_table('my_table')
>>> view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10), if_exists='ignore')
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 100, and replace any existing view named my_view:
>>> tbl = pxt.get_table('my_table')
>>> view = pxt.create_view('my_view', tbl.where(tbl.col1 > 100), if_exists='replace_force')
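The filtering examples above can be wrapped in a small function with the threshold as a parameter. This is a sketch under stated assumptions: view_above is a hypothetical helper, the column name col1 is taken from the examples, and the module is passed in as pxt_module so the snippet does not need a live Pixeltable instance. In practice you would pass pxt itself.

```python
from typing import Any


def view_above(pxt_module: Any, table_path: str, view_path: str, threshold: int) -> Any:
    """Create (or fetch) a view of rows whose col1 exceeds `threshold`."""
    tbl = pxt_module.get_table(table_path)
    # With if_exists='ignore', an existing view at view_path is returned unchanged.
    return pxt_module.create_view(
        view_path, tbl.where(tbl.col1 > threshold), if_exists='ignore'
    )


# Assumed usage with Pixeltable installed:
#   import pixeltable as pxt
#   view = view_above(pxt, 'my_table', 'my_view', threshold=10)
```

Because of if_exists='ignore', calling this again with a different threshold returns the view that already exists at that path, and its filter may not match the new threshold.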
drop_table
drop_table(
table: Union[str, Table],
force: bool = False,
if_not_exists: Literal["error", "ignore"] = "error",
) -> None
Drop a table, view, or snapshot.
Parameters:
- table (Union[str, Table]) – Fully qualified name, or handle, of the table to be dropped.
- force (bool, default: False) – If True, will also drop all views and sub-views of this table.
- if_not_exists (Literal['error', 'ignore'], default: 'error') – Directive for handling the case where the path does not exist. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return
Raises:
- Error – If the qualified name
  - is invalid, or
  - does not exist and if_not_exists='error', or
  - does not designate a table object, or
  - designates a table object but has dependents and force=False.
Examples:
Drop a table by its fully qualified name:
>>> pxt.drop_table('subdir.my_table')
Drop a table by its handle:
>>> t = pxt.get_table('subdir.my_table')
>>> pxt.drop_table(t)
Drop a table if it exists, otherwise do nothing:
>>> pxt.drop_table('subdir.my_table', if_not_exists='ignore')
Drop a table and all its dependents:
>>> pxt.drop_table('subdir.my_table', force=True)
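The interaction between force and if_not_exists can be summarized as a small decision function. The sketch below is a toy model of the rules documented above, not Pixeltable's implementation; drop_outcome is a hypothetical name, and the model ignores invalid names and paths that designate non-table objects.

```python
def drop_outcome(exists: bool, has_dependents: bool, force: bool = False,
                 if_not_exists: str = 'error') -> str:
    """Toy model of the documented drop_table rules; returns 'error', 'noop', or 'dropped'."""
    if not exists:
        # A missing path is only an error under the default directive.
        return 'error' if if_not_exists == 'error' else 'noop'
    if has_dependents and not force:
        # Dependents (views, sub-views) block the drop unless force=True.
        return 'error'
    return 'dropped'


print(drop_outcome(exists=False, has_dependents=False, if_not_exists='ignore'))  # noop
print(drop_outcome(exists=True, has_dependents=True))                            # error
print(drop_outcome(exists=True, has_dependents=True, force=True))                # dropped
```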
get_table
get_table(path: str) -> Table
Get a handle to an existing table, view, or snapshot.
Parameters:
- path (str) – Path to the table.
Returns:
- Table – A handle to the table.
Raises:
- Error – If the path does not exist or does not designate a table object.
Examples:
Get handle for a table in the top-level directory:
>>> tbl = pxt.get_table('my_table')
For a table in a subdirectory:
>>> tbl = pxt.get_table('subdir.my_table')
Handles to views and snapshots are retrieved in the same way:
>>> tbl = pxt.get_table('my_snapshot')
init
init() -> None
Initializes the Pixeltable environment.
list_tables
list_tables(dir_path: str = '', recursive: bool = True) -> list[str]
List the Tables in a directory.
Parameters:
- dir_path (str, default: '') – Path to the directory. Defaults to the root directory.
- recursive (bool, default: True) – If False, returns only those tables that are directly contained in the specified directory; if True, returns all tables that are descendants of the specified directory, recursively.
Returns:
- list[str] – A list of Table paths.
Raises:
- Error – If the path does not exist or does not designate a directory.
Examples:
List tables in top-level directory:
>>> pxt.list_tables()
List tables in 'dir1':
>>> pxt.list_tables('dir1')
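To make the difference between recursive=True and recursive=False concrete, the sketch below filters a recursive listing down to the tables directly inside one directory. It assumes the returned paths are fully qualified and dot-separated, as in the examples on this page; direct_children and the sample paths are made up for illustration.

```python
def direct_children(paths: list[str], dir_path: str = '') -> list[str]:
    """Keep only the paths located directly inside `dir_path` ('' is the root)."""
    prefix = f'{dir_path}.' if dir_path else ''
    result = []
    for p in paths:
        if not p.startswith(prefix):
            continue
        remainder = p[len(prefix):]
        # A direct child has no further '.' after the directory prefix.
        if remainder and '.' not in remainder:
            result.append(p)
    return result


all_tables = ['t0', 'dir1.t1', 'dir1.sub.t2']  # made-up paths
print(direct_children(all_tables))          # ['t0']
print(direct_children(all_tables, 'dir1'))  # ['dir1.t1']
```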
list_dirs
list_dirs(path: str = '', recursive: bool = True) -> list[str]
List the directories in a directory.
Parameters:
- path (str, default: '') – Name or path of the directory.
- recursive (bool, default: True) – If True, lists all descendants of this directory recursively.
Returns:
- list[str] – List of directory paths.
Raises:
- Error – If path does not exist or does not designate a directory.
Examples:
>>> pxt.list_dirs('my_dir', recursive=True)
['my_dir', 'my_dir.sub_dir1']
move
move(path: str, new_path: str) -> None
Move a schema object to a new directory and/or rename a schema object.
Parameters:
- path (str) – Absolute path to the existing schema object.
- new_path (str) – Absolute new path for the schema object.
Raises:
- Error – If path does not exist or new_path already exists.
Examples:
Move a table to a different directory:
>>> pxt.move('dir1.my_table', 'dir2.my_table')
Rename a table:
>>> pxt.move('dir1.my_table', 'dir1.new_name')
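Since move takes absolute paths for both arguments, a rename means rebuilding the full path with the same parent directory. A minimal sketch, assuming dot-separated paths as in the examples; sibling_path is a hypothetical helper, not a Pixeltable function.

```python
def sibling_path(path: str, new_name: str) -> str:
    """Return the path of `new_name` in the same directory as `path`."""
    parent, _, _ = path.rpartition('.')
    return f'{parent}.{new_name}' if parent else new_name


print(sibling_path('dir1.my_table', 'new_name'))  # dir1.new_name
print(sibling_path('my_table', 'new_name'))       # new_name

# Assumed usage with Pixeltable installed:
#   import pixeltable as pxt
#   pxt.move('dir1.my_table', sibling_path('dir1.my_table', 'new_name'))
```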
drop_dir
drop_dir(
path: str,
force: bool = False,
if_not_exists: Literal["error", "ignore"] = "error",
) -> None
Remove a directory.
Parameters:
- path (str) – Name or path of the directory.
- force (bool, default: False) – If True, will also drop all tables and subdirectories of this directory, recursively, along with any views or snapshots that depend on any of the dropped tables.
- if_not_exists (Literal['error', 'ignore'], default: 'error') – Directive for handling the case where the path does not exist. Must be one of the following:
  - 'error': raise an error
  - 'ignore': do nothing and return
Raises:
- Error – If the path
  - is invalid, or
  - does not exist and if_not_exists='error', or
  - does not designate a directory, or
  - is a directory but is not empty and force=False.
Examples:
Remove a directory, if it exists and is empty:
>>> pxt.drop_dir('my_dir')
Remove a subdirectory:
>>> pxt.drop_dir('my_dir.sub_dir')
Remove an existing directory if it is empty, but do nothing if it does not exist:
>>> pxt.drop_dir('my_dir.sub_dir', if_not_exists='ignore')
Remove an existing directory and all its contents:
>>> pxt.drop_dir('my_dir', force=True)