Pixeltable
Import conventions:
import pixeltable as pxt
Insertable tables, views, and snapshots all have a tabular interface and are generically referred to as "tables" below.
Overview
Table Operations | |
---|---|
pxt.create_table | Create a new (insertable) table |
pxt.create_view | Create a new view |
pxt.create_snapshot | Create a new snapshot |
pxt.drop_table | Delete a table |
pxt.get_table | Get a handle to a table |
pxt.list_tables | List the tables in a directory |
Directory Operations | |
---|---|
pxt.create_dir | Create a directory |
pxt.list_dirs | List the directories in a directory |
pxt.drop_dir | Remove a directory |
Misc | |
---|---|
pxt.ls | Output a human-readable list of the contents of a Pixeltable directory |
pxt.configure_logging | Configure logging |
pxt.init | Initialize Pixeltable runtime now (if not already initialized) |
pxt.move | Move a schema object to a new directory and/or rename a schema object |
pixeltable
configure_logging
configure_logging(
*,
to_stdout: Optional[bool] = None,
level: Optional[int] = None,
add: Optional[str] = None,
remove: Optional[str] = None
) -> None
Configure logging.
Parameters:
-
to_stdout
(Optional[bool]
, default:None
) –if True, also log to stdout
-
level
(Optional[int]
, default:None
) –default log level
-
add
(Optional[str]
, default:None
) –comma-separated list of 'module name:log level' pairs; ex.: add='video:10'
-
remove
(Optional[str]
, default:None
) –comma-separated list of module names
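The `add` parameter takes its 'module name:log level' pairs as a single comma-separated string. A minimal sketch of assembling that string from a dict follows; `build_add_arg` is a hypothetical helper (not part of Pixeltable), and the module name `image` is a placeholder alongside the documented `video` example.

```python
import logging


def build_add_arg(levels: dict[str, int]) -> str:
    """Join {module name: log level} pairs into 'name:level,name:level' form."""
    return ','.join(f'{module}:{level}' for module, level in levels.items())


# Hypothetical module names; log levels are the standard numeric values
# from the logging module (DEBUG is 10, INFO is 20).
add_arg = build_add_arg({'video': logging.DEBUG, 'image': logging.INFO})
print(add_arg)

# With Pixeltable installed, the string could then be passed on, for example:
#   import pixeltable as pxt
#   pxt.configure_logging(to_stdout=True, level=logging.INFO, add=add_arg)
```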
create_dir
create_dir(
path: str,
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
parents: bool = False,
) -> Optional[Dir]
Create a directory.
Parameters:
-
path
(str
) –Path to the directory.
-
if_exists
(Literal['error', 'ignore', 'replace', 'replace_force']
, default:'error'
) –Directive regarding how to handle if the path already exists. Must be one of the following:
- 'error': raise an error
- 'ignore': do nothing and return the existing directory handle
- 'replace': if the existing directory is empty, drop it and create a new one
- 'replace_force': drop the existing directory and all its children, and create a new one
-
parents
(bool
, default:False
) –Create missing parent directories.
Returns:
-
Optional[Dir]
–A handle to the newly created directory, or to an already existing directory at the path when if_exists='ignore'. Please note the existing directory may not be empty.
Raises:
-
Error
–If
- the path is invalid, or
- the path already exists and if_exists='error', or
- the path already exists and is not a directory, or
- an error occurs while attempting to create the directory.
Examples:
>>> pxt.create_dir('my_dir')
Create a subdirectory:
>>> pxt.create_dir('my_dir.sub_dir')
Create a subdirectory only if it does not already exist, otherwise do nothing:
>>> pxt.create_dir('my_dir.sub_dir', if_exists='ignore')
Create a directory and replace if it already exists:
>>> pxt.create_dir('my_dir', if_exists='replace_force')
Create a subdirectory along with its ancestors:
>>> pxt.create_dir('parent1.parent2.sub_dir', parents=True)
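Directory paths are dot-separated, so the ancestors that `parents=True` may need to create can be read off the path itself. The sketch below only manipulates strings; `ancestor_paths` is a hypothetical helper, and the exact order in which Pixeltable creates missing parents is an assumption.

```python
def ancestor_paths(path: str) -> list[str]:
    """Return the proper ancestors of a dot-separated path, outermost first."""
    parts = path.split('.')
    return ['.'.join(parts[:i]) for i in range(1, len(parts))]


# Ancestors of the directory used in the last example above.
ancestors = ancestor_paths('parent1.parent2.sub_dir')
print(ancestors)

# With Pixeltable installed, a single call is expected to cover all of them:
#   import pixeltable as pxt
#   pxt.create_dir('parent1.parent2.sub_dir', parents=True)
```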
create_snapshot
create_snapshot(
path_str: str,
base: Union[Table, DataFrame],
*,
additional_columns: Optional[dict[str, Any]] = None,
iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]
Create a snapshot of an existing table object (which itself can be a view or a snapshot or a base table).
Parameters:
-
path_str
(str
) –A name for the snapshot; can be either a simple name such as my_snapshot, or a pathname such as dir1.my_snapshot.
-
base
(Union[Table, DataFrame]
) –
-
additional_columns
(Optional[dict[str, Any]]
, default:None
) –If specified, will add these columns to the snapshot once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.
-
iterator
(Optional[tuple[type[ComponentIterator], dict[str, Any]]]
, default:None
) –The iterator to use for this snapshot. If specified, then this snapshot will be a one-to-many view of the base table.
-
num_retained_versions
(int
, default:10
) –Number of versions of the view to retain.
-
comment
(str
, default:''
) –Optional comment for the snapshot.
-
media_validation
(Literal['on_read', 'on_write']
, default:'on_write'
) –Media validation policy for the snapshot.
- 'on_read': validate media files at query time
- 'on_write': validate media files during insert/update operations
-
if_exists
(Literal['error', 'ignore', 'replace', 'replace_force']
, default:'error'
) –Directive regarding how to handle if the path already exists. Must be one of the following:
- 'error': raise an error
- 'ignore': do nothing and return the existing snapshot handle
- 'replace': if the existing snapshot has no dependents, drop and replace it with a new one
- 'replace_force': drop the existing snapshot and all its dependents, and create a new one
Returns:
-
Optional[Table]
–A handle to the Table representing the newly created snapshot. If the path already exists and if_exists='ignore', returns a handle to the existing snapshot. Please note the schema or base of the existing snapshot may not match those provided in the call.
Raises:
-
Error
–if
- the path is invalid, or
- the path already exists and if_exists='error', or
- the path already exists and is not a snapshot, or
- an error occurs while attempting to create the snapshot.
Examples:
Create a snapshot my_snapshot of a table my_table:
>>> tbl = pxt.get_table('my_table')
... snapshot = pxt.create_snapshot('my_snapshot', tbl)
Create a snapshot my_snapshot of a view my_view with an additional int column col3, if my_snapshot does not already exist:
>>> view = pxt.get_table('my_view')
... snapshot = pxt.create_snapshot(
... 'my_snapshot', view, additional_columns={'col3': pxt.Int}, if_exists='ignore'
... )
Create a snapshot my_snapshot of a table my_table, and replace any existing snapshot named my_snapshot:
>>> tbl = pxt.get_table('my_table')
... snapshot = pxt.create_snapshot('my_snapshot', tbl, if_exists='replace_force')
create_table
create_table(
path: str,
schema: Optional[dict[str, Any]] = None,
*,
source: Optional[TableDataSource] = None,
source_format: Optional[Literal["csv", "excel", "parquet", "json"]] = None,
schema_overrides: Optional[dict[str, Any]] = None,
on_error: Literal["abort", "ignore"] = "abort",
primary_key: Optional[Union[str, list[str]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
extra_args: Optional[dict[str, Any]] = None
) -> Table
Create a new base table. Exactly one of schema or source must be provided.
If a schema is provided, then an empty table will be created with the specified schema.
If a source is provided, then Pixeltable will attempt to infer a data source format and table schema from the contents of the specified data, and the data will be imported from the specified source into the new table. The source format and/or schema can be specified directly via the source_format and schema_overrides parameters.
Parameters:
-
path
(str
) –Pixeltable path (qualified name) of the table, such as 'my_table' or 'my_dir.my_subdir.my_table'.
-
schema
(Optional[dict[str, Any]]
, default:None
) –Schema for the new table, mapping column names to Pixeltable types.
-
source
(Optional[TableDataSource]
, default:None
) –A data source (file, URL, DataFrame, or list of rows) to import from.
-
source_format
(Optional[Literal['csv', 'excel', 'parquet', 'json']]
, default:None
) –Must be used in conjunction with a source. If specified, then the given format will be used to read the source data. (Otherwise, Pixeltable will attempt to infer the format from the source data.)
-
schema_overrides
(Optional[dict[str, Any]]
, default:None
) –Must be used in conjunction with a source. If specified, then columns in schema_overrides will be given the specified types. (Pixeltable will attempt to infer the types of any columns not specified.)
-
on_error
(Literal['abort', 'ignore']
, default:'abort'
) –Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.
- If on_error='abort', then an exception will be raised and the rows will not be inserted.
- If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value for that cell, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
-
primary_key
(Optional[Union[str, list[str]]]
, default:None
) –An optional column name or list of column names to use as the primary key(s) of the table.
-
num_retained_versions
(int
, default:10
) –Number of versions of the table to retain.
-
comment
(str
, default:''
) –An optional comment; its meaning is user-defined.
-
media_validation
(Literal['on_read', 'on_write']
, default:'on_write'
) –Media validation policy for the table.
- 'on_read': validate media files at query time
- 'on_write': validate media files during insert/update operations
-
if_exists
(Literal['error', 'ignore', 'replace', 'replace_force']
, default:'error'
) –Determines the behavior if a table already exists at the specified path location.
- 'error': raise an error
- 'ignore': do nothing and return the existing table handle
- 'replace': if the existing table has no views or snapshots, drop and replace it with a new one; raise an error if the existing table has views or snapshots
- 'replace_force': drop the existing table and all its views and snapshots, and create a new one
-
extra_args
(Optional[dict[str, Any]]
, default:None
) –Must be used in conjunction with a source. If specified, then additional arguments will be passed along to the source data provider.
Returns:
-
Table
–A handle to the newly created table, or to an already existing table at the path when if_exists='ignore'. Please note the schema of the existing table may not match the schema provided in the call.
Raises:
-
Error
–if
- the path is invalid, or
- the path already exists and if_exists='error', or
- the path already exists and is not a table, or
- an error occurs while attempting to create the table, or
- an error occurs while attempting to import data from the source.
Examples:
Create a table with an int and a string column:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String})
Create a table from a select statement over an existing table orig_table (this will create a new table containing the exact contents of the query):
>>> tbl1 = pxt.get_table('orig_table')
... tbl2 = pxt.create_table('new_table', source=tbl1.where(tbl1.col1 < 10).select(tbl1.col2))
Create a table if it does not already exist, otherwise get the existing table:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String}, if_exists='ignore')
Create a table with an int and a float column, and replace any existing table:
>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.Float}, if_exists='replace')
Create a table from a CSV file:
>>> tbl = pxt.create_table('my_table', source='data.csv')
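The argument rules described above (exactly one of `schema` or `source`; `source_format`, `schema_overrides`, and `extra_args` only together with a `source`) can be sketched as a standalone check. This is an illustration of the documented rules, not Pixeltable's actual validation code; `check_create_table_args` is a hypothetical helper, and Pixeltable itself may report violations differently.

```python
from typing import Any, Optional


def check_create_table_args(
    schema: Optional[dict[str, Any]] = None,
    source: Optional[Any] = None,
    source_format: Optional[str] = None,
    schema_overrides: Optional[dict[str, Any]] = None,
    extra_args: Optional[dict[str, Any]] = None,
) -> None:
    """Raise ValueError if the arguments break the documented rules."""
    # Exactly one of schema / source must be given.
    if (schema is None) == (source is None):
        raise ValueError('exactly one of schema or source must be provided')
    # The remaining three parameters only make sense when importing from a source.
    if source is None:
        source_only = {
            'source_format': source_format,
            'schema_overrides': schema_overrides,
            'extra_args': extra_args,
        }
        for name, value in source_only.items():
            if value is not None:
                raise ValueError(f'{name} must be used in conjunction with a source')


# Both calls satisfy the rules, so neither raises.
check_create_table_args(schema={'col1': 'Int', 'col2': 'String'})
check_create_table_args(source='data.csv', source_format='csv')
```

Running such a check before calling `pxt.create_table` is optional; it mainly makes the combination rules explicit in one place.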
create_view
create_view(
path: str,
base: Union[Table, DataFrame],
*,
additional_columns: Optional[dict[str, Any]] = None,
is_snapshot: bool = False,
iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
num_retained_versions: int = 10,
comment: str = "",
media_validation: Literal["on_read", "on_write"] = "on_write",
if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]
Create a view of an existing table object (which itself can be a view or a snapshot or a base table).
Parameters:
-
path
(str
) –A name for the view; can be either a simple name such as my_view, or a pathname such as dir1.my_view.
-
base
(Union[Table, DataFrame]
) –
-
additional_columns
(Optional[dict[str, Any]]
, default:None
) –If specified, will add these columns to the view once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.
-
is_snapshot
(bool
, default:False
) –Whether the view is a snapshot. Setting this to True is equivalent to calling create_snapshot.
-
iterator
(Optional[tuple[type[ComponentIterator], dict[str, Any]]]
, default:None
) –The iterator to use for this view. If specified, then this view will be a one-to-many view of the base table.
-
num_retained_versions
(int
, default:10
) –Number of versions of the view to retain.
-
comment
(str
, default:''
) –Optional comment for the view.
-
media_validation
(Literal['on_read', 'on_write']
, default:'on_write'
) –Media validation policy for the view.
- 'on_read': validate media files at query time
- 'on_write': validate media files during insert/update operations
-
if_exists
(Literal['error', 'ignore', 'replace', 'replace_force']
, default:'error'
) –Directive regarding how to handle if the path already exists. Must be one of the following:
- 'error': raise an error
- 'ignore': do nothing and return the existing view handle
- 'replace': if the existing view has no dependents, drop and replace it with a new one
- 'replace_force': drop the existing view and all its dependents, and create a new one
Returns:
-
Optional[Table]
–A handle to the Table representing the newly created view. If the path already exists and if_exists='ignore', returns a handle to the existing view. Please note the schema or the base of the existing view may not match those provided in the call.
Raises:
-
Error
–if
- the path is invalid, or
- the path already exists and if_exists='error', or
- the path already exists and is not a view, or
- an error occurs while attempting to create the view.
Examples:
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10:
>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10))
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10, if it does not already exist; otherwise, get the existing view named my_view:
>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10), if_exists='ignore')
Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 100, and replace any existing view named my_view:
>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 100), if_exists='replace_force')
drop_table
drop_table(
table: Union[str, Table],
force: bool = False,
if_not_exists: Literal["error", "ignore"] = "error",
) -> None
Drop a table, view, or snapshot.
Parameters:
-
table
(Union[str, Table]
) –Fully qualified name, or handle, of the table to be dropped.
-
force
(bool
, default:False
) –If True, will also drop all views and sub-views of this table.
-
if_not_exists
(Literal['error', 'ignore']
, default:'error'
) –Directive regarding how to handle if the path does not exist. Must be one of the following:
- 'error': raise an error
- 'ignore': do nothing and return
Raises:
-
Error
–if the qualified name
- is invalid, or
- does not exist and if_not_exists='error', or
- does not designate a table object, or
- designates a table object but has dependents and force=False.
Examples:
Drop a table by its fully qualified name:
>>> pxt.drop_table('subdir.my_table')
Drop a table by its handle:
>>> t = pxt.get_table('subdir.my_table')
... pxt.drop_table(t)
Drop a table if it exists, otherwise do nothing:
>>> pxt.drop_table('subdir.my_table', if_not_exists='ignore')
Drop a table and all its dependents:
>>> pxt.drop_table('subdir.my_table', force=True)
get_table
get_table(path: str) -> Table
Get a handle to an existing table, view, or snapshot.
Parameters:
-
path
(str
) –Path to the table.
Returns:
-
Table
–A handle to the Table.
Raises:
-
Error
–If the path does not exist or does not designate a table object.
Examples:
Get handle for a table in the top-level directory:
>>> tbl = pxt.get_table('my_table')
For a table in a subdirectory:
>>> tbl = pxt.get_table('subdir.my_table')
Handles to views and snapshots are retrieved in the same way:
>>> tbl = pxt.get_table('my_snapshot')
init
init(config_overrides: Optional[dict[str, Any]] = None) -> None
Initializes the Pixeltable environment.
ls
ls(path: str = '') -> DataFrame
List the contents of a Pixeltable directory.
This function returns a Pandas DataFrame representing a human-readable listing of the specified directory, including various attributes such as version and base table, as appropriate.
To get a programmatic list of tables and/or directories, use list_tables() and/or list_dirs() instead.
list_tables
list_tables(dir_path: str = '', recursive: bool = True) -> list[str]
List the Tables in a directory.
Parameters:
-
dir_path
(str
, default:''
) –Path to the directory. Defaults to the root directory.
-
recursive
(bool
, default:True
) –If False, returns only those tables that are directly contained in the specified directory; if True, returns all tables that are descendants of the specified directory, recursively.
Returns:
-
list[str]
–A list of Table paths.
Raises:
-
Error
–If the path does not exist or does not designate a directory.
Examples:
List tables in top-level directory:
>>> pxt.list_tables()
List tables in 'dir1':
>>> pxt.list_tables('dir1')
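Because `list_tables` returns plain path strings, the difference between `recursive=True` and `recursive=False` can be illustrated on a list alone. The sketch below filters a recursive listing down to the tables directly contained in a directory; `direct_children` is a hypothetical helper and the sample paths are made up.

```python
def direct_children(paths: list[str], dir_path: str = '') -> list[str]:
    """Keep only the paths that sit directly inside dir_path ('' is the root)."""
    prefix = f'{dir_path}.' if dir_path else ''
    return [
        p for p in paths
        if p.startswith(prefix) and '.' not in p[len(prefix):]
    ]


# Stand-in for the result of pxt.list_tables() with recursive=True.
all_tables = ['t0', 'dir1.t1', 'dir1.sub.t2']

print(direct_children(all_tables))          # tables directly in the root
print(direct_children(all_tables, 'dir1'))  # tables directly in dir1
```

In practice, passing `recursive=False` to `pxt.list_tables` avoids the need for any filtering; the helper only makes the distinction concrete.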
list_dirs
list_dirs(path: str = '', recursive: bool = True) -> list[str]
List the directories in a directory.
Parameters:
-
path
(str
, default:''
) –Name or path of the directory.
-
recursive
(bool
, default:True
) –If True, lists all descendants of this directory recursively.
Returns:
-
list[str]
–List of directory paths.
Raises:
-
Error
–If path does not exist or does not designate a directory.
Examples:
>>> pxt.list_dirs('my_dir', recursive=True)
['my_dir', 'my_dir.sub_dir1']
move
move(path: str, new_path: str) -> None
Move a schema object to a new directory and/or rename a schema object.
Parameters:
-
path
(str
) –absolute path to the existing schema object.
-
new_path
(str
) –absolute new path for the schema object.
Raises:
-
Error
–If path does not exist or new_path already exists.
Examples:
Move a table to a different directory:
>>> pxt.move('dir1.my_table', 'dir2.my_table')
Rename a table:
>>> pxt.move('dir1.my_table', 'dir1.new_name')
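Since `move` takes full paths for both arguments, renaming and relocating differ only in which part of the dot-separated path changes. A small sketch of computing `new_path` for each case follows; `with_new_name` and `with_new_dir` are hypothetical helpers, not Pixeltable functions.

```python
def with_new_name(path: str, new_name: str) -> str:
    """Same directory, different final name (a rename)."""
    parent = path.rpartition('.')[0]
    return f'{parent}.{new_name}' if parent else new_name


def with_new_dir(path: str, new_dir: str) -> str:
    """Same final name, different directory ('' is the root directory)."""
    name = path.rpartition('.')[2]
    return f'{new_dir}.{name}' if new_dir else name


renamed = with_new_name('dir1.my_table', 'new_name')
relocated = with_new_dir('dir1.my_table', 'dir2')
print(renamed, relocated)

# With Pixeltable installed, either result could be used as new_path:
#   import pixeltable as pxt
#   pxt.move('dir1.my_table', relocated)
```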
drop_dir
drop_dir(
path: str,
force: bool = False,
if_not_exists: Literal["error", "ignore"] = "error",
) -> None
Remove a directory.
Parameters:
-
path
(str
) –Name or path of the directory.
-
force
(bool
, default:False
) –If True, will also drop all tables and subdirectories of this directory, recursively, along with any views or snapshots that depend on any of the dropped tables.
-
if_not_exists
(Literal['error', 'ignore']
, default:'error'
) –Directive regarding how to handle if the path does not exist. Must be one of the following:
- 'error': raise an error
- 'ignore': do nothing and return
Raises:
-
Error
–If the path
- is invalid, or
- does not exist and if_not_exists='error', or
- does not designate a directory, or
- is a directory but is not empty and force=False.
Examples:
Remove a directory, if it exists and is empty:
>>> pxt.drop_dir('my_dir')
Remove a subdirectory:
>>> pxt.drop_dir('my_dir.sub_dir')
Remove an existing directory if it is empty, but do nothing if it does not exist:
>>> pxt.drop_dir('my_dir.sub_dir', if_not_exists='ignore')
Remove an existing directory and all its contents:
>>> pxt.drop_dir('my_dir', force=True)
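Combining `force=True` with `if_not_exists='ignore'` gives a "start from a clean directory" pattern that can be handy in test setup. The sketch below is one way to write it, assuming the `pixeltable` module is passed in as an argument so the helper stays importable without it; `reset_dir` is a hypothetical name. Note that it deletes everything under the directory.

```python
from typing import Any


def reset_dir(pxt: Any, path: str) -> Any:
    """Drop `path` with all its contents if it exists, then create it empty.

    `pxt` is expected to be the imported pixeltable module (an assumption of
    this sketch); only drop_dir and create_dir, as documented above, are used.
    """
    # Succeeds whether or not the directory exists; removes all of its contents.
    pxt.drop_dir(path, force=True, if_not_exists='ignore')
    # The path is now free, so the default if_exists='error' is sufficient.
    return pxt.create_dir(path)


# Intended use, with Pixeltable installed:
#   import pixeltable as pxt
#   scratch = reset_dir(pxt, 'scratch')
```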