Pixeltable

Import conventions:

import pixeltable as pxt

Insertable tables, views, and snapshots all have a tabular interface and are generically referred to as "tables" below.

Overview

Table Operations
pxt.create_table Create a new (insertable) table
pxt.create_view Create a new view
pxt.create_snapshot Create a new snapshot
pxt.drop_table Delete a table
pxt.get_table Get a handle to a table
pxt.list_tables List the tables in a directory
Directory Operations
pxt.create_dir Create a directory
pxt.list_dirs List the directories in a directory
pxt.drop_dir Remove a directory
Misc
pxt.configure_logging Configure logging
pxt.init Initialize Pixeltable runtime now (if not already initialized)
pxt.move Move a schema object to a new directory and/or rename a schema object

pixeltable

configure_logging

configure_logging(
    *,
    to_stdout: Optional[bool] = None,
    level: Optional[int] = None,
    add: Optional[str] = None,
    remove: Optional[str] = None
) -> None

Configure logging.

Parameters:

  • to_stdout (Optional[bool], default: None ) –

    if True, also log to stdout

  • level (Optional[int], default: None ) –

    default log level

  • add (Optional[str], default: None ) –

    comma-separated list of 'module name:log level' pairs, e.g. add='video:10'

  • remove (Optional[str], default: None ) –

    comma-separated list of module names
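
The add string can be assembled from a mapping instead of being typed by hand. The following is a minimal sketch, assuming the integer levels are the standard-library logging constants (the 'video:10' example above is consistent with logging.DEBUG, which is 10). format_add_arg is a hypothetical helper, not part of Pixeltable.

```python
import logging


def format_add_arg(levels: dict[str, int]) -> str:
    """Join {module name: log level} into the 'module:level,...' form used by `add`."""
    return ','.join(f'{module}:{level}' for module, level in levels.items())


# 'video' is the module name taken from the parameter description above.
add_arg = format_add_arg({'video': logging.DEBUG})
print(add_arg)  # video:10
```

The resulting string could then be passed as pxt.configure_logging(add=add_arg).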

create_dir

create_dir(
    path: str,
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
    parents: bool = False,
) -> Optional[Dir]

Create a directory.

Parameters:

  • path (str) –

    Path to the directory.

  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Directive regarding how to handle if the path already exists. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return the existing directory handle
    • 'replace': if the existing directory is empty, drop it and create a new one
    • 'replace_force': drop the existing directory and all its children, and create a new one
  • parents (bool, default: False ) –

    Create missing parent directories.

Returns:

  • Optional[Dir]

    A handle to the newly created directory, or to an already existing directory at the path when if_exists='ignore'. Please note the existing directory may not be empty.

Raises:

  • Error

    If

    • the path is invalid, or
    • the path already exists and if_exists='error', or
    • the path already exists and is not a directory, or
    • an error occurs while attempting to create the directory.

Examples:

>>> pxt.create_dir('my_dir')

Create a subdirectory:

>>> pxt.create_dir('my_dir.sub_dir')

Create a subdirectory only if it does not already exist, otherwise do nothing:

>>> pxt.create_dir('my_dir.sub_dir', if_exists='ignore')

Create a directory and replace if it already exists:

>>> pxt.create_dir('my_dir', if_exists='replace_force')

Create a subdirectory along with its ancestors:

>>> pxt.create_dir('parent1.parent2.sub_dir', parents=True)
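
To see which directories parents=True may have to create, it can help to list the ancestors of a dot-separated path. This is a plain-Python sketch of the path convention used in the examples above; ancestor_paths is a hypothetical helper and does not call Pixeltable.

```python
def ancestor_paths(path: str) -> list[str]:
    """Return the ancestor directories of a dot-separated path, outermost first."""
    parts = path.split('.')
    return ['.'.join(parts[:i]) for i in range(1, len(parts))]


print(ancestor_paths('parent1.parent2.sub_dir'))  # ['parent1', 'parent1.parent2']
```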

create_snapshot

create_snapshot(
    path_str: str,
    base: Union[Table, DataFrame],
    *,
    additional_columns: Optional[dict[str, Any]] = None,
    iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
    num_retained_versions: int = 10,
    comment: str = "",
    media_validation: Literal["on_read", "on_write"] = "on_write",
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]

Create a snapshot of an existing table object (which itself can be a view or a snapshot or a base table).

Parameters:

  • path_str (str) –

    A name for the snapshot; can be either a simple name such as my_snapshot, or a pathname such as dir1.my_snapshot.

  • base (Union[Table, DataFrame]) –

    Table (i.e., table or view or snapshot) or DataFrame to base the snapshot on.

  • additional_columns (Optional[dict[str, Any]], default: None ) –

    If specified, will add these columns to the snapshot once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.

  • iterator (Optional[tuple[type[ComponentIterator], dict[str, Any]]], default: None ) –

    The iterator to use for this snapshot. If specified, then this snapshot will be a one-to-many view of the base table.

  • num_retained_versions (int, default: 10 ) –

    Number of versions of the snapshot to retain.

  • comment (str, default: '' ) –

    Optional comment for the snapshot.

  • media_validation (Literal['on_read', 'on_write'], default: 'on_write' ) –

    Media validation policy for the snapshot.

    • 'on_read': validate media files at query time
    • 'on_write': validate media files during insert/update operations
  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Directive regarding how to handle if the path already exists. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return the existing snapshot handle
    • 'replace': if the existing snapshot has no dependents, drop and replace it with a new one
    • 'replace_force': drop the existing snapshot and all its dependents, and create a new one

Returns:

  • Optional[Table]

    A handle to the Table representing the newly created snapshot. If the path already exists and if_exists='ignore', returns a handle to the existing snapshot. Please note the schema or base of the existing snapshot may not match those provided in the call.

Raises:

  • Error

    if

    • the path is invalid, or
    • the path already exists and if_exists='error', or
    • the path already exists and is not a snapshot, or
    • an error occurs while attempting to create the snapshot.

Examples:

Create a snapshot my_snapshot of a table my_table:

>>> tbl = pxt.get_table('my_table')
... snapshot = pxt.create_snapshot('my_snapshot', tbl)

Create a snapshot my_snapshot of a view my_view with an additional int column col3, if my_snapshot does not already exist:

>>> view = pxt.get_table('my_view')
... snapshot = pxt.create_snapshot(
...     'my_snapshot', view, additional_columns={'col3': pxt.Int}, if_exists='ignore'
... )

Create a snapshot my_snapshot on a table my_table, and replace any existing snapshot named my_snapshot:

>>> tbl = pxt.get_table('my_table')
... snapshot = pxt.create_snapshot('my_snapshot', tbl, if_exists='replace_force')
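
The four if_exists directives can be read as a small decision table. The sketch below models the documented behavior in plain Python; it is not Pixeltable's implementation. plan_if_exists and its return strings are hypothetical, and raising an error for 'replace' when dependents exist is an assumption, since the description only covers the no-dependents case.

```python
from typing import Literal

IfExists = Literal['error', 'ignore', 'replace', 'replace_force']


def plan_if_exists(if_exists: IfExists, exists: bool, has_dependents: bool = False) -> str:
    """Return the action implied by the directive, or raise ValueError where an error is implied."""
    if not exists:
        return 'create'
    if if_exists == 'error':
        raise ValueError('path already exists')
    if if_exists == 'ignore':
        return 'return_existing'
    if if_exists == 'replace':
        if has_dependents:
            # Assumption: the documentation only describes the no-dependents case.
            raise ValueError('existing object has dependents')
        return 'drop_and_create'
    if if_exists == 'replace_force':
        return 'drop_all_and_create'
    raise ValueError(f'unknown directive: {if_exists!r}')


print(plan_if_exists('ignore', exists=True))  # return_existing
print(plan_if_exists('replace_force', exists=True, has_dependents=True))  # drop_all_and_create
```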

create_table

create_table(
    path_str: str,
    schema: Optional[dict[str, Any]] = None,
    *,
    source: Optional[TableDataSource] = None,
    source_format: Optional[Literal["csv", "excel", "parquet", "json"]] = None,
    schema_overrides: Optional[dict[str, Any]] = None,
    on_error: Literal["abort", "ignore"] = "abort",
    primary_key: Optional[Union[str, list[str]]] = None,
    num_retained_versions: int = 10,
    comment: str = "",
    media_validation: Literal["on_read", "on_write"] = "on_write",
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error",
    extra_args: Optional[dict[str, Any]] = None
) -> Table

Create a new base table.

Parameters:

  • path_str (str) –

    Path to the table.

  • schema (Optional[dict[str, Any]], default: None ) –

    A dictionary that maps column names to column types

  • source (Optional[TableDataSource], default: None ) –

    A data source from which a table schema can be inferred and data imported

  • source_format (Optional[Literal['csv', 'excel', 'parquet', 'json']], default: None ) –

    A hint to the format of the source data

  • schema_overrides (Optional[dict[str, Any]], default: None ) –

    If specified, then columns in schema_overrides will be given the specified types

  • on_error (Literal['abort', 'ignore'], default: 'abort' ) –

    Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.

    • If on_error='abort', then an exception will be raised and the rows will not be inserted.
    • If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value for that cell, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
  • primary_key (Optional[Union[str, list[str]]], default: None ) –

    An optional column name or list of column names to use as the primary key(s) of the table.

  • num_retained_versions (int, default: 10 ) –

    Number of versions of the table to retain.

  • comment (str, default: '' ) –

    An optional comment; its meaning is user-defined.

  • media_validation (Literal['on_read', 'on_write'], default: 'on_write' ) –

    Media validation policy for the table.

    • 'on_read': validate media files at query time
    • 'on_write': validate media files during insert/update operations
  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Directive regarding how to handle if the path already exists. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return the existing table handle
    • 'replace': if the existing table has no views, drop and replace it with a new one
    • 'replace_force': drop the existing table and all its views, and create a new one
  • extra_args (Optional[dict[str, Any]], default: None ) –

    Additional arguments to pass to the source data provider

Returns:

  • Table

    A handle to the newly created table, or to an already existing table at the path when if_exists='ignore'. Please note the schema of the existing table may not match the schema provided in the call.

Raises:

  • Error

    if

    • the path is invalid, or
    • the path already exists and if_exists='error', or
    • the path already exists and is not a table, or
    • an error occurs while attempting to create the table, or
    • an error occurs while attempting to import data from the source.

Examples:

Create a table with an int and a string column:

>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String})

Create a table from a select statement over an existing table orig_table (this will create a new table containing the exact contents of the query):

>>> tbl1 = pxt.get_table('orig_table')
... tbl2 = pxt.create_table('new_table', source=tbl1.where(tbl1.col1 < 10).select(tbl1.col2))

Create a table if it does not already exist, otherwise get the existing table:

>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.String}, if_exists='ignore')

Create a table with an int and a float column, and replace any existing table:

>>> tbl = pxt.create_table('my_table', schema={'col1': pxt.Int, 'col2': pxt.Float}, if_exists='replace')

Create a table from a CSV file:

>>> tbl = pxt.create_table('my_table', source='data.csv')
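
When the source is a file, the source_format hint could be derived from the file extension. This is a minimal sketch: guess_source_format and the extension mapping are assumptions made for illustration, and only the four format names come from the signature above.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from file extension to the documented source_format values.
_EXTENSION_TO_FORMAT = {
    '.csv': 'csv',
    '.xls': 'excel',
    '.xlsx': 'excel',
    '.parquet': 'parquet',
    '.json': 'json',
}


def guess_source_format(filename: str) -> Optional[str]:
    """Return a source_format hint for the file, or None if the extension is not recognized."""
    return _EXTENSION_TO_FORMAT.get(Path(filename).suffix.lower())


print(guess_source_format('data.csv'))  # csv
print(guess_source_format('notes.txt'))  # None
```

Because None is the default for source_format, the result can be passed through unchanged, for example pxt.create_table('my_table', source='data.csv', source_format=guess_source_format('data.csv')).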

create_view

create_view(
    path: str,
    base: Union[Table, DataFrame],
    *,
    additional_columns: Optional[dict[str, Any]] = None,
    is_snapshot: bool = False,
    iterator: Optional[tuple[type[ComponentIterator], dict[str, Any]]] = None,
    num_retained_versions: int = 10,
    comment: str = "",
    media_validation: Literal["on_read", "on_write"] = "on_write",
    if_exists: Literal["error", "ignore", "replace", "replace_force"] = "error"
) -> Optional[Table]

Create a view of an existing table object (which itself can be a view or a snapshot or a base table).

Parameters:

  • path (str) –

    A name for the view; can be either a simple name such as my_view, or a pathname such as dir1.my_view.

  • base (Union[Table, DataFrame]) –

    Table (i.e., table or view or snapshot) or DataFrame to base the view on.

  • additional_columns (Optional[dict[str, Any]], default: None ) –

    If specified, will add these columns to the view once it is created. The format of the additional_columns parameter is identical to the format of the schema parameter in create_table.

  • is_snapshot (bool, default: False ) –

    Whether the view is a snapshot. Setting this to True is equivalent to calling create_snapshot.

  • iterator (Optional[tuple[type[ComponentIterator], dict[str, Any]]], default: None ) –

    The iterator to use for this view. If specified, then this view will be a one-to-many view of the base table.

  • num_retained_versions (int, default: 10 ) –

    Number of versions of the view to retain.

  • comment (str, default: '' ) –

    Optional comment for the view.

  • media_validation (Literal['on_read', 'on_write'], default: 'on_write' ) –

    Media validation policy for the view.

    • 'on_read': validate media files at query time
    • 'on_write': validate media files during insert/update operations
  • if_exists (Literal['error', 'ignore', 'replace', 'replace_force'], default: 'error' ) –

    Directive regarding how to handle if the path already exists. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return the existing view handle
    • 'replace': if the existing view has no dependents, drop and replace it with a new one
    • 'replace_force': drop the existing view and all its dependents, and create a new one

Returns:

  • Optional[Table]

    A handle to the Table representing the newly created view. If the path already exists and if_exists='ignore', returns a handle to the existing view. Please note the schema or the base of the existing view may not match those provided in the call.

Raises:

  • Error

    if

    • the path is invalid, or
    • the path already exists and if_exists='error', or
    • the path already exists and is not a view, or
    • an error occurs while attempting to create the view.

Examples:

Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10:

>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10))

Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 10, if it does not already exist. Otherwise, get the existing view named my_view:

>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10), if_exists='ignore')

Create a view my_view of an existing table my_table, filtering on rows where col1 is greater than 100, and replace any existing view named my_view:

>>> tbl = pxt.get_table('my_table')
... view = pxt.create_view('my_view', tbl.where(tbl.col1 > 100), if_exists='replace_force')
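
The iterator parameter is a pair of an iterator class and a dict of arguments, and it turns the view into a one-to-many view of its base. The plain-Python sketch below only illustrates what "one-to-many" means: each base row yields zero or more output rows. WordSplitter and expand are hypothetical stand-ins and do not follow Pixeltable's ComponentIterator interface, which may differ.

```python
from typing import Any, Iterator


class WordSplitter:
    """Hypothetical stand-in for an iterator class; not Pixeltable's ComponentIterator."""

    def __init__(self, text: str, min_length: int = 1) -> None:
        self.words = [w for w in text.split() if len(w) >= min_length]

    def __iter__(self) -> Iterator[dict[str, Any]]:
        for pos, word in enumerate(self.words):
            yield {'pos': pos, 'word': word}


def expand(base_rows: list[dict[str, Any]], iterator: tuple[type, dict[str, Any]]) -> list[dict[str, Any]]:
    """Produce one output row per component that the iterator yields for each base row."""
    iterator_cls, kwargs = iterator
    out = []
    for row in base_rows:
        for component in iterator_cls(row['text'], **kwargs):
            out.append({**row, **component})
    return out


base = [{'id': 1, 'text': 'a tiny example'}, {'id': 2, 'text': 'ok'}]
rows = expand(base, (WordSplitter, {'min_length': 2}))
print([(r['id'], r['pos'], r['word']) for r in rows])
# [(1, 0, 'tiny'), (1, 1, 'example'), (2, 0, 'ok')]
```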

drop_table

drop_table(
    table: Union[str, Table],
    force: bool = False,
    if_not_exists: Literal["error", "ignore"] = "error",
) -> None

Drop a table, view, or snapshot.

Parameters:

  • table (Union[str, Table]) –

    Fully qualified name, or handle, of the table to be dropped.

  • force (bool, default: False ) –

    If True, will also drop all views and sub-views of this table.

  • if_not_exists (Literal['error', 'ignore'], default: 'error' ) –

    Directive regarding how to handle if the path does not exist. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return

Raises:

  • Error

    if the qualified name

    • is invalid, or
    • does not exist and if_not_exists='error', or
    • does not designate a table object, or
    • designates a table object but has dependents and force=False.

Examples:

Drop a table by its fully qualified name:

>>> pxt.drop_table('subdir.my_table')

Drop a table by its handle:

>>> t = pxt.get_table('subdir.my_table')
... pxt.drop_table(t)

Drop a table if it exists, otherwise do nothing:

>>> pxt.drop_table('subdir.my_table', if_not_exists='ignore')

Drop a table and all its dependents:

>>> pxt.drop_table('subdir.my_table', force=True)
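
With force=True, the views and sub-views of the table are dropped as well. The sketch below shows, in plain Python, how such a set of dependents could be enumerated from a mapping of each table to its direct views. dependents_of and the views_of catalog are hypothetical and are not how Pixeltable tracks dependencies.

```python
def dependents_of(table: str, views_of: dict[str, list[str]]) -> list[str]:
    """Return all views and sub-views of `table`; each view is listed after its own sub-views."""
    ordered: list[str] = []
    for view in views_of.get(table, []):
        ordered.extend(dependents_of(view, views_of))
        ordered.append(view)
    return ordered


# Hypothetical catalog: my_table has two views, and view_a has one sub-view.
views_of = {'my_table': ['view_a', 'view_b'], 'view_a': ['view_a1']}
print(dependents_of('my_table', views_of))  # ['view_a1', 'view_a', 'view_b']
```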

get_table

get_table(path: str) -> Table

Get a handle to an existing table, view, or snapshot.

Parameters:

  • path (str) –

    Path to the table.

Returns:

  • Table

    A handle to the table.

Raises:

  • Error

    If the path does not exist or does not designate a table object.

Examples:

Get a handle to a table in the top-level directory:

>>> tbl = pxt.get_table('my_table')

For a table in a subdirectory:

>>> tbl = pxt.get_table('subdir.my_table')

Handles to views and snapshots are retrieved in the same way:

>>> tbl = pxt.get_table('my_snapshot')

init

init() -> None

Initializes the Pixeltable environment.

list_tables

list_tables(dir_path: str = '', recursive: bool = True) -> list[str]

List the Tables in a directory.

Parameters:

  • dir_path (str, default: '' ) –

    Path to the directory. Defaults to the root directory.

  • recursive (bool, default: True ) –

    If False, returns only those tables that are directly contained in the specified directory; if True, returns all tables that are descendants of the specified directory, recursively.

Returns:

  • list[str]

    A list of Table paths.

Raises:

  • Error

    If the path does not exist or does not designate a directory.

Examples:

List tables in the top-level directory:

>>> pxt.list_tables()

List tables in 'dir1':

>>> pxt.list_tables('dir1')
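
The effect of recursive can be pictured as a filter over dot-separated table paths. This is a plain-Python model of the documented behavior, not a call into Pixeltable; filter_tables and the sample paths are hypothetical.

```python
def filter_tables(all_paths: list[str], dir_path: str = '', recursive: bool = True) -> list[str]:
    """Keep the paths under dir_path; with recursive=False, keep only direct children."""
    prefix = f'{dir_path}.' if dir_path else ''
    result = []
    for path in all_paths:
        if not path.startswith(prefix):
            continue
        remainder = path[len(prefix):]
        # A remaining dot means the table lives in a subdirectory of dir_path.
        if recursive or '.' not in remainder:
            result.append(path)
    return result


paths = ['t0', 'dir1.t1', 'dir1.sub.t2', 'dir2.t3']
print(filter_tables(paths, 'dir1'))  # ['dir1.t1', 'dir1.sub.t2']
print(filter_tables(paths, 'dir1', recursive=False))  # ['dir1.t1']
```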

list_dirs

list_dirs(path: str = '', recursive: bool = True) -> list[str]

List the directories in a directory.

Parameters:

  • path (str, default: '' ) –

    Name or path of the directory.

  • recursive (bool, default: True ) –

    If True, lists all descendants of this directory recursively.

Returns:

  • list[str]

    List of directory paths.

Raises:

  • Error

    If path does not exist or does not designate a directory.

Examples:

>>> pxt.list_dirs('my_dir', recursive=True)
['my_dir', 'my_dir.sub_dir1']

move

move(path: str, new_path: str) -> None

Move a schema object to a new directory and/or rename a schema object.

Parameters:

  • path (str) –

    Absolute path to the existing schema object.

  • new_path (str) –

    Absolute new path for the schema object.

Raises:

  • Error

    If path does not exist or new_path already exists.

Examples:

Move a table to a different directory:

>>> pxt.move('dir1.my_table', 'dir2.my_table')

Rename a table:

>>> pxt.move('dir1.my_table', 'dir1.new_name')
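
Because both arguments are absolute paths, a rename keeps the directory part and a move keeps the final name. The two helpers below are a plain-Python sketch for building new_path under that reading; renamed_path and moved_path are hypothetical and not part of Pixeltable.

```python
def renamed_path(path: str, new_name: str) -> str:
    """Keep the directory part of a dot-separated path and replace the final name."""
    parent, _, _ = path.rpartition('.')
    return f'{parent}.{new_name}' if parent else new_name


def moved_path(path: str, new_dir: str) -> str:
    """Keep the final name of a dot-separated path and place it under new_dir."""
    name = path.rpartition('.')[2]
    return f'{new_dir}.{name}' if new_dir else name


print(renamed_path('dir1.my_table', 'new_name'))  # dir1.new_name
print(moved_path('dir1.my_table', 'dir2'))  # dir2.my_table
```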

drop_dir

drop_dir(
    path: str,
    force: bool = False,
    if_not_exists: Literal["error", "ignore"] = "error",
) -> None

Remove a directory.

Parameters:

  • path (str) –

    Name or path of the directory.

  • force (bool, default: False ) –

    If True, will also drop all tables and subdirectories of this directory, recursively, along with any views or snapshots that depend on any of the dropped tables.

  • if_not_exists (Literal['error', 'ignore'], default: 'error' ) –

    Directive regarding how to handle if the path does not exist. Must be one of the following:

    • 'error': raise an error
    • 'ignore': do nothing and return

Raises:

  • Error

    If the path

    • is invalid, or
    • does not exist and if_not_exists='error', or
    • does not designate a directory, or
    • is a directory but is not empty and force=False.

Examples:

Remove a directory, if it exists and is empty:

>>> pxt.drop_dir('my_dir')

Remove a subdirectory:

>>> pxt.drop_dir('my_dir.sub_dir')

Remove an existing directory if it is empty, but do nothing if it does not exist:

>>> pxt.drop_dir('my_dir.sub_dir', if_not_exists='ignore')

Remove an existing directory and all its contents:

>>> pxt.drop_dir('my_dir', force=True)