pixeltable.io

create_label_studio_project

create_label_studio_project(
    t: Table,
    label_config: str,
    name: Optional[str] = None,
    title: Optional[str] = None,
    media_import_method: Literal["post", "file", "url"] = "post",
    col_mapping: Optional[dict[str, str]] = None,
    sync_immediately: bool = True,
    **kwargs: Any
) -> SyncStatus

Creates a new Label Studio project and links it to the specified Table.

The required parameter label_config specifies the Label Studio project configuration, in XML format, as described in the Label Studio documentation. The linked project will have one column for each data field in the configuration; for example, if the configuration has an entry

<Image name="image_obj" value="$image"/>

then the linked project will have a column named image. In addition, the linked project will always have a JSON-typed column annotations representing the output.
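To make the relationship between the configuration and the resulting columns concrete, the following sketch lists the data fields referenced by a label_config. It is only an illustration using the Python standard library, not Pixeltable's own parser; the helper name data_fields and the sample configuration are hypothetical, and it assumes that a data field is any value attribute beginning with "$".

```python
import xml.etree.ElementTree as ET


def data_fields(label_config: str) -> list[str]:
    """Return the names referenced as $variables in `value` attributes.

    Assumption: every attribute of the form value="$name" denotes a data field.
    """
    root = ET.fromstring(label_config.strip())
    fields = []
    for elem in root.iter():
        value = elem.get('value', '')
        # Literal values such as <Choice value="Cat"/> have no "$" prefix
        # and are skipped.
        if value.startswith('$') and value[1:] not in fields:
            fields.append(value[1:])
    return fields


# Hypothetical configuration with a single data field, `image`.
label_config = """
<View>
  <Image name="image_obj" value="$image"/>
  <Choices name="label" toName="image_obj">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""

print(data_fields(label_config))  # ['image']
```

Under these assumptions, a project created from this configuration would have the columns image and annotations.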

By default, Pixeltable will link each of these columns to a column of the specified Table with the same name. If any of the data fields are missing, an exception will be raised. If the annotations column is missing, it will be created. The default names can be overridden by specifying an optional col_mapping, with Pixeltable column names as keys and Label Studio field names as values. In all cases, the Pixeltable columns must have types that are consistent with their corresponding Label Studio fields; otherwise, an exception will be raised.

The API key and URL for a valid Label Studio server must be specified in Pixeltable config. Either:

  • Set the LABEL_STUDIO_API_KEY and LABEL_STUDIO_URL environment variables; or
  • Specify api_key and url fields in the label-studio section of $PIXELTABLE_HOME/config.yaml.
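As a quick sanity check for the first option, a sketch like the following reports which of the two environment variables are unset. The helper name missing_label_studio_vars is hypothetical, and the check deliberately ignores config.yaml, so a non-empty result does not necessarily mean that Pixeltable is misconfigured.

```python
import os

# The two environment variables named in the documentation above.
REQUIRED_VARS = ('LABEL_STUDIO_API_KEY', 'LABEL_STUDIO_URL')


def missing_label_studio_vars(env=None):
    """Return the required variables that are unset or empty in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_label_studio_vars()
if missing:
    print('Not set in the environment:', ', '.join(missing))
```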

Parameters:

  • t (Table) –

    The Table to link to.

  • label_config (str) –

    The Label Studio project configuration, in XML format.

  • name (Optional[str], default: None ) –

    An optional name for the new project in Pixeltable. If specified, must be a valid Pixeltable identifier and must not be the name of any other external data store linked to t. If not specified, a default name will be used of the form ls_project_0, ls_project_1, etc.

  • title (Optional[str], default: None ) –

    An optional title for the Label Studio project. This is the title that annotators will see inside Label Studio. Unlike name, it does not need to be an identifier and does not need to be unique. If not specified, the table name t.get_name() will be used.

  • media_import_method (Literal['post', 'file', 'url'], default: 'post' ) –

    The method to use when transferring media files to Label Studio:

      • post: Media will be sent to Label Studio via HTTP post. This should generally only be used for prototyping; due to restrictions in Label Studio, it can only be used with projects that have just one data field, and does not scale well.
      • file: Media will be sent to Label Studio as a file on the local filesystem. This method can be used if Pixeltable and Label Studio are running on the same host.
      • url: Media will be sent to Label Studio as externally accessible URLs. This method cannot be used with local media files or with media generated by computed columns.

    The default is post.

  • col_mapping (Optional[dict[str, str]], default: None ) –

    An optional mapping of local column names to Label Studio fields.

  • sync_immediately (bool, default: True ) –

    If True, immediately perform an initial synchronization by exporting all rows of the Table as Label Studio tasks.

  • kwargs (Any, default: {} ) –

    Additional keyword arguments are passed to the start_project method in the Label Studio SDK, as described here: https://labelstud.io/sdk/project.html#label_studio_sdk.project.Project.start_project
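A call using the parameters above might look like the following sketch. The column names frame and frame_annotations, the project name frame_labels, and the title are hypothetical; the sketch assumes that the table passed in already has both columns, and that the function is exposed as pxt.io.create_label_studio_project. The call is wrapped in a function so that nothing contacts Label Studio until it is invoked.

```python
# Hypothetical configuration with one data field (`image`), which keeps it
# compatible with the single-field restriction of media_import_method='post'.
LABEL_CONFIG = """
<View>
  <Image name="image_obj" value="$image"/>
  <Choices name="label" toName="image_obj">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""

# Pixeltable column names as keys, Label Studio field names as values.
COL_MAPPING = {'frame': 'image', 'frame_annotations': 'annotations'}


def link_to_label_studio(t):
    """Link the Pixeltable table `t` to a new Label Studio project."""
    import pixeltable as pxt  # imported here so the module loads without it

    return pxt.io.create_label_studio_project(
        t,
        LABEL_CONFIG,
        name='frame_labels',          # identifier used inside Pixeltable
        title='Frame labeling',       # title shown to annotators
        media_import_method='post',   # prototyping only; see above
        col_mapping=COL_MAPPING,
        sync_immediately=True,
    )
```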

import_csv

import_csv(
    table_path: str,
    filepath_or_buffer,
    schema_overrides: Optional[dict[str, ColumnType]] = None,
    **kwargs
) -> InsertableTable

Creates a new Table from a CSV file. This is a convenience method and is equivalent to calling import_pandas(table_path, pd.read_csv(filepath_or_buffer, **kwargs), schema_overrides=schema_overrides). See the Pandas documentation for read_csv for more details.

import_excel

import_excel(
    table_path: str,
    io,
    *args,
    schema_overrides: Optional[dict[str, ColumnType]] = None,
    **kwargs
) -> InsertableTable

Creates a new Table from an Excel (.xlsx) file. This is a convenience method and is equivalent to calling import_pandas(table_path, pd.read_excel(io, *args, **kwargs), schema_overrides=schema_overrides). See the Pandas documentation for read_excel for more details.

import_pandas

import_pandas(
    tbl_name: str,
    df: DataFrame,
    *,
    schema_overrides: Optional[dict[str, ColumnType]] = None
) -> InsertableTable

Creates a new Table from a Pandas DataFrame, with the specified name. The schema of the table will be inferred from the DataFrame, except for any columns whose types are given in schema_overrides.

The column names of the new Table will be identical to those in the DataFrame, as long as they are valid Pixeltable identifiers. If a column name is not a valid Pixeltable identifier, it will be normalized according to the following procedure:

  • first, replace any non-alphanumeric characters with underscores;
  • then, preface the result with the letter 'c' if it begins with a number or an underscore;
  • then, if there are any duplicate column names, suffix the duplicates with '_2', '_3', etc., in column order.
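The normalization procedure described above can be sketched in a few lines of Python. This is an illustration of the documented steps, not Pixeltable's actual implementation. The function name normalize_column_names is hypothetical, and the sketch makes two assumptions: the steps are applied to every name (names made of letters, digits, and underscores that start with a letter pass through unchanged), and "alphanumeric" means ASCII letters and digits.

```python
import re


def normalize_column_names(names):
    """Apply the three documented normalization steps to a list of names."""
    seen = {}
    result = []
    for name in names:
        # Step 1: replace non-alphanumeric characters with underscores.
        cleaned = re.sub(r'[^0-9a-zA-Z_]', '_', name)
        # Step 2: prefix with 'c' if the name starts with a digit or underscore.
        if cleaned[:1].isdigit() or cleaned.startswith('_'):
            cleaned = 'c' + cleaned
        # Step 3: suffix later duplicates with _2, _3, ... in column order.
        seen[cleaned] = seen.get(cleaned, 0) + 1
        if seen[cleaned] > 1:
            cleaned = f'{cleaned}_{seen[cleaned]}'
        result.append(cleaned)
    return result


print(normalize_column_names(['a b', 'a-b', '1x', 'ok', '_id']))
# ['a_b', 'a_b_2', 'c1x', 'ok', 'c_id']
```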

Parameters:

  • tbl_name (str) –

    The name of the table to create.

  • df (DataFrame) –

    The Pandas DataFrame.

  • schema_overrides (Optional[dict[str, ColumnType]], default: None ) –

    If specified, then for each (name, type) pair in schema_overrides, the column with name name will be given type type, instead of being inferred from the DataFrame. The keys in schema_overrides should be the column names of the DataFrame (whether or not they are valid Pixeltable identifiers).