pixeltable.io
create_label_studio_project
create_label_studio_project(
t: Table,
label_config: str,
name: Optional[str] = None,
title: Optional[str] = None,
media_import_method: Literal["post", "file", "url"] = "post",
col_mapping: Optional[dict[str, str]] = None,
sync_immediately: bool = True,
**kwargs: Any
) -> SyncStatus
Creates a new Label Studio project and links it to the specified Table.
The required parameter label_config specifies the Label Studio project configuration, in XML format, as described in the Label Studio documentation. The linked project will have one column for each data field in the configuration; for example, if the configuration has an entry
<Image name="image_obj" value="$image"/>
then the linked project will have a column named image. In addition, the linked project will always have a JSON-typed column annotations representing the output.
By default, Pixeltable will link each of these columns to a column of the specified Table with the same name. If the Table has no column for one of the data fields, an exception will be raised. If the annotations column is missing, it will be created. The default names can be overridden by specifying an optional col_mapping, with Pixeltable column names as keys and Label Studio field names as values. In all cases, the Pixeltable columns must have types that are consistent with their corresponding Label Studio fields; otherwise, an exception will be raised.
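To make the linking rules concrete, here is a minimal sketch, not Pixeltable's actual implementation, of how data fields could be read from a label config and matched to table columns. The helpers data_fields and link_fields are hypothetical names introduced for illustration. Treating a supplied col_mapping as the complete mapping is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def data_fields(label_config: str) -> list[str]:
    """Return the field names referenced as value="$name" in the XML config."""
    fields: list[str] = []
    for elem in ET.fromstring(label_config).iter():
        value = elem.get("value", "")
        # Only "$"-prefixed values refer to task data; e.g. <Choice value="Cat"/> does not.
        if value.startswith("$") and value[1:] not in fields:
            fields.append(value[1:])
    return fields


def link_fields(
    table_columns: set[str],
    fields: list[str],
    col_mapping: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Map each Label Studio field to a table column, or raise if none is linked."""
    if col_mapping is None:
        # Default: each field is linked to the column with the same name.
        col_mapping = {name: name for name in fields}
    # col_mapping is keyed by Pixeltable column name; invert it to look up by field.
    field_to_col = {field: col for col, field in col_mapping.items()}
    linked = {}
    for field in fields:
        col = field_to_col.get(field)
        if col is None or col not in table_columns:
            raise ValueError(f"no table column linked to data field {field!r}")
        linked[field] = col
    return linked


config = """
<View>
  <Image name="image_obj" value="$image"/>
  <Choices name="label" toName="image_obj">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""
fields = data_fields(config)
links = link_fields({"img", "id"}, fields, col_mapping={"img": "image"})
```

Here fields contains only image, and the mapping links it to the table column img. Without the col_mapping argument the same call would raise, because the table has no column named image.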
The API key and URL for a valid Label Studio server must be specified in the Pixeltable config. Either:
- Set the LABEL_STUDIO_API_KEY and LABEL_STUDIO_URL environment variables; or
- Specify api_key and url fields in the label-studio section of $PIXELTABLE_HOME/config.yaml.
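As a small illustration of the environment-variable option, the variables can be set from Python before Pixeltable is used. The URL and key below are placeholders, not real values, and setting them ahead of the first Pixeltable call is an assumption about when the config is read.

```python
import os

# Placeholder values (assumptions): substitute your own server URL and API key.
os.environ["LABEL_STUDIO_URL"] = "http://localhost:8080"
os.environ["LABEL_STUDIO_API_KEY"] = "<your-api-key>"
```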
Parameters:
- t (Table) – The Table to link to.
- label_config (str) – The Label Studio project configuration, in XML format.
- name (Optional[str], default: None) – An optional name for the new project in Pixeltable. If specified, must be a valid Pixeltable identifier and must not be the name of any other external data store linked to t. If not specified, a default name will be used of the form ls_project_0, ls_project_1, etc.
- title (Optional[str], default: None) – An optional title for the Label Studio project. This is the title that annotators will see inside Label Studio. Unlike name, it does not need to be an identifier and does not need to be unique. If not specified, the table name t.get_name() will be used.
- media_import_method (Literal['post', 'file', 'url'], default: 'post') – The method to use when transferring media files to Label Studio:
  - post: Media will be sent to Label Studio via HTTP POST. This should generally only be used for prototyping; due to restrictions in Label Studio, it can only be used with projects that have just one data field, and does not scale well.
  - file: Media will be sent to Label Studio as a file on the local filesystem. This method can be used if Pixeltable and Label Studio are running on the same host.
  - url: Media will be sent to Label Studio as externally accessible URLs. This method cannot be used with local media files or with media generated by computed columns.
  The default is post.
- col_mapping (Optional[dict[str, str]], default: None) – An optional mapping of local column names to Label Studio fields.
- sync_immediately (bool, default: True) – If True, immediately perform an initial synchronization by exporting all rows of the Table as Label Studio tasks.
- kwargs (Any, default: {}) – Additional keyword arguments are passed to the start_project method in the Label Studio SDK, as described here: https://labelstud.io/sdk/project.html#label_studio_sdk.project.Project.start_project
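The following sketch shows how these parameters might be combined in one call. It assumes an existing table t with an image column named img and a reachable Label Studio server. The project name, title, and the helper link_table are hypothetical. Pixeltable is imported inside the function so the snippet can be loaded without it installed.

```python
LABEL_CONFIG = """
<View>
  <Image name="image_obj" value="$image"/>
  <Choices name="label" toName="image_obj">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""


def link_table(t):
    """Create a Label Studio project linked to an existing Pixeltable table t."""
    import pixeltable as pxt  # lazy import: only needed when the function is called

    return pxt.io.create_label_studio_project(
        t,
        LABEL_CONFIG,
        name="ls_cats_dogs",           # hypothetical; must be a valid Pixeltable identifier
        title="Cats vs. dogs",         # hypothetical; shown to annotators in Label Studio
        media_import_method="file",    # assumes both services run on the same host
        col_mapping={"img": "image"},  # Pixeltable column name -> Label Studio field
        sync_immediately=False,        # skip the initial export of all rows
    )
```

Choosing file here avoids the single-data-field restriction of post, at the cost of requiring a shared filesystem.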
import_csv
import_csv(
table_path: str,
filepath_or_buffer,
schema_overrides: Optional[dict[str, ColumnType]] = None,
**kwargs
) -> InsertableTable
Creates a new Table from a CSV file. This is a convenience method and is equivalent to calling import_pandas(table_path, pd.read_csv(filepath_or_buffer, **kwargs), schema_overrides=schema_overrides).
See the Pandas documentation for read_csv for more details.
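The stated equivalence can be written out as a short wrapper. This is a sketch of the documented behavior rather than the library's source. The name import_csv_equivalent is hypothetical, and the overrides are passed through the schema_overrides keyword that appears in the import_pandas signature.

```python
def import_csv_equivalent(table_path, filepath_or_buffer, schema_overrides=None, **kwargs):
    """Read a CSV with Pandas, then hand the DataFrame to import_pandas."""
    # Lazy imports so the sketch can be loaded without either package installed.
    import pandas as pd
    import pixeltable as pxt

    # Extra keyword arguments (sep, dtype, nrows, ...) go straight to read_csv.
    df = pd.read_csv(filepath_or_buffer, **kwargs)
    return pxt.io.import_pandas(table_path, df, schema_overrides=schema_overrides)
```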
import_excel
import_excel(
table_path: str,
io,
*args,
schema_overrides: Optional[dict[str, ColumnType]] = None,
**kwargs
) -> InsertableTable
Creates a new Table from an Excel (.xlsx) file. This is a convenience method and is equivalent to calling import_pandas(table_path, pd.read_excel(io, *args, **kwargs), schema_overrides=schema_overrides).
See the Pandas documentation for read_excel for more details.
import_pandas
import_pandas(
tbl_name: str,
df: DataFrame,
*,
schema_overrides: Optional[dict[str, ColumnType]] = None
) -> InsertableTable
Creates a new Table from a Pandas DataFrame, with the specified name. The schema of the table will be inferred from the DataFrame, except for any columns whose types are given in schema_overrides.
The column names of the new Table will be identical to those in the DataFrame, as long as they are valid Pixeltable identifiers. If a column name is not a valid Pixeltable identifier, it will be normalized according to the following procedure:
- first replace any non-alphanumeric characters with underscores;
- then, preface the result with the letter 'c' if it begins with a number or an underscore;
- then, if there are any duplicate column names, suffix the duplicates with '_2', '_3', etc., in column order.
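The three steps above can be sketched in plain Python. This is an illustration of the described procedure, not Pixeltable's own code. The function name normalize_column_names is hypothetical, and applying the steps to every name assumes that valid identifiers pass through the first two steps unchanged.

```python
import re


def normalize_column_names(names: list[str]) -> list[str]:
    """Apply the three normalization steps to a list of column names."""
    counts: dict[str, int] = {}
    result = []
    for name in names:
        # Step 1: replace every non-alphanumeric character with an underscore.
        name = re.sub(r"[^0-9a-zA-Z]", "_", name)
        # Step 2: prefix 'c' if the name starts with a digit or an underscore
        # (an empty name is also prefixed; that case is an assumption).
        if name == "" or name[0].isdigit() or name[0] == "_":
            name = "c" + name
        # Step 3: suffix repeats with _2, _3, ... in column order.
        # A suffixed name colliding with another column is not handled here.
        counts[name] = counts.get(name, 0) + 1
        result.append(name if counts[name] == 1 else f"{name}_{counts[name]}")
    return result


normalized = normalize_column_names(["id", "first name", "first-name", "2nd", "_x"])
# normalized == ["id", "first_name", "first_name_2", "c2nd", "c_x"]
```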
Parameters:
- tbl_name (str) – The name of the table to create.
- df (DataFrame) – The Pandas DataFrame.
- schema_overrides (Optional[dict[str, ColumnType]], default: None) – If specified, then for each (name, type) pair in schema_overrides, the column with name name will be given type type, instead of being inferred from the DataFrame. The keys in schema_overrides should be the column names of the DataFrame (whether or not they are valid Pixeltable identifiers).
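As a brief sketch of the keying rule, the override below uses the DataFrame's original column name even though it contains a space. The helper import_with_overrides and the column name zip code are hypothetical. The ColumnType instance is taken as an argument, because how one is constructed depends on the Pixeltable version.

```python
def import_with_overrides(tbl_name, df, zip_code_type):
    """Import df, forcing the type of its 'zip code' column."""
    import pixeltable as pxt  # lazy import: only needed when the function is called

    # The key is the DataFrame column name, not its normalized Pixeltable form.
    overrides = {"zip code": zip_code_type}
    return pxt.io.import_pandas(tbl_name, df, schema_overrides=overrides)
```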