huggingface
pixeltable.functions.huggingface
Pixeltable UDFs that wrap various models from the Hugging Face transformers package.
These UDFs will cause Pixeltable to invoke the relevant models locally. In order to use them, you must first pip install transformers (or in some cases, sentence-transformers, as noted in the specific UDFs).
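Because the models run locally, it can help to confirm that the optional packages are importable before defining computed columns. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Pixeltable.

```python
import importlib.util

# Import names of the optional dependencies. The pip package
# "sentence-transformers" is imported as "sentence_transformers".
REQUIRED_MODULES = ("transformers", "sentence_transformers")


def missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Report which dependencies still need to be installed in this environment.
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("transformers and sentence-transformers are both available.")
```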
clip_image
clip_image(image: ImageT, *, model_id: str) -> ArrayT
Computes a CLIP embedding for the specified image. model_id should be a reference to a pretrained CLIP Model.
Requirements:
pip install transformers
Parameters:
- image (ImageT) – The image to embed.
- model_id (str) – The pretrained model to use for the embedding.
Returns:
- ArrayT – An array containing the output of the embedding model.
Examples:
Add a computed column that applies the model openai/clip-vit-base-patch32 to an existing Pixeltable column image of the table tbl:
>>> tbl['result'] = clip_image(tbl.image, model_id='openai/clip-vit-base-patch32')
clip_text
clip_text(text: str, *, model_id: str) -> ArrayT
Computes a CLIP embedding for the specified text. model_id should be a reference to a pretrained CLIP Model.
Requirements:
pip install transformers
Parameters:
- text (str) – The string to embed.
- model_id (str) – The pretrained model to use for the embedding.
Returns:
- ArrayT – An array containing the output of the embedding model.
Examples:
Add a computed column that applies the model openai/clip-vit-base-patch32 to an existing Pixeltable column text of the table tbl:
>>> tbl['result'] = clip_text(tbl.text, model_id='openai/clip-vit-base-patch32')
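CLIP places images and text in a shared embedding space, so an image embedding and a text embedding produced with the same model_id can be compared directly. The sketch below shows one common comparison, cosine similarity, in plain Python. The short vectors are made-up stand-ins for real embeddings, and the helper `cosine_similarity` is illustrative rather than a Pixeltable function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional stand-ins; real CLIP embeddings are much longer.
image_embedding = [0.2, 0.1, 0.9, 0.3]
caption_embeddings = {
    "a photo of a giraffe": [0.25, 0.05, 0.85, 0.35],
    "a bowl of soup": [0.9, 0.4, 0.1, 0.0],
}

# Pick the caption whose embedding points in the most similar direction.
best_caption = max(
    caption_embeddings,
    key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
)
print(best_caption)
```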
cross_encoder
cross_encoder(sentences1: str, sentences2: str, *, model_id: str) -> float
Performs prediction on the given sentence pair. model_id should be a pretrained Cross-Encoder model, as described in the Cross-Encoder Pretrained Models documentation.
Requirements:
pip install sentence-transformers
Parameters:
- sentences1 (str) – The first sentence to be paired.
- sentences2 (str) – The second sentence to be paired.
- model_id (str) – The identifier of the cross-encoder model to use.
Returns:
- float – The similarity score between the inputs.
Examples:
Add a computed column that applies the model ms-marco-MiniLM-L-4-v2 to the sentences in columns tbl.sentence1 and tbl.sentence2:
>>> tbl['result'] = cross_encoder(
...     tbl.sentence1, tbl.sentence2, model_id='ms-marco-MiniLM-L-4-v2'
... )
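A typical use of per-pair scores like these is reranking: score one query against several candidate passages, then sort by score. The sketch below shows only the sorting step in plain Python. The scores are invented for illustration, and `rank_by_score` is a hypothetical helper, not part of Pixeltable.

```python
def rank_by_score(candidates, scores):
    """Pair each candidate with its score and sort from highest to lowest."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


# Invented scores standing in for cross-encoder output for one query.
passages = ["Paris is the capital of France.", "Giraffes eat leaves.", "France is in Europe."]
scores = [8.1, -4.3, 2.7]

for passage, score in rank_by_score(passages, scores):
    print(f"{score:6.2f}  {passage}")
```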
detr_for_object_detection
detr_for_object_detection(
image: ImageT, *, model_id: str, threshold: float = 0.5
) -> JsonT
Computes DETR object detections for the specified image. model_id should be a reference to a pretrained DETR Model.
Requirements:
pip install transformers
Parameters:
- image (ImageT) – The image in which to detect objects.
- model_id (str) – The pretrained model to use for object detection.
- threshold (float, default: 0.5) – The confidence threshold for object detection.
Returns:
- JsonT – A dictionary containing the output of the object detection model, in the following format:
    {
        'scores': [0.99, 0.999],  # list of confidence scores for each detected object
        'labels': [25, 25],  # list of COCO class labels for each detected object
        'label_text': ['giraffe', 'giraffe'],  # corresponding text names of class labels
        'boxes': [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]]
        # list of bounding boxes for each detected object, as [x1, y1, x2, y2]
    }
Examples:
Add a computed column that applies the model facebook/detr-resnet-50 to an existing Pixeltable column image of the table tbl:
>>> tbl['detections'] = detr_for_object_detection(
... tbl.image,
... model_id='facebook/detr-resnet-50',
... threshold=0.8
... )
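The returned dictionary stores scores, labels, label text, and boxes as parallel lists, so post-processing has to keep them aligned. A minimal sketch of filtering such a result by label and score in plain Python follows. `filter_detections` is a hypothetical helper, and the sample values are copied from the format shown above.

```python
def filter_detections(detections, *, label=None, min_score=0.0):
    """Keep detections with score >= min_score and, if given, a matching label_text."""
    kept = {"scores": [], "labels": [], "label_text": [], "boxes": []}
    rows = zip(
        detections["scores"],
        detections["labels"],
        detections["label_text"],
        detections["boxes"],
    )
    for score, label_id, text, box in rows:
        if score < min_score:
            continue
        if label is not None and text != label:
            continue
        # Append to all four lists together so they stay index-aligned.
        kept["scores"].append(score)
        kept["labels"].append(label_id)
        kept["label_text"].append(text)
        kept["boxes"].append(box)
    return kept


sample_detections = {
    "scores": [0.99, 0.999],
    "labels": [25, 25],
    "label_text": ["giraffe", "giraffe"],
    "boxes": [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]],
}

confident = filter_detections(sample_detections, label="giraffe", min_score=0.995)
print(len(confident["boxes"]), "detection(s) kept")
```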
detr_to_coco
detr_to_coco(image: ImageT, detr_info: JsonT) -> JsonT
Converts the output of a DETR object detection model to COCO format.
Parameters:
- image (ImageT) – The image for which detections were computed.
- detr_info (JsonT) – The output of a DETR object detection model, as returned by detr_for_object_detection.
Returns:
- JsonT – A dictionary containing the data from detr_info, converted to COCO format.
Examples:
Add a computed column that converts the output tbl.detections to COCO format, where tbl.image is the image for which detections were computed:
>>> tbl['detections_coco'] = detr_to_coco(tbl.image, tbl.detections)
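One difference between the two representations is the box convention: detr_for_object_detection reports boxes as [x1, y1, x2, y2], while COCO annotations store them as [x, y, width, height]. The sketch below illustrates only that coordinate change. It does not reproduce the exact dictionary that detr_to_coco returns, and `xyxy_to_xywh` is a hypothetical helper.

```python
def xyxy_to_xywh(box):
    """Convert a corner-format box [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


# A box given by its top-left and bottom-right corners.
corner_box = [10, 20, 50, 80]
coco_box = xyxy_to_xywh(corner_box)
print(coco_box)
```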
sentence_transformer
sentence_transformer(
sentence: str, *, model_id: str, normalize_embeddings: bool = False
) -> ArrayT
Computes sentence embeddings. model_id should be a pretrained Sentence Transformers model, as described in the Sentence Transformers Pretrained Models documentation.
Requirements:
pip install sentence-transformers
Parameters:
- sentence (str) – The sentence to embed.
- model_id (str) – The pretrained model to use for the encoding.
- normalize_embeddings (bool, default: False) – If True, normalizes embeddings to length 1; see the Sentence Transformers API Docs for more details.
Returns:
- ArrayT – An array containing the output of the embedding model.
Examples:
Add a computed column that applies the model all-mpnet-base-v2 to an existing Pixeltable column tbl.sentence of the table tbl:
>>> tbl['result'] = sentence_transformer(tbl.sentence, model_id='all-mpnet-base-v2')
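To make the effect of normalize_embeddings concrete: normalizing to length 1 divides each vector by its Euclidean norm, after which a plain dot product equals cosine similarity. The following plain-Python sketch uses toy vectors. The helpers `l2_normalize` and `dot` are illustrative, not library functions.

```python
import math


def l2_normalize(vector):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for sentence embeddings.
u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])

# For unit-length vectors the dot product is the cosine similarity.
similarity = dot(u, v)
print(round(similarity, 4))
```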
vit_for_image_classification
vit_for_image_classification(
image: ImageT, *, model_id: str, top_k: int = 5
) -> JsonT
Computes image classifications for the specified image using a Vision Transformer (ViT) model. model_id should be a reference to a pretrained ViT Model.
Note: Be sure the model is a ViT model that is trained for image classification (that is, a model designed for use with the ViTForImageClassification class), such as google/vit-base-patch16-224. General feature-extraction models such as google/vit-base-patch16-224-in21k will not produce the desired results.
Requirements:
pip install transformers
Parameters:
- image (ImageT) – The image to classify.
- model_id (str) – The pretrained model to use for the classification.
- top_k (int, default: 5) – The number of classes to return.
Returns:
- JsonT – A list of the top_k highest-scoring classes for each image. Each element in the list is a dictionary in the following format:
    {
        'p': 0.230,  # class probability
        'class': 935,  # class ID
        'label': 'mashed potato',  # class label
    }
Examples:
Add a computed column that applies the model google/vit-base-patch16-224 to an existing Pixeltable column image of the table tbl:
>>> tbl['image_class'] = vit_for_image_classification(
... tbl.image,
... model_id='google/vit-base-patch16-224'
... )
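As an illustration of how a result in this shape could be derived from raw classifier outputs, the sketch below applies a softmax to a list of logits and keeps the top_k entries. It is a plain-Python approximation under stated assumptions, not Pixeltable's implementation. The logits and the `id2label` mapping are invented, and `top_k_classes` is a hypothetical helper.

```python
import math


def top_k_classes(logits, id2label, top_k=5):
    """Return the top_k classes as dicts with keys 'p', 'class', and 'label'."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first, and keep top_k of them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [{"p": probs[i], "class": i, "label": id2label[i]} for i in ranked]


# Invented logits and labels for a three-class toy model.
toy_logits = [1.0, 3.0, 2.0]
toy_labels = {0: "soup", 1: "mashed potato", 2: "salad"}

results = top_k_classes(toy_logits, toy_labels, top_k=2)
for entry in results:
    print(entry["label"], round(entry["p"], 3))
```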