Embedding Models
CLIP
Contrastive Language-Image Pretrained Model
CLIP Class Documentation
Overview
The `CLIP` class is designed to perform image classification using the CLIP (Contrastive Language-Image Pretraining) model. It leverages a pre-trained model from OpenAI to classify images into predefined categories based on textual descriptions.
Methods
`__init__(self, model_card: str = "openai/clip-vit-large-patch14")`
Constructor for the `CLIP` class.
- Parameters:
  - `model_card`: A string identifier for the pre-trained CLIP model to be used. Default is `"openai/clip-vit-large-patch14"`.
`load_resources(self)`
Loads the necessary resources for the model, including the processor and the model itself, from the specified `model_card`.
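Checkpoints such as `openai/clip-vit-large-patch14` are hosted on the Hugging Face Hub, so loading plausibly maps onto the `transformers` API. The following is only a sketch of what this step might look like internally, not the class's actual implementation:

```python
# Minimal sketch of resource loading, assuming a Hugging Face transformers backend.
import torch
from transformers import CLIPModel, CLIPProcessor

model_card = "openai/clip-vit-large-patch14"

# The processor handles image resizing/normalization and text tokenization.
processor = CLIPProcessor.from_pretrained(model_card)
# The model produces joint image/text embeddings and image-text similarity logits.
model = CLIPModel.from_pretrained(model_card)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()
```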
`release_resources(self)`
Releases the loaded resources to free up memory. This is particularly useful when working with limited memory resources or in a GPU environment to clear the cache.
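In PyTorch, releasing a model typically amounts to dropping the references and clearing the CUDA cache. The snippet below illustrates that general pattern; it is not taken from the class itself:

```python
import gc
import torch

# Drop references so the weights become eligible for garbage collection.
model = None
processor = None
gc.collect()

# Return cached GPU memory to the driver, if a GPU is in use.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```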
`classify(self, image: Image.Image, classes: list) -> Detections`
Classifies an image into one of the provided classes.
- Parameters:
  - `image`: An instance of `Image.Image` to be classified.
  - `classes`: A list of strings representing the class labels to classify the image against.
- Returns:
  - A `Detections` object containing:
    - `xyxy`: Bounding box coordinates (always zero, as CLIP does not provide localization).
    - `class_ids`: Array containing the index of the predicted class.
    - `confidence`: Array containing the confidence score of the prediction.
    - `classes`: Array of the class labels provided.
    - `detection_type`: Type of detection, which is `DetectionType.CLASSIFICATION` in this case.
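The classification itself corresponds to standard CLIP zero-shot scoring: the image is compared against one text prompt per class, and a softmax over the image-text similarity logits yields the confidence. The sketch below shows that computation directly with `transformers`; the file name is a placeholder, and the assembly into a `Detections` object is paraphrased from the field descriptions above rather than copied from the library:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_card = "openai/clip-vit-large-patch14"
processor = CLIPProcessor.from_pretrained(model_card)
model = CLIPModel.from_pretrained(model_card)

image = Image.open("photo.jpg")  # placeholder path
classes = ["cat", "dog", "bird"]

# One text prompt per candidate class, paired with the single image.
inputs = processor(text=classes, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image has shape (1, num_classes); softmax converts it to probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
class_id = int(probs.argmax())
confidence = float(probs[class_id])

# CLIP scores the whole image, which is why the returned xyxy box is all zeros.
print(classes[class_id], confidence)
```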
Example Usage
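The import path below is a placeholder (the module that exposes `CLIP` is not shown on this page), and the attribute access follows the `Detections` fields described above:

```python
from PIL import Image

from your_package import CLIP  # placeholder import; use the package's actual module path

model = CLIP()  # defaults to "openai/clip-vit-large-patch14"
model.load_resources()

image = Image.open("photo.jpg")  # placeholder path
detections = model.classify(image, classes=["cat", "dog", "bird"])

# class_ids holds the index of the predicted class; classes holds the labels passed in.
predicted_label = detections.classes[detections.class_ids[0]]
print(predicted_label, detections.confidence[0])

model.release_resources()
```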
For more information on CLIP, refer to the original paper, *Learning Transferable Visual Models From Natural Language Supervision* (Radford et al., 2021).