Contrastive Language-Image Pretrained Model
CLIP
The `CLIP` class is designed to perform image classification using the CLIP (Contrastive Language-Image Pretraining) model. It leverages a pre-trained model from OpenAI to classify images into predefined categories based on textual descriptions.
`__init__(self, model_card: str = "openai/clip-vit-large-patch14")`
Initializes the `CLIP` class.

Parameters:
- `model_card`: A string identifier for the pre-trained CLIP model to be used. Default is `"openai/clip-vit-large-patch14"`.
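For illustration, a minimal construction sketch. The import path is an assumption; substitute the module that actually provides the class. `openai/clip-vit-base-patch32` is shown only as an example of an alternative model card.

```python
# Hypothetical import path; replace with the module that defines CLIP.
from models.classification import CLIP

# Use the default checkpoint, "openai/clip-vit-large-patch14".
clip_default = CLIP()

# Or point model_card at a smaller CLIP variant from the Hugging Face Hub.
clip_small = CLIP(model_card="openai/clip-vit-base-patch32")
```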
`load_resources(self)`
Loads the pre-trained CLIP model specified by `model_card`.
`release_resources(self)`
Releases the model resources loaded by `load_resources`.
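The implementations of these two resource methods are not shown here. As a rough sketch of the likely behaviour, assuming the class keeps the model and processor as attributes and uses the Hugging Face `transformers` classes:

```python
from transformers import CLIPModel, CLIPProcessor


class CLIP:
    def __init__(self, model_card: str = "openai/clip-vit-large-patch14"):
        self.model_card = model_card
        self.model = None
        self.processor = None

    def load_resources(self):
        # Assumed behaviour: load the checkpoint and processor named by model_card.
        self.model = CLIPModel.from_pretrained(self.model_card)
        self.processor = CLIPProcessor.from_pretrained(self.model_card)

    def release_resources(self):
        # Assumed behaviour: drop the references so the memory can be reclaimed.
        self.model = None
        self.processor = None
```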
`classify(self, image: Image.Image, classes: list) -> Detections`
Classifies an image against the provided class labels.

Parameters:
- `image`: An instance of `Image.Image` to be classified.
- `classes`: A list of strings representing the class labels to classify the image against.

Returns a `Detections` object containing:
- `xyxy`: Bounding box coordinates (always zero, as CLIP does not provide localization).
- `class_ids`: Array containing the index of the predicted class.
- `confidence`: Array containing the confidence score of the prediction.
- `classes`: Array of the class labels provided.
- `detection_type`: Type of detection, which is `DetectionType.CLASSIFICATION` in this case.
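A usage sketch tying the methods together. The import path and image file are placeholders, and the field access mirrors the `Detections` structure described above:

```python
from PIL import Image

# Hypothetical import path; replace with the module that defines CLIP.
from models.classification import CLIP

model = CLIP(model_card="openai/clip-vit-large-patch14")
model.load_resources()

image = Image.open("photo.jpg")  # placeholder image path
detections = model.classify(image, classes=["cat", "dog", "bird"])

# class_ids indexes into the provided class labels; confidence holds the score.
label = detections.classes[detections.class_ids[0]]
score = detections.confidence[0]
print(f"Predicted: {label} ({score:.2f})")

model.release_resources()
```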