Overview

Grounding DINO is an open-set object detection model that locates objects in images based on textual descriptions. It combines the Transformer-based DINO detector (DETR with Improved deNoising anchOr boxes) with grounded pre-training, allowing it to understand and locate objects described by free-form text prompts.

GroundingDINO()

The GroundingDINO class encapsulates the functionality of the Grounding DINO model, providing methods for loading the model, detecting objects based on textual descriptions, and managing resources.

Methods

GroundingDINO.detect(image, classes)
  • detect(image: Union[np.ndarray, Image.Image], classes: List[str]) -> Detections

This method detects objects in the provided image that match the given class descriptions. Returns a Detections object containing the bounding boxes of the detected objects.
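As a minimal sketch, calling detect directly might look like the following, assuming GroundingDINO is importable from the top-level overeasy package (as in the workflow example below):

from PIL import Image
from overeasy import GroundingDINO  # import path assumed

model = GroundingDINO()
image = Image.open("./kites.jpg")  # the image downloaded in Example Usage below

# Look for every region matching the text prompt "butterfly kite"
detections = model.detect(image, classes=["butterfly kite"])
print(detections)  # Detections object with the matched bounding boxes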

When to use?

One of the main benefits of Grounding DINO is that it strikes a good balance between speed and accuracy, which makes it a strong choice for annotating datasets. However, it may miss finer details compared to slower models like OwlV2.

Example Usage

Download the example image:

wget -O kites.jpg "https://github.com/overeasy-sh/overeasy/blob/main/examples/kites.jpg?raw=true"

Then run the workflow:

from overeasy import *
from PIL import Image

workflow = Workflow([
    BoundingBoxSelectAgent(classes=["butterfly kite"], model=GroundingDINO()),
])

image = Image.open("./kites.jpg")
results, graph = workflow.execute(image)
workflow.visualize(graph)
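To inspect the raw detections rather than render the graph, you can read them off the execution results. Note that the .data attribute below is an assumption about how the execution nodes expose their output:

# Assumed: each node in results carries its output in a .data attribute
final_node = results[-1]
print(final_node.data)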

Note how the model missed the smaller butterfly kite in the background.

For more information on Grounding DINO, refer to the original paper, "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" (arXiv:2303.05499).