Grounding DINO
Open-set object detection from text prompts
Overview
Grounding DINO is an open-set object detection model that locates objects in an image based on textual descriptions. It pairs the DINO detector (DETR with Improved deNoising anchOr boxes) with grounded pre-training, so that phrases in the text can be matched to regions of the image.
The GroundingDINO class encapsulates the functionality of the Grounding DINO model, providing methods for loading the model, detecting objects based on textual descriptions, and managing resources.
Methods
detect(image: Union[np.ndarray, Image.Image], classes: List[str]) -> Detections:
This method detects objects in the provided image based on the classes described in text. Returns a BoundingBox Detections object.
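The upstream Grounding DINO model takes a single text caption rather than a list, with category names conventionally separated by periods, so a wrapper has to turn classes into a caption at some point. The sketch below shows one way that step might look. build_text_prompt is a hypothetical helper, not part of the GroundingDINO class documented here, and the exact separator and casing used by the wrapper are assumptions.

```python
from typing import List


def build_text_prompt(classes: List[str]) -> str:
    """Join class names into one lowercased, period-separated caption.

    Hypothetical helper for illustration only; the wrapper's real
    internals are not documented in this page.
    """
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [name.strip().lower() for name in classes if name.strip()]
    if not cleaned:
        raise ValueError("classes must contain at least one non-empty name")
    # Separate categories with " . " and end the caption with a period.
    return " . ".join(cleaned) + " ."


prompt = build_text_prompt(["Kite", " person "])
print(prompt)
```

Keeping each entry in classes short and specific (for example "butterfly kite" rather than a full sentence) tends to make it easier to map detections back to the class that produced them.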
When to use?
One of the main benefits of Grounding DINO is that it provides a good balance between speed and accuracy, which makes it a good choice for annotating datasets. However, it may miss finer details that slower models like OwlV2 pick up.
Example Usage
Note how the model missed the smaller butterfly kite in the back.
For more information on Grounding DINO, refer to the original paper.