The VisionPromptAgent handles image-based queries.
VisionPromptAgent is initialized with two arguments:
Given an image of an X-ray scan, query = "Is there a fracture in the bone?"
Given a frame of security camera footage, query = "Identify any suspicious activities or individuals in this security camera footage."
Given a photo of a manufactured product, query = "Inspect this product for any defects or irregularities."
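The image and query pairings above can be written down as plain data before they are handed to an agent. The sketch below is an illustration only and does not use the library's API: `VisionTask` and `describe_task` are hypothetical names, and the image file names are placeholders that do not come from the original text. Only the three query strings are taken from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionTask:
    """Hypothetical container pairing an image with a text query."""

    image_path: str  # placeholder file name, assumed for illustration
    query: str       # query text taken from the examples above


# The three example queries, each paired with a placeholder image path.
TASKS = [
    VisionTask("xray_scan.png", "Is there a fracture in the bone?"),
    VisionTask(
        "camera_frame.png",
        "Identify any suspicious activities or individuals "
        "in this security camera footage.",
    ),
    VisionTask(
        "product_photo.png",
        "Inspect this product for any defects or irregularities.",
    ),
]


def describe_task(task: VisionTask) -> str:
    """Return a one-line summary of the image and query in a task."""
    return f"image={task.image_path!r} query={task.query!r}"


if __name__ == "__main__":
    for task in TASKS:
        print(describe_task(task))
```

Keeping the image reference and the query together in one record makes it harder to send a query with the wrong image. How each record is actually passed to a VisionPromptAgent depends on the library and is not shown here.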
MultimodalLLM models can be found below:

Example of a VisionPromptAgent designed for a Workflow to detect damage on an egg.