Vision Prompt Agent

Initialization

Parameters

The VisionPromptAgent is initialized with two arguments:

VisionPromptAgent(query, model)

query

string

required

The prompt used to analyze the image. Here are a few illustrative examples:

Given an image of an X-ray scan: query = "Is there a fracture in the bone?"

Given a frame of security camera footage: query = "Identify any suspicious activities or individuals in this security camera footage."

Given a photo of a manufactured product: query = "Inspect this product for any defects or irregularities."
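Because query is an ordinary string, task-specific prompts like the ones above can be kept in one place and looked up by use case. The sketch below is plain Python and does not import the library. The names EXAMPLE_QUERIES and query_for are hypothetical helpers, not part of the library's API.

```python
# Hypothetical registry of task-specific prompts, copied from the examples above.
EXAMPLE_QUERIES: dict[str, str] = {
    "x-ray": "Is there a fracture in the bone?",
    "security-camera": (
        "Identify any suspicious activities or individuals "
        "in this security camera footage."
    ),
    "product-inspection": "Inspect this product for any defects or irregularities.",
}


def query_for(use_case: str) -> str:
    """Return the prompt for a use case (hypothetical helper, not a library function)."""
    try:
        return EXAMPLE_QUERIES[use_case]
    except KeyError:
        known = ", ".join(sorted(EXAMPLE_QUERIES))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {known}"
        ) from None


# The returned string would then be passed as the query argument, for example:
# VisionPromptAgent(query_for("x-ray"), model=GPT4Vision())
print(query_for("x-ray"))
```

Keeping prompts in a single mapping makes it easy to reuse the same wording across several agents and to review prompt changes in one place.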

model

MultimodalLLM

required

The multimodal LLM used to analyze the image. The supported MultimodalLLM models are listed below:

GPT4Vision()

MultimodalLLM (Default)

Supports gpt-4-turbo, gpt-4o.

Claude()

MultimodalLLM

Supports claude-3-opus-20240229, claude-3-haiku-20240307, claude-3-sonnet-20240229.

Gemini()

MultimodalLLM

Supports gemini-pro-vision.

QwenVL()

MultimodalLLM

Supports qwen-vl-chat.
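The list above can be read as a lookup table from wrapper class to model identifier. The plain-Python sketch below, which does not import the library, finds which wrapper lists a given identifier. SUPPORTED_MODELS and wrapper_for are hypothetical names, the table reflects only what this page states, and how a wrapper chooses among its identifiers is not covered here.

```python
from typing import Optional

# Supported model identifiers per wrapper, transcribed from the list above.
SUPPORTED_MODELS: dict[str, tuple[str, ...]] = {
    "GPT4Vision": ("gpt-4-turbo", "gpt-4o"),
    "Claude": (
        "claude-3-opus-20240229",
        "claude-3-haiku-20240307",
        "claude-3-sonnet-20240229",
    ),
    "Gemini": ("gemini-pro-vision",),
    "QwenVL": ("qwen-vl-chat",),
}


def wrapper_for(model_id: str) -> Optional[str]:
    """Return the wrapper name that lists model_id, or None if none does (hypothetical helper)."""
    for wrapper, model_ids in SUPPORTED_MODELS.items():
        if model_id in model_ids:
            return wrapper
    return None


print(wrapper_for("gpt-4o"))                   # GPT4Vision
print(wrapper_for("claude-3-haiku-20240307"))  # Claude
```

Returning None instead of raising lets a caller fall back to the default GPT4Vision() wrapper when an identifier is not listed.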

Example

Here is an example of a VisionPromptAgent configured for a Workflow that detects damage on an egg.

example.py

VisionPromptAgent("Is the egg damaged in any way?", model=GPT4Vision())
