PPE Example
Overview
We will construct workflows that enforce safety equipment compliance on construction sites. Specifically, we will show three approaches to building a Workflow that detects whether workers are wearing personal protective equipment (PPE).
We will be using this image to test our results:
Approach 1: Using ClassificationAgent
This approach is very fast because it only uses local open-source models. However, it is somewhat more error-prone than the approaches described below. This is because the ClassificationAgent relies on CLIP embedding models for zero-shot classification, which, while fast, do not offer the same level of robust image understanding as models like GPTVision.
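To make the zero-shot step concrete, here is a minimal sketch of how classification over CLIP-style embeddings generally works: the image embedding is compared with the embedding of each candidate label, and a softmax over the scaled similarities picks the winner. This is not the ClassificationAgent implementation. The vectors are made-up 3-dimensional stand-ins for real embeddings, and the function names, label strings, and `temperature` value are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_vec, label_vecs, temperature=100.0):
    """Return (best_label, probabilities) via softmax over scaled similarities.

    `temperature` is an assumed scaling factor, not a value taken from the library.
    """
    scores = {label: temperature * cosine(image_vec, vec)
              for label, vec in label_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    return max(probs, key=probs.get), probs


# Hypothetical embeddings: a real pipeline would get these from a CLIP model.
label_vecs = {
    "hard hat": [0.9, 0.1, 0.2],
    "no hard hat": [0.1, 0.9, 0.2],
}
image_vec = [0.8, 0.3, 0.1]

label, probs = zero_shot_classify(image_vec, label_vecs)
print(label, probs)
```

Because the decision depends only on which label text sits closest to the image in embedding space, small changes to the label wording can change the result, which is one reason this approach is more error-prone.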
Prerequisite: The following approaches leverage the Instructor library to generate structured data. If you're not familiar with the library, we recommend reading the Docs.
Approach 2: Using InstructorImageAgent
The InstructorImageAgent generates structured outputs directly from images, streamlining the process with fewer steps while maintaining robustness. The BoundingBoxSelectAgent allows the InstructorImageAgent to focus its analysis on person detections.
In this example, we were able to split our image at the person level, instead of having to select for heads. This is possible because the InstructorImageAgent is able to leverage GPTVision to provide additional robustness.
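As a rough illustration of what splitting an image at the person level means, the sketch below crops one sub-image per "person" detection so that each crop can be analyzed separately. It is not how BoundingBoxSelectAgent is implemented. The detection format (`label` plus an `(x1, y1, x2, y2)` box), the helper names, and the toy image are assumptions made for this example.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to box = (x1, y1, x2, y2).

    Coordinates are clamped to the image bounds; x2 and y2 are exclusive.
    """
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def split_by_detections(image, detections, wanted_class="person"):
    """Return one crop per detection whose label matches `wanted_class`."""
    return [crop(image, d["box"]) for d in detections if d["label"] == wanted_class]


# Toy 8x6 image in which every pixel stores its own (x, y) coordinates.
image = [[(x, y) for x in range(8)] for y in range(6)]

# Hypothetical detector output.
detections = [
    {"label": "person", "box": (0, 0, 3, 6)},
    {"label": "person", "box": (5, 1, 9, 5)},  # runs past the right edge, so it is clamped
    {"label": "truck", "box": (2, 2, 6, 4)},   # ignored: not a person
]

crops = split_by_detections(image, detections)
print(len(crops), "person crops")
```

Each crop would then be passed to the image-to-structured-data step on its own, so the result describes one worker at a time.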
Approach 3: Using DenseCaptioningAgent
This approach is similar to the previous one, but takes the extra step of converting the image to a text caption before returning structured data. The benefit of this approach is efficiency.
Many effective image captioning models can be run locally, which avoids expensive calls to multimodal language models.
In this scenario, the InstructorTextAgent can leverage a much cheaper model like GPT-3.5, or even a local LLM.
We can also use a robust model like GPTVision in the DenseCaptioningAgent to improve robustness if necessary.
This approach provides flexibility in choosing between robustness and speed.
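The two-stage shape of this approach can be sketched as a function that takes a captioner and a text extractor as interchangeable parts. This is a sketch under stated assumptions, not the library's API: `PPEReport`, its field names, and both stand-in functions are hypothetical. The stub captioner returns a fixed sentence, and the keyword-based extractor is a crude placeholder for the LLM that would normally turn the caption into structured data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PPEReport:
    """Hypothetical structured result for one worker."""
    hardhat: bool
    vest: bool


def caption_then_extract(image,
                         captioner: Callable[[object], str],
                         extractor: Callable[[str], PPEReport]) -> PPEReport:
    """Stage 1: image -> caption. Stage 2: caption -> structured data."""
    caption = captioner(image)
    return extractor(caption)


def stub_captioner(image) -> str:
    """Stand-in for a captioning model (local or hosted); ignores its input."""
    return "A worker wearing a yellow hard hat and no safety vest stands near a crane."


def keyword_extractor(caption: str) -> PPEReport:
    """Crude stand-in for an LLM: simple keyword checks, negations handled naively."""
    text = caption.lower()

    def present(item: str) -> bool:
        return item in text and f"no {item}" not in text

    return PPEReport(hardhat=present("hard hat"), vest=present("safety vest"))


report = caption_then_extract(image=None,
                              captioner=stub_captioner,
                              extractor=keyword_extractor)
print(report)
```

Because each stage is passed in as a plain callable, swapping a local captioner for a stronger hosted model, or a cheap text model for a more capable one, changes only the arguments and not the pipeline.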