The best Local Vision Language Model
The QwenVL class encapsulates the functionality of the QwenVL models, providing methods for loading the model, detecting objects based on textual descriptions, managing resources, and generating responses based on prompts. It supports four model types: "base", "int4", "fp16", and "bf16", each optimized for different hardware configurations. The "int4" quantized version of the model requires additional dependencies and setup for quantization support, but it allows you to run the model with only 11GB VRAM.
The setup instructions are outlined in Extra Installation.
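As a rough illustration of the model-type options described above, the following minimal sketch validates a model type string and reports whether the Extra Installation steps apply. It deliberately does not call the library itself: the names SUPPORTED_MODEL_TYPES, NEEDS_EXTRA_INSTALL, and needs_extra_install are hypothetical helpers invented for this example, and only the four type names and the "int4" requirement come from the text.

```python
# Hypothetical helper, not part of the QwenVL class itself.
# The four model type names are taken from the description above.
SUPPORTED_MODEL_TYPES = ("base", "int4", "fp16", "bf16")

# Only "int4" is documented as needing additional dependencies and setup.
NEEDS_EXTRA_INSTALL = frozenset({"int4"})


def needs_extra_install(model_type: str) -> bool:
    """Return True if the given model type requires the Extra Installation steps.

    Raises ValueError for a model type that is not one of the four listed.
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Unknown model type {model_type!r}; "
            f"expected one of {', '.join(SUPPORTED_MODEL_TYPES)}"
        )
    return model_type in NEEDS_EXTRA_INSTALL


if __name__ == "__main__":
    for name in SUPPORTED_MODEL_TYPES:
        note = "extra installation required" if needs_extra_install(name) else "no extra setup"
        print(f"{name}: {note}")
```

A check like this could run before loading the model, so that a missing quantization dependency is reported up front rather than partway through a load.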