Note: Extras are currently only supported on Linux.

Overeasy currently supports int4 quantization for running models like QwenVL.

To use these models, you will need to install AutoGPTQ. Make sure the relevant CUDA extensions are built, otherwise int4 inference will be slow.
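If you are unsure whether the CUDA toolkit is available in your environment (a prerequisite for building the extensions), a quick notebook check is:

!nvcc --version  # should print the CUDA compiler version; if this fails, the extensions cannot be built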

In our example Colab, we install AutoGPTQ from a prebuilt wheel, which makes setup a bit easier.

You can install AutoGPTQ from pip using the instructions in the AutoGPTQ repository.
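As a rough sketch, a pip install might look like the line below; auto-gptq is the package's PyPI name, and pip will pick a prebuilt wheel where one matches your platform (AutoGPTQ also publishes CUDA-specific wheels on a separate index, per its README):

!pip install auto-gptq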

Alternatively, you can install AutoGPTQ from source:

Source Install
# Dependencies for Qwen-style models and GPTQ inference
!pip install optimum tiktoken gekko einops transformers_stream_generator accelerate
# Build AutoGPTQ v0.7.1 from source (compiles the CUDA extensions)
!pip install git+https://github.com/AutoGPTQ/AutoGPTQ@v0.7.1

Note: Building AutoGPTQ from source can take around 20 minutes.
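Either way, a quick sanity check after installing can save debugging time later. A minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is visible:

import torch
from importlib.metadata import version

print(version("auto-gptq"))       # should match the version you installed, e.g. 0.7.1
print(torch.cuda.is_available())  # must be True to run the int4 model on GPU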

After installing, you can use the int4 quantized model like this, as long as you have over 11GB of VRAM:

from overeasy import QwenVL

# Load the int4 quantized variant of Qwen-VL
model = QwenVL("int4")
model.load_resources()

response = model.prompt("What is the capital of California?")
print(response)