Note: Extras are currently only supported on Linux.
Overeasy currently supports int4 quantization for running models like QwenVL. To use these models, you will need AutoGPTQ installed with its CUDA extensions built; without them, inference will be slow. In our example Colab we install a prebuilt wheel, which makes setup easier. You can install AutoGPTQ from pip, or build it from source as described below.
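As a sketch of the pip route, AutoGPTQ publishes prebuilt wheels on a Hugging Face package index. The command below assumes a CUDA 11.8 runtime; the index URL comes from the AutoGPTQ README, so check it for the index matching your CUDA version:
!pip install auto-gptq --no-build-isolation --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/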
Source Install
!pip install optimum tiktoken gekko einops transformers_stream_generator accelerate
!pip install git+https://github.com/AutoGPTQ/AutoGPTQ@v0.7.1
Note: Building AutoGPTQ from source can take around 20 minutes.
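Once the build finishes, a quick sanity check (a minimal sketch; auto-gptq is the package's distribution name) confirms the install is visible to Python and that a GPU is available:
import torch
from importlib.metadata import version

print(version("auto-gptq"))       # installed AutoGPTQ version
print(torch.cuda.is_available())  # should be True for GPU inference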
After installing, you can use the int4 quantized model like this, provided you have over 11 GB of VRAM:
from overeasy import QwenVL
model = QwenVL("int4")
model.load_resources()
response = model.prompt("What is the capital of California?")
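If you are unsure whether your GPU clears the 11 GB bar, a quick check with PyTorch can tell you before you load the weights (an illustrative sketch, not part of the overeasy API):
import torch

if torch.cuda.is_available():
    # Total memory of the first GPU, in GiB
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < 11:
        print(f"Warning: only {total_gb:.1f} GiB of VRAM; the int4 QwenVL model may not fit.")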