Extras are currently only supported on Linux
Overeasy currently supports int4 quantization for running models like QwenVL.
To use these models, you will need a performant AutoGPTQ installation: make sure to build the relevant CUDA extensions, or inference will fall back to much slower kernels.
In our example Colab, we install AutoGPTQ from a prebuilt wheel, which makes setup a bit easier.
You can install AutoGPTQ from pip.
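For example (the prebuilt-wheel index below is the one the AutoGPTQ README documents for CUDA 11.8; pick the index matching your CUDA version):

```bash
# Standard PyPI install; builds the CUDA extensions locally if a toolkit is found:
pip install auto-gptq

# Or skip the local build by pulling a prebuilt CUDA 11.8 wheel:
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
```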
Alternatively, you can install AutoGPTQ from source:
Source Install
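A sketch of the usual source build, based on the AutoGPTQ README (exact flags may vary by version):

```bash
# Clone the repository and build the CUDA extensions in place
# (requires a CUDA toolkit with nvcc on PATH):
git clone https://github.com/AutoGPTQ/AutoGPTQ.git
cd AutoGPTQ
pip install -vvv --no-build-isolation -e .
```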
Note: Building from source can take around 20 minutes.

After installing, you can use the int4 quantized model like this, as long as you have over 11 GB of VRAM:
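A minimal sketch of loading the underlying int4 checkpoint (`Qwen/Qwen-VL-Chat-Int4`) directly with transformers, following that model card; Overeasy's own QwenVL wrapper may expose this differently:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen-VL ships custom modeling/tokenizer code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4",
    device_map="cuda",  # the int4 weights fit in just over 11 GB of VRAM
    trust_remote_code=True,
).eval()

# Build a multimodal prompt (an image plus a question) and run one chat turn.
query = tokenizer.from_list_format([
    {"image": "path/to/image.jpg"},  # placeholder: a local path or URL
    {"text": "Describe this image."},
])
response, _history = model.chat(tokenizer, query=query, history=None)
print(response)
```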