In this blog, we will walk through how to install Pixtral on a GPU cloud. Pixtral is a multimodal model from Mistral AI with exceptional vision capabilities.
First, we install vLLM, a high-throughput and memory-efficient inference and serving engine for LLMs. We will use vLLM to serve Pixtral.
!pip install --upgrade vllm
Next, we install the mistral_common library, which provides Mistral's tokenizers and message formats and lets us work with Pixtral.
!pip install --upgrade mistral_common
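Optionally, we can sanity-check that both packages were picked up. A quick sketch using Python's standard importlib.metadata, not required for the rest of the tutorial:

from importlib.metadata import version

# Print the installed versions of the two packages we just installed.
print("vllm:", version("vllm"))
print("mistral_common:", version("mistral_common"))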
Now, we need a Hugging Face token in order to download the model.
import getpass

# Prompt for the token without echoing it to the screen.
hf_token = getpass.getpass('Enter your HF Token:')
We authenticate with the Hugging Face Hub using the provided token.
from huggingface_hub import login

# Store the token so model downloads are authenticated.
login(token=hf_token)
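To confirm the token actually works, we can ask the Hub who we are. A small optional check using huggingface_hub's whoami helper:

from huggingface_hub import whoami

# Returns account details for the token we just logged in with.
print(whoami()["name"])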
Now we download the model from Hugging Face and run a test inference to check that everything is working.
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Pixtral-12B-2409"

# Allow long generations for detailed image descriptions.
sampling_params = SamplingParams(max_tokens=8192)

# tokenizer_mode="mistral" makes vLLM use Mistral's own tokenizer
# (from mistral_common) instead of the default Hugging Face one.
llm = LLM(model=model_name, tokenizer_mode="mistral")
prompt = "Describe this image in one sentence."
image_url = "https://picsum.photos/id/237/200/300"
messages = [
{
"role": "user",
"content": [{"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": image_url}}]
},
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
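The image does not have to live at a public URL. Since the OpenAI-style image_url field also accepts data URLs, we can send a local file by base64-encoding it. A minimal sketch, assuming a hypothetical local file photo.jpg:

import base64

# Read the local image and embed it in the message as a base64 data URL.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)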
Let's run one more inference, just for fun 😁
prompt = "Describe this image in detail."
image_url = "https://99designs-blog.imgix.net/blog/wp-content/uploads/2019/03/0-e1552926315420.jpg?auto=format&q=60&w=1693&h=1494&fit=crop&crop=faces"
messages = [
{
"role": "user",
"content": [{"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": image_url}}]
},
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
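If we want other applications to talk to Pixtral over HTTP rather than calling it in-process, vLLM also ships an OpenAI-compatible server. A minimal sketch of launching it (the --max-model-len value is an assumption to keep memory in check; adjust for your GPU):

!vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral --max-model-len 16384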
Conclusion: In this blog, we learned how to install Pixtral on a GPU cloud and how to use vLLM to serve it.