This is pure genius! Thank you!
Hello all. I'm new here, I'm a French engineer. I had been searching for days for a way to self-host Mistral and couldn't find the right approach with Python and llama.cpp: I just couldn't manage to offload the model to the GPU without hitting CUDA errors. After lots of digging, I discovered vLLM and then Ollama. Just want to say THANK YOU! 🙌 This program works flawlessly out of the box on Docker 🐳, and I'll now set it up to auto-start Mistral and keep the model loaded in memory 🧠⚡ (rough sketch of the preload step below). This is incredible, huge thanks to the devs! 🚀🔥
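In case it helps anyone doing the same, here is a minimal sketch of the preload step I have in mind, assuming the Ollama container is reachable on the default port 11434 and the `mistral` model has already been pulled. Per the Ollama API docs, a request with an empty prompt just loads the model, and the `keep_alive` parameter controls how long it stays resident (this is a sketch, not the only way to do it; `OLLAMA_KEEP_ALIVE` on the server works too):

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port (assumption: container exposes it on the host)

# An empty prompt asks Ollama to load the model without generating anything.
# keep_alive=-1 keeps it resident in memory until the server stops
# (the default is to unload the model after 5 minutes of inactivity).
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "mistral", "prompt": "", "keep_alive": -1},
    timeout=60,
)
resp.raise_for_status()
print("Model preloaded:", resp.json().get("done", False))
```

Running this once at container start-up (e.g. from a small init script) should give the auto-start behaviour I was after.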