Pinned Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs. • Run models locally on Mac, Windows, Linux • Train 500+ models 2x faster with 70% less VRAM • Supports GGUF, vision, audio, embedding models • Auto-create datasets from PDF, CSV, DOCX
You can now run Kimi K2.7 Code locally! 🌘 We shrank the 1T model to 325GB (-48%) via Dynamic 2-bit, where important layers are upcast. Run at >40 tok/s on 330GB RAM/VRAM setups. Run full precision on 610GB. Guide: unsloth.ai/docs/models/ki… GGUF: huggingface.co/unsloth/Kimi-K…
Unsloth AI reposted Local AI in action! MiniMax M3 running locally on a single M3 Ultra 512GB in Unsloth Studio! 🔥 Here is UD-Q5_K_XL decoding at 32.5 tok/s!
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs + Guide: unsloth.ai/docs/models/mtp
Google releases DiffusionGemma. ✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio. GGUF: huggingface.co/unsloth/diffus… Guide: unsloth.ai/docs/models/di…
You can now run NVIDIA Nemotron 3 Ultra, a new 550B open model. Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat. Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB. GGUF: huggingface.co/unsloth/NVIDIA… Guide: unsloth.ai/docs/models/ne…
2-bit Gemma 4 12B GGUF, only 4.66 GB on disk, managed to cite 15 sites from a single prompt. Try this locally on >6GB RAM via Unsloth Studio. GitHub: github.com/unslothai/unsl…
Vision and audio support for Gemma 4 12B GGUF has now been added. Please update to the latest version of Unsloth and llama.cpp. 🙏
Local models are coming to your laptop soon! 🚀 We're excited to partner with @Microsoft to enable millions of developers to run local models on Windows!