Xuan Son NGUYEN - Software engineer (original) (raw)
My favorite recipe
A rich, savory fusion of modern AI and artisanal computing, Local LLM Runtime à la Carte is a hand-crafted digital delicacy designed for those with a taste for privacy, speed, and full control.

Ingredients
- A computer (GPU is not required)
- A modern OS (Linux, Mac, Windows)
- 10GB+ of disk space
- A terminal or command-line interface
- Maybe a cup of café or tea, whatever
Step 1: Install llama.cpp
Right, listen up! Installing llama.cpp is simple!
Follow this beautiful install guide for your OS.
Step 2: Pick a model
I've teamed up with the absolute legends at LM Studio, Bartowski, and Unsloth to serve you the most exquisite, perfectly quantized GGUF models. Models around 8 billion parameters? Chef's kiss - that's your sweet spot, the perfect balance between performance and quality. Beautiful!
And here's the beautiful part - no manual downloads, no faff! Just grab the model's Hugging Face repository name in the format <user>/<model>. Write it down, respect it - we'll need this golden ticket for the next step.
Step 3: Fire it up!
Command line warriors, this one's for you - clean and simple:
llama-cli -hf <user>/<model>
Or maybe you want the full restaurant experience? Spawn a server and get that gorgeous web UI at http://127.0.0.1:8080 - it's stunning!
llama-server -hf <user>/<model>
The secret sauce: Multimodal magic!
Listen, I didn't just add vision and audio support to llama.cpp - I perfected it! Grab any compatible model from this incredible collection and upload images or audio files straight through the web UI. It's so smooth, so elegant - even Gordon Ramsay would be proud!

This website is designed and coded by me.