Xuan Son NGUYEN - Software engineer (original) (raw)

My favorite recipe

A rich, savory fusion of modern AI and artisanal computing, Local LLM Runtime à la Carte is a hand-crafted digital delicacy designed for those with a taste for privacy, speed, and full control.

A cocktail of multiple OS

Ingredients

A computer (GPU is not required)
A modern OS (Linux, Mac, Windows)
10GB+ of disk space
A terminal or command-line interface
Maybe a cup of café or tea, whatever

Step 1: Install llama.cpp

Right, listen up! Installing llama.cpp is simple!
Follow this beautiful install guide for your OS.

Step 2: Pick a model

I've teamed up with the absolute legends at LM Studio, Bartowski, and Unsloth to serve you the most exquisite, perfectly quantized GGUF models. Models around 8 billion parameters? Chef's kiss - that's your sweet spot, the perfect balance between performance and quality. Beautiful!

And here's the beautiful part - no manual downloads, no faff! Just grab the model's Hugging Face repository name in the format <user>/<model>. Write it down, respect it - we'll need this golden ticket for the next step.

Step 3: Fire it up!

Command line warriors, this one's for you - clean and simple:

llama-cli -hf <user>/<model>

Or maybe you want the full restaurant experience? Spawn a server and get that gorgeous web UI at http://127.0.0.1:8080 - it's stunning!

llama-server -hf <user>/<model>

The secret sauce: Multimodal magic!

Listen, I didn't just add vision and audio support to llama.cpp - I perfected it! Grab any compatible model from this incredible collection and upload images or audio files straight through the web UI. It's so smooth, so elegant - even Gordon Ramsay would be proud!

Edvard Munch The Scream saying: Wow, so easy!

This website is designed and coded by me.