ggml-org/llama.vscode: VS Code extension for LLM-assisted code/text completion

VS Code extension for local LLM-assisted text completion, AI chat, and agentic coding


Features

Installation

VS Code extension setup

Install the llama-vscode extension from the VS Code extension marketplace.

Note: the extension is also available on Open VSX.

llama.cpp setup

Prerequisites:

Open the llama-vscode menu by clicking llama-vscode in the status bar (or pressing Ctrl+Shift+M) and select "Install/Upgrade llama.cpp". This installs llama.cpp automatically on Mac and Windows. On Linux, download the latest binaries and add their bin folder to your PATH.
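On Linux, a quick way to confirm that the bin folder was added correctly is to check whether the llama-server binary can be found on PATH. The following is a minimal Python sketch using the standard library's shutil.which; the helper name find_llama_server is hypothetical and not part of llama-vscode or llama.cpp.

```python
import shutil
from typing import Optional


def find_llama_server(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of llama-server if it can be found, else None.

    find_llama_server is a hypothetical helper for this example.
    search_path uses the same format as the PATH variable; None means
    "search the current PATH".
    """
    return shutil.which("llama-server", path=search_path)


if __name__ == "__main__":
    location = find_llama_server()
    if location is None:
        print("llama-server not found: add the llama.cpp bin folder to PATH")
    else:
        print("found llama-server at", location)
```

If the script reports that llama-server was not found, the bin folder is not yet on the PATH of the current shell session.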

Once llama.cpp is installed, select an env that fits your needs from the llama-vscode menu entry "Select/start env...".

If you prefer to install llama.cpp manually, see the details below.

Mac OS

Windows

Any other OS

Either use the latest binaries or build llama.cpp from source. For more information on how to run the llama.cpp server, refer to the Wiki.

llama.cpp settings

The recommended settings depend on the amount of VRAM you have.

For CPU-only hardware, use the following llama-server settings. Note that completion quality will be significantly lower:

llama-server \
    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
    --port 8012 -ub 512 -b 512 --ctx-size 0 --cache-reuse 256

llama-server \
    -hf ggml-org/Qwen2.5-Coder-0.5B-Q8_0-GGUF \
    --port 8012 -ub 1024 -b 1024 --ctx-size 0 --cache-reuse 256
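The two commands above differ only in the model and the batch sizes. As a rough guide to the flags, based on llama-server's own option descriptions: -hf downloads the model from Hugging Face, --port sets the port the extension connects to, -ub and -b set the physical and logical batch sizes, --ctx-size 0 uses the context size stored in the model, and --cache-reuse 256 sets the minimum chunk size for reusing the KV cache via shifting. Below is a minimal Python sketch that assembles the same argument list, for example to launch the server from a script. llama_server_args and start_llama_server are hypothetical helpers, and the sketch assumes llama-server is on PATH.

```python
import subprocess


def llama_server_args(model_repo: str, batch_size: int,
                      port: int = 8012, cache_reuse: int = 256) -> list:
    """Build a llama-server command line like the examples above (hypothetical helper)."""
    return [
        "llama-server",
        "-hf", model_repo,                  # Hugging Face repo to download and load
        "--port", str(port),                # port the extension talks to
        "-ub", str(batch_size),             # physical batch size
        "-b", str(batch_size),              # logical batch size (same value here)
        "--ctx-size", "0",                  # 0 = use the context size from the model
        "--cache-reuse", str(cache_reuse),  # min chunk size for KV cache reuse
    ]


def start_llama_server(args: list) -> subprocess.Popen:
    """Launch llama-server in the background; the caller should terminate() it."""
    return subprocess.Popen(args)
```

For example, start_llama_server(llama_server_args("ggml-org/Qwen2.5-Coder-0.5B-Q8_0-GGUF", 1024)) would run the second command. Keeping the flags in one function makes it easy to switch models without retyping the rest of the command line.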

You can use any other FIM-compatible model that your system can handle. By default, models downloaded with the -hf flag are stored in an OS-specific cache directory.
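The exact cache location depends on the operating system. The Python sketch below mirrors our understanding of llama.cpp's default lookup: the LLAMA_CACHE environment variable if it is set, otherwise a llama.cpp folder under the platform cache directory (XDG_CACHE_HOME or ~/.cache on Linux, ~/Library/Caches on macOS, LOCALAPPDATA on Windows). Treat this as an assumption that may not match every llama.cpp version and check the llama.cpp documentation for yours; default_llama_cache_dir is a hypothetical helper.

```python
import os
import sys
from typing import Mapping, Optional


def default_llama_cache_dir(env: Optional[Mapping[str, str]] = None,
                            platform: Optional[str] = None) -> str:
    """Guess where llama-server -hf stores downloaded models (assumed rules).

    The path separator follows the host OS, because os.path.join is used.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    # Assumption: an explicit LLAMA_CACHE override takes precedence.
    if env.get("LLAMA_CACHE"):
        return env["LLAMA_CACHE"]

    home = env.get("HOME", "")
    if platform.startswith("linux"):
        # Assumption: XDG_CACHE_HOME is honored, falling back to ~/.cache.
        base = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    elif platform == "darwin":
        base = os.path.join(home, "Library", "Caches")
    elif platform == "win32":
        base = env.get("LOCALAPPDATA", "")
    else:
        raise ValueError(f"no assumed default cache directory for {platform!r}")

    return os.path.join(base, "llama.cpp")
```

Calling default_llama_cache_dir() with no arguments uses the current environment and platform.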

The plugin requires FIM-compatible models (see the HF collection).
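FIM (fill-in-the-middle) means the model receives the text before and after the cursor and generates what goes in between. As an illustration, here is a minimal Python sketch of a single FIM request to a llama-server listening on port 8012, as in the commands above. It assumes the server's /infill endpoint, which accepts input_prefix, input_suffix and n_predict and returns the generated text in a content field; it is not necessarily the exact request llama-vscode sends, and build_infill_payload and request_infill are hypothetical helpers.

```python
import json
import urllib.request

# Assumed address: matches the --port 8012 flag used in the commands above.
SERVER_URL = "http://127.0.0.1:8012"


def build_infill_payload(prefix: str, suffix: str, n_predict: int = 64) -> dict:
    """Build a JSON body for a FIM request (hypothetical helper).

    prefix is the text before the cursor, suffix the text after it.
    """
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "n_predict": n_predict,  # upper bound on generated tokens
    }


def request_infill(prefix: str, suffix: str, url: str = SERVER_URL) -> str:
    """POST a FIM request to the assumed /infill endpoint and return the middle text."""
    body = json.dumps(build_infill_payload(prefix, suffix)).encode("utf-8")
    req = urllib.request.Request(
        url + "/infill",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

With a server running, request_infill("def add(a, b):\n    return ", "\n\nprint(add(2, 3))\n") returns whatever middle text the model generates; the result depends on the model.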

Llama Agent

The extension includes Llama Agent.

Features

Usage

  1. Open Llama Agent with Ctrl+Shift+A or from the llama-vscode menu entry "Show Llama Agent".
  2. Select an env with an agent if you haven't done so already.
  3. Write a query and, if needed, attach files with the @ button.

More details: https://github.com/ggml-org/llama.vscode/wiki

Examples

Speculative FIM completions running locally on an M2 Studio.

Implementation details

The extension aims to be simple and lightweight while still providing high-quality, performant local FIM completions, even on consumer-grade hardware.

Other IDEs