Tutorial: How to convert HuggingFace model to GGUF format · ggml-org/llama.cpp · Discussion #2948
Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/
I published this on our blog but thought others here might benefit as well, so I'm sharing the raw blog here on GitHub too. Hope it's helpful to folks here, and feedback is welcome.
Downloading a HuggingFace model
There are various ways to download models, but in my experience the `huggingface_hub` library has been the most reliable. The `git clone` method occasionally results in OOM errors for large models.

Install the `huggingface_hub` library:

```shell
pip install huggingface_hub
```
Create a Python script named `download.py` with the following content:

```python
from huggingface_hub import snapshot_download

model_id = "lmsys/vicuna-13b-v1.5"
snapshot_download(repo_id=model_id, local_dir="vicuna-hf",
                  local_dir_use_symlinks=False, revision="main")
```
Run the Python script:

```shell
python download.py
```
You should now have the model downloaded to a directory called `vicuna-hf`. Verify by running:

```shell
ls -lash vicuna-hf
```
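If you want a quick sanity check beyond listing the directory, the sketch below looks for the files a typical HuggingFace checkpoint ships with. The exact file list is an assumption — some models use different tokenizer or weight formats — so treat missing entries as a prompt to inspect the directory, not a hard failure:

```python
from pathlib import Path

def check_download(model_dir: str) -> list[str]:
    """Return names of files a typical HF checkpoint has but this dir lacks.

    The expected files here are an assumption based on common HF layouts;
    some models ship weights or tokenizers in other formats.
    """
    d = Path(model_dir)
    missing = []
    if not (d / "config.json").is_file():
        missing.append("config.json")
    # Weights are usually sharded .bin or .safetensors files.
    if not (list(d.glob("*.bin")) or list(d.glob("*.safetensors"))):
        missing.append("model weights (*.bin or *.safetensors)")
    return missing
```

For example, `check_download("vicuna-hf")` should return an empty list after a successful download.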
Converting the model
Now it's time to convert the downloaded HuggingFace model to a GGUF model.
Llama.cpp comes with a converter script to do this.
Get the script by cloning the llama.cpp repo:

```shell
git clone https://github.com/ggerganov/llama.cpp.git
```
Install the required Python libraries:

```shell
pip install -r llama.cpp/requirements.txt
```
Verify the script is there and understand the various options:

```shell
python llama.cpp/convert.py -h
```
Convert the HF model to a GGUF model:

```shell
python llama.cpp/convert.py vicuna-hf \
  --outfile vicuna-13b-v1.5.gguf \
  --outtype q8_0
```
In this case we're also quantizing the model to 8 bit by setting `--outtype q8_0`. Quantizing helps improve inference speed, but it can negatively impact quality. You can use `--outtype f16` (16 bit) or `--outtype f32` (32 bit) to preserve the original quality.
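To get a feel for the size side of that trade-off, you can estimate the output file size from the bytes-per-weight of each format: f32 stores 4 bytes and f16 2 bytes per weight, while q8_0 packs 32 weights into a 34-byte block (a 2-byte f16 scale plus 32 int8 values), i.e. 1.0625 bytes per weight. A rough back-of-the-envelope sketch (ignores GGUF metadata overhead):

```python
BYTES_PER_WEIGHT = {
    "f32": 4.0,
    "f16": 2.0,
    # q8_0: blocks of 32 weights, each block = 2-byte f16 scale + 32 int8 values.
    "q8_0": 34 / 32,
}

def estimate_gguf_size_gb(n_params: float, outtype: str) -> float:
    """Rough model-file size in gigabytes; ignores metadata overhead."""
    return n_params * BYTES_PER_WEIGHT[outtype] / 1e9

# For a 13B-parameter model like vicuna-13b:
for t in ("f32", "f16", "q8_0"):
    print(t, round(estimate_gguf_size_gb(13e9, t), 1), "GB")
```

For 13B parameters that works out to roughly 52 GB at f32, 26 GB at f16, and about 13.8 GB at q8_0, which matches the intuition that q8_0 roughly halves the f16 size.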
Verify the GGUF model was created:

```shell
ls -lash vicuna-13b-v1.5.gguf
```
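Beyond checking the file size, you can confirm the file really is GGUF: every GGUF file begins with the 4-byte magic `GGUF`, followed by a little-endian uint32 format version. A minimal sketch:

```python
import struct

def read_gguf_header(path: str) -> tuple[bool, int]:
    """Return (is_gguf, version) from the first 8 bytes of a file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return False, 0
        (version,) = struct.unpack("<I", f.read(4))
        return True, version
```

Calling `read_gguf_header("vicuna-13b-v1.5.gguf")` should report `True` and a small version number (which one depends on the converter release).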
Pushing the GGUF model to HuggingFace
You can optionally push back the GGUF model to HuggingFace.
Create a Python script with the filename `upload.py` that has the following content:
```python
from huggingface_hub import HfApi

api = HfApi()

model_id = "substratusai/vicuna-13b-v1.5-gguf"
api.create_repo(model_id, exist_ok=True, repo_type="model")
api.upload_file(
    path_or_fileobj="vicuna-13b-v1.5.gguf",
    path_in_repo="vicuna-13b-v1.5.gguf",
    repo_id=model_id,
)
```
Get a HuggingFace Token that has write permission from here:
https://huggingface.co/settings/tokens
Set your HuggingFace token:

```shell
export HUGGING_FACE_HUB_TOKEN=<paste-your-token-here>
```
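The `huggingface_hub` library picks this environment variable up automatically. If you'd rather fail fast with a clear message before starting a large upload, a small stdlib-only sketch (the helper name is my own) can check it first:

```python
import os

def require_hf_token() -> str:
    """Fetch the HF token from the environment, raising early if unset."""
    token = os.environ.get("HUGGING_FACE_HUB_TOKEN", "")
    if not token:
        raise RuntimeError(
            "HUGGING_FACE_HUB_TOKEN is not set; "
            "export it before running upload.py"
        )
    return token
```

This avoids discovering a missing token only after `upload_file` fails partway through.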
Run the `upload.py` script:

```shell
python upload.py
```