chujiezheng/chat_templates: Chat Templates for 🤗 HuggingFace Large Language Models
NOTICE: I may no longer actively update or maintain this repository, since most recent HF LLMs now ship their chat templates in their tokenizer_config.json files. I welcome this common practice, as it makes these powerful models much easier to use out of the box.
This repository collects proper chat templates (or input formats) for instruction-tuned large language models (LLMs) to support the chat_template feature of transformers. If you would like to add more chat templates, feel free to open a pull request.
If you find this repo useful, please cite it:
@misc{zheng-2024-chat-templates,
  author = {Zheng, Chujie},
  title = {Chat Templates for HuggingFace Large Language Models},
  year = {2024},
  howpublished = {\url{https://github.com/chujiezheng/chat_templates}}
}
Updates
- [11/2024] Added support for Meta's Llama-3.2 models
- [10/2024] Added support for IBM's Granite-3.0 models
- [07/2024] Added support for Meta's Llama-3.1 models
- [06/2024] Added support for Google's Gemma-2 models
- [05/2024] Added support for Nvidia's ChatQA models
- [04/2024] Added support for Microsoft's Phi-3 models
- [04/2024] Added support for Meta's Llama-3 models
- [02/2024] Added support for Google's Gemma models
- [02/2024] Added usage explanation for generation_configs
- [01/2024] Added support for Alibaba's Qwen2 models
What Is Contained in This Repo?
- chat_templates contains the Jinja files of the collected chat templates, which can directly replace the default templates of Hugging Face tokenizers
- generation_configs contains the corresponding JSON configs used to control where response generation stops. In particular, the stop_token_ids should be passed directly to the generate method via the eos_token_id argument
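As a minimal sketch of that step, the helpers below read the text of a generation_configs JSON file and turn its stop_token_ids into keyword arguments for generate. They assume only that the file holds a top-level stop_token_ids list (the one field named above); the helper names and the max_new_tokens default are hypothetical, and no model is loaded here.

```python
import json
from typing import Any, Dict, List


def load_stop_token_ids(config_text: str) -> List[int]:
    """Return the stop_token_ids list from a generation_configs JSON string.

    Assumes a top-level "stop_token_ids" key holding integer token IDs;
    a missing or null value yields an empty list.
    """
    config = json.loads(config_text)
    return [int(token_id) for token_id in (config.get("stop_token_ids") or [])]


def build_generate_kwargs(config_text: str, max_new_tokens: int = 256) -> Dict[str, Any]:
    """Build keyword arguments for model.generate (hypothetical helper).

    generate accepts eos_token_id as a single int or a list of ints and
    treats every listed ID as an end-of-sequence token.
    """
    kwargs: Dict[str, Any] = {"max_new_tokens": max_new_tokens}
    stop_ids = load_stop_token_ids(config_text)
    if stop_ids:
        kwargs["eos_token_id"] = stop_ids
    return kwargs
```

With a loaded model and tokenized inputs, this would be used as model.generate(**inputs, **build_generate_kwargs(open('generation_configs/llama-3-instruct.json').read())).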
Usage Examples
Important NOTE: As mentioned in this issue, the messages should contain at least one user message. Passing only a system message is strongly discouraged, as it may lead to unexpected outputs (the models were not trained on such inputs).
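A minimal sketch of a guard for this rule, run on a messages list before it is passed to apply_chat_template. The function name is hypothetical and not part of transformers; it only enforces the at-least-one-user-message requirement stated above.

```python
from typing import Mapping, Sequence


def check_has_user_message(messages: Sequence[Mapping[str, str]]) -> None:
    """Raise ValueError if no message has the 'user' role.

    A conversation made only of a system message is rejected, because the
    models were not trained on such inputs.
    """
    if not any(message.get("role") == "user" for message in messages):
        raise ValueError("messages must contain at least one user message")
```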
Example 1: Meta-Llama-3-8B-Instruct
This example checks whether the Jinja file is implemented correctly: since the default Llama-3 template is already correct, both outputs should be identical.
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token="YOUR_OWN_TOKEN")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
# Load the Jinja file and strip its 4-space indentation and line breaks,
# which exist only for readability and could otherwise leak into the prompt
with open('./chat_templates/llama-3-instruct.jinja') as f:
    chat_template = f.read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (yet Correct) Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
###### Corrected Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
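All three examples load a Jinja file the same way: read it, drop the 4-space indentation and the line breaks that only keep the file readable, and assign the result to toker.chat_template. A small helper makes that step reusable; its name is hypothetical. It assumes that the templates spell any line breaks meant for the prompt as \n escapes inside Jinja string literals: in the file those are a backslash followed by n, not real newline characters, so replace('\n', '') leaves them intact.

```python
from pathlib import Path


def load_chat_template(path: str) -> str:
    """Read a chat_templates/*.jinja file and flatten it into one line.

    Mirrors the README examples: 4-space indentation and real newline
    characters are formatting only, so both are removed. Escape sequences
    such as \\n written inside Jinja string literals are a backslash plus
    'n' in the file, so they survive and still render as line breaks.
    """
    text = Path(path).read_text(encoding="utf-8")
    return text.replace("    ", "").replace("\n", "")
```

Usage: toker.chat_template = load_chat_template('./chat_templates/llama-3-instruct.jinja').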
Example 2: Mistral-7B-Instruct-v0.2
Mistral-Instruct (and also gemma-it) does not natively support system messages, so passing a system message to the default template raises an error.
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (but Error-Raising) Chat Template ######')
# The default template raises an error on the system message, so the call is commented out
#print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
# Load the Jinja file and strip its 4-space indentation and line breaks
with open('./chat_templates/mistral-instruct.jinja') as f:
    chat_template = f.read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Error-Raising) Chat Template ######
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
###### Corrected Chat Template ######
<s>[INST] This is a system prompt.
This is the first user input. [/INST] This is the first assistant response. </s>[INST] This is the second user input. [/INST]
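The error above comes from the default template's role check. As the Mistral-Instruct entry under Supported Models notes, the corrected template instead accepts a system message by prepending it to the first user input. The sketch below reproduces both message-level steps in plain Python, so a conversation can be checked before it reaches a tokenizer. It is not the repo's Jinja code: the function names are hypothetical, and the default blank-line separator is an assumption (pass separator='\n' to match the single line break shown in the expected output above).

```python
from typing import Dict, List

Message = Dict[str, str]


def fold_system_message(messages: List[Message], separator: str = "\n\n") -> List[Message]:
    """Merge a leading system message into the first user message.

    Returns new dicts so the caller's list is not mutated. The separator
    default is an assumption made for this sketch.
    """
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]
    system, rest = messages[0], [dict(m) for m in messages[1:]]
    if not rest or rest[0]["role"] != "user":
        raise ValueError("a system message must be followed by a user message")
    rest[0]["content"] = system["content"] + separator + rest[0]["content"]
    return rest


def check_alternation(messages: List[Message]) -> None:
    """Raise if roles do not alternate user/assistant/user/assistant/..."""
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
```

Running check_alternation on the original four messages fails on the system message, while running it on fold_system_message(messages) passes.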
Example 3: vicuna-7b-v1.5
NOTE: In FastChat, Vicuna does not add line breaks between the roles' messages. However, I found that adding line breaks leads to slightly better performance (especially for v1.5).
Also, I found that vicuna-7/13/33b-v1.3 may not work well when given a system message different from its default one, so I recommend using vicuna-7/13b-v1.5 instead.
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (but Improper) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
# Load the Jinja file and strip its 4-space indentation and line breaks
with open('./chat_templates/vicuna.jinja') as f:
    chat_template = f.read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Improper) Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>
This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
<s>This is a system prompt.
USER: This is the first user input.
ASSISTANT: This is the first assistant response.</s>
USER: This is the second user input.
ASSISTANT:
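To make the corrected Vicuna layout explicit, the sketch below renders a message list into the same line-separated format as the corrected output above. It is a hypothetical helper that mirrors that expected output only: the exact whitespace of the repo's vicuna.jinja may differ, and FastChat's default system prompt, which a real template may insert when no system message is given, is not reproduced.

```python
from typing import Dict, List

Message = Dict[str, str]


def render_vicuna(
    messages: List[Message],
    add_generation_prompt: bool = True,
    bos: str = "<s>",
    eos: str = "</s>",
) -> str:
    """Render messages in the line-separated Vicuna layout shown above."""
    lines: List[str] = []
    start = 0
    if messages and messages[0]["role"] == "system":
        # The system prompt goes first, without a role prefix
        lines.append(messages[0]["content"].strip())
        start = 1
    for message in messages[start:]:
        content = message["content"].strip()
        if message["role"] == "user":
            lines.append("USER: " + content)
        elif message["role"] == "assistant":
            # Each finished assistant turn is closed with the EOS token
            lines.append("ASSISTANT: " + content + eos)
        else:
            raise ValueError(f"unsupported role: {message['role']!r}")
    if add_generation_prompt:
        # Open an empty assistant turn for the model to complete
        lines.append("ASSISTANT:")
    return bos + "\n".join(lines)
```

Comparing this string with the output of toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) after loading vicuna.jinja is one way to spot whitespace differences.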
Supported Models
NOTE: The lists below are not exhaustive; each entry also covers other sizes in the same model family.
granite-3.0-instruct
- Models: ibm-granite/granite-3.0-2b-instruct, ibm-granite/granite-3.0-8b-instruct, ibm-granite/granite-3.0-1b-a400m-instruct, ibm-granite/granite-3.0-3b-a800m-instruct
- Chat template: chat_templates/granite-3.0-instruct.jinja
- Generation config: generation_configs/granite-3.0-instruct.json
- Reference: https://huggingface.co/ibm-granite/granite-3.0-8b-instruct/blob/main/tokenizer_config.json#L188
Llama-3.2-Instruct
- Models: meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct
- Chat template: chat_templates/llama-3-instruct.jinja
- Generation config: generation_configs/llama-3.1-instruct.json
- Reference: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/blob/main/tokenizer_config.json#L2053
Llama-3.1-Instruct
- Models: meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
- Chat template: chat_templates/llama-3-instruct.jinja
- Generation config: generation_configs/llama-3.1-instruct.json
- Reference: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/tokenizer_config.json#L2053
Llama-3-Instruct
- Models: meta-llama/Meta-Llama-3-8B-Instruct
- Chat template: chat_templates/llama-3-instruct.jinja
- Generation config: generation_configs/llama-3-instruct.json
- Reference: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json#L2053
Llama-2-Chat, CodeLlama-Instruct
- Models: meta-llama/Llama-2-7b-chat-hf, meta-llama/CodeLlama-7b-Instruct-hf
- Chat template: chat_templates/llama-2-chat.jinja
- Generation config: generation_configs/llama-2-chat.json
- Reference: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12
Qwen2-Instruct, Qwen1.5-Chat
- Models: Qwen/Qwen2-7B-Instruct, Qwen/Qwen1.5-7B-Chat
- Chat template: chat_templates/chatml.jinja
- Generation config: generation_configs/qwen2-instruct.json
- Reference: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/tokenizer_config.json#L31
Mistral-Instruct
- Models: mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mixtral-8x7B-Instruct-v0.1
- Chat template: chat_templates/mistral-instruct.jinja
- Generation config: generation_configs/mistral-instruct.json
- Reference: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/tokenizer_config.json#L42
- Comment: A system message is supported by prepending it to the first user input
Phi-3-Instruct
- Models: microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-small-8k-instruct
- Chat template: chat_templates/phi-3.jinja, chat_templates/phi-3-small.jinja
- Generation config: generation_configs/phi-3.json, generation_configs/phi-3-small.json
- Reference: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer_config.json#L338
- Note: Phi-3-mini/medium and Phi-3-small use different chat templates and generation configs
Yi-1.5-Chat, Yi-Chat
- Models: 01-ai/Yi-1.5-6B-Chat, 01-ai/Yi-6B-Chat
- Chat template: chat_templates/chatml.jinja
- Generation config: generation_configs/yi-chat.json
- Reference: https://huggingface.co/01-ai/Yi-6B-Chat/blob/main/tokenizer_config.json#L60
gemma-it, gemma-2-it
- Models: google/gemma-7b-it, google/gemma-2-9b-it
- Chat template: chat_templates/gemma-it.jinja
- Generation config: generation_configs/gemma-it.json
- Reference: https://huggingface.co/google/gemma-7b-it/blob/main/tokenizer_config.json#L1507
- Comment: A system message is supported
Llama3-ChatQA-1.5
- Models: nvidia/Llama3-ChatQA-1.5-8B
- Chat template: chat_templates/chatqa.jinja
- Generation config: generation_configs/chatqa.json
- Reference: https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B#when-context-is-available
- Comment: A context message is supported
openchat-3.5, Starling-LM
- Models: openchat/openchat_3.5, berkeley-nest/Starling-LM-7B-alpha
- Chat template: chat_templates/openchat-3.5.jinja
- Generation config: generation_configs/openchat-3.5.json
- Reference: https://huggingface.co/openchat/openchat_3.5/blob/main/tokenizer_config.json#L51
zephyr
- Models: zephyr-7b-alpha
- Chat template: chat_templates/zephyr.jinja
- Generation config: generation_configs/zephyr.json
- Reference: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json#L34
vicuna
- Models: vicuna-7b-v1.5, vicuna-7b-v1.3
- Chat template: chat_templates/vicuna.jinja
- Generation config: generation_configs/vicuna.json
- Reference: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#prompt-template
Orca-2
- Models: microsoft/Orca-2-7b
- Chat template: chat_templates/chatml.jinja
- Generation config: generation_configs/orca-2.json
- Reference: https://huggingface.co/microsoft/Orca-2-7b
falcon-instruct
- Models: tiiuae/falcon-7b-instruct
- Chat template: chat_templates/falcon-instruct.jinja
- Reference: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#prompt-template
SOLAR-Instruct
- Models: upstage/SOLAR-10.7B-Instruct-v1.0
- Chat template: chat_templates/solar-instruct.jinja
- Generation config: generation_configs/solar-instruct.json
- Reference: https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0/blob/main/tokenizer_config.json#L31
Alpaca
- Models: tatsu-lab/alpaca-7b-wdiff
- Chat template: chat_templates/alpaca.jinja
- Generation config: generation_configs/alpaca.json
- Reference: https://github.com/tatsu-lab/stanford_alpaca
AmberChat
- Models: LLM360/AmberChat, LLM360/AmberSafe
- Chat template: chat_templates/amberchat.jinja
- Generation config: generation_configs/amberchat.json
- Reference: https://huggingface.co/LLM360/AmberChat
saiga
- Models: IlyaGusev/saiga_mistral_7b_lora
- Chat template: chat_templates/saiga.jinja
- Generation config: generation_configs/saiga.json
- Reference: https://huggingface.co/IlyaGusev/saiga_mistral_7b_lora#saigamistral-7b-russian-mistral-based-chatbot
- Comment: A series of Russian models
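When scripting over several models, entries from the list above can be kept in a small lookup table keyed by Hub model-ID prefix. The sketch below covers only a few families as an illustration; the table name, the prefix keys, and the longest-prefix rule are choices made for this sketch, while the file paths are copied from the list. Longest-prefix matching is what lets Phi-3-small resolve to its own files while Phi-3-mini falls back to the shared Phi-3 entry.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: Hub model-ID prefix -> (chat template, generation config).
# None marks a family for which the list gives no generation config.
TEMPLATE_TABLE: Dict[str, Tuple[str, Optional[str]]] = {
    "meta-llama/Meta-Llama-3.1": ("chat_templates/llama-3-instruct.jinja", "generation_configs/llama-3.1-instruct.json"),
    "meta-llama/Meta-Llama-3": ("chat_templates/llama-3-instruct.jinja", "generation_configs/llama-3-instruct.json"),
    "microsoft/Phi-3-small": ("chat_templates/phi-3-small.jinja", "generation_configs/phi-3-small.json"),
    "microsoft/Phi-3": ("chat_templates/phi-3.jinja", "generation_configs/phi-3.json"),
    "mistralai/Mistral": ("chat_templates/mistral-instruct.jinja", "generation_configs/mistral-instruct.json"),
    "mistralai/Mixtral": ("chat_templates/mistral-instruct.jinja", "generation_configs/mistral-instruct.json"),
    "Qwen/Qwen2": ("chat_templates/chatml.jinja", "generation_configs/qwen2-instruct.json"),
    "tiiuae/falcon": ("chat_templates/falcon-instruct.jinja", None),
}


def lookup_files(model_id: str) -> Tuple[str, Optional[str]]:
    """Return the (template, config) pair for the longest matching prefix."""
    matches = [prefix for prefix in TEMPLATE_TABLE if model_id.startswith(prefix)]
    if not matches:
        raise KeyError(f"no chat template registered for {model_id!r}")
    return TEMPLATE_TABLE[max(matches, key=len)]
```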