feat: support StarCoder model architectures by wsxiaoys · Pull Request #3187 · ggml-org/llama.cpp
added 15 commits
wsxiaoys marked this pull request as ready for review
feat: support starcoder mqa
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
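The `feat: support starcoder mqa` commit adds multi-query attention (MQA), where every query head shares a single K/V head. The commit list below reflects this in its `set head_count_kv = 1` step. The following back-of-the-envelope sketch shows why this matters for memory. It assumes a KV cache that stores one K vector and one V vector of `n_head_kv * head_dim` FP16 elements per layer per token, and ignores allocator overhead. The function name and model sizes are hypothetical and are not taken from llama.cpp or from a real StarCoder config.

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_head_kv: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Rough K+V cache size (hypothetical helper).

    The factor of 2 counts K and V. Each layer stores n_head_kv * head_dim
    elements of each per cached token; bytes_per_elem=2 assumes FP16.
    """
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem


# Hypothetical model: 24 layers, 16 query heads of size 128, 2048-token context.
n_layer, n_ctx, n_head, head_dim = 24, 2048, 16, 128
mqa = kv_cache_bytes(n_layer, n_ctx, n_head_kv=1, head_dim=head_dim)       # one shared K/V head
mha = kv_cache_bytes(n_layer, n_ctx, n_head_kv=n_head, head_dim=head_dim)  # one K/V head per query head
print(f"MQA: {mqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB")  # MQA: 24 MiB, MHA: 384 MiB
```

Under these assumptions, the cache shrinks by a factor of `n_head` when K/V are kept as a single head instead of being expanded per query head.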
pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request
add placeholder of starcoder in gguf / llama.cpp
support convert starcoder weights to gguf
convert MQA to MHA
fix ffn_down name
add LLM_ARCH_STARCODER to llama.cpp
set head_count_kv = 1
load starcoder weight
add max_position_embeddings
set n_positions to max_position_embeddings
properly load all starcoder params
fix head count kv
fix comments
fix vram calculation for starcoder
store mqa directly
add input embeddings handling
add TBD
working in cpu, metal buggy
cleanup useless code
metal : fix out-of-bounds access in soft_max kernels
llama : make starcoder graph build more consistent with others
refactor: cleanup comments a bit
add other starcoder models: 3B, 7B, 15B
support-mqa-directly
fix: remove max_position_embeddings, use n_train_ctx
Update llama.cpp
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
- Update llama.cpp
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
- Apply suggestions from code review
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
- fix: switch to space from tab
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
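Two commits in this list, `convert MQA to MHA` and the later `store mqa directly`, describe alternative ways of handling the shared K/V head. The sketch below illustrates the first approach under stated assumptions. It treats the fused attention weight as a list of rows laid out as all query rows (`n_head * head_dim`), then `head_dim` key rows, then `head_dim` value rows. That layout and the helper names are assumptions for illustration, not the PR's converter code. Expanding to MHA tiles the single K and V block once per query head. This shows why storing MQA directly keeps the tensors, and the KV cache, smaller.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_fused_qkv(w: Matrix, n_head: int, head_dim: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Split an (assumed) fused MQA weight into Q, K and V row blocks."""
    n_embd = n_head * head_dim
    if len(w) != n_embd + 2 * head_dim:
        raise ValueError("unexpected fused QKV row count for MQA layout")
    q = w[:n_embd]                         # n_head * head_dim query rows
    k = w[n_embd:n_embd + head_dim]        # one shared key head
    v = w[n_embd + head_dim:]              # one shared value head
    return q, k, v


def mqa_to_mha(w: Matrix, n_head: int, head_dim: int) -> Matrix:
    """Repeat the single K and V head n_head times so each query head gets its own copy."""
    q, k, v = split_fused_qkv(w, n_head, head_dim)
    # Copy rows so the tiled heads do not alias the same list objects.
    k_rep = [list(row) for _ in range(n_head) for row in k]
    v_rep = [list(row) for _ in range(n_head) for row in v]
    return [list(row) for row in q] + k_rep + v_rep


# Hypothetical toy sizes: 2 query heads with head_dim 1 and 3 input features.
w = [[1, 0, 0], [0, 1, 0],   # Q rows (2 heads x head_dim 1)
     [5, 5, 5],              # K row (shared head)
     [7, 7, 7]]              # V row (shared head)
mha = mqa_to_mha(w, n_head=2, head_dim=1)
print(len(mha))  # 6 rows: 2 Q + 2 K + 2 V
```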