feat: support StarCoder model architectures by wsxiaoys · Pull Request #3187 · ggml-org/llama.cpp

added 15 commits (September 15, 2023 10:39)

wsxiaoys marked this pull request as ready for review (September 15, 2023 16:11)

feat: support starcoder mqa

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request

add placeholder of starcoder in gguf / llama.cpp
support convert starcoder weights to gguf
convert MQA to MHA
fix ffn_down name
add LLM_ARCH_STARCODER to llama.cpp
set head_count_kv = 1
load starcoder weight
add max_position_embeddings
set n_positions to max_position_embeddings
properly load all starcoder params
fix head count kv
fix comments
fix vram calculation for starcoder
store mqa directly
add input embeddings handling
add TBD
working in cpu, metal buggy
cleanup useless code
metal : fix out-of-bounds access in soft_max kernels
llama : make starcoder graph build more consistent with others
refactor: cleanup comments a bit
add other starcoder models: 3B, 7B, 15B
support-mqa-directly
fix: remove max_position_embeddings, use n_train_ctx
Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

fix: switch to space from tab

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
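
Note on the `convert MQA to MHA` and `store mqa directly` entries above: the commit list suggests that the conversion script first expanded StarCoder's multi-query attention (one shared key/value head) into ordinary multi-head weights, and that later commits dropped this in favour of storing the MQA weights as-is with `head_count_kv = 1`. The sketch below only illustrates what such an expansion could look like. It is not the code from this PR: the function name `expand_mqa_to_mha` and the assumed row layout (all query rows, then one key head, then one value head) are assumptions made for illustration.

```python
def expand_mqa_to_mha(fused_rows, n_embd, n_head):
    """Expand a fused MQA projection into an MHA-shaped one.

    Assumed (hypothetical) layout of fused_rows, one list per output row:
      n_embd query rows, then head_dim key rows, then head_dim value rows.
    Returns 3 * n_embd rows: queries, then K repeated per head, then V repeated per head.
    """
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    head_dim = n_embd // n_head
    if len(fused_rows) != n_embd + 2 * head_dim:
        raise ValueError("unexpected number of rows for an MQA projection")

    q = fused_rows[:n_embd]
    k = fused_rows[n_embd:n_embd + head_dim]
    v = fused_rows[n_embd + head_dim:]

    # Copy the single shared K head and V head once per query head.
    k_expanded = [list(row) for _ in range(n_head) for row in k]
    v_expanded = [list(row) for _ in range(n_head) for row in v]
    return [list(row) for row in q] + k_expanded + v_expanded


if __name__ == "__main__":
    # Toy sizes: n_embd = 4, n_head = 2, so head_dim = 2 and the MQA input has 8 rows.
    toy = [[float(i)] * 4 for i in range(8)]
    expanded = expand_mqa_to_mha(toy, n_embd=4, n_head=2)
    print(len(expanded))  # 12 rows, i.e. 3 * n_embd
```

The expanded tensor carries `n_head` identical copies of the key and value heads, which is presumably why storing the single head directly and setting the KV head count to 1 is the more compact option.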

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request


phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request


ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request


AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request


