Releases · ggml-org/llama.cpp

b5648

sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)

b5646

server : re-enable SWA speculative decoding (#14131)

ggml-ci

b5645

context : simplify output counting logic during decode (#14142)

ggml-ci

b5644

batch : remove logits_all flag (#14141)

ggml-ci

b5642

kv-cache : fix split_equal handling in unified implementation (#14130)

ggml-ci

b5641

context : round n_tokens to next multiple of n_seqs when reserving (#…

b5640

common: fix issue with regex_escape routine on windows (#14133)

b5639

Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT>, which needs to be set in cmake, instead of the GGML_<FEAT> options that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.

As on x86, however, to pass arch flags around within cmake we use GGML_INTERNAL_<FEAT>, since we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.

The other platforms will need their own specific variants.

This also fixes a bug where the variant-building branch was always executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch, which restores the previous behavior.
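
The idea of building several variants and letting a scoring function choose among them can be sketched as below. This is a minimal illustration, not the ggml implementation: the `Feature` bits, the `Variant` struct, and the `score_variant` and `pick_variant` helpers are hypothetical names, and the scoring rule (0 if a required feature is missing, otherwise 1 plus the number of required features) is an assumption chosen for clarity.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; illustrative only, not ggml identifiers.
enum Feature : uint32_t {
    FEAT_DOTPROD = 1u << 0,
    FEAT_FP16    = 1u << 1,
    FEAT_SVE     = 1u << 2,
};

// One compiled backend variant and the CPU features it was built to use.
struct Variant {
    std::string name;
    uint32_t    required;
};

// Returns 0 if the CPU lacks any required feature (variant unusable).
// Otherwise returns 1 plus the number of required features, so a variant
// that exploits more of the CPU scores higher.
int score_variant(const Variant & v, uint32_t cpu_features) {
    if ((v.required & cpu_features) != v.required) {
        return 0;
    }
    int score = 1;
    for (uint32_t bits = v.required; bits != 0; bits &= bits - 1) {
        score++; // one increment per set bit
    }
    return score;
}

// Picks the highest-scoring usable variant, or nullptr if none can run.
const Variant * pick_variant(const std::vector<Variant> & variants, uint32_t cpu_features) {
    const Variant * best = nullptr;
    int best_score = 0;
    for (const auto & v : variants) {
        const int s = score_variant(v, cpu_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

With variants named after the pattern in the commit message (for example armv8.2_1 requiring only dot-product and armv8.2_2 additionally requiring FP16), a CPU reporting both features would select armv8.2_2, while a CPU with neither would fall back to a baseline variant that requires nothing.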

b5638

chore : clean up relative source dir paths (#14128)

b5637

tests : add test-tokenizers-repo (#14017)