Releases · ggml-org/llama.cpp
b5648
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
b5646
server : re-enable SWA speculative decoding (#14131)
ggml-ci
b5645
context : simplify output counting logic during decode (#14142)
- batch : remove logits_all flag
ggml-ci
- context : simplify output counting logic during decode
ggml-ci
- cont : fix comments
b5644
batch : remove logits_all flag (#14141)
ggml-ci
b5642
kv-cache : fix split_equal handling in unified implementation (#14130)
ggml-ci
b5641
context : round n_tokens to next multiple of n_seqs when reserving (#…
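The rounding named in this title can be illustrated with plain integer arithmetic. The helper below is a minimal sketch, not the code from the PR: the function name `round_up_to_multiple` is hypothetical, and only the `n_tokens` and `n_seqs` names come from the title.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the llama.cpp function): round n_tokens up to the
// next multiple of n_seqs so the reserved batch splits evenly across sequences.
inline uint32_t round_up_to_multiple(uint32_t n_tokens, uint32_t n_seqs) {
    assert(n_seqs > 0); // a zero divisor would be a caller bug
    return ((n_tokens + n_seqs - 1) / n_seqs) * n_seqs;
}
```

For example, 10 tokens with 4 sequences rounds up to 12, which gives 3 tokens per sequence. A value that is already a multiple of `n_seqs` is returned unchanged.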
b5640
common: fix issue with regex_escape routine on windows (#14133)
b5639
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
- ggml-cpu: Factor out feature detection build from x86
- ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> definitions, which need to be set in cmake, instead of the GGML_<FEAT> options that users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.
- ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
Like x86; however, to pass arch flags around within cmake, we use GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.
- ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch, which restores the previous behavior.
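The idea of building several variants per arch version and letting a scoring function pick the usable one can be sketched as follows. This is an illustration under stated assumptions, not the ggml implementation: the feature bits, the `cpu_variant` struct, and both functions are hypothetical, and the score here is a simple feature count rather than whatever weighting the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; the real detection code and names differ.
enum cpu_feature : uint32_t {
    FEAT_DOTPROD = 1u << 0,
    FEAT_FP16    = 1u << 1,
    FEAT_SVE     = 1u << 2,
    FEAT_I8MM    = 1u << 3,
};

// Hypothetical description of one compiled backend variant.
struct cpu_variant {
    std::string name;     // e.g. "armv8.2_1"
    uint32_t    required; // features the variant was compiled to use
};

// A score of 0 means the variant cannot run on this host. Otherwise the score
// grows with the number of features the variant uses (assumed weighting).
inline int score_variant(const cpu_variant & v, uint32_t host_features) {
    if ((v.required & ~host_features) != 0) {
        return 0; // needs a feature the host lacks
    }
    int score = 1;
    for (uint32_t bits = v.required; bits != 0; bits &= bits - 1) {
        score++; // one point per required feature bit
    }
    return score;
}

// Pick the highest-scoring runnable variant, or nullptr if none can run.
inline const cpu_variant * pick_variant(const std::vector<cpu_variant> & variants,
                                        uint32_t host_features) {
    const cpu_variant * best = nullptr;
    int best_score = 0;
    for (const auto & v : variants) {
        const int s = score_variant(v, host_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    assert(best == nullptr || best_score > 0);
    return best;
}
```

In this sketch, a host that reports only the dot-product bit would skip a variant that also requires FP16 and fall back to the less demanding variant of the same arch version, which is why optional features lead to more than one build per version.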
b5638
chore : clean up relative source dir paths (#14128)
b5637
tests : add test-tokenizers-repo (#14017)