Eval bug: vulkan: regression: vram usage increased (original) (raw)

716bd6d
bisected

Linux

Vulkan

amdgpu 8g

Qwen2.5-Coder-14B-Instruct-Q4_K_M
or any model with similar size

on c250ecb . the weight part of model can fit into vram. left only context/kv cache on gtt. memory usage is 8166m vram + 2271m gtt.

but on 716bd6d . memory usage is 6342m vram + 4107m gtt. significantly slowed down the tg speed.

no difference on log output