Eval bug: vulkan: regression: vram usage increased (original) (raw)

Name and Version

716bd6d
bisected

Operating systems

Linux

GGML backends

Vulkan

Hardware

amdgpu 8g

Models

Qwen2.5-Coder-14B-Instruct-Q4_K_M
or any model with similar size

Problem description & steps to reproduce

on c250ecb . the weight part of model can fit into vram. left only context/kv cache on gtt. memory usage is 8166m vram + 2271m gtt.

but on 716bd6d . memory usage is 6342m vram + 4107m gtt. significantly slowed down the tg speed.

First Bad Commit

716bd6d

Relevant log output

no difference on log output