ZeRO-Inference
September 12, 2023
title: “ZeRO-Inference: 20X faster inference through weight quantization and KV cache offloading”
excerpt: “”
link: https://github.com/deepspeedai/DeepSpeedE...