KTransformers Roadmap (2026 Q2)
Focus
- Fine-tuning: Make LoRA SFT run on the exact same hardware as inference — if you can run it, you can fine-tune it.
- Consumer Hardware: Deliver the best MoE inference performance on consumer-grade x86 + NVIDIA GPU.
- Windows: Native Windows support that retains the full heterogeneous (CPU + GPU) performance advantage.
- Performance: VNNI instruction set optimization and AI SSD exploration.
Fine-tuning Service @JimmyPeilinLi
Goal: Lower the barrier for LoRA SFT to match inference — same hardware, same setup, zero extra cost.
- Release fine-tuning service to main branch (basic version first, iterate later).
- Unify inference and SFT paths so both run on one set of hardware with shared model support, weight handling, and config.
- Support transformers v4 and v5; minimum 12 GB VRAM for a 67B model. ✅
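To make the "same hardware, same setup" goal concrete, the sketch below shows what a user-level LoRA SFT setup could look like on an offloaded CPU + GPU machine. It is written against Hugging Face transformers and peft rather than the KTransformers API itself; the model name, target modules, and hyperparameters are illustrative placeholders, not project defaults.

```python
# Minimal LoRA SFT sketch (illustrative; not the KTransformers API).
# Assumes Hugging Face transformers + peft; the checkpoint name and
# hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "some-org/some-moe-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spill layers to CPU when VRAM is tight
)

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapters on attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # LoRA trains only a tiny fraction of the weights
```

The roadmap item is essentially about making a flow like this run unchanged on the same offloaded setup that already serves inference, instead of requiring a dedicated full-GPU training machine.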
Consumer Hardware Support
Goal: Best-in-class MoE inference performance on consumer hardware.
- Windows native C++ inference support — investigate JIT compiler Windows compatibility.
- Evaluate migrating the S1 inference backend to Windows as an alternative path.
- Complete VNNI instruction set optimization.
- Investigate the AI SSD performance bottleneck (slow mmap reads from disk); see the measurement sketch after this list.
- Promote AVX2 to first-class supported tier.
- Improve support quality and stability for RTX 30/40/50 series.
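A first step in that SSD investigation is separating raw mmap page-in throughput from plain buffered reads on the same weight file. The micro-benchmark below is a minimal sketch of that measurement; the file name and chunk size are placeholders, and the OS page cache should be dropped between runs so the numbers reflect the SSD rather than RAM.

```python
# Rough micro-benchmark comparing mmap vs. buffered reads of a large weight
# file, to quantify the "slow mmap reads from disk" observation.
# Illustrative only; path and chunk size are placeholders.
import mmap
import os
import time

def mmap_read_gbps(path: str, chunk: int = 1 << 20) -> float:
    """Touch every page of the file through mmap and report GB/s."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        read = 0
        for off in range(0, size, chunk):
            read += len(mm[off:off + chunk])  # slicing forces page-ins from disk
    return read / (time.perf_counter() - start) / 1e9

def buffered_read_gbps(path: str, chunk: int = 1 << 20) -> float:
    """Plain sequential read() of the same file for comparison."""
    start = time.perf_counter()
    read = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            read += len(buf)
    return read / (time.perf_counter() - start) / 1e9

if __name__ == "__main__":
    weights = "model-00001-of-000xx.safetensors"  # placeholder shard name
    # Run with a cold page cache; otherwise both paths measure RAM, not the SSD.
    print(f"mmap:     {mmap_read_gbps(weights):.2f} GB/s")
    print(f"buffered: {buffered_read_gbps(weights):.2f} GB/s")
```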
Performance
- Optimize decode and prefill paths. @ouqingliang
- Optimize expert offloading and scheduling strategies. @ovowei
- Benchmark on representative AVX2 + RTX 30/40/50 configurations. @yyj6666667
- Focus on reducing heterogeneous performance overhead in the Windows path.
- Support nvfp4 and mxfp4 quantization formats (carried over from Q1).
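For context on the nvfp4 / mxfp4 item above, the sketch below walks through MXFP4-style block quantization following the OCP microscaling layout: 32-element blocks sharing a power-of-two scale, with FP4 (E2M1) elements. It is illustrative reference logic under those assumptions, not the planned kernel; nvfp4 uses a smaller block with a differently encoded scale.

```python
# Rough sketch of MXFP4-style block quantization: 32-element blocks, a shared
# power-of-two scale, FP4 E2M1 elements. Reference logic only, not a kernel.
import numpy as np

FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # positive magnitudes

def quantize_mxfp4_block(block: np.ndarray) -> tuple[int, np.ndarray]:
    """Quantize a 32-element float block to (shared exponent, fp4 values)."""
    assert block.size == 32
    amax = np.abs(block).max()
    # Shared scale is a power of two chosen so amax maps near the top code (6.0).
    exp = 0 if amax == 0 else int(np.ceil(np.log2(amax / FP4_E2M1[-1])))
    scale = 2.0 ** exp
    scaled = block / scale
    # Round each element to the nearest representable E2M1 magnitude, keeping its sign.
    idx = np.abs(scaled[:, None] - np.sign(scaled)[:, None] * FP4_E2M1).argmin(axis=1)
    quantized = np.sign(scaled) * FP4_E2M1[idx]
    return exp, quantized

def dequantize_mxfp4_block(exp: int, quantized: np.ndarray) -> np.ndarray:
    return quantized * (2.0 ** exp)

# Example: round-trip a random block and inspect the quantization error.
rng = np.random.default_rng(0)
block = rng.standard_normal(32).astype(np.float32)
exp, q = quantize_mxfp4_block(block)
print("max abs error:", np.abs(block - dequantize_mxfp4_block(exp, q)).max())
```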
Coverage
- Fast support for important frontier MoE models.
- Improve support quality and completeness for major model families.
Contribution / Maintenance
- Maintain CI/CD pipelines.
- Provide clearer documentation.
- Establish recommended configurations for typical consumer hardware setups.
- Continue NUMA-aware optimization and CPU-GPU coordination improvements.
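As an illustration of what NUMA awareness means at the process level, the sketch below pins the current process to the CPUs of a single NUMA node (Linux only) so CPU-resident expert weights stay in node-local memory. It is a usage illustration, not KTransformers' internal scheduler; the node index is a placeholder.

```python
# Illustrative NUMA pinning sketch (Linux-only): bind this process's CPU worker
# threads to the cores of one NUMA node so expert weights stay in local memory.
import os

def cpus_of_numa_node(node: int) -> set[int]:
    """Parse the sysfs cpulist (e.g. '0-15,32-47') into a set of CPU ids."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        spec = f.read().strip()
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

# Pin this process (pid 0 = self) to NUMA node 0 before spawning CPU workers.
os.sched_setaffinity(0, cpus_of_numa_node(0))
```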
Contributions are welcome. Please email yzwliam@126.com or ervinxie@qq.com if you want to join the development WeChat group.