KTransformers Roadmap (2026 Q2)
Focus
- Fine-tuning: Make LoRA SFT run on the exact same hardware as inference — if you can run it, you can fine-tune it.
- Consumer Hardware: Deliver the best MoE inference performance on consumer-grade x86 + NVIDIA GPU.
- Windows: Native Windows support that retains the full heterogeneous (CPU + GPU) performance advantage.
- Performance: VNNI instruction set optimization and AI SSD exploration.
Fine-tuning Service @JimmyPeilinLi
Goal: Lower the barrier for LoRA SFT to match inference — same hardware, same setup, zero extra cost.
- Release fine-tuning service to main branch (basic version first, iterate later).
- Unify inference and SFT paths so both run on one set of hardware with shared model support, weight handling, and config.
- Support transformers v4 and v5; minimum 12 GB VRAM for a 67B model. ✅
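To make the "same hardware, same setup" goal concrete, the sketch below shows what a user-level LoRA SFT setup could look like on an offloaded CPU + GPU machine. It is written against Hugging Face transformers and peft rather than the KTransformers API itself; the model name, target modules, and hyperparameters are illustrative placeholders, not project defaults.

```python
# Minimal LoRA SFT sketch (illustrative; not the KTransformers API).
# Assumes Hugging Face transformers + peft; the checkpoint name and
# hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "some-org/some-moe-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spill layers to CPU when VRAM is tight
)

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapters on attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # LoRA trains only a tiny fraction of the weights
```

The roadmap item is essentially about making a flow like this run unchanged on the same offloaded setup that already serves inference, instead of requiring a dedicated full-GPU training machine.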
Consumer Hardware Support
Goal: Best-in-class MoE inference performance on consumer hardware.
- Windows native C++ inference support — investigate JIT compiler Windows compatibility.
- Evaluate migrating the S1 inference backend to Windows as an alternative path.
- Complete VNNI instruction set optimization.
- Investigate the AI SSD performance bottleneck (slow mmap reads from disk); see the measurement sketch after this list.
- Promote AVX2 to first-class supported tier.
- Improve support quality and stability for RTX 30/40/50 series.
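A first step in that SSD investigation is separating raw mmap page-in throughput from plain buffered reads on the same weight file. The micro-benchmark below is a minimal sketch of that measurement; the file name and chunk size are placeholders, and the OS page cache should be dropped between runs so the numbers reflect the SSD rather than RAM.

```python
# Rough micro-benchmark comparing mmap vs. buffered reads of a large weight
# file, to quantify the "slow mmap reads from disk" observation.
# Illustrative only; path and chunk size are placeholders.
import mmap
import os
import time

def mmap_read_gbps(path: str, chunk: int = 1 << 20) -> float:
    """Touch every page of the file through mmap and report GB/s."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        read = 0
        for off in range(0, size, chunk):
            read += len(mm[off:off + chunk])  # slicing forces page-ins from disk
    return read / (time.perf_counter() - start) / 1e9

def buffered_read_gbps(path: str, chunk: int = 1 << 20) -> float:
    """Plain sequential read() of the same file for comparison."""
    start = time.perf_counter()
    read = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            read += len(buf)
    return read / (time.perf_counter() - start) / 1e9

if __name__ == "__main__":
    weights = "model-00001-of-000xx.safetensors"  # placeholder shard name
    # Run with a cold page cache; otherwise both paths measure RAM, not the SSD.
    print(f"mmap:     {mmap_read_gbps(weights):.2f} GB/s")
    print(f"buffered: {buffered_read_gbps(weights):.2f} GB/s")
```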
Performance
- Optimize decode and prefill paths. @ouqingliang
- Optimize expert offloading and scheduling strategies. @ovowei
- Benchmark on representative AVX2 + RTX 30/40/50 configurations. @yyj6666667
- Focus on reducing heterogeneous performance overhead in the Windows path.
- Support nvfp4 and mxfp4 quantization formats (carried over from Q1).
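For context on the nvfp4 / mxfp4 item above, the sketch below walks through MXFP4-style block quantization following the OCP microscaling layout: 32-element blocks sharing a power-of-two scale, with FP4 (E2M1) elements. It is illustrative reference logic under those assumptions, not the planned kernel; nvfp4 uses a smaller block with a differently encoded scale.

```python
# Rough sketch of MXFP4-style block quantization: 32-element blocks, a shared
# power-of-two scale, FP4 E2M1 elements. Reference logic only, not a kernel.
import numpy as np

FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # positive magnitudes

def quantize_mxfp4_block(block: np.ndarray) -> tuple[int, np.ndarray]:
    """Quantize a 32-element float block to (shared exponent, fp4 values)."""
    assert block.size == 32
    amax = np.abs(block).max()
    # Shared scale is a power of two chosen so amax maps near the top code (6.0).
    exp = 0 if amax == 0 else int(np.ceil(np.log2(amax / FP4_E2M1[-1])))
    scale = 2.0 ** exp
    scaled = block / scale
    # Round each element to the nearest representable E2M1 magnitude, keeping its sign.
    idx = np.abs(scaled[:, None] - np.sign(scaled)[:, None] * FP4_E2M1).argmin(axis=1)
    quantized = np.sign(scaled) * FP4_E2M1[idx]
    return exp, quantized

def dequantize_mxfp4_block(exp: int, quantized: np.ndarray) -> np.ndarray:
    return quantized * (2.0 ** exp)

# Example: round-trip a random block and inspect the quantization error.
rng = np.random.default_rng(0)
block = rng.standard_normal(32).astype(np.float32)
exp, q = quantize_mxfp4_block(block)
print("max abs error:", np.abs(block - dequantize_mxfp4_block(exp, q)).max())
```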
Coverage
- Fast support for important frontier MoE models.
- Improve support quality and completeness for major model families.
Contribution / Maintenance
- Maintain CI/CD pipelines.
- Provide clearer documentation.
- Establish recommended configurations for typical consumer hardware setups.
- Continue NUMA-aware optimization and CPU-GPU coordination improvements.
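As an illustration of what NUMA awareness means at the process level, the sketch below pins the current process to the CPUs of a single NUMA node (Linux only) so CPU-resident expert weights stay in node-local memory. It is a usage illustration, not KTransformers' internal scheduler; the node index is a placeholder.

```python
# Illustrative NUMA pinning sketch (Linux-only): bind this process's CPU worker
# threads to the cores of one NUMA node so expert weights stay in local memory.
import os

def cpus_of_numa_node(node: int) -> set[int]:
    """Parse the sysfs cpulist (e.g. '0-15,32-47') into a set of CPU ids."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        spec = f.read().strip()
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

# Pin this process (pid 0 = self) to NUMA node 0 before spawning CPU workers.
os.sched_setaffinity(0, cpus_of_numa_node(0))
```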
Contributions are welcome. Please email yzwliam@126.com or ervinxie@qq.com if you want to join the development WeChat group.