Windows systems performance issue
Hello everyone,
We’ve encountered performance issues when using TensorRT with Triton on Windows systems. There’s an open issue on GitHub (Performance issues on Windows · Issue #8106 · triton-inference-server/server · GitHub), but it doesn’t seem very active, so we’re hoping to find others here who might be facing the same problem.
The issue is described in detail in the GitHub thread. In short, inference on Windows is, on average, twice as slow as on Linux when running on the same hardware. We’ve reviewed the source code but haven’t found any obvious bottlenecks or suspicious slowdowns.
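For anyone who wants to reproduce the comparison on their own hardware, Triton ships a dedicated benchmarking tool, perf_analyzer, which is the rigorous option. As a quick sanity check, a minimal client-side timing harness like the sketch below can also be used on both OSes; the workload here is a dummy placeholder, and the commented Triton client call (including the model name) is only an illustrative assumption, not code from the thread.

```python
import statistics
import time

def measure_latency(infer_fn, warmup=10, iterations=100):
    """Time a single-request inference callable; return latency stats in ms."""
    # Warm up so one-time costs (lazy init, caches) don't skew the numbers.
    for _ in range(warmup):
        infer_fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": statistics.quantiles(samples, n=100)[98],
    }

# Dummy CPU workload stands in for the real request. In practice you would
# wrap a Triton client call instead, e.g. (hypothetical model name):
#   client.infer("my_model", inputs)   # tritonclient.http or .grpc client
stats = measure_latency(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

Running the same harness against the same model on Windows and Linux gives directly comparable mean/p50/p99 numbers and helps rule out client-side measurement differences.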
Hi @ESlava ,
Thank you for highlighting this.
I will check with the respective team on the progress and put a request forward for an update on the thread.
Thanks