[Core] Enable multiple profiler consumers and add a timeline/tracing profiler by froce · Pull Request #1788 · stride3d/stride
I am basically done with my first pass. If the general approach is okay to merge I will add more tests and fix the comments/docs. Everything else can probably be addressed during review or in follow-up PRs.
The remaining problems are mainly around GPU events, but none of it should affect existing functionality.
Currently I map all GPU events to a single thread id, which is certainly not correct, but I also don't see an easy fix for now.
Syncing CPU and GPU timestamps also isn't working correctly. I need to investigate whether it's just my logic that's wrong or something else.
Then there's this: total time is 16 ms, but threadpool work takes 123 ms combined ;) Should I add sorting by avg time?
To get variants of the same profiling key, attributes should be used. For things like the GC counts, we may need a counter primitive instead, as you mentioned.
Do you have an example? I still don't think I get it.
ProfilingKeys are particularly awkward for me when profiling around polymorphism, but I don't have a suggestion for how to improve it:

    // Before:
    using (Profiler.Begin($"{child.GetType().Name}.Draw"))
    {
        child.Draw(drawContext);
    } // Good. Every subclass gets its own key.

    // Now:
    using (Profiler.Begin(DrawChildKey, $"{child.GetType().Name}.Draw"))
    {
        child.Draw(drawContext);
    } // Sadness.

Doing it properly now requires some Dictionary<Type, ProfilingKey>, which seems like a lot of effort for something that should be quick to add and remove again. I guess I'll do it for these base in-engine cases where it makes sense, but it's not ideal.
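For reference, the `Dictionary<Type, ProfilingKey>` approach could look roughly like the sketch below. It is only an illustration: the `ProfilingKey` class here is a simplified stand-in so the snippet compiles on its own (the real Stride type has more members), and `PerTypeProfilingKeys` is a hypothetical helper name, not something in the engine or in this PR. It uses a `ConcurrentDictionary` on the assumption that draw calls may come from several threads.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-in for Stride's ProfilingKey (assumption, not the real class).
public sealed class ProfilingKey
{
    public string Name { get; }

    public ProfilingKey(string name)
    {
        Name = name;
    }

    public ProfilingKey(ProfilingKey parent, string name)
    {
        Name = $"{parent.Name}.{name}";
    }
}

// Hypothetical helper: lazily creates one child key per runtime type.
public sealed class PerTypeProfilingKeys
{
    private readonly ConcurrentDictionary<Type, ProfilingKey> keys =
        new ConcurrentDictionary<Type, ProfilingKey>();
    private readonly ProfilingKey parent;
    private readonly string suffix;

    public PerTypeProfilingKeys(ProfilingKey parent, string suffix)
    {
        this.parent = parent;
        this.suffix = suffix;
    }

    public ProfilingKey Get(Type type)
    {
        // Fast path: no allocation once the key for this type exists.
        if (keys.TryGetValue(type, out var existing))
            return existing;

        // Slow path: build the key once; GetOrAdd keeps the first one stored
        // if two threads race on the same type.
        return keys.GetOrAdd(type, new ProfilingKey(parent, $"{type.Name}.{suffix}"));
    }
}
```

With something like this, the call site would become `using (Profiler.Begin(drawKeys.Get(child.GetType()))) { child.Draw(drawContext); }`, where `drawKeys` is a `PerTypeProfilingKeys` created once with `DrawChildKey` as parent. It avoids the string interpolation per call, but it is still more ceremony than the old one-liner.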
I think we might be able to separate the message logging from the profiler. There isn't really a good reason for it to be tied together (except being enabled when profiling is enabled) and it will simplify things a bit.
It doesn't have to be tied together, but having an easy way to log some specific (e.g. asset loading time) profiling events to a file/console is nice. I'd just like to do it without bloating all other profiling in the process.
First look at performance:
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.3448)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.7.23376.3
  [Host]     : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
  Job-MBWYOB : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
  Job-YPSZPM : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
Method | Job | NuGetReferences | NUM_ENTITIES | WithProfiling | Mean | Error | StdDev | Median | Allocated |
---|---|---|---|---|---|---|---|---|---|
Run | Job-MBWYOB | Stride.Core 4.1.1.1 | 2000 | False | 6.412 ms | 0.1267 ms | 0.2645 ms | 6.387 ms | 12.08 KB |
Run | Job-YPSZPM | Stride.Core 4.1.2.1 | 2000 | False | 6.348 ms | 0.1249 ms | 0.1870 ms | 6.262 ms | 12.08 KB |
Run | Job-MBWYOB | Stride.Core 4.1.1.1 | 2000 | True | 18.580 ms | 0.3451 ms | 0.3059 ms | 18.606 ms | 12.57 KB |
Run | Job-YPSZPM | Stride.Core 4.1.2.1 | 2000 | True | 7.184 ms | 0.1436 ms | 0.2766 ms | 7.052 ms | 12.39 KB |
4.1.1.1 is master with the added profiling keys, 4.1.2.1 is this PR, so we can now add profiling to work happening on the threadpool without trashing performance (on desktop at least...).
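To put the table in terms of profiling overhead, subtract the `WithProfiling = False` mean from the `WithProfiling = True` mean for each version. This is just arithmetic on the mean values above and ignores the reported error margins, so treat the results as rough:

```latex
\begin{align*}
\text{overhead}_{4.1.1.1} &= 18.580\,\text{ms} - 6.412\,\text{ms} = 12.168\,\text{ms} \quad (\approx 190\%\ \text{of the unprofiled run}) \\
\text{overhead}_{4.1.2.1} &= 7.184\,\text{ms} - 6.348\,\text{ms} = 0.836\,\text{ms} \quad (\approx 13\%\ \text{of the unprofiled run}) \\
\frac{\text{overhead}_{4.1.1.1}}{\text{overhead}_{4.1.2.1}} &= \frac{12.168}{0.836} \approx 14.6
\end{align*}
```

So for this benchmark with 2000 entities, the added cost of enabling profiling drops by roughly a factor of 15.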