fix: use injected now() instead of time.Now() in summary methods by imorph · Pull Request #1672 · prometheus/client_golang

=== RUN   TestSummaryDecay
--- PASS: TestSummaryDecay (0.00s)
PASS

But on CI it (sometimes) fails:

--- FAIL: TestSummaryDecay (1.17s)
    summary_test.go:396: 170. got 102.000000, want 80.000000
    summary_test.go:396: 180. got 112.000000, want 90.000000

It looks like it takes a very performance-constrained environment to reveal the problem. Let's recreate one.

Run something like stress -c 14 and, while it is running, execute go test -run TestSummaryDecay again:

=== RUN   TestSummaryDecay
    summary_test.go:396: 220. got 166.000000, want 130.000000
    summary_test.go:396: 230. got 181.000000, want 140.000000
    summary_test.go:396: 240. got 197.000000, want 150.000000
    summary_test.go:396: 250. got 205.000000, want 160.000000
    summary_test.go:396: 260. got 221.000000, want 170.000000
    summary_test.go:396: 270. got 226.000000, want 180.000000
    summary_test.go:396: 280. got 236.000000, want 190.000000
    summary_test.go:396: 290. got 243.000000, want 200.000000
    summary_test.go:396: 300. got 249.000000, want 210.000000
    summary_test.go:396: 310. got 256.000000, want 220.000000
    summary_test.go:396: 320. got 264.000000, want 230.000000
    summary_test.go:396: 330. got 268.000000, want 240.000000
    summary_test.go:396: 340. got 273.000000, want 250.000000
    summary_test.go:396: 820. got 752.000000, want 730.000000
    summary_test.go:396: 830. got 762.000000, want 740.000000
    summary_test.go:396: 840. got 778.000000, want 750.000000
    summary_test.go:396: 850. got 793.000000, want 760.000000
    summary_test.go:396: 860. got 801.000000, want 770.000000
    summary_test.go:396: 870. got 810.000000, want 780.000000
    summary_test.go:396: 880. got 819.000000, want 790.000000
    summary_test.go:396: 890. got 827.000000, want 800.000000
    summary_test.go:396: 900. got 834.000000, want 810.000000
--- FAIL: TestSummaryDecay (1.28s)
FAIL

The test fails because it depends on real-time progression and the Summary implementation calls time.Now() directly. Changing the Summary implementation to use the injected now function from SummaryOpts, and adjusting the test to control time progression itself, eliminates the flakiness and makes the test deterministic.
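To illustrate the general pattern (not the actual Summary code), here is a minimal sketch of clock injection. The windowCounter type, the fakeClock helper, and the demo function are hypothetical names invented for this example; the counter is a much simpler stand-in for the Summary's age buckets. The point is only that every time lookup goes through the injected function, so a test can advance time explicitly instead of sleeping.

```go
package main

import (
	"fmt"
	"time"
)

// windowCounter is a hypothetical, simplified stand-in for a decaying
// metric: it forgets its observations once maxAge has elapsed.
type windowCounter struct {
	now     func() time.Time // injected clock; defaults to time.Now
	maxAge  time.Duration
	expires time.Time
	count   int
}

func newWindowCounter(maxAge time.Duration, now func() time.Time) *windowCounter {
	if now == nil {
		now = time.Now
	}
	return &windowCounter{now: now, maxAge: maxAge, expires: now().Add(maxAge)}
}

// rotate resets the counter if the window has expired. It consults only
// the injected clock and never calls time.Now() directly.
func (c *windowCounter) rotate() {
	if t := c.now(); !t.Before(c.expires) {
		c.count = 0
		c.expires = t.Add(c.maxAge)
	}
}

func (c *windowCounter) Observe() {
	c.rotate()
	c.count++
}

func (c *windowCounter) Count() int {
	c.rotate()
	return c.count
}

// fakeClock is a test helper whose time moves only when Advance is called.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

// demo records three observations, then advances the fake clock past the
// window and reads the count again.
func demo() (before, after int) {
	clock := &fakeClock{t: time.Unix(0, 0)}
	c := newWindowCounter(10*time.Second, clock.Now)
	for i := 0; i < 3; i++ {
		c.Observe()
	}
	before = c.Count()
	clock.Advance(10 * time.Second)
	after = c.Count()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before, after) // prints "3 0"
}
```

Because the fake clock advances only on request, the result does not depend on how long the loop takes to run, which is the property the flaky test was missing under CPU contention.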

The test is now stable and deterministic even when all CPUs are busy.