Fix #2392: Dequeue all timed out messages from the backlog when not connected, even when no completion is needed, to be able to dequeue and complete other timed out messages. by kornelpal · Pull Request #2397 · StackExchange/StackExchange.Redis (original) (raw)
@NickCraver, I believe that I was able to figure out the root cause of your issue.
According to my tests when there is no active connection, all the client objects, including the connection, the backlog, and even the Task instance are freed by the CG. This will leave nothing to invoke the continuations that the async flow depends on.
Debugging it is very difficult, because debugging results in object lifetimes being extended either by the runtime or by the debugging aids holding on to Task or other object instances.
Somewhat ironically adding WithTimeout also fixes the issue because timers are rooted.
Running other tests likely helps reproducing it because of the added memory pressure.
Although I haven't tried creating a unit test for it, using a longer timeout, and a GC.Collect + GC.WaitForPendingFinalizers loop on a separate thread should help reproducing it more reliably.
Rooting the connection in the test should fix the issue:
Exception ex; var gcRoot = GCHandle.Alloc(conn); try { ex = await Assert.ThrowsAsync(() => db.PingAsync()); } finally { gcRoot.Free(); }
I don't think that this is affecting real world applications much, because the connection is likely rooted from a static field. Although I am not entirely positive that the same issue is not possible when there is an active connection, the native/unmanaged connection likely keeps these objects alive most of the time.
My recommended fix is rooting the PhysicalBridge instance using a GCHandle to itself when the backlog is non-empty, and possibly doing the same for the sent message queue too. When the message volume is low, creating and destroying a GCHandle should not be an issue and it will be kept allocated all the time when the volume is high.
As a conclusion I believe that this is a separate issue from both this PR and the changes your are working on.