Firestore: Fix spurious "Backend didn't respond within 10 seconds" errors when network just slow by dconeybe · Pull Request #8145 · firebase/firebase-js-sdk (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation19 Commits9 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@dconeybe

Firestore was erroneously concluding that the connection to the backend was broken when the response to the initial listen request was large (e.g. 1000 documents) and the network speed was slow.

This bug resulted in erroneous errors that looked like this:

Could not reach Cloud Firestore backend. Backend didn't respond within 10 seconds.
This typically indicates that your device does not have a healthy Internet connection at the moment. The client will operate in offline mode until it is able to successfully connect to the backend.

The fix is to propagate the first "OPEN" message from WebChannel to the online state detector so that it will consider the connection to be "alive" even though no proto messages have been received on the stream.

Googlers see b/325591749

@changeset-bot

@google-oss-bot

Size Report 1

Affected Products

Type Base (d1eae3a) Merge (e03fd3f) Diff
browser 377 kB 377 kB +275 B (+0.1%)
esm5 362 kB 362 kB +345 B (+0.1%)
main 580 kB 580 kB +464 B (+0.1%)
module 377 kB 377 kB +275 B (+0.1%)
react-native 377 kB 377 kB +275 B (+0.1%)
Type Base (d1eae3a) Merge (e03fd3f) Diff
firestore (CSI Auto Indexing Disable and Delete) 268 kB 269 kB +113 B (+0.0%)
firestore (CSI Auto Indexing Enable) 268 kB 269 kB +113 B (+0.0%)
firestore (Persistence) 303 kB 303 kB +172 B (+0.1%)
firestore (Query Cursors) 247 kB 241 kB -6.36 kB (-2.6%)
firestore (Query) 245 kB 238 kB -6.36 kB (-2.6%)
firestore (Read data once) 233 kB 226 kB -6.36 kB (-2.7%)
firestore (Read Write w Persistence) 323 kB 323 kB +172 B (+0.1%)
firestore (Realtime updates) 235 kB 229 kB -6.36 kB (-2.7%)
firestore (Transaction) 212 kB 205 kB -6.47 kB (-3.1%)
firestore (Write data) 212 kB 205 kB -6.39 kB (-3.0%)
Type Base (d1eae3a) Merge (e03fd3f) Diff
firebase-compat.js 786 kB 786 kB +165 B (+0.0%)
firebase-firestore-compat.js 342 kB 342 kB +165 B (+0.0%)
firebase-firestore.js 436 kB 436 kB +275 B (+0.1%)

Test Logs

  1. https://storage.googleapis.com/firebase-sdk-metric-reports/EdgLCb4PBr.html

@google-oss-bot

@thomasdao

@dconeybe thanks for working on this issue. I tested the PR, the query can load about 100 documents then found the below error

Error: FIRESTORE (10.6.0) INTERNAL ASSERTION FAILED: Unexpected state
    at fail (index.esm2017.js:194:1)
    at __PRIVATE_AsyncQueueImpl.Iu (index.esm2017.js:18538:1)
    at __PRIVATE_AsyncQueueImpl.enqueue (index.esm2017.js:18454:1)
    at __PRIVATE_AsyncQueueImpl.enqueueAndForget (index.esm2017.js:18439:1)
    at DelayedOperation.handleDelayElapsed (index.esm2017.js:15141:1)
    at index.esm2017.js:15123:1

@Valansch

Hi @dconeybe thank you for working on the fix.
Do you have a tip how I can build the sdk and import it as a dart package?

@dconeybe

thomasdao

Please try these steps:

  1. cd packages/firestore
  2. yarn build:debug
  3. cp -r dist ~/YOUR_PROJECT/node_modules/@firebase/firestore

The build:debug script will build dist/index.esm2017.js without all of the optimizations and code mangling. This will make your stack traces easier to interpret.

Now, please reproduce again and post an updated stack trace. Hopefully the stack trace from unmangled code will make more sense.

@dconeybe

Hi @dconeybe thank you for working on the fix. Do you have a tip how I can build the sdk and import it as a dart package?

@Valansch Unfortunately, no, I don't have any experience with dart or flutter. Would you be able to ask this question on the https://github.com/firebase/flutterfire repository?

@thomasdao

@dconeybe please see the error below

Error log

firebase.worker.js:50 Error: FIRESTORE (10.6.0) INTERNAL ASSERTION FAILED: AsyncQueue is already failed: Error: FIRESTORE (10.6.0) INTERNAL ASSERTION FAILED: `pendingResponses` is less than 0. Actual value: -1. This indicates that the SDK received more target acks from the server than expected. The SDK should not continue to operate.
  at fail (index.esm2017.js:213:1)
  at hardAssert (index.esm2017.js:223:1)
  at TargetState.recordTargetResponse (index.esm2017.js:8464:1)
  at index.esm2017.js:8523:1
  at Array.forEach (<anonymous>)
  at WatchChangeAggregator.forEachTarget (index.esm2017.js:8570:1)
  at WatchChangeAggregator.handleTargetChange (index.esm2017.js:8512:1)
  at onWatchStreamChange (index.esm2017.js:20721:1)
  at PersistentListenStream.onMessage (index.esm2017.js:20010:1)
  at index.esm2017.js:19945:1
  at fail (index.esm2017.js:213:1)
  at AsyncQueueImpl.verifyNotFailed (index.esm2017.js:25633:1)
  at AsyncQueueImpl.enqueue (index.esm2017.js:25539:1)
  at AsyncQueueImpl.enqueueAndForget (index.esm2017.js:25521:1)
  at DelayedOperation.handleDelayElapsed (index.esm2017.js:21168:1)
  at index.esm2017.js:21145:1

@dconeybe

@thomasdao Thanks for providing the updated stack trace. That error looks like the bug that you reported in #7652 has resurfaced. That is surprising because the bug you reported should have been fixed in v10.5.2 (and you are using v10.6.0).

To rule out this issue, could you try the following:

  1. Upgrade your dependency on firebase in package.json to the latest: "firebase": "^10.11.0"
  2. Delete your package-lock.json and node_modules directory.
  3. Run npm install in your application.
  4. Make sure that the firebase-js-sdk that you checked out from GitHub is at the latest commit in the master branch.

Thanks!

@thomasdao

@dconeybe I updated Firebase to 10.11.0, then copy the firestore dist folder, and the query still hangs with below error:

index.esm2017.js:78 [2024-04-16T15:19:20.510Z]  @firebase/firestore: Firestore (10.11.0): WebChannelConnection RPC 'Listen' stream 0x5a8c4d87 transport errored: 
Qd {type: 'c', target: Q, g: Q, defaultPrevented: false, status: 1}

@dconeybe

@thomasdao Can you confirm that you have the dconeybe/WebChannelOnOpenFix_Bug325591749 branch checked out in your local clone of this repository? If so, could you try re-running the yarn build:debug and cp -r dist ~/YOUR_PROJECT/node_modules/@firebase/firestore steps?

Also, if you're willing, could you enable Firestore debug logging (i.e. call setLogLevel('debug')) and provide the entire log that is produced? Feel free to email your log to me directly (my email is my github username at google.com) if you're concerned about posting the logs publicly.

@thomasdao

@dconeybe I've sent the debug log to your email. I've checked the branch again and it's correct:

$ git pull
Already up to date.

$ git rev-parse --abbrev-ref HEAD
dconeybe/WebChannelOnOpenFix_Bug325591749

$ git branch --show-current
dconeybe/WebChannelOnOpenFix_Bug325591749

@dconeybe

@thomasdao Thank you for sending me the logs. Unfortunately, based on the logs I can confirm that this PR will NOT fix the issue you are reporting (#7860). One thing, however, that I noticed from your logs is that your app seems to using Firestore before a user is signed in, and then continue to use it after the user is signed in. This could be innocuous, or could be undesirable behavior, depending on what your app is doing. I just wanted to point it out.

@dconeybe dconeybe marked this pull request as ready for review

April 18, 2024 14:55

MarkDuckworth

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comments.

*
* A stream can be closed locally with close() or can be closed remotely or
* through network errors. onClose is guaranteed to be called. onOpen will only
* be called if the stream successfully established a connection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the difference between onOpen and onConnected in the documentation? Currently the docs state that "onOpen will only be called if the stream successfully established a connection." That definition of onOpen is what I would expect for onConneted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the docs. LMK if they clarify.

The naming "onOpen" is somewhat unfortunate, as it does not actually indicate that an opened connection with the backend has been established. I think at one point this was the meaning, but the channel is now optimized to send the "handshake" with the first message to avoid an extra round-trip with the backend. So WebChannelConnection now sends onOpen as soon as it is ready to send a message so that it can send the handshake along with the first message. I added onConnected to send an event when an actual connection with the backend has been established.

});
});
it('gets connected event before first message', () => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the title of this a little misleading because the test doesn't appear to test for any messages. Or does it and I'm just not seeing how?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The awaitCallback() will fail if some other callback occurs (such as a message). The test doesn't care about if the "message" callback occurs, only that "connected" occurs before it. Does that clarify?

@dconeybe

@dconeybe dconeybe deleted the dconeybe/WebChannelOnOpenFix_Bug325591749 branch

April 18, 2024 17:47

@thomasdao

One thing, however, that I noticed from your logs is that your app seems to using Firestore before a user is signed in, and then continue to use it after the user is signed in.

@dconeybe beside user data, I also store public settings in Firestore, so I need to setup Firestore even if the user does not sign in.

@firebase firebase locked and limited conversation to collaborators

May 19, 2024

Labels