fix: chat channel runtime by buua436 · Pull Request #16129 · infiniflow/ragflow
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add a log entry for the new loop-rebinding flow.
The new control path that rebinds lark_ws_client.loop should emit at least a debug/info log to make WS loop ownership diagnosable in production incidents.
As per coding guidelines, "Add logging for new flows".
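A minimal sketch of the requested log line, assuming the assignment lives in a per-thread, `_run_ws`-style function. Here `lark_ws_client` is a hypothetical `SimpleNamespace` stand-in so the snippet runs without `lark_oapi`. The logger name `feishu` and the helper `bind_ws_loop` are illustrative, not RAGFlow's actual code.

```python
import asyncio
import logging
import threading
from types import SimpleNamespace

logger = logging.getLogger("feishu")  # hypothetical logger name

# Hypothetical stand-in for the lark_oapi websocket client module and its module-global `loop`.
lark_ws_client = SimpleNamespace(loop=None)


def bind_ws_loop(account_id: str) -> asyncio.AbstractEventLoop:
    """Create a thread-local event loop, log the rebinding, then point the shared client module at it."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    logger.info(
        "[feishu:%s] rebinding lark websocket loop to thread-local loop (thread=%s, previous=%r)",
        account_id,
        threading.current_thread().name,
        lark_ws_client.loop,
    )
    lark_ws_client.loop = loop
    return loop
```

Each WS thread that calls `bind_ws_loop("acct-1")` emits one INFO line that names the account, the thread, and the loop being replaced. That is enough to reconstruct loop ownership from production logs.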
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/channels/feishu/channel.py` around lines 77 - 81, add a debug- or
info-level log entry immediately before or after the line where
lark_ws_client.loop is rebound to the thread-local loop variable. The log should
clearly indicate that the WebSocket client loop is being rebound to the
thread-local loop instead of the API server's main loop, providing visibility
into this control path for production diagnostics. Place this log in the same
code block as the lark_ws_client.loop = loop assignment.
Source: Coding guidelines
⚠️ Potential issue | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether the repo config/runtime can instantiate multiple Feishu accounts concurrently.
# 1) Find Feishu channel construction and startup paths.
rg -n -C3 'register_channel\("feishu"|FeishuChannel\(|_build\(|\.start\(\)' --type=py
# 2) Look for config entries that can declare multiple feishu accounts.
rg -n -C2 '(feishu|channels).*:' -g '*.yml' -g '*.yaml' -g '*.json' -g '*.toml'
Repository: infiniflow/ragflow
Length of output: 26049
🏁 Script executed:
# Read the full FeishuChannel implementation
wc -l api/channels/feishu/channel.py
Repository: infiniflow/ragflow
Length of output: 97
🏁 Script executed:
# Get the full FeishuChannel class to verify the loop rebinding and logging
head -220 api/channels/feishu/channel.py | tail -180
Repository: infiniflow/ragflow
Length of output: 7040
🏁 Script executed:
# Check the channel bootstrap to understand if multiple channels can run concurrently
cat -n api/channels/bootstrap.py | head -210 | tail -100
Repository: infiniflow/ragflow
Length of output: 4196
🏁 Script executed:
# Search for channel configuration loading
rg -n 'register_channel|_build.*cfg.*dict|FeishuChannel' api/channels/ --type=py -B2 -A2 | head -80
Repository: infiniflow/ragflow
Length of output: 3979
Shared module-global loop rebinding can break multi-account Feishu channels.
lark_ws_client.loop is process-global, and every _run_ws call overwrites it. If two FeishuChannel instances run concurrently (which the multi-account bootstrap architecture supports), the second thread's assignment replaces the loop the first client is still using. The first client's WebSocket tasks can then be scheduled on another thread's event loop, causing nondeterministic WebSocket task routing and loop-collision errors.
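The overwrite can be reproduced without `lark_oapi`. The sketch below is minimal and assumes the SDK reads a single module-global `loop` as described above. `lark_ws_client` is a hypothetical `SimpleNamespace` stand-in, and the two `threading.Event`s only force the interleaving that can happen by chance in production.

```python
import asyncio
import threading
from types import SimpleNamespace

# Hypothetical stand-in for lark_ws_client: one module-global `loop` shared by every client in the process.
lark_ws_client = SimpleNamespace(loop=None)

a_bound = threading.Event()
b_bound = threading.Event()
observed = {}


def channel_a() -> None:
    loop = asyncio.new_event_loop()
    lark_ws_client.loop = loop                   # account A binds its own loop...
    a_bound.set()
    b_bound.wait()                               # ...but B starts before A schedules its next task
    observed["a_expected"] = loop
    observed["a_actual"] = lark_ws_client.loop   # the loop A's client would now resolve
    loop.close()


def channel_b() -> None:
    a_bound.wait()
    loop = asyncio.new_event_loop()
    lark_ws_client.loop = loop                   # account B overwrites the shared global
    observed["b_loop"] = loop
    b_bound.set()


threads = [threading.Thread(target=channel_a), threading.Thread(target=channel_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A's client now points at B's loop, not the one A created.
print(observed["a_actual"] is observed["b_loop"])  # True
observed["b_loop"].close()
```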
Add a class-level (process-wide) synchronization mechanism so that only one Feishu instance can bind the shared lark_ws_client.loop at any time. Additionally, log the loop rebinding operation to comply with the guideline to log new flows.
Potential direction
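# Assumes channel.py already provides LOGGER, Channel and lark_ws_client; add these imports if missing:
import asyncio
import threading
from typing import Optional


class FeishuChannel(Channel):
    # Class-level (process-wide) guard: lark_ws_client.loop is a module global shared by all instances.
    _ws_global_lock = threading.Lock()
    _active_ws_owner: Optional[str] = None

    def _run_ws(self) -> None:
        with FeishuChannel._ws_global_lock:
            if (
                FeishuChannel._active_ws_owner is not None
                and FeishuChannel._active_ws_owner != self.account_id
            ):
                raise RuntimeError(
                    "lark_oapi websocket loop is global; concurrent FeishuChannel instances are unsafe"
                )
            FeishuChannel._active_ws_owner = self.account_id
            # Create the loop only after ownership is granted, so a rejected call leaks no unclosed loop.
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            LOGGER.info("[feishu:%s] binding lark websocket loop", self.account_id)
            lark_ws_client.loop = loop
        try:
            ...  # existing websocket client start/run logic
        finally:
            with FeishuChannel._ws_global_lock:
                if FeishuChannel._active_ws_owner == self.account_id:
                    FeishuChannel._active_ws_owner = None
                    LOGGER.info("[feishu:%s] unbinding lark websocket loop", self.account_id)
            loop.close()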
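To check the ownership guard in isolation, here is a hypothetical reduction of the channel to that logic alone. It assumes an `account_id` attribute, as in the suggestion. `GuardedFeishuChannel`, the `body` callable that stands in for the blocking WebSocket run, the `feishu` logger name, and the `SimpleNamespace` stand-in for `lark_ws_client` are all illustrative, not the real classes. The loop is created only after ownership is granted, so a rejected instance does not leave an unclosed loop behind.

```python
import asyncio
import logging
import threading
from types import SimpleNamespace
from typing import Callable, List, Optional

LOGGER = logging.getLogger("feishu")          # hypothetical logger name
lark_ws_client = SimpleNamespace(loop=None)    # hypothetical stand-in for the SDK module


class GuardedFeishuChannel:
    """Hypothetical reduction of FeishuChannel to the loop-ownership guard only."""

    _ws_global_lock = threading.Lock()
    _active_ws_owner: Optional[str] = None

    def __init__(self, account_id: str) -> None:
        self.account_id = account_id

    def _run_ws(self, body: Callable[[], None]) -> None:
        with GuardedFeishuChannel._ws_global_lock:
            owner = GuardedFeishuChannel._active_ws_owner
            if owner is not None and owner != self.account_id:
                raise RuntimeError(f"lark websocket loop already owned by {owner}")
            GuardedFeishuChannel._active_ws_owner = self.account_id
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            LOGGER.info("[feishu:%s] binding lark websocket loop", self.account_id)
            lark_ws_client.loop = loop
        try:
            body()  # stands in for the blocking websocket client run
        finally:
            with GuardedFeishuChannel._ws_global_lock:
                if GuardedFeishuChannel._active_ws_owner == self.account_id:
                    GuardedFeishuChannel._active_ws_owner = None
                    LOGGER.info("[feishu:%s] unbinding lark websocket loop", self.account_id)
            loop.close()


errors: List[str] = []


def second_account_start() -> None:
    try:
        GuardedFeishuChannel("acct-b")._run_ws(lambda: None)
    except RuntimeError as exc:
        errors.append(str(exc))


def first_account_body() -> None:
    # While acct-a is "running", acct-b starts on another thread and is rejected.
    t = threading.Thread(target=second_account_start)
    t.start()
    t.join()


GuardedFeishuChannel("acct-a")._run_ws(first_account_body)
print(errors)                                  # ['lark websocket loop already owned by acct-a']
print(GuardedFeishuChannel._active_ws_owner)   # None
```

The guard fails fast on purpose. A second account gets an immediate `RuntimeError` instead of blocking indefinitely on a lock held for the lifetime of a long-lived WebSocket connection.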
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/channels/feishu/channel.py` around lines 77 - 81, The module-global
`lark_ws_client.loop` assignment in the `_run_ws` method creates a race
condition when multiple FeishuChannel instances run concurrently, as each
instance overwrites the shared loop causing nondeterministic WebSocket task
routing. Add a class-level or module-level synchronization lock (such as
threading.Lock) to ensure thread-safe exclusive access to the
`lark_ws_client.loop` rebinding operation, protecting the critical section where
`lark_ws_client.loop = loop` is set. Wrap the loop rebinding operation with the
lock acquisition and release to prevent concurrent threads from interfering with
each other's WebSocket scheduling. Additionally, add a log statement (using the
appropriate logger) before or after the loop rebinding to document this
operation and aid in debugging multi-account scenarios.