Revert "fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM" by wangq8 · Pull Request #16104 · infiniflow/ragflow


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info ⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e60aac87-9ce7-4612-82b6-e672e465df26

📥 Commits

Reviewing files that changed from the base of the PR and between 1e4796d and a86d01d.

📒 Files selected for processing (1)


📝 Walkthrough


In the PDF branch of queue_tasks in task_service.py, the page chunk sizing logic now derives do_layout from parser_config.layout_recognize (defaulting to "DeepDOC"). The condition that forces page_size to its maximum now also applies when layout recognition is not "DeepDOC", in addition to the existing parser_id and toc_extraction checks.

Changes

PDF Task Chunk Sizing Condition

| Layer / File(s) | Summary |
| --- | --- |
| PDF task page_size override condition (api/db/services/task_service.py) | Derives do_layout from parser_config.layout_recognize (default "DeepDOC") and extends the condition that forces page_size to MAXIMUM_TASK_PAGE_NUMBER to also trigger when do_layout != "DeepDOC", replacing the previous fallback that used task_page_size with a "30" default in those cases. |
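The override condition summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from task_service.py: the helper name `choose_page_size`, the value assigned to `MAXIMUM_TASK_PAGE_NUMBER`, the default page size of 12, the parser ID set, and the `"other-layout"` value are all assumptions made for the example. Only the `layout_recognize`, `task_page_size`, and `toc_extraction` keys and the "DeepDOC" default come from the review text.

```python
from typing import Any, Mapping

# Assumed sentinel value; the review names the constant but not its value.
MAXIMUM_TASK_PAGE_NUMBER = 10**9


def choose_page_size(
    parser_id: str,
    parser_config: Mapping[str, Any],
    default_page_size: int = 12,  # hypothetical default, not from the PR
    whole_doc_parser_ids: frozenset = frozenset({"one"}),  # hypothetical IDs
) -> int:
    """Return the number of PDF pages assigned to each parsing task."""
    # Layout recognition defaults to "DeepDOC" when the key is absent.
    do_layout = parser_config.get("layout_recognize", "DeepDOC")

    # Start from the configured page size, if any.
    page_size = parser_config.get("task_page_size") or default_page_size

    # Force the maximum (effectively one task per document) when any of
    # the three conditions described in the walkthrough holds.
    if (
        parser_id in whole_doc_parser_ids
        or do_layout != "DeepDOC"
        or parser_config.get("toc_extraction", False)
    ):
        page_size = MAXIMUM_TASK_PAGE_NUMBER

    return page_size


# Usage: a non-DeepDOC layout setting overrides any configured page size.
print(choose_page_size("naive", {"task_page_size": 5}))
print(choose_page_size("naive", {"layout_recognize": "other-layout",
                                 "task_page_size": 5}))
```

Under these assumptions, the second call ignores `task_page_size` entirely, which is the behavior the revert restores: a non-DeepDOC PDF is no longer split into fixed-size page chunks.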

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Suggested labels

🐞 bug, 🌈 python

Poem

🐇 Hop along, little PDF page,
Your layout check now sets the stage,
If DeepDOC's gone, the max takes hold,
No more fallback to "thirty" of old,
The rabbit tidied chunking code —
Each task now walks the proper road! 📄

