Enhancement: update ci for parallel test execution by 6ba3i · Pull Request #16133 · infiniflow/ragflow (original) (raw)
What problem does this PR solve?
split ci into multiple jobs
Type of change
- Performance Improvement
This PR changes 0-9 lines, ignoring generated files.
label
Continue Integration
and removed size:XS
This PR changes 0-9 lines, ignoring generated files.
labels
📝 Walkthrough
Walkthrough
The workflow introduces a dedicated ragflow_preflight job that exports http_api_test_level as an output for downstream test jobs. Host port allocation is replaced with a partition-based reservation scheme using lock files in per-run artifact directories. Docker Compose lifecycle commands are standardized to use a computed COMPOSE_PROJECT_NAME (${GITHUB_RUN_ID}-${DOC_ENGINE}). Both Infinity and Elasticsearch test jobs are rewired to depend on the preflight output and use the new port allocation and compose naming scheme. Elasticsearch SDK test artifacts are renamed to ES-specific filenames.
Changes
CI Workflow Refactor
| Layer / File(s) | Summary |
|---|---|
| Preflight job and test-level output .github/workflows/tests.yml | Introduces ragflow_preflight job that replaces the prior ragflow_tests job block, computes http_api_test_level (scheduled vs. pull request), and exposes it as a job-level output for downstream Infinity consumption. |
| Partition-based port allocation scheme .github/workflows/tests.yml | Replaces ad-hoc port selection with engine-specific PARTITION_BASE and shared PARTITION_SIZE bounds; implements reserve_port_offset() function that iterates candidate offsets, reserves via lock files in per-run artifact directories, validates availability via ss, and exits with DOC_ENGINE-specific error if no free offset is found. |
| Docker Compose standardization and environment setup .github/workflows/tests.yml | Computes COMPOSE_PROJECT_NAME from GITHUBRUNID−{GITHUB_RUN_ID}-GITHUBRUNID−{DOC_ENGINE}; consolidates .env configuration by deleting prior CI overrides and writing all computed port variables, COMPOSE_PROFILES, TEI_MODEL, RAGFLOW_IMAGE, and DOC_ENGINE in a single block; updates compose start and down commands to use ${COMPOSE_PROJECT_NAME}. |
| Infinity test job implementation .github/workflows/tests.yml | Wires ragflow_tests_infinity to depend on ragflow_preflight output; adds Set test level step to compute and export HTTP_API_TEST_LEVEL; updates BUILDER_CONTAINER naming to include run and engine identifiers. |
| Elasticsearch test job and artifact naming .github/workflows/tests.yml | Introduces ragflow_tests_elasticsearch job setup with COMPOSE_PROJECT_NAME computation and partition-based port reservation identical to Infinity; corrects Elasticsearch SDK pytest invocation to write pytest-es-sdk.xml and coverage-es-sdk.xml instead of Infinity-named artifacts. |
| Resource cleanup and lifecycle management .github/workflows/tests.yml | Updates coverage-save "stop" steps for both Infinity and Elasticsearch to use docker compose ... -p COMPOSEPROJECTNAMEstop;updates"stopragflow:nightly"cleanupforbothenginestouse{COMPOSE_PROJECT_NAME} stop; updates "stop ragflow:nightly" cleanup for both engines to use COMPOSEPROJECTNAMEstop;updates"stopragflow:nightly"cleanupforbothenginestouse{COMPOSE_PROJECT_NAME} for docker compose ... down -v, remove compose-project containers, optionally remove built image, and delete port reservation lock files. |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
🐇 A rabbit hops through CI lanes with care,
Whereragflow_preflightbroadcasts test levels in the air.
Partitions carved for ports—no more collisions here,
AndCOMPOSE_PROJECT_NAMEmakes cleanup crystal clear!
Lock files guard the reservation dance,
While both engines now have their rightful chance. 🎉
🚥 Pre-merge checks | ✅ 5 ✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately captures the main objective of the PR: updating CI configuration to enable parallel test execution. |
| Description check | ✅ Passed | The PR description is minimal but covers the required template sections with the problem statement and type of change clearly marked. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
❤️ Share
Comment @coderabbitai help to get the list of available commands and usage tips.
[](/apps/coderabbitai)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/tests.yml (2)
239-244: 💤 Low valueConsider pinning actions to commit SHA and setting
persist-credentials: false.Static analysis flagged unpinned action references (
actions/upload-artifact@v4,actions/download-artifact@v4) and missingpersist-credentials: falseon checkout steps. Pinning to a commit SHA prevents supply-chain attacks if the action tag is compromised.This is typically a repo-wide policy decision.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/tests.yml around lines 239 - 244, Pin the `actions/upload-artifact@v4` action reference to a specific commit SHA instead of using the version tag (e.g., replace `v4` with a full commit hash) to prevent supply chain attacks. Additionally, locate all checkout action steps in the workflow and add `persist-credentials: false` to their configuration to limit credential exposure.Source: Linters/SAST tools
295-399: ⚖️ Poor tradeoffSignificant code duplication in port allocation logic.
The "Prepare function test environment" steps in both
ragflow_tests_infinity(lines 295-399) andragflow_tests_elasticsearch(lines 670-778) contain ~100 nearly identical lines for port allocation, .env persistence, and entrypoint patching. This duplication makes maintenance error-prone—any fix must be applied twice.Consider extracting this logic into a reusable composite action or a shared shell script checked into the repo.
Also applies to: 670-778
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/tests.yml around lines 295 - 399, The port allocation logic in the "Prepare function test environment" step is duplicated across both ragflow_tests_infinity and ragflow_tests_elasticsearch jobs (approximately 100 lines each). Extract the duplicated code block containing the RUNNER_NUM calculation, PORT_BASES array definition, port offset availability checking function (port_offset_available), the offset search loop, and all the port variable calculations and .env persistence operations into a separate reusable shell script or GitHub composite action. Then replace both duplicated sections with a call to this shared script or action to eliminate maintenance burden and reduce error-prone duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/tests.yml:
- Line 804: The Elasticsearch SDK test command on line 804 is missing an
explicit DOC_ENGINE environment variable prefix that should be set before the
pytest execution for consistency with the Infinity SDK test. Add
DOC_ENGINE=elasticsearch before the pytest command in the line containing
junitxml=pytest-es-sdk.xml, following the same pattern used in the Infinity SDK
test where DOC_ENGINE=infinity is explicitly set inline before pytest execution.
---
Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 239-244: Pin the `actions/upload-artifact@v4` action reference to
a specific commit SHA instead of using the version tag (e.g., replace `v4` with
a full commit hash) to prevent supply chain attacks. Additionally, locate all
checkout action steps in the workflow and add `persist-credentials: false` to
their configuration to limit credential exposure.
- Around line 295-399: The port allocation logic in the "Prepare function test
environment" step is duplicated across both ragflow_tests_infinity and
ragflow_tests_elasticsearch jobs (approximately 100 lines each). Extract the
duplicated code block containing the RUNNER_NUM calculation, PORT_BASES array
definition, port offset availability checking function (port_offset_available),
the offset search loop, and all the port variable calculations and .env
persistence operations into a separate reusable shell script or GitHub composite
action. Then replace both duplicated sections with a call to this shared script
or action to eliminate maintenance burden and reduce error-prone duplication.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info ⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b48b799c-88e1-4439-b0d5-7eb7c470e918
📥 Commits
Reviewing files that changed from the base of the PR and between 44164e1 and 763fc88.
📒 Files selected for processing (1)
.github/workflows/tests.yml
| echo "Start to run test sdk on Elasticsearch" |
|---|
| source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-infinity-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log |
| source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ Potential issue | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
DOC_ENGINE=infinity pytest -s --tb=short ...
Repository: infiniflow/ragflow
Length of output: 109
🏁 Script executed:
Check the workflow file at the relevant lines
wc -l .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 94
🏁 Script executed:
Read around line 429 (Infinity SDK test)
sed -n '425,435p' .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 773
🏁 Script executed:
Read around line 628 (job-level env var)
sed -n '625,635p' .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 688
🏁 Script executed:
Read around line 804 (ES SDK test)
sed -n '800,810p' .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 745
🏁 Script executed:
Find the Infinity job definition and its env section
sed -n '1,430p' .github/workflows/tests.yml | tail -100 | grep -A 20 "runs-on"
Repository: infiniflow/ragflow
Length of output: 44
🏁 Script executed:
More targeted: search backwards from line 429 to find the job name and env section
sed -n '380,430p' .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 3416
🏁 Script executed:
Search for the Infinity job definition (should be before line 380)
sed -n '300,390p' .github/workflows/tests.yml | grep -B 50 "Start to run test sdk on Infinity" | head -60
Repository: infiniflow/ragflow
Length of output: 44
🏁 Script executed:
Look for the job header and env section for the Infinity job
sed -n '250,380p' .github/workflows/tests.yml
Repository: infiniflow/ragflow
Length of output: 6200
Add explicit DOC_ENGINE=elasticsearch prefix for consistency with Infinity SDK test.
Both jobs define DOC_ENGINE at the job level (Infinity: line ~255, Elasticsearch: line 628), and these environment variables propagate to all steps automatically in GitHub Actions. However, the Infinity SDK test (line 429) explicitly sets DOC_ENGINE=infinity inline before pytest, while the Elasticsearch SDK test (line 804) omits this. For consistency and clarity across both test jobs, add the explicit prefix to the Elasticsearch command.
🔧 Suggested fix
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=elasticsearch pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log |
|---|
| source .venv/bin/activate && set -o pipefail; DOC_ENGINE=elasticsearch pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/tests.yml at line 804, The Elasticsearch SDK test command
on line 804 is missing an explicit DOC_ENGINE environment variable prefix that
should be set before the pytest execution for consistency with the Infinity SDK
test. Add DOC_ENGINE=elasticsearch before the pytest command in the line
containing junitxml=pytest-es-sdk.xml, following the same pattern used in the
Infinity SDK test where DOC_ENGINE=infinity is explicitly set inline before
pytest execution.
This PR changes 0-9 lines, ignoring generated files.
label
[](/apps/coderabbitai)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/tests.yml:
- Around line 285-300: The reserve_port_offset() function creates lock files
that may accumulate if jobs crash or are cancelled before cleanup reaches the
removal step, eventually exhausting the 60 retry attempts. Add a stale lock
cleanup mechanism before the retry loop starts: iterate through existing lock
files in PORT_LOCK_DIR, extract the timestamp from each file (the third field in
the format written to the file), compare it against the current time, and remove
any lock files older than a threshold such as 6 hours using the comparison logic
to prevent stale locks from blocking port reservations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info ⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0db6ef93-59d4-4d28-9cd3-ba16a6169a1b
📥 Commits
Reviewing files that changed from the base of the PR and between 763fc88 and bbd118c.
📒 Files selected for processing (1)
.github/workflows/tests.yml
Codecov Report
✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (b3ac03b) to head (0c5c5c0).
⚠️ Report is 7 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@ ## main #16133 +/- ##
Coverage 93.16% 93.16%
Files 10 10
Lines 717 717
Branches 118 118
Hits 668 668
Misses 29 29
Partials 20 20
☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.
🚀 New features to boost your workflow:
- ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Pull requests that update test cases.
label
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})