Enhancement: update ci for parallel test execution by 6ba3i · Pull Request #16133 · infiniflow/ragflow (original) (raw)

@6ba3i

What problem does this PR solve?

split ci into multiple jobs

Type of change

@6ba3i

@dosubot dosubot Bot added the size:XS

This PR changes 0-9 lines, ignoring generated files.

label

Jun 17, 2026

@6ba3i 6ba3i added ci

Continue Integration

and removed size:XS

This PR changes 0-9 lines, ignoring generated files.

labels

Jun 17, 2026

@coderabbitai

Review Change Stack

📝 Walkthrough

Walkthrough

The workflow introduces a dedicated ragflow_preflight job that exports http_api_test_level as an output for downstream test jobs. Host port allocation is replaced with a partition-based reservation scheme using lock files in per-run artifact directories. Docker Compose lifecycle commands are standardized to use a computed COMPOSE_PROJECT_NAME (${GITHUB_RUN_ID}-${DOC_ENGINE}). Both Infinity and Elasticsearch test jobs are rewired to depend on the preflight output and use the new port allocation and compose naming scheme. Elasticsearch SDK test artifacts are renamed to ES-specific filenames.

Changes

CI Workflow Refactor

Layer / File(s) Summary
Preflight job and test-level output .github/workflows/tests.yml Introduces ragflow_preflight job that replaces the prior ragflow_tests job block, computes http_api_test_level (scheduled vs. pull request), and exposes it as a job-level output for downstream Infinity consumption.
Partition-based port allocation scheme .github/workflows/tests.yml Replaces ad-hoc port selection with engine-specific PARTITION_BASE and shared PARTITION_SIZE bounds; implements reserve_port_offset() function that iterates candidate offsets, reserves via lock files in per-run artifact directories, validates availability via ss, and exits with DOC_ENGINE-specific error if no free offset is found.
Docker Compose standardization and environment setup .github/workflows/tests.yml Computes COMPOSE_PROJECT_NAME from GITHUBRUNID−{GITHUB_RUN_ID}-GITHUBRUNID{DOC_ENGINE}; consolidates .env configuration by deleting prior CI overrides and writing all computed port variables, COMPOSE_PROFILES, TEI_MODEL, RAGFLOW_IMAGE, and DOC_ENGINE in a single block; updates compose start and down commands to use ${COMPOSE_PROJECT_NAME}.
Infinity test job implementation .github/workflows/tests.yml Wires ragflow_tests_infinity to depend on ragflow_preflight output; adds Set test level step to compute and export HTTP_API_TEST_LEVEL; updates BUILDER_CONTAINER naming to include run and engine identifiers.
Elasticsearch test job and artifact naming .github/workflows/tests.yml Introduces ragflow_tests_elasticsearch job setup with COMPOSE_PROJECT_NAME computation and partition-based port reservation identical to Infinity; corrects Elasticsearch SDK pytest invocation to write pytest-es-sdk.xml and coverage-es-sdk.xml instead of Infinity-named artifacts.
Resource cleanup and lifecycle management .github/workflows/tests.yml Updates coverage-save "stop" steps for both Infinity and Elasticsearch to use docker compose ... -p COMPOSEPROJECTNAMEstop;updates"stopragflow:nightly"cleanupforbothenginestouse{COMPOSE_PROJECT_NAME} stop; updates "stop ragflow:nightly" cleanup for both engines to use COMPOSEPROJECTNAMEstop;updates"stopragflow:nightly"cleanupforbothenginestouse{COMPOSE_PROJECT_NAME} for docker compose ... down -v, remove compose-project containers, optionally remove built image, and delete port reservation lock files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A rabbit hops through CI lanes with care,
Where ragflow_preflight broadcasts test levels in the air.
Partitions carved for ports—no more collisions here,
And COMPOSE_PROJECT_NAME makes cleanup crystal clear!
Lock files guard the reservation dance,
While both engines now have their rightful chance. 🎉

🚥 Pre-merge checks | ✅ 5 ✅ Passed checks (5 passed)

Check name Status Explanation
Title check ✅ Passed The title accurately captures the main objective of the PR: updating CI configuration to enable parallel test execution.
Description check ✅ Passed The PR description is minimal but covers the required template sections with the problem statement and type of change clearly marked.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

[coderabbitai[bot]](/apps/coderabbitai)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)

.github/workflows/tests.yml (2)

239-244: 💤 Low value

Consider pinning actions to commit SHA and setting persist-credentials: false.

Static analysis flagged unpinned action references (actions/upload-artifact@v4, actions/download-artifact@v4) and missing persist-credentials: false on checkout steps. Pinning to a commit SHA prevents supply-chain attacks if the action tag is compromised.

This is typically a repo-wide policy decision.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/tests.yml around lines 239 - 244, Pin the
`actions/upload-artifact@v4` action reference to a specific commit SHA instead
of using the version tag (e.g., replace `v4` with a full commit hash) to prevent
supply chain attacks. Additionally, locate all checkout action steps in the
workflow and add `persist-credentials: false` to their configuration to limit
credential exposure.

Source: Linters/SAST tools


295-399: ⚖️ Poor tradeoff

Significant code duplication in port allocation logic.

The "Prepare function test environment" steps in both ragflow_tests_infinity (lines 295-399) and ragflow_tests_elasticsearch (lines 670-778) contain ~100 nearly identical lines for port allocation, .env persistence, and entrypoint patching. This duplication makes maintenance error-prone—any fix must be applied twice.

Consider extracting this logic into a reusable composite action or a shared shell script checked into the repo.

Also applies to: 670-778

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/tests.yml around lines 295 - 399, The port allocation
logic in the "Prepare function test environment" step is duplicated across both
ragflow_tests_infinity and ragflow_tests_elasticsearch jobs (approximately 100
lines each). Extract the duplicated code block containing the RUNNER_NUM
calculation, PORT_BASES array definition, port offset availability checking
function (port_offset_available), the offset search loop, and all the port
variable calculations and .env persistence operations into a separate reusable
shell script or GitHub composite action. Then replace both duplicated sections
with a call to this shared script or action to eliminate maintenance burden and
reduce error-prone duplication.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/tests.yml:
- Line 804: The Elasticsearch SDK test command on line 804 is missing an
explicit DOC_ENGINE environment variable prefix that should be set before the
pytest execution for consistency with the Infinity SDK test. Add
DOC_ENGINE=elasticsearch before the pytest command in the line containing
junitxml=pytest-es-sdk.xml, following the same pattern used in the Infinity SDK
test where DOC_ENGINE=infinity is explicitly set inline before pytest execution.

---

Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 239-244: Pin the `actions/upload-artifact@v4` action reference to
a specific commit SHA instead of using the version tag (e.g., replace `v4` with
a full commit hash) to prevent supply chain attacks. Additionally, locate all
checkout action steps in the workflow and add `persist-credentials: false` to
their configuration to limit credential exposure.
- Around line 295-399: The port allocation logic in the "Prepare function test
environment" step is duplicated across both ragflow_tests_infinity and
ragflow_tests_elasticsearch jobs (approximately 100 lines each). Extract the
duplicated code block containing the RUNNER_NUM calculation, PORT_BASES array
definition, port offset availability checking function (port_offset_available),
the offset search loop, and all the port variable calculations and .env
persistence operations into a separate reusable shell script or GitHub composite
action. Then replace both duplicated sections with a call to this shared script
or action to eliminate maintenance burden and reduce error-prone duplication.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:


ℹ️ Review info ⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b48b799c-88e1-4439-b0d5-7eb7c470e918

📥 Commits

Reviewing files that changed from the base of the PR and between 44164e1 and 763fc88.

📒 Files selected for processing (1)

echo "Start to run test sdk on Elasticsearch"
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-infinity-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

DOC_ENGINE=infinity pytest -s --tb=short ...

Repository: infiniflow/ragflow

Length of output: 109


🏁 Script executed:

Check the workflow file at the relevant lines

wc -l .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 94


🏁 Script executed:

Read around line 429 (Infinity SDK test)

sed -n '425,435p' .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 773


🏁 Script executed:

Read around line 628 (job-level env var)

sed -n '625,635p' .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 688


🏁 Script executed:

Read around line 804 (ES SDK test)

sed -n '800,810p' .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 745


🏁 Script executed:

Find the Infinity job definition and its env section

sed -n '1,430p' .github/workflows/tests.yml | tail -100 | grep -A 20 "runs-on"

Repository: infiniflow/ragflow

Length of output: 44


🏁 Script executed:

More targeted: search backwards from line 429 to find the job name and env section

sed -n '380,430p' .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 3416


🏁 Script executed:

Search for the Infinity job definition (should be before line 380)

sed -n '300,390p' .github/workflows/tests.yml | grep -B 50 "Start to run test sdk on Infinity" | head -60

Repository: infiniflow/ragflow

Length of output: 44


🏁 Script executed:

Look for the job header and env section for the Infinity job

sed -n '250,380p' .github/workflows/tests.yml

Repository: infiniflow/ragflow

Length of output: 6200


Add explicit DOC_ENGINE=elasticsearch prefix for consistency with Infinity SDK test.

Both jobs define DOC_ENGINE at the job level (Infinity: line ~255, Elasticsearch: line 628), and these environment variables propagate to all steps automatically in GitHub Actions. However, the Infinity SDK test (line 429) explicitly sets DOC_ENGINE=infinity inline before pytest, while the Elasticsearch SDK test (line 804) omits this. For consistency and clarity across both test jobs, add the explicit prefix to the Elasticsearch command.

🔧 Suggested fix

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=elasticsearch pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-es-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/tests.yml at line 804, The Elasticsearch SDK test command
on line 804 is missing an explicit DOC_ENGINE environment variable prefix that
should be set before the pytest execution for consistency with the Infinity SDK
test. Add DOC_ENGINE=elasticsearch before the pytest command in the line
containing junitxml=pytest-es-sdk.xml, following the same pattern used in the
Infinity SDK test where DOC_ENGINE=infinity is explicitly set inline before
pytest execution.

@6ba3i

@dosubot dosubot Bot added the size:XS

This PR changes 0-9 lines, ignoring generated files.

label

Jun 17, 2026

[coderabbitai[bot]](/apps/coderabbitai)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/tests.yml:
- Around line 285-300: The reserve_port_offset() function creates lock files
that may accumulate if jobs crash or are cancelled before cleanup reaches the
removal step, eventually exhausting the 60 retry attempts. Add a stale lock
cleanup mechanism before the retry loop starts: iterate through existing lock
files in PORT_LOCK_DIR, extract the timestamp from each file (the third field in
the format written to the file), compare it against the current time, and remove
any lock files older than a threshold such as 6 hours using the comparison logic
to prevent stale locks from blocking port reservations.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:


ℹ️ Review info ⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0db6ef93-59d4-4d28-9cd3-ba16a6169a1b

📥 Commits

Reviewing files that changed from the base of the PR and between 763fc88 and bbd118c.

📒 Files selected for processing (1)

@6ba3i

@codecov

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (b3ac03b) to head (0c5c5c0).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files

@@ Coverage Diff @@ ## main #16133 +/- ##

Coverage 93.16% 93.16%

Files 10 10
Lines 717 717
Branches 118 118

Hits 668 668
Misses 29 29
Partials 20 20

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

@6ba3i 6ba3i added the 🧪 test

Pull requests that update test cases.

label

Jun 17, 2026

@6ba3i

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})