Performance in SimpleJobRepository.update(StepExecution) since Spring Batch 6.x (original) (raw)
Please do a quick search on Github issues first, there might be already a duplicate issue for the one you are about to create.
If the bug is trivial, just go ahead and create the issue. Otherwise, please take a few moments and fill in the following sections:
Bug description
SimpleJobRepository.update(StepExecution) was changed in Spring Batch 6.x to call getStepExecution(stepExecution.getId()), which triggers a query against BATCH_JOB_EXECUTION_PARAMS on every chunk commit. On databases such as Oracle, which do not automatically create an index for foreign key constraints, JOB_EXECUTION_ID has no index on BATCH_JOB_EXECUTION_PARAMS, resulting in a full table scan on every chunk commit. As historical job execution data accumulates, this causes a severe and growing performance regression.
For reference, in Spring Batch 5.x, update(StepExecution) called checkForInterruption() → synchronizeStatus(), which issued only a single lightweight query:
SELECT VERSION FROM BATCH_JOB_EXECUTION WHERE JOB_EXECUTION_ID = ?
In Spring Batch 6.x, getStepExecution(stepExecution.getId()) triggers the following 8 queries on every chunk commit:
| # | Table | Description |
|---|---|---|
| 1 | BATCH_JOB_EXECUTION + BATCH_STEP_EXECUTION | fetch JOB_EXECUTION_ID |
| 2 | BATCH_JOB_INSTANCE + BATCH_JOB_EXECUTION | fetch JOB_INSTANCE_ID |
| 3 | BATCH_JOB_INSTANCE | fetch job instance |
| 4 | BATCH_JOB_EXECUTION_PARAMS | fetch job parameters ⚠️ full table scan on Oracle |
| 5 | BATCH_JOB_EXECUTION | fetch job execution |
| 6 | BATCH_STEP_EXECUTION | fetch step execution |
| 7 | BATCH_STEP_EXECUTION_CONTEXT | fetch step execution context |
| 8 | BATCH_JOB_EXECUTION_CONTEXT | fetch job execution context |
In our environment, per-chunk latency increased from ~10ms (Spring Batch 5.x) to ~200ms (Spring Batch 6.x), caused by the full table scan on BATCH_JOB_EXECUTION_PARAMS.
Call chain:
TaskletStep (on chunk commit)
→ jobRepository.update(stepExecution)
→ SimpleJobRepository.update(StepExecution)
→ getStepExecution(id) // added in 6.x
→ JdbcStepExecutionDao
→ getJobExecutionId() // SQL #1
→ JdbcJobExecutionDao.getJobExecution()
→ getJobInstanceId() // SQL #2
→ getJobInstance() // SQL #3
→ getJobParameters() // SQL #4 ← BATCH_JOB_EXECUTION_PARAMS
→ GET_EXECUTION_BY_ID // SQL #5
→ getStepExecution() // SQL #6
→ fillStepExecutionDependencies()
→ getExecutionContext(stepExecution) // SQL #7
→ getExecutionContext(jobExecution) // SQL #8
Environment
- Spring Batch version: 6.0.x (Spring Boot 4.0.4)
- Java version: 17+
- Database: Oracle (primary impact; full table scan on
BATCH_JOB_EXECUTION_PARAMSdue to missing index onJOB_EXECUTION_ID) - Regression introduced by: upgrade from Spring Batch 5.2.x → 6.x (Spring Boot 3.5.x → 4.0.x)
Steps to reproduce
- Use Spring Batch 6.x with an Oracle database
- Accumulate a significant number of historical job executions (so
BATCH_JOB_EXECUTION_PARAMShas many rows) - Run a chunk-oriented job with a large number of chunks
- Observe per-chunk latency via SQL trace logs (
logging.level.org.springframework.jdbc=TRACE)
Expected behavior
update(StepExecution) should complete in a similar time as Spring Batch 5.x (~10ms per chunk).
Minimal Complete Reproducible example
// Note: this is a pseudo-code to illustrate the scenario
@Test void updateStepExecutionShouldNotQueryJobExecutionParamsPerChunk() { // Given: a StepExecution with many rows in BATCH_JOB_EXECUTION_PARAMS // (simulate by inserting historical job execution params rows)
// When: jobRepository.update(stepExecution) is called (as happens on each chunk commit)
long start = System.currentTimeMillis();
for (int i = 0; i < 100; i++) {
jobRepository.update(stepExecution);
}
long elapsed = System.currentTimeMillis() - start;
// Then: should complete quickly, not proportional to BATCH_JOB_EXECUTION_PARAMS table size
assertThat(elapsed).isLessThan(1000);}
Suggestion
Adding an index on JOB_EXECUTION_ID in BATCH_JOB_EXECUTION_PARAMS would mitigate the full table scan issue on Oracle. It would be helpful if the official schema scripts (schema-oracle.sql, etc.) included this index, or at least if the documentation mentioned it as a recommendation for production environments.
CREATE INDEX BATCH_JOB_EXEC_PARAMS_IDX ON BATCH_JOB_EXECUTION_PARAMS(JOB_EXECUTION_ID);