LSMTree compaction creates duplicate timestamped indexes that are not cleaned up

Description

When creating indexes on large datasets (33.8M records), ArcadeDB's LSMTree compaction process creates multiple timestamped duplicate indexes that persist in the database instead of being cleaned up after compaction completes.

Steps to Reproduce

  1. Import a large dataset (e.g., MovieLens ml-latest with 33,832,163 ratings)
  2. Create indexes on the imported data:

CREATE INDEX ON Movie (movieId) UNIQUE;
CREATE INDEX ON Rating (userId) NOTUNIQUE;
CREATE INDEX ON Rating (movieId) NOTUNIQUE;
CREATE INDEX ON Link (movieId) UNIQUE;
CREATE INDEX ON Tag (movieId) NOTUNIQUE;

  3. Query the schema to see all indexes:

SELECT name, typeName, properties, unique, automatic FROM schema:indexes ORDER BY typeName, name

Expected Behavior

Expected 5 indexes total (one per CREATE INDEX command).

Actual Behavior

Found 80 indexes instead of 5, with 15+ timestamped duplicates per table.
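To make the per-table counts easy to check, the rows returned by the schema query can be tallied by type. The following is a minimal standalone sketch, not ArcadeDB code: it assumes each row has already been reduced to a {name, typeName} pair, and the class name, method name, and sample rows are all hypothetical placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IndexCountByType {
  // Counts how many index entries each type has.
  // Each row is assumed to be {indexName, typeName}, as selected by the schema query.
  static Map<String, Integer> countByType(final List<String[]> rows) {
    final Map<String, Integer> counts = new TreeMap<>();
    for (final String[] row : rows)
      counts.merge(row[1], 1, Integer::sum);
    return counts;
  }

  public static void main(final String[] args) {
    // Placeholder rows for illustration only; real rows come from schema:indexes.
    final List<String[]> rows = List.of(
        new String[] { "Movie[movieId]", "Movie" },
        new String[] { "Movie_0_172987397898984", "Movie" },
        new String[] { "Rating[userId]", "Rating" });
    countByType(rows).forEach((type, count) -> System.out.println(type + ": " + count));
  }
}
```

A type whose count is much higher than the number of CREATE INDEX commands issued for it is where the extra entries are.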

Analysis

Based on source code review:

  1. LSMTreeIndexMutable.java (line 168):

public LSMTreeIndexCompacted createNewForCompaction() {
  final String newName = componentName.substring(0, last_) + "_" + System.nanoTime();
  return new LSMTreeIndexCompacted(..., newName, ...);
}

  2. LSMTreeIndex.java (line 548):

protected LSMTreeIndexMutable splitIndex(...) {
  final String newName = mutable.getName().substring(0, last_) + "_" + System.nanoTime();
  final LSMTreeIndexMutable newMutableIndex = new LSMTreeIndexMutable(..., newName, ...);
}

These timestamped index files are created during compaction but appear not to be properly cleaned up after compaction completes.
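To illustrate the naming scheme in isolation, here is a small standalone sketch (not the ArcadeDB implementation). It assumes that `last_` in the excerpts is the position of the last underscore in the existing name, which the quoted code does not show; the class and method names are hypothetical.

```java
final class TimestampedNameDemo {
  // Assumption: everything after the last '_' is replaced by a new timestamp value.
  // If the name has no '_', the timestamp is simply appended.
  static String newNameFor(final String componentName, final long nanoTime) {
    final int lastUnderscore = componentName.lastIndexOf('_');
    final String base = lastUnderscore < 0 ? componentName : componentName.substring(0, lastUnderscore);
    return base + "_" + nanoTime;
  }

  public static void main(final String[] args) {
    // Each call yields a different name for the same base, because System.nanoTime() changes.
    final String first = newNameFor("Movie_0_172987397898984", System.nanoTime());
    final String second = newNameFor(first, System.nanoTime());
    System.out.println(first);
    System.out.println(second);
  }
}
```

Under that assumption, every compaction or split produces a new name with the same prefix, which would match the timestamped entries seen in the schema.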

Impact

Environment

Logs

During index creation on large dataset:

⚠️ Index creation failed: Command failed: com.arcadedb.exception.NeedRetryException:
Cannot create a new index while asynchronous tasks are running (LSMTreeIndexCompactor)

LSMTree compaction logs show:

LSMTreeIndex 'Movie[movieId]' compacted 50 pages, remaining 0 pages
(totalKeys=289037 totalValues=2251732)

Questions

  1. Are timestamped index files intended to be temporary during compaction?
  2. Should they be automatically cleaned up after compaction completes?
  3. Is there a configuration to control compaction cleanup behavior?

Suggested Fix

After compaction completes, cleanup logic should:

  1. Identify timestamped index files matching pattern {indexName}_\d+
  2. Remove them from schema if they're marked as temporary/compaction artifacts
  3. Delete the corresponding physical files
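Step 1 of the list above could be sketched roughly as follows. This is only an illustration of the name matching, written as standalone Java with hypothetical class and method names; it does not touch ArcadeDB's schema or files, and a name match alone cannot tell a compaction artifact from a legitimate index, so step 2 would still need a real marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TimestampedIndexFilter {
  // Returns the names of the form "<baseName>_<digits>", i.e. the {indexName}_\d+ pattern.
  static List<String> timestampedCandidates(final String baseName, final List<String> allNames) {
    // Pattern.quote keeps characters such as '[' in the base name from being read as regex syntax.
    final Pattern pattern = Pattern.compile(Pattern.quote(baseName) + "_\\d+");
    final List<String> matches = new ArrayList<>();
    for (final String name : allNames)
      if (pattern.matcher(name).matches())
        matches.add(name);
    return matches;
  }

  public static void main(final String[] args) {
    // Placeholder names for illustration only.
    final List<String> names = List.of("Movie[movieId]", "Movie_0_172987397898984", "Movie_0_abc");
    System.out.println(timestampedCandidates("Movie_0", names));
  }
}
```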

Workaround

Users can manually drop timestamped indexes:

DROP INDEX Movie_0_172987397898984;
-- Repeat for all timestamped duplicates

However, this requires knowing which indexes are duplicates vs. legitimate user-created indexes.

Tested on 25.10.1-SNAPSHOT
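For the manual workaround, the DROP INDEX statements can be generated from a list of names once that list has been reviewed by hand. This is a minimal sketch with hypothetical class and method names; it only builds the statement text in the same form as the example above and does not execute anything against a database.

```java
import java.util.ArrayList;
import java.util.List;

final class DropStatementBuilder {
  // Builds one "DROP INDEX <name>;" statement per reviewed index name.
  // The caller is responsible for making sure the list contains only duplicates.
  static List<String> dropStatements(final List<String> reviewedNames) {
    final List<String> statements = new ArrayList<>();
    for (final String name : reviewedNames)
      statements.add("DROP INDEX " + name + ";");
    return statements;
  }

  public static void main(final String[] args) {
    for (final String statement : dropStatements(List.of("Movie_0_172987397898984")))
      System.out.println(statement);
  }
}
```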