reduce(...) over an inline aggregate expression may be evaluated incorrectly
ArcadeDB version
Observed on Docker images:
- arcadedata/arcadedb:26.4.1-SNAPSHOT
- arcadedata/arcadedb:26.4.2
Environment
- Host OS: Windows 10
- Architecture: x86_64
- Deployment: Docker
- ArcadeDB endpoint: HTTP /api/v1/command/arcade
- Request mode used for direct reproduction:
  - language: cypher
  - serializer: studio
- Differential comparison target: Neo4j Docker neo4j:latest
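For anyone who wants to replay the queries over HTTP, the request can be sketched as below. This is a minimal sketch, not the exact client used for this report: the host, port, and credentials are placeholders, `arcade` is assumed to be the database name taken from the endpoint path, and the `command` field name reflects my understanding of the ArcadeDB HTTP API. `build_request` and `run` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Assumptions (not taken from the report): host, port, and credentials.
BASE_URL = "http://localhost:2480"
DATABASE = "arcade"  # assumed from the /api/v1/command/arcade path
USER, PASSWORD = "root", "changeme"  # placeholder credentials


def build_request(cypher: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one Cypher command."""
    payload = {"language": "cypher", "serializer": "studio", "command": cypher}
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/command/{DATABASE}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run(cypher: str) -> dict:
    """Send the command to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(cypher)) as resp:
        return json.load(resp)
```

Only `run` touches the network; `build_request` can be inspected offline to confirm the payload shape.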
Describe the bug
ArcadeDB may evaluate reduce(...) incorrectly when the list operand is produced by an aggregate expression inline in the same projection.
The problem is not reduce(...) by itself, and it is not the aggregate by itself. The boundary is much narrower:
- WITH ... AS ages RETURN reduce(... IN ages ...): works
- RETURN reduce(... IN collect(...) ...): does not
- WITH ... AS age_sum RETURN reduce(... IN [age_sum] ...): works
- RETURN reduce(... IN [sum(...)] ...): does not
So the issue appears to be specific to inline aggregate expressions nested directly inside the reduce(...) list operand.
To Reproduce
Setup:
CREATE (:Person {name:'Alice', age:30});
CREATE (:Person {name:'Bob', age:25});
CREATE (:Person {name:'Charlie', age:35});
Query:
MATCH (p:Person)
RETURN p.name AS name,
       reduce(total = 0, n IN collect(p.age) | total + n) AS total_age_sum
ORDER BY name;
Expected behavior
Each grouped row should reduce over its own one-element collected list:
Alice, 30
Bob, 25
Charlie, 35
Observed Neo4j result:
Alice, 30
Bob, 25
Charlie, 35
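The expected semantics can be modelled outside any database. The sketch below is only a plain-Python model of "group by name, collect the ages per group, reduce each list"; `PEOPLE` and `expected_rows` are illustrative names, not part of either engine.

```python
from collections import defaultdict
from functools import reduce

# Same data as the setup statements in this report.
PEOPLE = [("Alice", 30), ("Bob", 25), ("Charlie", 35)]


def expected_rows(people):
    """Group by name, collect ages per group, then fold each list from 0."""
    groups = defaultdict(list)
    for name, age in people:
        groups[name].append(age)  # models collect(p.age) within the group
    return [
        (name, reduce(lambda total, n: total + n, ages, 0))
        for name, ages in sorted(groups.items())  # models ORDER BY name
    ]


print(expected_rows(PEOPLE))
# [('Alice', 30), ('Bob', 25), ('Charlie', 35)]
```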
Actual behavior
Observed ArcadeDB 26.4.1-SNAPSHOT and 26.4.2 result:
Alice, 0
Bob, 0
Charlie, 0
So the inline collect(p.age) is not being fed into reduce(...) correctly.
Control case
If the aggregate result is materialized first and only then passed into reduce(...), ArcadeDB behaves normally:
MATCH (p:Person)
WITH p.name AS name, collect(p.age) AS ages
RETURN name,
       reduce(total = 0, n IN ages | total + n) AS total_age_sum
ORDER BY name;
Observed result on Neo4j and ArcadeDB 26.4.1-SNAPSHOT / 26.4.2:
Alice, 30
Bob, 25
Charlie, 35
This makes the boundary clear: reduce(...) itself works, and collect(...) itself works, but the inline aggregate expression inside reduce(...) does not.
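The inline and materialized forms are meant to be equivalent, so a differential check between them is a convenient regression test. A minimal sketch follows, assuming a caller-supplied `run_query` callable (hypothetical) that takes a Cypher string and returns a list of row dictionaries; `diff_rows` is likewise an illustrative name.

```python
from itertools import zip_longest

INLINE = (
    "MATCH (p:Person) RETURN p.name AS name, "
    "reduce(total = 0, n IN collect(p.age) | total + n) AS total_age_sum "
    "ORDER BY name"
)
MATERIALIZED = (
    "MATCH (p:Person) WITH p.name AS name, collect(p.age) AS ages "
    "RETURN name, reduce(total = 0, n IN ages | total + n) AS total_age_sum "
    "ORDER BY name"
)


def diff_rows(run_query, inline=INLINE, materialized=MATERIALIZED):
    """Return (inline_row, materialized_row) pairs that differ.

    An empty list means both query forms produced identical rows.
    Rows missing on one side are reported with None in that position.
    """
    inline_rows = run_query(inline)
    materialized_rows = run_query(materialized)
    return [
        (a, b)
        for a, b in zip_longest(inline_rows, materialized_rows)
        if a != b
    ]
```

On an engine without this bug, `diff_rows` should return an empty list for the dataset in this report.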
Stronger reproducer
The same family of failures also appears with other inline aggregate expressions, not just collect(...).
For example:
MATCH (p:Person) RETURN p.name AS name, reduce(total = 0, n IN [sum(p.age)] | total + n) AS s ORDER BY name;
Observed Neo4j result:
Alice, 30
Bob, 25
Charlie, 35
Observed ArcadeDB 26.4.1-SNAPSHOT / 26.4.2 result:
Alice, 30
Bob, 55
Charlie, 90
The same cumulative pattern also appears with count(*):
MATCH (p:Person) RETURN p.name AS name, reduce(total = 0, n IN [count(*)] | total + n) AS s ORDER BY name;
Observed Neo4j result:
Alice, 1
Bob, 1
Charlie, 1
Observed ArcadeDB 26.4.1-SNAPSHOT / 26.4.2 result:
Alice, 1
Bob, 2
Charlie, 3
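So this is not limited to collect(...). In both cases the ArcadeDB values are running totals across the ordered rows (30, 55, 90 and 1, 2, 3) rather than per-group values, which suggests that inline aggregate expressions inside reduce(...) may be evaluated against state carried over from earlier groups.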
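One hypothetical way to arrive at exactly these numbers is an aggregate accumulator that is shared across groups instead of being reset per grouping key. The sketch below is purely illustrative and makes no claim about ArcadeDB's actual implementation; all function names are invented for the example.

```python
from itertools import groupby
from operator import itemgetter

ROWS = [("Alice", 30), ("Bob", 25), ("Charlie", 35)]


def grouped(rows):
    """Yield (name, [values]) per grouping key, ordered by name."""
    ordered = sorted(rows, key=itemgetter(0))
    for name, members in groupby(ordered, key=itemgetter(0)):
        yield name, [value for _, value in members]


def sums_with_reset(rows):
    """Correct model: a fresh accumulator for every group."""
    result = []
    for name, values in grouped(rows):
        state = 0  # reset per group
        for value in values:
            state += value
        result.append((name, state))
    return result


def sums_with_leaked_state(rows):
    """Hypothetical faulty model: one accumulator shared by all groups."""
    result = []
    state = 0  # never reset between groups
    for name, values in grouped(rows):
        for value in values:
            state += value
        result.append((name, state))
    return result


print(sums_with_reset(ROWS))         # [('Alice', 30), ('Bob', 25), ('Charlie', 35)]
print(sums_with_leaked_state(ROWS))  # [('Alice', 30), ('Bob', 55), ('Charlie', 90)]
```

Feeding a value of 1 per row into the leaked-state model gives 1, 2, 3, which matches the count(*) output above.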
Additional failure mode
Without a grouping key, ArcadeDB can also silently lose the projected result column:
MATCH (p:Person) RETURN reduce(total = 0, n IN collect(p.age) | total + n) AS total_age_sum;
Observed Neo4j result:
Observed ArcadeDB 26.4.1-SNAPSHOT / 26.4.2 result:
<row present, but `total_age_sum` missing>
This suggests the same underlying issue can show up either as a wrong numeric result or as a dropped projection value, depending on the aggregation shape.