Expressions — CodeQL (original) (raw)

An expression evaluates to a set of values and has a type.

For example, the expression 1 + 2evaluates to the integer 3 and the expression "QL" evaluates to the string "QL". 1 + 2 has type int and "QL" has type string.

The following sections describe the expressions that are available in QL.

Variable references

A variable reference is the name of a declared variable. This kind of expression has the same type as the variable it refers to.

For example, if you have declared the variables int i and LocalScopeVariable lsv, then the expressions i and lsv have types int and LocalScopeVariable respectively.

You can also refer to the variables this and result. These are used in predicate definitions and act in the same way as other variable references.

Literals

You can express certain values directly in QL, such as numbers, booleans, and strings.

Parenthesized expressions

A parenthesized expression is an expression surrounded by parentheses, ( and ). This expression has exactly the same type and values as the original expression. Parentheses are useful for grouping expressions together to remove ambiguity and improve readability.

Ranges

A range expression denotes a range of values ordered between two expressions. It consists of two expressions separated by .. and enclosed in brackets ([ and ]). For example, [3 .. 7] is a valid range expression. Its values are any integers between3 and 7 (including 3 and 7 themselves).

In a valid range, the start and end expression are integers, floats, or dates. If one of them is a date, then both must be dates. If one of them is an integer and the other a float, then both are treated as floats.

Set literal expressions

A set literal expression allows the explicit listing of a choice between several values. It consists of a comma-separated collection of expressions that are enclosed in brackets ([ and ]). For example, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] is a valid set literal expression. Its values are the first ten prime numbers.

The values of the contained expressions need to be of compatible types for a valid set literal expression. Furthermore, at least one of the set elements has to be of a type that is a supertype of the types of all the other contained expressions.

Super expressions

Super expressions in QL are similar to super expressions in other programming languages, such as Java. You can use them in predicate calls, when you want to use the predicate definition from a supertype. In practice, this is useful when a predicate inherits two definitions from its supertypes. In that case, the predicate must overridethose definitions to avoid ambiguity. However, if you want to use the definition from a particular supertype instead of writing a new definition, you can use a super expression.

In the following example, the class C inherits two definitions of the predicategetANumber()—one from A and one from B. Instead of overriding both definitions, it uses the definition from B.

class A extends int { A() { this = 1 } int getANumber() { result = 2 } }

class B extends int { B() { this = 1 } int getANumber() { result = 3 } }

class C extends A, B { // Need to define int getANumber(); otherwise it would be ambiguous override int getANumber() { result = B.super.getANumber() } }

from C c select c, c.getANumber()

The result of this query is 1, 3.

Calls to predicates (with result)

Calls to predicates with results are themselves expressions, unlike calls to predicates without results which are formulas. For more information, see “Calls to predicates.”

A call to a predicate with result evaluates to the values of the result variable of the called predicate.

For example a.getAChild() is a call to a predicate getAChild() on a variable a. This call evaluates to the set of children of a.

Aggregations

An aggregation is a mapping that computes a result value from a set of input values that are specified by a formula.

The general syntax is:

( | | )

The variables declared in <variable declarations> are called the aggregation variables.

Ordered aggregates (namely min, max, rank, concat, and strictconcat) are ordered by their <expression> values by default. The ordering is either numeric (for numeric types) or lexicographic (for strings). Lexicographic ordering is based on the Unicode valueof each character.

To specify a different order, follow <expression> with the keywords order by, then one or more comma-separated expressions that specify the order, and optionally the keyword asc or desc after each expression (to determine whether to order the expression in ascending or descending order). If you don’t specify an ordering, it defaults to asc. For example, order by o.getName() asc, o.getSize() descmight be used to order some object by name, breaking ties by descending size.

The following aggregates are available in QL:

Evaluation of aggregates

In general, aggregate evaluation involves the following steps:

  1. Determine the input variables: these are the aggregation variables declared in <variable declarations> and also the variables declared outside of the aggregate that are used in some component of the aggregate.
  2. Generate all possible distinct tuples (combinations) of the values of input variables such that the<formula> holds true. Note that the same value of an aggregate variable may appear in multiple distinct tuples. All such occurrences of the same value are treated as distinct occurrences when processing tuples.
  3. Apply <expression> on each tuple and collect the generated (distinct) values. The application of <expression> on a tuple may result in generating more than one value.
  4. Apply the aggregation function on the values generated in step 3 to compute the final result.

Let us apply these steps to the sum aggregate in the following query:

select sum(int i, int j | exists(string s | s = "hello".charAt(i)) and exists(string s | s = "world!".charAt(j)) | i)

  1. Input variables: i, j.
  2. All possible tuples (<value of i>, <value of j>) satisfying the given condition:(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 0), (1, 1), ..., (4, 5).
    30 tuples are generated in this step.
  3. Apply the <expression> i on all tuples. This means selecting all values of i from all tuples: 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4.
  4. Apply the aggregation function sum on the above values to get the final result 60.

If we change <expression> to i + j in the above query, the query result is 135 since applying i + j on all tuples results in following values:0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 9.

Next, consider the following query:

select count(string s | s = "hello" | s.charAt(_))

  1. s is the input variable of the aggregate.
  2. A single tuple "hello" is generated in this step.
  3. The <expression> charAt(_) is applied on this tuple. The underscore _ in charAt(_)is a don’t-care expression, which represents any value.s.charAt(_) generates four distinct values h, e, l, o.
  4. Finally, count is applied on these values, and the query returns 4.

Omitting parts of an aggregation

The three parts of an aggregation are not always required, so you can often write the aggregation in a simpler form:

  1. If you want to write an aggregation of the form <aggregate>(<type> v | <expression> = v | v), then you can omit the <variable declarations> and <formula> parts and write it as follows:
    ()
    For example, the following aggregations determine how many times the letter l occurs in string "hello". These forms are equivalent:
    count(int i | i = "hello".indexOf("l") | i)
    count("hello".indexOf("l"))
  2. If there is only one aggregation variable, you can omit the <expression> part instead. In this case, the expression is considered to be the aggregation variable itself. For example, the following aggregations are equivalent:
    avg(int i | i = [0 .. 3] | i)
    avg(int i | i = [0 .. 3])
  3. As a special case, you can omit the <expression> part from count even if there is more than one aggregation variable. In such a case, it counts the number of distinct tuples of aggregation variables that satisfy the formula. In other words, the expression part is considered to be the constant 1. For example, the following aggregations are equivalent:
    count(int i, int j | i in [1 .. 3] and j in [1 .. 3] | 1)
    count(int i, int j | i in [1 .. 3] and j in [1 .. 3])
  4. You can omit the <formula> part, but in that case you should include two vertical bars:
    ( | | )
    This is useful if you don’t want to restrict the aggregation variables any further. For example, the following aggregation returns the maximum number of lines across all files:
    max(File f | | f.getTotalNumberOfLines())
  5. Finally, you can also omit both the <formula> and <expression> parts. For example, the following aggregations are equivalent ways to count the number of files in a database:
    count(File f | any() | 1)
    count(File f | | 1)
    count(File f)

Monotonic aggregates

In addition to standard aggregates, QL also supports monotonic aggregates. Monotonic aggregates differ from standard aggregates in the way that they deal with the values generated by the <expression> part of the formula:

In general, if the <expression> is total and functional, then monotonic aggregates are equivalent to standard aggregates. Results differ when there is not precisely one <expression>value for each value generated by the <formula>:

Example of monotonic aggregates

Consider this query:

string getPerson() { result = "Alice" or result = "Bob" or result = "Charles" or result = "Diane" } string getFruit(string p) { p = "Alice" and result = "Orange" or p = "Alice" and result = "Apple" or p = "Bob" and result = "Apple" or p = "Charles" and result = "Apple" or p = "Charles" and result = "Banana" } int getPrice(string f) { f = "Apple" and result = 100 or f = "Orange" and result = 100 or f = "Orange" and result = 1 }

predicate nonmono(string p, int cost) { p = getPerson() and cost = sum(string f | f = getFruit(p) | getPrice(f)) }

language[monotonicAggregates] predicate mono(string p, int cost) { p = getPerson() and cost = sum(string f | f = getFruit(p) | getPrice(f)) }

from string variant, string person, int cost where variant = "default" and nonmono(person, cost) or variant = "monotonic" and mono(person, cost) select variant, person, cost order by variant, person

The query produces these results:

variant person cost
default Alice 201
default Bob 100
default Charles 100
default Diane 0
monotonic Alice 101
monotonic Alice 200
monotonic Bob 100
monotonic Diane 0

The two variants of the aggregate semantics differ in what happens when getPrice(f) has either multiple results or no results for a given f.

In this query, oranges are available at two different prices, and the default sum aggregate returns a single line where Alice buys an orange at a price of 100, another orange at a price of 1, and an apple at a price of 100, totalling 201. On the other hand, in the_monotonic_ semantics for sum, Alice always buys one orange and one apple, and a line of output is produced for each way she can complete her shopping list.

If there had been two different prices for apples too, the monotonicsum would have produced four output lines for Alice.

Charles wants to buy a banana, which is not for sale at all. In the default case, the sum produced for Charles includes the cost of the apple he can buy, but there’s no line for Charles in the monotonicsum output, because there is no way for Charles to buy one apple plus one banana.

(Diane buys no fruit at all, and in both variants her total cost is 0. The strictsum aggregate would have excluded her from the results in both cases).

In actual QL practice, it is quite rare to use monotonic aggregates with the goal of having multiple output lines, as in the “Alice” case of this example. The more significant point is the “Charles” case: As long as there’s no price for bananas, no output is produced for him. This means that if we later do learn of a banana price, we don’t need to remove any output tuple already produced. The importance of this is that the monotonic aggregate behavior works well with a fixpoint-based semantics for recursion, so it will be meaningful to let the getPrice predicate be mutually recursive with the count aggregate itself. (On the other hand, getFruit still cannot be allowed to be recursive, because adding another fruit to someone’s shopping list would invalidate the total costs we already knew for them).

This opportunity to use recursion is the main practical reason for requesting monotonic semantics of aggregates.

Recursive monotonic aggregates

Monotonic aggregates may be used recursively, but the recursive call may only appear in the expression, and not in the range. The recursive semantics for aggregates are the same as the recursive semantics for the rest of QL. For example, we might define a predicate to calculate the distance of a node in a graph from the leaves as follows:

language[monotonicAggregates] int depth(Node n) { if not exists(n.getAChild()) then result = 0 else result = 1 + max(Node child | child = n.getAChild() | depth(child)) }

Here the recursive call is in the expression, which is legal. The recursive semantics for aggregates are the same as the recursive semantics for the rest of QL. If you understand how aggregates work in the non-recursive case then you should not find it difficult to use them recursively. However, it is worth seeing how the evaluation of a recursive aggregation proceeds.

Consider the depth example we just saw with the following graph as input (arrows point from children to parents):

image0

Then the evaluation of the depth predicate proceeds as follows:

Stage depth Comments
0 We always begin with the empty set.
1 (0, b), (0, d), (0, e) The nodes with no children have depth 0. The recursive step for a and c fails to produce a value, since some of their children do not have values for depth.
2 (0, b), (0, d), (0, e), (1, c) The recursive step for c succeeds, since depth now has a value for all its children (d and e). The recursive step for a still fails.
3 (0, b), (0, d), (0, e), (1, c), (2, a) The recursive step for a succeeds, since depth now has a value for all its children (b and c).

Here, we can see that at the intermediate stages it is very important for the aggregate to fail if some of the children lack a value - this prevents erroneous values being added.

Any

The general syntax of an any expression is similar to the syntax of anaggregation, namely:

any( | | )

You should always include the variable declarations, but theformula and expression parts are optional.

The any expression denotes any values that are of a particular form and that satisfy a particular condition. More precisely, the any expression:

  1. Introduces temporary variables.
  2. Restricts their values to those that satisfy the <formula> part (if it’s present).
  3. Returns <expression> for each of those variables. If there is no <expression> part, then it returns the variables themselves.

The following table lists some examples of different forms of any expressions:

Expression Values
any(File f) all Files in the database
any(Element e | e.getName()) the names of all Elements in the database
any(int i | i = [0 .. 3]) the integers 0, 1, 2, and 3
any(int i | i = [0 .. 3] i * i) the integers 0, 1, 4, and 9

Unary operations

A unary operation is a minus sign (-) or a plus sign (+) followed by an expression of typeint, float or QlBuiltins::BigInt. For example:

-6.28 +(10 - 4) +avg(float f | f = 3.4 or f = -9.8) -sum(int i | i in [0 .. 9] | i * i)

A plus sign leaves the values of the expression unchanged, while a minus sign takes the arithmetic negations of the values.

Binary operations

A binary operation consists of an expression, followed by a binary operator, followed by another expression. For example:

5 % 2 (9 + 1) / (-2) "Q" + "L" 2 * min(float f | f in [-3 .. 3])

You can use the following binary operators in QL:

Name Symbol
Addition/concatenation +
Multiplication *
Division /
Subtraction -
Modulo %

If both expressions are numbers, these operators act as standard arithmetic operators. For example, 10.6 - 3.2 has value 7.4, 123.456 * 0 has value 0.0, and 9 % 4 has value 1 (the remainder after dividing 9 by 4). If both operands are subtypes of int or QlBuiltins::BigInt, then the result type isint or QlBuiltins::BigInt, respectively. Otherwise, if both operands are subtypes offloat, then the result type is float.

You can also use + as a string concatenation operator. In this case, at least one of the expressions must be a string—the other expression is implicitly converted to a string using thetoString() predicate. The two expressions are concatenated, and the result is a string. For example, the expression 221 + "B" has value "221B".

Casts

A cast allows you to constrain the type of an expression. This is similar to casting in other languages, for example in Java.

You can write a cast in two ways:

Note that a postfix cast is equivalent to a prefix cast surrounded by parentheses—x.(Foo)is exactly equivalent to ((Foo)x).

Casts are useful if you want to call a member predicate that is only defined for a more specific type. For example, the following query selects Javaclassesthat have a direct supertype called “List”:

import java

from Type t where t.(Class).getASupertype().hasName("List") select t

Since the predicate getASupertype() is defined for Class, but not for Type, you can’t call t.getASupertype() directly. The cast t.(Class) ensures that t is of type Class, so it has access to the desired predicate.

If you prefer to use a prefix cast, you can rewrite the where part as:

where ((Class)t).getASupertype().hasName("List")

Don’t-care expressions

This is an expression written as a single underscore _. It represents any value. (You “don’t care” what the value is.)

Unlike other expressions, a don’t-care expression does not have a type. In practice, this means that _ doesn’t have any member predicates, so you can’t call _.somePredicate().

For example, the following query selects all the characters in the string "hello":

from string s where s = "hello".charAt(_) select s

The charAt(int i) predicate is defined on strings and usually takes an int argument. Here the don’t care expression _ is used to tell the query to select characters at every possible index. The query returns the values h, e, l, and o.