CodeQL library for JavaScript

When you’re analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript.

Introducing the library

The CodeQL library for JavaScript presents information about JavaScript source code at different levels:

Note that representations above the textual level (for example the lexical representation or the flow graphs) are only available for JavaScript code that does not contain fatal syntax errors. For code with such errors, the only information available is at the textual level, as well as information about the errors themselves.

Additionally, there is library support for working with HTML documents, JSON, and YAML data, JSDoc comments, and regular expressions.

Textual level

At its most basic level, a JavaScript code base can simply be viewed as a collection of files organized into folders, where each file is composed of zero or more lines of text.

Note that the textual content of a program is not included in the CodeQL database unless you specifically request it during extraction.

Files and folders

In the CodeQL libraries, files are represented as entities of class File, and folders as entities of class Folder, both of which are subclasses of class Container.

Class Container provides the following member predicates:

Note that while getAFile and getAFolder are declared on class Container, they currently only have results for Folders.

Both files and folders have paths, which can be accessed by the predicate Container.getAbsolutePath(). For example, if f represents a file with the path /home/user/project/src/index.js, then f.getAbsolutePath() evaluates to the string "/home/user/project/src/index.js", while f.getParentContainer().getAbsolutePath() returns "/home/user/project/src".

These paths are absolute file system paths. If you want to obtain the path of a file relative to the source location in the CodeQL database, use Container.getRelativePath() instead. Note, however, that a database may contain files that are not located underneath the source location; for such files, getRelativePath() will not return anything.

The following member predicates of class Container provide more information about the name of a file or folder:

For example, the following query computes, for each folder, the number of JavaScript files (that is, files with extension js) contained in the folder:

import javascript

from Folder d
select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js")

When you run the query on most projects, the results include folders that contain files with a js extension and folders that don’t.

Locations

Most entities in a CodeQL database have an associated source location. Locations are identified by five pieces of information: a file, a start line, a start column, an end line, and an end column. Line and column counts are 1-based (so the first character of a file is at line 1, column 1), and the end position is inclusive.
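The following JavaScript sketch illustrates the 1-based, end-inclusive convention described above. The helper toLocation is hypothetical and is not part of the CodeQL library or the extractor; it only shows how character offsets in a source string map to the five pieces of location information under that convention.

```javascript
// Hypothetical helper (not a CodeQL API): converts character offsets into a
// 1-based, end-inclusive location for illustration purposes.
function toLocation(source, startOffset, endOffset) {
  const position = (offset) => {
    const before = source.slice(0, offset).split("\n");
    return { line: before.length, column: before[before.length - 1].length + 1 };
  };
  const start = position(startOffset);
  const end = position(endOffset); // endOffset is the index of the LAST character
  return {
    startLine: start.line,
    startColumn: start.column,
    endLine: end.line,
    endColumn: end.column,
  };
}

const source = "var x = 42;\nfoo(x);";
// The identifier `foo` occupies offsets 12 to 14 of the string.
const fooLocation = toLocation(source, 12, 14);
console.log(fooLocation);
```

Because the end position is inclusive, a three-character identifier starting at column 1 ends at column 3, not column 4.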

All entities associated with a source location belong to the class Locatable. The location itself is modeled by the class Location and can be accessed through the member predicate Locatable.getLocation(). The Location class provides the following member predicates:

Lines

Lines of text in files are represented by the class Line. This class offers the following member predicates:

Note that, as mentioned above, the textual representation of the program is not included in the CodeQL database by default.

Lexical level

A slightly more structured view of a JavaScript program is provided by the classes Token and Comment, which represent tokens and comments, respectively.

Tokens

The most important member predicates of class Token are as follows:

The Token class has nine subclasses, each representing a particular kind of token:

As an example of a query operating entirely on the lexical level, consider the following query, which finds consecutive comma tokens arising from an omitted element in an array expression:

import javascript

class CommaToken extends PunctuatorToken {
  CommaToken() { getValue() = "," }
}

from CommaToken comma
where comma.getNextToken() instanceof CommaToken
select comma, "Omitted array elements are bad style."

If the query returns no results, this pattern isn’t used in the projects that you analyzed.
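For reference, this is the kind of JavaScript code the consecutive-comma query is aimed at. The snippet is only an illustration (the variable names are made up); it shows why an omitted element is easy to misread: it creates a hole rather than an undefined entry.

```javascript
const sparse = [1, , 3];             // two consecutive comma tokens: a hole at index 1
const dense = [1, undefined, 3];     // an explicit undefined element

console.log(sparse.length);          // 3
console.log(1 in sparse);            // false: index 1 was never created
console.log(1 in dense);             // true

// Iteration helpers such as forEach skip holes entirely.
let visited = 0;
sparse.forEach(() => { visited++; });
console.log(visited);                // 2
```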

You can use the predicates Locatable.getFirstToken() and Locatable.getLastToken() to access the first and last token (if any) belonging to an element with a source location.

Syntactic level

The majority of classes in the JavaScript library are concerned with representing a JavaScript program as a collection of abstract syntax trees (ASTs).

The class ASTNode contains all entities representing nodes in the abstract syntax trees and defines generic tree traversal predicates:

Note

These predicates should only be used to perform generic AST traversal. To access children of specific AST node types, the specialized predicates introduced below should be used instead. In particular, queries should not rely on the numeric indices of child nodes relative to their parent nodes: these are considered an implementation detail that may change between versions of the library.

Top-levels

From a syntactic point of view, each JavaScript program is composed of one or more top-level code blocks (or top-levels for short), which are blocks of JavaScript code that do not belong to a larger code block. Top-levels are represented by the class TopLevel and its subclasses:

Every TopLevel is contained in a File, but a single File may contain more than one TopLevel. To go from a TopLevel tl to its File, use tl.getFile(); conversely, for a File f, predicate f.getATopLevel() returns a top-level contained in f. For every AST node, predicate ASTNode.getTopLevel() can be used to find the top-level it belongs to.

The TopLevel class additionally provides the following member predicates:

Note

By default, GitHub code scanning filters out alerts in minified top-levels, since they are often hard to interpret. When you write your own queries in Visual Studio Code, this filtering is not done automatically, so you may want to explicitly add a condition of the form and not e.getTopLevel().isMinified() or similar to your query to exclude results in minified code.

Statements and expressions

The most important subclasses of ASTNode besides TopLevel are Stmt and Expr, which, together with their subclasses, represent statements and expressions, respectively. This section briefly discusses some of the more important classes and predicates. For a full reference of all the subclasses of Stmt and Expr and their API, see Stmt.qll and Expr.qll.

Stmt and Expr share a common superclass ExprOrStmt which is useful for queries that should operate either on statements or on expressions, but not on any other AST nodes.

As an example of how to use expression AST nodes, here is a query that finds expressions of the form e + f >> g; such expressions should be rewritten as (e + f) >> g to clarify operator precedence:

import javascript

from ShiftExpr shift, AddExpr add
where add = shift.getAnOperand()
select add, "This expression should be bracketed to clarify precedence rules."
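To see why such expressions are worth flagging, consider this small JavaScript illustration (the values are arbitrary). Addition binds more tightly than the shift operators, so the unbracketed form may not mean what a reader expects.

```javascript
const e = 1, f = 2, g = 1;

const unbracketed = e + f >> g;    // parsed as (e + f) >> g
const bracketed = (e + f) >> g;    // 3 >> 1
const misread = e + (f >> g);      // 1 + (2 >> 1), a plausible misreading

console.log(unbracketed, bracketed, misread); // 1 1 2
```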

Functions

JavaScript provides several ways of defining functions: in ECMAScript 5, there are function declaration statements and function expressions, and ECMAScript 2015 adds arrow function expressions. These different syntactic forms are represented by the classes FunctionDeclStmt (a subclass of Stmt), FunctionExpr (a subclass of Expr) and ArrowFunctionExpr (also a subclass of Expr), respectively. All three are subclasses of Function, which provides common member predicates for accessing function parameters or the function body:

As an example, here is a query that finds all expression closures:

import javascript

from FunctionExpr fe
where fe.getBody() instanceof Expr
select fe, "Use arrow expressions instead of expression closures."

As another example, this query finds functions that have two parameters that bind the same variable:

import javascript

from Function fun, Parameter p, Parameter q, int i, int j
where
  p = fun.getParameter(i) and
  q = fun.getParameter(j) and
  i < j and
  p.getAVariable() = q.getAVariable()
select fun, "This function has two parameters that bind the same variable."
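The following JavaScript sketch shows what two parameters binding the same variable look like at run time. It uses the Function constructor, which creates non-strict functions, so that the snippet runs even where the surrounding code is strict; the names pickDuplicate and strictRejected are purely illustrative.

```javascript
// Equivalent to: function pickDuplicate(a, a) { return a; } in non-strict code.
const pickDuplicate = new Function("a", "a", "return a;");

// Both parameters bind the same variable, and the later one wins.
const duplicateResult = pickDuplicate(1, 2);
console.log(duplicateResult); // 2

// In strict mode the same parameter list is rejected as a syntax error.
let strictRejected = false;
try {
  new Function("a", "a", '"use strict"; return a;');
} catch (err) {
  strictRejected = err instanceof SyntaxError;
}
console.log(strictRejected); // true
```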

Classes

Classes can be defined either by class declaration statements, represented by the CodeQL class ClassDeclStmt (which is a subclass of Stmt), or by class expressions, represented by the CodeQL class ClassExpr (which is a subclass of Expr). Both of these classes are also subclasses of ClassDefinition, which provides common member predicates for accessing the name of a class, its superclass, and its body:

Note that class fields only became a standard language feature in ECMAScript 2022, so details of their representation may change.

Method definitions are represented by the class MethodDefinition, which (like its counterpart FieldDefinition for fields) is a subclass of MemberDefinition. That class provides the following important member predicates:

There are three classes for modeling special methods: ConstructorDefinition models constructors, while GetterMethodDefinition and SetterMethodDefinition model getter and setter methods, respectively.

Declarations and binding patterns

Variables are declared by declaration statements (class DeclStmt), which come in three flavors: var statements (represented by class VarDeclStmt), const statements (represented by class ConstDeclStmt), and let statements (represented by class LetStmt). Every declaration statement has one or more declarators, represented by class VariableDeclarator.

Each declarator consists of a binding pattern, returned by predicate VariableDeclarator.getBindingPattern(), and an optional initializing expression, returned by VariableDeclarator.getInit().

Often, the binding pattern is a simple identifier, as in var x = 42. In ECMAScript 2015 and later, however, it can also be a more complex destructuring pattern, as in var [x, y] = arr.
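As a quick illustration, here are a few JavaScript declarations whose declarators use different kinds of binding pattern. The variable names are invented for the example.

```javascript
var x = 42;                                   // simple identifier
var [first, second] = [1, 2];                 // array destructuring pattern
var { label, size: { width } } =              // nested object destructuring pattern
  { label: "box", size: { width: 3 } };
var [head, ...rest] = [1, 2, 3];              // array pattern with a rest element

console.log(x, first, second, label, width, head, rest);
```

Note that a single declarator with a destructuring pattern can declare several variables at once (here, label and width come from the same declarator).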

The various kinds of binding patterns are represented by class BindingPattern and its subclasses:

Here is an example of a query to find declaration statements that declare the same variable more than once, excluding results in minified code:

import javascript

from DeclStmt ds, VariableDeclarator d1, VariableDeclarator d2, Variable v, int i, int j
where
  d1 = ds.getDecl(i) and
  d2 = ds.getDecl(j) and
  i < j and
  v = d1.getBindingPattern().getAVariable() and
  v = d2.getBindingPattern().getAVariable() and
  not ds.getTopLevel().isMinified()
select ds, "Variable " + v.getName() + " is declared both $@ and $@.", d1, "here", d2, "here"

This is not a common problem, so you may not find any results in your own projects.

Notice the use of not ... isMinified() here and in the next few queries. This excludes any results found in minified code. If you delete and not ds.getTopLevel().isMinified() and re-run the query, two results in minified code in the meteor/meteor project are reported.
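A minimal JavaScript example of the pattern the duplicate-declaration query looks for might be the following; the function and variable names are made up. The statement is legal, which is why the mistake is easy to overlook.

```javascript
function declaredTwice() {
  // `count` appears in two declarators of the same var statement;
  // the second initializer silently overwrites the first.
  var count = 1, total = 0, count = 2;
  return count + total;
}

const declaredTwiceResult = declaredTwice();
console.log(declaredTwiceResult); // 2
```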

Properties

Properties in object literals are represented by class Property, which is also a subclass of ASTNode, but of neither Expr nor Stmt.

Class Property has two subclasses ValueProperty and PropertyAccessor, which represent, respectively, normal value properties and getter/setter properties. Class PropertyAccessor, in turn, has two subclasses PropertyGetter and PropertySetter representing getters and setters, respectively.

The predicates Property.getName() and Property.getInit() provide access to the defined property’s name and its initial value. For PropertyAccessor and its subclasses, getInit() is overloaded to return the getter/setter function.

As an example of a query involving properties, consider the following query that flags object expressions containing two identically named properties, excluding results in minified code:

import javascript

from ObjectExpr oe, Property p1, Property p2, int i, int j
where
  p1 = oe.getProperty(i) and
  p2 = oe.getProperty(j) and
  i < j and
  p1.getName() = p2.getName() and
  not oe.getTopLevel().isMinified()
select oe, "Property " + p1.getName() + " is defined both $@ and $@.", p1, "here", p2, "here"
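For illustration, this is the kind of object expression such a query would be expected to flag. The object and its property names are invented; the point is that the later definition silently replaces the earlier one.

```javascript
const options = { timeout: 100, retries: 3, timeout: 5000 };

console.log(options.timeout);       // 5000: the second definition wins
console.log(Object.keys(options));  // [ 'timeout', 'retries' ]
```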

Modules

The JavaScript library has support for working with ECMAScript 2015 modules, as well as legacy CommonJS modules (still commonly employed by Node.js code bases) and AMD-style modules. The classes ES2015Module, NodeModule, and AMDModule represent these three types of modules, and all three extend the common superclass Module.

The most important member predicates defined by Module are:

Moreover, there is a class Import that models both ECMAScript 2015-style import declarations and CommonJS/AMD-style require calls; its member predicate Import.getImportedModule provides access to the module the import refers to, if it can be determined statically.

Name binding

Name binding is modeled in the JavaScript libraries using four concepts: scopes, variables, variable declarations, and variable accesses, represented by the classes Scope, Variable, VarDecl and VarAccess, respectively.

Scopes

In ECMAScript 5, there are three kinds of scopes: the global scope (one per program), function scopes (one per function), and catch clause scopes (one per catch clause). These three kinds of scopes are represented by the classes GlobalScope, FunctionScope and CatchScope. ECMAScript 2015 adds block scopes for let-bound variables (which are also represented by class Scope), class expression scopes (ClassExprScope), and module scopes (ModuleScope).
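The JavaScript sketch below shows three of these scope kinds in action: a function scope, a block scope for a let-bound variable, and a catch clause scope. The function and variable names are illustrative only.

```javascript
function scopes() {
  const seen = [];
  if (true) {
    var functionScoped = "var";   // belongs to the enclosing function scope
    let blockScoped = "let";      // belongs to the block scope
    seen.push(blockScoped);
  }
  seen.push(functionScoped);      // still visible outside the block
  seen.push(typeof blockScoped);  // "undefined": not visible outside the block
  try {
    throw new Error("boom");
  } catch (err) {                 // `err` lives in the catch clause scope
    seen.push(err.message);
  }
  seen.push(typeof err);          // "undefined": not visible after the catch clause
  return seen;
}

const scopeResult = scopes();
console.log(scopeResult);
```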

Class Scope provides the following API:

Variables

The Variable class models all variables in a JavaScript program, including global variables, local variables, and parameters (both of functions and catch clauses), whether explicitly declared or not.

It is important not to confuse variables and their declarations: local variables may have more than one declaration, while global variables and the implicitly declared local arguments variable need not have a declaration at all.

Variable declarations and accesses

Variables may be declared by variable declarators, by function declaration statements and expressions, by class declaration statements or expressions, or by parameters of functions and catch clauses. While these declarations differ in their syntactic form, in each case there is an identifier naming the declared variable. We consider that identifier to be the declaration proper, and assign it the class VarDecl. Identifiers that reference a variable, on the other hand, are given the class VarAccess.

The most important predicates involving variables, their declarations, and their accesses are as follows:

As an example, consider the following query which finds distinct function declarations that declare the same variable, that is, two conflicting function declarations within the same scope (again excluding minified code):

import javascript

from FunctionDeclStmt f, FunctionDeclStmt g
where
  f != g and
  f.getVariable() = g.getVariable() and
  not f.getTopLevel().isMinified() and
  not g.getTopLevel().isMinified()
select f, g

Some projects declare conflicting functions of the same name and rely on platform-specific behavior to disambiguate the two declarations.
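A simple JavaScript case of two function declarations that declare the same variable is sketched below (names invented). For declarations placed directly in a function body, the later declaration replaces the earlier one, so the first is effectively dead code.

```javascript
function container() {
  function greet() { return "first"; }
  function greet() { return "second"; }  // same variable, same scope
  return greet();
}

const greeting = container();
console.log(greeting); // "second"
```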

Control flow

A different program representation in terms of intraprocedural control flow graphs (CFGs) is provided by the classes in library CFG.qll.

Class ControlFlowNode represents a single node in the control flow graph, which is either an expression, a statement, or a synthetic control flow node. Note that Expr and Stmt do not inherit from ControlFlowNode at the CodeQL level, although their entity types are compatible, so you can explicitly cast from one to the other if you need to map between the AST-based and the CFG-based program representations.

There are two kinds of synthetic control flow nodes: entry nodes (class ControlFlowEntryNode), which represent the beginning of a top-level or function, and exit nodes (class ControlFlowExitNode), which represent their end. They do not correspond to any AST nodes, but simply serve as the unique entry point and exit point of a control flow graph. Entry and exit nodes can be accessed through the predicates StmtContainer.getEntry() and StmtContainer.getExit().

Most, but not all, top-levels and functions have another distinguished CFG node, the start node. This is the CFG node at which execution begins. Unlike the entry node, which is a synthetic construct, the start node corresponds to an actual program element: for top-levels, it is the first CFG node of the first statement; for functions, it is the CFG node corresponding to their first parameter or, if there are no parameters, the first CFG node of the body. Empty top-levels do not have a start node.

For most purposes, using start nodes is preferable to using entry nodes.

The structure of the control flow graph is reflected in the member predicates of ControlFlowNode:

Many control-flow-based analyses are phrased in terms of basic blocks rather than single control flow nodes, where a basic block is a maximal sequence of control flow nodes without branches or joins. The class BasicBlock from BasicBlocks.qll represents all such basic blocks. Similar to ControlFlowNode, it provides member predicates getASuccessor() and getAPredecessor() to navigate the control flow graph at the level of basic blocks, and member predicates getANode(), getNode(int), getFirstNode() and getLastNode() to access individual control flow nodes within a basic block. The predicate Function.getEntryBB() returns the entry basic block in a function, that is, the basic block containing the function’s entry node. Similarly, Function.getStartBB() provides access to the start basic block, which contains the function’s start node. As for CFG nodes, getStartBB() should normally be preferred over getEntryBB().

As an example of an analysis using basic blocks, BasicBlock.isLiveAtEntry(v, u) determines whether variable v is live at the entry of the given basic block, and if so binds u to a use of v that refers to its value at the entry. We can use it to find global variables that are used in a function where they are not live (that is, every read of the variable is preceded by a write), suggesting that the variable was meant to be declared as a local variable instead:

import javascript

from Function f, GlobalVariable gv
where
  gv.getAnAccess().getEnclosingFunction() = f and
  not f.getStartBB().isLiveAtEntry(gv, _)
select f, "This function uses " + gv + " like a local variable."

Many projects have some variables which look as if they were intended to be local.

Data flow

Definitions and uses

Library DefUse.qll provides classes and predicates to determine def-use relationships between definitions and uses of variables.

Classes VarDef and VarUse contain all expressions that define and use a variable, respectively. For the former, you can use predicate VarDef.getAVariable() to find out which variables are defined by a given variable definition (recall that destructuring assignments in ECMAScript 2015 define several variables at the same time). Similarly, predicate VarUse.getVariable() returns the (single) variable being accessed by a variable use.

The def-use information itself is provided by predicate VarUse.getADef(), which connects a use of a variable to a definition of the same variable, where the definition may reach the use.

As an example, the following query finds definitions of local variables that are not used anywhere; that is, the variable is either not referenced at all after the definition, or its value is overwritten:

import javascript

from VarDef def, LocalVariable v
where
  v = def.getAVariable() and
  not exists(VarUse use | def = use.getADef())
select def, "Dead store of local variable."
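The following JavaScript function sketches both kinds of dead store mentioned above: a value that is overwritten before being read, and a variable that is never referenced after its definition. The code is a made-up example; which definitions a given query reports depends on the query itself.

```javascript
function area(width, height) {
  let result = 0;                 // dead store: overwritten before any read
  result = width * height;
  let perimeter = 2 * (width + height); // dead store: never read afterwards
  return result;
}

const areaResult = area(2, 3);
console.log(areaResult); // 6
```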

SSA

A more fine-grained representation of a program’s data flow based on static single assignment form (SSA) is provided by the library semmle.javascript.SSA.

In SSA form, each use of a local variable has exactly one (SSA) definition that reaches it. SSA definitions are represented by class SsaDefinition. They are not AST nodes, since not every SSA definition corresponds to an explicit element in the source code.

Altogether, there are five kinds of SSA definitions:

  1. Explicit definitions (SsaExplicitDefinition): these simply wrap a VarDef, that is, a definition like x = 1 appearing explicitly in the source code.
  2. Implicit initializations (SsaImplicitInit): these represent the implicit initialization of local variables with undefined at the beginning of their scope.
  3. Phi nodes (SsaPhiNode): these are pseudo-definitions that merge two or more SSA definitions where necessary; see the Wikipedia page on static single assignment form for an explanation.
  4. Variable captures (SsaVariableCapture): these are pseudo-definitions appearing at places in the code where the value of a captured variable may change without there being an explicit assignment, for example due to a function call.
  5. Refinement nodes (SsaRefinementNode): these are pseudo-definitions appearing at places in the code where something becomes known about a variable; for example, a conditional if (x === null) induces a refinement node at the beginning of its “then” branch recording the fact that x is known to be null there. (In the literature, these are sometimes known as “pi nodes.”)
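To make the five kinds more concrete, here is a small JavaScript function annotated with comments indicating roughly where each kind of SSA definition could arise. The annotations are an informal reading of the descriptions above, not output from the library, and the exact placement chosen by the analysis is an assumption.

```javascript
function ssaExample(flag, input) {
  var label;                        // implicit initialization: `label` is undefined here
  var count = 0;                    // explicit definition of `count`
  const bump = () => { count++; };  // `count` is captured by this closure
  if (flag) {
    label = "yes";                  // explicit definition of `label`
  } else {
    label = "no";                   // explicit definition of `label`
  }
  // phi node: the two definitions of `label` are merged here
  bump();                           // variable capture: `count` may change during the call
  if (input === null) {
    // refinement node: `input` is known to be null on this branch
    input = "default";              // explicit definition of `input`
  }
  // phi node: the parameter value and "default" are merged here
  return [label, count, input];
}

const ssaResult = ssaExample(true, null);
console.log(ssaResult); // [ 'yes', 1, 'default' ]
```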

Data flow nodes

Moving beyond just variable definitions and uses, library semmle.javascript.dataflow.DataFlow provides a representation of the program as a data flow graph. Its nodes are values of class DataFlow::Node, which has two subclasses ValueNode and SsaDefinitionNode. Nodes of the former kind wrap an expression or a statement that is considered to produce a value (specifically, a function or class declaration statement, or a TypeScript namespace or enum declaration). Nodes of the latter kind wrap SSA definitions.

You can use the predicate DataFlow::valueNode to convert an expression, function or class into its corresponding ValueNode, and similarly DataFlow::ssaDefinitionNode to map an SSA definition to its corresponding SsaDefinitionNode.

There is also an auxiliary predicate DataFlow::parameterNode that maps a parameter to its corresponding data flow node. (This is really just a convenience wrapper around DataFlow::ssaDefinitionNode, since parameters are also considered to be SSA definitions.)

Going in the other direction, there is a predicate ValueNode.getAstNode() for mapping from ValueNodes to ASTNodes, and SsaDefinitionNode.getSsaVariable() for mapping from SsaDefinitionNodes to SsaVariables. There is also a utility predicate Node.asExpr() that gets the underlying expression for a ValueNode, and is undefined for all nodes that do not correspond to an expression. (Note in particular that this predicate is not defined for ValueNodes wrapping function or class declaration statements!)

You can use the predicate DataFlow::Node.getAPredecessor() to find other data flow nodes from which values may flow into this node, and getASuccessor for the other direction.

For example, here is a query that finds all invocations of a method called send on a value that comes from a parameter named res, indicating that it is perhaps sending an HTTP response:

import javascript

from SimpleParameter res, DataFlow::Node resNode, MethodCallExpr send
where
  res.getName() = "res" and
  resNode = DataFlow::parameterNode(res) and
  resNode.getASuccessor+() = DataFlow::valueNode(send.getReceiver()) and
  send.getMethodName() = "send"
select send

Note that the data flow modeling in this library is intraprocedural, that is, flow across function calls and returns is not modeled. Likewise, flow through object properties and global variables is not modeled.
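The JavaScript sketch below illustrates the difference between flow that stays within one function and flow that crosses a call. The handler functions and the fakeRes object are invented stand-ins for a real HTTP framework; which calls a query actually reports depends on the library version.

```javascript
// The receiver of send is reached from the parameter `res` by local flow only.
function handler(req, res) {
  const response = res;
  response.send("hello");
}

// Here the receiver comes from a parameter named `out`; reaching it from `res`
// in `delegating` would require following the call, which is interprocedural.
function helper(out) {
  out.send("from helper");
}
function delegating(req, res) {
  helper(res);
}

// Minimal stand-in for a response object so the snippet can be run.
const sent = [];
const fakeRes = { send: (body) => sent.push(body) };
handler({}, fakeRes);
delegating({}, fakeRes);
console.log(sent); // [ 'hello', 'from helper' ]
```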

Type inference

The library semmle.javascript.dataflow.TypeInference implements a simple type inference for JavaScript based on intraprocedural, heap-insensitive flow analysis. Basically, the inference algorithm approximates the possible concrete runtime values of variables and expressions as sets of abstract values (represented by the class AbstractValue), each of which stands for a set of concrete values.

For example, there is an abstract value representing all non-zero numbers, and another representing all non-empty strings except for those that can be converted to a number. Both of these abstract values are fairly coarse approximations that represent very large sets of concrete values.

Other abstract values are more precise, to the point where they represent single concrete values: for example, there is an abstract value representing the concrete null value, and another representing the number zero.

There is a special group of abstract values called indefinite abstract values that represent all concrete values. The analysis uses these to handle expressions for which it cannot infer a more precise value, such as function parameters (as mentioned above, the analysis is intraprocedural and hence does not model argument passing) or property reads (the analysis does not model property values either).

Each indefinite abstract value is associated with a string value describing the cause of imprecision. In the above examples, the indefinite value for the parameter would have cause "call", while the indefinite value for the property would have cause "heap".

To check whether an abstract value is indefinite, you can use the isIndefinite member predicate. Its single argument describes the cause of imprecision.

Each abstract value has one or more associated types (CodeQL class InferredType), corresponding roughly to the type tags computed by the typeof operator. The types are null, undefined, boolean, number, string, function, class, date and object.
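The correspondence with typeof is only rough, as the following JavaScript snippet shows: typeof has no separate tags for null, classes, or dates, whereas the list of inferred types above distinguishes them. The sample values are arbitrary.

```javascript
const samples = [null, undefined, true, 1, "s", function () {}, class {}, new Date(), {}];
const tags = samples.map((value) => typeof value);

console.log(tags);
// [ 'object', 'undefined', 'boolean', 'number', 'string',
//   'function', 'function', 'object', 'object' ]
```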

To access the results of the type inference, use class DataFlow::AnalyzedNode: any DataFlow::Node can be cast to this class, and additionally there is a convenience predicate Expr::analyze that maps expressions directly to their corresponding AnalyzedNodes.

Once you have an AnalyzedNode, you can use predicate AnalyzedNode.getAValue() to access the abstract values inferred for it, and getAType() to get the inferred types.

For example, here is a query that looks for null checks on expressions that cannot, in fact, be null:

import javascript

from StrictEqualityTest eq, DataFlow::AnalyzedNode nd, NullLiteral null
where
  eq.hasOperands(nd.asExpr(), null) and
  not nd.getAValue().isIndefinite(_) and
  not nd.getAValue() instanceof AbstractNull
select eq, "Spurious null check."

To paraphrase, the query looks for equality tests eq where one operand is a null literal and the other some expression that we convert to an AnalyzedNode. If the type inference results for that node are precise (that is, none of the inferred values is indefinite) and (the abstract representation of) null is not among them, we flag eq.

You can add custom type inference rules by defining new subclasses of DataFlow::AnalyzedNode and overriding getAValue. You can also introduce new abstract values by extending the abstract class CustomAbstractValueTag, which is a subclass of string: each string belonging to that class induces a corresponding abstract value of type CustomAbstractValue. You can use the predicate CustomAbstractValue.getTag() to map from the abstract value to its tag. By implementing the abstract predicates of class CustomAbstractValueTag you can define the semantics of your custom abstract values, such as what primitive value they coerce to and what type they have.

Call graph

The JavaScript library implements a simple call graph construction algorithm to statically approximate the possible call targets of function calls and new expressions. Due to the dynamically typed nature of JavaScript and its support for higher-order functions and reflective language features, building static call graphs is quite difficult. Simple call graph algorithms tend to be incomplete, that is, they often fail to resolve all possible call targets. More sophisticated algorithms can suffer from the opposite problem of imprecision, that is, they may infer many spurious call targets.

The call graph is represented by the member predicate getACallee() of class DataFlow::InvokeNode, which computes possible callees of the given invocation, that is, functions that may at runtime be invoked by this expression.

Furthermore, there are three member predicates that indicate the quality of the callee information for this invocation:

As an example of a call-graph-based query, here is a query to find invocations for which the call graph builder could not find any callees, despite the analysis being complete for this invocation:

import javascript

from DataFlow::InvokeNode invk
where
  not invk.isIncomplete() and
  not exists(invk.getACallee())
select invk, "Unable to find a callee for this invocation."

Inter-procedural data flow

The data flow graph-based analyses described so far are all intraprocedural: they do not take flow from function arguments to parameters or from a return to the function’s caller into account. The data flow library also provides a framework for constructing custom inter-procedural analyses.

We distinguish here between data flow proper, and taint tracking: the latter not only considers value-preserving flow (such as from variable definitions to uses), but also cases where one value influences (“taints”) another without determining it entirely. For example, in the assignment s2 = s1.substring(i), the value of s1 influences the value of s2, because s2 is assigned a substring of s1. In general, s2 will not be assigned s1 itself, so there is no data flow from s1 to s2, but s1 still taints s2.
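In JavaScript terms, the distinction looks like this. The strings are arbitrary examples: copying a value preserves it, whereas substring produces a new value that merely depends on the original.

```javascript
const s1 = "user=alice";

const copy = s1;              // data flow: `copy` holds the very same value as s1
const s2 = s1.substring(5);   // taint: s2 is derived from s1 but is a different value

console.log(copy === s1);     // true
console.log(s2);              // "alice"
console.log(s2 === s1);       // false
```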

A common pattern is to specify a data flow or taint analysis in terms of its sources (where flow starts), sinks (where flow should be tracked to), and barriers, also called sanitizers (where flow is interrupted). Sanitizers are very common in security analyses: for example, an analysis that tracks the flow of untrusted user input into, say, a SQL query has to keep track of code that validates the input, thereby making it safe to use. Such a validation step is an example of a sanitizer.

A module implementing the signature DataFlow::ConfigSig may specify a data flow or taint analysis by implementing the following predicates:

Such a module can be passed to DataFlow::Global<...>. This will produce a module with a flow predicate that performs the actual flow tracking, starting at a source and looking for flow to a sink that does not pass through a barrier node.

For example, suppose that we are developing an analysis to find hard-coded passwords. We might write a simple query that looks for string constants flowing into variables named "password".

import javascript

module PasswordConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node nd) { nd.asExpr() instanceof StringLiteral }

  predicate isSink(DataFlow::Node nd) { passwordVarAssign(_, nd) }
}

predicate passwordVarAssign(Variable v, DataFlow::Node nd) {
  v.getAnAssignedExpr() = nd.asExpr() and
  v.getName().toLowerCase() = "password"
}

module PasswordFlow = DataFlow::Global<PasswordConfig>;

Now we can rephrase our query to use PasswordFlow::flow:

from DataFlow::Node source, DataFlow::Node sink, Variable v
where PasswordFlow::flow(source, sink) and passwordVarAssign(v, sink)
select sink, "Password variable " + v + " is assigned a constant string."
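As an illustration of what an inter-procedural analysis of this kind is meant to catch, consider the following JavaScript snippet. The names and the literal are invented. The string constant does not appear directly in the assignment to password; it reaches it through a function return, which an intraprocedural analysis would not follow.

```javascript
const DEFAULT_SECRET = "hunter2";   // source: a string literal

function defaultSecret() {
  return DEFAULT_SECRET;            // the constant flows out through the return value
}

let password = defaultSecret();     // sink: assigned to a variable named `password`

console.log(password === "hunter2"); // true
```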

Syntax errors

JavaScript code that contains syntax errors cannot usually be analyzed. For such code, the lexical and syntactic representations are not available, and hence no name binding information, call graph or control and data flow. All that is available in this case is a value of class JSParseError representing the syntax error. It provides information about the syntax error location (JSParseError is a subclass of Locatable) and the error message through predicate JSParseError.getMessage.

Note that for some very simple syntax errors the parser can recover and continue parsing. If this happens, lexical and syntactic information is available in addition to the JSParseError values representing the (recoverable) syntax errors encountered during parsing.

Frameworks

AngularJS

The semmle.javascript.frameworks.AngularJS library provides support for working with AngularJS (Angular 1.x) code. Its most important classes are:

HTTP framework libraries

The library semmle.javascript.frameworks.HTTP provides classes modeling common concepts from various HTTP frameworks.

Currently supported frameworks are Express, the standard Node.js http and https modules, Connect, Koa, Hapi and Restify.

The most important classes include (all in module HTTP):

For each framework library, there is a corresponding CodeQL library (for example semmle.javascript.frameworks.Express) that instantiates the above classes for that framework and adds framework-specific classes.
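Because these classes are framework-independent, one query can cover all supported frameworks at once. The following is a sketch under the assumption that HTTP::RouteHandler is one of the classes in module HTTP and that it is a data flow node; depending on the library version, the module may be spelled Http instead.

```ql
import javascript

// Assumption: HTTP::RouteHandler models route handler functions for all
// supported frameworks (Express, Koa, Hapi, and so on).
from HTTP::RouteHandler handler
select handler, "Route handler in " + handler.getFile().getBaseName() + "."
```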

Node.js

The semmle.javascript.NodeJS library provides support for working with Node.js modules through the following classes:

As an example of the use of these classes, here is a query that counts for every module how many other modules it imports:

import javascript

from NodeModule m select m, count(m.getAnImportedModule())

When you analyze a project, for each module you can see how many other modules it imports.
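A small variation, sketched here with the same getAnImportedModule predicate, finds modules that import no other module at all:

```ql
import javascript

// Modules for which no imported module can be resolved.
from NodeModule m
where not exists(m.getAnImportedModule())
select m, "This module does not import any other module."
```

Note that this only considers imports the library can resolve to a module in the database, so a module that only requires external packages may also be reported.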

NPM

The semmle.javascript.NPM library provides support for working with NPM packages through the following classes:

As an example of the use of these classes, here is a query that identifies unused dependencies, that is, module dependencies that are listed in the package.json file, but which are not imported by any require call:

import javascript

from NPMPackage pkg, PackageDependencies deps, string name
where
  deps = pkg.getPackageJSON().getDependencies() and
  deps.getADependency(name, _) and
  not exists(Require req | req.getTopLevel() = pkg.getAModule() |
    name = req.getImportedPath().getValue()
  )
select deps, "Unused dependency '" + name + "'."
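As a simpler starting point, the following sketch lists every declared dependency. It assumes that the second argument of getADependency, which the query above ignores, is the version string from package.json.

```ql
import javascript

// Assumption: getADependency(name, version) binds `version` to the version
// range declared for `name` in package.json.
from NPMPackage pkg, PackageDependencies deps, string name, string version
where
  deps = pkg.getPackageJSON().getDependencies() and
  deps.getADependency(name, version)
select deps, "Package depends on '" + name + "' at version '" + version + "'."
```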

React

The semmle.javascript.frameworks.React library provides support for working with React code through the ReactComponent class, which models a React component defined either in the functional style or the class-based style (both ECMAScript 2015 classes and old-style React.createClass classes are supported).

Databases

The class SQL::SqlString represents an expression that is interpreted as a SQL command. Currently, we model SQL commands issued through the following npm packages: mysql, pg, pg-pool, sqlite3, mssql and sequelize.

Similarly, the class NoSQL::Query represents an expression that is interpreted as a NoSQL query by the mongodb or mongoose package.

Finally, the class DatabaseAccess contains all data flow nodes that perform a database access using any of the packages above.

For example, here is a query to find SQL queries that use string concatenation (instead of a templating-based solution, which is usually safer):

import javascript

from SQL::SqlString ss where ss instanceof AddExpr select ss, "Use templating instead of string concatenation."
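The DatabaseAccess class can be used in a similar way to get an overview of all database accesses, regardless of the package involved. This is a sketch that assumes DatabaseAccess provides a getAQueryArgument predicate returning the data flow node interpreted as the query.

```ql
import javascript

// Assumption: getAQueryArgument() yields the argument that is interpreted
// as a query by the database access.
from DatabaseAccess access, DataFlow::Node query
where query = access.getAQueryArgument()
select access, "Database access whose query is given by $@.", query, "this argument"
```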

Miscellaneous

Externs

The semmle.javascript.Externs library provides support for working with externs through the following classes:

Variables and functions declared in an externs file are either globals (represented by class ExternalGlobalDecl), or members (represented by class ExternalMemberDecl).

Members are further subdivided into static members (class ExternalStaticMemberDecl) and instance members (class ExternalInstanceMemberDecl).

For more details on these and other classes representing externs, see the API documentation.

HTML

The semmle.javascript.HTML library provides support for working with HTML documents. They are represented as a tree of HTML::Element nodes, each of which may have zero or more attributes represented by class HTML::Attribute.

Similar to the abstract syntax tree representation, HTML::Element has member predicates getChild(i) and getParent() to navigate from an element to its ith child element and its parent element, respectively. Use predicate HTML::Element.getAttribute(i) to get the ith attribute of the element, and HTML::Element.getAttributeByName(n) to get the attribute with name n.

For HTML::Attribute, predicates getName() and getValue() provide access to the attribute’s name and value, respectively.

Both HTML::Element and HTML::Attribute have a predicate getRoot() that gets the root HTML::Element of the document to which they belong.
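As an illustration of these predicates, here is a sketch of a query that finds links opening a new window without a rel attribute. It assumes that HTML::Element has a getName() predicate returning the tag name; the other predicates are the ones described above.

```ql
import javascript

from HTML::Element a
where
  // Assumption: getName() returns the element's tag name.
  a.getName() = "a" and
  a.getAttributeByName("target").getValue() = "_blank" and
  not exists(a.getAttributeByName("rel"))
select a, "Link opens a new window but has no rel attribute."
```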

JSDoc

The semmle.javascript.JSDoc library provides support for working with JSDoc comments. Documentation comments are parsed into an abstract syntax tree representation closely following the format employed by the Doctrine JSDoc parser.

A JSDoc comment as a whole is represented by an entity of class JSDoc, while individual tags are represented by class JSDocTag. Important member predicates of these two classes include:

Types in JSDoc comments are represented by the class JSDocTypeExpr and its subclasses, which again represent type expressions as abstract syntax trees. Examples of type expressions are JSDocAnyTypeExpr, representing the “any” type *, or JSDocNullTypeExpr, representing the null type.

As an example, here is a query that finds @param tags that do not specify the name of the documented parameter:

import javascript

from JSDocTag t where t.getTitle() = "param" and not exists(t.getName()) select t, "@param tag is missing name."

For full details on these and other classes representing JSDoc comments and type expressions, see the API documentation.
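A related sketch finds @param tags that name a parameter but give no type. It assumes that JSDocTag has a getType() predicate returning the tag's JSDocTypeExpr, if any.

```ql
import javascript

from JSDocTag t
where
  t.getTitle() = "param" and
  exists(t.getName()) and
  // Assumption: getType() has no result when the tag omits a type expression.
  not exists(t.getType())
select t, "@param tag for '" + t.getName() + "' does not specify a type."
```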

JSX

The semmle.javascript.JSX library provides support for working with JSX code.

Similar to the representation of HTML documents, JSX fragments are modeled as a tree of JSXElements, each of which may have zero or more JSXAttributes.

However, unlike HTML, JSX is interleaved with JavaScript, hence JSXElement is a subclass of Expr. Like HTML::Element, it has predicates getAttribute(i) and getAttributeByName(n) to look up attributes of a JSX element. Its body elements can be accessed by predicate getABodyElement(); note that the results of this predicate are arbitrary expressions, which may either be further JSXElements, or other expressions that are interpolated into the body of the outer element.

JSXAttribute, again not unlike HTML::Attribute, has predicates getName() and getValue() to access the attribute name and value.
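For example, a query along these lines finds JSX elements that set raw HTML. It is a minimal sketch using only getAttributeByName; the attribute name dangerouslySetInnerHTML is the React convention and is chosen here purely for illustration.

```ql
import javascript

from JSXElement elt, JSXAttribute attr
where attr = elt.getAttributeByName("dangerouslySetInnerHTML")
select attr, "JSX element sets raw HTML through this attribute."
```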

JSON

The semmle.javascript.JSON library provides support for working with JSON files that were processed by the JavaScript extractor when building the CodeQL database.

JSON files are modeled as trees of JSON values. Each JSON value is represented by an entity of class JSONValue, which provides the following member predicates:

Note that JSONValue is a subclass of Locatable, so the usual member predicates of Locatable can be used to determine the file in which a JSON value appears, and its location within that file.
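As a sketch of how these pieces fit together, the following query reports the name declared in each package.json file. It assumes that JSONObject and JSONString are subclasses of JSONValue, that JSONObject.getPropValue(name) looks up a property, and that JSONString.getValue() returns the string contents.

```ql
import javascript

from JSONObject obj, JSONString name
where
  // A value without a parent is the top-level value of its file.
  not exists(obj.getParent()) and
  obj.getFile().getBaseName() = "package.json" and
  // Assumption: getPropValue looks up the property with the given name.
  name = obj.getPropValue("name")
select obj, "Package manifest for '" + name.getValue() + "'."
```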

Class JSONValue has the following subclasses:

Regular expressions

The semmle.javascript.Regexp library provides support for working with regular expression literals. The syntactic structure of regular expression literals is represented as an abstract syntax tree of regular expression terms, modeled by the class RegExpTerm. Similar to ASTNode, class RegExpTerm provides member predicates getParent() and getChild(i) to navigate the structure of the syntax tree.

Various subclasses of RegExpTerm model different kinds of regular expression constructs and operators; see the API documentation for details.
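As a small illustration, the following sketch finds occurrences of the pattern .* by navigating from a quantifier to its operand. It assumes that RegExpStar and RegExpDot are among the subclasses of RegExpTerm and that the operand of a star is its child at index 0.

```ql
import javascript

// Assumption: the quantified term is the first child of a RegExpStar.
from RegExpStar star
where star.getChild(0) instanceof RegExpDot
select star, "This term matches an arbitrary sequence of characters."
```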

YAML

The semmle.javascript.YAML library provides support for working with YAML files that were processed by the JavaScript extractor when building the CodeQL database.

YAML files are modeled as trees of YAML nodes. Each YAML node is represented by an entity of class YAMLNode, which provides, among others, the following member predicates:

The various kinds of scalar values available in YAML are represented by classes YAMLInteger, YAMLFloat, YAMLTimestamp, YAMLBool, YAMLNull and YAMLString. Their common superclass is YAMLScalar, which has a member predicate getValue() to obtain the value of a scalar as a string.

YAMLMapping and YAMLSequence represent mappings and sequences, respectively, and are subclasses of YAMLCollection.

Alias nodes are represented by class YAMLAliasNode, while YAMLMergeKey and YAMLInclude represent merge keys and !include directives, respectively.

Predicate YAMLMapping.maps(key, value) models the key-value relation represented by a mapping, taking merge keys into account.
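For example, the following sketch uses maps to find the value associated with every key named version. The key name is chosen only for illustration.

```ql
import javascript

from YAMLMapping m, YAMLScalar key, YAMLNode value
where
  m.maps(key, value) and
  key.getValue() = "version"
select value, "Value of a 'version' key."
```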