proposal: html/template: add a hardened version of it to the standard library

Overview

Add a hardened version of html/template to the Go standard library. This new template package will incorporate security engineering best practices employed within Google to guarantee, with a high degree of confidence, that the HTML rendered by the template system is safe against code injection.

Background

Package html/template implements data-driven templates for generating HTML output that is safe against code injection. html/template is a "contextually auto-escaping template engine": it treats template data as untrusted plain text and escapes them so that they can be safely embedded in its HTML output. The kind of escaping applied to the data depends on the context that the data appears in (e.g. HTML, JS, CSS, URI).

Issues with `html/template`

While html/template is significantly better than text/template, string-formatting functions (e.g. fmt.Sprintf), and ad-hoc string concatenation for generating HTML safe against code injection, it has several shortcomings which I describe in the following sections.

Typed strings

html/template provides developers with a set of typed strings (e.g. template.HTML, template.JS) to flag known-safe template data that is intended to be used without escaping or validation. This mechanism is necessary to accommodate use cases in real-world applications with complex dataflows, where developers want HTML markup, trusted data (e.g. programmer-controlled strings), or already-sanitized content that is generated in one part of their application to be preserved when it is rendered in HTML templates in remote parts of the application.

Unfortunately, these typed strings are easily misused. The most obvious way to misuse these types is to create them from arbitrary, dynamic strings. This essentially disables contextual auto-escaping, thereby negating the benefit of using html/template in the first place. For example:

func RenderHTML(head, url string, b *bytes.Buffer) {
	t := template.Must(template.New("").Parse(`{{.Head}} <a href="{{.URL}}">Link</a>`))
	type data struct {
		Head template.HTML
		URL  template.URL
	}
	// This is dangerous if either head or url is not properly validated or sanitized.
	t.Execute(b, data{template.HTML(head), template.URL(url)})
}

See here and here for real-world examples of code that explicitly opts out of auto-escaping.

In other cases, developers make more of an effort to sanitize strings before converting them into typed strings. Unfortunately, this is an error-prone process that often incorrectly duplicates the work that html/template already does under the hood. For example:

func createButton(msg string) template.HTML {
	jsEscapedMsg := template.JSEscapeString(msg)
	return template.HTML(
		fmt.Sprintf(`<button onclick="alert('%s')">Click me!</button>`, jsEscapedMsg))
}

While the generated button element might appear to be safe since msg is JavaScript-escaped, it is vulnerable to XSS due to the lack of HTML-escaping. When the browser evaluates this markup, it first HTML-unescapes the value of the onclick attribute before evaluating the JavaScript expression. Therefore, a value of msg like &#39;);attackScript();// will be HTML-unescaped and evaluated as alert('');attackScript();//'), which results in the execution of the attacker's script. [1]

See here for a real-world example of code that does not validate or escape untrusted URLs before embedding them into a template.HTML value.

The lack of constraints on producing html/template typed string values seems to encourage developers to move more HTML-generation logic outside of templates into error-prone, hand-written routines. See real-world examples here and here.

Each conversion into an html/template typed string represents a potential vulnerability, and therefore must be carefully reviewed by a reviewer knowledgeable about the subtleties of HTML-injection bugs. The reviewer must ensure that the string being converted is in fact safe to use in the type's corresponding context for all possible values of that string. Asserting this property is difficult when the data flow into the typed string conversion is sufficiently complex. Moreover, these typed string conversions make it possible for future changes in one (upstream) part of the application to cause security bugs in another (downstream, remote) part of the application. Therefore, the more often typed strings are used in a Go program, the more difficult it is to guarantee that it produces HTML that is free of code-injection vulnerabilities.

Single URL context

html/template does not differentiate between URLs that load code and those that do not. This has significant security implications. For example, the template:

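<a href="{{.URL}}">Click me!</a>
<script src="{{.URL}}"></script>

will produce the following HTML output when URL is "http://www.untrustedsite.com/script.js":

<a href="http://www.untrustedsite.com/script.js">Click me!</a>
<script src="http://www.untrustedsite.com/script.js"></script>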

html/template did not filter out the URL value because it contains the benign http scheme. While http://www.untrustedsite.com/script.js is safe to navigate to as a link (i.e. since the navigation will not cause untrusted, same-origin script execution in the browser), it is not safe to load an executable script from (i.e. since it will be loaded over HTTP and the contents of the script are not trusted).
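The behaviour described above can be reproduced with a short program. This is a sketch under the assumption that one URL value feeds both a link and a script element; the exact markup and the render helper are illustrative, not taken from the proposal.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// page uses the same URL value in a navigation context (href) and in a
// code-loading context (script src). The markup is an assumption.
var page = template.Must(template.New("page").Parse(
	`<a href="{{.}}">Click me!</a><script src="{{.}}"></script>`))

// render executes page with url as its data and returns the generated HTML.
func render(url string) string {
	var sb strings.Builder
	if err := page.Execute(&sb, url); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	// An http URL passes the scheme filter in both contexts.
	fmt.Println(render("http://www.untrustedsite.com/script.js"))

	// A javascript: URL is replaced with a failsafe value in both contexts.
	fmt.Println(render("javascript:alert(1)"))
}
```

Both attributes receive the same treatment: the filter only looks at the scheme, so it cannot tell a link apart from a script source.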

JavaScript and CSS parsing

html/template allows template data to be interpolated into JavaScript (JS) and Cascading Style Sheet (CSS) contexts. It parses the JS and CSS surrounding the template data in order to contextually escape the data. For example, the following template:

"}});

is rendered by html/template as:

This functionality is problematic for two reasons. The first is that JS and CSS parsing is error-prone. JS and CSS are rapidly evolving languages that our parser might not always handle correctly; layering parsers for these two languages on top of our mixed HTML-template language parser introduces more complexity and points of failure to the package.

The second issue is that this feature encourages the security anti-pattern of using inline scripts and stylesheets. This prevents the adoption of strict Content Security Policy (CSP), where all scripts loaded by the browser undergo explicit validation before being executed. See here for more details on why inline scripts are dangerous.

Blacklist-based sanitization

html/template only understands the semantics of a certain subset of HTML elements and attributes. Elements and attributes outside of this set are assumed to have no special semantics. [2]

This sanitization policy is too permissive. Elements or attributes that are not understood by the html/template escaper may have semantics that are security-sensitive, particularly custom elements and those introduced in future revisions of the HTML standard. Properly sanitizing these elements and attributes in the future will require backward-incompatible changes to html/template.

Dynamic template sources

html/template allows templates to be parsed from arbitrary strings and filenames. This makes the templates themselves susceptible to injection attacks. For example,

t1 := template.Must(template.New("").Parse(`<html>` + bodyTmpl + `</html>`))
t2 := template.Must(template.New("").ParseFiles(filename))

If an attacker can fully or partially control the values of bodyTmpl or filename, then the attacker can control the templates being loaded and hence the HTML output. Such an attack completely undermines the assumption that template authors are trustworthy.
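A minimal sketch of the difference between putting a value into the template source and passing it as template data follows. Both helper functions are hypothetical and exist only for this comparison.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderWithDynamicSource concatenates bodyTmpl into the template text, so
// its contents are trusted as if the template author had written them.
func renderWithDynamicSource(bodyTmpl string) (string, error) {
	t, err := template.New("page").Parse(`<html>` + bodyTmpl + `</html>`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// renderWithConstantSource keeps the template text constant and passes body
// as data, so it is escaped like any other untrusted value.
func renderWithConstantSource(body string) (string, error) {
	t, err := template.New("page").Parse(`<html>{{.}}</html>`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, body); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	payload := `<script>alert(1)</script>`

	dynamic, err := renderWithDynamicSource(payload)
	fmt.Println(dynamic, err)

	constant, err := renderWithConstantSource(payload)
	fmt.Println(constant, err)
}
```

In the first call the payload is part of the trusted template text and is emitted as markup; in the second it is data and is escaped.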

Proposed solution

Add a new safehtml/template library that addresses the above issues. In particular, this package will:

Replace html/template typed strings with a set of types with a richer, but constrained API. These types will live in a separate package safehtml, which will provide a safe-by-design API for constructing values of these types. The following sketch illustrates a subset of this API:
package safehtml
// An HTML is an immutable string-like type that is safe to use in HTML
// contexts in DOM APIs and HTML documents.
type HTML struct { str string }
// HTMLEscaped returns an HTML whose value is text, with the characters [&<>"'] escaped.
func HTMLEscaped(text string) HTML { ... }
// HTMLConcat returns an HTML which contains, in order, the string representations
// of the given htmls.
func HTMLConcat(htmls ...HTML) HTML { ... }
// A URL is an immutable string-like type that is safe to use in URL contexts in
// DOM APIs and HTML documents.
type URL struct { str string }
// URLSanitized returns a URL whose value is url, validating that the input string matches
// a pattern of commonly used safe URLs. If url fails validation, this method returns a
// URL containing InnocuousURL.
func URLSanitized(url string) URL { ... }
// A TrustedResourceURL is an immutable string-like type referencing the
// application’s own, trusted resources. It can be used to safely load scripts,
// CSS and other sensitive resources without the risk of untrusted code execution.
type TrustedResourceURL struct { str string }
// TrustedResourceURLFromConstant constructs a TrustedResourceURL with its underlying
// URL set to the given url, which must be an untyped string constant.
func TrustedResourceURLFromConstant(url stringConstant) TrustedResourceURL { ... }
// stringConstant is an unexported string type. Users of this package cannot
// create values of this type except by passing an untyped string constant to
// functions which expect a stringConstant. This type must be used only in
// function and method parameters.
type stringConstant string
Package safehtml should provide constructors that satisfy most common use cases for constructing known-safe values outside of a template system. Values of these types carry strong security guarantees about the strings they encapsulate; when passed around a Go program, they enable developers to depend on these properties without having to reason about whole-program dataflows. Not surprisingly, safehtml/template will also produce safehtml.HTML. This allows values from separately evaluated HTML templates to be composed, while maintaining strong security guarantees.
// ExecuteToHTML applies a parsed template to the specified data object,
// returning the output as a safehtml.HTML value.
// A template may be executed safely in parallel.
func (t *Template) ExecuteToHTML(data interface{}) (safehtml.HTML, error) { ... }
For the minority of use cases that the safehtml API does not accommodate, we will provide a separate safehtml/uncheckedconversions package that converts plain strings into safe types:
// Package uncheckedconversions contains functions that convert arbitrary strings
// into values of package safehtml types.
//
// Using this package may undermine the security guarantees provided by package safehtml.
// Users of this package are responsible for ensuring that the strings being
// converted comply with the respective safehtml type contracts.
package uncheckedconversions
func HTMLFromStringKnownToSatisfyTypeContract(s string) safehtml.HTML { ... }
func URLFromStringKnownToSatisfyTypeContract(s string) safehtml.URL { ... }
func TrustedResourceURLFromStringKnownToSatisfyTypeContract(s string) safehtml.TrustedResourceURL { ... }
These unsafe constructors will live in their own separate package, much like how memory-unsafe operations all live in package unsafe, and crypto APIs prone to misuse live in package crypto/subtle. This makes code easier to security-review (i.e. "if the program doesn't import uncheckedconversions, the HTML produced by safehtml/template is definitely safe"), makes it easier to restrict the use of these functions (e.g. build systems like Bazel allow package-level visibility restrictions), and will hopefully discourage developers from unnecessarily reaching for these conversions (i.e. by requiring them to import the package and call a dangerous-sounding function).
Add different sanitization contexts for URLs that load code and those that do not. These contexts map to the safehtml.TrustedResourceURL and safehtml.URL types described above. The former type of URL will be validated more strictly than the latter.
Disallow template data from being interpolated into JS and CSS contexts. The template parser will no longer attempt to parse JS or CSS. We will allow safehtml.Script and safehtml.StyleSheet values to be used in these contexts, but the constructor API around these types will be deliberately constrained. We might potentially add a switch that causes safehtml/template to disallow inline scripts and stylesheets completely, even if they appear in the programmer-controlled template text. This switch will help developers ensure that all the markup produced by their templates is CSP-compatible.
Use whitelist-based sanitization. Elements or attributes not explicitly understood by the sanitizer will be disallowed by default. Developers must explicitly whitelist these attributes using an API along the lines of:
// Sanitize sanitizes the given element body or attribute value so that
// it is safe to include in the template output. It returns an error only
// if template execution must halt.
type Sanitize func(val interface{}) (string, error)
// The following Allow* methods apply to all templates associated with t.
// AllowElement allows actions to appear in the body of the named element.
// The given function will be called to sanitize the output of these actions.
func (t *Template) AllowElement(elem string, s Sanitize) error { ... }
// AllowAttribute allows actions to appear in the value of the named attribute.
// The given function will be called to sanitize the output of these actions.
func (t *Template) AllowAttribute(attr string, s Sanitize) error { ... }
// AllowAttributeInElement allows actions to appear in the value of the named
// attribute when that attribute appears in the named element.
// The given function will be called to sanitize the output of these actions.
func (t *Template) AllowAttributeInElement(attr, elem string, s Sanitize) error { ... }
Provide a safe-by-design API for loading template text. This API will only allow templates to be loaded from programmer-controlled strings (i.e. untyped string constants) or resources under application control (e.g. environment variables, command-line flags). The following is a snippet of this new template-loading API:
func (t *Template) ParseFiles(filenames ...stringConstant) (*Template, error) { ... }
func (t *Template) ParseFilesFromTrustedSources(filenames ...TrustedSource) (*Template, error) { ... }
// A TrustedSource is an immutable string-like type referencing trusted template files under application control.
type TrustedSource struct { str string }
func TrustedSourceFromConstant(src stringConstant) TrustedSource { return TrustedSource{string(src)} }
func TrustedSourceFromFlag(value flag.Value) TrustedSource { return TrustedSource{value.String()} }
func TrustedSourceFromEnvVar(key stringConstant) TrustedSource { return TrustedSource{os.Getenv(string(key))} }
func TrustedSourceJoin(elem ...TrustedSource) TrustedSource { return TrustedSource{filepath.Join(trustedSourcesToStrings(elem)...)} }
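
The type-contract idea behind the safehtml sketch above can be illustrated with a small, self-contained program. This is a simplified sketch and not the actual safehtml implementation: the URL pattern, the InnocuousURL value, and the String methods are assumptions made for the example. Note also that the stringConstant restriction only takes effect for callers outside the defining package; here everything lives in package main to keep the example runnable.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// stringConstant is unexported, so code in other packages can only supply
// an untyped string constant where it is expected.
type stringConstant string

// HTML wraps a string that is safe to use in HTML contexts. The field is
// unexported, so values can only be produced by the constructors below.
type HTML struct{ str string }

func (h HTML) String() string { return h.str }

// HTMLEscaped returns an HTML whose value is text with [&<>"'] escaped.
func HTMLEscaped(text string) HTML { return HTML{html.EscapeString(text)} }

// HTMLConcat joins already-safe HTML values in order.
func HTMLConcat(htmls ...HTML) HTML {
	var sb strings.Builder
	for _, h := range htmls {
		sb.WriteString(h.str)
	}
	return HTML{sb.String()}
}

// URL wraps a string that is safe to use in URL contexts.
type URL struct{ str string }

func (u URL) String() string { return u.str }

// InnocuousURL is a placeholder value chosen for this sketch.
const InnocuousURL = "about:invalid"

// safeURLPattern is an assumed pattern: http, https, and mailto schemes,
// or a URL with no scheme at all.
var safeURLPattern = regexp.MustCompile(`(?i)^(?:(?:https?|mailto):|[^:/?#]*(?:[/?#]|$))`)

// URLSanitized returns url unchanged if it matches safeURLPattern and
// InnocuousURL otherwise.
func URLSanitized(url string) URL {
	if !safeURLPattern.MatchString(url) {
		return URL{InnocuousURL}
	}
	return URL{url}
}

// TrustedResourceURL wraps a URL that refers to the application's own resources.
type TrustedResourceURL struct{ str string }

func (t TrustedResourceURL) String() string { return t.str }

// TrustedResourceURLFromConstant accepts only compile-time constants when
// called from outside the defining package.
func TrustedResourceURLFromConstant(url stringConstant) TrustedResourceURL {
	return TrustedResourceURL{string(url)}
}

func main() {
	fmt.Println(HTMLConcat(HTMLEscaped("<b>Tom & Jerry</b>"), HTMLEscaped(" > all")))
	fmt.Println(URLSanitized("https://example.com/"))
	fmt.Println(URLSanitized("javascript:alert(1)"))
	fmt.Println(TrustedResourceURLFromConstant("/static/app.js"))
}
```

Because the struct fields are unexported, the constructors are the only way to obtain a value, which is what lets downstream code rely on the type alone.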

The APIs described above are explained in great detail in Christoph Kern's Securing the Tangled Web (see "Strictly Contextually Auto-Escaping Template Engines", "Security Type Contracts", and "Unchecked Conversions"). Within Google, the security team has deployed strict contextual-autoescaping template systems (e.g. Closure templates) and safe HTML type implementations (e.g. Java and JavaScript types) across several different languages and frameworks. These packages have significantly decreased the incidence of XSS bugs without significantly impacting developer workflow.

Internal implementations of package safehtml and safehtml/template were deployed within Google a year ago. Currently, approximately 691 and 1153 Go packages use safehtml/template and safehtml respectively. Only about 15 of these packages use uncheckedconversions, all of which have been manually reviewed by our security team. Consequently, we have developed a high degree of confidence that the current API is usable and meets most developers' use cases.

Since safehtml/template is stricter than html/template by design, the features of the former cannot be easily integrated into the latter without making backward-incompatible changes.
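
One source of that strictness is the default-deny policy for elements and attributes. The following sketch shows the idea in isolation. The allowlist type, its default entries, and the method names are hypothetical simplifications; only the Sanitize signature is taken from the proposal, and unlike the proposed method, this AllowAttribute does not return an error.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Sanitize has the signature proposed above.
type Sanitize func(val interface{}) (string, error)

// allowlist maps lower-cased attribute names to their sanitizers.
// Anything absent from the map is rejected.
type allowlist struct{ attrs map[string]Sanitize }

// escapeText is a simple sanitizer that HTML-escapes the formatted value.
func escapeText(val interface{}) (string, error) {
	return html.EscapeString(fmt.Sprint(val)), nil
}

// newAllowlist returns an allowlist with a small, assumed default set.
func newAllowlist() *allowlist {
	return &allowlist{attrs: map[string]Sanitize{
		"title": escapeText,
		"alt":   escapeText,
	}}
}

// AllowAttribute registers a sanitizer for an attribute.
func (a *allowlist) AllowAttribute(attr string, s Sanitize) {
	a.attrs[strings.ToLower(attr)] = s
}

// SanitizeAttribute sanitizes val for attr, or fails if attr is unknown.
func (a *allowlist) SanitizeAttribute(attr string, val interface{}) (string, error) {
	s, ok := a.attrs[strings.ToLower(attr)]
	if !ok {
		return "", fmt.Errorf("attribute %q is not allowed", attr)
	}
	return s(val)
}

func main() {
	a := newAllowlist()

	// Unknown attributes are rejected by default.
	_, err := a.SanitizeAttribute("loadfrom", "x")
	fmt.Println(err)

	// After an explicit opt-in, the registered sanitizer is applied.
	a.AllowAttribute("loadfrom", escapeText)
	out, err := a.SanitizeAttribute("loadfrom", `a"b`)
	fmt.Println(out, err)
}
```

A template that relies on html/template passing unknown attributes through would start failing under such a policy, which is why the behaviour cannot be retrofitted without breaking existing users.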

Open Questions

Should safehtml/template even be in the standard library? We already have html/template and text/template. Perhaps adding yet another template library will confuse users and bloat the standard library. safehtml/template could potentially live in golang.org/x/tools or a separate GitHub repo altogether. Yet another option is to add all of this as optional functionality in html/template that can be enabled by a flag.
What does a robust API for creating values of safehtml types look like? The current internal implementation of safehtml is customized for the ways that Google Go programmers generate HTML and HTML-related values, which might not translate well to external use cases. Since adding new constructors to package safehtml is backwards-compatible, we can start by releasing the library with its current, field-tested API and respond to feature requests as external users adopt the package.
Will uncheckedconversions be misused? The functions we propose in package uncheckedconversions are essentially equivalent to the easily-misused typed strings in html/template, except with spookier-sounding names. Within Google, we use our build system to restrict the use of uncheckedconversions; any developers attempting to import the package must receive code-review approval from a security team member. Without these restrictions, will most external Go developers follow the path of least resistance and misuse uncheckedconversions, or will they adopt the principled approach and carefully review each use of those functions, preferring package safehtml constructors wherever possible?
Is the work of migrating from html/template to safehtml/template automatable? safehtml/template is inherently stricter than html/template, so I expect that many migrations will require manual refactoring and reasoning about program dataflows. However, we might be able to write a go fix that performs all automatable migrations, and lists the remaining areas that require manual attention.

[1] See also "A Subtle XSS Bug" in Securing the Tangled Web for a similar example where HTML- and JavaScript-escaping in the wrong order causes the same XSS vulnerability.
[2] html/template actually uses heuristics to infer the semantics of non-blacklisted attributes. If this inference fails, the attribute is assumed to have no special semantics.