Illegal attributes that begin with =

If we parse an attribute like <test =foo="bar"/>, the attribute key appears in the DOM with the leading = sign (=foo), but when the document is re-serialized the key is written without it (foo).

Code:

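import org.jsoup.Jsoup

fun main() {
    val doc = Jsoup.parse("<test =foo=\"bar\"/>")
    // Print each attribute key exactly as it is stored in the DOM
    for (elem in doc.select("test")) {
        for (attr in elem.attributes()) {
            println(attr.key)
        }
    }
    // Print the re-serialized document for comparison
    println(doc.html())
}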

Output:

=foo
<html>
 <head></head>
 <body>
  <test foo="bar" />
 </body>
</html>

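This is problematic: if an application validates attribute keys to prevent XSS attacks, this mismatch could be used to bypass the validation. For example, a denylist check for onerror would not match a DOM key of =onerror, yet the serialized output would contain onerror. I discovered this issue just now (in a lab environment, not a live app).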
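To illustrate the bypass, here is a minimal, self-contained Kotlin sketch that does not use jsoup itself. The regex is an assumption meant to approximate the characters stripped from HTML attribute keys on output, and blockedKeys, serialisedKey, naiveIsAllowed and normalisedIsAllowed are hypothetical names standing in for an application's own validator.

```kotlin
// Assumption: approximates the characters removed from HTML attribute keys
// on output (control chars, space, quotes, '/', '=').
val invalidKeyChars = Regex("[\\x00-\\x1f\\x7f-\\x9f \"'/=]")

// Hypothetical: the key as it would appear in the serialized HTML.
fun serialisedKey(rawKey: String): String = invalidKeyChars.replace(rawKey, "")

// Hypothetical denylist used by an application's XSS filter.
val blockedKeys = setOf("onerror", "onload", "onclick")

// Naive check: inspects the key exactly as the DOM reports it.
fun naiveIsAllowed(rawKey: String): Boolean = rawKey.lowercase() !in blockedKeys

// Safer check: inspects the key that will actually be written out.
fun normalisedIsAllowed(rawKey: String): Boolean =
    serialisedKey(rawKey).lowercase() !in blockedKeys

fun main() {
    // Assumed DOM key for <img =onerror=...>, by analogy with =foo above
    val rawKey = "=onerror"
    println("naive check allows it: ${naiveIsAllowed(rawKey)}")
    println("serialised as: ${serialisedKey(rawKey)}")
    println("normalised check allows it: ${normalisedIsAllowed(rawKey)}")
}
```

The naive check passes =onerror because it is not literally in the denylist, while the serialized key is onerror, which the normalised check rejects.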

The key that is actually written on output can be obtained with getValidKey(). One potential fix is to normalise keys during parsing, so that the key seen in the DOM is the same key that gets serialized.
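A rough sketch of what "normalise keys during parsing" could look like, written as a standalone Kotlin class rather than a patch to jsoup. NormalisingAttributes and its methods are hypothetical, and the stripped character set is the same assumption as above; jsoup's real attribute handling may differ.

```kotlin
// Assumption: same approximation of invalid HTML attribute key characters.
val badKeyChars = Regex("[\\x00-\\x1f\\x7f-\\x9f \"'/=]")

// Hypothetical attribute store that normalises keys once, on insertion,
// so validators and the serialiser always see the same key.
class NormalisingAttributes {
    private val attrs = LinkedHashMap<String, String>()

    // Returns false only if the key is dropped because nothing valid remains.
    fun put(rawKey: String, value: String): Boolean {
        val key = badKeyChars.replace(rawKey, "")
        if (key.isEmpty()) return false
        // Keep the first occurrence, mirroring how HTML treats duplicate attributes
        attrs.putIfAbsent(key, value)
        return true
    }

    fun keys(): Set<String> = attrs.keys

    // Simplified serialisation: escapes only '&' and '"' in values.
    fun html(): String = attrs.entries.joinToString(" ") { (k, v) ->
        "$k=\"" + v.replace("&", "&amp;").replace("\"", "&quot;") + "\""
    }
}

fun main() {
    val attrs = NormalisingAttributes()
    attrs.put("=foo", "bar")
    println(attrs.keys())
    println(attrs.html())
}
```

With this approach the DOM would report foo rather than =foo, so a validator inspecting keys sees the same name the serializer emits. Dropping keys that normalise to an empty string, and letting the first of two colliding keys win, are design choices made for this sketch.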