The trailing ? in <?xml version="1.0"?> emits an error
After upgrading from v1.17.1 to v1.19.1, unit tests started failing when parsing XML files.
Valid XML file (minimal reproduction, not the entire file):
<?xml version="1.0"?>
<catalogs xmlns="http://acalog.com/catalog/1.0" xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:a="http://www.w3.org/2005/Atom" xmlns:xi="http://www.w3.org/2001/XInclude">
</catalogs>
The reported error is:
Unexpected character '?' in input state [AfterAttributeValue_quoted]
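Note that there is no whitespace between " and ?> on the first line of the XML. Once whitespace is added there, no parse error is returned: the jsoup XML parser accepts <?xml version="1.0" ?> as the beginning of the file. The XML specification makes whitespace before the closing ?> of the XML declaration optional, so both forms are valid.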
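As a cross-check that the input itself is well-formed with or without that whitespace, here is a minimal sketch that parses both variants with the JDK's built-in DOM parser instead of jsoup. The class name XmlDeclCheck and the helper isWellFormed are hypothetical names introduced only for this illustration; they are not part of jsoup or of the failing tests.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only: checks well-formedness with the JDK parser.
public class XmlDeclCheck {

    // Returns true if the JDK parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for documents that are not well-formed.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<catalogs xmlns=\"http://acalog.com/catalog/1.0\"></catalogs>";
        // Declaration without whitespace before ?> (the form from this report).
        System.out.println(isWellFormed("<?xml version=\"1.0\"?>\n" + body));
        // Declaration with whitespace before ?> (the form that parses without errors).
        System.out.println(isWellFormed("<?xml version=\"1.0\" ?>\n" + body));
    }
}
```

If both calls report the input as well-formed, the difference in behavior comes from the parser under test rather than from the file.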
Usage
Parser parser = Parser.xmlParser().setTrackErrors(1).newInstance();
Document httpDoc = Jsoup.parse(fileContent, "", parser);
if (!parser.getErrors().isEmpty()) {
    throw new IllegalArgumentException(String.format("Not a valid XML. Error: %s", parser.getErrors()));
}