Configuring Woodstox XML parser: Woodstox-specific properties

@cowtowncoder

As part 3 of the overview of the Woodstox (Java, Stax) XML parser, let’s have a look at another set of configuration options: Woodstox-specific properties.
(The first part covered “basic Stax”, the second “Stax2 extensions”.)

Finding Property name definitions

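Woodstox-specific property names are defined in 2 classes, both in package com.ctc.wstx.api: WstxInputProperties (for input factories) and WstxOutputProperties (for output factories).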

As with all properties, configuration is done using methods XMLInputFactory.setProperty() and XMLOutputFactory.setProperty().
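As a minimal sketch of what this looks like in code: the example below uses only the standard javax.xml.stream API and refers to a Woodstox-specific property by its string name. The name "com.ctc.wstx.maxElementDepth" is assumed to be the value of the constant WstxInputProperties.P_MAX_ELEMENT_DEPTH; the class PropertyConfig and the helper trySet() are hypothetical names made up for illustration. Checking isPropertySupported() first keeps the call from failing if some other Stax implementation happens to be the one found on the classpath.

```java
import javax.xml.stream.XMLInputFactory;

public class PropertyConfig {
    // Assumption: string value of WstxInputProperties.P_MAX_ELEMENT_DEPTH
    static final String MAX_ELEMENT_DEPTH = "com.ctc.wstx.maxElementDepth";

    /** Sets the property if the factory recognizes it; returns whether it was applied. */
    static boolean trySet(XMLInputFactory factory, String name, Object value) {
        if (!factory.isPropertySupported(name)) {
            return false; // not a property this implementation knows: leave factory untouched
        }
        factory.setProperty(name, value);
        return true;
    }

    public static void main(String[] args) {
        // Picks up whichever Stax implementation is registered (Woodstox, if it is on the classpath)
        XMLInputFactory factory = XMLInputFactory.newFactory();
        boolean applied = trySet(factory, MAX_ELEMENT_DEPTH, 200);
        System.out.println("Depth limit applied: " + applied);
    }
}
```

When Woodstox is a compile-time dependency, referring to the constants in WstxInputProperties directly is preferable to repeating string literals.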

Woodstox-specific input properties: overriding DTD handling

The first set of input-side properties relates to the handling of DTDs. They are undefined by default, but can be set to custom handlers to change the default handling of DTD subsets and the expansion of entities defined within them.
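To make "custom handler" a bit more concrete, here is a hedged sketch of installing one. It assumes that one of these properties is named "com.ctc.wstx.undeclaredEntityResolver" (the presumed value of WstxInputProperties.P_UNDECLARED_ENTITY_RESOLVER) and that it accepts a standard javax.xml.stream.XMLResolver; the class EntityHandling and its methods are hypothetical. The resolver here simply expands whatever it is asked about to a fixed placeholder, and makes no assumption about which argument carries the entity name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;

public class EntityHandling {
    // Assumption: string value of WstxInputProperties.P_UNDECLARED_ENTITY_RESOLVER
    static final String UNDECLARED_ENTITY_RESOLVER = "com.ctc.wstx.undeclaredEntityResolver";

    /** Hypothetical handler: resolves every request to the same placeholder text. */
    static XMLResolver placeholderResolver(String placeholder) {
        return (publicId, systemId, baseUri, namespace) ->
                new ByteArrayInputStream(placeholder.getBytes(StandardCharsets.UTF_8));
    }

    /** Installs the resolver if the factory supports the property; returns whether it did. */
    static boolean install(XMLInputFactory factory, XMLResolver resolver) {
        if (!factory.isPropertySupported(UNDECLARED_ENTITY_RESOLVER)) {
            return false; // not Woodstox, or property not available in this version
        }
        factory.setProperty(UNDECLARED_ENTITY_RESOLVER, resolver);
        return true;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        System.out.println("Resolver installed: " + install(factory, placeholderResolver("?")));
    }
}
```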

Woodstox-specific input properties: limits

Another set of properties, added in Woodstox 4.2, allows specifying maximum limits for certain input constructs. These are typically used to protect against possible Denial-of-Service (DoS) attacks, in which XML-based web services are attacked with specially crafted documents that cause processing problems through excessive memory or CPU usage.

If one of the limits is exceeded during parsing, an XMLStreamException will be thrown. (In future it might be nice to have a sub-type to allow catching this specific case; for now there is no separate exception type.)

All settings have reasonable default values for normal usage (with some settings defaulting to “unlimited”), but they may be made stricter (if specific attacks are observed or the system has fewer resources available) or looser (if input documents can legitimately exceed one or more of the default limits).
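A rough sketch of tightening two limits and handling a violation follows. The property names "com.ctc.wstx.maxElementDepth" and "com.ctc.wstx.maxAttributesPerElement" are assumed to match the corresponding WstxInputProperties constants, the limit values are arbitrary, and the class LimitedParsing is hypothetical. Since there is no dedicated exception type, a limit violation is caught the same way as any other XMLStreamException.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LimitedParsing {
    // Assumptions: string values of the matching WstxInputProperties constants
    static final String MAX_ELEMENT_DEPTH = "com.ctc.wstx.maxElementDepth";
    static final String MAX_ATTRIBUTES_PER_ELEMENT = "com.ctc.wstx.maxAttributesPerElement";

    /** Creates a factory with stricter limits, where the implementation supports them. */
    static XMLInputFactory newLimitedFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        setIfSupported(factory, MAX_ELEMENT_DEPTH, 50);            // arbitrary example value
        setIfSupported(factory, MAX_ATTRIBUTES_PER_ELEMENT, 100);  // arbitrary example value
        return factory;
    }

    static void setIfSupported(XMLInputFactory factory, String name, Object value) {
        if (factory.isPropertySupported(name)) {
            factory.setProperty(name, value);
        }
    }

    /** Counts start elements; returns -1 if parsing fails, which includes an exceeded limit. */
    static int countElements(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // No separate sub-type exists, so limit violations and
            // well-formedness errors both end up here
            return -1;
        }
    }

    public static void main(String[] args) {
        XMLInputFactory factory = newLimitedFactory();
        System.out.println(countElements(factory, "<a><b/><c x='1'/></a>"));
    }
}
```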

Woodstox-specific input properties: other

And then there are many other input properties that may be configured.

Woodstox-specific output properties, validation

On the output side, a large group of settings relates to (optional) verification of the well-formedness of content, along with some related settings that allow working around problems that would occur if output were written exactly as implied by the calls (but that can be avoided by writing the content in a modified form).
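Enabling such checks might look roughly like the sketch below. The three property names are assumed to be the values of WstxOutputProperties.P_OUTPUT_VALIDATE_STRUCTURE, P_OUTPUT_VALIDATE_CONTENT and P_OUTPUT_FIX_CONTENT; the class CheckedOutput and its methods are hypothetical. As on the input side, each property is only set if the factory reports it as supported.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CheckedOutput {
    // Assumptions: string values of the matching WstxOutputProperties constants
    static final String VALIDATE_STRUCTURE = "com.ctc.wstx.outputValidateStructure";
    static final String VALIDATE_CONTENT = "com.ctc.wstx.outputValidateContent";
    static final String FIX_CONTENT = "com.ctc.wstx.outputFixContent";

    /** Creates an output factory with verification and fixing enabled where supported. */
    static XMLOutputFactory newCheckedFactory() {
        XMLOutputFactory factory = XMLOutputFactory.newFactory();
        for (String name : new String[] { VALIDATE_STRUCTURE, VALIDATE_CONTENT, FIX_CONTENT }) {
            if (factory.isPropertySupported(name)) {
                factory.setProperty(name, Boolean.TRUE);
            }
        }
        return factory;
    }

    /** Writes a single element with text content and returns the serialized result. */
    static String writeNote(XMLOutputFactory factory, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = factory.createXMLStreamWriter(out);
        writer.writeStartElement("note");
        writer.writeCharacters(text); // markup characters in text are escaped by the writer
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeNote(newCheckedFactory(), "a < b"));
    }
}
```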

Woodstox-specific output properties, formatting

Woodstox-specific output properties, other