QXmlStreamReader Class | Qt Core

The QXmlStreamReader class provides a fast parser for reading well-formed XML 1.0 documents via a simple streaming API.

Detailed Description

QXmlStreamReader provides a simple streaming API to parse well-formed XML 1.0 documents. It is an alternative to first loading the complete XML into a DOM tree (see QDomDocument). QXmlStreamReader reads data either from a QIODevice (see setDevice()), or from a raw QByteArray (see addData()).

Note: QXmlStreamReader supports only XML version 1.0. Documents declaring any other version, such as "1.1", will result in a parsing error.

Qt provides QXmlStreamWriter for writing XML.

The basic concept of a stream reader is to report an XML document as a stream of tokens, similar to SAX. The main difference between QXmlStreamReader and SAX is how these XML tokens are reported. With SAX, the application must provide handlers (callback functions) that receive so-called XML events from the parser at the parser's convenience. With QXmlStreamReader, the application code itself drives the loop and pulls tokens from the reader, one after another, as it needs them. This is done by calling readNext(), where the reader reads from the input stream until it completes the next token, at which point it returns the tokenType(). A set of convenient functions including isStartElement() and text() can then be used to examine the token to obtain information about what has been read. The big advantage of this pulling approach is the possibility to build recursive descent parsers with it, meaning you can split your XML parsing code easily into different methods or classes. This makes it easy to keep track of the application's own state when parsing XML.

A typical loop with QXmlStreamReader looks like this:

    QXmlStreamReader xml;
    // ... provide input with setDevice() or addData()
    while (!xml.atEnd()) {
        xml.readNext();
        // ... do processing
    }
    if (xml.hasError()) {
        // ... do error handling
    }

QXmlStreamReader is a non-validating, forward-only XML 1.0 parser for well-formed documents. It does not process external parsed entities or perform DTD validation. As long as no error occurs, the application can rely on the data provided by the stream reader satisfying the W3C's criteria for well-formed XML, and on tokens being reported in a valid order.

In particular, once any token of type StartElement, EndElement, Characters, EntityReference or EndDocument is seen, no tokens of type StartDocument or DTD will be seen. If either of those appears out of order in the input stream, an error is raised.

If an error occurs while parsing, atEnd() and hasError() return true, and error() returns the error that occurred. The functions errorString(), lineNumber(), columnNumber(), and characterOffset() are for constructing an appropriate error or warning message. To simplify application code, QXmlStreamReader contains a raiseError() mechanism that lets you raise custom errors that trigger the same error handling described above.

The QXmlStream Bookmarks Example illustrates how to use the recursive descent technique to read an XML bookmark file (XBEL) with a stream reader.

Namespaces

QXmlStream understands and resolves XML namespaces. For example, for a StartElement, namespaceUri() returns the namespace the element is in, and name() returns the element's local name. The combination of namespaceUri() and name() uniquely identifies an element. If a namespace prefix was not declared in the XML entities parsed by the reader, the namespaceUri is empty.

If you parse XML data that does not use namespaces according to the XML specification, or does not use namespaces at all, you can use the element's qualifiedName() instead. A qualified name is the element's prefix() followed by a colon followed by the element's local name(), exactly as the element appears in the raw XML data. Since the mapping of namespaceUri to prefix is neither unique nor universal, qualifiedName() should be avoided for namespace-compliant XML data.

In order to parse standalone documents that do use undeclared namespace prefixes, you can turn off namespace processing completely with the namespaceProcessing property.

Incremental Parsing

QXmlStreamReader is an incremental parser. It can handle the case where the document can't be parsed all at once because it arrives in chunks (e.g. from multiple files, or over a network connection). When the reader runs out of data before the complete document has been parsed, it reports a PrematureEndOfDocumentError. When more data arrives, either because of a call to addData() or because more data is available through the network device(), the reader recovers from the PrematureEndOfDocumentError and continues parsing the new data with the next call to readNext().

For example, if your application reads data from the network using a network access manager, you would issue a network request to the manager and receive a network reply in return. Since a QNetworkReply is a QIODevice, you connect its readyRead() signal to a custom slot, e.g. slotReadyRead() in the code snippet shown in the discussion for QNetworkAccessManager. In this slot, you read all available data with readAll() and pass it to the XML stream reader using addData(). Then you call your custom parsing function that reads the XML events from the reader.

Performance and Memory Consumption

QXmlStreamReader is memory-conservative by design, since it doesn't store the entire XML document tree in memory, but only the current token at the time it is reported. In addition, QXmlStreamReader avoids the many small string allocations that it normally takes to map an XML document to a convenient and Qt-ish API. It does this by reporting all string data as QStringView rather than real QString objects. Calling toString() on any of those objects returns an equivalent real QString object.

Member Function Documentation

QXmlStreamReader::QXmlStreamReader()

Constructs a stream reader.

See also setDevice() and addData().

[explicit] QXmlStreamReader::QXmlStreamReader(QAnyStringView data)

Creates a new stream reader that reads from data.

Note: In Qt versions prior to 6.5, this constructor was overloaded for QString and const char*.

See also addData(), clear(), and setDevice().

[explicit] QXmlStreamReader::QXmlStreamReader(QIODevice *device)

Creates a new stream reader that reads from device.

See also setDevice() and clear().

[explicit] QXmlStreamReader::QXmlStreamReader(const QByteArray &data)

This is an overloaded function.

Creates a new stream reader that reads from data.

See also addData(), clear(), and setDevice().

[noexcept] QXmlStreamReader::~QXmlStreamReader()

Destructs the reader.

void QXmlStreamReader::addData(QAnyStringView data)

Adds more data for the reader to read. This function does nothing if the reader has a device().

Note: In Qt versions prior to 6.5, this function was overloaded for QString and const char*.

See also readNext() and clear().

void QXmlStreamReader::addData(const QByteArray &data)

This is an overloaded function.

Adds more data for the reader to read. This function does nothing if the reader has a device().

See also readNext() and clear().

void QXmlStreamReader::addExtraNamespaceDeclaration(const QXmlStreamNamespaceDeclaration &extraNamespaceDeclaration)

Adds an extraNamespaceDeclaration. The declaration will be valid for children of the current element or, if the function is called before any elements are read, for the entire XML document.

See also namespaceDeclarations(), addExtraNamespaceDeclarations(), and setNamespaceProcessing().

void QXmlStreamReader::addExtraNamespaceDeclarations(const QXmlStreamNamespaceDeclarations &extraNamespaceDeclarations)

Adds a vector of declarations specified by extraNamespaceDeclarations.

See also namespaceDeclarations() and addExtraNamespaceDeclaration().

bool QXmlStreamReader::atEnd() const

Returns true if the reader has read until the end of the XML document, or if an error() has occurred and reading has been aborted. Otherwise, it returns false.

When atEnd() and hasError() return true and error() returns PrematureEndOfDocumentError, it means the XML has been well-formed so far, but a complete XML document has not been parsed. The next chunk of XML can be added with addData(), if the XML is being read from a QByteArray, or by waiting for more data to arrive if the XML is being read from a QIODevice. Either way, atEnd() will return false once more data is available.

See also hasError(), error(), device(), and QIODevice::atEnd().

QXmlStreamAttributes QXmlStreamReader::attributes() const

Returns the attributes of a StartElement.

qint64 QXmlStreamReader::characterOffset() const

Returns the current character offset, starting with 0.

See also lineNumber() and columnNumber().

void QXmlStreamReader::clear()

Removes any device() or data from the reader and resets its internal state to the initial state.

See also addData().

qint64 QXmlStreamReader::columnNumber() const

Returns the current column number, starting with 0.

See also lineNumber() and characterOffset().

QIODevice *QXmlStreamReader::device() const

Returns the current device associated with the QXmlStreamReader, or nullptr if no device has been assigned.

See also setDevice().

QStringView QXmlStreamReader::documentEncoding() const

If the tokenType() is StartDocument, this function returns the encoding string as specified in the XML declaration. Otherwise an empty string is returned.

QStringView QXmlStreamReader::documentVersion() const

If the tokenType() is StartDocument, this function returns the version string as specified in the XML declaration. Otherwise an empty string is returned.

QStringView QXmlStreamReader::dtdName() const

If the tokenType() is DTD, this function returns the DTD's name. Otherwise an empty string is returned.

QStringView QXmlStreamReader::dtdPublicId() const

If the tokenType() is DTD, this function returns the DTD's public identifier. Otherwise an empty string is returned.

QStringView QXmlStreamReader::dtdSystemId() const

If the tokenType() is DTD, this function returns the DTD's system identifier. Otherwise an empty string is returned.

QXmlStreamEntityDeclarations QXmlStreamReader::entityDeclarations() const

If the tokenType() is DTD, this function returns the DTD's unparsed (external) entity declarations. Otherwise an empty vector is returned.

The QXmlStreamEntityDeclarations class is defined to be a QList of QXmlStreamEntityDeclaration.

int QXmlStreamReader::entityExpansionLimit() const

Returns the maximum number of characters a single entity is allowed to expand into. If a single entity expands past the given limit, the document is not considered well-formed.

See also setEntityExpansionLimit().

QXmlStreamEntityResolver *QXmlStreamReader::entityResolver() const

Returns the entity resolver, or nullptr if there is no entity resolver.

See also setEntityResolver().

QXmlStreamReader::Error QXmlStreamReader::error() const

Returns the type of the current error, or NoError if no error occurred.

See also errorString() and raiseError().

QString QXmlStreamReader::errorString() const

Returns the message for the current error: either the parser's own description of the error, or the message that was set with raiseError().

See also error(), lineNumber(), columnNumber(), and characterOffset().

bool QXmlStreamReader::hasError() const

Returns true if an error has occurred, otherwise false.

See also errorString() and error().

[since 6.6] bool QXmlStreamReader::hasStandaloneDeclaration() const

Returns true if this document has an explicit standalone declaration (which can be 'yes' or 'no'); otherwise returns false.

If no XML declaration has been parsed, this function returns false.

This function was introduced in Qt 6.6.

See also isStandaloneDocument().

bool QXmlStreamReader::isCDATA() const

Returns true if the reader reports characters that stem from a CDATA section; otherwise returns false.

See also isCharacters() and text().

bool QXmlStreamReader::isCharacters() const

Returns true if tokenType() equals Characters; otherwise returns false.

See also isWhitespace() and isCDATA().

bool QXmlStreamReader::isComment() const

Returns true if tokenType() equals Comment; otherwise returns false.

bool QXmlStreamReader::isDTD() const

Returns true if tokenType() equals DTD; otherwise returns false.

bool QXmlStreamReader::isEndDocument() const

Returns true if tokenType() equals EndDocument; otherwise returns false.

bool QXmlStreamReader::isEndElement() const

Returns true if tokenType() equals EndElement; otherwise returns false.

bool QXmlStreamReader::isEntityReference() const

Returns true if tokenType() equals EntityReference; otherwise returns false.

bool QXmlStreamReader::isProcessingInstruction() const

Returns true if tokenType() equals ProcessingInstruction; otherwise returns false.

bool QXmlStreamReader::isStandaloneDocument() const

Returns true if this document has been declared standalone in the XML declaration; otherwise returns false.

If no XML declaration has been parsed, this function returns false.

See also hasStandaloneDeclaration().

bool QXmlStreamReader::isStartDocument() const

Returns true if tokenType() equals StartDocument; otherwise returns false.

bool QXmlStreamReader::isStartElement() const

Returns true if tokenType() equals StartElement; otherwise returns false.

bool QXmlStreamReader::isWhitespace() const

Returns true if the reader reports characters that only consist of white-space; otherwise returns false.

See also isCharacters() and text().

qint64 QXmlStreamReader::lineNumber() const

Returns the current line number, starting with 1.

See also columnNumber() and characterOffset().

QStringView QXmlStreamReader::name() const

Returns the local name of a StartElement, EndElement, or an EntityReference.

See also namespaceUri() and qualifiedName().

QXmlStreamNamespaceDeclarations QXmlStreamReader::namespaceDeclarations() const

If the tokenType() is StartElement, this function returns the element's namespace declarations. Otherwise an empty vector is returned.

The QXmlStreamNamespaceDeclarations class is defined to be a QList of QXmlStreamNamespaceDeclaration.

See also addExtraNamespaceDeclaration() and addExtraNamespaceDeclarations().

QStringView QXmlStreamReader::namespaceUri() const

Returns the namespaceUri of a StartElement or EndElement.

See also name() and qualifiedName().

QXmlStreamNotationDeclarations QXmlStreamReader::notationDeclarations() const

If the tokenType() is DTD, this function returns the DTD's notation declarations. Otherwise an empty vector is returned.

The QXmlStreamNotationDeclarations class is defined to be a QList of QXmlStreamNotationDeclaration.

QStringView QXmlStreamReader::prefix() const

Returns the prefix of a StartElement or EndElement.

See also name() and qualifiedName().

QStringView QXmlStreamReader::processingInstructionData() const

Returns the data of a ProcessingInstruction.

QStringView QXmlStreamReader::processingInstructionTarget() const

Returns the target of a ProcessingInstruction.

QStringView QXmlStreamReader::qualifiedName() const

Returns the qualified name of a StartElement or EndElement.

A qualified name is the raw name of an element in the XML data. It consists of the namespace prefix, followed by a colon, followed by the element's local name. Since the namespace prefix is not unique (the same prefix can point to different namespaces and different prefixes can point to the same namespace), you should not use qualifiedName(), but rather the resolved namespaceUri() together with the element's local name().

See also name(), prefix(), and namespaceUri().

void QXmlStreamReader::raiseError(const QString &message = QString())

Raises a custom error with an optional error message.

See also error() and errorString().

QString QXmlStreamReader::readElementText(QXmlStreamReader::ReadElementTextBehaviour behaviour = ErrorOnUnexpectedElement)

Convenience function to be called in case a StartElement was read. Reads until the corresponding EndElement and returns all text in-between. In case of no error, the current token (see tokenType()) after having called this function is EndElement.

The function concatenates text() when it reads either Characters or EntityReference tokens, but skips ProcessingInstruction and Comment. If the current token is not StartElement, an empty string is returned.

The behaviour argument defines what happens if anything else is read before reaching EndElement. The function can include the text from child elements (IncludeChildElements, useful for example for HTML), ignore child elements (SkipChildElements), or raise an UnexpectedElementError and return what was read so far (ErrorOnUnexpectedElement, the default).

QXmlStreamReader::TokenType QXmlStreamReader::readNext()

Reads the next token and returns its type.

With one exception, once an error() is reported by readNext(), further reading of the XML stream is not possible. Then atEnd() returns true, hasError() returns true, and this function returns QXmlStreamReader::Invalid.

The exception is when error() returns PrematureEndOfDocumentError. This error is reported when the end of an otherwise well-formed chunk of XML is reached, but the chunk doesn't represent a complete XML document. In that case, parsing can be resumed by calling addData() to add the next chunk of XML, when the stream is being read from a QByteArray, or by waiting for more data to arrive when the stream is being read from a device().

See also tokenType() and tokenString().

bool QXmlStreamReader::readNextStartElement()

Reads until the next start element within the current element. Returns true when a start element was reached. When the end element was reached, or when an error occurred, false is returned.

The current element is the element matching the most recently parsed start element of which a matching end element has not yet been reached. When the parser has reached the end element, the current element becomes the parent element.

This is a convenience function for when you're only concerned with parsing XML elements. The QXmlStream Bookmarks Example makes extensive use of this function.

See also readNext().

void QXmlStreamReader::setDevice(QIODevice *device)

Sets the current device to device. Setting the device resets the stream to its initial state.

See also device() and clear().

void QXmlStreamReader::setEntityExpansionLimit(int limit)

Sets the entity expansion limit to limit, that is, the maximum number of characters a single entity is allowed to expand into. If a single entity expands past the given limit, the document is not considered well-formed.

The limit is there to prevent DoS attacks when loading unknown XML documents where recursive entity expansion could otherwise exhaust all available memory.

The default value for this property is 4096 characters.

See also entityExpansionLimit().

void QXmlStreamReader::setEntityResolver(QXmlStreamEntityResolver *resolver)

Makes resolver the new entityResolver().

The stream reader does not take ownership of the resolver. It is the caller's responsibility to ensure that the resolver is valid during the entire lifetime of the stream reader object, or until another resolver or nullptr is set.

See also entityResolver().

void QXmlStreamReader::skipCurrentElement()

Reads until the end of the current element, skipping any child nodes. This function is useful for skipping unknown elements.

The current element is the element matching the most recently parsed start element of which a matching end element has not yet been reached. When the parser has reached the end element, the current element becomes the parent element.

QStringView QXmlStreamReader::text() const

Returns the text of Characters, Comment, DTD, or EntityReference.

QString QXmlStreamReader::tokenString() const

Returns the reader's current token as a string.

See also tokenType().

QXmlStreamReader::TokenType QXmlStreamReader::tokenType() const

Returns the type of the current token.

The current token can also be queried with the convenience functions isStartDocument(), isEndDocument(), isStartElement(), isEndElement(), isCharacters(), isComment(), isDTD(), isEntityReference(), and isProcessingInstruction().

See also tokenString().