What is Miklos hacking
– Optimizing ODT ↔ XHTML conversion performance for simple documents
The focus here was really simple documents, like just one sentence with minimal formatting. The use-case is to have thousands of these documents, where only a minority contains complex formatting and the rest is just that simple.
Performance work usually focuses on one specific complex feature, e.g. lots of bookmarks, lots of document-level user-defined metadata, and so on. As a result, there was still room for improvement when it comes to trivial documents.
I managed to reduce the cost of the conversion to _one fifth of the original_ cost in both directions; the chart above shows the impact of my work for the ODT → XHTML direction. The steps that helped:
- Recognize `XHTML` as a value for the `FilterOptions` key in the `HTML (StarWriter)` export filter. This avoids the need to go via XSLT, which would be expensive.
- Add a new `NoFileSync` flag to the `frame::XStorable::storeToURL()` API, so that if you know you’ll read the result after the conversion finished, you can avoid an expensive `fsync()` call for each and every file. This helps HDDs a lot, while it means no overhead for SSDs.
- If you know your input format already, then specifying an explicit `FilterName` key for the `frame::XComponentLoader::loadComponentFromURL()` API avoids spending time on detecting the file format you already know.
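
To show how the three steps fit together, here is a minimal PyUNO-style sketch for the ODT → XHTML direction. It is an illustration under assumptions, not the code used for the measurements: the helper names (`load_options`, `store_options`, `to_property_values`, `convert`) are hypothetical, `writer8` is assumed to be the import filter name for ODT, the `Hidden` key is an extra assumption to avoid showing a window, and `desktop` is assumed to be a `com.sun.star.frame.Desktop` instance obtained elsewhere.

```python
def load_options(filter_name="writer8"):
    """Arguments for loadComponentFromURL().

    An explicit FilterName skips file format detection. "writer8" is an
    assumed filter name for ODT input; "Hidden" is an extra assumption.
    """
    return {"FilterName": filter_name, "Hidden": True}


def store_options():
    """Arguments for storeToURL(): Writer HTML export in XHTML mode, no fsync()."""
    return {
        "FilterName": "HTML (StarWriter)",
        "FilterOptions": "XHTML",  # XHTML mode of the HTML export, no XSLT involved
        "NoFileSync": True,        # skip the per-file fsync() call
    }


def to_property_values(options):
    """Turn a dict into a tuple of UNO PropertyValue structs (needs PyUNO)."""
    import uno  # noqa: F401, enables importing the com.sun.star.* types
    from com.sun.star.beans import PropertyValue

    values = []
    for name, value in options.items():
        prop = PropertyValue()
        prop.Name = name
        prop.Value = value
        values.append(prop)
    return tuple(values)


def convert(desktop, in_url, out_url, wrap=to_property_values):
    """Load in_url with an explicit filter, then store it as XHTML to out_url."""
    component = desktop.loadComponentFromURL(in_url, "_blank", 0, wrap(load_options()))
    try:
        component.storeToURL(out_url, wrap(store_options()))
    finally:
        # Always release the document, even if the export failed.
        component.close(True)
```

With a running office instance, a call would look like `convert(desktop, "file:///tmp/in.odt", "file:///tmp/out.xhtml")`. The `wrap` parameter exists only so the option dictionaries can be inspected without PyUNO being installed.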
Note that the XHTML mode for the Writer HTML export is still a work in progress, but it already produces valid output for such simple documents.