[CODEC-174] Improve performance of Beider Morse encoder

I use the Beider Morse encoder with Solr. When Solr indexes a large number of documents with this encoder, the import time is multiplied by 30. So I have decided to optimize the current implementation in commons-codec.

So far, I have created two patches. The first patch removes a "performance hack" built around a subsequence cache. This cache does not improve performance, and removing it saves a few milliseconds.

The second patch changes how the rules are stored in memory, using a Map instead of a List. With it, a rule can be looked up directly by the beginning of its pattern instead of scanning every rule. This patch divides the encoding time by 2.
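
To illustrate the idea behind the second patch, here is a minimal sketch of a rule index keyed by the first character of each pattern. It is not the patch itself: the names RuleIndex, Rule, candidates and firstMatch are hypothetical, and the rules are simplified to literal patterns (the real rules also carry left and right contexts).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rules grouped by the first character of their pattern. */
final class RuleIndex {

    /** Simplified stand-in for a phonetic rule (assumption: literal pattern only). */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            if (pattern == null || pattern.isEmpty()) {
                throw new IllegalArgumentException("pattern must not be empty");
            }
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    // Map from the first character of a pattern to the rules that start with it.
    private final Map<Character, List<Rule>> byFirstChar = new HashMap<>();

    /** Registers a rule under the first character of its pattern. */
    void add(Rule rule) {
        byFirstChar.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
    }

    /** Returns only the rules whose pattern can start at the given position. */
    List<Rule> candidates(CharSequence input, int pos) {
        return byFirstChar.getOrDefault(input.charAt(pos), Collections.emptyList());
    }

    /** Returns the first registered rule matching at pos, or null if none matches. */
    Rule firstMatch(String input, int pos) {
        for (Rule rule : candidates(input, pos)) {
            if (input.startsWith(rule.pattern, pos)) {
                return rule;
            }
        }
        return null;
    }
}
```

With a flat List, every rule is tested at every input position. With the Map, only the rules sharing the current character are tested, which is where the saving would come from in this sketch.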

I will try to find more improvements. If you have any ideas, please let me know.