[Python-Dev] UTF-8 Decoder
Jeroen Ruigrok van der Werven asmodai at in-nomine.org
Mon Apr 27 20:28:40 CEST 2009
On [20090414 16:43], Antoine Pitrou (solipsis at pitrou.net) wrote:
> If you have some time on your hands, you could try benchmarking it against Python 3.1's (py3k) decoder. There are two cases to consider:
Bjoern actually did it himself already:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance
(results in ms for the Large, Medium, and Tiny inputs; both built with Visual C++ 7.1 -Ox -Ot -G7)

Decoder                                    Large    Medium   Tiny
PyUnicode_DecodeUTF8Stateful (3.1a2)       4523     5686     3138
Manually inlined transcoder (see above)    4277     4998     4640
So on the medium and large datasets Bjoern's decoder is very interesting (about 12% and 5% less time, respectively), but in the tiny case (just Bjoern's name) it is considerably slower: 4640ms against 3138ms. The other two cases seem more typical of average use in Python.
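For anyone who wants to repeat this kind of comparison from the Python side, here is a rough sketch of a timing harness. It only times the built-in bytes.decode('utf-8') and does not include Bjoern's decoder; the sample inputs, the name bench_decode, and the total_bytes parameter are my own assumptions and not what his benchmark used (only the tiny input, his name, mirrors the description above).

```python
import timeit

# Hypothetical sample inputs; the original benchmark used its own data.
SAMPLES = {
    "large": ("Björn Höhrmann イェルーン " * 50_000).encode("utf-8"),
    "medium": ("Björn Höhrmann イェルーン " * 500).encode("utf-8"),
    "tiny": "Björn Höhrmann".encode("utf-8"),
}


def bench_decode(samples, total_bytes=5_000_000):
    """Time bytes.decode('utf-8') for each sample.

    The loop count is scaled so that every case decodes roughly
    total_bytes in total (at least one pass), which makes the
    per-call overhead of small inputs visible.
    """
    results = {}
    for name, data in samples.items():
        loops = max(1, total_bytes // len(data))
        seconds = timeit.timeit(lambda: data.decode("utf-8"), number=loops)
        results[name] = (loops, seconds)
    return results


if __name__ == "__main__":
    for name, (loops, seconds) in bench_decode(SAMPLES).items():
        print(f"{name:>6}: {loops} loops, {seconds * 1000:.1f} ms")
```

Scaling the loop count by input size is one way to keep the three cases comparable; absolute numbers will of course depend on the machine and the Python build.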
-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Nobilitas sola est atque unica virtus...