[Python-Dev] UTF-8 Decoder

Antoine Pitrou solipsis at pitrou.net
Mon Apr 27 20:48:38 CEST 2009


Jeroen Ruigrok van der Werven <asmodai in-nomine.org> writes:

So on medium and large datasets Bjoern's decoder is very interesting, but the tiny case (just Bjoern's name) is quite a bit slower. The other cases seem more typical of what the average use in Python would be.

Keep in mind what the datasets are:

« The large buffer is an April 2009 Hindi Wikipedia article XML dump, the medium buffer Markus Kuhn's UTF-8-demo.txt, and the tiny buffer my name »

It would be interesting to test with mostly ASCII data and see how the results compare. The good news is that, even with heavily non-ASCII data, our current decoder is very efficient.
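
A rough sketch of such a test, assuming a modern Python 3 interpreter and the standard timeit module. The sample strings and the helper names (repeat_to, build_buffers, time_decode) are made up for illustration and are not the datasets quoted above. It only times the interpreter's built-in UTF-8 decoder, so it says nothing about Bjoern's decoder by itself, and absolute numbers will vary by machine and Python version.

```python
import timeit

ASCII_UNIT = "The quick brown fox jumps over the lazy dog. "
HINDI_UNIT = "\u0928\u092e\u0938\u094d\u0924\u0947 "  # hypothetical sample text


def repeat_to(unit, n_chars):
    """Repeat `unit` and cut the result to exactly `n_chars` characters."""
    reps = n_chars // len(unit) + 1
    return (unit * reps)[:n_chars]


def build_buffers(n_chars=100_000):
    """Return UTF-8 encoded test buffers keyed by a descriptive name."""
    texts = {
        "pure ASCII": repeat_to(ASCII_UNIT, n_chars),
        # Mostly ASCII with an occasional two-byte character.
        "mostly ASCII": repeat_to(ASCII_UNIT * 10 + "caf\u00e9 ", n_chars),
        # Almost entirely three-byte sequences.
        "non-ASCII": repeat_to(HINDI_UNIT, n_chars),
    }
    return {name: text.encode("utf-8") for name, text in texts.items()}


def time_decode(data, number=200):
    """Best-of-three average time in seconds for one data.decode('utf-8')."""
    runs = timeit.repeat(lambda: data.decode("utf-8"), number=number, repeat=3)
    return min(runs) / number


if __name__ == "__main__":
    for name, data in build_buffers().items():
        per_call = time_decode(data)
        print(f"{name:>12}: {len(data):>7} bytes, {per_call * 1e6:8.1f} us per decode")
```

Comparing time per decoded byte across the three buffers gives a first idea of how much the ASCII share of the input matters for the built-in decoder.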

Regards

Antoine.
