[Python-Dev] Regression in unicodestr.encode()?

M.-A. Lemburg mal@lemburg.com
Wed, 10 Apr 2002 21:44:01 +0200


"M.-A. Lemburg" wrote:

> "Martin v. Loewis" wrote:
> >
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> >
> > > Some debugging with gdb indicates that the codec is indeed writing
> > > the 'nd', but the final PyStringResize() (which allocates a new
> > > buffer and copies the data into that buffer) fails to copy the last
> > > two characters from the string or overwrites it with NULLs.
> > >
> > > Looks like a pymalloc problem to me. Tim ?
> >
> > It's a UTF-8 codec bug. The codec writes over the end of the buffer,
> > then invokes resize. Resizing only copies the allocated bytes, hence
> > the uninitialized bytes at the end.
>
> Ah, yes, you're right.

That is... instrumenting the codec I get these results:

>>> (u'\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
...  u'\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
...  u'\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
...  u'\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
...  u'\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das'
...  u' Nunstuck git und'.encode('utf-8'))
cbWritten=0, cbAllocated=144
cbWritten=3, cbAllocated=144
cbWritten=6, cbAllocated=144
cbWritten=9, cbAllocated=144
...
cbWritten=102, cbAllocated=144
cbWritten=105, cbAllocated=144
cbWritten=108, cbAllocated=144
cbWritten=111, cbAllocated=144
cbWritten=114, cbAllocated=144
cbWritten=117, cbAllocated=144
cbWritten=120, cbAllocated=144
cbWritten=123, cbAllocated=144
cbWritten=126, cbAllocated=144
end of string = 'ck git und'
'\xe6\xad\xa3\xe7\xa2\xba\xe3....das Nunstuck git u \x8f'

(The last two bytes seem to be random data; they change from run to run.)
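
For readers who want to check the numbers in the trace, here is a small sketch that compares the reported cbAllocated=144 with the number of bytes a correct UTF-8 encoding of the same string needs. It is only arithmetic on the test string and does not reproduce the codec's C code. It is written for a current Python 3 interpreter rather than the 2002 development build, and the variable names are made up for illustration. That 144 happens to equal twice the character count is an observation from these numbers, not a statement about how the codec computes its allocation.

```python
# The test string from the session above, as one literal.
text = (
    u'\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
    u'\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
    u'\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
    u'\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
    u'\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das'
    u' Nunstuck git und'
)

# Value printed by the instrumented codec in the trace (taken from the post).
CB_ALLOCATED = 144

# Encode with a known-good encoder to learn the size actually required.
encoded = text.encode('utf-8')

n_chars = len(text)
# Characters in U+0800..U+FFFF take three bytes each in UTF-8.
n_three_byte = sum(1 for ch in text if len(ch.encode('utf-8')) == 3)
# ASCII characters take one byte each.
n_ascii = sum(1 for ch in text if ord(ch) < 128)

# Bytes the encoded form needs beyond the reported allocation.
overrun = len(encoded) - CB_ALLOCATED

print("characters:          ", n_chars)
print("three-byte chars:    ", n_three_byte)
print("ASCII chars:         ", n_ascii)
print("bytes required:      ", len(encoded))
print("bytes allocated:     ", CB_ALLOCATED)
print("bytes past the end:  ", overrun)
```

The count of three-byte characters matches the trace: the cbWritten lines step by 3 from 0 to 126, which is one line per three-byte character if the instrumentation prints before each write. Why only the last two bytes of the result come out corrupted, rather than everything past the allocation, is not something this sketch models.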

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/