[Python-Dev] Regression in unicodestr.encode()?

M.-A. Lemburg mal@lemburg.com
Wed, 10 Apr 2002 21:44:01 +0200


"M.-A. Lemburg" wrote:
> 
> "Martin v. Loewis" wrote:
> >
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> >
> > > Some debugging with gdb indicates that the codec is indeed writing
> > > the 'nd', but the final _PyString_Resize() (which allocates a new
> > > buffer and copies the data into that buffer) fails to copy the last
> > > two characters from the string or overwrites it with NULLs.
> > >
> > > Looks like a pymalloc problem to me. Tim ?
> >
> > It's a UTF-8 codec bug. The codec writes over the end of the buffer,
> > then invokes resize. Resizing only copies the allocated bytes, hence
> > the uninitialized bytes at the end.
> 
> Ah, yes, you're right.

That is... instrumenting the codec, I get these results:

>>> (u'\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
...        u'\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
...        u'\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
...        u'\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
...        u'\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das'
...        u' Nunstuck git und'.encode('utf-8'))
cbWritten=0, cbAllocated=144
cbWritten=3, cbAllocated=144
cbWritten=6, cbAllocated=144
cbWritten=9, cbAllocated=144
...
cbWritten=102, cbAllocated=144
cbWritten=105, cbAllocated=144
cbWritten=108, cbAllocated=144
cbWritten=111, cbAllocated=144
cbWritten=114, cbAllocated=144
cbWritten=117, cbAllocated=144
cbWritten=120, cbAllocated=144
cbWritten=123, cbAllocated=144
cbWritten=126, cbAllocated=144
end of string = 'ck git und'
'\xe6\xad\xa3\xe7\xa2\xba\xe3....das Nunstuck git u \x8f'

(The last two bytes seem to be random data; they change
from run to run.)
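
A rough cross-check of the numbers in the trace, written as a sketch in
present-day Python 3 rather than the interpreter used in the session.
It counts the characters in the test string and the bytes its UTF-8
encoding needs. The name assumed_alloc and the "two bytes per
character" rule are assumptions inferred from cbAllocated=144, not
something the trace states; the idea that the trace prints one line per
3-byte character is likewise only an inference from the step of 3.

```python
# The test string from the session, as a Python 3 str literal.
text = ('\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
        '\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
        '\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
        '\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
        '\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das'
        ' Nunstuck git und')

n_chars = len(text)

# Code points in U+0800..U+FFFF take three bytes each in UTF-8.
n_three_byte = sum(1 for ch in text if 0x800 <= ord(ch) <= 0xFFFF)

# Code points below U+0080 take one byte each.
n_ascii = sum(1 for ch in text if ord(ch) < 0x80)

# Bytes actually required by the UTF-8 encoding.
needed = len(text.encode('utf-8'))

# ASSUMPTION: the buffer was sized at two bytes per character.
# This is inferred from cbAllocated=144 and is not stated in the trace.
assumed_alloc = 2 * n_chars

# Bytes that do not fit into a buffer of that size.
shortfall = needed - assumed_alloc

print("characters        :", n_chars)
print("3-byte characters :", n_three_byte)
print("ASCII characters  :", n_ascii)
print("UTF-8 bytes needed:", needed)
print("assumed allocation:", assumed_alloc)
print("shortfall         :", shortfall)
```

If the assumption holds, the encoded string needs more room than the
144 bytes reported as allocated, which would be consistent with the
codec writing past the end of the buffer before the resize. How many
of the trailing bytes actually end up corrupted depends on allocator
details that the trace does not show.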

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/