[Python-Dev] Regression in unicodestr.encode()?

Martin v. Loewis martin@v.loewis.de
10 Apr 2002 21:32:15 +0200


"M.-A. Lemburg" <mal@lemburg.com> writes:

> > It's a UTF-8 codec bug. The codec writes over the end of the buffer,
> > then invokes resize. Resizing only copies the allocated bytes, hence
> > the uninitialized bytes at the end.
> 
> Ah, yes, you're right.

Thanks :-) I think the right fix is to avoid any resizing in the UTF-8
codec; resizing has bitten us far too often now. Instead, the codec
should first compute the exact size of the encoded string, allocate a
buffer of that size, and only then perform the actual encoding.
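
To illustrate the size-first idea, here is a minimal sketch in Python
rather than in the codec's C code. It is not the actual patch; the
function names utf8_size and utf8_encode_two_pass are made up for
this example, and lone surrogates are not treated specially. The
first pass only counts bytes, the second writes into a buffer that
is allocated once and never resized.

```python
def utf8_size(text: str) -> int:
    """Pass 1: count the bytes needed, without writing anything."""
    size = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            size += 1
        elif cp < 0x800:
            size += 2
        elif cp < 0x10000:
            size += 3
        else:
            size += 4
    return size


def utf8_encode_two_pass(text: str) -> bytes:
    """Pass 2: fill a buffer of exactly the precomputed size."""
    out = bytearray(utf8_size(text))  # allocated once, never resized
    i = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out[i] = cp
            i += 1
        elif cp < 0x800:
            out[i] = 0xC0 | (cp >> 6)
            out[i + 1] = 0x80 | (cp & 0x3F)
            i += 2
        elif cp < 0x10000:
            out[i] = 0xE0 | (cp >> 12)
            out[i + 1] = 0x80 | ((cp >> 6) & 0x3F)
            out[i + 2] = 0x80 | (cp & 0x3F)
            i += 3
        else:
            out[i] = 0xF0 | (cp >> 18)
            out[i + 1] = 0x80 | ((cp >> 12) & 0x3F)
            out[i + 2] = 0x80 | ((cp >> 6) & 0x3F)
            out[i + 3] = 0x80 | (cp & 0x3F)
            i += 4
    # The write position must land exactly on the end of the buffer.
    assert i == len(out)
    return bytes(out)
```

The cost is a second walk over the input, but no write can land
outside the allocated bytes, so there is nothing for a later resize
to lose.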

Regards,
Martin