[Python-3000] C API cleanup str

Sun Aug 5 17:59:38 CEST 2007

On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > IMO at the C level all conversions between bytes and Unicode that
> > don't specify a conversion should use UTF-8. That's what most of the
> > changes made so far do.
>
> I agree. We should specify that somewhere, so we have a recorded
> guideline to use in case of doubt.

But where? Time to start a PEP for the C API perhaps?

> One function that misbehaves under this spec is
> PyUnicode_FromString[AndSize], which assumes the input is Latin-1
> (i.e. it performs a codepoint-per-codepoint conversion).

Ouch.
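
To make the difference concrete, here is a rough Python-level sketch of what the two interpretations do to the same bytes. It is not the C function itself, only an illustration, and the sample string is an arbitrary choice.

```python
# "Löwis" encoded as UTF-8 is 6 bytes: the "ö" becomes 0xC3 0xB6.
raw = "Löwis".encode("utf-8")

# Latin-1 maps every byte to the code point with the same value,
# so the two-byte sequence for "ö" comes out as two separate characters.
as_latin1 = raw.decode("latin-1")

# Decoding as UTF-8 recovers the original five-character string.
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_latin1), len(as_utf8))
print(as_utf8 == "Löwis", as_latin1 == "Löwis")
```

So a C string that is really UTF-8 silently turns into mojibake when it is treated as Latin-1.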

> As a consequence, this now can fail because of encoding errors
> (which it previously couldn't).

You mean if it were fixed it could fail, right? Code calling it should
be checking for errors anyway because it allocates memory.
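
A small Python-level sketch of why the failure mode is new, assuming the function were switched to UTF-8. The helper name try_utf8 is made up for illustration and is not part of any API.

```python
# Every byte value 0..255 is a valid Latin-1 character,
# so a Latin-1 decode can never raise an encoding error.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")


def try_utf8(data):
    """Hypothetical helper: return the decoded str, or None on invalid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Plain ASCII is valid in both encodings.
ok = try_utf8(b"Lowis")

# A lone 0xF6 byte is "ö" in Latin-1 but is not valid UTF-8,
# so the UTF-8 decode fails where the Latin-1 one succeeded.
bad = try_utf8(b"L\xf6wis")

print(len(latin1_text), ok, bad)
```

Callers that already check the result for a memory error would see the encoding failure through the same error path.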

Have you tried making this particular change and seeing what fails?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)