[Python-3000] C API cleanup str

Guido van Rossum guido at python.org
Sun Aug 5 17:59:38 CEST 2007


On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > IMO at the C level all conversions between bytes and Unicode that
> > don't specify a conversion should use UTF-8. That's what most of the
> > changes made so far do.
>
> I agree. We should specify that somewhere, so we have a recorded
> guideline to use in case of doubt.

But where? Time to start a PEP for the C API perhaps?

> One function that misbehaves under this spec is
> PyUnicode_FromString[AndSize], which assumes the input is Latin-1
> (i.e. it performs a codepoint-per-codepoint conversion).

Ouch.

> As a consequence, this now can fail because of encoding errors
> (which it previously couldn't).

You mean if it were fixed it could fail, right? Code calling it should
be checking for errors anyway because it allocates memory.
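
To make the failure mode concrete, here is a minimal sketch at the Python level, not the C API itself. It assumes only the standard behaviour of the "latin-1" and "utf-8" codecs; the sample strings and variable names are chosen purely for illustration. Latin-1 maps every byte to the code point with the same value, so it never fails but silently garbles UTF-8 input. UTF-8 decodes correctly but can reject malformed bytes.

```python
# Illustration only: plain Python decoding, not PyUnicode_FromString itself.
data = "Löwis".encode("utf-8")      # b'L\xc3\xb6wis', 6 bytes

# Latin-1: one byte becomes one code point, so this cannot raise,
# but the two bytes of the UTF-8 encoded 'ö' turn into two wrong characters.
as_latin1 = data.decode("latin-1")

# UTF-8: the two-byte sequence is decoded back into a single 'ö'.
as_utf8 = data.decode("utf-8")

# The same name encoded as Latin-1. The byte 0xF6 is never valid in UTF-8,
# so decoding as UTF-8 raises an error that the caller has to handle.
bad = b"L\xf6wis"
try:
    bad.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Under these assumptions, a function that switches from Latin-1 to UTF-8 gains exactly this new error path, which is why callers that already check for allocation failure should be largely unaffected.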

Have you tried making this particular change and seeing what fails?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

