[pypy-dev] [cpyext] partial fake PEP393 implementation to provide access to single unicode characters in strings

Fri Apr 20 10:47:40 CEST 2012

Hi Stefan,

On Sat, Apr 14, 2012 at 18:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
> PEP393 (the new Unicode type in Py3.3) defines a rather useful C interface
> towards the characters of a Unicode string. I think it would be cool if
> cpyext provided that, so that access to single characters won't require
> copying the unicode buffer into C space anymore.

FWIW, if it makes sense, you can also add PyPy-specific API functions
that are not part of the standard CPython C API.  I'm thinking of
direct access to the characters of a *string*, for example.

> Specifically, the
> intention is to avoid creating a 1-character unicode string copy before
> taking its ord(). Does this happen automatically, or is there a way to make
> sure it does that?

In RPython, indexing a string returns a single char, which is a
different low-level type from a full string (just "char" in C), so no
1-character string copy is created before ord() is applied.
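
A minimal sketch of that pattern, written as ordinary Python that
should also stay within the RPython subset. The helper name ord_at is
made up for illustration and is not part of cpyext or any PyPy API.
The explicit bounds check is an assumption about how one would want
the helper to behave, not something taken from existing code.

```python
def ord_at(s, index):
    """Return the character code of s[index] (hypothetical helper)."""
    if index < 0 or index >= len(s):
        raise IndexError("string index out of range")
    # Under RPython's type inference, s[index] is a single char rather
    # than a new string object. On plain CPython the same expression
    # does build a 1-character str before ord() sees it.
    return ord(s[index])


if __name__ == "__main__":
    print(ord_at("abc", 1))  # prints 98
```

The same source runs unchanged on CPython, which makes it easy to test
the logic, but only the translated RPython version avoids the
temporary string.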

A bientôt,

Armin.