[pypy-dev] PyPy 2 unicode class

Armin Rigo arigo at tunes.org
Thu Jan 23 18:13:41 CET 2014


Hi Oscar,

Thanks for explaining the caching in detail :-)

On Thu, Jan 23, 2014 at 2:27 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> big saving. If the string comes from anything other than utf-8 the indexing
> cache can be built while decoding (and reencoding as utf-8 under the hood).

Actually, you need to walk the string even to do "u =
s.decode('utf-8')".  The reason is that you need to check if the byte
string is well-formed UTF-8 or not.  So we can build the cache eagerly
in all cases, it seems.
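
To illustrate the point, here is a minimal sketch (Python 3, and not PyPy's actual implementation) of a single pass that both validates the bytes as UTF-8 and records an indexing cache. The names scan_utf8, byte_offset and stride are hypothetical, and the cache layout (one byte offset per `stride` code points) is only an assumption about what such a cache could look like. The validation follows the strict rules (no overlong forms, no surrogates, nothing above U+10FFFF).

```python
def scan_utf8(data, stride=4):
    """Validate `data` (bytes) as UTF-8 in one pass.

    Returns (number of code points, list of byte offsets), where the
    list holds the byte offset of every `stride`-th code point.
    Raises ValueError if the bytes are not well-formed UTF-8.
    """
    offsets = []
    count = 0
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The lead byte gives the sequence length and the smallest
        # code point that may legally be encoded with that length.
        if b0 < 0x80:
            length, cp_min, cp = 1, 0x0, b0
        elif 0xC2 <= b0 <= 0xDF:
            length, cp_min, cp = 2, 0x80, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            length, cp_min, cp = 3, 0x800, b0 & 0x0F
        elif 0xF0 <= b0 <= 0xF4:
            length, cp_min, cp = 4, 0x10000, b0 & 0x07
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + length > n:
            raise ValueError("truncated sequence at offset %d" % i)
        for j in range(1, length):
            c = data[i + j]
            if c & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % (i + j))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates and out-of-range values.
        if cp < cp_min or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point at offset %d" % i)
        # The cache entry costs nothing extra: we are already here.
        if count % stride == 0:
            offsets.append(i)
        count += 1
        i += length
    return count, offsets


def byte_offset(data, offsets, index, stride=4):
    """Byte offset of code point `index`, using the cache from scan_utf8."""
    pos = offsets[index // stride]
    for _ in range(index % stride):
        pos += 1
        # Skip the continuation bytes of the current code point.
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
    return pos
```

With such a cache, indexing needs at most stride - 1 forward steps from a cached offset; byte_offset assumes that `data` has already been validated by scan_utf8 and that `index` is in range.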


See you soon,

Armin.

