convert Unicode to lower/uppercase?

Peter Otten __peter__ at web.de
Tue Sep 23 13:24:36 EDT 2003


jallan wrote:

> I don't see any particular reason why Python "cannot handle case
> mappings that increase string lengths".

Now that's a long post. I think it essentially boils down to the above
statement.

Looking into stringobject.c (at first glance, unicodeobject.c uses
essentially the same algorithm, just with a few more indirections):

static PyObject *
string_upper(PyStringObject *self)
{
        char *s = PyString_AS_STRING(self), *s_new;
        int i, n = PyString_GET_SIZE(self);
        PyObject *new;

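        /* The result is allocated with exactly the input's length n. */
        new = PyString_FromStringAndSize(NULL, n);
        if (new == NULL)
                return NULL;
        s_new = PyString_AsString(new);
        for (i = 0; i < n; i++) {
                /* Each input byte maps to exactly one output byte. */
                int c = Py_CHARMASK(*s++);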
                if (islower(c)) {
                        *s_new = toupper(c);
                } else
                        *s_new = c;
                s_new++;
        }
        return new;
}

The whole routine builds on the assumption that len(s) == len(s.upper()) and
nothing short of a complete rewrite will fix that. But if you volunteer...

Personally, I think it's a long way to go for a little s, sharp as it may be :-)

Peter
