unicode, C++, python 2.2

jepler at unpythonic.net
Fri Sep 9 10:42:51 EDT 2005


Python can be built in two flavors.  In one, sys.maxunicode is 65535 and
Py_UNICODE is a 16-bit type; in the other, sys.maxunicode is 1114111 and
Py_UNICODE is a 32-bit type.  The flavor is selected when Python itself is
compiled, and Red Hat has chosen in some versions to build with
sys.maxunicode == 1114111.
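
To illustrate the difference, here is a rough Python sketch of the code units
each flavor would hold for a given string.  The two flavors only diverge for
characters above U+FFFF, which a 16-bit build stores as a surrogate pair.  The
helper name code_units is made up for this example, and it emulates the two
layouts with the utf-16-le and utf-32-le codecs rather than inspecting the
interpreter's own storage, so it assumes a newer Python than the ones
discussed in this message.

```python
import struct


def code_units(text, unit_bytes):
    """Return the code units for `text` at the given unit width.

    unit_bytes is 2 (16-bit Py_UNICODE) or 4 (32-bit Py_UNICODE).
    This is an emulation via codecs, not a look at real interpreter memory.
    """
    codec = {2: "utf-16-le", 4: "utf-32-le"}[unit_bytes]
    fmt = {2: "H", 4: "I"}[unit_bytes]
    raw = text.encode(codec)
    count = len(raw) // unit_bytes
    # "<" selects little-endian, standard sizes: H is 2 bytes, I is 4 bytes.
    return list(struct.unpack("<%d%s" % (count, fmt), raw))


if __name__ == "__main__":
    samples = [u"\u00a9\u03a3", u"\U0001D11E"]
    for s in samples:
        for width in (2, 4):
            units = code_units(s, width)
            print("%d-byte units: %s" % (width, " ".join("%x" % u for u in units)))
```

The copyright sign and capital sigma give the same two values at either width.
A character such as U+1D11E takes two units at width 2 and one unit at width 4.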

By using the Py_UNICODE typedef, you generally don't have to worry about this
distinction, as long as the extension is compiled against the same Python it
will be loaded into.  Here is some code that works on Red Hat 9's Python 2.2
(sys.maxunicode == 1114111) and a manually built Python 2.3 (sys.maxunicode ==
65535).

#include <Python.h>

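/* Print each element of a byte string or Unicode string in hex. */
PyObject *f(PyObject *self, PyObject *o) {
    if(PyString_Check(o)) {
        char *c = PyString_AS_STRING(o);
        int sz = PyString_GET_SIZE(o);
        int i;
        printf("   Byte string: ");
        /* Cast so bytes >= 0x80 are not sign-extended where char is signed. */
        for(i=0; i<sz; i++) { printf("%4x ", (unsigned char)c[i]); }
        printf("\n");
    } else if (PyUnicode_Check(o)) {
        /* Py_UNICODE is 16 or 32 bits wide, depending on the Python build. */
        Py_UNICODE *c = PyUnicode_AS_UNICODE(o);
        int sz = PyUnicode_GET_SIZE(o);
        int i;
        printf("Unicode string: ");
        /* Cast to match %x, whatever the underlying type of Py_UNICODE is. */
        for(i=0; i<sz; i++) { printf("%4x ", (unsigned int)c[i]); }
        printf("\n");
    }
    /* Objects of any other type are silently ignored. */
    Py_INCREF(Py_None);
    return Py_None;
}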

PyMethodDef d[] = {
    { "f", (PyCFunction)f, METH_O, "Print out the values in a string from C" },
    { NULL, NULL, 0, NULL }
};

void initunidemo(void) {
    Py_InitModule("unidemo", d);
}


$ # build unidemo for python2.2
$ python2.2 -c 'import unidemo, sys; print sys.maxunicode; unidemo.f(u"\N{copyright sign}\N{greek capital letter sigma}")'
1114111
Unicode string:   a9  3a3 
$ # rebuild unidemo for python2.3
$ python2.3 -c 'import unidemo, sys; print sys.maxunicode; unidemo.f(u"\N{copyright sign}\N{greek capital letter sigma}")'
65535
Unicode string:   a9  3a3 
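
If you want to check from Python which flavor you are running, the
sys.maxunicode values printed above are enough to tell.  A minimal sketch
follows; the function name py_unicode_bits is invented for this example, and
the mapping to a Py_UNICODE width only holds for interpreters of the kind
discussed here (Python 3.3 and later always report 1114111 regardless of how
strings are stored).

```python
import sys


def py_unicode_bits(maxunicode=None):
    """Infer the Py_UNICODE width in bits from a sys.maxunicode value.

    With no argument, the running interpreter's sys.maxunicode is used.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if maxunicode == 0xFFFF:        # 65535: 16-bit flavor
        return 16
    if maxunicode == 0x10FFFF:      # 1114111: 32-bit flavor
        return 32
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))


if __name__ == "__main__":
    print(py_unicode_bits())
```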

Jeff