[Python-Dev] import screwiness

Thu Jul 6 06:53:54 CEST 2006

[Neal Norwitz]
> In import.c starting around line 1210 (I removed a bunch of code that
> doesn't matter for the problem):
>
>                 if (PyUnicode_Check(v)) {
>                         copy = PyUnicode_Encode(PyUnicode_AS_UNICODE(v),
>                                 PyUnicode_GET_SIZE(v),
>                                 Py_FileSystemDefaultEncoding, NULL);
>                         v = copy;
>                 }
>                 len = PyString_GET_SIZE(v);
>                 if (len + 2 + namelen + MAXSUFFIXSIZE >= buflen) {
>                         Py_XDECREF(copy);
>                         continue; /* Too long */
>                 }
>                 strcpy(buf, PyString_AS_STRING(v));
>
> ***
> So if v is originally unicode, then copy is unicode from the second
> line, right?

No.  An encoded unicode string is of type str, and PyUnicode_Encode()
returns an encoded string.  Like so:

>>> u"\u1122".encode('utf-8')
'\xe1\x84\xa2'
>>> type(_)
<type 'str'>

>  Then we assign v to copy, so v is still unicode.

Almost ;-)

> Then later on we do PyString_GET_SIZE and PyString_AS_STRING.  That doesn't
> work, does it?  What am I missing?

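The conceptual type of the object returned by PyUnicode_Encode().  It's a
str, not a unicode object, so after "v = copy" v refers to a str, and
PyString_GET_SIZE() and PyString_AS_STRING() are applied to exactly the
kind of object they expect.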
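For readers more at home in Python 3, here's a rough Python analogue of the
excerpt.  It's a sketch, not a line-for-line translation of import.c.  In
Python 3, str plays the role of Python 2's unicode, and bytes plays the role
of Python 2's str.  The entry_to_bytes() name, the MAXSUFFIXSIZE value, the
buffer length, and the "utf-8" default (standing in for
Py_FileSystemDefaultEncoding) are all made up for illustration.

```python
MAXSUFFIXSIZE = 4  # hypothetical value; the real constant lives in import.c
BUFLEN = 32        # hypothetical buffer size, standing in for buflen


def entry_to_bytes(v, namelen, buflen=BUFLEN, encoding="utf-8"):
    """Mimic the import.c excerpt: encode text entries, then treat v as bytes.

    Returns the bytes that would be copied into the buffer, or None if the
    entry is too long.  The encoding default is an assumption.
    """
    if isinstance(v, str):          # Python 3 str ~ Python 2 unicode
        copy = v.encode(encoding)   # encoding always yields bytes (Py2: str)
        v = copy                    # rebind: from here on v is bytes
    if not isinstance(v, bytes):    # stands in for the checks the excerpt elided
        raise TypeError("path entry must be str or bytes")
    if len(v) + 2 + namelen + MAXSUFFIXSIZE >= buflen:
        return None                 # "Too long", like the continue in the C code
    return v                        # safe to treat as a byte string now
```

Usage:

```python
print(type("\u1122".encode("utf-8")))        # <class 'bytes'>
print(entry_to_bytes("\u1122", namelen=3))   # b'\xe1\x84\xa2'
```

The point is the same as in the C: once v has been rebound to the result of
encoding, every later length or byte-access operation sees an encoded byte
string, never the original text object.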