[Python-Dev] import screwiness
Tim Peters
tim.peters at gmail.com
Thu Jul 6 06:53:54 CEST 2006
[Neal Norwitz]
> In import.c starting around line 1210 (I removed a bunch of code that
> doesn't matter for the problem):
>
>     if (PyUnicode_Check(v)) {
>         copy = PyUnicode_Encode(PyUnicode_AS_UNICODE(v),
>                                 PyUnicode_GET_SIZE(v),
>                                 Py_FileSystemDefaultEncoding, NULL);
>         v = copy;
>     }
>     len = PyString_GET_SIZE(v);
>     if (len + 2 + namelen + MAXSUFFIXSIZE >= buflen) {
>         Py_XDECREF(copy);
>         continue; /* Too long */
>     }
>     strcpy(buf, PyString_AS_STRING(v));
>
> ***
> So if v is originally unicode, then copy is unicode from the second
> line, right?
No. An encoded unicode string is of type str, and PyUnicode_Encode()
returns an encoded string. Like so:
>>> u"\u1122".encode('utf-8')
'\xe1\x84\xa2'
>>> type(_)
<type 'str'>
> Then we assign v to copy, so v is still unicode.
Almost ;-)
> Then later on we do PyString_GET_SIZE and PyString_AS_STRING. That doesn't
> work, does it? What am I missing?
The conceptual type of the object returned by PyUnicode_Encode().
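
For anyone trying this on a current interpreter, here is a rough sketch of the same point in Python 3 spelling, where the text type is `str` and the encoded type is `bytes` (the roles played by `unicode` and `str` in the session above). The helper name `encode_path_entry` and the hard-coded 'utf-8' default are assumptions made for illustration; neither comes from import.c, which uses Py_FileSystemDefaultEncoding.

```python
# Sketch only: mirrors the "encode if it is text, then treat it as a byte
# string" step from the quoted C snippet.
# encode_path_entry is a hypothetical helper, not a function from import.c.

def encode_path_entry(v, encoding="utf-8"):
    """Return v as an encoded byte string, encoding it first if it is text."""
    if isinstance(v, str):        # plays the role of PyUnicode_Check(v)
        v = v.encode(encoding)    # the result is a byte string, not text
    return v


entry = encode_path_entry("\u1122")
print(type(entry).__name__)       # bytes
print(len(entry))                 # 3: one character, three UTF-8 bytes
```

After the encode step the name refers to a byte string whether or not it started out as text, which is why the byte-string size and buffer macros that follow in the C code are applied to the right kind of object.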