[Python-Dev] unicode/string asymmetries

M.-A. Lemburg mal@lemburg.com
Thu, 10 Jan 2002 09:49:32 +0100


Jack Jansen wrote:
> 
> Recently, "M.-A. Lemburg" <mal@lemburg.com> said:
> > How about this: we add a wchar_t codec to Python and the "eu#" parser
> > marker. Then you could write:
> >
> >       wchar_t *value = NULL;
> >       int len = 0;
> >       if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
> >                 return NULL;
> 
> I like it! Even though I have to do the memory management myself (and
> have to think of the error case) it at least looks reasonable. 

Good :-)

> I'm
> assuming here that if I pass a StringObject it will be unicode-encoded
> using the default encoding, and that unicode value will then be
> converted to wchar_t and put in value, right? Or, in other words,
> passing "a.out" will do the same as passing u"a.out"...

Yes.
 
> One minor misgiving is that this call will *always* copy the string,
> even if the internal coding of unicode objects is wchar_t. That's a
> bit of a nuisance, but we can try to fix that later.

Copying will always take place, either into a buffer you preallocate
or into one which the PyArg_ParseTuple() API allocates for you. That's
the cost you have to pay for the simplicity of the approach.
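
To make the two buffer modes concrete, here is a minimal sketch in plain
C that does not use the Python API at all. copy_wide() is a hypothetical
helper invented for illustration. It assumes the proposed "eu#" marker
would follow the convention of the existing "es#" marker: a NULL *buffer
asks the callee to allocate, while a non-NULL *buffer is treated as
preallocated storage whose capacity is passed in *length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, NOT part of the Python C API.
 *
 * Copies len wide characters from src into *buffer:
 *   - *buffer == NULL: a new buffer of len + 1 elements is allocated;
 *     the caller owns it and must release it with free().
 *   - *buffer != NULL: *length is the capacity of the caller's buffer
 *     in wchar_t units (terminator included); the call fails if the
 *     data does not fit.
 * On success *length holds the number of characters copied (without
 * the terminator). Returns 1 on success and 0 on failure, mirroring
 * the true/false result of PyArg_ParseTuple().
 */
int copy_wide(const wchar_t *src, size_t len, wchar_t **buffer, int *length)
{
    assert(src != NULL && buffer != NULL && length != NULL);

    if (*buffer == NULL) {
        /* Allocation mode: the callee provides the storage. */
        wchar_t *p = malloc((len + 1) * sizeof(wchar_t));
        if (p == NULL)
            return 0;
        *buffer = p;
    }
    else if (*length < 0 || (size_t)*length < len + 1) {
        /* Preallocated mode, but the caller's buffer is too small. */
        return 0;
    }

    /* In both modes the data is copied, never shared. */
    memcpy(*buffer, src, len * sizeof(wchar_t));
    (*buffer)[len] = L'\0';
    *length = (int)len;
    return 1;
}
```

In allocation mode the caller starts with value = NULL and frees the
result afterwards; in preallocated mode nothing needs to be freed, but
the error case (buffer too small) has to be handled instead.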

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/