[Python-Dev] Unicode support in getargs.c

Jack Jansen jack@oratrix.nl
Fri, 04 Jan 2002 13:22:47 +0100


Sigh, I let myself be drawn in again, despite my previous
assertion....

Recently, "Martin v. Loewis" <martin@v.loewis.de> said:
> > For this it should be as backward-compatible as possible, i.e.  if
> > some API expects a unicode filename and I pass "a.out" it should
> > interpret it as u"a.out".
> 
> That works fine with the current API.

No, it doesn't, that is the whole point of why I started this
thread!!!!

If the Python wrapper around the API uses PyArg_Parse("u") then it
will barf on "a.out". If the wrapper uses "u#" it will not barf but
instead completely misinterpret the StringObject containing "a.out",
interpreting it as the binary representation of 3 unicode characters
or something far worse!
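
To make the "u#" case concrete, here is a rough sketch of the kind of
misreading I mean. It is plain Python rather than the C API itself,
and it rests on assumptions: Unicode units are 2 bytes wide and
little-endian, a bytes object stands in for the StringObject, and the
C string's trailing NUL pads the odd fifth byte. The helper name
reinterpret_as_16bit_units is made up for this illustration.

```python
import struct


def reinterpret_as_16bit_units(raw):
    """Read raw bytes as little-endian 16-bit units, with no decoding step.

    Hypothetical helper: it mimics handing a byte buffer to code that
    expects a buffer of 2-byte Unicode units.
    """
    if len(raw) % 2:
        # Assumption: the trailing NUL of the C string fills the last unit.
        raw += b"\x00"
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw))


# The 5 bytes of "a.out" become 3 unrelated 16-bit units.
units = reinterpret_as_16bit_units(b"a.out")
misread = "".join(chr(u) for u in units)

# What the caller actually meant: 5 characters, decoded as text.
intended = b"a.out".decode("ascii")

print([hex(u) for u in units])
print(len(misread), len(intended))
```

Under those assumptions the five bytes collapse into three code units
that have nothing to do with the filename, whereas decoding them as
text gives back the five characters the caller passed in. With a
different unit width or byte order the garbage would simply be
different garbage.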

Yes, there is a workaround with the "O" format and three more function
calls, but I wouldn't call that "works fine"...

> > Using Python StringObjects as binary buffers is also far less common
> > than using StringObjects to store plain old strings, so if either of
> > these uses bites the other it's the binary buffer that needs to
> > suffer.
> 
> This is a conclusion I cannot agree with. Most strings are really
> binary, if you look at them closely enough :-)

I'm not sure I understand this remark. If you made it just for the
smiley: never mind. If you really don't agree: please explain why.

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm