[Pythonmac-SIG] CF module oddity

Ronald Oussoren oussoren@cistron.nl
Tue May 6 21:06:43 EDT 2003


On Tuesday, May 6, 2003, at 21:50 Europe/Amsterdam, Jack Jansen wrote:

>
> On Tuesday, May 6, 2003, at 18:27 Europe/Amsterdam, Ronald Oussoren
> wrote:
>>> CFStringCreateWithCharacters expects a unicode string. The Python
>>> format specifier for unicode strings accepts "normal" strings, and
>>> interprets them as a binary data stream containing UTF-16 unicode
>>> data.
>>
>> Very useful :-( Is this documented anywhere? The documentation in
>> the section "Extracting Parameters in Extension Functions" of
>> "Extending and Embedding the Python Interpreter" does not mention
>> this misfeature.
>
> I'm not sure whether it's documented. If it isn't, please file a bug
> report.
>
> And, about this being a misfeature: in some cases it definitely is, 
> but in others it's definitely a feature. It really depends on whether 
> you want to just pass raw data through (in which case it's a feature) 
> or whether the data is interpreted (think filenames and such), in 
> which case you'd much rather have the 8-bit string converted to 
> unicode with the current default encoding.

The reason I think this is a misfeature is that it behaves completely
differently from the default unicode conversion:

	unicode(val)  is equivalent to val.decode('ascii')
	PyArg_Parse("u",...) is equivalent to val.decode('utf-16')

(in both cases assuming isinstance(val, str)).

I'll file a bug report.

Ronald
