[Python-Dev] cpython: Issue #16455: On FreeBSD and Solaris, if the locale is C, the

Victor Stinner victor.stinner at gmail.com
Tue Dec 4 09:32:35 CET 2012


Hi,

2012/12/4 Christian Heimes <christian at python.org>:
> On 04.12.2012 03:23, victor.stinner wrote:
>> http://hg.python.org/cpython/rev/c25635b137cc
>> changeset:   80718:c25635b137cc
>> parent:      80716:b845901cf702
>> user:        Victor Stinner <victor.stinner at gmail.com>
>> date:        Tue Dec 04 01:34:47 2012 +0100
>> summary:
>>   Issue #16455: On FreeBSD and Solaris, if the locale is C, the
>> ASCII/surrogateescape codec is now used, instead of the locale encoding, to
>> decode the command line arguments. This change fixes inconsistencies with
>> os.fsencode() and os.fsdecode() because these operating systems announce an
>> ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
>>
>> files:
>>   Include/unicodeobject.h          |    2 +-
>>   Lib/test/test_cmd_line_script.py |    9 +-
>>   Misc/NEWS                        |    6 +
>>   Objects/unicodeobject.c          |   24 +-
>>   Python/fileutils.c               |  240 +++++++++++++++++-
>>   5 files changed, 241 insertions(+), 40 deletions(-)
>
> ...
>
>> @@ -3110,7 +3110,8 @@
>>          *surrogateescape = 0;
>>          return 0;
>>      }
>> -    if (strcmp(errors, "surrogateescape") == 0) {
>> +    if (errors == "surrogateescape"
>> +        || strcmp(errors, "surrogateescape") == 0) {
>>          *surrogateescape = 1;
>>          return 0;
>>      }
>
> Victor, that doesn't look right. :) GCC is complaining about the code:
>
> Objects/unicodeobject.c: In function 'locale_error_handler':
> Objects/unicodeobject.c:3113:16: warning: comparison with string literal
> results in unspecified behavior [-Waddress]

Oh, I meant to commit this change separately. It's a
micro-optimization.

PyUnicode_EncodeFSDefault() calls PyUnicode_EncodeLocale(unicode,
"surrogateescape"), and PyUnicode_DecodeFSDefaultAndSize() calls
PyUnicode_DecodeLocaleAndSize(s, size, "surrogateescape").

I chose to compare the addresses because I expected GCC to generate
the same address for "surrogateescape" in PyUnicode_EncodeFSDefault()
and in locale_error_handler(). Comparing pointers is faster than
comparing the string contents.

I'm removing this micro-optimization. The code path is only used during
Python startup, and I don't expect any real speedup.
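
For illustration only, here is a minimal sketch of the trade-off (it is
not the code from the changeset). In C it is unspecified whether two
identical string literals share one address, which is what the -Waddress
warning points at. A pointer fast path stays well defined if every
caller passes the same named constant. The names SURROGATEESCAPE and
parse_error_handler are hypothetical, and the real function also sets a
Python exception on failure, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared constant: one array object, so its address is
   stable wherever it is used within this translation unit. */
static const char SURROGATEESCAPE[] = "surrogateescape";

/* Hypothetical standalone stand-in for locale_error_handler().
   Returns 0 and sets *surrogateescape on success, or -1 if the error
   handler name is not recognized. */
static int
parse_error_handler(const char *errors, int *surrogateescape)
{
    assert(surrogateescape != NULL);
    if (errors == NULL || strcmp(errors, "strict") == 0) {
        *surrogateescape = 0;
        return 0;
    }
    /* Fast path: same object, no content comparison needed.
       Fallback: compare the contents, which is always correct. */
    if (errors == SURROGATEESCAPE
        || strcmp(errors, SURROGATEESCAPE) == 0) {
        *surrogateescape = 1;
        return 0;
    }
    return -1;
}
```

Calling parse_error_handler(SURROGATEESCAPE, &flag) takes the pointer
fast path, while a separately allocated copy of the same text still
works through strcmp(). Dropping the fast path entirely, as done here,
only costs one short strcmp() at startup.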

> I'm also getting additional warnings in PyUnicode_Format().
>
> Objects/unicodeobject.c: In function 'PyUnicode_Format':
> Objects/unicodeobject.c:13782:8: warning: 'arg.sign' may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> Objects/unicodeobject.c:13893:33: note: 'arg.sign' was declared here
> Objects/unicodeobject.c:13779:12: warning: 'str' may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> Objects/unicodeobject.c:13894:15: note: 'str' was declared here

These members *are* initialized, but it's hard even for me (the author
of this code) to verify that. I rewrote how these members are
initialized to silence the warnings and also to simplify the code.
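
As a rough sketch of the general idea (not the actual PyUnicode_Format()
rewrite), setting every member in one place before any branch runs makes
the initialization obvious to both the reader and the compiler. The
struct layout and the names format_arg, format_arg_init and
format_arg_needs_sign are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-argument format state. */
struct format_arg {
    int ch;     /* conversion character, e.g. 'd' or 's' */
    int sign;   /* nonzero when a sign must be emitted */
    int width;  /* field width, -1 when not given */
};

/* Give every member a defined value up front, so no later code path
   can read an unset member. */
static void
format_arg_init(struct format_arg *arg, int ch)
{
    assert(arg != NULL);
    arg->ch = ch;
    arg->sign = 0;
    arg->width = -1;
}

/* Only integer conversions update the sign; every other conversion
   keeps the default assigned by format_arg_init(). */
static int
format_arg_needs_sign(int ch, int value)
{
    struct format_arg arg;

    format_arg_init(&arg, ch);
    if (ch == 'd' || ch == 'i')
        arg.sign = (value < 0);
    return arg.sign;
}
```

With this shape, -Wmaybe-uninitialized has nothing to flag, because no
member depends on a particular branch having been taken.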

Thanks for the review!

Victor

PS: I hope that I really fixed the FreeBSD/Solaris issue :-p

