[issue8603] Create a bytes version of os.environ and getenvb()
Marc-Andre Lemburg
report at bugs.python.org
Tue May 4 10:51:20 CEST 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Martin v. Löwis wrote:
>
> Martin v. Löwis <martin at v.loewis.de> added the comment:
>
>> Your name will end up being partially escaped as surrogate:
>>
>> 'L\udcf6wis'
>>
>> Further processing will fail
>
> That depends on the further processing, no?
>
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in range(256)
>
> Where did you get this error from?
The roundup email interface must have eaten this
first line of the traceback:

    >>> _.encode('latin-1')
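
For readers following along, the failure can be reproduced with a minimal sketch. It assumes the name arrived as Latin-1 bytes and was decoded as UTF-8 with the 'surrogateescape' error handler; the variable names are illustrative only and do not come from the tracker discussion.

```python
# Assumption: the raw bytes are Latin-1 encoded ("Löwis"), but are decoded
# as UTF-8 with the 'surrogateescape' error handler.
raw = b"L\xf6wis"

# The undecodable byte 0xF6 is smuggled into the str as lone surrogate U+DCF6.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with a codec that does not use 'surrogateescape' fails, because a
# lone surrogate has no representation in Latin-1.
try:
    name.encode("latin-1")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Encoding with the same codec and error handler restores the original bytes.
round_tripped = name.encode("utf-8", errors="surrogateescape")
```

The round trip only works when both directions use the same codec together with 'surrogateescape'; any other encoding step has to handle the lone surrogate explicitly.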
>> It doesn't work if an application tries to work *with* the data,
>> e.g. tries to convert it
>
> Converting it to what?
>
>> parse it
>
> Parsing will work fine.
>
>> decode it
>
> It's a string. You shouldn't decode it.
>
>> The reason is
>> that information included by the use of the 'surrogateescape'
>> error handler is lost along the way and this then causes data
>> corruption.
>
> And how would that not happen if it was bytes? The problems you describe
> were one of the primary motivations to switch to Unicode: it's *byte*
> strings that have these problems.
Martin, it's obvious that you are not even trying to understand
what I'm saying. That's not a good basis for discussion.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8603>
_______________________________________