str(bytes) in Python 3.0
Steve Holden
steve at holdenweb.com
Sat Apr 12 17:51:16 EDT 2008
Dan Bishop wrote:
> On Apr 12, 9:29 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu... at gmx.net> wrote:
>>
>>> On 12 Apr., 14:44, Christian Heimes <li... at cheimes.de> wrote:
>>>> Gabriel Genellina schrieb:
>>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>>> above. But I get the same as repr(x) - is this on purpose?
>>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>>> a bytes object or to compare bytes and unicode directly. Several months
>>>> ago I added a bytes warning option to Python. Start Python as "python
>>>> -bb" and try it again. ;)
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> True, you can't KNOW that. Maybe the author of those bytes actually
> MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
> it's statistically unlikely for a non-UTF-8-encoded string to just
> happen to be valid UTF-8.
So you propose to perform a statistical analysis on your input to
determine whether it's UTF-8 or some other encoding?
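
For anyone following the thread, here is a minimal sketch of the behaviour under discussion, assuming Python 3 run without the -b/-bb options. The sample text and all variable names are made up for illustration and are not from anyone's actual code. It shows that str() on a bytes object gives the repr unless an encoding is supplied, that Latin-1 will decode any byte sequence without complaint, and that strict UTF-8 decoding rejects this particular Latin-1 input.

```python
# Illustrative sketch only: sample text and variable names are assumptions.
text = "\u00bfC\u00f3mo est\u00e1s?"   # the Spanish greeting, written with escapes
utf8_data = text.encode("utf-8")

# Without an encoding, str() returns the repr of the bytes object.
# (Under "python -b" this warns; under "python -bb" it raises BytesWarning.)
as_repr = str(utf8_data)

# With an explicit encoding, str() decodes, like utf8_data.decode("utf-8").
as_text = str(utf8_data, "utf-8")

# Latin-1 assigns a character to every byte value, so this never fails,
# even though the result is not what the author of the bytes meant.
misread = utf8_data.decode("latin-1")

# Going the other way, the Latin-1 encoding of this text is not valid UTF-8,
# so strict decoding raises instead of guessing.
latin1_data = text.encode("latin-1")
try:
    latin1_data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(as_repr)       # ASCII-only repr with \x escapes
print(valid_utf8)
```

A successful UTF-8 decode is therefore evidence, not proof, of the encoding, which is why the caller has to name it explicitly.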
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/