just a bug

Jarek Zgoda jzgoda at o2.usun.pl
Fri May 25 10:54:22 EDT 2007


Maksim Kasimov wrote:

>> 'utf8' codec can't decode bytes in position 176-177: invalid data
>> >>> iMessage[176:178]
>> '\xd1]'
>>
>> And that's your problem. In general you can't just truncate a utf-8
>> encoded string anywhere and expect the result to be valid utf-8. The
>> \xd1 at the very end of your CDATA section is the first byte of a
>> two-byte sequence that represents some unicode code-point between \u0440
>> and \u047f, but it's missing the second byte that says which one.
> 
> 
> In my previous message I already explained that this situation is
> common with memory-limited devices, such as mobile terminals from
> Nokia, Sony Ericsson, Siemens and so on.
> 
> And I pointed out that it is a part of a split string.

No, it is not a part of a string. It's a part of a byte stream, split in
the middle of a multibyte-encoded character.

You can't take only the dot from a lowercase "i" and ask the parser to
treat it as a complete "i".
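
If the chunks really do arrive split at arbitrary byte offsets, one
possible approach is to let an incremental decoder hold the dangling
lead byte until the next chunk shows up. The following is only a rough
sketch in current Python syntax: the helper name decode_chunks and the
sample text are made up for illustration, and it assumes the remaining
bytes do arrive in a later chunk.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of UTF-8 byte chunks split at arbitrary offsets.

    decode_chunks is a hypothetical helper, not part of any library.
    """
    # The incremental decoder buffers an incomplete trailing sequence
    # instead of raising, and finishes it when the next chunk arrives.
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises UnicodeDecodeError if bytes are still missing.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Sample data (an assumption): Cyrillic text whose second character
# starts with the lead byte 0xd1, as in the traceback quoted above.
text = "привет"
data = text.encode("utf-8")

# Cut the byte stream in the middle of the second character.
first, second = data[:3], data[3:]

try:
    first.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    # The first chunk alone is not valid UTF-8.
    truncated_ok = False

joined = decode_chunks([first, second])
```

Decoding the first chunk on its own fails, while feeding both chunks
through the same decoder gives back the original text. Joining the raw
bytes before decoding would work just as well when memory allows.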

-- 
Jarek Zgoda
http://jpa.berlios.de/


