Some questions about decode/encode

Thu Jan 24 02:29:38 EST 2008

On Thu, 24 Jan 2008 04:52:22 -0200, glacier <rong.xian at gmail.com> wrote:

> According to your reply, what will happen if I try to decode a long
> string in separate pieces?
> I mean:
> ######################################
> a='你好吗'*100000
> s1 = u''
> cur = 0
> while cur < len(a):
>     d = min(len(a)-cur,1023)
>     s1 += a[cur:cur+d].decode('mbcs')
>     cur += d
> ######################################
>
> May the code above produce any bogus characters in s1?

Don't do that. You might split the input string at a point that is not a
character boundary. You won't get bogus output; with the default 'strict'
error handling, decode will raise a UnicodeDecodeError instead.
You can control how errors are handled with the optional errors argument; see
http://docs.python.org/lib/string-methods.html#l2h-237
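If you really need to decode in pieces, one possible approach is an incremental decoder from the codecs module, which holds back a trailing partial character until the next chunk arrives. The following is only a sketch under some assumptions: it is written for Python 3 (where the input is a bytes object), it uses 'gbk' as a stand-in for 'mbcs' because 'mbcs' exists only on Windows, and decode_in_chunks is a hypothetical helper name, not something from the standard library.

```python
import codecs

def decode_in_chunks(data, encoding="gbk", chunk_size=1023):
    """Decode bytes piece by piece without breaking multibyte characters.

    Hypothetical helper for illustration; 'gbk' stands in for 'mbcs'.
    """
    # Error handling defaults to 'strict', as with a plain decode().
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # A partial character at the end of the chunk is buffered
        # inside the decoder and completed by the next call.
        parts.append(decoder.decode(chunk))
    # final=True flushes the buffer; it raises UnicodeDecodeError
    # if the input as a whole ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

text = "你好吗" * 100000
raw = text.encode("gbk")          # two bytes per character in GBK

# Chunked decoding gives back the original text.
assert decode_in_chunks(raw) == text

# Naive slicing: 1023 is odd, so the slice ends in the middle of a character.
try:
    raw[:1023].decode("gbk")
except UnicodeDecodeError as exc:
    print("naive slice failed:", exc.reason)

# With errors='replace' the decode no longer raises; the dangling
# byte is turned into a replacement character instead.
lossy = raw[:1023].decode("gbk", errors="replace")
```

The chunk size stays at 1023 on purpose: since it is odd, every other chunk boundary falls inside a two-byte character, which is exactly the case the plain slice-and-decode loop cannot handle.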

-- 
Gabriel Genellina