Some questions about decode/encode

Sun Jan 27 05:17:05 EST 2008

On Jan 24, 3:29 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> wrote:
> On Thu, 24 Jan 2008 04:52:22 -0200, glacier <rong.x... at gmail.com> wrote:
>
> > According to your reply, what will happen if I try to decode a long
> > string in separate pieces?
> > I mean:
> > ######################################
> > a='你好吗'*100000
> > s1 = u''
> > cur = 0
> > while cur < len(a):
> >     d = min(len(a)-cur,1023)
> >     s1 += a[cur:cur+d].decode('mbcs')
> >     cur += d
> > ######################################
>
> > Could the code above produce any bogus characters in s1?
>
> Don't do that. You might be splitting the input string at a point that is
> not a character boundary. You won't get bogus output; decode will raise a
> UnicodeDecodeError instead.
> You can control how errors are handled, see http://docs.python.org/lib/string-methods.html#l2h-237
>
> --
> Gabriel Genellina

Thanks Gabriel,

I think I understand what will happen if I don't split the string at
a character boundary.
What I'm not sure about is whether the decode method itself can
mis-split a character boundary.
Can you tell me?

Thanks a lot.
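
For reference, here is a minimal sketch of one way to decode a long byte
string in fixed-size pieces without cutting a multi-byte character in half.
It relies on the standard library's codecs.getincrementaldecoder, which
buffers an incomplete trailing byte sequence until the next piece arrives.
The helper name decode_in_chunks is made up for this illustration, and
'gbk' is used as an assumed stand-in for 'mbcs' (which exists only on
Windows), so results with 'mbcs' itself may differ.

```python
import codecs

def decode_in_chunks(data, encoding="gbk", chunk_size=1023):
    """Decode a byte string piece by piece (hypothetical helper).

    The incremental decoder holds back any incomplete multi-byte
    sequence at the end of a chunk and completes it with the next one.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True makes the decoder raise if bytes are still pending.
    parts.append(decoder.decode(b"", final=True))
    return u"".join(parts)

# u"\u4f60\u597d\u5417" is the same three-character string as in the
# question, written with escapes so the source file stays ASCII.
text = u"\u4f60\u597d\u5417" * 1000
raw = text.encode("gbk")      # 2 bytes per character in GBK

result = decode_in_chunks(raw)

# Slicing at an odd offset and decoding directly splits a character.
try:
    raw[:1023].decode("gbk")
    naive_split_failed = False
except UnicodeDecodeError:
    naive_split_failed = True
```

With these inputs the chunked result should equal the original text, while
the direct decode of the first 1023 bytes fails, because an odd offset
falls in the middle of a two-byte character.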