Some questions about decode/encode

Thu Jan 24 02:28:31 EST 2008

On Jan 24, 1:49 pm, bbtestin... at gmail.com wrote:
> On Jan 23, 8:49 pm, glacier <rong.x... at gmail.com> wrote:
>
> > I use Chinese characters as an example here.
>
> > >>>s1='你好吗'
> > >>>repr(s1)
>
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> > >>>b1=s1.decode('GBK')
>
> > My first question is: what strategy does 'decode' use to tell how
> > to separate the characters?
>
> decode() uses the GBK codec you specified to determine which bytes
> make up each character in your string.
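To make "which bytes make up a character" concrete, here is a rough sketch of the boundary rule for GBK. It is simplified and written in Python 3 syntax, not the 2.5 used in the session above. The helper name split_gbk is made up for illustration, and the real codec also validates the second byte of each pair, which this sketch does not.

```python
def split_gbk(data):
    """Split GBK-encoded bytes into per-character chunks (simplified sketch)."""
    chunks = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Bytes below 0x80 are plain ASCII: one byte per character.
            width = 1
        elif 0x81 <= data[i] <= 0xFE:
            # A lead byte in 0x81..0xFE starts a two-byte character.
            width = 2
        else:
            raise ValueError("invalid GBK lead byte at offset %d" % i)
        if i + width > len(data):
            raise ValueError("truncated character at offset %d" % i)
        chunks.append(data[i:i + width])
        i += width
    return chunks


# The six bytes from the repr() output quoted above.
raw = b"\xc4\xe3\xba\xc3\xc2\xf0"
chunks = split_gbk(raw)
decoded = [c.decode("gbk") for c in chunks]
```

Each two-byte chunk decodes to one character, so joining the pieces gives the same result as decoding the whole string in one call.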
>
> > My second question is: has anyone tested decoding a very long MBCS
> > string? Yesterday I tried to decode a long (20+ MB) XML file, and the
> > result was very strange and caused SAX to fail to parse the decoded
> > string. However, when I used a text editor to convert the file to
> > UTF-8, SAX parsed the content successfully.
>
> > I'm not sure whether some special byte sequence or the sheer length of
> > the text caused this problem. Or maybe that's a bug in Python 2.5?
>
> That's probably too vague a description to determine why SAX isn't
> doing what you expect it to.

Do you mean I should post a copy of the XML document?
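
In the meantime, here is a minimal sketch of what the text-editor conversion presumably does, in case it helps narrow things down. It decodes the GBK bytes in chunks, rewrites the encoding named in the XML declaration, and hands UTF-8 bytes to SAX. This is only a guess at a workaround, not a diagnosis of the failure. It uses Python 3 syntax instead of 2.5, and gbk_xml_to_utf8, TextCollector and the tiny sample document are invented for illustration.

```python
import codecs
import io
import re
import xml.sax


def gbk_xml_to_utf8(gbk_bytes, chunk_size=4096):
    """Decode GBK bytes chunk by chunk and return an equivalent UTF-8 document."""
    # The incremental decoder buffers a lead byte that ends a chunk, so a
    # two-byte character split across two chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder("gbk")()
    stream = io.BytesIO(gbk_bytes)
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    parts.append(decoder.decode(b"", final=True))
    text = "".join(parts)
    # Make the XML declaration agree with the bytes we are about to produce.
    text = re.sub(r'(<\?xml[^>]*encoding=)["\'][^"\']*["\']',
                  r'\1"utf-8"', text, count=1)
    return text.encode("utf-8")


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


# Stand-in for the real 20+ MB file, which is not shown in this thread.
sample = '<?xml version="1.0" encoding="GBK"?><greeting>你好吗</greeting>'.encode("gbk")

# A small chunk size so that a chunk boundary falls inside a character.
utf8_doc = gbk_xml_to_utf8(sample, chunk_size=3)
handler = TextCollector()
xml.sax.parseString(utf8_doc, handler)
```

If the real file fails inside decoder.decode(), that would point at a bad byte sequence in the data and not at SAX or at the length of the text.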