Some questions about decode/encode

Thu Jan 24 00:03:19 EST 2008

glacier <rong.xian at gmail.com> writes:

> I use Chinese characters as an example here.
> 
> >>> s1 = '你好吗'
> >>> repr(s1)
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> >>> b1 = s1.decode('GBK')
> 
> My first question is: what strategy does 'decode' use to decide how
> to separate the words? I mean, since s1 is a multi-byte-character string,
> how does it determine whether to split the string every 2 bytes or 1 byte?

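The codec you specified ("GBK") is, like any character-encoding codec,
a precise mapping between characters and bytes. It has no notion of
"words"; it only knows which byte sequences map to which characters.
In GBK, a byte in the range 0x00-0x7F is a complete character on its
own, while a byte in the range 0x81-0xFE is the first byte of a
two-byte character. The decoder can therefore tell how many bytes each
character occupies by looking at its first byte.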
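
As a rough illustration of how a decoder can tell one-byte characters
from two-byte ones, here is a minimal sketch. It assumes Python 3 syntax
(a bytes literal rather than the Python 2 str in your session), and the
helper name split_gbk is made up for this example; it is not part of
Python or of the real codec. It relies only on the GBK rule that a byte
in 0x00-0x7F stands alone while a byte in 0x81-0xFE begins a two-byte
character, and unlike the real codec it does no validation.

```python
def split_gbk(data):
    """Split GBK-encoded bytes into one chunk per character.

    Hypothetical helper for illustration only; it does not validate
    trail bytes or reject malformed input the way the real codec does.
    """
    chunks = []
    i = 0
    while i < len(data):
        # 0x00-0x7F: single-byte character (ASCII).
        # 0x81-0xFE: lead byte of a two-byte character.
        width = 2 if 0x81 <= data[i] <= 0xFE else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


raw = b'\xc4\xe3\xba\xc3\xc2\xf0'       # the bytes shown by repr(s1)
chunks = split_gbk(raw)                  # three two-byte chunks
text = raw.decode('gbk')                 # the real codec does the mapping
mixed = split_gbk('a你'.encode('gbk'))   # one-byte and two-byte characters mixed
```

Each chunk in chunks decodes to exactly one character, and the built-in
codec produces the same three characters from raw in a single call.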

-- 
 \       "When I get new information, I change my position. What, sir, |
  `\          do you do with new information?"  -- John Maynard Keynes |
_o__)                                                                  |
Ben Finney