How to Split Chinese Character with backslash representation?

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Oct 27 09:32:24 EDT 2006


"Wijaya Edward" <ewijaya at i2r.a-star.edu.sg> wrote in message 
news:mailman.1319.1161920633.11739.python-list at python.org...
>
> Hi all,
>
> I was trying to split a string that
> represent chinese characters below:
>
>
>>>> str = '\xc5\xeb\xc7\xd5\xbc'
>>>> print str2,
> ???
>>>> fields2 = split(r'\\',str)
>>>> print fields2,
> ['\xc5\xeb\xc7\xd5\xbc']
>
> But why the split function here doesn't seem
> to do the job for obtaining the desired result:
>
> ['\xc5','\xeb','\xc7','\xd5','\xbc']
>

There are no backslash characters in the string str, so split finds nothing 
to split on.  I know it looks like there are, but the backslashes shown are 
part of the \x escape sequence, a notation for specifying a character by its 
hex value when you can't or don't want to type it directly (as in your 
example, where the characters are all in the range 0x80 to 0xff).  Look at 
this example:

>>> s = "\x40"
>>> print s
@

I defined s using the escaped \x notation, but s does not contain any 
backslashes.  It contains only the '@' character, whose ordinal value is 
64 decimal, or 40 hex.
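
To get the result you were after, one option is to slice the value one 
position at a time rather than split on a separator.  The following is only 
a sketch: the names data, raw and pieces are made up for illustration, and 
the b'' prefix is my assumption so that the same lines behave alike on newer 
Python versions (on your version a plain string literal holds the same five 
bytes).  The raw string is included for contrast, since that is the case 
where backslashes really are stored.

```python
# The five-byte value from the question, written as a bytes literal.
data = b'\xc5\xeb\xc7\xd5\xbc'

# Each \xNN escape produces a single byte, so the length is 5.
print(len(data))            # 5

# No backslash is actually stored, so there is nothing to split on.
print(b'\\' in data)        # False

# Slicing one position at a time yields the individual bytes.
pieces = [data[i:i + 1] for i in range(len(data))]

# Contrast: in a raw string the backslashes ARE real characters,
# so splitting on a backslash does find something.
raw = r'\xc5\xeb'
print(len(raw))             # 8
print(raw.split('\\'))      # ['', 'xc5', 'xeb']
```

Here pieces holds five one-byte values, which is the list you expected from 
split.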

Also, str is not the best name for a string variable, since this masks the 
built-in str type.
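
A small sketch of what that masking looks like in practice.  The function 
name demo_shadowing is hypothetical; the rebinding is kept inside a function 
so the built-in stays usable everywhere else.

```python
def demo_shadowing():
    # Rebinding the name hides the built-in type inside this function.
    str = 'abc'
    try:
        # This now tries to call the string 'abc', not the str type.
        str(42)
    except TypeError as exc:
        return type(exc).__name__
    return None

result = demo_shadowing()
print(result)       # TypeError

# Outside the function the built-in is untouched.
print(str(42))      # 42
```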

-- Paul 




