string / split method on ASCII code?

Wed Mar 12 22:54:53 EDT 2008

Sorry for breaking threading by replying to a reply, but I don't seem to 
have the original post.

On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
> Hey all,
> 
> I have these annoying text files that are delimited by the ASCII char
> for << (only it's a single character) and >> (again a single character)
> 
> Their codes are 174 and 175, respectively.
> 
> My datafiles are in the moronic form
> 
> X<<Y>>Z

The glyph that looks like "<<" is a left quote in some European countries
(and a right quote in others, sigh...), and likewise for ">>". The two
are usually known as left and right "angle quotation marks", chevrons or
guillemets. And yes, that certainly looks like a moronic form for a data
file.

But whatever the characters are, we can work with them as normal, as long
as you don't mind that they don't display properly everywhere:

>>> lq = chr(174)
>>> rq = chr(175)
>>> s = "x" + lq + "y" + rq + "z"
>>> print s
x�y�z
>>> s.split(lq)
['x', 'y\xafz']
>>> s.split(rq)
['x\xaey', 'z']
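
For anyone trying this on Python 3, where chr(174) gives a Unicode
character rather than a single byte, the same split can be sketched on
bytes objects instead. The sample record is made up, and the guess that
the file is in code page 437 (where bytes 174 and 175 are the two
guillemets) is my assumption, not something stated in the question:

```python
# Python 3 sketch: work on raw bytes so each delimiter stays a single byte.
LQ = bytes([174])  # b'\xae'
RQ = bytes([175])  # b'\xaf'

record = b"x" + LQ + b"y" + RQ + b"z"  # made-up sample record

left_parts = record.split(LQ)   # [b'x', b'y\xafz']
right_parts = record.split(RQ)  # [b'x\xaey', b'z']

# Assumption: the data is CP437-encoded. Decoding turns the two bytes
# into the real guillemet characters U+00AB and U+00BB.
text = record.decode("cp437")
print(left_parts, right_parts, text)
```

If the files turn out to use a different encoding, only the decode step
changes; the byte-level split does not care what the bytes mean.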

You can use regular expressions as well, assuming that the quotes are
never nested:

>>> import re
>>> r = re.compile(lq + '(.*?)' + rq)
>>> r.search(s).group(1)
'y'
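
If a line can hold more than one quoted field, findall collects them all
in one go. This is only a sketch in Python 3 on bytes; the sample record
and the names are invented for illustration:

```python
import re

LQ = bytes([174])
RQ = bytes([175])

# re.escape is not strictly needed for these two bytes, but it keeps the
# pattern safe if the delimiters are ever swapped for regex metacharacters.
field = re.compile(re.escape(LQ) + b"(.*?)" + re.escape(RQ))

# Made-up record with two quoted fields.
record = b"a" + LQ + b"b" + RQ + b"c" + LQ + b"d" + RQ + b"e"

fields = field.findall(record)  # [b'b', b'd']
print(fields)
```

The non-greedy (.*?) is what stops the first match from swallowing
everything up to the last right quote.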

If you want to treat both characters the same:

>>> s = s.replace(lq, rq)
>>> s.split(rq)
['x', 'y', 'z']
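
The same result can probably be had without rewriting the string first,
by splitting on a character class that matches either delimiter. Again a
Python 3 sketch on bytes, with a made-up record:

```python
import re

LQ = bytes([174])
RQ = bytes([175])

# A character class matching either delimiter byte.
either = re.compile(b"[" + re.escape(LQ) + re.escape(RQ) + b"]")

record = b"x" + LQ + b"y" + RQ + b"z"  # made-up sample record
parts = either.split(record)  # [b'x', b'y', b'z']
print(parts)
```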

-- 
Steven