string / split method on ASCII code?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Wed Mar 12 22:54:53 EDT 2008
Sorry for breaking threading by replying to a reply, but I don't seem to
have the original post.
On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
> Hey all,
>
> I have these annoying textfiles that are delimited by the ASCII char
> for << (only it's a single character) and >> (again a single character)
>
> Their codes are 174 and 175, respectively.
>
> My datafiles are in the moronic form
>
> X<<Y>>Z
The glyph that looks like "<<" is a left quote in some European countries
(and a right quote in others, sigh...), and similarly for ">>". They are
usually known as left and right angle quotation marks, chevrons, or
guillemets. And yes, that certainly looks like a moronic form for a data
file.
But whatever the characters are, we can work with them as normal, if you
don't mind ignoring that they don't display properly everywhere:
>>> lq = chr(174)
>>> rq = chr(175)
>>> s = "x" + lq + "y" + rq + "z"
>>> print s
x�y�z
>>> s.split(lq)
['x', 'y\xafz']
>>> s.split(rq)
['x\xaey', 'z']
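The session above is Python 2, where chr(174) gives a one-byte string. A
rough Python 3 equivalent, as a sketch only, is to work on bytes objects
directly. The cp437 decode at the end is an assumption: the original post
doesn't say which encoding the files use, but code page 437 is one where
bytes 174 and 175 are the two guillemets.

```python
# Python 3 sketch: work on raw bytes so that no encoding has to be assumed.
lq = bytes([174])  # b'\xae'
rq = bytes([175])  # b'\xaf'
s = b"x" + lq + b"y" + rq + b"z"

# Split once on each delimiter to recover the three fields.
left, rest = s.split(lq, 1)
middle, right = rest.split(rq, 1)

# Assumption: the data is in code page 437, where 174 and 175 are the
# left and right guillemets. Other encodings map these bytes differently.
text = s.decode("cp437")
```

If the decode assumption holds, text displays properly on a Unicode-aware
terminal; if it doesn't, stick to the bytes version.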
And you can use regular expressions as well. Assuming that the quotes are
never nested:
>>> import re
>>> r = re.compile(lq + '(.*?)' + rq)
>>> r.search(s).group(1)
'y'
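If a line can hold more than one quoted field, findall collects all of
them with the same non-greedy pattern. A minimal sketch in Python 3; the
sample string is made up for illustration, and re.escape is only a
precaution in case the delimiters are ever changed to regex
metacharacters:

```python
import re

lq = chr(174)
rq = chr(175)

# Non-greedy group: match as little as possible between a left and right mark.
pattern = re.compile(re.escape(lq) + "(.*?)" + re.escape(rq))

# Hypothetical sample with two quoted fields, not nested.
sample = "a" + lq + "b" + rq + "c" + lq + "d" + rq + "e"
inner = pattern.findall(sample)
print(inner)  # ['b', 'd']
```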
If you want to treat both characters the same:
>>> s = s.replace(lq, rq)
>>> s.split(rq)
['x', 'y', 'z']
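The replace-then-split step can also be done in one pass with re.split
and a character class that matches either delimiter. Again a Python 3
sketch, not the only way to do it:

```python
import re

lq = chr(174)
rq = chr(175)
s = "x" + lq + "y" + rq + "z"

# A character class matches either delimiter, so one split handles both.
either = re.compile("[" + re.escape(lq) + re.escape(rq) + "]")
fields = either.split(s)
print(fields)  # ['x', 'y', 'z']
```

Unlike the replace approach, this leaves s itself unchanged.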
--
Steven