Regular Expression Help
Alex Martelli
aleax at aleax.it
Tue Aug 14 11:45:40 EDT 2001
"Tino Lange" <tino.lange at isg.de> wrote in message
news:3B793BD7.1A48D776 at isg.de...
...
> I want to parse a continuous file, that contains messages surrounded by
> nonalphanumerical begin- and end-signs.
> (BEGIN sign 0x02, END sign 0x03)
>
> How can I parse this?
> A working perl-script would be
>
> #!/usr/bin/perl
> while(<>) { s/\x02/\n/g; s/\x03//g; print; }
Very fragile, it seems to me -- any \n inside a
message becomes indistinguishable from the markers.
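To make the fragility concrete, here is a rough Python sketch of the same transformation (begin sign to newline, end sign dropped). The sample stream is made up for illustration, and unlike the Perl loop it works on the whole string at once rather than line by line:

```python
import re

# Hypothetical sample stream: two messages framed by STX (0x02) and ETX (0x03);
# the first message contains an embedded newline.
stream = '\x02line one\nline two\x03\x02second\x03'

# Same idea as the Perl one-liner: STX -> newline, ETX -> nothing.
flattened = re.sub('\x03', '', re.sub('\x02', '\n', stream))

# Reading the result back line by line gives three pieces, not two messages,
# because the embedded newline looks exactly like a message boundary.
pieces = [p for p in flattened.split('\n') if p]
print(pieces)
```

Two messages went in and three "messages" come out, which is the confusion mentioned above.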
> pattern=re.compile('([0x02] | [0x03])')
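Each of the two character classes here matches one
ordinary ASCII character: 0, x or 2 for the first,
0, x or 3 for the second, although it's chosen a very
peculiar way to specify that:-). The spaces around the |
are literal too, so the pattern never matches the 0x02
or 0x03 bytes themselves. Plus, it defines a group, so
the splitter itself would appear in the value from
.split, which is apparently not what you want.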
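A quick check of both points, as a sketch only: the pattern is the one from the question, compiled without re.VERBOSE (so the spaces around | count as literal characters), and the two test strings are invented for the demonstration:

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile('([0x02] | [0x03])')

# It finds nothing to split on in text that uses the real control characters.
no_match = pattern.split('able\x02baker\x03charlie')

# It does match ordinary text such as "2 " (a class member plus a space),
# and the capturing group puts the matched separator into the result.
with_group = pattern.split('abc2 def')

print(no_match)
print(with_group)
```

The first call returns the input as a single unsplit item; the second returns the separator "2 " sitting between 'abc' and 'def'.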
> I could only split by "normal" characters as far as I saw in the
> documentation.
> Is this right?
No, you just have to use escapes such as \02 (octal)
or \x02 (hex) to specify special characters. Try this
split.py:
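import re
# \02 and \03 are octal escapes for the begin (0x02) and end (0x03) signs
samplestring = 'able\02baker\03charlie\02delta'
# a character class without a group, so the separators are dropped
splitter = re.compile('[\02\03]')
print(splitter.split(samplestring))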
D:\py21>python split.py
['able', 'baker', 'charlie', 'delta']
This looks like what you want, right?
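If it turns out you only want the text that actually sits between a begin sign and the following end sign, a capturing group with findall is one possible alternative to split. This is just a sketch on an invented stream, and it assumes messages are never nested:

```python
import re

# Hypothetical stream: STX (0x02) opens a message, ETX (0x03) closes it;
# anything outside a begin/end pair is treated as noise.
stream = 'noise\x02first\x03more noise\x02second\x03'

# Capture only the text between an STX and the next ETX.
framed = re.compile('\x02([^\x02\x03]*)\x03')
messages = framed.findall(stream)
print(messages)
```

Here findall returns just the captured group for each match, so the text outside the markers is skipped rather than returned as extra list items.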
Alex