Regular Expression Help

Alex Martelli aleax at aleax.it
Tue Aug 14 11:45:40 EDT 2001


"Tino Lange" <tino.lange at isg.de> wrote in message
news:3B793BD7.1A48D776 at isg.de...
    ...
> I want to parse a continuous file, that contains messages surrounded by
> nonalphanumerical begin- and end-signs.
> (BEGIN sign 0x02, END sign 0x03)
>
> How can I parse this?
> A working perl-script would be
>
> #!/usr/bin/perl
> while(<>) { s/\x02/\n/g; s/\x03//g; print; }

Very fragile, it seems to me -- any \n already inside a
message gets confused with the \n that replaces each
begin marker.
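
To make the fragility concrete, here is a minimal Python sketch of what
the two Perl substitutions do. The sample stream is made up for
illustration, with a newline inside the first message:

```python
# Hypothetical stream: two messages, the first contains a newline.
stream = '\x02first\nmessage\x03\x02second\x03'

# Same effect as s/\x02/\n/g; s/\x03//g in the Perl script.
flattened = stream.replace('\x02', '\n').replace('\x03', '')

# Splitting on newlines can no longer tell the two messages apart.
lines = flattened.split('\n')
print(lines)
```

Two messages go in, but three non-empty lines come out, and nothing in
the output says which newline was a marker.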

> pattern=re.compile('([0x02] | [0x03])')

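Each bracketed class in this pattern matches any one of
the literal ASCII characters it lists, so between them:
    0
    x
    2
    3
although it's a very peculiar way to specify that:-).
The spaces around the | are literal too, since the
pattern is not compiled with re.VERBOSE.  Plus, it
defines a group, so the splitter itself would appear
in the list returned by .split, which is apparently
not what you want.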
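
A small sketch of both effects, using the pattern exactly as posted.
The two sample strings are made up for illustration:

```python
import re

# The pattern from the question, unchanged. Without re.VERBOSE the
# spaces around "|" are literal, so it needs e.g. "0 " or " 3" to match.
pattern = re.compile('([0x02] | [0x03])')

control_text = 'able\x02baker'   # contains a real 0x02 character
literal_text = 'able0 baker'     # contains the literal text "0 "

# No match at all: the 0x02 character is not in either class.
no_split = pattern.split(control_text)

# Matches "0 ", and the group puts that delimiter into the result.
grouped_split = pattern.split(literal_text)

print(no_split)
print(grouped_split)
```

The first split returns the string in one piece; the second returns
the delimiter '0 ' as a list item between 'able' and 'baker'.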

> I could only split by "normal" characters as far as I saw in the
> documentation.
> Is this right?

No, you just have to use escapes such as \02 and \03
(octal) in the string literal to specify special
characters.  Try this split.py:

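import re

# \02 and \03 are octal escapes for the BEGIN (0x02) and END (0x03) signs
samplestring = 'able\02baker\03charlie\02delta'
# a character class with no group, so the markers are dropped from the result
splitter = re.compile('[\02\03]')
print(splitter.split(samplestring))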

D:\py21>python split.py
['able', 'baker', 'charlie', 'delta']

This looks like what you want, right?
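
If only the text between a BEGIN sign and the next END sign counts as
a message (an assumption on my part; the split above also returns
whatever lies outside the markers), a findall along these lines may be
closer.  The sample strings are made up:

```python
import re

# Non-raw string, so the pattern holds the real 0x02 and 0x03 characters.
# The group captures the text between a BEGIN sign and the next END sign.
framed = re.compile('\x02([^\x02\x03]*)\x03')

messages = framed.findall('able\x02baker\x03charlie\x02delta')
two_messages = framed.findall('\x02one\x03\x02two\x03')

print(messages)
print(two_messages)
```

Here 'able', 'charlie' and the unterminated 'delta' are skipped, since
none of them sits between a BEGIN and an END sign.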


Alex





