advanced regex, was: Re: scanf style parsing

Hans-Peter Jansen hpj at urpla.net
Thu Oct 4 11:25:30 EDT 2001


Skip Montanaro <skip at pobox.com> wrote in message news:<mailman.1001607930.24098.python-list at python.org>...
> Tim> It's not usually easy to learn regexps, no matter what your
>     Tim> background.  I come from C/C++ roots (Turbo C++ 3.0) and TRS-80
>     Tim> BASIC before that, and I certainly had no idea what regex's were
>     Tim> really for until I looked at Perl.
> 
> I think the best way to learn about regular expressions is to use
> incremental regular expression searching in Emacs/XEmacs.  Just bind C-s and
> C-r to isearch-forward-regexp and isearch-backward-regexp.  Then, every time
> you search you're using re's.  Initially you'll just use plain strings, but
> eventually start mixing in "." and character classes.  Before you know it
> "*" and "+" will be your buddies too.  Once you start adding "\(", "\|" and
> "\)" to your repertoire, you will attain enlightenment. ;-)
> 
> You'll generally never cook up complex regular expressions using
> incremental search because you have no convenient way to correct mistakes
> and retry, but you will use all the pieces and build up more complex stuff
> when you're programming Perl or Python.  Making the leap from Emacs's
> old-style re's to POSIX-style re's as Perl and Python use now is fairly
> straightforward.  Mostly it involves getting rid of backslashes and learning
> about {m,n}, \d, \s and other little shortcuts.  (I still almost never use
> \d.  My fingers just type [0-9] automatically.)
> 
> maybe-the-best-argument-against-vi-ly, yr's
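
To make the backslash point in the quoted text concrete, here is a minimal
sketch in Python. The sample strings and variable names are made up for
illustration. Emacs writes grouping and alternation as "\(", "\|" and "\)",
while Python's re module uses bare "(", "|" and ")".

```python
import re

# Emacs-style regexp, shown only for comparison:  \(GET\|POST\) /\([a-z]+\)
# Python-style: grouping and alternation need no backslashes.
pattern = re.compile(r"(GET|POST) /([a-z]+)")

m = pattern.search("POST /login HTTP/1.0")
method, path = m.groups()

# {m,n} bounds a repetition; \d and [0-9] agree on plain ASCII digits.
year_a = re.fullmatch(r"\d{4}", "2001")
year_b = re.fullmatch(r"[0-9]{4}", "2001")
```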

Well, yesterday I tried to parse a simple hexdump, produced by
tcpdump -xs1500 port 80. The idea was to filter the hex codes and display
any 7-bit ASCII codes, the way slightly more advanced hex monitors do.

As I'm fairly new to advanced regex constructs, would somebody enlighten
me on how to efficiently parse lines like:

                 2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
                 666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
                 7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
                 6666 6363 3333 2063 6f6c 7370 616e 3d34
                 3e3c 494d 4720 6865 6967 6874 3d31 2073
                 7263 3d22 2f69 6d61 6765 732f 636c 6561
                 7264 6f74 2e67 6966 2220 7769 6474 683d
                 3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
                 6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
                 7a65 3d2d 313e 4172 6520 796f 7520 6120

allowing for a varying number of columns. I will refrain from
showing my stupid beginnings, but I wasn't able to get that _one_
regex right, with all columns listed in matchobj.groups().
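
For reference, a minimal sketch of two ways such lines might be handled in
Python. One thing that makes the single-regex route awkward is that a
repeated group in Python's re module keeps only its last repetition, so
"(...)+" never yields every column in groups(). The names WORD, ONE_REGEX,
parse_line and printable are hypothetical, and the limit of eight columns
per line is an assumption taken from the sample above.

```python
import re

# One dump word: one or two hex byte pairs (a line may end on a single byte).
WORD = re.compile(r"\b(?:[0-9a-fA-F]{2}){1,2}\b")

def parse_line(line):
    """Return the bytes encoded by one dump line, whatever its column count."""
    return bytes.fromhex("".join(WORD.findall(line)))

def printable(data):
    """Show printable 7-bit ASCII as is and everything else as a dot."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)

# Alternative: one regex with up to eight optional groups (assumed maximum),
# so every column present shows up in groups() and missing ones are None.
ONE_REGEX = re.compile(r"^\s*" + r"(?:([0-9a-fA-F]{2,4})\s*)?" * 8 + r"$")

line = "                 2067 726f 7570 732e 2e2e 3c2f 613e 3c2f"
text = printable(parse_line(line))
print(text)

short = "                 3120 3e3c 2f"
columns = [g for g in ONE_REGEX.match(short).groups() if g is not None]
print(columns)
```

The findall() variant avoids the group bookkeeping entirely; the
optional-group variant is closer to the "one regex" asked for, at the cost
of hard-coding the maximum number of columns.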

new-in-regexing-ly, yr's
Hans-Peter

P.S.: I ended up with a "simple" C-based filter...
Please CC me


