Two dimensional regexp matching?

William Park opengeometry at NOSPAM.yahoo.ca
Sat Jul 27 13:30:22 EDT 2002


In comp.lang.python Paddy <paddy3118 at tiscali.co.uk> wrote:
> We already have the re module for regular expression matching on a string.
> 
> I am looking for pointers to references/algorithms for regular expression matching for
> files of tabular data, i.e.
> 
>     Table definition
>     ================
>     1) Samples from one point in the system appear in a column of the table.
>     2) Samples encoded as characters
>     3) All points in the system are sampled at the same time to produce successive
>        rows of the table
> 
> So a system sampled at two points successively may produce the following file:
> 
>     GH
>     DF
>     AS
>     QW
>     FF
>     SD
> 
> I want to be able to do regular expression type searches within the file. Things like
>  Where can I find point1 == (D or G) then point2 == W within three samples and where the
> next sample of point2 != the earlier sample of point1?
> 
> That was a small example; in reality there are usually hundreds of points and tens of
> thousands of samples in multi-megabyte files, but I'd first like to see if anyone else has
> considered this kind of 'two dimensional regexp matching'.
> 
> Note: I DO NOT have queries in the date on sample points. The queries will always be "Find
> the range of sample times in which 'this' occurs".
> 
> I have tried Google but without success - I don't know enough to think of a suitable
> search phrase, or (much less likely) Google doesn't have it ;-)
> 
> 
> Thanks in advance, Paddy.

You can extract the sample number (i.e. the row number) whenever a match occurs,
e.g.
    point1 = (D or G)	-> i = 1, 2
    point2 = W		-> i = 4
Then, do your math,
    i2 - i1 < 3		-> 4 - 2 = 2, so rows 2 and 4 match.

On Unix, I would do something like
    grep -n '^[DG]'	-> will give you i = 1, 2
    grep -n '^.W'	-> will give you i = 4
I leave it up to you to code this in Python.  It should be simple enough,
depending on your data structure.
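
For what it's worth, here is a minimal sketch of the same idea in Python,
assuming the whole table fits in memory as a list of row strings. The
helper names matching_rows and pairs_within are made up for illustration,
and the strict "i2 - i1 < 3" test just follows the arithmetic above; use
"<=" if "within three samples" should include a gap of exactly three.

```python
import re

def matching_rows(rows, pattern):
    """Return 1-based row numbers whose text matches pattern at the
    start of the row, much like `grep -n '^pattern'`."""
    rx = re.compile(pattern)
    return [i for i, row in enumerate(rows, start=1) if rx.match(row)]

def pairs_within(first, second, max_gap):
    """Pair each row number in `first` with every later row number in
    `second` that is fewer than max_gap rows away."""
    return [(i1, i2) for i1 in first for i2 in second
            if 0 < i2 - i1 < max_gap]

# The sample table from the question: one row per sample time,
# one column (character) per sample point.
rows = ["GH", "DF", "AS", "QW", "FF", "SD"]

i1 = matching_rows(rows, r"[DG]")   # point1 is D or G
i2 = matching_rows(rows, r".W")     # point2 is W
hits = pairs_within(i1, i2, 3)

print(i1, i2, hits)
```

The pairing step compares every i1 with every i2, which is fine for a
handful of matches but may need something smarter (both lists are already
sorted) on tens of thousands of samples. The extra condition on the next
sample of point2 would be one more test on rows[i2] inside the pairing.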

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin


