Using Regular Expressions to Parse SQL

MRAB google at mrabarnett.plus.com
Tue Feb 5 20:22:11 EST 2008


On Feb 5, 6:18 pm, c... at aol.com wrote:
> Firstly, thanks to those who posted.
>
> I just do not understand how the non-greedy operator works.
>
> Using the following code:
>
> import re
>
> s = "qry_Lookup.desc = CSS_Rpt1.desc AND qry_Lookup.lcdu1 =
> CSS_Rpt1.lcdu"
>
> pat = "(.+=)+?(.+)"
          ^^
          | This is your problem.

The ".+" is greedy and is matching up to the last "=". Try "(.+?=)(.
+)" instead.

>
> m = re.match(pat, s)
>
> if m is None:
>     print "No Match"
> else:
>     for mat in m.groups():
>         print mat
>
> I would expect that the first group would match one or more times with
> the fewest amount of text.  However the result is:
>
> >qry_Lookup.desc = CSS_Rpt1.desc AND qry_Lookup.lcdu1 =
> > CSS_Rpt1.lcdu
>
> The first match of the "=" char is still greedy.  I would have
> expected:
> qry_Lookup.desc =
>
> > CSS_Rpt1.desc AND qry_Lookup.lcdu1
> > =
> > CSS_Rpt1.lcdu
>
> I'm obviously missing something because the non-greedy match seems to
> not be acting as expected.
>
> Any insights would be greatly appreciated.
>
> Thanks in advance,
>
> Chris (c... at aol.com)
>
> On Feb 5, 9:31 am, c... at aol.com wrote:
>
> > Hello again -
>
> > I do not seem to be able to get a handle on non-greedy pattern
> > matching.
>
> > I am trying to parse the following - note that there are no line
> > breaks in the string:
>
> > " FROM ((qry_Scores_Lookup1 INNER JOIN CSS_Rpt1 ON
> > (qry_Scores_Lookup1.desc = CSS_Rpt1.desc) AND
> > (qry_Scores_Lookup1.lastcdu = CSS_Rpt1.lastcdu))"
>
> > What I would like to do is be able to split on the "AND" and the "="
> > to come up with the following:
> > qry_Scores_Lookup1.desc
> > CSS_Rpt1.desc
> > qry_Scores_Lookup1.lastcdu
> > CSS_Rpt1.lastcdu
>
> > The following is one of my many attempts to do this:
>
> > import re
>
> > s= " FROM ((qry_Scores_Lookup1 INNER JOIN CSS_Rpt1 ON
> > (qry_Scores_Lookup1.desc = CSS_Rpt1.desc) AND
> > (qry_Scores_Lookup1.lastcdu = CSS_Rpt1.lastcdu))"
>
> > pat = " FROM .+ (?:INNER|LEFT|RIGHT) JOIN .+ ON (?:AND)*?((.+)=(.+))"
>
> > m = re.match(pat, s)
>
> > if m is None:
> >     print "No Match"
> > else:
> >     for mat in m.groups():
> >         print mat
>
> > My pattern does not even come close.
>
> > Any help would be greatly appreciated.  My goal is to analyse a large
> > number of SQL querys to try to identify the join field and see where
> > indexing might make sense.
>
> > While I am mostly interested in understanding regular expressions, I
> > would also be interested in knowing about any Python SQL parsers out
> > there.
>
> > Thanks in advance.
>
> > Chris (c... at aol.com)



More information about the Python-list mailing list