Using Regular Expressions to Parse SQL

Reedick, Andrew jr9445 at ATT.COM
Tue Feb 5 10:28:23 EST 2008


> -----Original Message-----
> From: python-list-bounces+jr9445=att.com at python.org [mailto:python-
> list-bounces+jr9445=att.com at python.org] On Behalf Of ct60 at aol.com
> Sent: Tuesday, February 05, 2008 9:31 AM
> To: python-list at python.org
> Subject: Using Regular Expressions to Parse SQL
> 
> 
> My pattern does not even come close.
> 
> Any help would be greatly appreciated.  My goal is to analyse a large
> number of SQL querys to try to identify the join field and see where
> indexing might make sense.
> 
> While I am mostly interested in understanding regular expressions, I
> would also be interested in knowing about any Python SQL parsers out
> there.


Python's regexes are a tad odd compared to Perl (or Perl's regexes are
odd.  Relativity.)  You need re.DOTALL to handle newlines in the sql:
	DOTALL
	Make the "." special character match any character at all,
including a newline; without this flag, "." will match anything except a
newline.
 


import re

s= ''' FROM ((qry_Scores_Lookup1 INNER JOIN CSS_Rpt1 ON
(qry_Scores_Lookup1.desc = CSS_Rpt1.desc) AND
(qry_Scores_Lookup1.lastcdu = CSS_Rpt1.lastcdu))'''

pat =
r"(^|\s+)FROM\s.+\s(?:INNER|LEFT|RIGHT)\s+JOIN\s.+\sON\s+((\s*(?:\S+)\s*
=\s*(?:\S+))(?:\s+|\s+AND\s+|$))+"
m = re.compile(pat, re.DOTALL).match(s)
if m:
	print m


('(qry_Scores_Lookup1.desc = CSS_Rpt1.desc) ', '(qry_Scores_Lookup1.desc
= CSS_Rpt1.desc)')


You should be able to tweak the regex to get the exact matches you want.

An alternative to writing a lengthy regex might be to just grab
everything after the ON and then string split on "AND".



*****

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential, proprietary, and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from all computers. GA623





More information about the Python-list mailing list