Improving my text processing script

pruebauno at latinmail.com pruebauno at latinmail.com
Thu Sep 1 16:21:24 EDT 2005


Paul McGuire wrote:
> match...), this program has quite a few holes.
>
> What if the word "Identifier" is inside one of the quoted strings?
> What if the actual value is "tablename10"?  This will match your
> "tablename1" string search, but it is certainly not what you want.
> Did you know there are trailing blanks on your table names, which could
> prevent any program name from matching?

Good point. I did not think about that. I got lucky: none of the table
names have trailing blanks (Google Groups seems to add those), the word
"Identifier" does not appear inside any quoted string, and I do not
have a tablename10. I do have "dba.tablename1", though, and that one
has to match tablename1 (and magically did).
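
The matching rules discussed above can be sketched roughly as follows.
This is only an illustration, not the script from this thread:
table_matches is a hypothetical helper, and it assumes that names are
compared case-sensitively and that a schema prefix such as "dba." is
separated from the table name by a dot.

```python
def table_matches(value, table):
    """Return True if the quoted `value` names `table` exactly.

    Surrounding blanks are ignored, and only the last dotted component of
    `value` is compared, so a schema-qualified name still matches.
    """
    return value.strip().split(".")[-1] == table.strip()


# A plain substring test is what lets "tablename10" slip through.
substring_hit = "tablename1" in "tablename10"
# Exact comparison rejects it, even though the prefix is the same.
exact_hit = table_matches("tablename10", "tablename1")
# The schema prefix and a trailing blank on the table name are both tolerated.
qualified_hit = table_matches("dba.tablename1", "tablename1 ")
```

Comparing whole names instead of searching for substrings means the
"dba.tablename1" case no longer depends on luck, and stripping both
sides takes care of stray trailing blanks.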

>
> So here is an alternative approach using, as many have probably
> predicted by now if they've spent any time on this list, the pyparsing
> module.  You may ask, "isn't a parser overkill for this problem?" and

You had to plug pyparsing! :-) Thanks for the info; I did not know
something like pyparsing existed. Thanks for the code too, because from
looking at the module it was not obvious to me how to use it. I tried
to run it, though, and it is not working for me. The following code
runs but prints nothing at all:

import pyparsing as prs

f=file('tlst'); tlst=[ln.strip() for ln in f if ln]; f.close()
f=file('plst'); plst=f.read()                      ; f.close()

prs.quotedString.setParseAction(prs.removeQuotes)

identLine=(prs.LineStart()
          + 'Identifier'
          + prs.quotedString
          + prs.LineEnd()
          ).setResultsName('prog')

tableLine=(prs.LineStart()
          + 'Value'
          + prs.quotedString
          + prs.LineEnd()
          ).setResultsName('table')

interestingLines=(identLine | tableLine)

for toks,start,end in interestingLines.scanString(plst):
    print toks,start,end
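
As a cross-check that does not depend on pyparsing, the same two line
shapes could be pulled out with the standard re module. This is a
minimal sketch that swaps in a different technique, not a fix for the
code above. It assumes each interesting line is the keyword followed by
a single double-quoted string with no embedded quotes; LINE_RE,
scan_lines and the sample text are made up for illustration, since the
real contents of 'plst' are not shown here.

```python
import re

# Assumed line shape: optional indent, keyword, one double-quoted string,
# optional trailing blanks, end of line.
LINE_RE = re.compile(
    r'^[ \t]*(Identifier|Value)[ \t]+"([^"\n]*)"[ \t]*$',
    re.MULTILINE,
)


def scan_lines(text):
    """Return (keyword, unquoted string) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in LINE_RE.finditer(text)]


# Hypothetical sample input, not the actual 'plst' file.
sample = (
    'Identifier "prog1"\n'
    'Other "ignored"\n'
    '  Value "dba.tablename1"  \n'
    'Value "has Identifier inside"\n'
)
pairs = scan_lines(sample)
```

Because the keyword is anchored to the start of the line, the word
"Identifier" inside a quoted string is not mistaken for a new program
entry. If this finds lines in the real file while the pyparsing version
finds none, the grammar is the likely culprit rather than the data.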



