Regexes: How to handle escaped characters
Paul McGuire
ptmcg at austin.rr.com
Thu May 17 19:46:17 EDT 2007
On May 17, 6:12 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> Note: "must not be *part of* any match" [my emphasis]
>
Oops, my bad. See this version:
from pyparsing import Regex,ParseException,col,lineno,getTokensEndLoc

# fake (and inefficient) version of any if not yet upgraded to Py2.5
any = lambda lst : sum(list(lst)) > 0

def guardedSearch(pattern, text, forbidden_offsets):
    def offsetValidator(strng,locn,tokens):
        start,end = locn,getTokensEndLoc()-1
        if any( start <= i <= end for i in forbidden_offsets ):
            raise ParseException, "can't match at offset %d" % locn
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [ (tokStart,toks[0]) for toks,tokStart,tokEnd in
                regex.scanString(text) ]

print guardedSearch(ur"o\S", u"Hollo how are you", [8,])

def guardedSearchByColumn(pattern, text, forbidden_columns):
    def offsetValidator(strng,locn,tokens):
        start,end = col(locn,strng), col(getTokensEndLoc(),strng)-1
        if any( start <= i <= end for i in forbidden_columns ):
            raise ParseException, "can't match at col %d" % start
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [ (lineno(tokStart,text),col(tokStart,text),toks[0])
                for toks,tokStart,tokEnd in regex.scanString(text) ]

text = """\
alksjdflasjf;sa
a;sljflsjlaj
;asjflasfja;sf
aslfj;asfj;dsf
aslf;lajdf;ajsf
aslfj;afsj;sd
"""
print guardedSearchByColumn("[fa];", text, [4,12,13,])
Prints:
[(1, 'ol'), (15, 'ou')]
[(2, 1, 'a;'), (5, 10, 'f;')]
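
For readers without pyparsing, the same "reject any match that covers a forbidden offset" idea can be sketched with the standard re module alone. This is a rough Python 3 equivalent under stated assumptions, not the pyparsing code above: the name guarded_search is made up for illustration, and the retry-one-character-later step is my approximation of how a scan moves on after a rejected match.

```python
import re

def guarded_search(pattern, text, forbidden_offsets):
    """Return (start, matched_text) pairs for matches covering no forbidden offset.

    Hypothetical helper for illustration; offsets are 0-based.
    """
    forbidden = set(forbidden_offsets)
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        start, end = m.span()  # end is exclusive
        if any(i in forbidden for i in range(start, end)):
            # Rejected: retry one character further on, so a later
            # candidate that overlaps this one still gets a chance.
            pos = start + 1
        else:
            results.append((start, m.group()))
            # Continue after the match; step by one on a zero-length match.
            pos = end if end > start else end + 1
    return results

print(guarded_search(r"o\S", "Hollo how are you", [8]))
# prints [(1, 'ol'), (15, 'ou')]
```

The "ow" at offsets 7-8 is dropped because offset 8 is forbidden, which matches the first line of the output above.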
>
> While we're waiting for clarification from the OP, there's a chicken-
> and-egg thought that's been nagging me: if the OP knows so much about
> the searched string that he can specify offsets which search patterns
> should not span, why does he still need to search it?
>
I suspect that this is column/tabular data (a log file perhaps?), and
some columns are not interesting, but produce many false hits for the
search pattern.
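
If the data really is column-oriented like that, a per-line variant may be enough. The following is only a sketch in Python 3 with the standard re module; guarded_search_by_column is a hypothetical name, columns are counted from 1 as pyparsing's col() does, and unlike the scanString version it assumes no match ever spans a line break.

```python
import re

def guarded_search_by_column(pattern, text, forbidden_columns):
    """Return (line_no, column, matched_text) for matches touching no forbidden column.

    Hypothetical helper for illustration; line numbers and columns are 1-based.
    """
    forbidden = set(forbidden_columns)
    regex = re.compile(pattern)
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        pos = 0
        while pos <= len(line):
            m = regex.search(line, pos)
            if m is None:
                break
            # 0-based [start, end) becomes 1-based columns start+1 .. end.
            first_col, last_col = m.start() + 1, m.end()
            if any(first_col <= c <= last_col for c in forbidden):
                pos = m.start() + 1  # rejected: retry one character later
            else:
                hits.append((line_no, first_col, m.group()))
                pos = m.end() if m.end() > m.start() else m.end() + 1
    return hits

sample = """\
alksjdflasjf;sa
a;sljflsjlaj
;asjflasfja;sf
aslfj;asfj;dsf
aslf;lajdf;ajsf
aslfj;afsj;sd
"""
print(guarded_search_by_column("[fa];", sample, [4, 12, 13]))
# prints [(2, 1, 'a;'), (5, 10, 'f;')]
```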
-- Paul