[ python-Bugs-857676 ] RE engine internal error with LARGE RE: scalability bug
SourceForge.net
noreply at sourceforge.net
Wed Dec 10 11:21:55 EST 2003
Bugs item #857676, was opened at 2003-12-10 17:21
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=857676&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Francisco Dellatorre Borges (fdborges)
Assigned to: Fredrik Lundh (effbot)
Summary: RE engine internal error with LARGE RE: scalability bug
Initial Comment:
I have lots of lines with the format:
(\d+@\w[^|]+\|)+
I'll call the last bit, after the @, the /features/. I
need to delete some of these, so I have code that
produces a list of the features I will /not/ use,
passes each one through re.escape, builds a re from the
concatenation of the list, and deletes every match from
the text before actually doing any parsing.
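A minimal sketch of that delete-by-alternation step, assuming each line is a run of entries shaped like digits, "@", a feature, and a closing "|". The helper name build_delete_pattern and the sample data are hypothetical, not taken from the attached code:

```python
import re

def build_delete_pattern(unwanted):
    # Escape every feature so it is matched literally, then join
    # the escaped features into a single alternation.
    alternatives = "|".join(re.escape(f) for f in unwanted)
    # One whole entry: digits, "@", an unwanted feature, closing "|".
    return re.compile(r"\d+@(?:%s)\|" % alternatives)

# Hypothetical sample line with three entries.
line = "12@foo|7@bar|3@baz|"
pattern = build_delete_pattern(["bar"])
cleaned = pattern.sub("", line)
print(cleaned)
```

Requiring the closing "|" after the feature keeps a short feature from matching only the front of a longer one.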
Problem is: I have about 220.000 different features and
I need to delete some 200.000 different ones from my
files before doing something.
So I tried to use a list of the 20.000 I want and then
delete anything that matches the <not> of it:
#---------------
# ftrlist is the stuff I *want* to keep:
ftrlist = [re.escape(i) for i in ftrlist ]
re.compile(r'(?!(%s))' %( '|'.join(ftrlist)) )
#--------------
but when I apply it I get something like:
RuntimeError: internal error in regular expression engine
I tried the same thing with a smaller number of
elements, say the first 1000 (ftrlist[:1000]), and then it worked.
So I guess there is a scalability bug in the re
engine when matching a large number of alternatives.
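A rough sketch of how the size dependence could be probed without the real data. The feature names are synthetic stand-ins and try_negative_alternation is a hypothetical helper. Whether the large case fails will depend on the interpreter version:

```python
import re

def try_negative_alternation(features, sample):
    """Return None if the pattern compiles and runs, else the error raised."""
    escaped = [re.escape(f) for f in features]
    try:
        # Same construction as in the report: a negative lookahead
        # wrapped around one big alternation.
        pattern = re.compile(r'(?!(%s))' % '|'.join(escaped))
        pattern.search(sample)
    except (RuntimeError, re.error, OverflowError) as exc:
        return exc
    return None

# Synthetic feature names (hypothetical data, not the real feature list).
features = ["feat%06d" % i for i in range(20000)]
for size in (1000, 20000):
    result = try_negative_alternation(features[:size], "12@feat000001|")
    print(size, "ok" if result is None else repr(result))
```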
Attached I'm sending a tar ball that reproduces this.
I'm gzipping it (hope sourceforge does not have a
problem with the resulting binary file).
----------------------------------------------------------------------