[Python-bugs-list] [Bug #116136] sre recursion limit hit in tokenize.py
noreply@sourceforge.net
noreply@sourceforge.net
Thu, 5 Oct 2000 22:40:38 -0700
Bug #116136, was updated on 2000-Oct-05 07:21
Here is a current snapshot of the bug.
Project: Python
Category: Library
Status: Open
Resolution: None
Bug Group: Platform-specific
Priority: 6
Summary: sre recursion limit hit in tokenize.py
Details: While running Tim Peters' new reindent script over my code,
I received the following traceback:
checking ./dtcout.py ...
Traceback (most recent call last):
File "/usr/local/bin/reindent", line 258, in ?
main()
File "/usr/local/bin/reindent", line 65, in main
check(arg)
File "/usr/local/bin/reindent", line 77, in check
check(fullname)
File "/usr/local/bin/reindent", line 90, in check
if r.run():
File "/usr/local/bin/reindent", line 135, in run
tokenize.tokenize(self.getline, self.tokeneater)
File "/usr/local/lib/python2.0/tokenize.py", line 152, in tokenize
pseudomatch = pseudoprog.match(line, pos)
RuntimeError: maximum recursion limit exceeded
The pattern compiled into pseudoprog is
[ \f\t]*((\\\r?\n|#[^\r\n]*|([rR]?'''|[rR]?"""))|((0[jJ]|[1-9]\d*[jJ]|((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|[1-9]\d*[eE][-+]?\d+)[jJ])|((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|[1-9]\d*[eE][-+]?\d+)|(0[xX][\da-fA-F]*[lL]?|0[0-7]*[lL]?|[1-9]\d*[lL]?))|((\+=|\-=|\*=|%=|/=|\*\*=|&=|\|=|\^=|>>=|<<=|\+|\-|\*\*|\*|\^|~|/|%|&|\||<<|>>|==|<=|<>|!=|>=|=|<|>)|[][(){}]|(\r?\n|[:;.,`]))|([rR]?'(\\.|[^\n'\\])*('|\\\r?\n)|[rR]?"(\\.|[^\n"\\])*("|\\\r?\n))|[a-zA-Z_]\w*)
I am not even going to try to figure out what part of the re triggers the excessive recursion. Here is the source file reindent was munching on when the exception was raised. Hopefully the bug submission process won't destroy it. If so, let me know (skip@mojam.com) and I'll send it to you.
def dtmlfunc(**__d):
import sys, urllib, string
output = []
append = output.append
append('<!-- Generated: Wed Mar 10 12:13:08 1999 -->')
append('\012<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <!-- -*-html-helper -*- -->\012<HTML>\012<HEAD>\012<TITLE>Musi-Cal Search: ')
append("""%s""" % __d['search_key'])
append('</TITLE>\012<LINK REV="made" HREF="mailto:concertmaster@musi-cal.com">\012</HEAD>\012<BODY bgcolor="#ffffff" text="#000000" link="#CC0000" alink="#FF3300" vlink="#330099"\012>\012\012')
append("""%s""" % __d['banner'])
append('\012\012<H2 align="center">Musi-Cal Search Response</H2>\012\012')
if __d['repeat_msg']:
append('\012 ')
append("""%s""" % __d['repeat_msg'])
append('\012 <hr>\012')
append('\012\012')
if __d['entries']:
append('\012\012 ')
if __d['indirect']:
append('\012 <P>\012 You can invoke this search using the following URL.\012 Why not save it for future reference?\012 <p align=center><code>http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/event-search?')
append("""%s""" % __d['query_string'])
append('</code></p>\012 ')
append('\012 \012 <p>\012 Click <img src="http://www.musi-cal.com/icons/vdb10.gif" alt="[VDB]"> to look up venue information.<br>\012 ')
if __d['next']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['next'])
append('">[Next Page]</a>\012 </font>\012 ')
append('\012 ')
if __d['prev']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['prev'])
append('">[Previous Page]</a><br>\012 </font>\012 ')
else:
append('\012 ')
if __d['next']:
append('<br>')
append('\012 ')
append('\012\012 <TABLE border=0 cellspacing="5" cellpadding="0" width="100%">\012\012 ')
__i_entries = -1
for __elt_entries in __d['entries']:
__i_entries = __i_entries + 1
append('\012\012\011<tr valign=middle>\012\012\011')
if __d['editor']:
append('\012\011 <TD><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/edit-entry?editor=')
try: append(urllib.quote(__d['editor']))
except: append(__d['editor'].urllib.quote())
append('&key=')
append("""%s""" % __elt_entries['key'])
append('">[Edit]</A><BR><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/edit-entry?editor=')
try: append(urllib.quote(__d['editor']))
except: append(__d['editor'].urllib.quote())
append('&key=')
append("""%s""" % __elt_entries['key'])
append('&zap=1">[Delete]</A>\012\011 <br><font size="-1">Submitter: ')
try: append(get_email(__elt_entries))
except: append(__elt_entries.get_email())
append('</font>\012\011')
else:
append('\012\011 ')
if __d['show_vcal']:
append('<TD><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/gencal?type=vCalendar&key=')
append("""%s""" % __elt_entries['key'])
append('"><IMG SRC="/images/vcal.gif" ALIGN="top" ALT="vCal" width="16" height="16" border="0"></A>')
append('\012\011')
append('\012\012 ')
try: append(html(__elt_entries))
except: append(__elt_entries.html())
append('\012\012 ')
if (__i_entries == 0):
append('\012\011')
if __d['display_static']:
append('\012\011<td width="25%" valign=top rowspan=')
append("""%s""" % len(__d['entries']))
append('>\012\011 ')
if __d['static_info']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><font size="+1"><a name="staticinfo">Notes Index</a></font></th></tr>\012\011\011 </table>\012\012\011\011')
if __d['stperformers']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stperf">Performers</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stperformers'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stcontributors']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th>Contributors</th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stcontributors'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stvenues']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stven">Venues</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stvenues'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stevents']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stev">Events</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stevents'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stcities']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stcity">Cities</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stcities'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011<font size="-1">You can <strong><a\012\011\011href="http://www.musi-cal.com/addstatic.shtml">add</a></strong> or\012\011\011<strong><a href="http://www.musi-cal.com/editstatic.shtml">edit</a></strong>\012\011\011Notes Index items too!</font>\012\012\011 ')
append('\012\011 <p>\012\011 <a href="http://www.musi-cal.com/cgi-bin/redirect/http%3a//www.amazon.com/exec/obidos/redirect-home/musical">\012\011 <img src="http://www.musi-cal.com/icons/amzn-assoc.gif" border="0" alt="In Association with Amazon.com"></a> \012\011</td>\012\011')
append('\012 ')
append('\012\012 ')
append('\012\012 </TABLE>\012\012 ')
if __d['next']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['next'])
append('">[Next Page]</a>\012 </font>\012 ')
append('\012 ')
if __d['prev']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['prev'])
append('">[Previous Page]</a><br>\012 </font>\012 ')
else:
append('\012 ')
if __d['next']:
append('<br>')
append('\012 ')
append('\012\012 <P>Check with venues before making plans to attend listed events.\012\012 <P>If you notice any errors, <A\012 href="mailto:concertmaster@musi-cal.com">please let us know</A> so we can\012 correct them.\012\012')
else:
append('\012\012 ')
if __d['searchbyperformer']:
append('\012 <p>We were not able to find any matches to your search. It\'s possible\012 the artist you were searching for is not on tour or that we don\'t\012 currently have their dates in our database. You might also try <a\012 href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/list-perf">browsing by performer</a>\012 in case you\'ve misspelled their name.\012 If you know of a possible source for their concert schedule,\012 <a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\012 ')
if __d['stperformers']:
append('\012\011<table width="100%" cellpadding=2>\012\011<tr><th><a name="stperf">Performers</a></th></tr>\012\011<tr><td textcolor="black" bgcolor="white">\012\011 <font size="-1">\012\011 ')
append("""%s""" % __d['stperformers'])
append('\012\011 </font>\012\011 </td></tr>\012\011</table>\012 ')
append('\012\012 ')
else:
append('\012 ')
if __d['searchbyvenue']:
append('\012\011<p>We were not able to find any matches to your search. Venue\012\011names can often spelled several different ways. You might try\012\011searching for just a single word in the venue\'s name, such as\012\011<strong>Passim</strong> instead of <strong>Club Passim</strong>.\012\011Also, some venues, such as church-sponsored coffeehouses, may\012\011be listed either under the name of the coffeehouse or the name\012\011of the hosting venue.\012\011We may also not have their schedule in our database.\012\011If you know of a possible source for their concert schedule,\012\011<a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012 ')
else:
append('\012\011')
if __d['searchbyevent']:
append('\012\011 <p>We were not able to find any matches to your search.\012\011 If you know of a possible source for event schedules,\012\011 <a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011')
else:
append('\012\011 ')
if __d['searchbylocation']:
append('\012 <p>We were not able to find any matches to your search.\012 Please check that the city, state and country names are\012 spelled or abbreviated properly. You might also try <a\012 href="http://')
append("""%s""" % __d['servername'])
append('/index-geog.shtml">browsing by location</a>\012 in case you misspelled the name. If you know of a\012 possible source of concert schedules for this region, <a\012 href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011 ')
else:
append('\012 <p>We were not able to find any matches to your search.\012 If you know of a possible source of concert schedules\012 that would fill a hole in our database, <a\012 href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011 ')
append('\012\011')
append('\012 ')
append('\012 ')
append('\012\012')
append('\012\012<P ALIGN="center"><CENTER>\012<p>\012<FORM METHOD="GET" ACTION="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/simple-search">\012<input type=radio name="key" value="performer">performer\012<input type=radio checked name="key" value="city">city\012<input type=radio name="key" value="venue">venue\012<input type=radio name="key" value="event">event<br>\012Search for: <INPUT TYPE="text" NAME="value">\012<input type="submit" value="OK">\012<br>')
append("""%s""" % __d['survey'])
append('\012</FORM></CENTER>\012\012\012<p align=center>\012<strong><font size="-1"><a href="/index-contribute.shtml"\012onMouseOver="window.status=\'Add events or Notes Index entries to the database\'; return true">Contribute</a> | <a href="/index-search.shtml"\012onMouseOver="window.status=\'Search the database in several ways\'; return true">Search</a> | <a href="/index-sponsor.shtml"\012onMouseOver="window.status=\'Advertise in Musi-Cal!\'; return true">Advertise</a> | <a href="/index-contents.shtml"\012onMouseOver="window.status=\'Let your mousey to the walking...\';return true">Contents</a> | <a href="/index-learn.shtml"\012onMouseOver="window.status=\'Stuck? Get help here...\'; return true">Help</a> | <a href="/Faq.shtml"\012onMouseOver="window.status=\'You got questions? We got answers...\'; return true">FAQ</a></font></strong>\012\012\012<p align=center>\012<A HREF="http://www.musi-cal.com/"><IMG border=0 WIDTH=23 HEIGHT=33 SRC="http://www.musi-cal.com/icons/not!
e.gif" ALT="[Musi-Cal Home Page]"></A>\012\012<font size="-1"><a href="mailto:concertmaster@musi-cal.com">Contact Us!</a><br><em>Copyright © 1994-1999, Automatrix, Inc.</em>\012</font>\012\012\012\012</BODY>\012</HTML>\012')
return string.join(output, '')
Follow-Ups:
Date: 2000-Oct-05 07:54
By: gvanrossum
Comment:
After 2.0 final is released.
-------------------------------------------------------
Date: 2000-Oct-05 15:26
By: tim_one
Comment:
Changed Category to Library, Summary to get "reindent" out of it, Group to Platform-specific, assigned to /F, and raised the Priority.
Skip, tell us which platform you're using. It doesn't fail on Windows. Also confirm that this is purely a problem with tokenize.py (e.g., does your test program also fail when run thru tabnanny.py, and/or checkappend.py?).
Since it doesn't fail for me, I can't whittle it down. Strongly *suspect* it's due to your long string literals. tokenize.py uses naive regexps for matching string literals, with bad performance characteristics. If I were on a platform that failed, I'd
a) First whittle this down to a one-liner (pick on the line with the longest string literal).
b) Replace tokenize.py's string-literal regexps with the ones IDLE uses (which worked fine even on multimegabyte string literals, & much faster, in pre).
That wouldn't solve the underlying recursion limit problem, but would make it much harder to provoke.
-------------------------------------------------------
Date: 2000-Oct-05 18:45
By: montanaro
Comment:
Platform: Mandrake Linux 7.1
Python: latest CVS sources
One liner that fails is at
http://www.musi-cal.com/~skip/srelimit.py
That fails when run through tabnanny or reindent
I can't see how to replace the tokenize string re's with IDLE's. reindent and tabnanny both barf on
pseudomatch = pseudoprog.match(line, pos)
As far as I can tell, PseudoToken (which is the re compiled into pseudoprog) doesn't appear to have much to do with matching strings other than recognizing the start of a triple-quoted string, which the problematic source file doesn't contain.
-------------------------------------------------------
Date: 2000-Oct-05 22:40
By: tim_one
Comment:
Thanks, Skip! That confirms everything I suspected.
As to the connection to strings, pseudoprog *does* try to match "the first line" of single-quoted strings (via ContStr), which in your case is your entire 1200+ character single-quoted string literal. That's where, e.g., the
([rR]?'(\\.|[^\n'\\])*('|\\\r?\n)
alternative in your pseudoprog dump comes from, and that's a terrible (for pragmatic reasons) regexp for matching long strings. The IDLE-like alternative would be (re.VERBOSE)
(
[rR]?
'
[^\n'\\]*
(?: \\. [^\n'\\]*)*
(?: ' | \\ \r? \n)
)
This is just a vanilla instance of Friedl's
normal* (special normal*)*
"regexp unrolling", and saves your bacon for reasons spelled out in detail in his book.
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=116136&group_id=5470