[Python-bugs-list] [Bug #116136] sre recursion limit hit in tokenize.py
noreply@sourceforge.net
noreply@sourceforge.net
Thu, 5 Oct 2000 15:26:41 -0700
Bug #116136, was updated on 2000-Oct-05 07:21
Here is a current snapshot of the bug.
Project: Python
Category: Library
Status: Open
Resolution: None
Bug Group: Platform-specific
Priority: 6
Summary: sre recursion limit hit in tokenize.py
Details: While running Tim Peters' new reindent script over my code,
I received the following traceback:
checking ./dtcout.py ...
Traceback (most recent call last):
File "/usr/local/bin/reindent", line 258, in ?
main()
File "/usr/local/bin/reindent", line 65, in main
check(arg)
File "/usr/local/bin/reindent", line 77, in check
check(fullname)
File "/usr/local/bin/reindent", line 90, in check
if r.run():
File "/usr/local/bin/reindent", line 135, in run
tokenize.tokenize(self.getline, self.tokeneater)
File "/usr/local/lib/python2.0/tokenize.py", line 152, in tokenize
pseudomatch = pseudoprog.match(line, pos)
RuntimeError: maximum recursion limit exceeded
The pattern compiled into pseudoprog is
[ \f\t]*((\\\r?\n|#[^\r\n]*|([rR]?'''|[rR]?"""))|((0[jJ]|[1-9]\d*[jJ]|((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|[1-9]\d*[eE][-+]?\d+)[jJ])|((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|[1-9]\d*[eE][-+]?\d+)|(0[xX][\da-fA-F]*[lL]?|0[0-7]*[lL]?|[1-9]\d*[lL]?))|((\+=|\-=|\*=|%=|/=|\*\*=|&=|\|=|\^=|>>=|<<=|\+|\-|\*\*|\*|\^|~|/|%|&|\||<<|>>|==|<=|<>|!=|>=|=|<|>)|[][(){}]|(\r?\n|[:;.,`]))|([rR]?'(\\.|[^\n'\\])*('|\\\r?\n)|[rR]?"(\\.|[^\n"\\])*("|\\\r?\n))|[a-zA-Z_]\w*)
I am not even going to try to figure out what part of the re triggers the excessive recursion. Here is the source file reindent was munching on when the exception was raised. Hopefully the bug submission process won't destroy it. If so, let me know (skip@mojam.com) and I'll send it to you.
def dtmlfunc(**__d):
import sys, urllib, string
output = []
append = output.append
append('<!-- Generated: Wed Mar 10 12:13:08 1999 -->')
append('\012<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <!-- -*-html-helper -*- -->\012<HTML>\012<HEAD>\012<TITLE>Musi-Cal Search: ')
append("""%s""" % __d['search_key'])
append('</TITLE>\012<LINK REV="made" HREF="mailto:concertmaster@musi-cal.com">\012</HEAD>\012<BODY bgcolor="#ffffff" text="#000000" link="#CC0000" alink="#FF3300" vlink="#330099"\012>\012\012')
append("""%s""" % __d['banner'])
append('\012\012<H2 align="center">Musi-Cal Search Response</H2>\012\012')
if __d['repeat_msg']:
append('\012 ')
append("""%s""" % __d['repeat_msg'])
append('\012 <hr>\012')
append('\012\012')
if __d['entries']:
append('\012\012 ')
if __d['indirect']:
append('\012 <P>\012 You can invoke this search using the following URL.\012 Why not save it for future reference?\012 <p align=center><code>http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/event-search?')
append("""%s""" % __d['query_string'])
append('</code></p>\012 ')
append('\012 \012 <p>\012 Click <img src="http://www.musi-cal.com/icons/vdb10.gif" alt="[VDB]"> to look up venue information.<br>\012 ')
if __d['next']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['next'])
append('">[Next Page]</a>\012 </font>\012 ')
append('\012 ')
if __d['prev']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['prev'])
append('">[Previous Page]</a><br>\012 </font>\012 ')
else:
append('\012 ')
if __d['next']:
append('<br>')
append('\012 ')
append('\012\012 <TABLE border=0 cellspacing="5" cellpadding="0" width="100%">\012\012 ')
__i_entries = -1
for __elt_entries in __d['entries']:
__i_entries = __i_entries + 1
append('\012\012\011<tr valign=middle>\012\012\011')
if __d['editor']:
append('\012\011 <TD><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/edit-entry?editor=')
try: append(urllib.quote(__d['editor']))
except: append(__d['editor'].urllib.quote())
append('&key=')
append("""%s""" % __elt_entries['key'])
append('">[Edit]</A><BR><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/edit-entry?editor=')
try: append(urllib.quote(__d['editor']))
except: append(__d['editor'].urllib.quote())
append('&key=')
append("""%s""" % __elt_entries['key'])
append('&zap=1">[Delete]</A>\012\011 <br><font size="-1">Submitter: ')
try: append(get_email(__elt_entries))
except: append(__elt_entries.get_email())
append('</font>\012\011')
else:
append('\012\011 ')
if __d['show_vcal']:
append('<TD><A href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/gencal?type=vCalendar&key=')
append("""%s""" % __elt_entries['key'])
append('"><IMG SRC="/images/vcal.gif" ALIGN="top" ALT="vCal" width="16" height="16" border="0"></A>')
append('\012\011')
append('\012\012 ')
try: append(html(__elt_entries))
except: append(__elt_entries.html())
append('\012\012 ')
if (__i_entries == 0):
append('\012\011')
if __d['display_static']:
append('\012\011<td width="25%" valign=top rowspan=')
append("""%s""" % len(__d['entries']))
append('>\012\011 ')
if __d['static_info']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><font size="+1"><a name="staticinfo">Notes Index</a></font></th></tr>\012\011\011 </table>\012\012\011\011')
if __d['stperformers']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stperf">Performers</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stperformers'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stcontributors']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th>Contributors</th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stcontributors'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stvenues']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stven">Venues</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stvenues'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stevents']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stev">Events</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stevents'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011')
if __d['stcities']:
append('\012\011\011 <table width="100%" cellpadding=2>\012\011\011 <tr><th><a name="stcity">Cities</a></th></tr>\012\011\011 <tr><td textcolor="black" bgcolor="white">\012\011\011 <font size="-1">\012\011\011 ')
append("""%s""" % __d['stcities'])
append('\012\011\011 </font>\012\011\011 </td></tr>\012\011\011 </table>\012\011\011')
append('\012\012\011\011<font size="-1">You can <strong><a\012\011\011href="http://www.musi-cal.com/addstatic.shtml">add</a></strong> or\012\011\011<strong><a href="http://www.musi-cal.com/editstatic.shtml">edit</a></strong>\012\011\011Notes Index items too!</font>\012\012\011 ')
append('\012\011 <p>\012\011 <a href="http://www.musi-cal.com/cgi-bin/redirect/http%3a//www.amazon.com/exec/obidos/redirect-home/musical">\012\011 <img src="http://www.musi-cal.com/icons/amzn-assoc.gif" border="0" alt="In Association with Amazon.com"></a> \012\011</td>\012\011')
append('\012 ')
append('\012\012 ')
append('\012\012 </TABLE>\012\012 ')
if __d['next']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['next'])
append('">[Next Page]</a>\012 </font>\012 ')
append('\012 ')
if __d['prev']:
append('\012 <font size="-1">\012 <a href="')
append("""%s""" % __d['prev'])
append('">[Previous Page]</a><br>\012 </font>\012 ')
else:
append('\012 ')
if __d['next']:
append('<br>')
append('\012 ')
append('\012\012 <P>Check with venues before making plans to attend listed events.\012\012 <P>If you notice any errors, <A\012 href="mailto:concertmaster@musi-cal.com">please let us know</A> so we can\012 correct them.\012\012')
else:
append('\012\012 ')
if __d['searchbyperformer']:
append('\012 <p>We were not able to find any matches to your search. It\'s possible\012 the artist you were searching for is not on tour or that we don\'t\012 currently have their dates in our database. You might also try <a\012 href="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/list-perf">browsing by performer</a>\012 in case you\'ve misspelled their name.\012 If you know of a possible source for their concert schedule,\012 <a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\012 ')
if __d['stperformers']:
append('\012\011<table width="100%" cellpadding=2>\012\011<tr><th><a name="stperf">Performers</a></th></tr>\012\011<tr><td textcolor="black" bgcolor="white">\012\011 <font size="-1">\012\011 ')
append("""%s""" % __d['stperformers'])
append('\012\011 </font>\012\011 </td></tr>\012\011</table>\012 ')
append('\012\012 ')
else:
append('\012 ')
if __d['searchbyvenue']:
append('\012\011<p>We were not able to find any matches to your search. Venue\012\011names can often spelled several different ways. You might try\012\011searching for just a single word in the venue\'s name, such as\012\011<strong>Passim</strong> instead of <strong>Club Passim</strong>.\012\011Also, some venues, such as church-sponsored coffeehouses, may\012\011be listed either under the name of the coffeehouse or the name\012\011of the hosting venue.\012\011We may also not have their schedule in our database.\012\011If you know of a possible source for their concert schedule,\012\011<a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012 ')
else:
append('\012\011')
if __d['searchbyevent']:
append('\012\011 <p>We were not able to find any matches to your search.\012\011 If you know of a possible source for event schedules,\012\011 <a href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011')
else:
append('\012\011 ')
if __d['searchbylocation']:
append('\012 <p>We were not able to find any matches to your search.\012 Please check that the city, state and country names are\012 spelled or abbreviated properly. You might also try <a\012 href="http://')
append("""%s""" % __d['servername'])
append('/index-geog.shtml">browsing by location</a>\012 in case you misspelled the name. If you know of a\012 possible source of concert schedules for this region, <a\012 href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011 ')
else:
append('\012 <p>We were not able to find any matches to your search.\012 If you know of a possible source of concert schedules\012 that would fill a hole in our database, <a\012 href="mailto:concertmaster@musi-cal.com">let us know</a>.\012\011 ')
append('\012\011')
append('\012 ')
append('\012 ')
append('\012\012')
append('\012\012<P ALIGN="center"><CENTER>\012<p>\012<FORM METHOD="GET" ACTION="http://')
append("""%s""" % __d['servername'])
append('/cgi-bin/simple-search">\012<input type=radio name="key" value="performer">performer\012<input type=radio checked name="key" value="city">city\012<input type=radio name="key" value="venue">venue\012<input type=radio name="key" value="event">event<br>\012Search for: <INPUT TYPE="text" NAME="value">\012<input type="submit" value="OK">\012<br>')
append("""%s""" % __d['survey'])
append('\012</FORM></CENTER>\012\012\012<p align=center>\012<strong><font size="-1"><a href="/index-contribute.shtml"\012onMouseOver="window.status=\'Add events or Notes Index entries to the database\'; return true">Contribute</a> | <a href="/index-search.shtml"\012onMouseOver="window.status=\'Search the database in several ways\'; return true">Search</a> | <a href="/index-sponsor.shtml"\012onMouseOver="window.status=\'Advertise in Musi-Cal!\'; return true">Advertise</a> | <a href="/index-contents.shtml"\012onMouseOver="window.status=\'Let your mousey to the walking...\';return true">Contents</a> | <a href="/index-learn.shtml"\012onMouseOver="window.status=\'Stuck? Get help here...\'; return true">Help</a> | <a href="/Faq.shtml"\012onMouseOver="window.status=\'You got questions? We got answers...\'; return true">FAQ</a></font></strong>\012\012\012<p align=center>\012<A HREF="http://www.musi-cal.com/"><IMG border=0 WIDTH=23 HEIGHT=33 SRC="http://www.musi-cal.com/icons/not!
e.gif" ALT="[Musi-Cal Home Page]"></A>\012\012<font size="-1"><a href="mailto:concertmaster@musi-cal.com">Contact Us!</a><br><em>Copyright © 1994-1999, Automatrix, Inc.</em>\012</font>\012\012\012\012</BODY>\012</HTML>\012')
return string.join(output, '')
Follow-Ups:
Date: 2000-Oct-05 07:54
By: gvanrossum
Comment:
After 2.0 final is released.
-------------------------------------------------------
Date: 2000-Oct-05 15:26
By: tim_one
Comment:
Changed Category to Library, Summary to get "reindent" out of it, Group to Platform-specific, assigned to /F, and raised the Priority.
Skip, tell us which platform you're using. It doesn't fail on Windows. Also confirm that this is purely a problem with tokenize.py (e.g., does your test program also fail when run thru tabnanny.py, and/or checkappend.py?).
Since it doesn't fail for me, I can't whittle it down. Strongly *suspect* it's due to your long string literals. tokenize.py uses naive regexps for matching string literals, with bad performance characteristics. If I were on a platform that failed, I'd
a) First whittle this down to a one-liner (pick on the line with the longest string literal).
b) Replace tokenize.py's string-literal regexps with the ones IDLE uses (which worked fine even on multimegabyte string literals, & much faster, in pre).
That wouldn't solve the underlying recursion limit problem, but would make it much harder to provoke.
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=116136&group_id=5470