[Tutor] Recursion depth exceeded in python web crawler

Steven D'Aprano steve at pearwood.info
Thu Jun 14 21:36:00 EDT 2018


On Thu, Jun 14, 2018 at 02:32:46PM -0400, Daniel Bosah wrote:

> I am trying to modify code from a web crawler to scrape for keywords from
> certain websites. However, I'm trying to run the web crawler before I
> modify it, and I'm running into issues.
> 
> When I ran this code -

[snip enormous code-dump]

> The interpreter returned this error:
> 
> *RuntimeError: maximum recursion depth exceeded while calling a Python
> object*

Since this is not your code, you should report it as a bug to the 
maintainers of the web crawler software. They wrote it, and it sounds 
like it is buggy.

Quoting the final error message on its own is typically useless, because 
we have no context as to where it came from. We don't know and cannot 
guess what object was called. Without that information, we're blind and 
cannot do more than guess or offer the screamingly obvious advice "find 
and fix the recursion error".

When an error does occur, Python provides you with a lot of useful 
information about the context of the error: the traceback. As a general 
rule, you should ALWAYS quote the entire traceback, starting from the 
line beginning "Traceback (most recent call last):", not just the final 
error message.
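
If the error happens inside a running program rather than at the 
interactive prompt, one way to get hold of the whole traceback as text 
is the standard library's traceback module. This is only a minimal 
sketch: risky() is a made-up stand-in for whatever code is actually 
failing.

```python
import traceback


def risky():
    # Hypothetical stand-in for the code that is actually failing.
    return {}["missing"]


try:
    risky()
except Exception:
    # format_exc() returns the complete traceback as one string,
    # starting with the "Traceback (most recent call last):" line.
    report = traceback.format_exc()
    print(report)
```

Everything that print() shows, from the first line to the last, is what 
belongs in the bug report or the email.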

Unfortunately, in the case of RecursionError, that information can be a 
firehose of hundreds of identical lines, which is less useful than it 
sounds. The most recent versions of Python condense that and show 
something similar to this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
  [ previous line repeats 998 times ]
RecursionError: maximum recursion depth exceeded

but in older versions you should manually cut out the enormous flood of 
lines (sorry). If the lines are NOT identical, then don't delete them!
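
For versions that do not condense the traceback, the trimming could be 
automated rather than done by hand. The following is a rough sketch 
under a few assumptions: it needs Python 3 (it reads the exception's 
__traceback__ attribute), the names frame_key, condensed_traceback and f 
are invented for illustration, and the "repeats" wording is made up here 
rather than copied from Python's own output. Only runs of identical 
frames are collapsed; frames that differ are all kept.

```python
import itertools
import traceback


def frame_key(frame):
    # Two frames count as "identical" if they are the same line of the
    # same function in the same file.
    return (frame.filename, frame.lineno, frame.name)


def condensed_traceback(exc):
    """Format exc's traceback, collapsing runs of identical frames."""
    frames = traceback.extract_tb(exc.__traceback__)
    lines = ["Traceback (most recent call last):"]
    for (filename, lineno, name), run in itertools.groupby(frames, key=frame_key):
        repeats = len(list(run)) - 1
        lines.append('  File "%s", line %s, in %s' % (filename, lineno, name))
        if repeats:
            lines.append("  [ previous frame repeats %d more times ]" % repeats)
    # format_exception_only() gives the final "ErrorType: message" line(s).
    for text in traceback.format_exception_only(type(exc), exc):
        lines.append(text.rstrip("\n"))
    return "\n".join(lines)


def f(n):
    # No base case, so this recurses until Python gives up.
    return f(n + 1)


try:
    f(0)
except RuntimeError as exc:  # RecursionError is a subclass of RuntimeError
    summary = condensed_traceback(exc)
    print(summary)
```

The result is a handful of lines instead of a thousand, and it still 
shows which function is calling itself and from which line.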

The bottom line is, without some context, it is difficult for us to tell 
where the bug is.

Another point: whatever you are using to post your messages (Gmail?) is 
annoyingly adding asterisks to the start and end of each line. I see 
your quoted code like this:

[direct quote]
*import threading*
*from Queue import Queue*
*from spider import Spider*
*from domain import get_domain_name*
*from general import file_to_set*

Notice the * at the start and end of each line? That makes the code 
invalid Python. You should check how you are posting to the list, and if 
you have "Rich Text" or some other formatting turned on, turn it off.

(My guess is that you posted the code in BOLD or perhaps some colour 
other than black, and your email program "helpfully" added asterisks to 
it to make it stand out.)
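
If you end up on the receiving end of code mangled like this, the 
asterisks can be stripped mechanically. This is a small sketch only: 
strip_bold_markers is a made-up helper name, and the rule it applies 
(remove one asterisk from each end of any line that both starts and ends 
with one) is a heuristic that could misfire on a genuine line of code 
that happens to look that way.

```python
def strip_bold_markers(text):
    """Remove one leading and one trailing '*' from lines wrapped in them."""
    cleaned = []
    for line in text.splitlines():
        # Only touch lines that have an asterisk at BOTH ends, so that
        # ordinary code such as "from os import *" is left alone.
        if len(line) >= 2 and line.startswith("*") and line.endswith("*"):
            line = line[1:-1]
        cleaned.append(line)
    return "\n".join(cleaned)


mangled = """*import threading*
*from Queue import Queue*
*from spider import Spider*"""

print(strip_bold_markers(mangled))
```

That is a workaround for the reader, though. The real fix is still for 
the sender to post in plain text.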

Unfortunately, modern email programs, especially web-based ones like 
Gmail and Outlook.com, make things *really difficult* for technical 
forums like this. They are so intent on making email "pretty" (generally 
pretty ugly) for regular users that they punish technically minded users 
who need to focus on the text, not the presentation.



-- 
Steve

