[Tutor] Infinite recursion or too big a problem?

Allan Crooks allan.crooks@btinternet.com
Thu, 05 Jul 2001 17:43:03 +0100


This question was a bit problematic to solve. But I like a challenge. :)

Answer to your question, infinite recursion or too big a problem? Neither. :) It's not infinite recursion, because under most test cases it does quit. It's not too big a problem, because the stack-limit is about 1000 method calls by default, so unless you have a directory structure has really really really deep nested directories, it won't fail.

So what's the problem with your program? Easy. Buggy code which if run under certain conditions will lead to infinite recursion.

Let's have a glimpse at your code.

> #!/usr/bin/env python
> """
> Pylog, version 2.  Written in functional style, just for fun.
> """
> 
> # Imports
> import os, string
> 
> # Globals
> 
> 
> # Force local evaluation of selected module functions
> list_directory = os.listdir
> is_directory = os.path.isdir
> is_file = os.path.isfile
> abs_path = os.path.abspath
> string_find = string.find
> 
> # Functions
> ## my apply functions
> do = lambda func: func()
> doSequence = lambda functionlist: map(do, functionlist)
> 
> ## identity functions
> isDir = lambda fileobj: is_directory(fileobj)
> isFile = lambda fileobj: is_file(fileobj)
> isLogfile = lambda filename: (is_file(filename)) and (filename == 'log')
> 
> ## list generating functions
> concatenate = lambda seq1, seq2: seq1 + seq2
> getFileList = lambda dir: list_directory(dir)
> getDirectories = lambda fileobjs: filter(isDir, fileobjs)
> getLogfiles = lambda fileobjs: filter(isLogfile, fileobjs)
> getAllLogfiles = lambda dirname: reduce(concatenate, [], 
> getLogfiles(getFileList(abs_path(dirname))) +
> map(getAllLogfiles, map(abs_path, getDirectories(getFileList(dirname)))))

So the first thing I did was to run the program. The stack trace however, unhelpfully kept on saying "lambda", so I didn't know where the loop was happening. So I then rewrote all the functions from "lambda" to "def". The loop seemed to indicate that the problem was just happening in "getAllLogfiles".

So I then rewrote the algorithm and shortened it down to this:

def getAllLogfiles (dirname):
   res = map(abs_path, getDirectories(getFileList(dirname)))
   print dirname, res
   return map(getAllLogfiles, map(abs_path, res))

This is the output I got.


>>> getAllLogfiles(".")
 ['F:\\PROGRA~1\\Python\\DLLs', 'F:\\PROGRA~1\\Python\\Doc', 'F:\\PROGRA~1\\Pyt
hon\\include', 'F:\\PROGRA~1\\Python\\Lib', 'F:\\PROGRA~1\\Python\\libs', 'F:\\P
ROGRA~1\\Python\\tcl', 'F:\\PROGRA~1\\Python\\Tools']
F:\PROGRA~1\Python\DLLs []
F:\PROGRA~1\Python\Doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib'
]
F:\PROGRA~1\Python\doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib'
]
F:\PROGRA~1\Python\doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib'
]
F:\PROGRA~1\Python\doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib'
]
F:\PROGRA~1\Python\doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib'
]

<snip>

And then I ran it with getAllLogfiles("c:/"), but that didn't crash.

And then I saw the problem. It's partly your code's fault (well, realistically, it's entirely your code's fault.. :), and the directory you're executing it in.

Your code isn't working properly, because look at the sample execution.

It processes the current directory, which is F:\PROGRA~1\Python. Then it goes into F:\PROGRA~1\Python\DLLs. Then it goes into the F:\PROGRA~1\Python\Doc directory (which has more directories in it than what is listed there, but that's a side effect of the bug I'm about to describe :)

Then look at the third line execution (F:\PROGRA~1\Python\doc). It looks like it's doing the same as the second line, but it isn't. Note the difference in the capitalisation of "doc". Now, in my F:\PROGRA~1\Python\Doc directory, it has several sub-directories, including another one called "doc" (which expands to F:\PROGRA~1\Python\Doc\doc"). The problem is, you're not recreating the path fully.

The absolute path function is documented to do this:

Return a normalized absolutized version of the pathname path. On most platforms, this is equivalent to normpath(join(os.getcwd(), path)). 

Now, os.getcwd() returns the current directory.
os.getcwd() is the same as "."
"." is, at this moment in time, F:\PROGRA~1\Python

So when it moves into the F:\PROGRA~1\Python\Doc directory, and listdir provides all these entries.

about.html *
acks.html *
api
dist
doc
ext
icons
index.html *
inst
lib
mac
modindex.html *
ref
tut

Now, this line only lists two directories:

F:\PROGRA~1\Python\doc ['F:\\PROGRA~1\\Python\\doc', 'F:\\PROGRA~1\\Python\\lib']

But all the directories (the ones with no * next to them) should be listed, but only these two are.

That's because "isdir" will only return true if there is a file of that name, which is a directory.

But you aren't testing to see if F:\PROGRA~1\Python\doc\api exists, you are actually checking to see if F:\PROGRA~1\Python\api exists (since you are using abspath, which joins "F:\PROGRA~1\Python\" with "api", which doesn't exist.

It just so happens that F:\PROGRA~1\Python\doc and F:\PROGRA~1\Python\lib exist, as well as F:\PROGRA~1\Python\doc\doc and F:\PROGRA~1\Python\doc\lib. That's why those two entries are being returned, because "doc" and "lib" are elements of the F:\PROGRA~1\Python\doc directory (and they show up in listdir), and when you use abspath, they also exist as an element of the current directory.

Hopefully that makes sense. I would try to explain it better, but let's just say I'm enough trouble with my wife as it is. ;)

So what are you to do? Well, I'll leave that to you, but I suggest you look into os.path.join function, and pass the current directory you are in to a function, so that you can properly join the path to the element.

HTH,
Allan.