Simple looping question...

Ken Seehof kens at sightreader.com
Tue Apr 3 06:14:59 EDT 2001


"David Allen" <mda at idatar.com> says:
> In article <mailman.986258884.3396.python-list at python.org>, "Ken Seehof"
> <kens at sightreader.com> wrote:
> > >>> f = open(r'd:\qp\var\publish\index.html')
> > >>> z = f.readlines()
> > >>> type(z)
> > <type 'list'>
> >
> > It would appear that readlines() returns a list.  Therefore the entire
> > file is read in to create that list before returning. On the other hand, I
> > have written a generator class that does what you are saying.  In order for
> > readlines() to be smart, it would have to return a generator instead of
> > a list.
>
> I didn't mean in that case.  It's clear that
> foo = file.readlines()
> creates an array and puts it in foo.  I was talking
> about in the case of:
> for line in file.readlines():
>   print line

I know.

Here it is also clear that file.readlines() returns a list.

> There, python has two choices.  It could either
> create an entire array holding the entire file, and
> loop through assigning each member of the array to
> line each go through.  Or, it could not allocate
> an array at all, and just treat that statement as
> the equivalent of a long series of file.readline()
> statements.

This could only be true if file.readlines() returns something
other than a list (i.e. a generator or other special iterator
type).
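
For illustration, here is a minimal sketch of what such a "special iterator
type" might look like. It is written in present-day Python 3 syntax rather
than the Python of this thread, and the names LazyLines and count_lines are
hypothetical, not part of any library. It only calls readline() when the
loop asks for the next line, so no list of all lines is ever built.

```python
import io


class LazyLines:
    """Hypothetical iterator: fetches one line per step via readline()."""

    def __init__(self, fileobj):
        self._f = fileobj

    def __iter__(self):
        return self

    def __next__(self):
        line = self._f.readline()
        if line == "":
            # readline() returns an empty string only at end of file.
            raise StopIteration
        return line


def count_lines(fileobj):
    """Loop over the file lazily; only one line is held at a time."""
    n = 0
    for _ in LazyLines(fileobj):
        n += 1
    return n
```

The for loop would then read "for line in LazyLines(f):" instead of
"for line in f.readlines():". The difference is in what the expression
returns, not in anything the for statement does.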

> What I meant was that I think this is the way python
> handles it when file.readlines() is the set of things
> you're talking about in a for loop.
>
> I don't have code proof for that - it's just that
> I've run a program that uses that type of a construct:
>
> for line in file.readlines():
>   # Do something with line
>
> in several programs that get fed 80MB data files,
> and I've never noticed python's memory usage
> encompassing 80 MB, which is what you would expect
> if the interpreter actually allocated an array to
> keep the whole file in memory.

Look closer.  The 80MB is in use for as long as the loop runs, though
that may be only briefly.
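
One rough way to check this is to measure peak allocation directly. The
sketch below uses present-day Python 3 and the tracemalloc module, which
did not exist when this thread was written. The helper names (peak_bytes,
eager, lazy, compare) and the size of the temporary test file are
assumptions made up for the example.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak traced allocation, in bytes, while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def eager(path):
    # readlines() builds the complete list before the loop body first runs.
    with open(path) as f:
        for line in f.readlines():
            pass


def lazy(path):
    # Repeated readline() calls keep only the current line alive.
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break


def compare(num_lines=20000):
    """Write a throwaway file and return (eager_peak, lazy_peak)."""
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as out:
            out.write(("x" * 79 + "\n") * num_lines)
        return peak_bytes(eager, path), peak_bytes(lazy, path)
    finally:
        os.remove(path)
```

Calling compare() returns the two peaks side by side. The exact numbers
depend on the interpreter and platform, but the readlines() version
should peak far higher than the readline() version.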

> --
> David Allen
> http://opop.nols.com/

Your comments make a certain amount of sense, but are
wrong anyway. :-)

The reason I gave the example ...

>>> f = open(r'd:\qp\var\publish\index.html')
>>> z = f.readlines()
>>> type(z)
<type 'list'>

... is simply to demonstrate that f.readlines returns a list.

Since readlines is a function, not a python magic keyword,
there is no way for readlines to know that it is being used in
a for loop, and no way for python to do anything clever.  As
soon as the readlines function returns, you already have your
list.
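
The order of events can be made visible with a small sketch. It is in
present-day Python 3 syntax, and NoisyFile and run_demo are hypothetical
names invented for this example. The wrapper logs when readlines() is
called and when it returns, and the loop body logs each line it sees.

```python
import io


class NoisyFile:
    """Hypothetical wrapper that records when readlines() runs."""

    def __init__(self, text):
        self._f = io.StringIO(text)
        self.log = []

    def readlines(self):
        self.log.append("readlines called")
        lines = self._f.readlines()
        self.log.append("readlines returned %d lines" % len(lines))
        return lines


def run_demo():
    f = NoisyFile("a\nb\nc\n")
    # The expression after "in" is evaluated once, before the first pass.
    for line in f.readlines():
        f.log.append("loop body: " + line.strip())
    return f.log
```

In the log that run_demo() returns, both readlines entries come before
the first "loop body" entry: the whole list exists before the loop sees
a single line.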

Presumably you are thinking that readlines is analogous
to xrange.  It isn't.

>>> z = 0L
>>> for i in xrange(2000000):
...  z = z + i
...

In this case python somehow magically avoids creating
a big array of integers.  Look behind the curtain...

>>> xrange(2000000) # silence
xrange(2000000)
>>> type(xrange(2000000)) # silence
<type 'xrange'>
>>> type(range(2000000)) # listen to your hard drive
<type 'list'>
>>> type(xrange(200000000)) # silence
<type 'xrange'>
>>> type(range(200000000))  # Don't do this at home!
              . . . <reboot> . . .

The point is that xrange does not return a list; it returns
an 'xrange' object, which iterates like a list but doesn't
allocate a bunch of memory.
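
A minimal sketch of how an object can iterate like a list without storing
its elements follows. It is in present-day Python 3 syntax; LazyRange and
sum_lazily are hypothetical names, and this is not how xrange itself is
implemented. The object stores only its upper bound and computes each
element when the for loop asks for it by index.

```python
class LazyRange:
    """Hypothetical xrange-like object: stores only the upper bound."""

    def __init__(self, stop):
        self.stop = stop

    def __getitem__(self, i):
        # A for loop asks for index 0, 1, 2, ... until IndexError is raised.
        if 0 <= i < self.stop:
            return i
        raise IndexError(i)


def sum_lazily(n):
    """Add up 0..n-1 without ever building a list of n integers."""
    total = 0
    for i in LazyRange(n):
        total = total + i
    return total
```

Calling sum_lazily(2000000) mirrors the xrange loop above: memory use
stays flat because no element exists until the loop requests it.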

- Ken Seehof
kseehof at neuralintegrator.com
www.neuralintegrator.com






