This is my first program in Python

David Bolen db3l at fitlinxx.com
Fri Jan 19 01:08:23 EST 2001


rzilavec at tcn.net (Richard Zilavec) writes:

> I was not aware of readline()...  my book does not mention it.  I
> tested readline() versus readlines(), very true, readlines() creates
> an array, however 
> for line in file.readline():
> gets an error
> Traceback (innermost last):
>   File "./python.py", line 6, in ?
>     first,second = string.split(line,':',1)
> ValueError: unpack list of wrong size

The readline() method of file objects will return the next line from
the file (as a string object) or an empty string object on end of
file.  This is different from readlines(), which returns a list of
string objects, representing multiple lines from the file.
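To make the difference in return types concrete, here is a small sketch.
It assumes a current Python 3 interpreter and uses io.StringIO with
made-up data as a stand-in for your real file, so the names and contents
are only illustrative:

```python
import io

# Hypothetical in-memory stand-in for a real file; the contents are made up.
sample = io.StringIO("alpha:1\nbeta:2\n")

first_line = sample.readline()   # a single string, trailing newline included
remaining = sample.readlines()   # a list holding the lines not yet read
at_eof = sample.readline()       # an empty string, because nothing is left

print(type(first_line).__name__, type(remaining).__name__, repr(at_eof))
```

Note that readlines() only returns what has not been consumed yet, which
is why the first line is missing from the list here.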

The confusing part here is that a string can also behave as a sequence
object (one of characters, which in Python are single character
strings), so when you use the for, it iterates through each character
in the string since it can iterate through any sequence object.

For example, the following uses "for" to iterate over a list of
strings, so during each cycle a new element of the list is chosen:

    >>> for value in ['One','Two']:
    ...     print value
    ...
    One
    Two

However, in this case, I just use a single string (which is also a
sequence) so "for" iterates over each element of the string:

    >>> for value in "One":
    ...     print value
    ...
    O
    n
    e

That's what is happening in your case above - readline() is returning
a single line from the file as a string object, and your for loop is
then iterating over the object character by character.
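The same effect can be reproduced without your data file.  This is a
sketch assuming Python 3, with io.StringIO and invented contents
standing in for the real file:

```python
import io

# Hypothetical stand-in for the real file; the contents are made up.
sample = io.StringIO("alpha:1\nbeta:2\n")

# readline() hands back one string, so the loop walks its characters.
pieces = []
for item in sample.readline():
    pieces.append(item)

print(pieces)
```

Each element of pieces is a one-character string, and only the first
line of the file was ever read.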

So when you get to the string.split() usage, your "line" variable is
actually pointing to a single character string, and not the entire
line.  Because it is a single character, the split is going to return
a list with only one entry, which Python then complains that it can't
unpack into a tuple of two elements (first and second).  That's
actually a general risk with that string.split() command - it assumes
that your file lines will split into precisely two parts.  That may be
fine if you know your data, or you may want to enclose it in a
try/except block to cover potential erroneous data in the file.
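One possible shape for that try/except is sketched below.  It assumes
Python 3 and uses the split() method of the string itself rather than
the string module function; parse_line is a hypothetical helper name,
and returning None for bad lines is just one policy you might choose:

```python
def parse_line(line):
    """Split 'first:second' on the first colon; return None if there is no colon."""
    try:
        # Unpacking raises ValueError when split() yields fewer than two parts.
        first, second = line.rstrip("\n").split(":", 1)
    except ValueError:
        return None
    return first, second


print(parse_line("alpha:1\n"))
print(parse_line("no separator here\n"))
```

Because of the maxsplit argument of 1, a line with extra colons still
unpacks cleanly, with everything after the first colon left in second.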

Using readline() has the advantage of not loading the entire file into
memory, but it is not normally coupled with a "for".  Instead, the
Python idiom for processing a file with readline() is:

    while 1:
        line = file.readline()
        if not line: break

        # Process line

There are loads of discussions on the newsgroup and entries in the FAQ
related to this idiom (who likes it, who doesn't) and there are ways
to wrap the I/O in an object to make it appear more C/Perl like (with
an assignment within the loop construct for example), but the above
is straightforward and common.
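For a version of that idiom you can run as is, here is a sketch assuming
Python 3, with io.StringIO and made-up contents in place of a real file.
The read_lines name is hypothetical, and "processing" here is just
stripping the newline and collecting the result:

```python
import io


def read_lines(stream):
    """Collect lines one readline() call at a time, stopping at end of file."""
    collected = []
    while True:
        line = stream.readline()
        if not line:          # only the empty string means end of file
            break
        collected.append(line.rstrip("\n"))
    return collected


# A blank line in the file arrives as "\n", which is not empty,
# so it does not end the loop early.
sample = io.StringIO("alpha:1\n\nbeta:2\n")
print(read_lines(sample))
```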

> When using readlines with a for statement, do all the lines in
> /tmp/globs get placed into file.readlines(), or does it loop for each
> line read?  Hope that makes sense. 

I'm not sure I completely follow what you mean, but perhaps explaining
what is happening would help.

The readlines() method of a file object creates a new list object that
contains each of the lines from the file within the list.  It is that
list object that the "for" is iterating over ("for" can iterate over
any sequence object - or technically any object implementing the
__getitem__ method, but that's slightly more advanced).

So if you have something written as:

    for curline in myfile.readlines():
        print curline

What is happening beneath the covers is similar to:

    (1)  __tempobject__ = myfile.readlines()
    (2)  for curline in __tempobject__:
    (3)      print curline

In (1), the readlines() method of the "myfile" object is called.  It
reads the entire file into memory, creating a list object to hold
the lines.  Note that readlines() also accepts an optional parameter
that acts as a guide to the maximum memory to use; it then reads only
about as many lines as fit in that memory, so you'd have to put the
readlines() call into a larger loop to ensure you got all the lines.
But I've separated (1) from (2) so you can see that with readlines()
the file I/O occurs as part of the method call and the contents are
loaded into the list object before the iteration begins.
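If you did want to use that optional size parameter, the larger loop
might look something like the sketch below.  It assumes Python 3 and an
io.StringIO stand-in with invented data; count_lines_in_chunks and the
hint value of 5 are hypothetical, and how many lines come back per call
depends on the hint and the file object:

```python
import io


def count_lines_in_chunks(stream, sizehint):
    """Call readlines(sizehint) repeatedly until it returns an empty list."""
    total = 0
    while True:
        chunk = stream.readlines(sizehint)   # a partial list of whole lines
        if not chunk:                        # empty list: end of file reached
            break
        for curline in chunk:
            total += 1                       # process each line here
    return total


sample = io.StringIO("a:1\nb:2\nc:3\n")
print(count_lines_in_chunks(sample, 5))
```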

In (2) the variable curline is iterated through the list of lines that
was created by readlines().  By iterating, curline is set to reference
each of the individual objects within the list of lines (in this case
the list is a list of string objects, for each file line).

Within the loop, in (3), which executes once for each entry in the
list of lines from the file, curline can be used to reference the
string object from the list representing a line of the file.
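You can check both points for yourself with a sketch like the one below.
It assumes Python 3 and an io.StringIO stand-in with made-up contents;
temp_list plays the role of the temporary object in step (1):

```python
import io

myfile = io.StringIO("alpha:1\nbeta:2\n")   # hypothetical stand-in for a file

temp_list = myfile.readlines()    # step (1): all of the I/O happens here
seen = []
for curline in temp_list:         # step (2): curline names each list entry
    seen.append(curline)          # step (3): use the string object

# curline referred to the very objects held in the list, not to copies.
same_objects = all(a is b for a, b in zip(seen, temp_list))

# The file was already exhausted before the loop began.
after_loop = myfile.readline()

print(same_objects, repr(after_loop))
```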

That's probably deeper than you really need to write the code, but I
think it can be helpful to understand how the various objects and
variables (which are really just names bound to objects) interact.

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


