\r for newline in readlines function

Skip Montanaro skip at pobox.com
Fri Sep 19 15:19:22 EDT 2003


(warning - this note contains more than you bargained for...)

    Mark> Hi Skip : Thanks for replying.  I can't find the function file
    Mark> anywhere And the error I get when I type in your statement Is
    Mark> 'str' object is not callable. Do You know where I can find info on
    Mark> the file Function ? Thanks.

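"file" is the builtin that Python 2.2 introduced as the preferred spelling of
"open".  If the version of Python you're running doesn't have a file
builtin, it won't have universal newline support either, since that only
arrived in 2.3.  (An error like "'str' object is not callable" can also mean
that the name "file" has been rebound to a string in your session, which
hides the builtin.)  Try this instead: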

    f = open("somefile.txt", "rU")

and be prepared for it to barf.  If it does, you can either upgrade to
Python 2.3 or read the file in one fell swoop and split on "\r" as someone
else recommended:

    f = open("somefile.txt", "rb")
    rawdata = f.read()
    lines = rawdata.split("\r")

If you have a very old version of Python (1.5.2 or earlier) you won't have
string methods, in which case you will have to change the last line to:

    import string
    lines = string.split(rawdata, "\r")

Note that if the last line is terminated by "\r", the split will leave you
with an empty string at the end of the list:

    >>> import string
    >>> s = "123\r456\r789\r"
    >>> string.split(s, "\r")
    ['123', '456', '789', '']

You can work around that a couple of different ways.  First, you can strip
any trailing "\r" characters from the data prior to splitting:

    >>> string.split(string.rstrip(s, "\r"), "\r")
    ['123', '456', '789']

That's probably not acceptable if the file ends with several blank lines,
since rstrip removes every trailing "\r", not just the last one.
Alternatively, you can test explicitly for an empty string at the end of the
list:

    >>> lst = string.split(s, "\r")
    >>> if lst[-1:] == [""]:
    ...     lst = lst[:-1]
    ... 
    >>> lst
    ['123', '456', '789']
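
To make the difference between the two workarounds concrete, here is a small
sketch that runs both on data ending in blank lines.  The helper names
split_rstrip and split_drop_last are made up for this illustration, and the
code assumes a Python new enough to have string methods:

```python
def split_rstrip(data, sep="\r"):
    # Workaround 1: strip every trailing separator, then split.
    return data.rstrip(sep).split(sep)

def split_drop_last(data, sep="\r"):
    # Workaround 2: split first, then drop a single trailing empty string.
    lines = data.split(sep)
    if lines[-1:] == [""]:
        lines = lines[:-1]
    return lines

s = "123\r456\r\r\r"   # two blank lines at the end of the data
print(split_rstrip(s))     # ['123', '456']
print(split_drop_last(s))  # ['123', '456', '', '']
```

The first version silently loses the two blank lines; the second keeps them
and discards only the empty string produced by the final terminator.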

Note also that none of these solutions works exactly like the file object's
readlines() method, since they strip the end-of-line sequence instead of
replacing it with a newline.  Here's a quickie comparison from my system
using the first five lines of my /etc/hosts file.  First, opening
the file in universal newline mode and using readlines:

    >>> hosts = file("/etc/hosts", "rU")
    >>> hosts.readlines()[:5]
    ['##\n', '# Host Database\n', '# \n', '# Note that this file is consulted when the system is running in single-user\n', '# mode.  At other times this information is handled by lookupd.  By default,\n']

Now, reading the whole file and splitting it:

    >>> hosts = file("/etc/hosts", "rb")
    >>> hosts.read().split("\n")[:5]
    ['##', '# Host Database', '# ', '# Note that this file is consulted when the system is running in single-user', '# mode.  At other times this information is handled by lookupd.  By default,']

If you really want to be anal and split on the "\r" characters, mapping them
to newlines, here's a little function:

    >>> import string
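    >>> def readlines(f, spliton="\r"):
    ...   data = f.read()
    ...   lines = string.split(data, spliton)
    ...   # a terminated last line leaves an empty string at the end; drop it
    ...   if lines[-1:] == [""]:
    ...     lines = lines[:-1]
    ...   for i in range(len(lines)):
    ...     lines[i] = lines[i] + "\n"
    ...   return lines
    ... 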
    >>> # /Users/skip/tmp/hosts is /etc/hosts with \r line endings
    >>> f = open("/Users/skip/tmp/hosts", "rb")
    >>> readlines(f)[:5]
    ['##\n', '# Host Database\n', '# \n', '# Note that this file is consulted when the system is running in single-user\n', '# mode.  At other times this information is handled by lookupd.  By default,\n']

I tried to be as conservative as possible in the constructs I used so that
you wouldn't have to tweak the readlines() function to run on older versions
of Python.  Hence the lack of string methods or list comprehensions.
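
If the data might mix line-ending conventions, a variation along the
following lines gets closer to what universal newline mode does, treating
"\r\n", "\r" and "\n" all as line ends.  This is only a sketch, not a
drop-in replacement: the name universal_readlines is hypothetical, it takes
the already-read data as a string rather than a file object, and it assumes
a Python with string methods and list comprehensions:

```python
def universal_readlines(data):
    # Normalize "\r\n" first so it does not turn into two newlines.
    data = data.replace("\r\n", "\n").replace("\r", "\n")
    lines = data.split("\n")
    # The final piece is either "" (last line was terminated) or an
    # unterminated last line; handle it separately.
    last = lines.pop()
    lines = [line + "\n" for line in lines]
    if last:
        lines.append(last)   # keep an unterminated last line as-is
    return lines

print(universal_readlines("a\r\nb\rc\n"))  # ['a\n', 'b\n', 'c\n']
print(universal_readlines("a\rb"))         # ['a\n', 'b']
```

Unlike the simpler function above, this leaves a final line without a
terminator untouched, which is also how readlines() treats it.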

Skip




