How to count lines in a text file?

Andrew Dalke adalke at mindspring.com
Mon Sep 20 13:21:14 EDT 2004


Ling Lee wrote:
> 2) I made the first part like this:
> 
> in_file = raw_input("What is the name of the file you want to open: ")
> in_file = open("test.txt","r")
> text = in_file.read()

You have two different objects related to the file.
One is the filename (the result of calling raw_input) and
the other is the file handle (the result of calling open).
You are using the same variable name for both of them.  You
really should make them different.

First you get the file name and reference it by the variable
named 'in_file'.  Next you use another filename ("test.txt")
for the open call.  This returns a file handle, but not
a file handle to the file named in 'in_file'.

You then change things so that 'in_file' no longer refers
to the filename but now refers to the file handle.

A nicer solution is to use one variable name for the name
(like "in_filename") and another for the handle (you can
keep "in_file" if you want to).  In the following I
reformatted it so the example fits in under 80 columns:

    in_filename = raw_input("What is the name of the file "
                            "you want to open: ")
    in_file = open(in_filename,"r")
    text = in_file.read()


Now the in_file.read() reads all of the file into memory.  There
are several ways to count the number of lines.  The first is
to count the number of newline characters.  Because the newline
character is special, it's most often written as what's called
an escape code.  In this case, "\n".  Others are backspace ("\b")
and beep ("\a"), and backslash ("\\") since otherwise there's
no way to get the single character "\".
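
To make the escape codes concrete, here is a small sketch (the
variable names are only for illustration).  It shows that each
escape code takes two characters to type but stands for a single
character in the string:

```python
# Each escape code is written with two characters in the source
# but produces exactly one character in the resulting string.
newline = "\n"
backspace = "\b"
bell = "\a"        # the "beep" character
backslash = "\\"   # a single backslash

escape_lengths = [len(newline), len(backspace), len(bell), len(backslash)]

# ord() gives the character code; a newline is character number 10.
newline_code = ord(newline)
```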

Here's how to count the number of newlines in the text:

    num_lines = text.count("\n")

    print "There are", num_lines, "lines in", in_filename


This will work for almost every file except for one where
the last line doesn't end with a newline.  It's rare, but
it does happen.  To fix that you need to see if the
text ends with a newline and if it doesn't then add one
more to the count.  The extra check that the text is not
empty keeps an empty file at zero lines.


    num_lines = text.count("\n")
    # a non-empty text without a final newline has one more line
    if text and not text.endswith("\n"):
        num_lines = num_lines + 1

    print "There are", num_lines, "lines in", in_filename
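
If you want to check the edge cases, one option is to wrap the
counting in a function and try it on a few short strings.  This is
only a sketch; the name count_lines_in_text is made up for the
example and is not something from the standard library:

```python
def count_lines_in_text(text):
    """Count lines in a string, including an unterminated last line."""
    num_lines = text.count("\n")
    # A non-empty text whose last line lacks a newline has one
    # more line than it has newline characters.
    if text and not text.endswith("\n"):
        num_lines = num_lines + 1
    return num_lines


ends_with_newline = count_lines_in_text("one\ntwo\n")
missing_final_newline = count_lines_in_text("one\ntwo")
empty_text = count_lines_in_text("")
```

The first two calls should agree with each other, and the empty
string should count as zero lines.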


> 3) I think that I have to use a for loop ( something like
> for line in text: count +=1)

Something like that will work, but not exactly as written.  When you say "for xxxx in string"
it loops through every character in the string, and not
every line.  What you need is some way to get the lines.
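
You can see the character-by-character behavior with a short test
string (the sample text here is just an illustration):

```python
text = "one\ntwo\n"

# Looping over a string visits one character at a time, so this
# counts characters (newlines included), not lines.
count = 0
for character in text:
    count = count + 1

num_newlines = text.count("\n")
```

The loop counts every character in the string, while the text only
contains two lines.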

One solution is to use the 'splitlines' method of strings.
This knows how to deal with the "final line doesn't end with
a newline" case and return a list of all the lines.  You
can use it like this

   count = 0
   for line in text.splitlines():
     count = count + 1

or, since splitlines() returns a list of lines you can
also do

   count = len(text.splitlines())
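
Here is a small sketch of how splitlines behaves with and without
a final newline; the sample strings are made up for the example.
One thing to keep in mind is that splitlines also breaks on other
line endings such as "\r\n" and "\r", so on files that contain
those its count can differ from text.count("\n"):

```python
with_final_newline = "one\ntwo\nthree\n"
without_final_newline = "one\ntwo\nthree"

# Both give the same list; the line endings are dropped.
lines_a = with_final_newline.splitlines()
lines_b = without_final_newline.splitlines()

count_a = len(lines_a)
count_b = len(lines_b)

# An empty string has no lines at all.
count_empty = len("".splitlines())
```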

It turns out that reading lines from a file is very common.
When you say "for xxx in file" it loops through every line
in the file.  This is not a list so you can't say

   len(open(in_filename, "r"))  # DOES NOT WORK

instead you need to have the explicit loop, like this

   count = 0
   for line in open(in_filename, "r"):
     count = count + 1

An advantage to this approach is that it doesn't read
the whole file into memory.  That's only a problem
if you have a large file.  Try counting the number of
lines in a 1.5 GB file!
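
Putting the loop into a function might look like the sketch below.
The function name and the throwaway sample file are assumptions
made for the example.  It also closes the file handle explicitly,
which the short loop above leaves to Python:

```python
import os
import tempfile


def count_lines_in_file(filename):
    """Count lines by looping over the file, one line at a time."""
    count = 0
    in_file = open(filename)
    try:
        for line in in_file:
            count = count + 1
    finally:
        # Close the handle even if reading fails part way through.
        in_file.close()
    return count


# Build a small temporary file whose last line has no final newline.
handle, sample_filename = tempfile.mkstemp(suffix=".txt")
os.write(handle, b"first\nsecond\nthird")
os.close(handle)

sample_count = count_lines_in_file(sample_filename)
os.remove(sample_filename)
```

Because the loop hands back the last line even when it has no
newline at the end, no special case is needed here.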

By the way, "r" is the default mode for opening a file.
Most people omit it from the parameter list and just use

    open(in_filename)

Hope this helped!

By the way, you might want to look at the "Beginner's
Guide to Python" page at http://python.org/topics/learn/ .
It has pointers to resources that might help, including
the tutor mailing list meant for people like you who
are learning to program in Python.

				Andrew
				dalke at dalkescientific.com


