How to count lines in a text file ?
Ling Lee
janimal at mail.trillegaarden.dk
Mon Sep 20 15:58:30 EDT 2004
Thanks for explaining it that well, really makes sense now :)
Cheers....
"Andrew Dalke" <adalke at mindspring.com> wrote in message
news:ekE3d.648$g42.95 at newsread3.news.pas.earthlink.net...
> Ling Lee wrote:
>> 2) I made the first part like this:
>>
>> in_file = raw_input("What is the name of the file you want to open: ")
>> in_file = open("test.txt","r")
>> text = in_file.read()
>
> You have two different objects related to the file.
> One is the filename (the result of calling raw_input) and
> the other is the file handle (the result of calling open).
> You are using same variable name for both of them. You
> really should make them different.
>
> First you get the file name and reference it by the variable
> named 'in_file'. Next you use another filename ("test.txt")
> for the open call. This returns a file handle, but not
> a file handle to the file named in 'in_file'.
>
> You then change things so that 'in_file' no longer refers
> to the filename but now refers to the file handle.
>
> A nicer solution is to use one variable name for the name
> (like "in_filename") and another for the handle (you can
> keep "in_file" if you want to). In the following I
> reformatted it so the example fits in under 80 columns:
>
> in_filename = raw_input("What is the name of the file "
>                         "you want to open: ")
> in_file = open(in_filename,"r")
> text = in_file.read()
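
The split between the filename and the file handle can be sketched as a small helper. This is only an illustration: it uses Python 3 syntax, the helper name read_text and the throwaway demo file are assumptions, and a temporary file stands in for the name the user would type.

```python
import os
import tempfile


def read_text(in_filename):
    """Return the full contents of the file named in_filename."""
    # in_filename is a string (the name); in_file is the open file handle.
    with open(in_filename, "r") as in_file:
        return in_file.read()


# Hypothetical demo file so the sketch runs on its own.
fd, demo_filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as demo_file:
    demo_file.write("first line\nsecond line\n")

text = read_text(demo_filename)
os.remove(demo_filename)
```

Keeping the two names apart means the filename is still available later, for example when printing the result.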
>
>
> Now the in_file.read() reads all of the file into memory. There
> are several ways to count the number of lines. The first is
> to count the number of newline characters. Because the newline
> character is special, it's most often written as what's called
> an escape code, in this case "\n". Others are backspace ("\b"),
> beep ("\a"), and backslash ("\\"), since otherwise there's
> no way to get the single character "\".
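
A quick way to see that each escape code stands for a single character is to check string lengths. This is a small sketch in Python 3 syntax; the variable names are only for illustration.

```python
# Each escape code is two characters in the source but one in the string.
lengths = [len("\n"), len("\b"), len("\a"), len("\\")]

# "\a" is the bell (beep) character, ASCII code 7.
bell_code = ord("\a")

# In a raw string the backslash is kept literally, so r"\n" is
# a backslash followed by the letter n: two characters.
raw_length = len(r"\n")
```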
>
> Here's how to count the number of newlines in the text:
>
> num_lines = text.count("\n")
>
> print "There are", num_lines, "lines in", in_filename
>
>
> This will work for almost every file except for one where
> the last line doesn't end with a newline. It's rare, but
> it does happen. To fix that you need to see if the
> text ends with a newline and if it doesn't then add one
> more to the count
>
>
> num_lines = text.count("\n")
> if not text.endswith("\n"):
>     num_lines = num_lines + 1
>
> print "There are", num_lines, "lines in", in_filename
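
The newline-counting approach can be wrapped in a function and tried on a few strings. This is a sketch in Python 3 syntax; the name count_lines is hypothetical, and the extra check for an empty string is an addition not in the code above (without it, an empty file would be reported as having one line).

```python
def count_lines(text):
    """Count lines in text, treating an unterminated last line as a line."""
    num_lines = text.count("\n")
    # A non-empty text whose last line lacks a newline has one more line.
    # The "text and" guard is an added assumption: it keeps an empty
    # string at zero lines instead of one.
    if text and not text.endswith("\n"):
        num_lines = num_lines + 1
    return num_lines


with_newline = count_lines("a\nb\n")
without_newline = count_lines("a\nb")
empty = count_lines("")
```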
>
>
>> 3) I think that I have to use a for loop ( something like
>> for line in text: count +=1)
>
> Something like that will work, but note that "for xxxx in string"
> loops through every character in the string, and not
> every line. What you need is some way to get the lines.
>
> One solution is to use the 'splitlines' method of strings.
> This knows how to deal with the "final line doesn't end with
> a newline" case and return a list of all the lines. You
> can use it like this
>
> count = 0
> for line in text.splitlines():
>     count = count + 1
>
> or, since splitlines() returns a list of lines you can
> also do
>
> count = len(text.splitlines())
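
To see how splitlines treats the final line, it can be compared on two small strings. This is a sketch in Python 3 syntax with made-up sample text. One caveat: splitlines also breaks on other line boundaries such as "\r" and "\r\n", so on unusual input its count can differ from counting "\n" characters.

```python
with_final_newline = "one\ntwo\nthree\n"
without_final_newline = "one\ntwo\nthree"

# Both give the same list of three lines.
lines_a = with_final_newline.splitlines()
lines_b = without_final_newline.splitlines()
count = len(lines_b)

# By contrast, split("\n") leaves an empty string after a final newline.
split_result = with_final_newline.split("\n")
```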
>
> It turns out that reading lines from a file is very common.
> When you say "for xxx in file" it loops through every line
> in the file. This is not a list so you can't say
>
> len(open(in_filename, "r")) # DOES NOT WORK
>
> instead you need to have the explicit loop, like this
>
> count = 0
> for line in open(in_filename, "r"):
>     count = count + 1
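
The explicit loop over the file can be sketched as a function that also closes the file when it is done. This uses Python 3 syntax; the name count_lines_in_file and the temporary demo file are assumptions made only so the example runs on its own.

```python
import os
import tempfile


def count_lines_in_file(in_filename):
    """Count lines by iterating over the file one line at a time."""
    count = 0
    # The with statement closes the file handle when the block ends.
    with open(in_filename) as in_file:
        for line in in_file:
            count = count + 1
    return count


# Hypothetical demo file whose last line has no trailing newline.
fd, demo_filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as demo_file:
    demo_file.write("a\nb\nc")

demo_count = count_lines_in_file(demo_filename)
os.remove(demo_filename)
```

Only one line is held in memory at a time, and the unterminated last line is still counted.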
>
> An advantage to this approach is that it doesn't read
> the whole file into memory. Reading everything at once is
> only a problem if you have a large file. Try counting the
> number of lines in a 1.5 GB file!
>
> By the way, "r" is the default mode when opening a file.
> Most people omit it from the parameter list and just use
>
> open(in_filename)
>
> Hope this helped!
>
> By the way, you might want to look at the "Beginner's
> Guide to Python" page at http://python.org/topics/learn/ .
> It has pointers to resources that might help, including
> the tutor mailing list meant for people like you who
> are learning to program in Python.
>
> Andrew
> dalke at dalkescientific.com