[Tutor] string.count in Windows vs UNIX

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 23 Jul 2002 15:08:58 -0700 (PDT)


On Mon, 22 Jul 2002, UNIX Guru wrote:

> I've been dabbling with Python for a bit, and use the following
> script-excerpt to go through a large file checking for specific text. On
> UNIX it finds the correct number of occurrences (6665 - double-checked
> with grep -e "Subject: Results:" mail.file | wc -l) but when run on
> Windows (2K/XP) it stops finding, consistently, after 4195 occurrences.

Hmmm... this is very odd!


> Is there something within this script-excerpt that would break the
> functionality on Windows, or some difference in the Python
> implementations on both OSes that would affect this? FWIW, the mail.file
> is about 72 MB in size.


There may be some weirdness involved with what "end of line" means.  On
Windows, '\r\n' is the line terminator, while on Unix a bare '\n' is
sufficient.  It may be that some of the lines in mail.file are terminated
only by '\n'.  Classic Macintoshes add to the complication by using just
'\r'.  *grin*
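
To make the three conventions concrete, here is a small sketch that tallies
each terminator style in a byte string.  It is written for Python 3, which
postdates this thread, and the helper name count_terminators and the sample
bytes are made up purely for illustration:

```python
def count_terminators(data):
    """Return counts of CRLF pairs, lone LF, and lone CR in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the terminators that stand alone.
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lone_lf, "cr": lone_cr}


# Hypothetical sample mixing all three conventions
sample = b"windows line\r\nunix line\nold mac line\rlast\r\n"
print(count_terminators(sample))  # {'crlf': 2, 'lf': 1, 'cr': 1}
```

A file that reports more than one nonzero entry has mixed line endings,
which is the situation suspected here.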


If you have enough free memory, can you try the following:

###
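# Assumes mailfile is the already-open file object from your script.
text = mailfile.read()
mailfile.seek(0)    # rewind so the rest of the script can reread the file
print("This file contains %d carriage returns" % text.count("\r"))
print("and %d newlines" % text.count("\n"))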
###

Let's check to see if this is a newline issue.
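
If reading all 72 MB at once is a concern, the same check can be done a
chunk at a time.  The following is only a sketch under a few assumptions:
it targets Python 3, the function name count_line_bytes and the 1 MB chunk
size are my own choices, and the demo runs on a throwaway temporary file
rather than the real mail.file.  It opens the file in binary mode ("rb"),
because text mode on Windows translates '\r\n' into '\n' before the program
ever sees it, which would hide the carriage returns being counted:

```python
import os
import tempfile


def count_line_bytes(path, chunk_size=1 << 20):
    """Count CR and LF bytes in a file without loading it all into memory."""
    carriage_returns = 0
    newlines = 0
    # Binary mode: no newline translation on any platform.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Single-byte counts are safe across chunk boundaries.
            carriage_returns += chunk.count(b"\r")
            newlines += chunk.count(b"\n")
    return carriage_returns, newlines


# Demo on a small temporary file; substitute the path to mail.file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"one\r\ntwo\nthree\r\n")
demo_path = tmp.name
print(count_line_bytes(demo_path))  # (2, 3)
os.remove(demo_path)
```

If the two numbers match, every line ends in '\r\n'; if the first is zero,
the file uses Unix-style endings throughout.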




> 	if eachLine.count("Subject: Results:"):
> 		count += 1


We can simplify this as:

    count += eachLine.count("Subject: Results:")

if we know that this particular string doesn't show up more than once per
line.
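
To see where the two forms differ, here is a tiny illustration with
made-up lines (Python 3 syntax; the variable names are hypothetical).  The
original "if" form counts matching lines, while the simplified form counts
every occurrence, so they only agree when no line contains the string
twice:

```python
lines = [
    "Subject: Results: run 1\n",
    "nothing to see here\n",
    "Subject: Results: a / Subject: Results: b\n",  # pattern appears twice
]
pattern = "Subject: Results:"

matching_lines = 0      # original form: adds at most 1 per line
total_occurrences = 0   # simplified form: adds every occurrence
for each_line in lines:
    if each_line.count(pattern):
        matching_lines += 1
    total_occurrences += each_line.count(pattern)

print(matching_lines, total_occurrences)  # 2 3
```

Note that grep piped into "wc -l" counts matching lines, so the original
"if" form is the one that corresponds to that cross-check.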


Best of wishes to you!