[Tutor] Simple Question...

Bill Mill bill.mill at gmail.com
Mon Oct 18 21:58:22 CEST 2004


The fastest line-counting function I could find that doesn't load
the entire file into memory was this one:

def wc(f):
   # Count lines by iterating over the open file object.
   c = 0
   for line in f:
       c += 1
   return c

This function is about 3 times faster than this one (the next fastest
I could think of):

def wcslow(f):
   # Count newlines in 1024-character chunks.
   c = 0
   line = f.read(1024)
   while line:
       c += line.count('\n')
       line = f.read(1024)
   return c

Which leads me to ask: why is iterator-based file access so much
faster than read()?
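
For anyone who wants to reproduce the comparison, here is a minimal
timing sketch. The helper time_counter() and the generated test file
are my own assumptions for illustration, not part of the original
benchmark, and the ratio you measure will depend on your Python version
and platform. Note that wcslow() counts newline characters, so the two
functions differ by one on a file whose last line has no trailing
newline.

```python
import os
import tempfile
import timeit

def wc(f):
    # Count lines by iterating over the open file object.
    c = 0
    for line in f:
        c += 1
    return c

def wcslow(f):
    # Count newlines in 1024-character chunks.
    c = 0
    chunk = f.read(1024)
    while chunk:
        c += chunk.count('\n')
        chunk = f.read(1024)
    return c

def time_counter(func, path, repeat=5):
    # Hypothetical helper: best wall-clock time over `repeat` runs,
    # reopening the file each time so every run starts at offset 0.
    def run():
        with open(path) as f:
            return func(f)
    return min(timeit.repeat(run, number=1, repeat=repeat))

# Build a throwaway test file (made-up contents, 20000 lines).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    for i in range(20000):
        out.write('line %d\n' % i)

with open(path) as f:
    n_fast = wc(f)
with open(path) as f:
    n_slow = wcslow(f)

t_fast = time_counter(wc, path)
t_slow = time_counter(wcslow, path)
os.remove(path)

print('wc:     %d lines, %.4f s' % (n_fast, t_fast))
print('wcslow: %d lines, %.4f s' % (n_slow, t_slow))
```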

Also, regarding the random line function, I realized that if you want
an equal probability of selecting any line, you need either the line
count beforehand or one full pass over the file. If you're OK with
favoring longer lines over shorter ones, then you can just pick a
random spot in the file and take the line it falls in.
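
A rough sketch of the "random spot" approach, assuming the file fits
the usual newline-terminated layout. The function name
random_line_by_offset() and the sample file are hypothetical. It picks
a random byte offset, walks back to the start of the line containing
that byte, and returns that line, so each line is chosen with
probability proportional to its length in bytes (newline included).

```python
import os
import random
import tempfile

def random_line_by_offset(path, rng=random):
    # Hypothetical helper: return the line containing a random byte.
    size = os.path.getsize(path)
    if size == 0:
        return b''
    with open(path, 'rb') as f:
        pos = rng.randrange(size)
        # Walk backwards until the previous byte is a newline
        # (or we reach the start of the file).
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b'\n':
                break
            pos -= 1
        f.seek(pos)
        return f.readline()

# Small made-up file to try it on.
lines = [b'a\n', b'bbbb\n', b'\n', b'cc\n']
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b''.join(lines))

draws = [random_line_by_offset(path) for _ in range(200)]
os.remove(path)
print(draws[0])
```

Stepping back one byte at a time is slow for very long lines; reading
backwards in blocks would be the obvious refinement.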

Peace
Bill Mill
bill.mill at gmail.com


On Sun, 17 Oct 2004 20:23:00 -0400, Rich Krauter <rmkrauter at yahoo.com> wrote:
> Bill Mill wrote:
> > OK, I posted a fortune file to my webserver. It's at
> > http://llimllib.f2o.org/files/osfortune . I see 2 competitions:
> >
> > 1) fastest function to find a random line from the file; the catch is
> > that this function must be able to pick a random line from anywhere in
> > the file. It must be capable of returning the first line, the last
> > line, and anything in between.
> >
> > 2) fastest function to count the lines in the file.
> >
> 
> I doubt the following is the fastest on either point - just figured I'd
> post it since it uses built-in module linecache, which I haven't seen
> mentioned in this thread yet.
> 
> Just like some of the posted solutions, linecache reads the entire file
> into a list; that module's code may be of interest to those proposing
> that type of solution.
> 
> import random
> import linecache
> 
> def getrandomline(fname,nlines):
>      # linecache line numbers start at 1; randint includes both ends
>      n = random.randint(1,nlines)
>      return n,linecache.getline(fname,n)
> 
> def wcl(fname):
>      # make sure file has been cached; to do so,
>      # run linecache.getline() and discard result
>      if fname not in linecache.cache:
>          linecache.getline(fname,1)
>      # return number of lines in file
>      return len(linecache.cache[fname][2])
> 
> if __name__ == '__main__':
>      fname = 'junk.txt'
>      print(getrandomline(fname,wcl(fname)))
> 
> The linecache module doesn't have a function to return the number of
> lines in a file; but it very easily could provide one since the cached
> file's lines are available in a list. I used that fact to count the
> number of lines, in wcl() above, rather than opening the file again to
> count its lines.
> 
> Rich

