[Tutor] Simple Question...
Bill Mill
bill.mill at gmail.com
Mon Oct 18 21:58:22 CEST 2004
The fastest linecounting function that I could find, without loading
the entire file into memory, was this function:
def wc(f):
    c = 0
    for line in f:
        c += 1
    return c
This function is about 3 times faster than this one (the next fastest
I could think of):
def wcslow(f):
    line = ' '
    c = 0
    while line:
        line = f.read(1024)
        c += line.count('\n')
    return c
Which leads me to the question: why is iterator-based file access so
much faster than read()?
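
A minimal sketch for reproducing the comparison, assuming the two functions above. The helper time_counter, the blocksize parameter, and the generated sample file are illustrative additions, not part of the original test; the fortune file from the thread could be substituted for sample_path.

```python
import os
import tempfile
import timeit

def wc(f):
    """Count lines by iterating over the file object."""
    c = 0
    for line in f:
        c += 1
    return c

def wcslow(f, blocksize=1024):
    """Count newlines by reading fixed-size blocks (blocksize is an added knob)."""
    c = 0
    while True:
        block = f.read(blocksize)
        if not block:
            break
        c += block.count('\n')
    return c

def time_counter(counter, path, repeat=5):
    """Return the best time, in seconds, over `repeat` single runs of counter."""
    def run():
        with open(path) as f:
            return counter(f)
    return min(timeit.repeat(run, number=1, repeat=repeat))

# Hypothetical sample file, generated only so the sketch is runnable.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    for i in range(20000):
        out.write('line %d\n' % i)

# Both functions should agree on the count before their speed is compared.
with open(sample_path) as f:
    iter_count = wc(f)
with open(sample_path) as f:
    block_count = wcslow(f)

iter_time = time_counter(wc, sample_path)
block_time = time_counter(wcslow, sample_path)
os.remove(sample_path)

print('iterator: %d lines in %.6f s' % (iter_count, iter_time))
print('read():   %d lines in %.6f s' % (block_count, block_time))
```

Results will vary with Python version, platform, and file size. Passing a larger blocksize to wcslow is one thing worth trying, since 1024-character reads mean many more calls per file.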
Also, regarding the random line function, I realized that if you want
to have equal probability of selecting any line, you *must* know how
many lines are in the file beforehand. If you're ok with favoring
longer lines over shorter ones, then you can just pick a random spot
in the file.
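
The two options above can be sketched roughly as follows. The function names, the chunk parameter, and the backward scan for the start of the line are illustrative assumptions, not code from this thread.

```python
import os
import random

def uniform_random_line(path):
    """Two passes: count the lines, then return one chosen with equal probability."""
    with open(path) as f:
        n = sum(1 for _ in f)
    if n == 0:
        raise ValueError('file has no lines')
    target = random.randrange(n)
    with open(path) as f:
        for i, line in enumerate(f):
            if i == target:
                return line

def biased_random_line(path, chunk=256):
    """Pick a random byte offset and return the line (as bytes) containing it.

    A line's chance of being chosen is proportional to its length in bytes.
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            raise ValueError('file is empty')
        pos = random.randrange(size)
        # Step backwards in chunks until the newline that precedes pos is found.
        start = pos
        while start > 0:
            lo = max(0, start - chunk)
            f.seek(lo)
            block = f.read(start - lo)
            idx = block.rfind(b'\n')
            if idx != -1:
                start = lo + idx + 1
                break
            start = lo
        f.seek(start)
        return f.readline()
```

The first function reads the whole file to count it and then reads again up to the chosen line; the second touches only a small region around the random offset, at the cost of the length bias described above.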
Peace
Bill Mill
bill.mill at gmail.com
On Sun, 17 Oct 2004 20:23:00 -0400, Rich Krauter <rmkrauter at yahoo.com> wrote:
> Bill Mill wrote:
> > OK, I posted a fortune file to my webserver. It's at
> > http://llimllib.f2o.org/files/osfortune . I see 2 competitions:
> >
> > 1) fastest function to find a random line from the file; the catch is
> > that this function must be able to pick a random line from anywhere in
> > the file. It must be capable of returning the first line, the last
> > line, and anything in between.
> >
> > 2) fastest function to count the lines in the file.
> >
>
> I doubt the following is the fastest on either point - just figured I'd
> post it since it uses built-in module linecache, which I haven't seen
> mentioned in this thread yet.
>
> Just like some of the posted solutions, linecache reads the entire file
> into a list; that module's code may be of interest to those proposing
> that type of solution.
>
> import random
> import linecache
>
> def getrandomline(fname, nlines):
>     # linecache line numbers start at 1, so pick from 1..nlines
>     n = random.randint(1, nlines)
>     return n, linecache.getline(fname, n)
>
> def wcl(fname):
>     # make sure file has been cached; to do so,
>     # run linecache.getline() and discard result
>     if fname not in linecache.cache:
>         linecache.getline(fname, 1)
>     # return number of lines in file
>     return len(linecache.cache[fname][2])
>
> if __name__ == '__main__':
>     fname = 'junk.txt'
>     print(getrandomline(fname, wcl(fname)))
>
>
> The linecache module doesn't have a function to return the number of
> lines in a file; but it very easily could provide one since the cached
> file's lines are available in a list. I used that fact to count the
> number of lines, in wcl() above, rather than opening the file again to
> count its lines.
>
> Rich