efficient text file search.

noro amit.man at gmail.com
Mon Sep 11 14:34:45 EDT 2006


i'm not sure.

each line in the text file is an index string. i can sort the file
and then use some kind of binary search on it (i need to do a number
of searches).
there are 1219137 indexes in the file, so maybe a memory-efficient sort
algorithm is in order.
how can mmap help me?
is there any binary search algorithm for text files out there, or do i
need to write one?
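
For reference, here is a minimal sketch of what a binary search over an
mmap'ed text file could look like. It assumes the file is non-empty,
already sorted by raw byte value (for example with "LC_ALL=C sort"),
has one key per line, and uses "\n" line endings. The function name
contains_line is made up for illustration; it is not a standard
library call.

```python
import mmap

def contains_line(path, key):
    """Return True if some line of the sorted file at `path` equals `key`."""
    target = key.encode()
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm)          # byte offsets; lo is always a line start
        while lo < hi:
            mid = (lo + hi) // 2
            # Find the start of the line that contains byte `mid`.
            nl = mm.rfind(b"\n", lo, mid)
            start = lo if nl == -1 else nl + 1
            # Find the end of that line (or end of file if no final newline).
            end = mm.find(b"\n", start)
            if end == -1:
                end = len(mm)
            line = mm[start:end]
            if line == target:
                return True
            if line < target:
                lo = end + 1         # continue in the lines after this one
            else:
                hi = start           # continue in the lines before this one
    return False
```

Each probe touches only the pages around one line, so the operating
system pages in a small part of the file per search instead of reading
it all. The sort itself still needs one full pass over the data.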


Steve Holden wrote:
> noro wrote:
> > Bill Scherer wrote:
> >
> >>noro wrote:
> >>
> >>
> >>>Is there a more efficient method to find a string in a text file than:
> >>>
> >>>f=file('somefile')
> >>>for line in f:
> >>>   if 'string' in line:
> >>>        print 'FOUND'
> >>>
> >>>?
> >>>
> >>>BTW:
> >>>does "for line in f:" read a block of lines into memory, or does it
> >>>simply call f.readline() many times?
> >>>
> >>>thanks
> >>>amit
> >>>
> >>>
> >>
> >>If your file is sorted by some key in the data, you can build a very
> >>fast binary search with mmap in Python.
> >
> >
>  > can you add some more info, or point me to a link, i haven't found
>  > anything about binary search in mmap() in python documents.
>  >
>  > the files are very big...
>  >
> [please don't "top-post": add your latest comments at the end so the
> story reads from the beginning].
>
> I think this is probably not going to help you. A binary search is only
> useful if you want to locate a value in an ordered list. Since your
> original posting made it seem like the text you are looking for could
> appear in any position in any line of the file a binary search doesn't
> do you any good at all (in fact it complicates things and slows them
> down unnecessarily) because you'd still need to look at all lines.
>
> Plus, if the lines are of variable length then you'd need to start by
> creating an index of them, meaning you'd have to go right through the
> file anyway.
>
> regards
>   Steve
> --
> Steve Holden       +44 150 684 7255  +1 800 494 3119
> Holden Web LLC/Ltd          http://www.holdenweb.com
> Skype: holdenweb       http://holdenweb.blogspot.com
> Recent Ramblings     http://del.icio.us/steve.holden
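
The point about variable-length lines can be made concrete with a rough
sketch: one pass over the file records the byte offset of every line,
and later lookups binary-search through those offsets. This assumes the
file is sorted by raw byte value with one key per line; the names
build_line_offsets and lookup are hypothetical helpers, not existing
library functions.

```python
from array import array

def build_line_offsets(path):
    """Scan the file once and return the byte offset of each line start."""
    offsets = array("Q")             # unsigned 64-bit integers, 8 bytes each
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def lookup(path, offsets, key):
    """Return the 0-based line number holding `key`, or -1 if absent."""
    target = key.encode()
    with open(path, "rb") as f:
        lo, hi = 0, len(offsets)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(offsets[mid])     # jump straight to the start of line `mid`
            line = f.readline().rstrip(b"\n")
            if line == target:
                return mid
            if line < target:
                lo = mid + 1
            else:
                hi = mid
    return -1
```

With 1219137 lines, the offsets array holds 1219137 eight-byte entries,
a little under 10 MB, and it only has to be built once for any number
of searches.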



