regarding threading

Anand Pillai pythonguy at Hotpop.com
Wed Oct 15 04:14:54 EDT 2003


If your program is network bound, there may be some
performance gain from using threads, even taking the
GIL into account.

If it is disk I/O bound, it is hard to say: it depends on
how many I/O writes you do per second, the disk cache,
whether you use multiple disks, and too many other factors.

But in your case, it does not look as if the program is
network bound. So threading may not help here, and in fact
it might even slow things down because of the GIL.
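To make the network-bound case concrete, here is a small sketch (my own illustration, not from the OP's code) using time.sleep as a stand-in for a blocking network call. The GIL is released while a thread waits on I/O, so the waits overlap; CPU-bound work would not overlap this way.

```python
# Sketch: threads help when waiting on I/O, because the GIL is
# released during the wait. time.sleep stands in for a blocking
# network call such as a socket read.
import threading
import time

def fake_network_call(results, i):
    time.sleep(0.2)          # GIL released here, like a socket read
    results[i] = i * 2

def run_threaded(n):
    results = [None] * n
    threads = [threading.Thread(target=fake_network_call, args=(results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_threaded(5)
    # Five 0.2 s "calls" finish in roughly 0.2 s, not 1.0 s,
    # because the threads wait concurrently.
    print(results, round(elapsed, 1))
```

With a CPU-bound body (say, a tight loop instead of the sleep) the threads would serialize on the GIL and you would see no speedup, which is the situation the OP is likely in.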

The best option for you might be to speed up the search
itself. If you are searching for patterns, use regexps rather
than character-by-character string search, since the latter
slows matters down considerably. But if you are searching for
a plain substring, *don't* use regexps; I have found that
simple string search is faster in most cases.
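A quick way to check this claim on your own data is to time both approaches with timeit; the snippet below is just an illustration with made-up text, not a general benchmark.

```python
# Compare plain substring search against a regexp search for the
# same fixed string. The `in` operator uses an optimized C-level
# substring search; re adds pattern-machinery overhead.
import re
import timeit

text = "spam eggs " * 10_000 + "needle" + " ham" * 100
pattern = re.compile("needle")

t_str = timeit.timeit(lambda: "needle" in text, number=1000)
t_re = timeit.timeit(lambda: pattern.search(text) is not None, number=1000)

print(f"substring search: {t_str:.4f}s  regexp search: {t_re:.4f}s")
```

On most inputs the substring search wins for fixed strings, but results vary with text size and pattern, so measure with your actual data before deciding.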

Otherwise, think about indexing your data with LuPy or
some other indexer and searching the index. You can write
a small function that rebuilds the index whenever the
actual data changes; for most normal searches, use the
index as a cache and search there.

Index searching is many times faster than scanning raw
strings or running regexps, and a lot of research has gone into it.
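To show the idea behind an index (this is my own toy sketch, not LuPy's actual API): build an inverted index once, and each lookup becomes a dictionary hit instead of a scan over every document.

```python
# Toy inverted index: map each word to the set of document ids
# that contain it, so lookups avoid rescanning the documents.
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Rebuild this whenever the data changes."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted ids of documents containing `word`."""
    return sorted(index.get(word.lower(), set()))

docs = {
    1: "the quick brown fox",
    2: "quick searching with an index",
    3: "rebuild the index when data changes",
}
index = build_index(docs)
print(search(index, "index"))    # -> [2, 3]
print(search(index, "quick"))    # -> [1, 2]
```

A real indexer like LuPy adds tokenization, ranking, and on-disk storage on top of this, but the cache-and-rebuild pattern described above is the same.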

HTH.

-Anand

"Andrew Dalke" <adalke at mindspring.com> wrote in message news:<3e1jb.1308$7a4.1240 at newsread4.news.pas.earthlink.net>...
> Neil Hodgson:
> > In which case splitting the file
> > onto multiple disks and using 1 thread for each split may increase
> > performance.
> 
> But then so would disk striping, or a bigger cache, or .. hmm,
> perhaps the data is on a networked filesystem and the slow
> performance comes from the network?  Hard to know without
> more info from the OP.
> 
>                     Andrew
>                     dalke at dalkescientific.com
