help with making my code more efficient

Roy Smith roy at panix.com
Thu Dec 20 22:30:31 EST 2012


In article <cc869959-c568-4490-b45f-7855c6841575 at googlegroups.com>,
 "Larry.Martell at gmail.com" <Larry.Martell at gmail.com> wrote:

> On Thursday, December 20, 2012 5:38:03 PM UTC-7, Chris Angelico wrote:
> > On Fri, Dec 21, 2012 at 11:19 AM, Larry.Martell at gmail.com
> > <Larry.Martell at gmail.com> wrote:
> > > This code works, but it takes way too long to run - e.g. when cdata has
> > > 600,000 elements (which is typical for my app) it takes 2 hours for this
> > > to run.
> > >
> > > Can anyone give me some suggestions on speeding this up?
> >
> > It sounds like you may have enough data to want to not keep it all in
> > memory. Have you considered switching to a database? You could then
> > execute SQL queries against it.
> 
> It came from a database. Originally I was getting just the data I wanted 
> using SQL, but that was taking too long also. I was selecting just the 
> messages I wanted, then for each one of those doing another query to get the 
> data within the time diff of each. That was resulting in tens of thousands of 
> queries. So I changed it to pull all the potential matches at once and then 
> process it in python. 

If you're doing free-text matching, an SQL database may not be the right 
tool.  I suspect you want to be looking at some kind of text search 
engine, such as http://lucene.apache.org/ or http://xapian.org/.
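
As for the "data within the time diff of each" part: if you keep that
step in Python, one common approach is to sort the rows by timestamp once
and then use the standard bisect module to find each window, instead of
scanning all the rows for every message. What follows is only a rough
sketch. The row layout, the sample data and the rows_within_window helper
are made up for illustration and are not taken from your code.

```python
from bisect import bisect_left, bisect_right

def rows_within_window(sorted_times, center, max_diff):
    """Return (lo, hi) slice bounds of the entries in sorted_times that
    lie within max_diff of center, inclusive on both ends."""
    lo = bisect_left(sorted_times, center - max_diff)
    hi = bisect_right(sorted_times, center + max_diff)
    return lo, hi

# Hypothetical rows of (timestamp_in_seconds, message); not the real schema.
rows = [(200, "e"), (105, "b"), (130, "c"), (100, "a"), (131, "d")]
rows.sort(key=lambda r: r[0])      # sort once up front
times = [r[0] for r in rows]       # parallel list of keys for bisect

# Everything within 5 seconds of t=130, found by binary search.
lo, hi = rows_within_window(times, center=130, max_diff=5)
nearby = rows[lo:hi]
```

Each lookup is then two binary searches plus a slice, rather than a pass
over all 600,000 elements.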


