Python good for data mining?

Greg Lindstrom gslindstrom at gmail.com
Mon Nov 5 07:24:07 EST 2007


> ---------- Forwarded message ----------
> From: "D.Hering" <vel.accel at gmail.com>
> To: python-list at python.org
> Date: Sun, 04 Nov 2007 19:42:16 -0800
> Subject: Re: Python good for data mining?
> On Nov 3, 9:02 pm, Jens <j3n... at gmail.com> wrote:
> > I'm starting a project in data mining, and I'm considering Python and
> > Java as possible platforms.
> >
> > I'm concerned about performance. Most benchmarks report that Java is
> > about 10-15 times faster than Python, and my own experiments confirm
> > this. I could imagine this becoming a problem for very large
> > datasets.


<snip>

>
> > I really like Python for a number of reasons, and would like to avoid
> > Java.


I've been working with databases -- many of them terabytes in size -- for
over 20 years, and my advice to you is to learn how to use SQL to do most of
the work for you.  There is (almost) nothing you can't do with a good
database (we use Oracle and Postgres, but I hear that MySQL is good, too).
We have over 100 stored procedures and some of our queries are a bit long,
some with over 30 JOINs, but our queries are fast enough.  We even generate
XML and EDI-X12 directly from the database via stored procedures.

I used to pull as much as I could back from the database and then manipulate
it with C using abstract data types and record layouts with lots of
pointers.  Now I use Python to access the database and let the database do
most of the heavy lifting.  Life is good.
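
A minimal sketch of that division of labor, not taken from any production
system: the standard-library sqlite3 module stands in for Oracle or Postgres
so the example runs anywhere, and the tables (customers, orders), their
columns, and the helper totals_by_region are all hypothetical.  The join,
aggregation, filtering, and sorting happen inside the database; Python only
receives the summary rows.

```python
import sqlite3

# In-memory SQLite is an assumption made for portability; with Oracle or
# Postgres you would connect through that database's own DB-API driver.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER,
                         amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0),
                              (3, 2, 7.5), (4, 3, 2.5);
""")


def totals_by_region(conn, min_total):
    """Return (region, order count, total amount) rows computed by the database."""
    sql = """
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        GROUP BY c.region
        HAVING SUM(o.amount) >= ?
        ORDER BY total DESC
    """
    # The threshold is passed as a bound parameter, not formatted into the SQL.
    return conn.execute(sql, (min_total,)).fetchall()


for region, n_orders, total in totals_by_region(conn, 0):
    print(region, n_orders, total)
```

However many rows the orders table holds, only one row per region crosses
from the database into Python, so interpreter speed matters far less than it
would if every row were fetched and summed in a Python loop.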

--greg