Python good for data mining?

D.Hering vel.accel at gmail.com
Sun Nov 4 22:42:16 EST 2007


On Nov 3, 9:02 pm, Jens <j3n... at gmail.com> wrote:
> I'm starting a project in data mining, and I'm considering Python and
> Java as possible platforms.
>
> I'm concerned about performance. Most benchmarks report that Java is
> about 10-15 times faster than Python, and my own experiments confirm
> this. I could imagine this to become a problem for very large
> datasets.
>
> How good is the integration with MySQL in Python?
>
> What about user interfaces? How easy is it to use Tkinter for
> developing a user interface without an IDE? And with an IDE? (which
> IDE?)
>
> What if I were to use my Python libraries with a web site written in
> PHP, Perl or Java - how do I integrate with Python?
>
> I really like Python for a number of reasons, and would like to avoid
> Java.
>
> Sorry - lot of questions here - but I look forward to your replies!


All of my programming is data-centric, and data mining is foundational
to it. I started learning computer science via Python in 2003. I too
was concerned about its performance, especially considering my need
for literally trillions of iterations over financial data tables with
mathematical algorithms.

I then learned C and then C++. I am now coming home to Python,
realizing after my self-education that programming in Python is truly
a pleasure and that performance is not the concern I first considered
it to be. Here's why:

Python is very easily extended to near-C speed. The idea that FINALLY
sank in was that I should first program my ideas in Python WITHOUT
CONCERN FOR PERFORMANCE. Then, profile the application to find the
"bottlenecks" and move those blocks of code into C or C++ extensions.
Cython/Pyrex/SIP are my preferred Python extension frameworks.
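
To make the "write it first, then profile" step concrete, here is a
minimal sketch using the standard library's cProfile and pstats
modules. The functions moving_average, run_analysis and profile_report
are hypothetical stand-ins invented for illustration, not code from any
real project, and the example assumes Python 3.

```python
import cProfile
import io
import pstats


def moving_average(values, window):
    """Naive pure-Python moving average (a hypothetical hot spot)."""
    result = []
    for i in range(len(values) - window + 1):
        result.append(sum(values[i:i + window]) / window)
    return result


def run_analysis(n=20000, window=50):
    """Hypothetical stand-in for a data-mining job."""
    prices = [float(i % 97) for i in range(n)]
    return moving_average(prices, window)


def profile_report(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(run_analysis)
    print(report)
```

The functions near the top of the report are the candidates worth
rewriting in C, C++ or Cython; everything else can usually stay in
plain Python.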

NumPy/SciPy are excellent libraries for optimized mathematical
operations. PyTables is my preferred Python database because of its
excellent API to the acclaimed HDF5 data format (used by very many
scientists and government organizations).
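
As a rough illustration of what "optimized mathematical operations"
can mean in practice, the sketch below computes a moving average twice:
once with a plain Python loop and once with NumPy, where the looping
happens inside compiled code. Both function names are made up for this
example, and the cumulative-sum trick is just one possible way to
vectorize it. It assumes NumPy is installed.

```python
import numpy as np


def moving_average_loop(values, window):
    """Pure-Python reference implementation."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def moving_average_numpy(values, window):
    """Same computation via a cumulative sum, so the loop runs in C."""
    arr = np.asarray(values, dtype=np.float64)
    # csum[k] holds the sum of the first k elements (csum[0] == 0).
    csum = np.cumsum(np.insert(arr, 0, 0.0))
    # The difference of two cumulative sums gives each window's total.
    return (csum[window:] - csum[:-window]) / window


if __name__ == "__main__":
    prices = [float(i % 97) for i in range(100000)]
    fast = moving_average_numpy(prices, 50)
    slow = moving_average_loop(prices, 50)
    print(np.allclose(fast, slow))
```

Timing the two versions (for example with the timeit module) on your
own data is the honest way to see how much the vectorized form buys
you.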

As for a GUI framework, I have studied Qt intensely and would therefore
very highly recommend PyQt.

After four years of intense study, I can say without a doubt that
Python is most certainly the way to go. I personally don't understand
why, generally, there is any attraction to Java, though I have yet to
study it further.



