Python good for data mining?

Cameron Walsh cameron.walsh at gmail.com
Sat Nov 3 22:49:22 EDT 2007


Jens wrote:
> I'm starting a project in data mining, and I'm considering Python and
> Java as possible platforms.
> 
> I'm concerned about performance. Most benchmarks report that Java is
> about 10-15 times faster than Python, and my own experiments confirm
> this. I can imagine this becoming a problem for very large
> datasets.

If most of the processing is done with SQL calls, this shouldn't be an
issue, since the heavy lifting then happens inside the database engine
rather than in the Python interpreter.  I've known a couple of people at
Sydney University who were using Python for data mining.  I think they
were using sqlite3 and MySQL.
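
As a rough sketch of what "doing the processing in SQL" can look like,
here is a small sqlite3 example.  The purchases table, its columns and
the two function names are made up for illustration; the point is only
that the counting and sorting happen inside the database, and Python
receives just the summary rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up purchases table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
    rows = [
        ("a", "bread"), ("a", "milk"),
        ("b", "bread"),
        ("c", "bread"), ("c", "milk"), ("c", "eggs"),
    ]
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()
    return conn


def top_items(conn, limit=3):
    """Return the most frequently bought items as (item, count) tuples.

    The GROUP BY / ORDER BY work is done by the database engine, so the
    Python side never loops over the raw rows.
    """
    cur = conn.execute(
        "SELECT item, COUNT(*) AS n FROM purchases "
        "GROUP BY item ORDER BY n DESC, item ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for item, n in top_items(conn):
        print(item, n)
```

How much this helps depends on how much of the mining can really be
phrased as queries; anything that has to loop over rows in Python will
still run at Python speed.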

> 
> How good is the integration with MySQL in Python?

Never tried it, but a quick Google search turns up several options,
chiefly the MySQLdb module (which, as far as I can tell, is what the
"MySQL for Python" project provides).

> 
> What about user interfaces? How easy is it to use Tkinter for
> developing a user interface without an IDE? And with an IDE? (which
> IDE?)

wxPython was recommended to me when I was learning how to create a GUI.
It has more features than Tkinter and a more native look and feel across
platforms.  With wxPython it was fairly easy to create a multi-pane,
tabbed interface for a couple of programs, without using an IDE.  The
demos and tutorials were fantastic.

> 
> What if I were to use my Python libraries with a web site written in
> PHP, Perl or Java - how do I integrate with Python?

Possibly the simplest way would be Python CGI scripts.  The cgi module
allows form data to be read fairly easily, and cgitb shows tracebacks in
the browser while you are debugging.  Cookies are also fairly simple.
For a more complicated but more customisable approach, you could look
into the BaseHTTPServer module or a socket listener of some sort,
running alongside the web server either publicly or privately.
Publicly, you'd have links from the rest of your PHP (or whatever) pages
to the Python server.  Privately, the PHP/Perl/Java backend would
request data from the local Python server before feeding the results
back through the main server (Apache?) to the client.
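
To make the "private" variant a bit more concrete, here is a minimal
sketch of a local Python server that a PHP/Perl/Java backend could
query.  It uses http.server, which is the Python 3 name of the
BaseHTTPServer module.  The /summary path, the v query parameter and
the summarise() function are all invented for the example; summarise()
stands in for a call into your own library.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def summarise(values):
    """Hypothetical stand-in for a call into your own data mining code."""
    mean = sum(values) / len(values) if values else None
    return {"count": len(values), "mean": mean}


class MiningHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical endpoint: /summary?v=1&v=2&v=3
        parsed = urlparse(self.path)
        if parsed.path != "/summary":
            self.send_error(404)
            return
        try:
            values = [float(v) for v in parse_qs(parsed.query).get("v", [])]
        except ValueError:
            self.send_error(400, "v must be numeric")
            return
        body = json.dumps(summarise(values)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def start_server(port=0):
    """Serve on localhost only, in a background thread.

    Port 0 asks the OS for any free port; a real deployment would pick
    a fixed one that the web backend knows about.
    """
    server = HTTPServer(("127.0.0.1", port), MiningHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    # Play the part of the PHP/Perl/Java backend with one local request.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", "/summary?v=1&v=2&v=3")
    print(conn.getresponse().read().decode("utf-8"))
    conn.close()
    server.shutdown()
    server.server_close()
```

Binding to 127.0.0.1 keeps the Python server unreachable from outside,
so only the main web server on the same machine can talk to it.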


