Python good for data mining?

Maarten maarten.sneep at knmi.nl
Mon Nov 5 10:29:42 EST 2007


On Nov 5, 1:51 pm, Jens <j3n... at gmail.com> wrote:
> On 5 Nov., 04:42, "D.Hering" <vel.ac... at gmail.com> wrote:
>
> > On Nov 3, 9:02 pm, Jens <j3n... at gmail.com> wrote:
>
> > I then learned C and then C++. I am now coming home to Python, realizing
> > after my self-education that programming in Python is truly a pleasure
> > and that performance is not the concern I first considered it to be.
> > Here's why:
>
> > Python is very easily extended to near C speed. The idea that FINALLY
> > sank in was that I should first program my ideas in Python WITHOUT
> > CONCERN FOR PERFORMANCE. Then, profile the application to find the
> > "bottlenecks" and move those blocks of code to C or C++.
> > Cython/Pyrex/SIP are my preferred Python extension frameworks.
>
> > Numpy/Scipy are excellent libraries for optimized mathematical
> > operations. Pytables is my preferred Python database because of
> > its excellent API to the acclaimed HDF5 database (used by very many
> > scientists and government organizations).
>
> So what you're saying is, don't worry about performance when you start
> coding, but use profiling and then optimize in C/C++. Sounds
> reasonable. It's been 10 years since I've done any programming in
> C++, so I have to pick up on that soon I guess.

"Premature optimization is the root of all evil", to quote a famous
person. And he's right, as most people working on larger codes will
confirm.
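
The profile-first workflow discussed above can be sketched with the
standard library's cProfile and pstats modules. This is only a minimal
illustration: the function names (slow_sum_of_squares, run_analysis,
profile_report) are made up for the example and stand in for whatever
your own application does.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive pure-Python loop: a typical candidate "bottleneck".
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_analysis():
    # Hypothetical stand-in for the real application code.
    return [slow_sum_of_squares(20000) for _ in range(50)]


def profile_report(func, top=5):
    # Run func under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(run_analysis)
    print(report)
```

Only the functions that dominate such a report are worth rewriting in
C or C++ (or vectorizing with Numpy); the rest can stay in plain Python.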

As for PyTables: it is the most elegant programming interface for HDF5
on any platform that I've encountered so far. Most other platforms
stay close to the HDF5 library's C interface, which is low-level and
quite complex. PyTables was written with the end user in mind, and it
shows. One correction though: PyTables is not a database. It is storage
for (large) arrays, data blocks that you don't want in a database. Use a
database for the metadata to find the right file and field within that
file. Keep in mind though that I mostly work with externally created
HDF5 files, not with files created in PyTables. PyTables Pro has an
indexing feature which may be helpful for data mining (if you write the
HDF5 files from Python).
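
The "database for the metadata" idea can be sketched as follows, using
sqlite3 from the standard library. Everything specific here is an
assumption made for illustration: the table layout, the variable names,
the file paths and the node paths are hypothetical. Reading the actual
array from the HDF5 file (for example with PyTables) is left out.

```python
import sqlite3


def build_catalog(rows):
    # In-memory catalog: one row per array stored in some HDF5 file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog ("
        " variable TEXT, day TEXT, file_path TEXT, node_path TEXT)"
    )
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def locate(conn, variable, day):
    # Return (file_path, node_path) pairs for the requested variable and day.
    cur = conn.execute(
        "SELECT file_path, node_path FROM catalog"
        " WHERE variable = ? AND day = ?",
        (variable, day),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Hypothetical entries; the large arrays themselves stay in the files.
    example_rows = [
        ("ozone", "2007-11-01", "/data/run_a.h5", "/grid/ozone"),
        ("ozone", "2007-11-02", "/data/run_b.h5", "/grid/ozone"),
        ("cloud", "2007-11-01", "/data/run_a.h5", "/grid/cloud_fraction"),
    ]
    catalog = build_catalog(example_rows)
    print(locate(catalog, "ozone", "2007-11-02"))
```

The database only answers "which file, which field"; the bulk data never
passes through it.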

Maarten



