[Tutor] Fwd: RE: Fwd: Re: Sklearn

Alan Gauld alan.gauld at yahoo.co.uk
Sat Mar 11 14:58:16 EST 2017


> I want libraries that contain algorithms to check for relationships
> within a dataset. For example, I want to parse through a SES dataset to
> see any possible connections between student achievement and
> socioeconomic standing, and correlate that to neighborhood wealth.

OK, with that background I now return to your original
question:

> Can someone explain sklearns to me? I'm a novice at Python,
> and I would like to use machine learning in my coding.
> But aren't there libraries like matplotlib I can already
> use? Why use sklearns?

Starting at the end first...
matplotlib is a plotting library: you give it some raw
data and it plots a nice graphical image in any style
you choose. Think of it like a programmatic version
of the plotting feature in a spreadsheet.

sklearn doesn't do that; it will generate the data
for you to plot with matplotlib if you wish. (At least
that's how I interpret the information on the sklearn
web page.)

So it's not either/or - you need both. sklearn, as the
sk in the name suggests, is a SciKit, one of a set of
add-on packages for SciPy, and matplotlib is part of
the same wider SciPy ecosystem.
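
To make the division of labour concrete, here is a minimal sketch
of the two libraries working together. The income and score numbers
are invented purely for illustration (they are not real SES data),
and a straight-line fit is just my assumption of a simple starting
point, not a recommendation for your analysis.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up numbers for illustration only: household income (in $1000s)
# against a test score.
income = np.array([[20.0], [35.0], [50.0], [65.0], [80.0], [95.0]])
score = np.array([58.0, 63.0, 70.0, 72.0, 80.0, 85.0])

# sklearn does the number crunching: fit a straight line to the points.
model = LinearRegression().fit(income, score)
predicted = model.predict(income)

# matplotlib does the drawing: the raw points plus the fitted line.
fig, ax = plt.subplots()
ax.scatter(income[:, 0], score, label="observed")
ax.plot(income[:, 0], predicted, label="fitted line")
ax.set_xlabel("household income ($1000s)")
ax.set_ylabel("test score")
ax.legend()
# fig.savefig("income_vs_score.png")  # uncomment to write the image
```

sklearn produces the fitted values and matplotlib only draws them,
which is why the two are complementary rather than alternatives.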

What sklearn brings to the picture - again based on
a very quick skim through the introductory material -
is a framework for doing machine learning. If you
just want to play with its standard datasets then
it's very easy to use. If you want to use it on your
own data it gets harder - you need to format your
data into the shape sklearn expects. You then need
to specify/select or write the algorithms needed
for sklearn to do its learning. Don't underestimate
how much preparatory work you will need to do to
feed the engine. It's not magic.
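
As a rough sketch of what "formatting your data" means in practice:
sklearn estimators generally expect a 2-D array of features (one row
per sample, one column per feature) and a 1-D array of target values.
The records, the field names (income, parent_years_edu, score) and
the choice of LinearRegression below are all hypothetical, picked
only to show the shape of the work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical records, invented for this example. Real data would
# come from a file or database and would need cleaning first.
records = [
    {"income": 22.0, "parent_years_edu": 10, "score": 55.0},
    {"income": 35.0, "parent_years_edu": 12, "score": 61.0},
    {"income": 48.0, "parent_years_edu": 12, "score": 66.0},
    {"income": 60.0, "parent_years_edu": 14, "score": 73.0},
    {"income": 75.0, "parent_years_edu": 16, "score": 79.0},
    {"income": 90.0, "parent_years_edu": 18, "score": 88.0},
]

# Step 1: reshape into what sklearn expects.
# X is 2-D (n_samples, n_features); y is 1-D (n_samples,).
feature_names = ["income", "parent_years_edu"]
X = np.array([[r[name] for name in feature_names] for r in records])
y = np.array([r["score"] for r in records])

# Step 2: select an algorithm (an "estimator") and fit it.
model = LinearRegression()
model.fit(X, y)

# Step 3: ask the fitted model questions.
r_squared = model.score(X, y)          # fit quality on the training data
new_student = np.array([[60.0, 14]])   # same column order as X
prediction = model.predict(new_student)
```

Most of the effort in a real project goes into step 1; steps 2 and 3
are only a few lines once the arrays are in the right shape.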

For what you want, Pandas or Rpy might be able to
do it just as easily - but since you don't seem
to already know either of those then sklearn would
seem to be a reasonable alternative/complementary choice.
But if you don't know basic Python well that might
be a bigger challenge.
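
For a simple "are these columns related?" question, a sketch of the
Pandas route might look like this. The column names and values are
made up for illustration, and DataFrame.corr() with its default
(Pearson) method is my assumption of a reasonable first step; it
measures linear association only and says nothing about cause.

```python
import pandas as pd

# Invented values for illustration; not real SES data.
df = pd.DataFrame({
    "income": [22.0, 35.0, 48.0, 60.0, 75.0, 90.0],
    "neighborhood_wealth": [1.1, 1.8, 2.0, 2.9, 3.5, 4.2],
    "score": [55.0, 61.0, 66.0, 73.0, 79.0, 88.0],
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr = df.corr()

# Pull out one pair of interest.
income_vs_score = corr.loc["income", "score"]
```

The result is a small table with one row and one column per variable,
which can itself be handed to matplotlib for plotting.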

Given my level of ignorance about both sklearn
and your problem domain, I can't say more than that.
I would suggest asking again on the SciPy forum
since you are likely to find a lot more people
there who have experience of both - and alternatives
like Pandas and Rpy.

> And I know I should learn R, but I'm also learning 
> Python as my primary language now, and R isn't
> really a programming language as Python, Java,

It's not quite as general - I wouldn't try writing
games or GUIs or web apps in R. But you can write
fully self-contained applications in it if you wish.
And for traditional statistical number crunching
it's better than either Python or Java. Fortunately
you can use R from either language via libraries.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



