[scikit-learn] R user trying to learn Python

Sebastian Raschka se.raschka at gmail.com
Sun Jun 18 16:27:37 EDT 2017


Hi, C W,

Yeah, I'd say that Python is a programming language with lots of packages for scientific computing, whereas R is more of a toolbox for stats. Thus, Python may feel a bit weird at first for people who come from the R/stats field and are new to programming. I'm not sure it is necessary to learn programming and computer science basics if you are primarily interested in stats and ML, but since so many tools are Python-based and require some basic programming to fit the pieces together, it's probably not a bad idea :).

There's probably an overabundance of Python intro books out there ... However, I'd recommend an introduction-to-computer-science book that uses Python as the teaching language rather than a book that is just about the Python language.

Maybe check out https://www.udacity.com/course/intro-to-computer-science--cs101, which is a Python-based computer science course (and should be free).

Best,
Sebastian


> On Jun 18, 2017, at 4:18 PM, C W <tmrsg11 at gmail.com> wrote:
> 
> Hi Sebastian,
> 
> I looked through your book. I think it is great if you already know Python and are looking to learn machine learning.
> 
> For me, I have some sense of machine learning, but none of Python.
> 
> Unlike R, which is specifically for statistical analysis, Python is broad!
> 
> Maybe some expert here with R can tell me how to go about this. :)
> 
> On Sun, Jun 18, 2017 at 12:53 PM, Sebastian Raschka <se.raschka at gmail.com> wrote:
> Hi,
> 
> > I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
> >
> > code 1:
> > y_sin = np.sin(x)
> > y_cos = np.cos(x)
> >
> > I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
> 
> Because it makes it clear where this function is coming from. Sure, you could do
> 
> from numpy import *
> 
> but this is NOT!!! recommended, because it clutters up your main namespace. For instance, NumPy has its own sum function. If you do from numpy import *, Python's built-in `sum` is shadowed in your main namespace by NumPy's sum. This is confusing and should be avoided.
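
To make the point about `sum` concrete, here is a minimal sketch. It assumes NumPy is installed; the names `values` and `scratch` are made up for illustration. The star import is run in a throwaway dictionary so that the real namespace stays clean.

```python
import builtins

import numpy as np

values = range(5)  # 0, 1, 2, 3, 4

# Built-in sum: the second argument is a start value, so this is 10 + (-1).
builtin_result = builtins.sum(values, -1)

# NumPy's sum: the second argument is an axis, so the total is just 10.
numpy_result = np.sum(values, -1)

# Run the star import in a scratch namespace instead of the real one,
# then look up what the bare name `sum` refers to afterwards.
scratch = {}
exec("from numpy import *", scratch)
star_import_sum = scratch.get("sum")

print(builtin_result, numpy_result, star_import_sum is np.sum)
```

The same call, `sum(values, -1)`, would therefore give a different answer depending on whether the star import had been run earlier in the session. With the `np.` prefix there is never any doubt about which function is meant.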
> 
> > In the code above, sklearn > linear_model > Ridge, one lives inside the other. It feels like there are multiple layers; how deep do I have to dig?
> >
> > Can someone explain the mentality behind this setup?
> 
> This is one way to organize your code and package. Sklearn contains many things, and organizing it into subpackages (linear_model, svm, ...) only makes sense; otherwise, you would end up with code files of more than 100,000 lines or so, which would make life really hard for package developers.
> 
> Here, scikit-learn tries to follow the core principles of good object-oriented program design, for instance, abstraction, encapsulation, modularity, hierarchy, ...
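
As a rough illustration of the subpackage layout described above (a sketch that assumes scikit-learn is installed and reuses the toy data from the original question), the layers are just a path to where the class lives. You can import the class directly and skip the dotted prefix at the call site:

```python
from sklearn import linear_model        # import the subpackage ...
from sklearn.linear_model import Ridge  # ... or import the class itself

# Both spellings reach the very same class object.
same_class = linear_model.Ridge is Ridge

# The class records which module it was defined in.
defining_module = Ridge.__module__

# With the direct import, no dotted prefix is needed at the call site.
reg = Ridge(alpha=0.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, 0.1, 1])

print(same_class, defining_module, reg.coef_)
```

In practice you rarely need to dig deeper than one subpackage name: the import line says where the class comes from, and everything after that uses the short name.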
> 
> > What are some good ways and resources to learn Python for data analysis?
> 
> I think, based on your questions, a good resource would be an introduction-to-programming book or course. Sections on object-oriented programming would make the rationale/design/API of scikit-learn and Python classes as a whole more accessible and address your concerns and questions.
> 
> Best,
> Sebastian
> 
> > On Jun 18, 2017, at 12:02 PM, C W <tmrsg11 at gmail.com> wrote:
> >
> > Dear Scikit-learn,
> >
> > What are some good ways and resources to learn Python for data analysis?
> >
> > I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
> >
> > code 1:
> > y_sin = np.sin(x)
> > y_cos = np.cos(x)
> >
> > I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
> >
> > Code 2:
> > model = LogisticRegression()
> > model.fit(X_train, y_train)
> > model.score(X_test, y_test)
> >
> > In R, everything is saved to a variable. In the code above, if I accidentally ran model.fit(), I would not know.
> >
> > Code 3:
> > from sklearn import linear_model
> > reg = linear_model.Ridge(alpha=.5)
> > reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
> >
> > In the code above, sklearn > linear_model > Ridge, one lives inside the other. It feels like there are multiple layers; how deep do I have to dig?
> >
> > Can someone explain the mentality behind this setup?
> >
> > Thank you very much!
> >
> > M
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> 


