[omaha] Group Data Science Competition

Wes Turner wes.turner at gmail.com
Sat Dec 17 14:56:48 EST 2016


On Saturday, December 17, 2016, Luke Schollmeyer via Omaha <omaha at python.org>
wrote:

> Made a quick attempt between middle school basketball games...did nothing
> more than pull in all non-categorical variables (rather than the five we
> used) and filled NAs with the variable mean. Jumped the score to 0.25857 -
> better than Wednesday's but not better than Jeremy's (good job, btw). Not
> really that much there to add to the notebook.
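A minimal sketch of the approach described above: every numeric column is used as a predictor, NAs are filled with the column mean, and the model is an ordinary linear regression. It assumes pandas and scikit-learn, and a DataFrame shaped like the competition's train.csv with `Id` and `SalePrice` columns. The helper name `fit_numeric_lr` and the toy rows are illustrative, not taken from the team's notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_numeric_lr(train: pd.DataFrame, target: str = "SalePrice"):
    """Hypothetical helper: linear regression on every numeric column.

    Missing values are filled with each column's training-set mean.
    Returns the model, the feature names, and the means (reuse them on test data).
    """
    # Keep numeric (non-categorical) predictors only; drop the target and the row id.
    X = train.select_dtypes(include=[np.number]).drop(
        columns=[target, "Id"], errors="ignore"
    )
    means = X.mean()      # per-column mean, NaNs skipped
    X = X.fillna(means)   # fill each column's NAs with its own mean
    model = LinearRegression().fit(X, train[target])
    return model, list(X.columns), means


# Toy frame using a few train.csv column names.
train = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotArea": [8450, 9600, np.nan, 9550],
    "GrLivArea": [1710, 1262, 1786, 1717],
    "Street": ["Pave", "Pave", "Pave", "Grvl"],  # categorical: ignored here
    "SalePrice": [208500, 181500, 223500, 140000],
})
model, cols, means = fit_numeric_lr(train)
print(cols)  # ['LotArea', 'GrLivArea']
# At prediction time, apply the *training* means to the test frame:
# preds = model.predict(test[cols].fillna(means))
```

Reusing the training means on the test set keeps test-set statistics out of the fill values.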


Does Kaggle take the high mark but still give a score for each submission?

Thinking of ways to keep track of which code produced which score; I'll
post about the GitHub setup in a bit.


>
> Next attempt I'll still stick with LR, but will work on the categoricals
> and look at better ways to fill missing values (using the mean is a hack).


https://github.com/rhiever/datacleaner/blob/master/README.md

> Replaces missing values with the mode (for categorical variables) or
> median (for continuous variables) on a column-by-column basis
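The rule quoted from the datacleaner README can be approximated in plain pandas. This sketch applies the same column-by-column idea but is not datacleaner's implementation. The function name `impute_mode_median` and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_mode_median(df: pd.DataFrame) -> pd.DataFrame:
    """Column-by-column fill: median for numeric columns, mode for the rest."""
    out = df.copy()  # leave the caller's frame untouched
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()  # less sensitive to outliers than the mean
        else:
            modes = out[col].mode()   # NaN excluded; ties come back sorted
            if modes.empty:           # column is entirely missing; skip it
                continue
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 80.0],
    "Exterior1st": ["VinylSd", "MetalSd", np.nan, "VinylSd"],
})
print(impute_mode_median(df))
```

One caveat for this dataset: its data_description.txt uses NA in some columns (for example PoolQC or Alley) to mean "no pool" or "no alley access". For those columns, filling with the mode would invent features the house does not have.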


>
> Luke
>
> On Sat, Dec 17, 2016 at 9:24 AM, Bob Haffner via Omaha <omaha at python.org>
> wrote:
>
> > We jumped up 58 spots thanks to Jeremy! We are allotted 5 submissions
> > per day, so feel free to give it a go.
> >
> > On Wed, Dec 14, 2016 at 10:16 PM, Bob Haffner <bob.haffner at gmail.com>
> > wrote:
> >
> > > Good turnout tonight! We managed to form a team, explore some data,
> > > pick some features, fit a model and make a submission.
> > >
> > > We are currently sitting just outside the top 10 in 2,350th place :-)
> > > We'll be climbing the ranks in no time though!
> > >
> > > Still room for more folks if anyone else is interested. FYI, we're
> > > going to try and meet in January.
> > >
> > > On Wed, Dec 14, 2016 at 12:19 PM, Wes Turner via Omaha <omaha at python.org>
> > > wrote:
> > >
> > >> - https://github.com/westurner/house_prices (BSD 3-Clause, TPOT is GPLv3)
> > >> - https://github.com/westurner/house_prices/commits/develop
> > >> - https://github.com/westurner/house_prices/blob/develop/environment.yml
> > >> - https://github.com/westurner/house_prices/blob/develop/tests/test_house_prices.py
> > >> - https://github.com/westurner/house_prices/blob/develop/house_prices/analysis.py
> > >> - https://github.com/westurner/house_prices/blob/develop/house_prices/data.py
> > >>
> > >> cookiecutter, hubflow
> > >>
> > >> data.py loads the data into a sklearn Bunch after pd.get_dummies (and
> > >> datacleaner.autoclean, while I figure out how to use OneHotEncoder).
> > >>
> > >>  - [ ] update the docstrings from load_boston()
> > >>  - https://github.com/rhiever/datacleaner/issues/1#issuecomment-266980937
> > >>    "I think it illogical to e.g. average Exterior1st in the Kaggle House
> > >>    Prices Dataset: the average of ImStucc and Wd Sdng seems nonsensical?"
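The data.py description amounts to two steps: one-hot encode the data, then package the arrays the way `load_boston()` does. The sketch below assumes a recent pandas and scikit-learn (`sklearn.utils.Bunch`). `to_bunch` is a hypothetical helper, not the repository's data.py. One-hot encoding also sidesteps the issue quoted above: Exterior1st becomes separate 0/1 indicator columns, so nothing ever averages ImStucc with Wd Sdng.

```python
import pandas as pd
from sklearn.utils import Bunch


def to_bunch(df: pd.DataFrame, target: str = "SalePrice") -> Bunch:
    """Hypothetical helper: one-hot encode, then package like load_boston()."""
    # get_dummies replaces each object column with one 0/1 column per category,
    # e.g. Exterior1st -> Exterior1st_ImStucc, Exterior1st_Wd Sdng.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    return Bunch(
        data=X.to_numpy(dtype=float),
        target=df[target].to_numpy(),
        feature_names=list(X.columns),
    )


df = pd.DataFrame({
    "GrLivArea": [1710, 1262],
    "Exterior1st": ["ImStucc", "Wd Sdng"],
    "SalePrice": [208500, 181500],
})
bunch = to_bunch(df)
print(bunch.feature_names)
# ['GrLivArea', 'Exterior1st_ImStucc', 'Exterior1st_Wd Sdng']
```

`get_dummies` only creates columns for the categories it sees, so encoding train and test separately can produce different columns. Two ways to keep them aligned:

- reindex the test frame to the training columns with `fill_value=0`;
- fit `OneHotEncoder(handle_unknown="ignore")` on the training data.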
> > >>
> > >> analysis.py HEAD-1 (before I wrapped it in a class while waiting) looks
> > >> like it'll be another 4-5 hours on this notebook from 2009.
> > >>
> > >>  - PCA? There's probably a better way.
> > >>
> > >> I added an environment.yml but haven't yet determined the minimal set
> > >> for setup.py, so
> > >>
> > >>    conda env update -f=environment.yml  # make condaenvupdate
> > >>
> > >>
> > >> I'm supposed to be at work now. I may not be able to make it this
> > >> evening; if not, good luck.
> > >>
> > >> I'll add the generated pipeline to the github repo and share the URL.
> > >>
> > >>
> > >> On Fri, Dec 9, 2016 at 12:31 PM, Wes Turner <wes.turner at gmail.com>
> > >> wrote:
> > >>
> > >> > So, we need to mutate and crossover until Mean Squared Error (MSE)
> > >> > is optimally minimized?
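For reference, MSE is the mean of the squared residuals. The House Prices leaderboard scores submissions by the RMSE between the logarithm of the predicted price and the logarithm of the observed price. Below is a small NumPy sketch of both; the function names are illustrative.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse_log(y_true, y_pred) -> float:
    """RMSE between log(observed) and log(predicted) prices."""
    return float(np.sqrt(mse(np.log(y_true), np.log(y_pred))))


print(mse([3, 5], [1, 5]))  # (2**2 + 0**2) / 2 = 2.0
print(rmse_log([100_000, 200_000], [110_000, 200_000]))
```

The square root is monotonic, so a search that minimizes MSE on `np.log(SalePrice)` minimizes the same quantity the leaderboard reports. That is one reason to fit on log prices.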
> > >> >
> > >> > http://rhiever.github.io/tpot/examples/Boston_Example/
> > >> >
> > >> > Looks like we need something like load_boston() in
> > >> > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/base.py
> > >> >
> > >> >
> > >> > https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
> > >> >
> > >> >
> > >> > On Friday, December 9, 2016, Steve Young via Omaha <omaha at python.org>
> > >> > wrote:
> > >> >
> > >> >> >
> > >> >> > Sign up for Kaggle - Check.
> > >> >> > Install Anaconda - Check. https://docs.continuum.io/anaconda/install
> > >> >> > Basic familiarity - Check. http://conda.pydata.org/docs/test-drive.html#managing-conda
> > >> >> > Anaconda cheat sheet - Check.
> > >> >> > http://conda.pydata.org/docs/using/cheatsheet.html
> > >> >> > PyCharm and Anaconda - Check. https://www.jetbrains.com/help/pycharm/2016.1/conda-support-creating-conda-environment.html
> > >> >> >
> > >> >> > Steve
> > >> >> >
> > >> >> > On Thu, Dec 1, 2016 at 8:32 AM, Bob Haffner via Omaha <omaha at python.org>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Hi All,
> > >> >> >>
> > >> >> >> We're all set for the 12/14 group Kaggle competition kickoff!
> > >> >> >>
> > >> >> >> All experience levels are welcome. Bring your laptop if you'd like,
> > >> >> >> but no biggie if you don't.
> > >> >> >>
> > >> >> >> I didn't hear any objections to the Housing Prices competition, so
> > >> >> >> let's go with that one:
> > >> >> >> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
> > >> >> >>
> > >> >> >> Suggested things to do prior to 12/14
> > >> >> >> -- Sign up on Kaggle
> > >> >> >> -- Get your machine set up with some pydata libraries
> > >> >> >> (pandas, NumPy, scikit-learn and Jupyter Notebooks). I recommend the
> > >> >> >> Anaconda distribution if you're just starting out
> > >> >> >> -- Get some basic familiarity with the competition problem and data
> > >> >> >>
> > >> >> >> Let me know if you have any questions.
> > >> >> >>
> > >> >> >> Thanks!
> > >> >> >> Bob
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Oct 18, 2016 at 8:32 PM, Bob Haffner <bob.haffner at gmail.com>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> > Good deal. That's 3 of us (Naomi, you and me) by my count. Hopefully
> > >> >> >> > others will join in!!
> > >> >> >> >
> > >> >> >> > I would be game for a December meetup.
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > > On Oct 18, 2016, at 8:13 PM, Steve Young via Omaha <omaha at python.org>
> > >> >> >> > > wrote:
> > >> >> >> > >
> > >> >> >> > > I would enjoy participating, and learning what you data guys and gals
> > >> >> >> > > do. (I am not a math guy)
> > >> >> >> > >
> > >> >> >> > > If Hubert does not take December, maybe we could have a sprint
> > >> >> >> > > that night?
> > >> >> >> > >
> > >> >> >> > > Steve
> > >> >> >> > >
> > >> >> >> > > On Mon, Oct 17, 2016 at 3:05 PM, Wes Turner via Omaha <omaha at python.org>
> > >> >> >> > > wrote:
> > >> >> >> > >
> > >> >> >> > >> On Monday, October 17, 2016, Bob Haffner via Omaha <omaha at python.org>
> > >> >> >> > >> wrote:
> > >> >> >> > >>
> > >> >> >> > >>> Hi All,
> > >> >> >> > >>>
> > >> >> >> > >>> A few months ago someone brought up the idea of doing a Kaggle
> > >> >> >> > >>> data science competition as a group. Is there still interest in this?
> > >> >> >> > >>>
> > >> >> >> > >>> Some thoughts.
> > >> >> >> > >>> Not sure of the details, but Kaggle allows individuals to form
> > >> >> >> > >>> groups. We could collaborate through email (or perhaps something like
> > >> >> >> > >>> Slack) and maybe meet occasionally. When it's all said and done, we
> > >> >> >> > >>> could present at a monthly meeting.
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >> A GitHub (repo, issues, and Sphinx docs/ and/or GH wiki) could also
> > >> >> >> > >> be useful:
> > >> >> >> > >>
> > >> >> >> > >> - gh-pages branch built from docs/ and nb/
> > >> >> >> > >>  - .ipynb in notebooks/ or nb/
> > >> >> >> > >> - https://github.com/audreyr/cookiecutter-pypackage/ has packaging and
> > >> >> >> > >>   ReadTheDocs config
> > >> >> >> > >> - https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile
> > >> >> >> > >>   includes conda
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >>>
> > >> >> >> > >>> This one looks good. Doesn't end till March 1st, which gives us some
> > >> >> >> > >>> time, and it doesn't look overly complicated. No prize money, though :-)
> > >> >> >> > >>> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >> - http://rhiever.github.io/tpot/examples/Boston_Example/
> > >> >> >> > >>
> > >> >> >> > >>  - TPOT can utilize XGBoost (as mentioned in the Kaggle competition
> > >> >> >> > >>    description)
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >> - https://github.com/donnemartin/data-science-ipython-notebooks/
> > >> >> >> > >>
> > >> >> >> > >>
> > >> >> >> > >>> Forming groups
> > >> >> >> > >>> https://www.kaggle.com/wiki/FormingATeam
> > >> >> >> > >>>
> > >> >> >> > >>> Would love to get some feedback on any of this
> > >> >> >> > >>>
> > >> >> >> > >>> Thanks,
> > >> >> >> > >>> Bob
> > >> >> >> > >>> _______________________________________________
> > >> >> >> > >>> Omaha Python Users Group mailing list
> > >> >> >> > >>> Omaha at python.org
> > >> >> >> > >>> https://mail.python.org/mailman/listinfo/omaha
> > >> >> >> > >>> http://www.OmahaPython.org
> > >> >> >> > >>>
> > >> >> >> > >>
> > >> >> >> >
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> > >
> >
>

