[omaha] Group Data Science Competition

Luke Schollmeyer luke.schollmeyer at gmail.com
Sat Dec 17 14:19:39 EST 2016


Made a quick attempt between middle school basketball games. I did nothing
more than pull in all non-categorical variables (rather than the five we
used) and fill NAs with the variable mean. That brought the score to 0.25857:
better than Wednesday's but not better than Jeremy's (good job, btw). Not
really much there to add to the notebook.
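
For anyone following along, the steps amount to roughly the following. This
is a minimal sketch, not the notebook code: it assumes pandas and
scikit-learn, takes "LR" to mean linear regression, and uses a small made-up
frame in place of the Kaggle train.csv (the column names come from the
competition data, the values are invented). The competition scores RMSE on
the log of the sale price, so the sketch fits on log(SalePrice).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for train.csv; the values are invented for illustration.
train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115],
    "LotFrontage": [65.0, 80.0, np.nan, 60.0, 84.0, np.nan],
    "OverallQual": [7, 6, 7, 7, 8, 5],
    "Exterior1st": ["VinylSd", "MetalSd", "VinylSd", "Wd Sdng", "VinylSd", "VinylSd"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# Keep only the non-categorical (numeric) columns, minus the target.
X = train.select_dtypes(include="number").drop(columns="SalePrice")

# Fill each column's NAs with that column's mean.
X = X.fillna(X.mean())

# Fit on the log of the price to match the competition metric.
y = np.log(train["SalePrice"])
model = LinearRegression().fit(X, y)

# Convert predictions back to prices.
predicted_prices = np.exp(model.predict(X))
```

On the real data the same three steps apply unchanged; only the read_csv
call and the train/test split are missing here.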

Next attempt I'll still stick with LR, but will work on the categoricals
and look at better ways to fill missing values (using the mean is a hack).
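
One possible direction for that, sketched under assumptions rather than
tested on the competition data: fill numeric gaps with the median (less
sensitive to outliers than the mean), fill categorical gaps with the most
frequent value, and one-hot encode the categoricals with pd.get_dummies. The
frame below is made up; only the column names come from the Kaggle data.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one categorical column; values are invented.
train = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, np.nan, 60.0, 200.0],
    "Exterior1st": ["VinylSd", "MetalSd", np.nan, "Wd Sdng", "VinylSd"],
})

# Numeric column: the median is not pulled up by the 200.0 outlier.
train["LotFrontage"] = train["LotFrontage"].fillna(train["LotFrontage"].median())

# Categorical column: fill with the most frequent category instead of averaging.
most_common = train["Exterior1st"].mode()[0]
train["Exterior1st"] = train["Exterior1st"].fillna(most_common)

# One-hot encode so a linear model can use the categorical column.
encoded = pd.get_dummies(train, columns=["Exterior1st"])
```

Each category becomes its own 0/1 column (for example Exterior1st_VinylSd),
so nothing ever has to be averaged across categories.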

Luke

On Sat, Dec 17, 2016 at 9:24 AM, Bob Haffner via Omaha <omaha at python.org>
wrote:

> We jumped up 58 spots thanks to Jeremy!   We are allotted 5 submissions per
> day so feel free to give it a go
>
> On Wed, Dec 14, 2016 at 10:16 PM, Bob Haffner <bob.haffner at gmail.com>
> wrote:
>
> > Good turnout tonight!  We managed to form a team, explore some data, pick
> > some features, fit a model and make a submission.
> >
> > We are currently sitting just outside the top 10 in 2,350th place :-)
> > We'll be climbing the ranks in no time though!
> >
> > Still room for more folks if anyone else is interested.  FYI, we're going
> > to try and meet in January
> >
> > On Wed, Dec 14, 2016 at 12:19 PM, Wes Turner via Omaha <omaha at python.org>
> > wrote:
> >
> >> - https://github.com/westurner/house_prices (BSD 3-Clause, TPOT is GPLv3)
> >> - https://github.com/westurner/house_prices/commits/develop
> >> - https://github.com/westurner/house_prices/blob/develop/environment.yml
> >> - https://github.com/westurner/house_prices/blob/develop/tests/test_house_prices.py
> >> - https://github.com/westurner/house_prices/blob/develop/house_prices/analysis.py
> >> - https://github.com/westurner/house_prices/blob/develop/house_prices/data.py
> >>
> >> cookiecutter, hubflow
> >>
> >> data.py loads the data into a sklearn Bunch after pd.do_get_dummies (and
> >> dataclean.autoclean, while I figure out how to use OneHotEncoder).
> >>
> >>  - [ ] update the docstrings from load_boston()
> >>  - https://github.com/rhiever/datacleaner/issues/1#issuecomment-266980937
> >>    "I think it illogical to e.g. average Exterior1st in the Kaggle House
> >>    Prices Dataset: the average of ImStucc and Wd Sdng seems nonsensical?"
> >>
> >> analysis.py HEAD-1 (before I wrapped it in a class while waiting) looks
> >> like it'll be another 4-5 hours on this notebook from 2009.
> >>
> >>  - PCA? There's probably a better way.
> >>
> >> I added an environment.yml but haven't yet determined the minimal set for
> >> setup.py, so
> >>
> >>    conda env update -f=environment.yml  # make condaenvupdate
> >>
> >>
> >> I'm supposed to be at work now. I may not be able to make it this evening;
> >> if not, good luck.
> >>
> >> I'll add the generated pipeline to the github repo and share the URL.
> >>
> >>
> >> On Fri, Dec 9, 2016 at 12:31 PM, Wes Turner <wes.turner at gmail.com> wrote:
> >>
> >> > So, we need to mutate and crossover until Mean Squared Error (MSE) is
> >> > optimally minimized?
> >> >
> >> > http://rhiever.github.io/tpot/examples/Boston_Example/
> >> >
> >> > Looks like we need something like load_boston() in
> >> > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/base.py
> >> >
> >> >
> >> > https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
> >> >
> >> >
> >> > On Friday, December 9, 2016, Steve Young via Omaha <omaha at python.org>
> >> > wrote:
> >> >
> >> >> >
> >> >> > Sign up for Kaggle - Check.
> >> >> > Install Anaconda - Check. https://docs.continuum.io/anaconda/install
> >> >> > Basic familiarity - Check.
> >> >> > http://conda.pydata.org/docs/test-drive.html#managing-conda
> >> >> > Anaconda cheat sheet - Check.
> >> >> > http://conda.pydata.org/docs/using/cheatsheet.html
> >> >> > PyCharm and Anaconda - Check.
> >> >> > https://www.jetbrains.com/help/pycharm/2016.1/conda-support-creating-conda-environment.html
> >> >> >
> >> >> > Steve
> >> >> >
> >> >> > On Thu, Dec 1, 2016 at 8:32 AM, Bob Haffner via Omaha <omaha at python.org>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi All,
> >> >> >>
> >> >> >> We're all set for the 12/14 group Kaggle competition kickoff!
> >> >> >>
> >> >> >> All experience levels are welcome.  Bring your laptop if you'd like, but
> >> >> >> no biggie if you don't
> >> >> >>
> >> >> >> I didn't hear any objections to the Housing Prices competition so let's go
> >> >> >> with that one
> >> >> >> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
> >> >> >>
> >> >> >> Suggested things to do prior to 12/14
> >> >> >> -- Sign up on Kaggle
> >> >> >> -- Get your machine set up with some pydata libraries
> >> >> >> (pandas, NumPy, scikit-learn and Jupyter Notebooks).  I recommend the
> >> >> >> Anaconda distribution if you're just starting out
> >> >> >> -- Get some basic familiarity with the competition problem and data
> >> >> >>
> >> >> >> Let me know if you have any questions.
> >> >> >>
> >> >> >> Thanks!
> >> >> >> Bob
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Oct 18, 2016 at 8:32 PM, Bob Haffner <bob.haffner at gmail.com>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Good deal. That's 3 of us (Naomi, you and me) by my count. Hopefully
> >> >> >> > others will join in!!
> >> >> >> >
> >> >> >> >  I would be game for a December meetup.
> >> >> >> >
> >> >> >> > Sent from my iPhone
> >> >> >> >
> >> >> >> > > On Oct 18, 2016, at 8:13 PM, Steve Young via Omaha <omaha at python.org>
> >> >> >> > > wrote:
> >> >> >> > >
> >> >> >> > > I would enjoy participating, and learning what you data guys and gals do.
> >> >> >> > > (I am not a math guy)
> >> >> >> > >
> >> >> >> > > If Hubert does not take December, maybe we could have a sprint that
> >> >> >> > > night?
> >> >> >> > >
> >> >> >> > > Steve
> >> >> >> > >
> >> >> >> > > On Mon, Oct 17, 2016 at 3:05 PM, Wes Turner via Omaha <omaha at python.org>
> >> >> >> > > wrote:
> >> >> >> > >
> >> >> >> > >> On Monday, October 17, 2016, Bob Haffner via Omaha <omaha at python.org>
> >> >> >> > >> wrote:
> >> >> >> > >>
> >> >> >> > >>> Hi All,
> >> >> >> > >>>
> >> >> >> > >>> A few months ago someone brought up the idea of doing a Kaggle data science
> >> >> >> > >>> competition as a group.  Is there still interest in this?
> >> >> >> > >>>
> >> >> >> > >>> Some thoughts.
> >> >> >> > >>> Not sure of the details, but Kaggle allows individuals to form groups. We
> >> >> >> > >>> could collaborate thru email (or perhaps something like Slack) and maybe
> >> >> >> > >>> meet occasionally.  When it's all said and done, we could present at a
> >> >> >> > >>> monthly meeting.
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >> A GitHub (repo, issues, and sphinx docs/ and/or GH wiki) could also be
> >> >> >> > >> useful:
> >> >> >> > >>
> >> >> >> > >> - gh-pages branch built from docs/ and nb/
> >> >> >> > >>  - .ipynb in notebooks/ or nb/
> >> >> >> > >> - https://github.com/audreyr/cookiecutter-pypackage/ has packaging and
> >> >> >> > >>   ReadTheDocs config
> >> >> >> > >> - https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile
> >> >> >> > >>   includes conda
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >>>
> >> >> >> > >>> This one looks good.  Doesn't end till March 1st which gives us some time
> >> >> >> > >>> and it doesn't look overly complicated.  No prize money, though :-)
> >> >> >> > >>> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >> - http://rhiever.github.io/tpot/examples/Boston_Example/
> >> >> >> > >>
> >> >> >> > >>  - TPOT can utilize XGBoost (as mentioned in the Kaggle competition
> >> >> >> > >>    description)
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >> - https://github.com/donnemartin/data-science-ipython-notebooks/
> >> >> >> > >>
> >> >> >> > >>
> >> >> >> > >>> Forming groups
> >> >> >> > >>> https://www.kaggle.com/wiki/FormingATeam
> >> >> >> > >>>
> >> >> >> > >>> Would love to get some feedback on any of this
> >> >> >> > >>>
> >> >> >> > >>> Thanks,
> >> >> >> > >>> Bob
> >> >> >> > >>> _______________________________________________
> >> >> >> > >>> Omaha Python Users Group mailing list
> >> >> >> > >>> Omaha at python.org
> >> >> >> > >>> https://mail.python.org/mailman/listinfo/omaha
> >> >> >> > >>> http://www.OmahaPython.org