[omaha] Getting data into pandas?

Travis Smith travis42 at gmail.com
Wed Apr 5 15:45:21 EDT 2017


Thanks guys, I'm going to read this over a few times. :)

Maybe at the next meeting I can show anyone who's interested what I'm up to.
Bob, I think you're right about the formatting--maybe I should think about it
differently.  I'm used to seeing rows and feature columns in 2D as a rule,
with the features becoming different dimensions in the ML technique
(usually).  Maybe just multiple 2D DataFrames instead...
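
A rough sketch of the "multiple 2D DataFrames" idea, in case it helps anyone
following along.  The scenario names, column names, and numbers are all made
up for illustration; nothing here comes from the actual model.

```python
import pandas as pd

# Hypothetical results: one ordinary 2-D frame per scenario (made-up numbers).
frames = {
    "baseline": pd.DataFrame({"x": [1, 2], "y": [3, 4]}),
    "variant": pd.DataFrame({"x": [5, 6], "y": [7, 8]}),
}

# Each frame can be analyzed on its own...
baseline_mean = frames["baseline"]["x"].mean()

# ...or stacked into a single frame; the dict keys become the outer index level.
combined = pd.concat(frames, names=["scenario", "row"])

# .loc with an outer-level key pulls one scenario back out as a plain 2-D frame.
variant = combined.loc["variant"]
```

Keeping the frames in a dict means the layered index only appears if and when
the pieces are combined, so each piece stays a plain rows-by-columns table.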

Travis

GPG Key: BFEB 7E65 04EB 184B A150 2E2C CC11 933F EE27 D86E

On Wed, Apr 5, 2017 at 9:59 AM, Bob Haffner via Omaha <omaha at python.org>
wrote:

> I thought this would be easier than cleaning messy data, but as it turns
> out, this is not the case--pandas has some pretty cryptic formatting
> requirements (at least in my mind).  I'm actually contemplating throwing my
> data into a spreadsheet or SQL db and *then* importing it.
> >>Yeah, pandas makes certain things pretty simple, but other things can be
> a real pain.  Assuming your model is a scikit-learn model, you're dealing
> with NumPy arrays, which should convert into a pandas Series (or DataFrame).
> On a related note, here's a link to a blog post that talks about converting
> common Python data structures into DataFrames:
> http://pbpython.com/pandas-list-dict.html
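
For reference, a minimal sketch of the conversions mentioned above.  The array
contents and the column names ("alpha", "run", "score", and so on) are
placeholders, not anything from the real model output.

```python
import numpy as np
import pandas as pd

# Hypothetical model output: 4 runs x 3 measured quantities (made-up numbers).
results = np.array([
    [0.1, 1.0, 10.0],
    [0.2, 2.0, 20.0],
    [0.3, 3.0, 30.0],
    [0.4, 4.0, 40.0],
])

# A 2-D array maps directly onto a DataFrame: rows stay rows, columns get names.
df_from_array = pd.DataFrame(results, columns=["alpha", "beta", "gamma"])

# A list of dicts (one dict per observation) works too; keys become columns.
records = [
    {"run": 0, "score": 0.91},
    {"run": 1, "score": 0.87},
]
df_from_records = pd.DataFrame(records)

# A dict of equal-length lists: keys become columns.
df_from_dict = pd.DataFrame({"run": [0, 1], "score": [0.91, 0.87]})
```

If the model can emit one dict per observation, the list-of-dicts form is
probably the least fussy way in, since pandas works out the columns itself.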
>
> Specifically, I'm messing around with pandas.MultiIndex.  I want my data to
> be in several layers, which will represent dimensions when I throw it at
> scikit-learn.  Or something.
> >>Not sure about this one.  I think most ML libs like tidy data (rows =
> observations, columns = features)
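
One way the layered data could be brought back to that tidy shape, sketched
under the assumption that the layers are something like (run, step).  The
level names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical layered results: one value per (run, step) pair.
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["run", "step"]
)
layered = pd.DataFrame({"value": np.arange(6, dtype=float)}, index=index)

# reset_index() turns the index levels into ordinary columns,
# giving one row per (run, step) observation.
tidy = layered.reset_index()

# unstack() goes the other way: it spreads one level out into columns,
# so each run becomes a row and each step becomes a feature column.
wide = layered["value"].unstack("step")

# A plain 2-D array like this is the shape most scikit-learn estimators
# expect for X (rows = observations, columns = features).
X = wide.to_numpy()
```

Which of the two shapes is right depends on what counts as one observation:
tidy treats every (run, step) pair as a row, while wide treats each run as a
row with the steps as its features.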
>
> Not sure I could help, but I'm happy to take a look.  Maybe post some data,
> or perhaps we could do a screen share of some kind.
>
> On Tue, Apr 4, 2017 at 1:24 PM, Travis Smith via Omaha <omaha at python.org>
> wrote:
>
> > Hey everyone,
> >
> > I'm having one of those programming challenges that makes me question my
> > sanity.
> >
> > I've used pandas before, but it's always been with some existing CSV or
> > other data format that I then go in and clean up so as to get down to
> > business.  However, this time I'm doing a project where I created data from
> > a model I made, and I want to use pandas to analyze the results.
> >
> > I thought this would be easier than cleaning messy data, but as it turns
> > out, this is not the case--pandas has some pretty cryptic formatting
> > requirements (at least in my mind).  I'm actually contemplating throwing
> > my data into a spreadsheet or SQL db and *then* importing it.
> >
> > Is there a better way?  Is there some tool that helps shape or guide
> > self-created data into pandas?  I'll keep reading the docs and trying to
> > figure out what it wants...
> >
> > Specifically, I'm messing around with pandas.MultiIndex.  I want my data
> > to be in several layers, which will represent dimensions when I throw it
> > at scikit-learn.  Or something.
> >
> > Thanks guys,
> >
> > Travis
> >
> > GPG Key: BFEB 7E65 04EB 184B A150 2E2C CC11 933F EE27 D86E
> > _______________________________________________
> > Omaha Python Users Group mailing list
> > Omaha at python.org
> > https://mail.python.org/mailman/listinfo/omaha
> > http://www.OmahaPython.org
> >

