[omaha] Getting data into pandas?
Travis Smith
travis42 at gmail.com
Wed Apr 5 15:45:21 EDT 2017
Thanks guys, I'm going to read this over a few times. :)
Maybe at the next meeting I can show anyone who's interested what I'm up to. Bob, I
think you're right about the formatting--maybe I should think about it
differently. I am used to seeing rows and feature columns in 2d as a rule,
with the features becoming different dimensions in the ML technique
(usually). Maybe just multiple 2d DataFrames instead...
Travis
GPG Key: BFEB 7E65 04EB 184B A150 2E2C CC11 933F EE27 D86E
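
For anyone following the thread, here is a minimal sketch of the tidy layout Bob describes (rows = observations, columns = features), and of how MultiIndex "layers" can be flattened back to plain 2d columns. The column names (run, step, temperature, pressure) and all of the values are made up for illustration; they are not taken from the model discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical model output: one dict per (run, step) observation.
# All names and numbers here are illustrative assumptions.
records = [
    {"run": 0, "step": 0, "temperature": 20.0, "pressure": 1.00},
    {"run": 0, "step": 1, "temperature": 21.5, "pressure": 1.02},
    {"run": 1, "step": 0, "temperature": 19.0, "pressure": 0.98},
    {"run": 1, "step": 1, "temperature": 22.0, "pressure": 1.05},
]

# A list of dicts becomes a tidy frame directly:
# rows = observations, columns = features.
tidy = pd.DataFrame(records)

# The "layers" can live in a MultiIndex while analyzing the results ...
layered = tidy.set_index(["run", "step"])
mean_per_run = layered.groupby(level="run").mean()

# ... and be flattened back to ordinary columns before the data is
# handed to an estimator that expects a 2d array.
flat = layered.reset_index()
X = flat[["temperature", "pressure"]].to_numpy()

# Going the other way: a 2d NumPy array (for example, model output)
# can be wrapped in a DataFrame by naming its columns.
predictions = np.array([[0.1, 0.9], [0.8, 0.2]])
pred_frame = pd.DataFrame(predictions, columns=["class_0", "class_1"])
```

Keeping one long 2d frame and moving the layers in and out of the index with set_index / reset_index is one alternative to holding several separate DataFrames; which is more convenient probably depends on the analysis.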
On Wed, Apr 5, 2017 at 9:59 AM, Bob Haffner via Omaha <omaha at python.org>
wrote:
> I thought this would be easier than cleaning messy data, but as it turns
> out, this is not the case--pandas has some pretty cryptic formatting
> requirements (at least in my mind). I'm actually contemplating throwing my
> data into a spreadsheet or SQL db and *then* importing it.
> >>Yeah, pandas makes certain things pretty simple, but other things can be
> a real pain. Assuming your model is a scikit-learn model, you're dealing with
> NumPy arrays, which should convert into a pandas Series (or DataFrame). On a
> related note, here's a link to a blog post that talks about converting
> common Python data structures into DataFrames:
> http://pbpython.com/pandas-list-dict.html
>
> Specifically, I'm messing around with pandas.MultiIndex. I want my data to
> be in several layers, which will represent dimensions when I throw it at
> scikit-learn. Or something.
> >>Not sure about this one. I think most ML libs like tidy data (rows =
> observations, columns = features).
>
> Not sure I could help, but I'm happy to take a look. Maybe post some data,
> or perhaps we could do a screen share of some kind.
>
> On Tue, Apr 4, 2017 at 1:24 PM, Travis Smith via Omaha <omaha at python.org>
> wrote:
>
> > Hey everyone,
> >
> > I'm having one of those programming challenges that makes me question my
> > sanity.
> >
> > I've used pandas before, but it has always been with some existing CSV or
> > other data format that I then go in and clean up so as to get down to
> > business. However, this time I'm doing a project where I created data from
> > a model I made, and I want to use pandas to analyze the results.
> >
> > I thought this would be easier than cleaning messy data, but as it turns
> > out, this is not the case--pandas has some pretty cryptic formatting
> > requirements (at least in my mind). I'm actually contemplating throwing
> > my data into a spreadsheet or SQL db and *then* importing it.
> >
> > Is there a better way? Is there some tool that helps shape or guide
> > self-created data into pandas? I'll keep reading the docs and trying to
> > figure out what it wants...
> >
> > Specifically, I'm messing around with pandas.MultiIndex. I want my data
> > to be in several layers, which will represent dimensions when I throw it
> > at scikit-learn. Or something.
> >
> > Thanks guys,
> >
> > Travis
> >
> > GPG Key: BFEB 7E65 04EB 184B A150 2E2C CC11 933F EE27 D86E
> > _______________________________________________
> > Omaha Python Users Group mailing list
> > Omaha at python.org
> > https://mail.python.org/mailman/listinfo/omaha
> > http://www.OmahaPython.org
> >