[CentralOH] Summarizing Data From Excel Spreadsheet

pybokeh pybokeh at gmail.com
Tue Nov 3 20:06:56 EST 2015


I didn't realize David had posted on the meetup site. He may never find out
about the mailing list, so I emailed him as well.

Below is my email to him, in case someone else finds it useful or
disagrees with me, which is fine :-)

David wrote:
"Not sure if this would be the right forum but I'll give it a shot. I'm
looking for a program that will take a csv or Excel file and profile the
data in it: # of rows, list of column names, worksheet tab (if Excel), max
and min in a column, whether a column has all unique values or a histogram
of the values (value and count) when there are less than <x> unique values
(parameter input into program), whether a column is all numeric, all alpha
or alpha numeric, if numeric then average and median values. Thanks!
daddav at gmail.com If this isn't the right forum, I apologize in advance."

Hi David,
Saw your post on the meetup site.  I replied there but wasn't sure if you
would get notified of the reply so I thought I would go ahead and email you
too.

I would look into using the pandas library for data manipulation and data
analysis.  Its documentation has a quick start guide of sorts at
http://pandas.pydata.org/pandas-docs/stable/10min.html
Also note the table of contents on the left of that page; it links to a
tutorial and a cookbook as well.
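
As a rough sketch of how pandas could cover most of your list (row count,
column names, min and max, uniqueness, value counts below a threshold, and
mean and median for numeric columns), something like the following might
work.  The profile() helper, its max_unique parameter, and the sample data
are all made up for illustration; none of them are part of pandas.  It also
assumes each column holds a single kind of value, since min() and max() can
fail on a column that mixes text and numbers.

```python
import io

import pandas as pd


def profile(df, max_unique=10):
    """Summarize a DataFrame column by column (hypothetical helper)."""
    report = {"rows": len(df), "columns": list(df.columns), "profile": {}}
    for name in df.columns:
        col = df[name].dropna()  # ignore missing cells
        info = {
            "min": col.min(),
            "max": col.max(),
            "all_unique": col.is_unique,
        }
        # Only build the value/count table when there are few distinct values.
        if col.nunique() < max_unique:
            info["counts"] = col.value_counts().to_dict()
        if pd.api.types.is_numeric_dtype(col):
            info["kind"] = "numeric"
            info["mean"] = col.mean()
            info["median"] = col.median()
        else:
            text = col.astype(str)
            # "alphanumeric" here means any text that is not purely letters.
            info["kind"] = "alpha" if text.str.isalpha().all() else "alphanumeric"
        report["profile"][name] = info
    return report


# Made-up sample data standing in for a real CSV file.
sample = io.StringIO("id,city,score\n1,Columbus,90\n2,Dublin,85\n3,Columbus,70\n")
df = pd.read_csv(sample)
report = profile(df, max_unique=3)

print(report["rows"], report["columns"])
for name, info in report["profile"].items():
    print(name, info)
```

For an Excel workbook, pd.read_excel() can stand in for pd.read_csv(), and
pd.ExcelFile(path).sheet_names lists the worksheet tabs.  Both need an Excel
reader package such as openpyxl installed.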

You can also do some basic visualizations with pandas, which uses matplotlib
behind the scenes:
http://pandas.pydata.org/pandas-docs/stable/visualization.html
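
For the histogram part of your question, a minimal sketch might look like
the one below.  The city names are made-up sample data, and the "Agg"
backend is only chosen so the script runs without a display; in a notebook
you would likely leave that line out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed

import pandas as pd

# Made-up sample column; in practice this would be a column of your DataFrame.
cities = pd.Series(
    ["Columbus", "Dublin", "Columbus", "Hilliard", "Dublin", "Columbus"]
)

# Count how often each value occurs, most frequent first.
counts = cities.value_counts()

# Series.plot() hands the drawing to matplotlib and returns an Axes object.
ax = counts.plot(kind="bar", title="Rows per city")
ax.set_ylabel("count")

# Uncomment to write the chart to disk:
# ax.figure.savefig("city_counts.png")
```

If a text-only histogram is enough, printing counts by itself already gives
the value and count pairs.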

If you want to do some serious plotting with matplotlib, then I would
recommend this tutorial over the one on the official web site:
http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb

Once you get past the basics of pandas, perhaps my pandas cheat sheet would
be of some use:
http://nbviewer.jupyter.org/github/pybokeh/jupyter_notebooks/blob/master/pandas/PandasCheatSheet.ipynb

If you have a strong statistics background, you may prefer the seaborn
visualization library:
http://stanford.edu/~mwaskom/software/seaborn/

Hope this helps!

Regards,
Daniel

On Tue, Nov 3, 2015 at 7:35 PM, <jep200404 at columbus.rr.com> wrote:

> David commented[1]:
>
> > I'm looking for a program that will ...
>
> I don't know if a program that does everything you want
> already exists. One can make a Python program to do it.
>
> > ... take a csv or Excel file ...
>
> Python's csv module groks csv files and
> csv-ish Excel files with aplomb.
>
> > ... and profile the data in it:
> > # of rows, ...
>
> easy
>
> > ... list of column names, ...
>
> easy _if_ always done the same way
>
> > ... worksheet tab (if Excel), ...
>
> I don't know what that is.
>
> > ... max and min in a column, ...
>
> easy
> might also play with pandas library
>
> > ... whether a column has all unique values ...
>
> easy with sets and len()
>
> > ... or a histogram of the values (value and count) when there
> > are less than <x> unique values (parameter input into program), ...
>
> not hard if you want text histogram
> something ala sort | uniq -c | sort -n can be done in Python
> if you need graphics histogram,
> matplotlib and maybe ipython notebook/jupyter
>
> > whether a column is all numeric, all alpha or alpha numeric,
> > if numeric then average and median values.
>
> not hard
>
> > Thanks! If this isn't the right forum, I apologize in advance.
>
> The meetup web page and meetup mailing list are mediocre
> for technical discussions. Good forums for technical discussions are:
>     The weekly Python lunches.
>     The weekly dojos.
>     COhPy's mailing list[3]
>         (distinctly different from the meetup mailing list)
>     cohpy monthly meetings and after meetings
>
>     Also, Stauf's Coffee Roasters in Grandview around 8:30 to 9:15[2]
>     esp. weekdays
>
>     Also visit cohpy.org.
>
>     Someone might need to let David know that there is a response
>     on cohpy's mailing list to his meetup comment.
>
> [1]
> http://www.meetup.com/Central-Ohio-Python-Users-Group/events/226471478/
>
> [2] Stauf's Coffee Roasters 1277 Grandview Ave
>     http://lists.colug.net/pipermail/colug-432/2015-February/003564.html
>     http://lists.colug.net/pipermail/colug-432/2015-May/003814.html
>     You never know who is going to show up.
>     Doughnuts from DK Diner and
>     anything from Thurn's are great lubrication for discussion.
>
> [3] cohpy's (technical) mailing list
>     https://mail.python.org/mailman/listinfo/centraloh
>     To get good answers, consider following the advice in the links below.
>     http://catb.org/~esr/faqs/smart-questions.html
>
> http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html