[AstroPy] AstroPy Digest, Vol 81, Issue 12

Éric Depagne eric at depagne.org
Thu Jun 20 13:47:24 EDT 2013


Just a quick question:

Since the discussion is about pandas, wouldn't it be possible to cc some
pandas devs, to let them know?

Éric.
> On Thu, Jun 20, 2013 at 12:50 PM, Chris Beaumont <beaumont at hawaii.edu> wrote:
> > I thought I'd chime in on the pandas discussion :)
> > 
> > I'm starting to use pandas a bit more in my day-to-day work. The two
> > features most useful to me are:
> > 
> > 1) Its file parsers are pretty robust and fast. I always try parsing CSV
> > with pandas first
> 
> I've wondered how hard it would be to incorporate some of the pandas CSV
> fast reading functions for the easy cases.  I'm assuming it is licensed so
> that would be an option.
> 
> > 2) For tables with lots of categorical data, the grouping
> > functionality is very nice. For example, calculations like "the mean age
> > of each spectral type of star in my catalog" are usually one-liners like
> > df.groupby(['spectral_type']).age.mean(). I spend a lot of time on the
> > "split-apply-combine" page on the pandas docs (
> > http://pandas.pydata.org/pandas-docs/stable/groupby.html).
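
For anyone who hasn't used it, here is a minimal sketch of that kind of one-liner. The catalog, its column names, and the ages are made up purely for illustration; only `groupby` and `mean` are actual pandas calls.

```python
import pandas as pd

# Hypothetical catalog: column names and values are invented for this example.
catalog = pd.DataFrame({
    "spectral_type": ["G", "K", "G", "M", "K"],
    "age": [4.6, 6.0, 5.4, 8.0, 2.0],  # ages in Gyr (assumed unit)
})

# Split by spectral type, apply the mean to each group, combine into a Series.
mean_age = catalog.groupby("spectral_type")["age"].mean()
print(mean_age)
```

The result is a Series indexed by spectral type, with one mean age per group.
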
> 
> Group-by and related functionality is at the top of my list of priorities for
> astropy.table (in fact I see it every day on my Google Keep app...).  Join
> and merging are in master now.  In my tests the astropy table join is
> within a factor of 2 to 3 in speed relative to pandas, so in most use cases
> it should be good enough.
> 
> It's probably worth pointing out to the community that it was not a
> lightly-taken decision to reject pandas for use as the base data storage
> container.  For the case of tables there is one show-stopper which is that
> pandas DataFrame does not support arbitrary multi-dimensional columns, i.e.
> columns where each element is itself an N-d array.  These occur often enough in
> astronomy and are supported by the FITS and VO standards, so the astropy Table
> must be able to represent them.  The lack of support for table and column
> metadata is a smaller but still important issue.
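
To make the multi-dimensional column case concrete, here is a rough sketch using a plain NumPy structured array rather than astropy.table itself. The field names, the 2x3 shape, and the values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical table layout: each row has a name, a scalar flux, and a
# "spectrum" cell that is itself a 2x3 array (names and shapes are made up).
row_dtype = np.dtype([
    ("name", "U8"),
    ("flux", "f8"),
    ("spectrum", "f8", (2, 3)),
])

table = np.zeros(4, dtype=row_dtype)

# Fill the N-d cell of the first row.
table["spectrum"][0] = np.arange(6).reshape(2, 3)

# The column as a whole has one leading axis for rows plus the cell shape.
column_shape = table["spectrum"].shape
cell_shape = table["spectrum"][0].shape
print(column_shape, cell_shape)
```

A column like `spectrum` is the kind of thing a FITS binary table can carry, and it has no direct equivalent as a single DataFrame column.
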
> 
> Having said that, there is no question pandas has a ton of highly-efficient
> and useful machinery and we are working on ways to improve
> inter-operability.  This includes being able to convert between Table and
> DataFrame easily.  Suggestions and (especially) pull requests welcome.
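
One possible route for such a conversion, sketched under the assumption that both sides can exchange a NumPy structured array. This is not astropy's own API, and the field names and values are invented; `DataFrame.from_records` and `DataFrame.to_records` are the pandas calls used.

```python
import numpy as np
import pandas as pd

# Hypothetical table data held as a NumPy structured array (made-up fields).
records = np.array(
    [(1, 10.5), (2, 11.2), (3, 9.8)],
    dtype=[("star_id", "i8"), ("magnitude", "f8")],
)

# Structured array -> DataFrame: each field becomes a column.
frame = pd.DataFrame.from_records(records)

# DataFrame -> record array: drop the index so only the columns come back.
round_trip = frame.to_records(index=False)
print(round_trip.dtype.names)
```

This only covers scalar columns; multi-dimensional cells and table metadata would need separate handling.
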
> 
> > I won't speculate about whether that's enough of an asset to warrant a
> > dependency in astropy. I do agree that lots of other pandas features
> > don't translate as well into astronomy use.
> > 
> > On Thu, Jun 20, 2013 at 12:34 PM, Erik Tollerud <erik.tollerud at gmail.com> wrote:
> >> I'm of mixed minds about traits UI because once you know it you can make
> >> great GUIs with it, but I've spent a lot of time troubleshooting
> >> people's python installations to get traits to work.  That is, in
> >> general it can be tricky to get installed because of all the
> >> dependencies.  Maybe this has improved recently with Enthought's Canopy
> >> (or other new python distros), but that's been my past experience.
> >> 
> >> More generally, the view in the astropy core package is that we don't
> >> want to put GUIs in the core because GUIs always carry lots of
> >> dependencies, which we don't want to be forced to deal with.  But part
> >> of the whole reason for affiliated packages was to get around this, so
> >> we're happy to see GUI-based affiliated packages.
> >> 
> >> 
> >> As for Pandas, to be totally honest, I don't see a huge amount to be
> >> gained from adding a Pandas dependency to Astropy.  It's honestly not clear
> >> what it gives the astronomy community that numpy does not already have.
> >> 
> >> The following quote from the Pandas web site has guided me to that
> >> conclusion: "*pandas* helps fill this gap, enabling you to carry out
> >> your entire data analysis workflow in Python without having to switch to
> >> a more domain specific language like R."
> >> 
> >> I have been carrying out my entire data analysis workflow for some time
> >> now in python without using Pandas.  It looks to me like Pandas is a
> >> tool that was written by and for statisticians who use R.  While we can
> >> take lessons from this, it's not clear we get much out of it in an
> >> astronomy context. For example, how would it make astropy's NDData,
> >> Quantity, or Table better to use a Pandas DataFrame vs. a numpy array?
> >> Most of what we are doing is building astronomy-convenient interfaces,
> >> and I'm not sure what Pandas adds there, at the cost of a pretty
> >> heavy-weight dependency.
> >> 
> >> It could just be that I don't know enough about Pandas, though.  So if
> >> someone who knows Pandas better can speak to this, I'm all ears.
> >> 
> >> 
> >> 
> >> 
> >> On Tue, Jun 18, 2013 at 3:35 PM, Thøger Rivera-Thorsen
> >> <trive at astro.su.se> wrote:
> >>> Pandas is a part of the newly-defined SciPy stack, after all, so it
> >>> would be part of any science-oriented distribution worth its salt. In
> >>> fact, I think it could be a good idea for astropy in general to use it
> >>> under the hood, but again, that could clash with the philosophy of the
> >>> project and possibly also maintainability.
> >>> 
> >>> As for offering my code or just my experience, I'll have to square it
> >>> with my supervisor first, and I also think it depends on what direction
> >>> the project in question will take. I'm positive about the idea (which
> >>> is why I wrote in the first place), but my supervisor might think it is a
> >>> better idea to actually get my paper in the project wrapped up before
> >>> sending the code out there. Will get back about that one!
> >>> 
> >>> /Emil
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> On 2013-06-18 20:53, Slavin, Jonathan wrote:
> >>> 
> >>> Hi Emil,
> >>> 
> >>> That looks very nice!  I don't see Pandas as a big issue in terms of
> >>> dependencies.  I don't know that much about traits, etc.  My thought
> >>> about the gui was just based on my experience with matplotlib, and the
> >>> fact that it is widely used -- though I would agree that too many
> >>> dependencies can be a deterrent to people using something.  Are you
> >>> offering your code as a starting point for the project?  It strikes me
> >>> that many have gotten some sort of fitting package to a point of
> >>> personal usability but no one has the time/interest/motivation to make
> >>> a more generally usable package.
> >>> 
> >>>  Jon
> >>>  
> >>>  On Tue, Jun 18, 2013 at 2:34 PM, <astropy-request at scipy.org> wrote:
> >>>> Date: Tue, 18 Jun 2013 20:39:55 +0200
> >>>> From: Thøger Rivera-Thorsen <thoger.emil at gmail.com>
> >>>> Subject: Re: [AstroPy] ESA Summer of Code in Space 2013
> >>>> To: astropy at scipy.org
> >>>> Message-ID: <51C0A97B.8090703 at gmail.com>
> >>>> Content-Type: text/plain; charset="iso-8859-1"
> >>>> 
> >>>> I have been working on a fitting GUI for a while, although it is made
> >>>> with a specific task in mind.
> >>>> However, it is not based on Matplotlib but on Traits/Traitsui/Chaco
> >>>> and Pandas. It is made for a specific project I'm working on and as
> >>>> such not yet usable for more general cases, but it could be a
> >>>> starting point, if the dependencies don't conflict with astropy
> >>>> politics.
> >>>> 
> >>>> Especially, I am happy about the choice of Pandas for managing a quite
> >>>> complex data structure (the fitted and/or guessed values of an
> >>>> arbitrary number of transitions for an arbitrary number of rows or
> >>>> collapsed rows of a spatially resolved spectrum), but also with
> >>>> the Traits-based interactive interface to build complex line profiles
> >>>> from single Gaussians, good for fitting-by-eye and giving good
> >>>> initial guesses for fitting of complex line profiles. It hooks
> >>>> directly up to a wrapper I've made for lmfit, but given the
> >>>> modularity, it should be relatively easy to change to other backends.
> >>>> 
> >>>> It's still a work-in-progress, but there are some screenshots here:
> >>>> http://flic.kr/s/aHsjGaEMGg .
> >>>> I know the choice and number of dependencies may be prohibitive but it
> >>>> saved a lot of work on the GUI, and Pandas means the difference
> >>>> between sanity and madness when it comes to keeping track of so many
> >>>> parameters.
> >>>> 
> >>>> Cheers,
> >>>> Emil
> >>>> 
> >>>  ________________________________________________________
> >>> 
> >>> Jonathan D. Slavin                 Harvard-Smithsonian CfA
> >>> jslavin at cfa.harvard.edu       60 Garden Street, MS 83
> >>> phone: (617) 496-7981       Cambridge, MA 02138-1516
> >>> fax: (617) 496-7577            USA
> >>> ________________________________________________________
> >>> 
> >>> 
> >>> 
> >>> _______________________________________________
> >>> AstroPy mailing list
> >>> AstroPy at scipy.org
> >>> http://mail.scipy.org/mailman/listinfo/astropy
> >>> 
> >>> 
> >>> 
> >> 
> >> --
> >> Erik
> >> 
> > 
> > --
> > ************************************
> > Chris Beaumont
> > Graduate Student
> > Institute for Astronomy
> > University of Hawaii at Manoa
> > 2680 Woodlawn Drive
> > Honolulu, HI 96822
> > www.ifa.hawaii.edu/~beaumont
> > ************************************
> > 
One AZERTY keyboard is worth two
----------------------------------------------------------
Éric Depagne                            eric at depagne.org


