[Pandas-dev] pandas or new project

Wes McKinney wesmckinn at gmail.com
Thu Sep 13 21:55:56 EDT 2018


hi David,

There's nothing really wrong with injecting a bunch of custom methods into
the DataFrame.* namespace. If you wanted, you could release your package as
something like

import pandas_stata

and then the new methods would be available. This is pretty common in large
corporate environments that use pandas AFAICT. You can also propose your
changes in pull requests to pandas.
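
As a rough sketch of what such a package could look like: the module name
pandas_stata and the method names sdrop and skeep come from this thread, but
the implementation below is an assumption for illustration, not David's
actual code. Importing the module attaches the methods to DataFrame as a
side effect.

```python
# pandas_stata.py: hypothetical module; importing it patches DataFrame.
import fnmatch

import pandas as pd


def _matching(df, pattern):
    """Return the column labels that match a Stata-style wildcard pattern."""
    return [c for c in df.columns if fnmatch.fnmatchcase(str(c), pattern)]


def sdrop(self, pattern):
    """Drop every column whose name matches the pattern, e.g. 'myvar*'."""
    return self.drop(columns=_matching(self, pattern))


def skeep(self, pattern):
    """Keep only the columns whose names match the pattern."""
    return self[_matching(self, pattern)]


# Monkey patching: attach the plain functions as DataFrame methods.
pd.DataFrame.sdrop = sdrop
pd.DataFrame.skeep = skeep

# Made-up example data to show the patched methods in use.
df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})
dropped = df.sdrop("myvar*")
kept = df.skeep("myvar*")
```

pandas also provides pandas.api.extensions.register_dataframe_accessor,
which puts such methods under their own namespace on the DataFrame rather
than patching it directly; that may be a safer choice for a public package.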

- Wes



On Thu, Sep 13, 2018 at 9:41 PM Tom Augspurger <tom.augspurger88 at gmail.com>
wrote:

> With respect to your `sdrop` and `skeep`, that's the goal of
> DataFrame.filter, though the name isn't the best, so it may eventually
> be deprecated in favor of something better.
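
For reference, a minimal sketch of how DataFrame.filter can stand in for
the keep/drop examples quoted below. The column names are made up for
illustration. Since filter only selects, the "drop" case is written here
with a negative lookahead in the regex, which is one possible approach
among several.

```python
import pandas as pd

# Made-up columns mirroring the "myvar*" examples in this thread.
df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})

# Keep only the columns starting with "myvar" (Stata: keep myvar*).
kept = df.filter(regex=r"^myvar")

# filter() selects rather than drops, so dropping uses a negated pattern:
# keep the columns that do NOT start with "myvar" (Stata: drop myvar*).
dropped = df.filter(regex=r"^(?!myvar)")
```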
>
> The rest sound interesting, but are likely out of scope for pandas. If you
> build an open source library, then we'd be happy to include it on pandas'
> ecosystem page:
> http://pandas.pydata.org/pandas-docs/stable/ecosystem.html
>
> Tom
>
>
> On Thu, Sep 13, 2018 at 7:58 PM David M Rashty <David.Rashty at flagstar.com>
> wrote:
>
>> Dear pandas team,
>>
>> I am a long-time Stata user and I started using pandas about a year ago
>> in order to build web applications using an in-memory dataframe structure.
>> As a business user, I’ve found Stata to have a key advantage over pandas
>> that many others have also noted: much faster development time.  Examples
>> in Stata:
>>
>>
>>
>> drop myvar*       // drops all columns starting with myvar
>>
>> keep myvar*       // drops all columns except those starting with myvar
>>
>> reg z y x               // runs the regression z = a+bx+cy + error
>>
>>
>>
>> In order to use pandas in a Stata-like fashion, I’ve had to monkey patch
>> large parts of the library e.g.,
>>
>>
>>
>> df = df.sdrop('myvar*')     # same as above
>>
>> df = df.skeep('myvar*')     # same as above
>>
>> df = df.sreg('z y x')       # same as above
>>
>> # df.query doesn't support str.contains and isin to my knowledge
>> df = df.squery('a>80 & b.str.contains("hello") & c.isin([1,2,3])')
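
The filter expressed by the squery call above can also be written with
plain boolean indexing. This is only a sketch: the column names a, b and c
come from the message, while the data is invented for illustration.

```python
import pandas as pd

# Invented data; only the column names a, b, c come from the thread.
df = pd.DataFrame({
    "a": [90, 95, 70, 85],
    "b": ["hello world", "goodbye", "hello", "say hello"],
    "c": [1, 2, 3, 9],
})

# Combine the three conditions element-wise with & (each is a boolean Series).
mask = (df["a"] > 80) & df["b"].str.contains("hello") & df["c"].isin([1, 2, 3])
result = df[mask]
```

This is more verbose than the string form, which is the development-time
cost the message describes.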
>>
>>
>>
>> I put an “s” in front of my methods to mean either “stata” or “sugar”.
>>
>>
>>
>> Additionally, I’ve built:
>>
>> a)      A system to automatically load new DataFrame methods into memory
>> (no additional imports required)
>>
>> b)      A caching system to make loading data blazing fast along with a
>> much tighter syntax, e.g., pd.read_stata('mydata.dta') (6 secs load time)
>> vs use.mydata (0.001 secs load time after the first read from file)
>>
>> c)      A system of column “labels” and formats to prettify various
>> reports, e.g., df.sscatter('rate score') produces a scatter plot with
>> labels “Interest Rate, %” and “Credit Score”, respectively
>>
>> d)      A reactive web app (using Flask/Redis) to quickly view the full
>> DataFrame content in a browser
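
The caching idea in item (b) can be sketched as a simple in-memory lookup
keyed by file path. The function name use echoes the syntax mentioned in
the message, but its signature and behavior here are assumptions rather
than the system described above. This version keeps frames only for the
life of the process and does not notice when the file changes on disk.

```python
import pandas as pd

_cache = {}  # maps a file path to the DataFrame already loaded from it


def use(path, loader=pd.read_stata):
    """Load a file once with `loader`; later calls reuse the in-memory copy."""
    if path not in _cache:
        _cache[path] = loader(path)
    # Return a copy so callers cannot mutate the cached frame.
    return _cache[path].copy()
```

The first use('mydata.dta') pays the full read cost; repeated calls skip
the file read and only copy the cached frame.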
>>
>>
>>
>> Basically, I’ve tried to eliminate any obvious advantages Stata has over
>> pandas.
>>
>>
>>
>> I’m potentially interested in developing this project into something
>> bigger. Would you like me to share my work in the context of pandas, or
>> should it be a completely separate project with a different scope?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> David Rashty | Flagstar Bank | Whole Loan Trading | 248-312-6692 |
>> david.rashty at flagstar.com
>>
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>