[Pandas-dev] [EXTERNAL] Re: pandas or new project

David M Rashty David.Rashty at flagstar.com
Sat Jan 19 18:32:55 EST 2019


Tom/Wes,
Here’s the open source project I started:
https://github.com/pandichef/sugarbears

It’s not quite ripe for the pandas ecosystem page, but I wanted to share what I’ve been working on and get your thoughts on the idea before I go too far down the rabbit hole.

At a high level, the goal is to wrap pandas in a way that enables development speed comparable to Stata or even MS Excel.

Thanks!
Dave


From: Wes McKinney [mailto:wesmckinn at gmail.com]
Sent: Thursday, September 13, 2018 9:56 PM
To: Tom Augspurger <tom.augspurger88 at gmail.com>
Cc: David M Rashty <David.Rashty at flagstar.com>; pandas-dev at python.org
Subject: [EXTERNAL] Re: [Pandas-dev] pandas or new project

hi David,

There's nothing really wrong with injecting a bunch of custom methods into the DataFrame.* namespace. If you wanted, you could release your package as something like

import pandas_stata

and then the new methods would be available. This is pretty common in large corporate environments that use pandas AFAICT. You can also propose your changes in pull requests to pandas.
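
A minimal sketch of what such a module might contain. The module name `pandas_stata` comes from the message above; the bodies of `sdrop` and `skeep` are assumptions based on the descriptions elsewhere in this thread (shell-style wildcards, matched with the standard-library `fnmatch` module), not the actual implementation.

```python
# pandas_stata.py (hypothetical module contents)
from fnmatch import fnmatchcase

import pandas as pd


def sdrop(self, pattern):
    """Return a copy without the columns matching a shell-style pattern."""
    matched = [c for c in self.columns if fnmatchcase(str(c), pattern)]
    return self.drop(columns=matched)


def skeep(self, pattern):
    """Return only the columns matching a shell-style pattern."""
    matched = [c for c in self.columns if fnmatchcase(str(c), pattern)]
    return self[matched]


# Importing the module attaches the methods to every DataFrame.
pd.DataFrame.sdrop = sdrop
pd.DataFrame.skeep = skeep

# Example data (made up for illustration).
df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})
dropped = df.sdrop("myvar*")
kept = df.skeep("myvar*")
```

If name clashes with future pandas methods are a concern, `pandas.api.extensions.register_dataframe_accessor` offers a namespaced alternative to assigning attributes on `pd.DataFrame` directly.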

- Wes



On Thu, Sep 13, 2018 at 9:41 PM Tom Augspurger <tom.augspurger88 at gmail.com> wrote:
With respect to your `sdrop` and `skeep`, that's the goal of DataFrame.filter, though the name isn't the best, so it may
be deprecated in favor of something better.
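
For illustration, a short sketch of how `DataFrame.filter` could cover the two Stata commands discussed in this thread. The example frame and its column names are made up; `filter` only selects columns, so the drop case is expressed here by dropping whatever `filter` matches.

```python
import pandas as pd

# Example data (column names are assumptions for illustration).
df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})

# Stata `keep myvar*`: keep only columns whose name starts with "myvar".
kept = df.filter(regex=r"^myvar")

# Stata `drop myvar*`: filter has no inverse, so drop the matched columns.
dropped = df.drop(columns=df.filter(regex=r"^myvar").columns)
```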

The rest sound interesting, but likely out of scope for pandas. If you build an open source library then we'd be
happy to include it on pandas' ecosystem page: http://pandas.pydata.org/pandas-docs/stable/ecosystem.html

Tom


On Thu, Sep 13, 2018 at 7:58 PM David M Rashty <David.Rashty at flagstar.com> wrote:
Dear pandas team,
I am a long-time Stata user and I started using pandas about a year ago in order to build web applications using an in-memory dataframe structure. As a business user, I’ve found Stata to have a key advantage over pandas that many others have also noted: much faster development time. Examples in Stata:

drop myvar*       // drops all columns starting with myvar
keep myvar*       // drops all columns except those starting with myvar
reg z y x         // runs the regression z = a + bx + cy + error

In order to use pandas in a Stata-like fashion, I’ve had to monkey patch large parts of the library, e.g.:

df = df.sdrop('myvar*')     # same as above
df = df.skeep('myvar*')     # same as above
df = df.sreg('z y x')       # same as above
df = df.squery('a>80 & b.str.contains("hello") & c.isin([1,2,3])')   # df.query doesn’t support str.contains and isin to my knowledge

I put an “s” in front of my methods to mean either “stata” or “sugar”.
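
For comparison with the `squery` line above, a hedged sketch using plain `DataFrame.query`. As far as I know, expressions with `str.contains` and `isin` can be evaluated when `engine="python"` is passed, though behavior may vary with the pandas version and with whether numexpr is installed. The data below is made up for illustration.

```python
import pandas as pd

# Example data (values are assumptions for illustration).
df = pd.DataFrame(
    {
        "a": [90, 70, 95],
        "b": ["hello world", "hello", "bye"],
        "c": [1, 2, 5],
    }
)

# engine="python" evaluates the expression with ordinary pandas operations,
# which allows method calls such as str.contains and isin.
result = df.query(
    '(a > 80) & b.str.contains("hello") & c.isin([1, 2, 3])',
    engine="python",
)
```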

Additionally, I’ve built:

a) A mechanism to automatically load new DataFrame methods into memory (no additional imports required)

b) A caching system to make loading data blazing fast, along with a much tighter syntax, e.g., pd.read_stata('mydata.dta') (6 secs load time) vs. use.mydata (0.001 secs load time after the first read from file)

c) A system of column “labels” and formats to prettify various reports, e.g., df.sscatter('rate score') produces a scatter plot with axis labels “Interest Rate, %” and “Credit Score”, respectively

d) A reactive web app (using Flask/Redis) to quickly view the full DataFrame content in a browser

Basically, I’ve tried to eliminate any obvious advantages Stata has over pandas.

I’m potentially interested in developing this project into something bigger. Would you like me to share my work in the context of pandas or should it be a completely separate project with a different scope?

Thanks,

David Rashty | Flagstar Bank | Whole Loan Trading | 248-312-6692 | david.rashty at flagstar.com

_______________________________________________
Pandas-dev mailing list
Pandas-dev at python.org
https://mail.python.org/mailman/listinfo/pandas-dev