[SciPy-User] R vs Python for simple interactive data analysis
Skipper Seabold
jsseabold at gmail.com
Sun Aug 28 15:07:19 EDT 2011
On Sat, Aug 27, 2011 at 2:44 PM, Christopher Jordan-Squire
<cjordan1 at uw.edu> wrote:
> On Sat, Aug 27, 2011 at 2:27 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Sat, Aug 27, 2011 at 11:19 AM, Christopher Jordan-Squire
>> <cjordan1 at uw.edu> wrote:
>>> Hi--I've been a moderately heavy R user for the past two years, so
>>> about a month ago I took an (abbreviated) version of a simple data
>>> analysis I did in R and tried to rewrite as much of it as possible,
>>> line by line, into python using numpy and statsmodels. I didn't use
>>> pandas, and I can't comment on how much it might have simplified
>>> things.
>>>
>>> This comparison might be useful to some people, so I stuck it up on a
>>> github repo. My overall impression is that R is much stronger for
>>> interactive data analysis. Click on the link for more details why,
>>> which are summarized in the README file.
>>>
>>> https://github.com/chrisjordansquire/r_vs_py
>>>
>>> The code examples should run out of the box with no downloads (other
>>> than R, Python, numpy, scipy, and statsmodels) required.
>>
>> Thank you very much for doing that - it's a very useful exercise. I
>> hope we can make use of it to discuss how to get better, in the true
>
> Hopefully. I suppose I should also mention, for those that don't want
> to click on the link, that the two biggest reasons R was much simpler
> to use were that it was easier to construct models and easier to
> view entries I'd stuck into matrices. R's graphing capabilities seemed
> slightly more friendly, but that might have just been my familiarity
> with them.
>
> (As an aside, the numpy array print method doesn't make arrays friendly
> for interactive viewing. Even IPython couldn't make a few of the
> matrices I made very intelligible, and it's easy to construct examples
> that make numpy arrays hideous to behold. For example,
>
> import numpy as np
> x = np.arange(5).reshape(5,1)
> y = np.ones(5).reshape(1,5)
> z = x*y
> z[0,0] += 0.0001
> print(z)
>
> [[ 1.00000000e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
> 0.00000000e+00]
> [ 1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00
> 1.00000000e+00]
> [ 2.00000000e+00 2.00000000e+00 2.00000000e+00 2.00000000e+00
> 2.00000000e+00]
> [ 3.00000000e+00 3.00000000e+00 3.00000000e+00 3.00000000e+00
> 3.00000000e+00]
> [ 4.00000000e+00 4.00000000e+00 4.00000000e+00 4.00000000e+00
> 4.00000000e+00]]
>
My default print options (which have suppress=True) give:
[~/statsmodels/]
[1]: x = np.arange(5).reshape(5,1)
[~/statsmodels/]
[2]: y = np.ones(5).reshape(1,5)
[~/statsmodels/]
[3]: z = x*y
[~/statsmodels/]
[4]: z[0,0] += 0.0001
[~/statsmodels/]
[5]: print z
[[ 0.0001 0. 0. 0. 0. ]
[ 1. 1. 1. 1. 1. ]
[ 2. 2. 2. 2. 2. ]
[ 3. 3. 3. 3. 3. ]
[ 4. 4. 4. 4. 4. ]]
[~/statsmodels/]
[6]: np.set_printoptions(suppress=False)
[~/statsmodels/]
[7]: print z
[[ 1.00000000e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00]
[ 1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00
1.00000000e+00]
[ 2.00000000e+00 2.00000000e+00 2.00000000e+00 2.00000000e+00
2.00000000e+00]
[ 3.00000000e+00 3.00000000e+00 3.00000000e+00 3.00000000e+00
3.00000000e+00]
[ 4.00000000e+00 4.00000000e+00 4.00000000e+00 4.00000000e+00
4.00000000e+00]]
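A minimal sketch of the same comparison as a standalone script, assuming Python 3 and a recent NumPy (the session above is Python 2). Rather than changing the global print options, it passes `suppress_small` directly to `np.array2string` so both renderings can be compared side by side; putting `np.set_printoptions(suppress=True)` in an interactive startup file would make the second rendering the default for `print`.

```python
import numpy as np

# Rebuild the array from the thread: row i holds the value i,
# with a tiny offset added to the top-left entry.
x = np.arange(5).reshape(5, 1)
y = np.ones(5).reshape(1, 5)
z = x * y
z[0, 0] += 0.0001

# suppress_small=False mirrors the stock default (scientific notation here).
sci = np.array2string(z, suppress_small=False)
# suppress_small=True mirrors np.set_printoptions(suppress=True).
fixed = np.array2string(z, suppress_small=True)

print(sci)
print(fixed)
```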
Skipper
> (Strangely, it looks much more tolerable if x =
> np.arange(1,6).reshape(5,1) instead.)
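The "strangely" appears to have a mechanical explanation. As far as I can tell from the `arrayprint` module in recent NumPy (older releases may differ), with `suppress` off the float formatter switches to scientific notation when the largest nonzero magnitude is at least 1e8, the smallest is below 1e-4, or the largest divided by the smallest exceeds 1000. With `np.arange(5)` the nonzero entries run from 0.0001 to 4, a ratio of 40000; with `np.arange(1,6)` they run from 1 to 5. The sketch below re-implements that rule in a hypothetical helper `uses_exp_notation` (a simplification, not a NumPy API) and checks it against `np.array2string`; `build` is likewise only an illustrative helper.

```python
import numpy as np

def uses_exp_notation(a):
    """Hypothetical helper: approximates NumPy's rule (with suppress off)
    for switching a float array to scientific notation."""
    vals = np.abs(a[np.isfinite(a) & (a != 0)])  # finite, nonzero magnitudes
    if vals.size == 0:
        return False
    lo, hi = vals.min(), vals.max()
    return hi >= 1e8 or lo < 1e-4 or hi / lo > 1e3

def build(start):
    """Hypothetical helper: the thread's array, with rows start..start+4."""
    x = np.arange(start, start + 5).reshape(5, 1)
    z = x * np.ones((1, 5))
    z[0, 0] += 0.0001
    return z

for start in (0, 1):
    z = build(start)
    printed = np.array2string(z, suppress_small=False)
    # Compare the re-implemented rule with what NumPy actually prints.
    print(start, uses_exp_notation(z), "e" in printed)
```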
>
> If you do the same thing in R,
>
> x = rep(0:4,5)
> x = matrix(x,ncol=5)
> x[1,1] = 0.000001
> x
>
> you get
>
> [,1] [,2] [,3] [,4] [,5]
> [1,] 1e-06 0 0 0 0
> [2,] 1e+00 1 1 1 1
> [3,] 2e+00 2 2 2 2
> [4,] 3e+00 3 3 3 3
> [5,] 4e+00 4 4 4 4
>
> much more readable.)
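As the output suggests, R formats each column of the matrix separately, which is why only the first column goes scientific. A rough NumPy approximation, sketched here under the assumption of a recent NumPy, is a custom element formatter: Python's `'{:g}'.format` is passed through the `formatter` argument of `np.array2string` (the same dictionary `np.set_printoptions` accepts, so it could be made global). Unlike R, this formats element by element, so columns are not padded to a common width.

```python
import numpy as np

# Python version of the R example: row i holds i, top-left entry is 1e-6.
z = np.arange(5).reshape(5, 1) * np.ones((1, 5))
z[0, 0] = 0.000001

# Format every float with '%g'-style rules: whole numbers drop their
# trailing zeros, and only the tiny entry falls back to exponent form.
compact = np.array2string(z, formatter={'float_kind': '{:g}'.format})
print(compact)
```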
>
>
> As a simple metric, my .r file was about half the size of the .py file,
> even though I couldn't do everything in python that I could in R.
> (These commands were meant to be entered interactively, so the length
> of the file is, perhaps, a more meaningful metric than usual.)
>
> -Chris Jordan-Squire
>
>
>> spirit of:
>>
>> Confront the Brutal Facts
>> http://en.wikipedia.org/wiki/Good_to_Great
>>
>> See you,
>>
>> Matthew
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>