[SciPy-User] R vs Python for simple interactive data analysis

Sun Aug 28 15:07:19 EDT 2011

On Sat, Aug 27, 2011 at 2:44 PM, Christopher Jordan-Squire
<cjordan1 at uw.edu> wrote:
> On Sat, Aug 27, 2011 at 2:27 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Sat, Aug 27, 2011 at 11:19 AM, Christopher Jordan-Squire
>> <cjordan1 at uw.edu> wrote:
>>> Hi--I've been a moderately heavy R user for the past two years, so
>>> about a month ago I took an (abbreviated) version of a simple data
>>> analysis I did in R and tried to rewrite as much of it as possible,
>>> line by line, into python using numpy and statsmodels. I didn't use
>>> pandas, and I can't comment on how much it might have simplified
>>> things.
>>>
>>> This comparison might be useful to some people, so I stuck it up on a
>>> github repo. My overall impression is that R is much stronger for
>>> interactive data analysis. Click on the link for more details why,
>>> which are summarized in the README file.
>>>
>>> https://github.com/chrisjordansquire/r_vs_py
>>>
>>> The code examples should run out of the box with no downloads (other
>>> than R, Python, numpy, scipy, and statsmodels) required.
>>
>> Thank you very much for doing that - it's a very useful exercise.  I
>> hope we can make use of it to discuss how to get better, in the true
>
> Hopefully. I suppose I should also mention, for those that don't want
> to click on the link, that the two largest reasons R was much simpler
> to use were that it was easier to construct models and easier to
> view entries I'd stuck into matrices. R's graphing capabilities seemed
> slightly more friendly, but that might have just been my familiarity
> with them.
>
> (As an aside, numpy arrays' print method doesn't make them friendly for
> interactive viewing. Even ipython couldn't make a few of the matrices
> I made very intelligible, and it's easy to construct examples that
> make numpy arrays hideous to behold. For example,
>
> x = np.arange(5).reshape(5,1)
> y = np.ones(5).reshape(1,5)
> z = x*y
> z[0,0] += 0.0001
> print z
>
> [[  1.00000000e-04   0.00000000e+00   0.00000000e+00   0.00000000e+00
>    0.00000000e+00]
>  [  1.00000000e+00   1.00000000e+00   1.00000000e+00   1.00000000e+00
>    1.00000000e+00]
>  [  2.00000000e+00   2.00000000e+00   2.00000000e+00   2.00000000e+00
>    2.00000000e+00]
>  [  3.00000000e+00   3.00000000e+00   3.00000000e+00   3.00000000e+00
>    3.00000000e+00]
>  [  4.00000000e+00   4.00000000e+00   4.00000000e+00   4.00000000e+00
>    4.00000000e+00]]
>

With my default print settings:

[~/statsmodels/]
[1]: x = np.arange(5).reshape(5,1)

[~/statsmodels/]
[2]: y = np.ones(5).reshape(1,5)

[~/statsmodels/]
[3]: z = x*y

[~/statsmodels/]
[4]: z[0,0] += 0.0001

[~/statsmodels/]
[5]: print z
[[ 0.0001  0.      0.      0.      0.    ]
 [ 1.      1.      1.      1.      1.    ]
 [ 2.      2.      2.      2.      2.    ]
 [ 3.      3.      3.      3.      3.    ]
 [ 4.      4.      4.      4.      4.    ]]

[~/statsmodels/]
[6]: np.set_printoptions(suppress=False)

[~/statsmodels/]
[7]: print z
[[  1.00000000e-04   0.00000000e+00   0.00000000e+00   0.00000000e+00
    0.00000000e+00]
 [  1.00000000e+00   1.00000000e+00   1.00000000e+00   1.00000000e+00
    1.00000000e+00]
 [  2.00000000e+00   2.00000000e+00   2.00000000e+00   2.00000000e+00
    2.00000000e+00]
 [  3.00000000e+00   3.00000000e+00   3.00000000e+00   3.00000000e+00
    3.00000000e+00]
 [  4.00000000e+00   4.00000000e+00   4.00000000e+00   4.00000000e+00
    4.00000000e+00]]
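
For anyone reading this on a recent NumPy with Python 3, the two displays above can be compared without changing the global print options. This is only a minimal sketch: it assumes a NumPy version that provides the np.printoptions context manager, the helper name render is made up for illustration, and the exact spacing of the output differs between NumPy versions.

```python
import numpy as np

# Same array as in the example above.
x = np.arange(5).reshape(5, 1)
y = np.ones(5).reshape(1, 5)
z = x * y
z[0, 0] += 0.0001


def render(arr, suppress):
    """Return str(arr) under a temporary `suppress` setting (hypothetical helper)."""
    # The context manager restores the previous print options on exit.
    with np.printoptions(suppress=suppress, precision=4):
        return str(arr)


compact = render(z, suppress=True)      # fixed-point display
scientific = render(z, suppress=False)  # scientific notation for this array

print(compact)
print(scientific)
```

Putting np.set_printoptions(suppress=True) in an interactive startup file would make the compact form the default for a whole session, which is presumably what the configuration used above does.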

Skipper

> (Strangely, it looks much more tolerable if x  =
> np.arange(1,6).reshape(5,1) instead.)
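
That observation is easy to check by rendering both arrays under identical settings. The sketch below assumes a recent NumPy with np.printoptions; the helper name build is hypothetical. A likely explanation, offered as an assumption rather than documented behaviour, is that the printer switches to scientific notation when the spread between the largest and smallest nonzero magnitudes is large, which no longer happens once the smallest entry is about 1.

```python
import numpy as np


def build(start):
    """Build the 5x5 example array whose first column value starts at `start`."""
    x = np.arange(start, start + 5).reshape(5, 1)
    y = np.ones(5).reshape(1, 5)
    z = x * y
    z[0, 0] += 0.0001
    return z


# Render both variants with small-number suppression turned off.
with np.printoptions(suppress=False, precision=8):
    from_zero = str(build(0))  # smallest nonzero entry is 0.0001
    from_one = str(build(1))   # smallest entry is 1.0001

print(from_zero)
print(from_one)
```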
>
> If you do the same thing in R,
>
> x = rep(0:4,5)
> x = matrix(x,ncol=5)
> x[1,1] = 0.000001
> x
>
> you get
>
>      [,1] [,2] [,3] [,4] [,5]
> [1,] 1e-06    0    0    0    0
> [2,] 1e+00    1    1    1    1
> [3,] 2e+00    2    2    2    2
> [4,] 3e+00    3    3    3    3
> [5,] 4e+00    4    4    4    4
>
> much more readable.)
>
>
> As a simple metric, my .r file was about 1/2 the size of the .py file,
> even though I couldn't do everything in python that I could in R.
> (These commands were meant to be entered interactively, so the length
> of the file is, perhaps, a more valid metric than usual to be
> concerned about.)
>
> -Chris Jordan-Squire
>
>
>> spirit of:
>>
>> Confront the Brutal Facts
>> http://en.wikipedia.org/wiki/Good_to_Great
>>
>> See you,
>>
>> Matthew
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>