Pandas or Numpy

Sun Jan 23 12:21:26 EST 2022

On Sun, 23 Jan 2022 07:34:26 -0800, Tobiah <toby at tobiah.org> declaimed the
following:

	I'm going to do a little rearranging of your paragraphs, since most of
them are domain specific, whereas the last (original) paragraph actually
gets to a core...

	Caveat: I've not written anything making use of either package, so my
only basis for commenting is what I've read on web sites (like the pandas
documentation site).

>
>It seems like both libraries are possible choices.  Would one
>be the obvious choice for me?
>
	pandas USES numpy internally but expands on it...

https://www.geeksforgeeks.org/difference-between-pandas-vs-numpy/
"""
A numpy array is a grid of values (of the same type) that are indexed by a
tuple of positive integers,
"""
Pandas provide high performance, fast, easy to use data structures and data
analysis tools for manipulating numeric data and time series.
"""

	Pandas, I believe, might get closer to what one might find in
statistical packages (like R) in that it supports tables/data-frames in
which each column may be a different data type. I don't know if it actually
has the statistical concept of "factors" (e.g., a column containing
"male"/"female" is not really a text column but closer to an enumeration
type).
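
	For what it's worth, the pandas documentation does describe a
"category" dtype that looks like the nearest equivalent to an R factor. A
minimal sketch of a mixed-type table follows; the column names and values
are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "sex": ["female", "male", "male", "female"],
    "age": [34, 28, 45, 51],
    "score": [88.5, 92.0, 79.25, 85.0],
})

# Each column keeps its own dtype (text, integer, float).
print(df.dtypes)

# Convert the text column to a categorical, roughly analogous to an R factor.
df["sex"] = df["sex"].astype("category")

print(df["sex"].cat.categories)  # the distinct levels
print(df["sex"].cat.codes)       # the integer code stored for each row
```

	Internally the categorical column stores small integer codes plus a
lookup of the levels, which is essentially the enumeration idea.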

>I need to compose large (hundreds, thousands, maybe millions) lists
>and be able to do math on, or possibly sort by various columns, among other
>operations.  A common requirement would be to do the same math operation
>on each value in a column, or redistribute the values according to an
>exponential curve, etc.

	En masse operations should be supported; not sure about the
"redistribute" -- if you can define a function that takes one input
parameter (the existing value) and returns the redistributed value, I'd
think it should be feasible.
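
	A rough sketch of both ideas, under assumptions: the column names,
the sample values, and the exp_curve() function (with its steepness
parameter k) are all hypothetical, and the curve is only one guess at what
"redistribute according to an exponential curve" might mean.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, invented for illustration only.
df = pd.DataFrame({
    "start": [0.0, 1.0, 2.0, 3.0, 4.0],
    "amp": [0.0, 0.25, 0.5, 0.75, 1.0],
})

# The same math operation applied to every value in a column.
df["start"] = df["start"] * 2.0


def exp_curve(x, k=4.0):
    """Hypothetical remapping of 0..1 onto 0..1 along an exponential curve."""
    return (np.exp(k * x) - 1.0) / (np.exp(k) - 1.0)


# Because exp_curve() uses only NumPy arithmetic, it accepts a whole column.
df["amp_vectorized"] = exp_curve(df["amp"])

# A one-value-in, one-value-out function can also be mapped element by element.
df["amp_mapped"] = df["amp"].map(exp_curve)

print(df)
```

	Both forms give the same values; the whole-column form is generally
the one to prefer when the function can be written with array arithmetic.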

>
>One wrinkle is that the first column of a Csound score is actually a
>single character.  I was thinking if the data types all had to be the
>same, then I'd make a translation table or just use the ascii value
>of the character, but if I could mix types that might be a smidge better.
>

	Based upon the comparison I linked, pandas should be applicable for
this. For pure numpy, you'd likely be better off maintaining a separate
list (though sorting will require some tricks to keep the numpy array in
sync with the character list).
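
	One way the "tricks" might look, as a sketch only: numpy.argsort()
returns the row order, which can then be applied to both containers. The
rows below are made up and are not a real Csound score; the column names
p1..p3 are likewise just placeholders. The pandas version is shown next to
it for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one character per row, kept apart from the numbers.
# A NumPy string array is used so that it can be reordered by index array;
# a plain Python list would need [ops[i] for i in order] instead.
ops = np.array(["i", "f", "i", "i"])
nums = np.array([
    [1.0, 2.0, 0.5],
    [1.0, 0.0, 4096.0],
    [2.0, 0.5, 1.0],
    [1.0, 1.0, 0.25],
])

# Pure NumPy: compute the sort order from one column, then apply that
# same order to both containers so they stay in sync.
order = np.argsort(nums[:, 1], kind="stable")
ops_sorted = ops[order]
nums_sorted = nums[order]

# pandas: the character column lives in the same table, so one call
# reorders everything together.
df = pd.DataFrame(nums, columns=["p1", "p2", "p3"])
df.insert(0, "op", ops)
df_sorted = df.sort_values("p2", kind="stable").reset_index(drop=True)

print(ops_sorted)
print(nums_sorted)
print(df_sorted)
```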

	Note that the comparison warns that /indexing/ in pandas can be slow.
If your manipulation is always "apply operationX to columnY" it should be
okay -- but "apply operationX to the nth row of columnY", and repeat for
other rows, is going to be slow.
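
	To make the two access patterns concrete, here is a small sketch
(the column name "amp" and the halving operation are arbitrary choices for
illustration). Both produce the same values; no timings are claimed here,
only the difference in shape between the two approaches.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column table, invented for illustration only.
df = pd.DataFrame({"amp": np.linspace(0.0, 1.0, 1000)})

# "Apply operationX to columnY": one expression over the whole column.
by_column = df["amp"] * 0.5

# "Apply operationX to the nth row of columnY", repeated for every row:
# each pass through the loop performs a separate indexed read and write.
by_row = df["amp"].copy()
for n in range(len(by_row)):
    by_row.iloc[n] = by_row.iloc[n] * 0.5

print(np.allclose(by_column, by_row))
```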

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/