[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Dennis Lee Bieber wlfraed at ix.netcom.com
Sun Jul 3 16:41:59 EDT 2022


On Sun, 3 Jul 2022 22:29:41 +0530, Manprit Singh
<manpritsinghece at gmail.com> declaimed the following:

>Now as sum is an aggregate function, same way population standard
>deviation is also an aggregate function. We should be able to make a
>user defined function
>

	It is also superfluous: SQLite3 already has count(), sum() and even
avg() built-in (though it lacks many of the bigger statistical computations
-- variance, std. dev, covariance, correlation, linear regression -- that
many of the bigger client/server RDBMs support).

>
>for getting mean.I am not getting the mechanism to subtract the mean
>from each value of the column in the same step method or by any other
>way
>
	Note that the definition for creating aggregates includes something for
number of arguments. Figure out how to specify multiple arguments and you
might be able to have SQLite3 provide "current item" and "mean" (avg) to
the step() method. I'm not going to take the time to experiment (for the
most part, I'd consider it simpler to just grab the entire dataset from the
database, and run the number crunching in Python, rather than the overhead
of having SQLite3 invoke a Python "callback" method for each item, just to
be able to have the SQLite3 return a single computed value.


>Btw I would like to write a one liner to calculate population std
>deviation  of a list:
>lst = [2, 5, 7, 9, 10]
>mean = sum(lst)/len(lst)
>std_dev = (sum((ele-mean)**2 for ele in lst)/len(lst))**0.5
>print(std_dev)

	Literally, except for the imports, that is just...

print(statistics.pstdev(lst))

>>> import math as m
>>> import statistics as s
>>> lst = [2, 5, 7, 9, 10]
>>> print(s.pstdev(lst))
2.870540018881465
>>> 

	Going up a level in complexity (IE -- not using the imported pstdev())

>>> print("Population Std. Dev.: %s" % m.sqrt( s.mean( (ele - s.mean(lst)) ** 2 for ele in lst)))
Population Std. Dev.: 2.870540018881465
>>> 

	This has the problem  that it invokes mean(lst) for each element, so
may be slower for large data sets (that problem will also exist if you
manage a multi-argument step() for SQLite3).

	Anytime you have

		sum(equation-with-elements-of-data) / len(data)

you can replace it with just

		mean(equation...)


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/



More information about the Tutor mailing list