[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Manprit Singh manpritsinghece at gmail.com
Sun Jul 3 00:23:05 EDT 2022


Yes it is obviously a homework kind of thing ....I do create problems for
myself , try to solve and  try to find better ways .

I was trying to learn the use of create- aggregate() .

Thank you for the hint that is given by you . Let me try .

My purpose is not to find the std dev. It is actually to learn how to use
the functions


On Sun, 3 Jul, 2022, 09:28 , <avi.e.gross at gmail.com> wrote:

> Maybe a dumb question but why the need to do a calculation of a standard
> deviation so indirectly in SQL and in the database but starting from
> Python?
>
> R has a built-in function that calculates a standard deviation. You can
> easily save it where you want after.
>
> As for memory use in general, there are several ways to calculate a
> standard
> deviation but there is a tradeoff. You could read in an entry at a time and
> add it to a continuing sum while keeping track of the number of entries.
> You
> then calculate the mean. Then you can read it al in AGAIN and calculate the
> difference between each number and the mean, and do the rest of the
> calculation by squaring that and so on as you sum that and finally play
> with
> a division and a square root.
>
> But that may not be needed except with large amounts of data.
>
> What am I missing? Is this an artificial HW situation?
>
>
> -----Original Message-----
> From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of
> Manprit Singh
> Sent: Saturday, July 2, 2022 10:50 PM
> To: tutor at python.org
> Subject: [Tutor] toy program to find standard deviation of 2 columns of a
> sqlite3 database
>
> Dear sir ,
>
> I have tried writing a program in which I am calculating the population
> standard deviation of two columns X1  & X2 of a table of sqlite3  in -
> memory database .
> import sqlite3
> import statistics
>
> class StdDev:
>     def __init__(self):
>         self.lst = []
>
>     def step(self, value):
>         self.lst.append(value)
>
>     def finalize(self):
>         return statistics.pstdev(self.lst)
>
>
> con = sqlite3.connect(":memory:")
> cur = con.cursor()
> cur.execute("create table table1(X1 int, X2 int)") ls = [(2, 4),
>       (3, 5),
>       (4, 7),
>       (5, 8)]
> cur.executemany("insert into table1 values(?, ?)", ls)
> con.commit()
> con.create_aggregate("stddev", 1, StdDev) cur.execute("select stddev(X1),
> stddev(X2) from table1")
> print(cur.fetchone())
> cur.close()
> con.close()
>
> prints the output as :
>
> (1.118033988749895, 1.5811388300841898)
>
> which is correct .
>
> My question is, as you can see i have used list inside the class StdDev,
> which
>
> I think is an inefficient way to do this kind of problem because there may
> be
>
> a large number of values in a column and it can take a huge amount of
> memory.
>
> Can this problem be solved with the use of iterators ? What would be the
> best
>
> approach to do it ?
>
> Regards
>
> Manprit Singh
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
>


More information about the Tutor mailing list