[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Sun Jul 3 03:54:50 EDT 2022

Dear Sir,

I have chosen this standard deviation as an exercise because there are two
steps: first you have to find the mean, then subtract the mean from each
value of the column .

Writing an aggregate function for this using python's sqlite3 seems a
little difficult as there is only single step function inside the class,
used to make that . Kindly put some light .

This is just an exercise to understand how this create_aggregate() works .
Kindly help
Regards
Manprit Singh

On Sun, Jul 3, 2022 at 9:53 AM Manprit Singh <manpritsinghece at gmail.com>
wrote:

> Yes it is obviously a homework kind of thing ....I do create problems for
> myself , try to solve and  try to find better ways .
>
> I was trying to learn the use of create- aggregate() .
>
> Thank you for the hint that is given by you . Let me try .
>
> My purpose is not to find the std dev. It is actually to learn how to use
> the functions
>
>
> On Sun, 3 Jul, 2022, 09:28 , <avi.e.gross at gmail.com> wrote:
>
>> Maybe a dumb question but why the need to do a calculation of a standard
>> deviation so indirectly in SQL and in the database but starting from
>> Python?
>>
>> R has a built-in function that calculates a standard deviation. You can
>> easily save it where you want after.
>>
>> As for memory use in general, there are several ways to calculate a
>> standard
>> deviation but there is a tradeoff. You could read in an entry at a time
>> and
>> add it to a continuing sum while keeping track of the number of entries.
>> You
>> then calculate the mean. Then you can read it al in AGAIN and calculate
>> the
>> difference between each number and the mean, and do the rest of the
>> calculation by squaring that and so on as you sum that and finally play
>> with
>> a division and a square root.
>>
>> But that may not be needed except with large amounts of data.
>>
>> What am I missing? Is this an artificial HW situation?
>>
>>
>> -----Original Message-----
>> From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of
>> Manprit Singh
>> Sent: Saturday, July 2, 2022 10:50 PM
>> To: tutor at python.org
>> Subject: [Tutor] toy program to find standard deviation of 2 columns of a
>> sqlite3 database
>>
>> Dear sir ,
>>
>> I have tried writing a program in which I am calculating the population
>> standard deviation of two columns X1  & X2 of a table of sqlite3  in -
>> memory database .
>> import sqlite3
>> import statistics
>>
>> class StdDev:
>>     def __init__(self):
>>         self.lst = []
>>
>>     def step(self, value):
>>         self.lst.append(value)
>>
>>     def finalize(self):
>>         return statistics.pstdev(self.lst)
>>
>>
>> con = sqlite3.connect(":memory:")
>> cur = con.cursor()
>> cur.execute("create table table1(X1 int, X2 int)") ls = [(2, 4),
>>       (3, 5),
>>       (4, 7),
>>       (5, 8)]
>> cur.executemany("insert into table1 values(?, ?)", ls)
>> con.commit()
>> con.create_aggregate("stddev", 1, StdDev) cur.execute("select stddev(X1),
>> stddev(X2) from table1")
>> print(cur.fetchone())
>> cur.close()
>> con.close()
>>
>> prints the output as :
>>
>> (1.118033988749895, 1.5811388300841898)
>>
>> which is correct .
>>
>> My question is, as you can see i have used list inside the class StdDev,
>> which
>>
>> I think is an inefficient way to do this kind of problem because there may
>> be
>>
>> a large number of values in a column and it can take a huge amount of
>> memory.
>>
>> Can this problem be solved with the use of iterators ? What would be the
>> best
>>
>> approach to do it ?
>>
>> Regards
>>
>> Manprit Singh
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>>