frequency analysis of a DB column

John Machin sjmachin at lexicon.net
Thu Aug 2 20:01:34 EDT 2007


On Aug 2, 12:21 pm, goldtech <goldt... at worldpost.com> wrote:
> In Python 2.1 are there any tools to take a column from a DB and do a
> frequency analysis - a breakdown of the values for this column?
>
> Possibly a histogram or a table saying out of 500 records I have one
> hundred and two "301" ninety-eight "212" values and three-hundred
> "410"?
>
> Is SQL the way to for this?

Umm yes, screamingly obviously, I would have thought, given that you
have to use SQL anyway to get the data out of the database ....
something like this, with easy embellishment with where, having and
order by clauses:

select col_name, count(*)
from table_name
where yadda yadda etc if necessary/desired
group by col_name
having count(*) >= some_threshold
order by count(*) desc

Apart from ease of writing, this puts the load on the server, not the
client, and saves network traffic -- why transmit thousands/millions
of data items around the building or across town when you don't have
to? All you have to do in Python is add up the frequencies -- better
than doing a "select count(*) from table_name".

If you want to do something smart in Python, package the above SQL up
so that table_name and col_name are parameters ... you could then have
a handy little command-line gadget ready for those moments when you
need to find out things like how many different mis-spellings of
"developer" there are in the HR database :-)

>
> Of course there'd be 1000's of values....
>
> I'm not asking about connecting to the DB, just tools to collate and
> tally this sort of thing.
>
> Thanks,
> Lee G.





More information about the Python-list mailing list