frequency analysis of a DB column

Paul McGuire ptmcg at austin.rr.com
Wed Aug 1 23:35:54 EDT 2007


On Aug 1, 9:21 pm, goldtech <goldt... at worldpost.com> wrote:
> In Python 2.1 are there any tools to take a column from a DB and do a
> frequency analysis - a breakdown of the values for this column?
>
> Possibly a histogram or a table saying out of 500 records I have one
> hundred and two "301" ninety-eight "212" values and three-hundred
> "410"?
>
> Is SQL the way to for this?
>
> Of course there'd be 1000's of values....
>
> I'm not asking about connecting to the DB, just tools to collate and
> tally this sort of thing.
>
> Thanks,
> Lee G.

If you use ORDER BY in your SQL to give you sorted data, you can
easily feed it to the itertools.groupby method, following the one-line
pattern in this example:

from itertools import groupby
from random import choice

# create some random data and sort it
cards = sorted([choice("ABCDEF") for i in range(100)])

# create histogram using groupby on sorted data
# histogram is represented as a list of pair tuples,
# containing (freq,value)
cardGrps = [(len(list(g[1])),g[0]) for g in groupby(cards)]

# sort histogram by descending freq to create Pareto
cardGrps.sort(reverse=True)

print cardGrps

Prints:
[(22, 'C'), (19, 'B'), (16, 'F'), (16, 'E'), (14, 'A'), (13, 'D')]


-- Paul




More information about the Python-list mailing list