Summary grid

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Wed Jun 22 11:43:19 EDT 2016


Jignesh Sutar writes:

> Say I have list of data as given in the example code below, I want to find
> all the unique categories (alphabetic letters) and unique IDs (numbers) and
> then produce a summary grid as manually entered in the "results". How could
> I code this?
>
> Many thanks in advance,
> Jignesh
>
>
> data= ["A.1", "A.2", "A.3", "B.1", "C.2", "C.3",  "D.4", "E.5", "E.6"]
>
> cols=[]
> rows=[]
> for item in data:
>     i=item.split(".")
>     if i[0] not in cols: cols.append(i[0])
>     if i[1] not in rows: rows.append(i[1])

Instead of indexing, consider:

    it, em = item.split(".")

(Except with names like col, row :)
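
Spelled out with those names, your loop might look like the sketch below
(written for Python 3). It assumes every item contains exactly one ".";
anything else makes the unpacking raise ValueError.

```python
data = ["A.1", "A.2", "A.3", "B.1", "C.2", "C.3", "D.4", "E.5", "E.6"]

cols = []
rows = []
for item in data:
    # Tuple unpacking names both parts at once; it raises ValueError
    # if an item does not split into exactly two pieces.
    col, row = item.split(".")
    if col not in cols:
        cols.append(col)
    if row not in rows:
        rows.append(row)

print(cols)
print(rows)
```

The names are collected in order of first appearance, as in your version.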

> print cols
> print rows
>
> results=
> [["Row/Col", "A", "B", "C", "D", "E"],
> [1, 1, 1, 0, 0, 0],
> [2, 1, 0, 1, 0, 0],
> [3, 1, 0, 1, 0, 0],
> [4, 0, 0, 0, 1, 0],
> [5, 0, 0, 0, 0, 1],
> [6, 0, 0, 0, 0, 1]]

Once you have the column and row names, the key structure is a pair of
nested loops. Each row is constructed by looping over the column names.
The table itself is constructed by looping over the row names.
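
Written out explicitly, the nested loops could look like the following
sketch. The names pairs, line and table are only illustrative, and the
set of pairs is one assumed way to test whether a combination occurs.
Note that the names are sorted as strings, so an ID like "10" would sort
before "2".

```python
data = ["A.1", "A.2", "A.3", "B.1", "C.2", "C.3", "D.4", "E.5", "E.6"]

# Remember every (column, row) combination that occurs in the data.
pairs = set()
for item in data:
    col, row = item.split(".")
    pairs.add((col, row))

colnames = sorted({col for col, row in pairs})
rownames = sorted({row for col, row in pairs})

table = []
for row in rownames:          # outer loop: one table row per row name
    line = []
    for col in colnames:      # inner loop: one cell per column name
        line.append(1 if (col, row) in pairs else 0)
    table.append(line)

for row, line in zip(rownames, table):
    print(row, line)
```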

The following counts the items with collections.Counter, which is good
to know about, and extracts the names from the keys. (A key cannot be a
list, because lists are not hashable, but it can be a tuple; the original
string would also work.) Looking up how many times an item occurred is
then a single dict lookup. You *can* get the same answer by querying
your original list, but that gets slower as the list grows.

A Counter is a special kind of dict.
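
A small sketch of what that means in practice (the sample items here are
made up for illustration): a key that was never counted reads as zero
instead of raising KeyError, and the lookup does not add it.

```python
from collections import Counter

counts = Counter(["A.1", "A.2", "A.1"])

print(isinstance(counts, dict))  # a Counter is a dict subclass
print(counts["A.1"])             # counted twice
print(counts["B.7"])             # never seen: reads as 0, no KeyError
print("B.7" in counts)           # the lookup above did not add the key
```

That zero for missing keys is what lets a table be filled in without
first checking whether each combination exists.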

This program uses Python 3 (you are still on 2) and a higher-level idiom
than yours. It hides the crucial nested loops inside list comprehensions
and doesn't include the names in the resulting table, but you should get
the idea and be able to adapt it.

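    from collections import Counter

    # data is the list of "letter.number" strings from your message.
    # Count each (column, row) pair; tuples are hashable, so they can be keys.
    contents = Counter(tuple(datum.split('.')) for datum in data)

    # Unique column and row names, taken from the keys and sorted as strings.
    colnames = sorted(set(k for k, r in contents.keys()))
    rownames = sorted(set(r for k, r in contents.keys()))

    # One list per row name, one cell per column name; missing pairs count 0.
    table = [[contents[k, r] for k in colnames] for r in rownames]

    for r, row in zip(rownames, table):
        print(r, row)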
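
To get the names into the table as in your "results", one possible
adaptation is sketched below. It assumes the row IDs are always integers
(so they can be sorted and stored as numbers); the name results is taken
from your message, the rest follows the program above.

```python
from collections import Counter

data = ["A.1", "A.2", "A.3", "B.1", "C.2", "C.3", "D.4", "E.5", "E.6"]

contents = Counter(tuple(datum.split('.')) for datum in data)
colnames = sorted({k for k, r in contents})
# Assumption: row IDs are integers, so sort them numerically.
rownames = sorted({r for k, r in contents}, key=int)

# Header row first, then one labelled row per row name.
results = [["Row/Col"] + colnames]
for r in rownames:
    results.append([int(r)] + [contents[k, r] for k in colnames])

for line in results:
    print(line)
```

The cells hold counts, so an item that occurs twice in data would show up
as 2 rather than 1.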


