[Tutor] updating a dictionary
Peter Otten
__peter__ at web.de
Fri Feb 20 11:59:36 CET 2015
Chris Stinemetz wrote:
> Here is a sample of the input data, it is tab delimited and I chopped it
> down for example purposes:
>
>
> KSL03502_7A_1 11.5921
> KSL03502_7B_1 46.4997
> KSL03502_7C_1 13.5839
> KSL03505_7A_1 12.8684
> KSL03505_7B_1 16.5311
> KSL03505_7C_1 18.9926
> KSL03509_7A_1 3.4104
> KSL03509_7B_1 40.6244
> KSL03509_7C_1 51.0597
> KSL03511_7A_1 7.128
> KSL03511_7B_1 53.4401
> KSL03511_7C_1 66.2584
> KSL03514_2A_1 25.6476
> KSL03514_2B_1 53.17
> KSL03514_2C_1 11.6469
> KSL03514_7A_1 39.2292
> KSL03514_7B_1 65.675
> KSL03514_7C_1 3.4937
>
>
> I would like to parse it by using a dictionary structure, where each row
> would be something like:
>
> name 7,8,9,2
> KSL03514_C,3.4937,,,11.6469
> KSL03514_B,65.675,,,53.17
>
> I am just showing an example of what KSL03514_7C_1, KSL03514_2C_1,
> KSL03514_7B_1, KSL03514_2B_1 would parse.
>
> Hope this helps explain what I am trying to accomplish.
You need to merge multiple lines into one row dict, and you'll end up with
multiple such rowdicts. The easiest way to keep them around is to put them
into an outer dict that maps keys like "KSL03514_B" to the corresponding
rowdict. The rowdict for "KSL03514_B" will start with
{'2': '53.17', 'name': 'KSL03514_B'}
in line
> KSL03514_2B_1 53.17
and be updated to
{'7': '65.675', '2': '53.17', 'name': 'KSL03514_B'}
when line
> KSL03514_7B_1 65.675
is encountered. The "name" item is redundant because it's the same as the
key in the outer dict
{'KSL03502_A': {'7': '11.5921', 'name': 'KSL03502_A'},
 'KSL03502_B': {'7': '46.4997', 'name': 'KSL03502_B'},
 ...
 'KSL03514_B': {'2': '53.17', '7': '65.675', 'name': 'KSL03514_B'},
 'KSL03514_C': {'2': '11.6469', '7': '3.4937', 'name': 'KSL03514_C'}}
but it simplifies generating the resulting file.
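To make the merge step concrete, here is a minimal sketch of just the
setdefault() part. It assumes the lines have already been split into
(name, prefix, value) tuples; the `parsed` list and the `snapshots` list are
made up for illustration only.

```python
# Hypothetical pre-parsed input: (name, prefix, value) per line.
parsed = [
    ("KSL03514_B", "2", "53.17"),
    ("KSL03514_B", "7", "65.675"),
    ("KSL03514_C", "2", "11.6469"),
]

rows_by_name = {}   # outer dict: name -> rowdict
snapshots = []      # copy of the rowdict after each update, for inspection

for name, prefix, value in parsed:
    # First time we see a name: create {"name": name} and store it.
    # Later times: get back the rowdict that is already stored.
    rowdict = rows_by_name.setdefault(name, {"name": name})
    rowdict[prefix] = value
    snapshots.append(dict(rowdict))

for snapshot in snapshots:
    print(snapshot)
```

The first two snapshots show the "KSL03514_B" rowdict growing from one column
to two, while the outer dict still holds only one entry for that name.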
If you want to cheat, here's the code I came up with:
import csv
import operator
import sys
import logging

logger = logging.getLogger()


def read_data(infile):
    """Combine lines in infile with same <name> into one dict.

    Returns a sorted list of such dicts.

    Expected line format:
        <basename>_<prefix><suffix>_<don't care><whitespace><value><newline>
    where
        <prefix> digits only
        <suffix> non-digit followed by any non-"_"
    Then
        <name> = <basename>_<suffix>
    """
    # map <name> to rowdict
    # rowdict maps <prefix> to <value> and "name" to <name>
    rows_by_name = {}
    for line in infile:
        # key format:
        # <basename>_<prefix><suffix>_<don't care>
        key, value = line.split()
        basename, both, dummy = key.split("_")
        suffix = both.lstrip("0123456789")
        prefix = both[:len(both)-len(suffix)]
        name = basename + "_" + suffix
        rowdict = rows_by_name.setdefault(name, {"name": name})
        if prefix in rowdict:
            # we are going to overwrite a column value
            # may raise an exception instead
            logger.warning("duplicate column %s=%r for %s",
                           prefix, value, name)
        rowdict[prefix] = value
    return sorted(rows_by_name.values(), key=operator.itemgetter("name"))


def main():
    logging.basicConfig()

    with open("PRB_utilization.txt") as infile:
        rows = read_data(infile)

    writer = csv.DictWriter(
        sys.stdout,  # may replace stdout with any writable file object
        fieldnames=["name", "7", "8", "9", "2"]
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    main()
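
In case it is not obvious why the rowdicts may lack some of the columns:
csv.DictWriter fills in an empty string for every fieldname that is missing
from a row (that is the default of its restval argument). A small sketch,
writing into an io.StringIO instead of stdout so the result can be inspected;
the two rows are copied from the outer dict shown earlier.

```python
import csv
import io

rows = [
    {"name": "KSL03514_B", "2": "53.17", "7": "65.675"},
    {"name": "KSL03514_C", "2": "11.6469", "7": "3.4937"},
]

out = io.StringIO()  # stands in for sys.stdout or a real file
writer = csv.DictWriter(out, fieldnames=["name", "7", "8", "9", "2"])
writer.writeheader()
writer.writerows(rows)  # missing "8" and "9" become empty fields

text = out.getvalue()
print(text)
```

The data lines come out as KSL03514_B,65.675,,,53.17 and
KSL03514_C,3.4937,,,11.6469, which is the layout asked for in the original
post.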