TypeError: unhashable type: 'dict' when attempting to hash list - advice sought

Sun Aug 30 13:02:12 EDT 2015

On Saturday, August 29, 2015 at 11:04:53 PM UTC-4, Ben Finney wrote:
> kbtyo writes:
> 
> > I am using Jupyter Notebook and Python 3.4.
> 
> Thank you for saying so! It is not always required, but when it matters,
> this information is important to state up front.
> 
> > I have a data structure in the format, (type list):
> >
> > [{'AccountNumber': N,
> >   'Amount': '0',
> >   'Answer': '12:00:00 PM',
> >   'ID': None,
> >   'Type': 'WriteLetters',
> >   'Amount': '10'},
> >  {'AccountNumber': Y,
> >   'Amount': '0',
> >   'Answer': ' 12:00:00 PM',
> >   'ID': None,
> >   'Type': 'Transfer',
> >   'Amount': '2'}]
> >
> > The end goal is to write this out to CSV.
> 
> So that assumes that *every* item will be a mapping with all the same
> keys. CSV is limited to a sequence of "records" which all have the same
> fields in the same order.

This clue tipped me off that I wasn't collecting the newly generated key-value pairs from my XML parser properly. I was using the built-in dictionary method update to merge in the keys. The terrible thing was that the resulting dictionary only held the last keys and values. What a couple of hours of shut-eye can do for the mind and body.
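
To illustrate what I think was going wrong, here is a minimal sketch. The name parsed_pairs and its contents are made up for the example and only stand in for what my XML parser produces; they are not my real data.

```python
# Hypothetical stand-ins for the key-value pairs produced by the XML parser.
parsed_pairs = [
    {'Type': 'WriteLetters', 'Amount': '10'},
    {'Type': 'Transfer', 'Amount': '2'},
]

# What I was doing: update() replaces the value of any key that already
# exists, so only the last value seen for each key survives.
merged = {}
for pairs in parsed_pairs:
    merged.update(pairs)
print(merged)

# Collecting the values in lists keeps every occurrence instead.
collected = {}
for pairs in parsed_pairs:
    for key, value in pairs.items():
        collected.setdefault(key, []).append(value)
print(collected)
```

The second loop is only one way to keep every value; whether lists are the right container depends on how the rows are written out later.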

> 
> > The list comprehension "data" is to maintain the integrity of the
> > column headers and the values for each new instance of the data
> > structure (where the keys in the dictionary are the headers and values
> > - row instances). The keys in this specific data structure are meant
> > to check if there is a value instance, and if there is not - place an
> > ''.
> >
> 
> [...]
> >         for row in results:
> >             data = [row.get(index, '') for index in results]
> 
> The 'for' statement iterates over 'results', getting an item each time.
> The name 'row' is bound to each item in turn.
> 
> Then, each time through the 'for' loop, you iterate *again* over
> 'results'. The name 'index' is bound to each item.
> 
> You then attempt to use the dict (each item from 'results' is itself a
> dict) as a key into that same dict. A dict is not a valid key; it is not
> a "hashable type" (i.e. a type with a fixed value, that can produce a
> hash of the value).

I discovered that. I need to iterate again to access the keys and values. 
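
For anyone following along, this small sketch reproduces the error and the fix. The two dictionaries in results are simplified placeholders, not my actual records.

```python
# Simplified placeholder records standing in for my real 'results' list.
results = [
    {'Type': 'WriteLetters', 'Amount': '10'},
    {'Type': 'Transfer', 'Amount': '2'},
]

row = results[0]

# Iterating over 'results' yields dicts, and a dict cannot be used as a key.
try:
    data = [row.get(index, '') for index in results]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Iterating over the row itself yields its keys, which are hashable strings.
data = [row.get(key, '') for key in row]
print(data)
```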
> 
> So you're getting dicts and attempting to use those dicts as keys into
> dicts. That will give the error "TypeError: unhashable type: 'dict'".
> 
> I think what you want is not items from the original sequence, but the
> keys from the mapping::
> 
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
> 
> But you're then throwing away the constructed list, since you do nothing
> with it before the end of the loop.
> 
> >         writer.writerow(data)
> 
> This statement occurs only *after* all the items from 'results' have
> been iterated. You will only have the most recent constructed row.
> 
> Perhaps you want::
> 
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
>         writer.writerow(output_record)
> 

I tried this; some of the values kept their integrity, but others were overwritten by a previous dictionary's values.

> -- 
>  \       "An idea isn't responsible for the people who believe in it." |
>   `\                                      --Donald Robert Perry Marquis |
> _o__)                                                                  |
> Ben Finney

@BenFinney:

I feel that I need to provide some context to avoid any confusion over why I chose this approach.

My original task was to parse an XML data structure stored in a CSV file alongside other data types, and then add the elements back as headers and their text as row values. I went back to the drawing board and created a "results" list of dictionaries whose values are lists, using this:

def convert_list_to_dict(get_just_xml_data):
    d = {}
    for item in get_just_xml_data(get_all_data):
        for k, v in item.items():
            try:
                d[k].append(v)
            except KeyError:
                d[k] = [v]
    return d 

This creates a dictionary with one entry per XML tag, for example:
{
 'Number1': ['0'],
 'Number2': ['0'],
 'Number3': ['0'],
 'Number4': ['0'],
 'Number5': ['0'],
 'RepgenName': [None],
 'RTypes': ['Execution', 'Letters'],
 'RTID': ['3', '5']}
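
Since my function above depends on helpers I have not shown, here is a self-contained sketch of the same grouping idea. The function name collect_values and the sample_items list are hypothetical and exist only for this example; it uses collections.defaultdict in place of the try/except.

```python
from collections import defaultdict


def collect_values(items):
    """Group the values of every key across a sequence of dicts."""
    grouped = defaultdict(list)
    for item in items:
        for key, value in item.items():
            # A missing key starts out as an empty list automatically.
            grouped[key].append(value)
    return dict(grouped)


# Hypothetical parsed XML items; the second one has no 'Number1' element.
sample_items = [
    {'Number1': '0', 'RTypes': 'Execution', 'RTID': '3'},
    {'RTypes': 'Letters', 'RTID': '5'},
]

grouped = collect_values(sample_items)
print(grouped)
```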

I then used this to create a "headers" set (to prevent duplicates from being added) and the list of dictionaries that I mentioned in my OP.

I achieve this via:

import csv

# Build the set of all column headers (a set, so no header is added twice).
def construct_headers(convert_list_to_dict):
    header = set()
    # newline='' is what the csv module expects in Python 3; 'U' mode is deprecated.
    with open('real.csv', newline='') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)  # get_just_xml_data(get_all_data)
            row.update(xml_data)
            header.update(row.keys())
    return header

# Build the list of row dictionaries, each merged with the parsed XML data.
def construct_results(convert_list_to_dict):
    results = []
    with open('real.csv', newline='') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)  # get_just_xml_data(get_all_data)
            row.update(xml_data)
            results.append(row)
    return results

I guess I am using the headers list as originally written out. My initial thought is to just write out the values corresponding to each transaction. For example, citing this data structure:

{
 'Number1': ['0'],
 'Number2': ['0'],
 'Number3': ['0'],
 'Number4': ['0'],
 'Number5': ['0'],
 'RPN': [None],
 'RTypes': ['Execution', 'Letters'],
 'RTID': ['3', '5']}

I would want to get a CSV like this:

Number1, Number2, Number3, Number4, Number5, RPN, RTypes, RTID

0, 0, 0, 0, 0, None, Execution, 3
None, None, None, None, None, None, Letters, 5
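
One way I could imagine turning a dictionary of lists into rows like these is itertools.zip_longest, which pads the shorter lists. This is only a sketch with a trimmed-down copy of the structure above, not something I have run on my real data.

```python
from itertools import zip_longest

# Trimmed-down version of the dictionary of lists shown above.
xml_data = {
    'Number1': ['0'],
    'RTypes': ['Execution', 'Letters'],
    'RTID': ['3', '5'],
}

# Fix the column order once so every row uses the same order.
header = sorted(xml_data)

# zip_longest walks the lists in parallel and fills the gaps with None.
columns = (xml_data[name] for name in header)
rows = [dict(zip(header, values))
        for values in zip_longest(*columns, fillvalue=None)]

for row in rows:
    print(row)
```

Using '' instead of None as the fill value would leave the missing cells empty in the CSV.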

I am wondering how I would achieve this when the headers set is unordered (should I sort it before writing this out?). Also, since I have millions of transactions, I want to make sure that the values for each header are placed in the correct column, row after row. Any guidance would be very helpful. Thanks.
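
My current thinking, as a sketch only: sort the header set once into a list and hand it to csv.DictWriter, which looks every value up by key, so the column order should not depend on the order inside each dictionary. The records below are invented, and I write to an in-memory io.StringIO buffer here instead of a real file.

```python
import csv
import io

# Invented records; the second one is missing the 'Amount' key.
results = [
    {'AccountNumber': '1', 'Type': 'WriteLetters', 'Amount': '10'},
    {'AccountNumber': '2', 'Type': 'Transfer'},
]

# Collect every key seen in any record, without duplicates.
header = set()
for row in results:
    header.update(row.keys())

# A set has no defined order, so sort it once to get a stable column order.
fieldnames = sorted(header)

buffer = io.StringIO()
# restval='' fills in any column that a record does not have.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
for row in results:
    writer.writerow(row)

output = buffer.getvalue()
print(output)
```

Because the rows are written one at a time, this should not need all of the output in memory at once when writing to a real file.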