TypeError: unhashable type: 'dict' when attempting to hash list - advice sought
kbtyo
ahlusar.ahluwalia at gmail.com
Sun Aug 30 13:02:12 EDT 2015
On Saturday, August 29, 2015 at 11:04:53 PM UTC-4, Ben Finney wrote:
> kbtyo writes:
>
> > I am using Jupyter Notebook and Python 3.4.
>
> Thank you for saying so! It is not always required, but when it matters,
> this information is important to state up front.
>
> > I have a data structure in the format, (type list):
> >
> > [{'AccountNumber': N,
> > 'Amount': '0',
> > 'Answer': '12:00:00 PM',
> > 'ID': None,
> > 'Type': 'WriteLetters',
> > 'Amount': '10'},
> > {'AccountNumber': Y,
> > 'Amount': '0',
> > 'Answer': ' 12:00:00 PM',
> > 'ID': None,
> > 'Type': 'Transfer',
> > 'Amount': '2'}]
> >
> > The end goal is to write this out to CSV.
>
> So that assumes that *every* item will be a mapping with all the same
> keys. CSV is limited to a sequence of "records" which all have the same
> fields in the same order.
This clue tipped me off that I wasn't collecting the newly generated key-value pairs from my XML parser properly. I was using the built-in dictionary method update to update the keys. The terrible thing was that the returned dictionary held only the last keys and values. What a couple of hours of shut-eye can do for the mind and body.
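A minimal sketch of that pitfall, using made-up tag names rather than the real data: dict.update replaces the value of any key that already exists, so repeated tags leave only the last value, whereas appending to a list per key keeps them all.

```python
# Hypothetical parser output: the same tags appear in two parsed fragments.
parsed = [
    {"RTID": "3", "RTypes": "Execution"},
    {"RTID": "5", "RTypes": "Letters"},
]

# dict.update overwrites existing keys, so only the last fragment survives.
overwritten = {}
for fragment in parsed:
    overwritten.update(fragment)

# Collecting values in a list per key keeps every occurrence.
collected = {}
for fragment in parsed:
    for key, value in fragment.items():
        collected.setdefault(key, []).append(value)
```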
>
> > The list comprehension "data" is to maintain the integrity of the
> > column headers and the values for each new instance of the data
> > structure (where the keys in the dictionary are the headers and values
> > - row instances). The keys in this specific data structure are meant
> > to check if there is a value instance, and if there is not - place an
> > ''.
> >
>
> [...]
> > for row in results:
> >     data = [row.get(index, '') for index in results]
>
> The 'for' statement iterates over 'results', getting an item each time.
> The name 'row' is bound to each item in turn.
>
> Then, each time through the 'for' loop, you iterate *again* over
> 'results'. The name 'index' is bound to each item.
>
> You then attempt to use the dict (each item from 'results' is itself a
> dict) as a key into that same dict. A dict is not a valid key; it is not
> a "hashable type" (i.e. a type with a fixed value, that can produce a
> hash of the value).
I discovered that. I need to iterate again to access the keys and values.
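For reference, the error can be reproduced in a few lines. This is only a sketch with invented records; the point is that iterating over 'results' yields whole dicts, while iterating over a single row yields its keys.

```python
# Illustrative records, not the real data.
results = [
    {"AccountNumber": 1, "Type": "WriteLetters"},
    {"AccountNumber": 2, "Type": "Transfer"},
]

row = results[0]
error_message = None
try:
    # Each `index` here is a whole dict from `results`, not a key.
    [row.get(index, "") for index in results]
except TypeError as exc:
    error_message = str(exc)

# Iterating over the row itself yields its keys, which are hashable strings.
values = [row.get(key, "") for key in row]
```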
>
> So you're getting dicts and attempting to use those dicts as keys into
> dicts. That will give the error "TypeError: unhashable type: 'dict'".
>
> I think what you want is not items from the original sequence, but the
> keys from the mapping::
>
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
>
> But you're then throwing away the constructed list, since you do nothing
> with it before the end of the loop.
>
> > writer.writerow(data)
>
> This statement occurs only *after* all the items from 'results' have
> been iterated. You will only have the most recent constructed row.
>
> Perhaps you want::
>
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
>         writer.writerow(output_record)
>
I tried this and some of the values maintained integrity and some were rewritten by a previous dictionary's values.
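One possible cause, offered as an assumption since the failing output is not shown: iterating over each record's own keys gives every row its own column order and length, so values can land under the wrong header when records differ. A sketch that takes the column order from one fixed list instead (the field names and records are illustrative):

```python
import csv
import io

fieldnames = ["AccountNumber", "Amount", "Type"]  # illustrative, fixed order
results = [
    {"Type": "WriteLetters", "AccountNumber": 1, "Amount": "10"},
    {"AccountNumber": 2, "Type": "Transfer"},  # no 'Amount' key
]

buffer = io.StringIO()  # stands in for a real file opened with newline=''
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(fieldnames)
for input_record in results:
    # Look up every header in the same order; missing keys become ''.
    writer.writerow([input_record.get(key, "") for key in fieldnames])

csv_text = buffer.getvalue()
```

With a single header list driving every row, a missing key leaves an empty field rather than shifting the remaining values to the left.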
> --
> \ "An idea isn't responsible for the people who believe in it." |
> `\ --Donald Robert Perry Marquis |
> _o__) |
> Ben Finney
@BenFinney:
I feel that I need to provide some context to avoid any confusion over my motivations for choosing to do something.
My original task was to parse an XML data structure stored in a CSV file with other data types and then add the elements back as headers and the text as row values. I went back to the drawing board and created a "results" list of dictionaries, where the keys map to lists of values, using this:
def convert_list_to_dict(get_just_xml_data):
    d = {}
    for item in get_just_xml_data(get_all_data):
        for k, v in item.items():
            try:
                d[k].append(v)
            except KeyError:
                d[k] = [v]
    return d
This creates a dictionary keyed by XML tag, with a list of values for each tag. For example:
{
'Number1': ['0'],
'Number2': ['0'],
'Number3': ['0'],
'Number4': ['0'],
'Number5': ['0'],
'RepgenName': [None],
'RTypes': ['Execution', 'Letters'],
'RTID': ['3', '5']}
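The same grouping could also be sketched with collections.defaultdict, which removes the try/except. The function name and the input below are illustrative stand-ins for the parsed XML elements, not the real parser output.

```python
from collections import defaultdict


def group_values_by_key(items):
    """Collect the values of every key across a sequence of dicts."""
    grouped = defaultdict(list)
    for item in items:
        for key, value in item.items():
            grouped[key].append(value)
    return dict(grouped)


# Illustrative input standing in for the parsed XML elements.
items = [
    {"RTypes": "Execution", "RTID": "3", "RPN": None},
    {"RTypes": "Letters", "RTID": "5"},
]
grouped = group_values_by_key(items)
```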
I then used this to create a "headers" set (to prevent duplicates from being added) and the list of dictionaries that I mentioned in my OP.
I achieve this via:
# just headers
def construct_headers(convert_list_to_dict):
    header = set()
    with open('real.csv', 'rU') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)  # get_just_xml_data(get_all_data)
            row.update(xml_data)
            header.update(row.keys())
    return header
# get all of the results
def construct_results(convert_list_to_dict):
    header = set()
    results = []
    with open('real.csv', 'rU') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)  # get_just_xml_data(get_all_data)
            # print(row)
            row.update(xml_data)
            # print(row)
            results.append(row)
            # print(results)
            header.update(row.keys())
    # print(type(results))
    return results
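Both functions read the file and merge the XML dictionary in the same way, so they could be folded into one pass. The sketch below assumes a hypothetical extract_xml_fields(row) that returns the XML-derived fields for one row; the stub here just splits a made-up column, and the inline CSV text stands in for 'real.csv'.

```python
import csv
import io


def extract_xml_fields(row):
    # Hypothetical stand-in for the real XML parsing step.
    return {"RTID": row["xml"].split(";")}


def construct_headers_and_results(infile):
    """Read the CSV once, returning (headers, results)."""
    headers = set()
    results = []
    for row in csv.DictReader(infile):
        row.update(extract_xml_fields(row))  # XML fields computed per row
        headers.update(row.keys())
        results.append(row)
    return headers, results


sample = io.StringIO("id,xml\n1,3;5\n2,7\n")  # made-up data
headers, results = construct_headers_and_results(sample)
```

Computing the XML fields from each row, rather than from one shared call, also keeps one row's values from leaking into another.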
I guess I am using the headers list originally written out. My initial thought is to just write out the values corresponding with each transaction. For example, citing this data structure:
{
'Number1': ['0'],
'Number2': ['0'],
'Number3': ['0'],
'Number4': ['0'],
'Number5': ['0'],
'RPN': [None],
'RTypes': ['Execution', 'Letters'],
'RTID': ['3', '5']}
I would get a CSV
Number1, Number2, Number3, Number4, Number5, RPN, RTypes, RTID
0, 0, 0, 0, 0, None, Execution, 3
None, None, None, None, None, None, Letters, 5
I am wondering how I would achieve this when the headers set is not sorted (should I sort it before writing this out?). Also, since I have millions of transactions, I want to make sure that the values for each header are placed in sequence. Any guidance would be very helpful. Thanks.
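One way this might be done, as a sketch rather than a tested solution for the real file: sort the header set once so the column order is fixed, then use itertools.zip_longest to turn the lists into one row per position, padding the shorter lists with None. The name expand_to_rows and the shortened record are made up for illustration.

```python
import csv
import io
from itertools import zip_longest


def expand_to_rows(record, headers):
    """Yield one list per position across the record's value lists."""
    columns = [record.get(header, []) for header in headers]
    # zip_longest pads the shorter lists so every row has all columns.
    for values in zip_longest(*columns, fillvalue=None):
        yield list(values)


record = {
    "Number1": ["0"],
    "RPN": [None],
    "RTypes": ["Execution", "Letters"],
    "RTID": ["3", "5"],
}
headers = sorted(record)  # fixed, repeatable column order

buffer = io.StringIO()  # stands in for a file opened with newline=''
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(headers)
rows = list(expand_to_rows(record, headers))
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Note that csv.writer emits None as an empty field rather than the text "None". Because expand_to_rows is a generator, rows for a large input can be written one transaction at a time instead of being built up in memory.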