save dictionary to a file without brackets.

Giuseppe Amatulli giuseppe.amatulli at gmail.com
Thu Aug 9 17:53:49 EDT 2012


Thanks a lot for the clarification.
Actually, my problem is this: given two raster datasets in GeoTIFF format, find
the unique pair combinations, and count the number of observations of each;
the unique values in rast1, and count the number of observations of each;
the unique values in rast2, and count the number of observations of each.

I tried different solutions, and this one seems to me the fastest:


import numpy as np

Rast00 = dsRast00.GetRasterBand(1).ReadAsArray()
Rast10 = dsRast10.GetRasterBand(1).ReadAsArray()

# Keep only the cells that are non-zero in both rasters.
# (Maybe this masking operation can be included in the for loop.)
mask = (Rast00 != 0) & (Rast10 != 0)
Rast00_mask = Rast00[mask]
Rast10_mask = Rast10[mask]

# One (Rast00 value, Rast10 value) pair per row. np.array(zip(...)) only
# builds this on Python 2, where zip returns a list.
array2D = np.column_stack((Rast00_mask, Rast10_mask))

unique_u=dict()
unique_k1=dict()
unique_k2=dict()

for key1,key2 in  array2D :
    row = tuple((key1,key2))
    if row in unique_u:
        unique_u[row] += 1
    else:
        unique_u[row] = 1
    if key1 in unique_k1:
        unique_k1[key1] += 1
    else:
        unique_k1[key1] = 1
    if key2 in unique_k2:
        unique_k2[key2] += 1
    else:
        unique_k2[key2] = 1

# Pair counts: "value1 value2 count" per line.
with open(dst_file_rast0010, "w") as output:
    for (a, b), c in unique_u.items():
        print(a, b, c, file=output)

# Per-raster counts: "value count" per line.
with open(dst_file_rast00, "w") as output:
    for a, b in unique_k1.items():
        print(a, b, file=output)

with open(dst_file_rast10, "w") as output:
    for a, b in unique_k2.items():
        print(a, b, file=output)
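
As a side note, the three if/else counting blocks above can probably be shortened with collections.Counter. This is only a minimal sketch: the two short lists are made-up stand-ins for Rast00_mask and Rast10_mask, and it is not claimed to be faster than the loop.

```python
from collections import Counter

# Hypothetical stand-ins for Rast00_mask and Rast10_mask (already masked, 1-D).
# With real NumPy arrays, calling .tolist() first gives plain Python ints.
rast00_mask = [4, 5, 4, 2, 4, 4, 4]
rast10_mask = [5, 4, 4, 3, 3, 3, 4]

# Counter tallies hashable items; zip yields the (value1, value2) pairs.
unique_u = Counter(zip(rast00_mask, rast10_mask))
unique_k1 = Counter(rast00_mask)
unique_k2 = Counter(rast10_mask)

for (a, b), c in unique_u.items():
    print(a, b, c)
```

A Counter is a dict subclass, so the .items() loops that write the output files work on it unchanged.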

What do you think? Is there a way to speed up the process?
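
One direction that might be worth timing is to let NumPy do the counting instead of a Python loop. The sketch below assumes a NumPy version in which np.unique accepts both axis and return_counts (1.13 or later); the two small arrays are invented test data standing in for the real rasters, and the variable names are only illustrative.

```python
import numpy as np

# Made-up 3x3 rasters standing in for Rast00 and Rast10.
rast00 = np.array([[4, 5, 0], [4, 2, 4], [4, 4, 0]])
rast10 = np.array([[5, 4, 7], [4, 3, 3], [3, 4, 0]])

# Keep only the cells that are non-zero in both rasters.
mask = (rast00 != 0) & (rast10 != 0)
v0 = rast00[mask]
v1 = rast10[mask]

# Unique (v0, v1) rows and how often each occurs; rows come back sorted.
pairs, pair_counts = np.unique(
    np.column_stack((v0, v1)), axis=0, return_counts=True
)
# Unique values and counts for each raster on its own.
vals0, counts0 = np.unique(v0, return_counts=True)
vals1, counts1 = np.unique(v1, return_counts=True)

# One "value1 value2 count" row per unique pair, without brackets or commas.
table = np.column_stack((pairs, pair_counts))
lines = [" ".join(str(x) for x in row) for row in table.tolist()]
print("\n".join(lines))
```

Unlike the dict version, the rows come out sorted by value. Whether this is actually faster on the real rasters would have to be measured.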
Thanks
Giuseppe





On 9 August 2012 16:34, Roman Vashkevich <vashkevichrb at gmail.com> wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> On 10.08.2012, at 1:21, Tim Chase wrote:
>
>> On 08/09/12 15:41, Roman Vashkevich wrote:
>>> On 10.08.2012, at 0:35, Tim Chase wrote:
>>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>>> and I want to print it to a file without the brackets, commas and colons, in order to obtain something like this:
>>>>>> 4 5 1
>>>>>> 5 4 1
>>>>>> 4 4 2
>>>>>> 2 3 1
>>>>>> 4 3 2
>>>>>
>>>>> for key in dict:
>>>>>    print key[0], key[1], dict[key]
>>>>
>>>> This might read more cleanly with tuple unpacking:
>>>>
>>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>>   print edge1, edge2, cost
>>>>
>>>> (I'm making the assumption that this is an edge/cost graph...use
>>>> appropriate names according to what they actually mean)
>>>
>>> dict.items() is a list - linear access time whereas with 'for
>>> key in dict:' access time is constant:
>>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>>
>> That link doesn't actually discuss dict.{iter}items()
>>
>> Both are O(N) because you have to touch each item in the dict--you
>> can't iterate over N entries in less than O(N) time.  For small
>> data-sets, building the list and then iterating over it may be
>> faster; for larger data-sets, the cost of building the list
>> overshadows the (minor) overhead of a generator.  Either way, the
>> iterate-and-fetch-the-associated-value of .items() & .iteritems()
>> can (should?) be optimized in Python's internals to the point I
>> wouldn't think twice about using the more readable version.
>>
>> -tkc
>>
>>
>
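
For reference, the two loop styles discussed in the quoted thread can be compared side by side. This is a small sketch written for Python 3, where dict.items() returns a view rather than a list and iteritems() no longer exists; the helper names write_by_key and write_by_items are made up for the example, and io.StringIO stands in for a real output file.

```python
import io

d = {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}

def write_by_key(mapping, out):
    # Iterate over the keys and look each value up again.
    for key in mapping:
        print(key[0], key[1], mapping[key], file=out)

def write_by_items(mapping, out):
    # Unpack the key tuple and the value directly from .items().
    for (first, second), count in mapping.items():
        print(first, second, count, file=out)

buf_keys, buf_items = io.StringIO(), io.StringIO()
write_by_key(d, buf_keys)
write_by_items(d, buf_items)

# Both loops visit every entry once and write identical text.
same_output = buf_keys.getvalue() == buf_items.getvalue()
print(buf_items.getvalue(), end="")
```

Both functions touch each of the N entries exactly once, so the choice between them is mostly a matter of readability.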



-- 
Giuseppe Amatulli
Web: www.spatial-ecology.net


