[SciPy-user] compare two csv files

Zachary Pincus zachary.pincus at yale.edu
Tue Jan 29 11:10:35 EST 2008


Hi Fabian,

Perhaps you could specify your problem more clearly. As I understand
it, you want to write a Python function that takes two values and
decides whether they are "equal" (in a fuzzy manner), and then you
want to apply that function along a column of data?

This is probably best handled in pure Python, until you get a little
more comfortable with the basic language and want to learn
numpy/scipy. But first things first.

So -- you need to specify *exactly* what sort of "fuzzy" matches are  
acceptable. Then you need to transform this specification into a  
python function. Given this, it's easy to compare two lists:


list1 = [...whatever...]
list2 = [...whatever...]

def are_fuzzy_equal(element1, element2):
    ...whatever...

list3 = []
for element1, element2 in zip(list1, list2):
    if are_fuzzy_equal(element1, element2):
        list3.append(element1)
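
One caveat: zip() pairs the entries row by row, so the loop above only
finds matches that sit in the same position in both lists. If, as in
your first example, an entry may appear anywhere in the other column,
something like the following sketch might be closer to what you want.
The name common_entries and its are_equal parameter are made up for
illustration, and plain == only stands in for whatever comparison you
settle on:

```python
def common_entries(list1, list2, are_equal=lambda a, b: a == b):
    """Return the entries of list1 that match at least one entry of list2.

    are_equal is a hypothetical hook: pass your own fuzzy comparison here.
    """
    matches = []
    for element1 in list1:
        # any() stops at the first entry of list2 that matches element1
        if any(are_equal(element1, element2) for element2 in list2):
            matches.append(element1)
    return matches


# The numeric columns from the original question
column1 = [1, 2, 3]
column2 = [0, 5, 1]
print(common_entries(column1, column2))  # [1]
```

This compares every entry of the first list against every entry of the
second, which is fine for small columns but gets slow for very long ones.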

If your question is about how to implement are_fuzzy_equal, you'll
need to specify that clearly, and you'll probably want to ask on a
basic Python-language list. That said, I'm sure some folks here would
help in a pinch.
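
Just to make the idea concrete, here is a minimal sketch that assumes
"fuzzy" means ignoring upper/lower case and treating '_' and '.' as
the same separator. That rule is only my guess from your examples, and
the helper normalize is a made-up name; substitute your own
specification:

```python
def normalize(entry):
    # Assumed rule (a guess, not a given): ignore case and surrounding
    # whitespace, and treat '_' like '.'
    return str(entry).strip().lower().replace("_", ".")


def are_fuzzy_equal(element1, element2):
    return normalize(element1) == normalize(element2)


# The two string columns from the original question
list1 = ["1.test", "2.test", "123.test", "123.Test"]
list2 = ["0.test", "123_test", "5.test", "123.Test"]

list3 = []
for element1, element2 in zip(list1, list2):
    if are_fuzzy_equal(element1, element2):
        list3.append(element1)

print(list3)  # ['123.Test']
```

With your example columns only the last row matches, because the other
fuzzy pairs (such as '123.test' and '123_test') sit in different rows.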

Zach



On Jan 28, 2008, at 4:11 PM, Fabian Braennstroem wrote:

> Hi to all,
>
> sorry for the bad question... actually I might be in the
> wrong group...
>
>
> At first I thought that I only had to compare two
> columns of numbers, but now the two columns could
> look like:
>
> 1st column:
> 1
> 2
> 3
>
>
> 2nd column:
> 0
> 5
> 1
>
> So the result should be a list of the entries that exist in
> both lists, like '1'.
> A little more difficult would be two lists with numbers
> and characters.
> E.g. the lists could look like:
>
> 1st column:
> 1.test
> 2.test
> 123.test
> 123.Test
>
> 2nd column:
> 0.test
> 123_test
> 5.test
> 123.Test
>
> The searching/comparing should produce two lists; one with
> the 'double' fuzzy entries like '123.test' and '123.Test'.
>
> Would be nice, if anyone can help!?
> Thanks!
> Fabian
>
> Andrew Straw schrieb am 01/24/2008 10:18 PM:
>> Hi Fabian, this is not a direct answer to your question, but you also
>> may be interested in matplotlib's mlab.csv2rec() which automatically
>> creates a recordarray from a csv file. John Hunter and I, to a much
>> lesser degree, have been hacking on this to work for us. Please feel
>> free to check its suitability for your purposes.
>>
>> Fabian Braennstroem wrote:
>>> Hi,
>>> I would like to compare two csv files; actually two columns
>>> from two csv files.
>>> I would use something like:
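>>> import csv
>>> import time
>>> from numpy import array
>>>
>>> def read_test():
>>>     start = time.time()  # start time, for timing the read
>>>     with open('data.txt') as f:
>>>         # convert every field of every row to float
>>>         data = [[float(x) for x in row] for row in csv.reader(f)]
>>>     return array(data, dtype=float)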
>>>
>>> To get my data into an array.
>>>
>>> Does anyone have an idea, how to compare the two columns?
>>> Would be nice!
>>> Fabian
>>>
>>> _______________________________________________
>>> SciPy-user mailing list
>>> SciPy-user at scipy.org
>>> http://projects.scipy.org/mailman/listinfo/scipy-user
>>>



