[Tutor] 1 to N searches in files

Spectral None spectralnone at yahoo.com.sg
Sun Dec 2 09:53:43 CET 2012


Hi all

I have two files (File A and File B), each containing one string per line. Each string in File B should be compared against all the strings in File A, and the output should list the matched and unmatched lines, optionally writing them to a third file, File C.

File A: Unique strings
File B: Can have duplicate strings (that is, "string1" may appear more than once)

My code currently looks like this:

-----------------
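import difflib

# Raw strings keep the backslashes in the Windows paths literal.
FirstFile = open(r'C:\FileA.txt', 'r')
SecondFile = open(r'C:\FileB.txt', 'r')
ThirdFile = open(r'C:\FileC.txt', 'w')

a = FirstFile.readlines()
b = SecondFile.readlines()

mydiff = difflib.Differ()
# Differ objects are not callable; compare() yields the delta lines.
results = list(mydiff.compare(a, b))
# Each delta line already ends with a newline, so join without a separator.
print("".join(results))

#ThirdFile.writelines(results)

FirstFile.close()
SecondFile.close()
ThirdFile.close()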
---------------------

However, the results do not seem to reflect the matched/unmatched lines correctly. For example, if File A contains "string1" and File B contains multiple occurrences of "string1", the first occurrence is matched correctly but the subsequent occurrences are reported as unmatched.
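
The behaviour described above can be reproduced with a minimal sketch that uses in-memory lists instead of the files (the sample contents are assumptions made up for illustration). Differ compares the two inputs as ordered sequences, so only one "string1" in the second list can be paired with the single "string1" in the first; the other is reported as an added line.

```python
import difflib

# Hypothetical stand-ins for the contents of File A and File B.
file_a = ["string1\n"]
file_b = ["string1\n", "string1\n"]

# compare() yields delta lines prefixed with "  " (common to both),
# "- " (only in the first sequence) or "+ " (only in the second).
delta = list(difflib.Differ().compare(file_a, file_b))
print("".join(delta), end="")
```

Here the first "string1" comes out with the "  " prefix and the second with "+ ", which is the "unmatched" result mentioned above.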

Perhaps I don't understand Differ() well enough and it is not doing what I hoped it would do. Is Differ() comparing first line to first line, second line to second line, and so on, rather than checking each line of File B against every line of File A?
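
For comparison, the 1-to-N check I am after could be sketched with a set lookup instead of difflib. This is only a rough sketch: the function name classify_lines and the sample data are hypothetical, and it assumes that an exact match after stripping the trailing newline is what counts as "matched".

```python
def classify_lines(a_lines, b_lines):
    """Pair each line of B with True if it occurs anywhere in A, else False."""
    # File A holds unique strings, so a set gives fast membership tests.
    known = {line.rstrip("\n") for line in a_lines}
    return [(line.rstrip("\n"), line.rstrip("\n") in known) for line in b_lines]

# Hypothetical stand-ins for the contents of File A and File B.
file_a = ["string1\n", "string2\n"]
file_b = ["string1\n", "string3\n", "string1\n"]

for text, matched in classify_lines(file_a, file_b):
    print("matched  " if matched else "unmatched", text)
```

With this approach every occurrence of "string1" in File B is reported as matched, regardless of its position or how many times it repeats.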

Regards