Beginner Question : Iterators and zip
Terry Reedy
tjreedy at udel.edu
Sat Jul 12 19:57:38 EDT 2008
moogyd at yahoo.co.uk wrote:
> Hi group,
>
> I have a basic question on the zip built in function.
>
> I am writing a simple text file comparison script, that compares line
> by line and character by character. The output is the original file,
> with an X in place of any characters that are different.
>
> I have managed a solution for a fixed (3) number of files, but I want
> a solution of any number of input files.
>
> The outline of my solution:
>
> for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
>     res = ''
>     for entry in zip(vec[0],vec[1],vec[2]):
>         if len(set(entry)) > 1:
>             res = res+'X'
>         else:
>             res = res+entry[0]
>     outfile.write(res)
>
> So vec is a tuple containing a line from each file, and then entry is
> a tuple containing a character from each line.
>
> 2 questions
> 1) What is the general solution? Using zip in this way looks wrong. Is
> there another function that does what I want?
zip(*vec_list) will zip together all entries in vec_list.
Do be aware that zip stops at the shortest iterable. So if vec[1] is
shorter than vec[0] and matches otherwise, your output line will be
truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
the extra characters will go unreported as well.
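A minimal sketch of both points, written in Python 3 syntax. The sample data is made up, and itertools.zip_longest with a None fill value is only one possible way (my assumption, not something the original script uses) to make a length mismatch visible.

```python
from itertools import zip_longest

# Hypothetical sample data: one list of lines per input file.
vec_list = [
    ["abc", "hello"],
    ["abd", "hello!"],
    ["abc", "hello"],
]

# zip(*vec_list) unpacks the list, so it works for any number of files.
# Each row holds the same line number taken from every file.
rows = list(zip(*vec_list))

# zip stops at the shortest line, so the trailing "!" is never examined.
short = list(zip(*rows[1]))

# zip_longest pads the shorter lines with a fill value (None here),
# so the extra character shows up as a mismatch in the last tuple.
padded = list(zip_longest(*rows[1], fillvalue=None))
```

With zip_longest, len(set(entry)) > 1 is true for the padded column, so the extra character would be flagged rather than dropped.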
res=res+whatever can be written as res+=whatever
> 2) I am using set to remove any repeated characters. Is there a
> "better" way ?
I might have written a third loop to compare vec[0] to vec[1]..., but
your set solution is easier and prettier.
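For comparison, here is a small sketch of the two tests side by side. The sample tuple and the names differs_set and differs_loop are hypothetical, chosen only for illustration.

```python
# Hypothetical column: the same character position taken from three lines.
entry = ("a", "a", "b")

# Set approach from the original post:
# more than one distinct value means the characters differ.
differs_set = len(set(entry)) > 1

# Explicit comparison of every later item against the first one.
differs_loop = any(ch != entry[0] for ch in entry[1:])
```

Both give the same answer for any non-empty tuple; the set version is shorter and does not depend on the number of files.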
If speed is an issue, don't rebuild the output line char by char. Just
change what is needed in a mutable copy. I like this better anyway.
res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
    if len(set(entry)) > 1:
        res[n] = 'X'
outfile.write(''.join(res)) # in 3.0, write(res)
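Combining the mutable-copy idea with zip(*...) for any number of lines might look like the sketch below. The function name mask_differences and its marker parameter are hypothetical, not part of the original script, and it works on text strings rather than the bytearray variant mentioned above.

```python
def mask_differences(lines, marker="X"):
    """Return lines[0] with marker at every position where the lines differ."""
    res = list(lines[0])  # mutable copy of the first line
    # zip(*lines) yields one tuple of characters per position.
    for n, entry in enumerate(zip(*lines)):
        if len(set(entry)) > 1:
            res[n] = marker
    return "".join(res)


masked = mask_differences(["abcdef", "abXdef", "abcdeZ"])
# Positions 2 and 5 differ, so masked is "abXdeX".
```

The truncation caveat still applies: zip stops at the shortest line, so any characters of lines[0] beyond that point are copied through unchanged.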
tjr