dictionary comparison

Jordan Rastrick jrastrick at student.usyd.edu.au
Thu May 5 12:59:53 EDT 2005


rickle wrote:
> I'm trying to compare sun patch levels on a server to those of what
> sun is recommending.  For those that aren't familiar with sun patch
> numbering here is a quick run down.
>
> A patch number shows up like this:
> 113680-03
> ^^^^^^ ^^
> patch#  revision
>
> What I want to do is make a list.  I want to show what server x has
> versus what sun recommends, and if the patch exists, but the revision
> is different, I want to show that difference.
>
> Here are some sample patches that sun recommends:
> 117000-05
> 116272-03
> 116276-01
> 116278-01
> 116378-02
> 116455-01
> 116602-01
> 116606-01
>
> Here are some sample patches that server x has:
> 117000-01
> 116272-02
> 116272-01
> 116602-02
>
> So there are some that are the same, some that sun recommends that
> server x doesn't have, and some where the patch is the same but the
> revision is different.
>
> I've thrown the data into dictionaries, but I just can't seem to
> figure out how I should actually compare the data and present it.
> Here's what I have so far (the split is in place because there is
> actually a lot more data in the file, so I split it out so I just get
> the patch number and revision).  So I end up with (for example)
> 116272-01, then split so field[0] is 116272 and field[1] is 01.
>
> def sun():
>         sun = open('sun-patchlist', 'r')
>         for s in sun:
>                 sun_fields = s.split(None, 7)
>                 for sun_field in sun_fields:
>                         sun_field = sun_field.strip()
>                 sun_patch = {}
>                 sun_patch['number'] = sun_fields[0]
>                 sun_patch['rev'] = sun_fields[1]
>                 print sun_patch['number'], sun_patch['rev']
>         sun.close()
>
> def serverx():
>         serverx = open('serverx-patchlist', 'r')
>         for p in serverx:
>                 serverx_fields = p.split(None, 7)
>                 for serverx_field in serverx_fields:
>                         serverx_field = serverx_field.strip()
>                 serverx_patch = {}
>                 serverx_patch['number'] = serverx_fields[0]
>                 serverx_patch['rev'] = serverx_fields[1]
>                 print serverx_patch['number'], serverx_patch['rev']
>         serverx.close()
>
> if __name__=='__main__':
>         sun()
>         serverx()
>
>
> Right now I'm just printing the data, just to be sure that each
> dictionary contains the correct data, which it does.  But now I need
> the comparison and I just can't seem to figure it out.  I could
> probably write this in perl or a shell script, but I'm trying really
> hard to force myself to learn Python so I want this to be a python
> script, created with only built-in modules.
>
> Any help would be greatly appreciated,
> Rick

Well, it seems that what you're asking is more of a generic programming
question than anything specific to Python - if you can think of how
you'd solve this in Perl, for example, then a Python solution along the
same lines would work just as well. I'm not sure if there was some
specific issue with Python that was confusing you - if so, perhaps you
could state it more explicitly.

To address the problem itself, there are a few things about your
approach in the above code that I find puzzling. First of all, the
sun() and serverx() functions are identical, except for the name of the
file they open. This kind of code duplication is bad practice in
Python, Perl, or any other language (even shell scripting perhaps,
although I wouldn't really know) - you should definitely use a single
function that takes a filename as an argument instead.

Second, you are creating a new dictionary inside every iteration of the
for loop, one for every patch in the file; each dictionary you create
contains one patch number and one revision number. This data is
printed, and thereafter ignored (and thus will be reclaimed by Python's
garbage collector). Hence you're not actually storing it for later use.
I don't know whether this was because you were unsure how to proceed to
comparing the two datasets; however, I think what you probably
wanted was a single dictionary that keeps track of all the
patches in the file. You need to define this outside the for loop; and,
if you want to use it outside the body of the function, you'll need to
return it. Also, rather than a dictionary of two values keyed by
strings, I'd suggest that a dictionary mapping patch numbers to their
corresponding revision numbers is what you want.
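
To make the difference between the two layouts concrete, here is a
small sketch using two of the server's sample patches. The variable
names (per_patch, has_patch and so on) are made up for illustration and
are not part of the original code.

```python
# One dictionary per patch, as in the original code. To find a patch
# later you would have to keep these in a list and scan it.
per_patch = [
    {'number': '117000', 'rev': '01'},
    {'number': '116602', 'rev': '02'},
]

# One dictionary for the whole file: patch number -> revision.
patch_dict = {'117000': '01', '116602': '02'}

# Membership tests and lookups are now a single step.
has_patch = '116602' in patch_dict    # True
revision = patch_dict['116602']       # '02'
missing = '116276' in patch_dict      # False
```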

Once you've got two dictionaries - one for the server's
patches, and one for Sun's recommended patches - you can compare the
two sets of data by going through Sun's patches, checking if the
server has that patch, and if so, calculating the difference in
revision numbers.

So here's a rough idea of how I'd suggest modifying what you've got to
get the intended result:

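def patchlevels(filename):
    # Build one dictionary for the whole file: patch number -> revision.
    patchfile = open(filename, 'r')
    patch_dict = {}
    for line in patchfile:
        # Split on whitespace into at most 8 fields.
        fields = line.split(None, 7)
        # Kept from the original code; note that rebinding field leaves
        # fields unchanged, and fields[0] and fields[1] need no stripping.
        for field in fields:
            field = field.strip()
        number = fields[0]
        rev = fields[1]
        # A patch listed twice keeps the revision read last.
        patch_dict[number] = rev
        # print number, patch_dict[number]
    patchfile.close()
    return patch_dict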

if __name__=='__main__':
    sun = patchlevels('sun-patchlist')
    serverx = patchlevels('serverx-patchlist')
    print "Sun recommends:\t\t", "Server has:\n"
    for patch in sun:
        if patch in serverx:
            rev = serverx[patch]
            diff = int(rev) - int(sun[patch])
            serverhas = "Revision: %s Difference: %s" % (rev, diff)
        else:
            serverhas = "Does not have this patch"
        print patch, sun[patch], "\t\t", serverhas

I've tried to stay as close to your code as possible and not introduce
new material, although I have had to use the built-in function int to
convert the revision numbers from strings to integers in order to
subtract one from the other; also, I used C printf-style string
formatting on the line after. I hope it's reasonably obvious what these
things do.
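
In case they are not obvious, here is a minimal sketch of just those
two steps, using the revisions of patch 116272 from the sample data.
The names server_rev and sun_rev are only for illustration.

```python
# int() parses a string of digits; leading zeros are ignored.
server_rev = '01'
sun_rev = '03'
diff = int(server_rev) - int(sun_rev)    # 1 - 3

# Each %s is replaced, in order, by the matching item of the tuple.
serverhas = "Revision: %s Difference: %s" % (server_rev, diff)
# serverhas is now 'Revision: 01 Difference: -2'
```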

For the sample data you gave, this outputs:

Sun recommends:		Server has:

116276 01 		Does not have this patch
116378 02 		Does not have this patch
116272 03 		Revision: 01 Difference: -2
116278 01 		Does not have this patch
116602 01 		Revision: 02 Difference: 1
116606 01 		Does not have this patch
116455 01 		Does not have this patch
117000 05 		Revision: 01 Difference: -4

Here negative differences mean the server's version of the patch is out
of date, whereas a difference of zero or more means it's as recent as
Sun's recommendation or better. You could change the nature of the
output to whatever your own preference is easily enough. Or, if you want
to store the data in some other structure like a list for further
processing, instead of just printing it, that's also pretty simple to do.
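
As a rough sketch of the list idea, the comparison could collect tuples
instead of printing. The function name compare_patches, the tuple
layout, and the use of sorted() to get a stable order are all
assumptions made for this example, and the two dictionaries are a
hand-written subset of the sample data.

```python
def compare_patches(sun, serverx):
    """Return a list of (patch, sun_rev, server_rev, diff) tuples.

    server_rev and diff are None when the server lacks the patch.
    """
    results = []
    for patch in sorted(sun):
        if patch in serverx:
            rev = serverx[patch]
            diff = int(rev) - int(sun[patch])
            results.append((patch, sun[patch], rev, diff))
        else:
            results.append((patch, sun[patch], None, None))
    return results

# Subset of the sample data, keyed the way patchlevels() would key it.
sun = {'117000': '05', '116272': '03', '116276': '01'}
serverx = {'117000': '01', '116272': '01'}

results = compare_patches(sun, serverx)

# Further processing: patches the server has, but at an older revision.
outdated = [r[0] for r in results if r[3] is not None and r[3] < 0]
```

With the results in a list, filtering, sorting, or reformatting them
becomes a separate step from doing the comparison.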

This code isn't exactly a work of art. I could have put more effort
into sensible names for the function and variables, made it more
'pythonic' (e.g. by using a list comprehension in place of the
whitespace-stripping for loop), etc.; but I think it achieves the
desired result, or something close to it, right?
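
For what it's worth, the list comprehension version might look like the
sketch below. The sample line is invented, since the real file layout
was not shown; only the first two fields matter here.

```python
# Hypothetical line from a patch list: number, revision, other data.
line = '116272  01   some other data\n'
fields = line.split(None, 7)

# Unlike the for loop, which only rebinds its loop variable, this
# builds a new list holding the stripped strings.
fields = [field.strip() for field in fields]

number, rev = fields[0], fields[1]
```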

Let me know if I was on completely the wrong track.



