ImSim: Image Similarity

Sat Mar 5 12:40:52 EST 2011

On Mar 5, 7:10 pm, Mel <mwil... at the-wire.com> wrote:
> n00m wrote:
>
> > I uploaded a new version of the script with a
> > VERY MINOR correction in it, namely in line #55:
>
> >     print '%12s %7.2f' % (db[k][1], db[k][0] / 3600.0,)
>
> > instead of
>
> >     print '%12s %7.2f' % (db[k][1], db[k][0] * 0.001,)
>
> > I.e. I normalized it to base 100.
> > Now the similarity values can't be greater than 100
> > and can be treated as ordinary percentages (%).
>
> > Also, due to this change, the *empirical* "system alarm"
> > threshold moved down from 70 to 20%.
>
> >   bears2.jpg
> > --------------------
> >   bears2.jpg    0.00
> >   bears3.jpg   15.37
> >   bears1.jpg   19.13
> >     sky1.jpg   23.29
> >     sky2.jpg   23.45
> >      ff1.jpg   25.37
> >    lake1.jpg   26.43
> >   water1.jpg   26.93
> >      ff2.jpg   28.43
> >   roses1.jpg   31.95
> >   roses2.jpg   36.12
>
> I'd like to see a *lot* more structure in there, with modularization, so the
> internal functions could be used from another program.  Once I'd figured out
> what it was doing, I had this:
>
> from PIL import Image
> from PIL import ImageStat
>
> def row_column_histograms (file_name):
>     '''Reduce the image to 300x300 pixels of grey levels quantized to 0..3.
>     Return brightness histograms across 5 horizontal bands (Y) and
>     5 vertical bands (X), packed into a 10-item list of 4-item histograms.'''
>     im = Image.open (file_name)
>     im = im.convert ('L')       # convert to 8-bit greyscale
>     w, h = 300, 300
>     im = im.resize ((w, h))
>     imst = ImageStat.Stat (im)
>     sr = imst.mean[0]   # average pixel level in layer 0
>     sr_low, sr_mid, sr_high = (sr*2)/3, sr, (sr*4)/3
>     def level (t):  # map a grey value to brightness level 0..3
>         if t < sr_low: return 0
>         if t < sr_mid: return 1
>         if t < sr_high: return 2
>         return 3
>     im = im.point (level) # reduce to brightness levels 0..3
>     yhist = [[0]*4 for i in xrange(5)]
>     xhist = [[0]*4 for i in xrange(5)]
>     for y in xrange (h):
>         for x in xrange (w):
>             k = im.getpixel ((x, y))
>             yhist[y // 60][k] += 1  # 5 bands of 60 rows
>             xhist[x // 60][k] += 1  # 5 bands of 60 columns
>     return yhist + xhist
>
> def difference_ranks (test_histogram, sample_histograms):
>     '''Return a list of difference ranks (summed absolute histogram
>     differences) between the test histograms and each of the samples.'''
>     result = [0]*len (sample_histograms)
>     for k, s in enumerate (sample_histograms):  # for each image
>         for i in xrange(10):    # for each histogram slot
>             for j in xrange(4): # for each brightness level
>                 result[k] += abs (s[i][j] - test_histogram[i][j])      
>     return result
>
> if __name__ == '__main__':
>     import getopt, sys
>     opts, args = getopt.getopt (sys.argv[1:], '', [])
>     if not args:
>         args = [
>             'bears1.jpg',
>             'bears2.jpg',
>             'bears3.jpg',
>             'roses1.jpg',
>             'roses2.jpg',
>             'ff1.jpg',
>             'ff2.jpg',
>             'sky1.jpg',
>             'sky2.jpg',
>             'water1.jpg',
>             'lake1.jpg',
>         ]
>         test_pic = 'bears2.jpg'
>     else:
>         test_pic, args = args[0], args[1:]
>
>     z = [row_column_histograms (a) for a in args]
>     test_z = row_column_histograms (test_pic)
>
>     file_ranks = zip (difference_ranks (test_z, z), args)      
>     file_ranks.sort()
>
>     print '%12s' % (test_pic,)
>     print '--------------------'
>     for r in file_ranks:
>         print '%12s %7.2f' % (r[1], r[0] / 3600.0,)
>
> (omitting a few comments that wrapped around.)  The test-case still agrees
> with your archived version:
>
> mwilson at tecumseth:~/sandbox/im_sim$ python image_rank.py bears2.jpg *.jpg
>   bears2.jpg
> --------------------
>   bears2.jpg    0.00
>   bears3.jpg   15.37
>   bears1.jpg   19.20
>     sky1.jpg   23.20
>     sky2.jpg   23.37
>      ff1.jpg   25.30
>    lake1.jpg   26.38
>   water1.jpg   26.98
>      ff2.jpg   28.43
>   roses1.jpg   32.01
>
> I'd vaguely wanted to do something like this for a while, but I never dug
> far enough into PIL to even get started.  An additional kind of ranking that
> takes colour into account would also be good -- that's the first one I never
> did.
>
>         Cheers,         Mel.

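A note on the 3600 divisor: after resizing to 300x300, each of the five 60-pixel bands holds 60 x 300 = 18000 pixels, and two histograms over the same 18000 pixels can differ by at most 2 x 18000 = 36000 in summed absolute difference. Ten band histograms (5 rows plus 5 columns) give a maximum of 360000, so dividing by 3600 caps the score at 100. The score is a distance: 0 means identical, larger means less similar. The sketch below re-expresses the histogram and ranking steps in Python 3 without PIL so this bound can be checked. It assumes images are plain lists of rows of 0..255 grey values, and the names quantize, band_histograms, distance and percent are hypothetical, not taken from the thread.

```python
# Minimal pure-Python 3 sketch of the row/column histogram signature from the
# thread. Assumption: images are lists of rows of 0..255 grey values (the
# original gets these from PIL via convert('L') and resize((300, 300))).

def quantize(pixels):
    """Map each grey value to level 0..3 using the thread's thresholds:
    2/3 of the mean, the mean, and 4/3 of the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    low, mid, high = mean * 2 / 3, mean, mean * 4 / 3

    def level(t):
        if t < low:
            return 0
        if t < mid:
            return 1
        if t < high:
            return 2
        return 3

    return [[level(p) for p in row] for row in pixels]


def band_histograms(pixels, bands=5, levels=4):
    """Return `bands` Y-band histograms followed by `bands` X-band histograms."""
    q = quantize(pixels)
    h, w = len(q), len(q[0])
    yhist = [[0] * levels for _ in range(bands)]
    xhist = [[0] * levels for _ in range(bands)]
    for y in range(h):
        for x in range(w):
            k = q[y][x]
            yhist[y * bands // h][k] += 1  # same as y // 60 for h=300, bands=5
            xhist[x * bands // w][k] += 1
    return yhist + xhist


def distance(hist_a, hist_b):
    """Sum of absolute differences over all bands and levels (L1 distance)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(hist_a, hist_b)
               for a, b in zip(row_a, row_b))


def percent(hist_a, hist_b, width, height):
    """Scale the distance to 0..100. Each pixel is counted in one Y band and
    one X band, and two histograms over N pixels differ by at most 2N, so the
    largest distance is 4 * width * height (360000 for 300x300, i.e. the
    thread's division by 3600)."""
    return 100.0 * distance(hist_a, hist_b) / (4 * width * height)


if __name__ == "__main__":
    size = 300
    black = band_histograms([[0] * size for _ in range(size)])
    grey = band_histograms([[128] * size for _ in range(size)])
    print(percent(black, black, size, size))
    print(percent(black, grey, size, size))
```

A uniform black image quantizes entirely to level 3 (its mean is 0, so every threshold is 0), while any uniform non-black image quantizes entirely to level 2, so the pair is as far apart as the scheme allows and scores exactly 100.
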
Very nice, Mel.

As for using color info...
my current strong opinion is that colors should be dropped for good.
Paradoxically, "profound" elaboration and added detail can (and will)
spoil the whole thing. Just my current IMO.

===========================
Vitali