Printing a drop down menu for a specific field.

rurpy at yahoo.com
Mon Oct 28 01:22:51 EDT 2013


On 10/27/2013 01:31 AM, Nick the Gr33k wrote:
> On 27/10/2013 6:00 AM, rurpy at yahoo.com wrote:
>[...] 
[following quote lightly edited for clarity] 
> I almost understand your code, but this part is not so clear to me:
>
> key = host, city, useros, browser
> if key not in seen:
>      newdata.append( [host, city, useros, browser, [ref], hits, [visit]] )
>      seen[key] = len( newdata ) - 1 # Save index (for 'newdata') of this row.
> else:    # This row is a duplicate row with a different referrer & visit time.
>      rowindex = seen[key]
>      newdata[rowindex][4].append( ref )
>      newdata[rowindex][5] += hits
>      newdata[rowindex][6].append( visit )

I'm not sure exactly what part is not clear to you so I'll give 
you a very long-winded explanation and you can ignore any parts 
that are already obvious to you.

The code above is inside a loop that looks at each row in <data>.

In <data> there can be several rows for the same visitor, where you
define a visitor as a unique combination of <host>, <city>, <useros> 
and <browser>.

What you want to do is combine all of the rows that are for the same
visitor into one row.  That one row, instead of having a single value
for <ref> and <lastvisit> will have lists of all the <ref>s and 
<lastvisit>s from all the rows that have the same visitor value.

So first, for each row, we set <key> to a tuple that identifies the 
visitor.  (Actually, I should have named that variable "visitor" 
instead of "key".)  Then we use an ordinary python dictionary <seen> 
to record each visitor as we see them.  Remember that a dictionary 
can use a tuple as a key (unlike Perl, where a hash key has to be a 
string).  
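Here is a small sketch of a tuple used as a dictionary key.  The
host and city values are made up for the illustration; they are not
from your database.

```python
# Hypothetical values, only to show that a tuple works as a dictionary key.
seen = {}
key = ('host.example.com', 'Athens', 'Windows', 'Explorer')
seen[key] = 0

# A tuple built separately but with equal elements finds the same entry.
same_visitor = ('host.example.com', 'Athens', 'Windows', 'Explorer')
found = same_visitor in seen

# A tuple that differs in any element is a different key.
other = ('other.example.com', 'Athens', 'Windows', 'Explorer') in seen
```

Two tuples compare equal when all their elements are equal, which is
why the same visitor always maps to the same dictionary entry.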

For each row we look in the dictionary <seen> to see if this visitor
is a new one that we haven't seen before.  If we haven't seen them
before we create a new row in <newdata> for them that is a copy of 
the row in <data> except we change the <ref> and <lastvisit> fields 
from single values to lists.  We also add an entry to <seen> whose 
key is the visitor, and whose value is the index of the visitor's row 
in <newdata>. 

If the visitor *was* seen before (because we find an entry for the 
visitor in <seen>), then the value of that entry tells us the index 
of that visitor's row in <newdata>, and instead of adding a new row
to <newdata> we update the visitor's row that is already there.
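Put together with its surrounding loop, the code might look roughly
like the sketch below.  The function name merge_visits and the loop
header are my assumptions (your real loop may unpack the row
differently), and I assume each row of <data> has exactly the seven
fields in this order and that hits is a number.

```python
def merge_visits(data):
    """Combine rows of data that belong to the same visitor.

    Assumes each row is (host, city, useros, browser, ref, hits, visit)
    and that hits is a number, not a string.
    """
    newdata = []   # One row per distinct visitor.
    seen = {}      # Maps visitor tuple -> index of that visitor's row in newdata.
    for host, city, useros, browser, ref, hits, visit in data:
        key = host, city, useros, browser
        if key not in seen:
            # New visitor: copy the row, turning ref and visit into lists.
            newdata.append([host, city, useros, browser, [ref], hits, [visit]])
            seen[key] = len(newdata) - 1   # Save index (for newdata) of this row.
        else:
            # Known visitor: update the row that is already in newdata.
            rowindex = seen[key]
            newdata[rowindex][4].append(ref)
            newdata[rowindex][5] += hits
            newdata[rowindex][6].append(visit)
    return newdata
```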

Maybe it's easier to see what is happening by looking at how the 
code actually runs.

Suppose the data you get from your database is

    data = [('mail14.ess.barracuda.com',           'Άγνωστη Πόλη', 'Windows', 'Explorer', 'Direct Hit',           1, 'Σάββατο 26 Οκτ, 18:49'),
            ('209.133.77.165.T01713-01.above.net', 'Άγνωστη Πόλη', 'Windows', 'Explorer', 'Direct Hit',           1, 'Σάββατο 26 Οκτ, 18:59'),
            ('mail14.ess.barracuda.com',           'Άγνωστη Πόλη', 'Windows', 'Explorer', 'http://superhost.gr/', 1, 'Σάββατο 26 Οκτ, 18:48'),
           ]

When the first row of <data> is processed, <key> will be
set to the 4-tuple:

  ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer').

Then "if key not in seen" is executed.  This looks in dictionary 
<seen> to see if there is an entry in it with a key that matches the 
tuple above.  Since <seen> is still an empty dictionary, <key> 
cannot be in it, so "key not in seen" is true.

So the first branch of the if statement runs:

       newdata.append( [host, city, useros, browser, [ref], hits, [visit]] )
       seen[key] = len( newdata ) - 1 # Save index (for 'newdata') of this row.

Now, <newdata> contains 1 row:

    [ [ 'mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer', ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:49'] ] ]

And, <seen> contains:

  { ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer'): 0 }

Note that the 0 value in the <seen> dictionary is the index of the 
corresponding row in <newdata>.

When the second row of <data> is processed, the same thing happens.
<key> is the tuple

  ('209.133.77.165.T01713-01.above.net','Άγνωστη Πόλη','Windows','Explorer')

but since the only key in <seen> is 

  ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer')

again the "not in" branch is executed.  When it runs this time it
adds another row to <newdata> so <newdata> now looks like:

    [ [ 'mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer', ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:49'] ],
      [ '209.133.77.165.T01713-01.above.net','Άγνωστη Πόλη','Windows','Explorer', ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:59'] ] ]

and adds another entry to <seen> so that <seen> is now:

  { ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer'): 0, 
    ('209.133.77.165.T01713-01.above.net','Άγνωστη Πόλη','Windows','Explorer'): 1 }

Again, the 1 is the index of the corresponding row in <newdata>.

Now the third row of <data> is processed.  <key> is set to

  ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer')

This time when "if key not in seen" is executed, it is false because
that key *is* in <seen>; it was added when the first <data> row was
processed (look at <seen> above).  So the statements

       rowindex = seen[key]
       newdata[rowindex][4].append( ref )
       newdata[rowindex][5] += hits
       newdata[rowindex][6].append( visit )

are executed.  These statements will update the existing row for
visitor <key> in <newdata>.  The value of <key> is

  ('mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer')

so seen[key] is 0, and <rowindex> is set to that.  newdata[0] is
the row in <newdata> for the same visitor.  The next three lines 
update that row: they append the current <data> row's <ref> to 
newdata[0]'s list of refs, add its hits to newdata[0]'s hits field, 
and append its <lastvisit> time to newdata[0]'s list of visits.  
Afterwards, <newdata> looks like:

    [ [ 'mail14.ess.barracuda.com','Άγνωστη Πόλη','Windows','Explorer', ['Direct Hit',
                                                                         'http://superhost.gr/'], 2, ['Σάββατο 26 Οκτ, 18:49',
                                                                                                      'Σάββατο 26 Οκτ, 18:48'] ],

      [ '209.133.77.165.T01713-01.above.net','Άγνωστη Πόλη','Windows','Explorer', ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:59'] ] ]
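If you want to try the update step by itself, something like the
following should reproduce it.  It starts from the state reached 
after the first two rows and then applies the "else" branch to the 
third row.  I am assuming here that hits is an integer.

```python
# State after the first two rows of the example have been processed.
newdata = [
    ['mail14.ess.barracuda.com', 'Άγνωστη Πόλη', 'Windows', 'Explorer',
     ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:49']],
    ['209.133.77.165.T01713-01.above.net', 'Άγνωστη Πόλη', 'Windows', 'Explorer',
     ['Direct Hit'], 1, ['Σάββατο 26 Οκτ, 18:59']],
]
seen = {
    ('mail14.ess.barracuda.com', 'Άγνωστη Πόλη', 'Windows', 'Explorer'): 0,
    ('209.133.77.165.T01713-01.above.net', 'Άγνωστη Πόλη', 'Windows', 'Explorer'): 1,
}

# Third row of the example data (hits assumed to be an integer).
host, city, useros, browser, ref, hits, visit = (
    'mail14.ess.barracuda.com', 'Άγνωστη Πόλη', 'Windows', 'Explorer',
    'http://superhost.gr/', 1, 'Σάββατο 26 Οκτ, 18:48')

key = host, city, useros, browser
rowindex = seen[key]                  # This visitor's row already exists.
newdata[rowindex][4].append(ref)      # Add the new referrer to the refs list.
newdata[rowindex][5] += hits          # Add this row's hits to the total.
newdata[rowindex][6].append(visit)    # Add the new visit time to the visits list.
```

Note that <newdata> still has two rows afterwards; only the first 
one has changed.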

And so on for the rest of the data.  When a new visitor is seen
a row is added to <newdata> and the visitor (identified by the 
tuple <key>) is saved in <seen> along with the index of that
visitor's row in <newdata>.  If the same visitor is seen again
later in <data>, the corresponding row in <newdata> is updated 
rather than adding a new row to <newdata>.  
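As an aside, here is a possible variation (not the code discussed
above) that stores each visitor's row directly in the dictionary, so
no index is needed.  The function name merge_visits_by_row is made
up, and the sketch assumes Python 3.7 or later, where dictionaries
keep insertion order; on older versions the order of the resulting
rows is not guaranteed.

```python
def merge_visits_by_row(data):
    """Variation: the dictionary maps each visitor to its merged row.

    Assumes each row is (host, city, useros, browser, ref, hits, visit)
    and that hits is a number.
    """
    rows = {}   # Maps visitor tuple -> that visitor's merged row (a list).
    for host, city, useros, browser, ref, hits, visit in data:
        key = host, city, useros, browser
        row = rows.get(key)
        if row is None:
            # New visitor: start a row with one-element ref and visit lists.
            rows[key] = [host, city, useros, browser, [ref], hits, [visit]]
        else:
            # Known visitor: update the existing row in place.
            row[4].append(ref)
            row[5] += hits
            row[6].append(visit)
    # Rows come out in first-seen order on Python 3.7+.
    return list(rows.values())
```

Whether this is clearer than keeping a separate <newdata> list with
indexes is a matter of taste; both do the same merging.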

Did that help make it clearer?  It is a lot easier to write
code than to explain it :-) 


