Tracking Users By IP Address

Michael Sparks zathras at thwackety.com
Fri Oct 8 20:15:01 EDT 2004


Michael Foord wrote:
> [thanks but snip.. ;-) ]
..
> Sorry... :-( don't get it.
> What is add-to-sample-set(request) doing ? Is it simply choosing a
> proportion of our users to sample ?
> 
> If this is only a 'do if you have too many users' kind-of-thing then
> unfortunately it won't be a problem for me !!

It's precisely that. If you don't think it'll be an issue for you, then I'll
just leave things as is :) (If anyone's curious as to what I mean, though,
I'll be happy to expand.)
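For anyone who is curious, here's a minimal sketch of one way to sample a
fixed proportion of users. It's an illustration under my own assumptions,
not the add-to-sample-set from earlier: `in_sample` is a hypothetical name,
and it assumes each user already carries some stable id (eg a cookie value).
Hashing the id gives each user the same answer on every request without
storing any per-user state, so the memory needed doesn't grow with traffic.

```python
import hashlib


def in_sample(user_id: str, rate: float) -> bool:
    """Return True if this user falls in the sampled fraction `rate` (0.0-1.0).

    The answer depends only on the id, so a user is either always or never
    sampled, and nothing has to be remembered between requests.
    """
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Integer/float comparison is exact in Python, so rate=1.0 keeps everyone.
    return bucket < rate * 2**64


if __name__ == "__main__":
    ids = [f"user{n}" for n in range(10000)]
    sampled = sum(in_sample(i, 0.1) for i in ids)
    print(f"{sampled} of {len(ids)} users sampled")
```

Calling something like `in_sample(cookie_id, 0.05)` would then keep roughly
5% of users, and changing the rate later doesn't need any stored state.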

>> (Or something like that IYSWIM - ie get the user population to indicate
>> if they're being sampled - again, this allows your users to easily opt
>> out,
> 
> As above... I don't get it, so I don't see how it achieves this ?

Suppose the following user groups:
   * Users who refuse all cookies - you can't use cookies to track them, and
     IP isn't 100% reliable.
   * Users who accept all cookies and don't care - ideal candidates for
     sampling.
   * Users who generally have cookies enabled, but don't like being tracked.
      * In this scenario you can have a "click on this image to cease being
        tracked" picture for them to click on, which sets a cookie that
        effectively says "don't track me". It does rely on them trusting you
        not to track them, but if you send them all the same cookie value
        (eg "NOTRACK") it should be obvious to them that the cookie is
        useless for tracking.

That way you get 3 kinds of cookie value:
   * NEWUSER
   * specific trackable id value
   * NOTRACK
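To make the three values concrete, here's a minimal sketch of the
per-request decision, under assumptions of my own: a visitor with no cookie
is handed NEWUSER, a visitor who sends NEWUSER back (so their browser keeps
cookies) is upgraded to a unique id, and NOTRACK or an existing id is left
alone. `cookie_to_send` and `opt_out_cookie` are hypothetical names, and
wiring them into a particular webserver or framework is left out.

```python
import uuid
from typing import Optional

NEWUSER = "NEWUSER"
NOTRACK = "NOTRACK"


def cookie_to_send(received: Optional[str]) -> Optional[str]:
    """Return the cookie value to set on the response, or None to leave it alone.

    Assumed scheme (illustrative only):
      * no cookie     -> hand out NEWUSER; we can't yet tell if cookies stick
      * NEWUSER       -> the cookie came back, so cookies work: issue a unique id
      * NOTRACK       -> the user opted out; never replace it
      * anything else -> already a trackable id; keep it
    """
    if received is None:
        return NEWUSER
    if received == NEWUSER:
        return uuid.uuid4().hex  # 32 hex characters, unique per user
    return None


def opt_out_cookie() -> str:
    """Value set when the user clicks the "stop tracking me" image.

    Everyone gets the identical value, so it visibly carries no identity.
    """
    return NOTRACK
```

The point of keeping NOTRACK identical for everyone is the one made above:
a user can inspect it and see it can't single them out.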

It's a small thing you can add on after the fact, which I think is quite
nice.

>> and also means the memory/etc required to determine whether to track the
>> user or not isn't dependent on the number of requests your site gets -
>> meaning that you can keep analysis costs for your site under control. If
>> you've only got a small site this probably doesn't matter to you, but
>> worth bearing in mind).
>> 
>> The interesting thing about this from my perspective is that if you do
>> take a cookie approach like this, it actually allows you to figure out
>> how much error there actually is between IP and cookie - rather than just
>> guess.
> 
> One last question. You didn't explicitly say this, but I was thinking
> of doing it anyway. Are you suggesting to store USERID *and* IP
> address and ...

Most webservers allow you to define custom logging formats, or at minimum
offer some extended formats. For example, there is a defined format that
includes any cookie value received in the log, along with the usual details.
You can then use standard tools to analyse the log after the fact. By
noting which log lines come from new users ("NEWUSER", and hence not really
trackable) and which come from people who don't want to be tracked
("NOTRACK"), you can exclude those lines and pass the rest to tools like
analog, or to more sophisticated tools designed to follow user paths through
a site.
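A rough sketch of that filtering step, assuming the raw Cookie header is
logged as the last quoted field on each line (with Apache, for instance, by
appending "%{Cookie}i" to a LogFormat) and that the tracking cookie is called
`track`. Both the field position and the cookie name are assumptions, so
adjust them to whatever your server actually writes.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: the Cookie header is the last double-quoted field on the line.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')
COOKIE_NAME = "track"  # hypothetical cookie name
SKIP = {"NEWUSER", "NOTRACK"}


def tracking_value(line: str) -> Optional[str]:
    """Pull the tracking cookie's value out of one log line, if present."""
    m = LAST_QUOTED.search(line)
    if not m:
        return None
    # A Cookie header may carry several "name=value" pairs separated by ";".
    for part in m.group(1).split(";"):
        name, _, value = part.strip().partition("=")
        if name == COOKIE_NAME:
            return value
    return None  # eg "-" when the browser sent no Cookie header


def trackable_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that carry a real tracking id."""
    for line in lines:
        value = tracking_value(line)
        if value and value not in SKIP:
            yield line
```

Given an open log file `src` and an output file `dst`,
`dst.writelines(trackable_lines(src))` would produce a cut-down log you can
hand to analog or similar.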

Obviously such things get harder the more traffic your site sees, which is
why I was pointing to sampling earlier: it's non-trivial to get "right" in
many respects :)

The advantage of the "just let the system log the details" approach is that
it's simple to do, you can use standard tools, and you can choose whether
to analyse using IPs or cookies at your leisure.

> compare the results of analysing by IP and analysing by
> cookie.... Sounds worthwhile...

It does. I'm not really aware of anyone who has actively tried following
user trails through a site by IP and then repeating the same analysis
using cookies in this kind of way. I _suspect_ the results would be very
different for small vs large sites, and might even depend on how niche a
website is.

After all, if you find that the margin of error is acceptable and
(preferably) predictable, you can just choose the computationally
cheaper option.
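As a starting point for that comparison, here's a minimal sketch that takes
(IP, cookie id) pairs pulled from the log, with NEWUSER and NOTRACK lines
already dropped, and measures disagreement in both directions: how often one
cookie id shows up from several IPs (one person split into several "IP
users", eg dial-up reconnects), and how often one IP carries several cookie
ids (several people merged into one, eg behind a proxy or NAT). The function
name and the two measures are my own choices, not an established metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ip_cookie_disagreement(hits: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Rough measure of how well IP addresses stand in for cookie ids.

    hits: (ip, cookie_id) pairs, already filtered to trackable cookie ids.
    Returns:
      split_users - fraction of cookie ids seen from more than one IP
      shared_ips  - fraction of IPs seen with more than one cookie id
    """
    ips_per_cookie: Dict[str, Set[str]] = defaultdict(set)
    cookies_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, cookie in hits:
        ips_per_cookie[cookie].add(ip)
        cookies_per_ip[ip].add(cookie)

    split = sum(len(ips) > 1 for ips in ips_per_cookie.values())
    shared = sum(len(cookies) > 1 for cookies in cookies_per_ip.values())
    return {
        "split_users": split / len(ips_per_cookie) if ips_per_cookie else 0.0,
        "shared_ips": shared / len(cookies_per_ip) if cookies_per_ip else 0.0,
    }
```

If both numbers stay small and stable over time for your site, that's the
sort of evidence that would justify analysing by IP alone.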

> Thanks for your help - very interesting.

No problem.

Best Regards,


Michael.


