Capturing instant messages

Nick Vatamaniuc vatamane at gmail.com
Tue Jul 18 07:36:51 EDT 2006


Ed,

It depends on what IM protocol the company is using. If there is more
than one, your job might end up being quite complicated. You indicated
port 5190 in your post; does that mean the company is using only AOL
IM? In general, it seems like you would have to:

1) Capture the traffic
2) Decode the IM protocol
3) Record the captured text

1) As for capturing the traffic, I would use a specific tool like
tcpick (a cousin of tcpdump that dumps the data to the console, not
just the headers, and reconstructs the TCP streams -- good stuff!).  Again
if you know the exact port number and the exact protocol this might be
very easy because you will set up your capturing program to capture
traffic from only 1 port. Let's assume that for now. Here is my quick
and dirty attempt. First install tcpick http://tcpick.sourceforge.net/
if you don't have it, then become root and open a Python prompt. (Use
ipython... because my mom says it's better ;).
In [1]:from subprocess import * #don't do this in your final script, always use 'import subprocess'
In [2]:cmd='/usr/sbin/tcpick -i eth0 -bR tcp port 80' #use your IM port here instead of 80
#-bR means reconstruct TCP stream and dump data in raw mode to console (good for ASCII stuff).
In [3]:p=Popen(cmd, shell=True, bufsize=0, stdout=PIPE, stderr=PIPE) #start a subprocess w/ NO_WAIT
In [4]:p.pid #check the process pid, can use this to issue a 'kill' command later...
Out[4]:7100
In [5]:p.poll()
In [6]:#Actually it is None, which means the process is not finished
In [7]:#Read some lines one by one from output
In [8]:p.stdout.readline() #Might block here, if so start a browser and load a page
Out[8]:'Starting tcpick 0.2.1 at 2006-XX-XX XX:XX EDT\n'
In [9]:#
In [10]:#Print some lines from the output, one by one:
In [11]:p.stdout.readline()
Out[11]:'Timeout for connections is 600\n' #first line, tcpick prompt stuff
In [12]:p.stdout.readline()
Out[12]:'tcpick: listening on eth0\n'
In [13]:p.stdout.readline()
Out[13]:'setting filter: "tcp"\n'
In [14]:p.stdout.readline()
Out[14]:'1      SYN-SENT       192.168.0.106:53498 > 64.233.167.104:www\n'
In [15]:p.stdout.readline()
Out[15]:'1      SYN-RECEIVED   192.168.0.106:53498 > 64.233.167.104:www\n'
In [16]:p.stdout.readline()
Out[16]:'1      ESTABLISHED    192.168.0.106:53498 > 64.233.167.104:www\n'
In [17]:p.stdout.readline() #the good stuff should start right here
Out[17]:'GET /search?hl=en&q=42&btnG=Google+Search HTTP/1.1\r\n'
In [18]:p.stdout.readline()
Out[18]:'Host: www.google.com\r\n'
In [19]:p.stdout.readline()
Out[19]:'User-Agent: blah blah...\r\n'
In [20]:p.stdout.read() #try a read() -- will block, press Ctrl-C
exceptions.KeyboardInterrupt
In [21]:p.poll()
Out[21]:0  #process is finished, return errcode = 0
In [22]:p.stderr.read()
Out[22]:'' #no error messages
In [23]:p.stdout.read()
Out[23]:'\n257 packets captured\n7 tcp sessions detected\n'
In [24]: #those were the last stats before tcpick was terminated.
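
For a script rather than an interactive prompt, the same steps might look like the sketch below. It uses 'import subprocess' and an argument list instead of shell=True. One assumption to flag: the filter is passed as a single argument ("tcp port N"), because the session output above shows 'setting filter: "tcp"', which suggests only the first word of the unquoted filter was picked up. The helper names build_tcpick_cmd and read_first_lines are made up for illustration, and the tcpick path and flags are simply the ones from the session.

```python
import subprocess


def build_tcpick_cmd(interface, port, tcpick="/usr/sbin/tcpick"):
    """Build the tcpick argument list used in the session above.

    The filter goes in as ONE argument (assumption: tcpick expects the
    whole filter expression as a single quoted string).
    """
    return [tcpick, "-i", interface, "-bR", "tcp port %d" % port]


def read_first_lines(cmd, max_lines):
    """Start cmd, read up to max_lines lines of its output, then stop it."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    lines = []
    try:
        for _ in range(max_lines):
            line = proc.stdout.readline()  # blocks until a line arrives
            if not line:                   # empty bytes: output was closed
                break
            lines.append(line)
    finally:
        proc.terminate()   # same effect as the 'kill' mentioned above
        proc.wait()
        proc.stdout.close()
    return lines
```

Running it against real traffic would be something like read_first_lines(build_tcpick_cmd("eth0", 5190), 20), as root, with your own interface and port.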

Well anyway, your readline() calls will block on process I/O whenever
tcpick has no data to supply. You might have to start a thread in Python
to manage the capture process and do the blocking reads. But in the end
the readline() calls will get you the raw data from the network (in this
case it was just one way, from my IP to Google; of course you will need
it both ways).
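
One way to keep the blocking reads out of the main program is a reader thread that pushes lines onto a queue. This is only a sketch: start_line_reader and drain are hypothetical names, and the demo at the bottom uses a throwaway Python child process as a stand-in for tcpick so that it runs without root.

```python
import queue
import subprocess
import sys
import threading


def start_line_reader(cmd):
    """Start cmd; a daemon thread feeds its output lines into a queue."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    lines = queue.Queue()

    def pump():
        for raw in proc.stdout:  # the blocking happens here, in the thread
            lines.put(raw)
        lines.put(None)          # sentinel: the process closed its output

    threading.Thread(target=pump, daemon=True).start()
    return proc, lines


def drain(lines, timeout=1.0):
    """Collect lines until the sentinel, or until nothing arrives in time."""
    collected = []
    while True:
        try:
            item = lines.get(timeout=timeout)
        except queue.Empty:
            break                # no data right now; caller can retry later
        if item is None:
            break                # the capture process has finished
        collected.append(item)
    return collected


if __name__ == "__main__":
    # Stand-in command; replace with the tcpick command line for real use.
    demo_cmd = [sys.executable, "-c", "print('hello'); print('world')"]
    proc, lines = start_line_reader(demo_cmd)
    print(drain(lines, timeout=5.0))
    proc.wait()
```

The main program then calls drain() whenever it likes and never sits inside readline() itself.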

2) The decoding will depend on your protocol. If you have more than one
IM protocol, the single-port capture idea from above won't work too well:
you will have to capture all the traffic and then decode each stream, for
each side, for each protocol.
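
As a starting point for telling streams apart, the connection status lines that tcpick printed in the session above could be parsed and used to choose a decoder by destination port. The sketch below assumes the status lines always look like the three in that session. The decoder registry is purely hypothetical, and decode_text is a placeholder, not a real IM protocol decoder.

```python
import re

# Pattern modeled on the status lines shown in the session, e.g.
# '1      SYN-SENT       192.168.0.106:53498 > 64.233.167.104:www'
STATUS_RE = re.compile(
    r"^(?P<stream>\d+)\s+(?P<state>\S+)\s+"
    r"(?P<src>[^\s:]+):(?P<sport>\S+)\s+>\s+"
    r"(?P<dst>[^\s:]+):(?P<dport>\S+)\s*$"
)


def parse_status(line):
    """Return a dict of fields for a status line, or None for other lines."""
    match = STATUS_RE.match(line)
    return match.groupdict() if match else None


def decode_text(payload):
    """Placeholder decoder: treat the payload bytes as plain text."""
    return payload.decode("latin-1")


# Hypothetical registry keyed by the destination port as tcpick prints it
# (a number, or a service name such as 'www').
DECODERS = {"www": decode_text, "80": decode_text}


def pick_decoder(status):
    """Look up a decoder for a parsed status line; None if unsupported."""
    return DECODERS.get(status["dport"])
```

A real setup would register one decoder per IM protocol (for example under "5190" for the AOL IM port from the original post), and would need a more robust way to separate status lines from payload that happens to look like one.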

3) Recording or replay is easy. Save to files or dump to a MySQL table
indexed by user id, timestamp, IP, etc. Because of buffering issues, you
will probably not get a very accurate real-time monitoring system with
this setup.
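
A rough sketch of the table idea follows. It uses the sqlite3 module from the standard library as a stand-in for MySQL so that it runs anywhere. The table name, column names, and helper functions are all assumptions made up for illustration.

```python
import sqlite3
import time


def open_archive(path=":memory:"):
    """Open (or create) the archive and make sure the table and indexes exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " captured_at REAL NOT NULL,"   # seconds since the epoch
        " user_id TEXT,"
        " src_ip TEXT NOT NULL,"
        " dst_ip TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # Indexes matching the lookups mentioned above: user id, timestamp, IP.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_messages_user_time"
                 " ON messages (user_id, captured_at)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_messages_src_ip"
                 " ON messages (src_ip)")
    return conn


def record_message(conn, user_id, src_ip, dst_ip, body, captured_at=None):
    """Insert one decoded message; the timestamp defaults to now."""
    if captured_at is None:
        captured_at = time.time()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (captured_at, user_id, src_ip, dst_ip, body)"
            " VALUES (?, ?, ?, ?, ?)",
            (captured_at, user_id, src_ip, dst_ip, body),
        )


def messages_for_user(conn, user_id):
    """Return (captured_at, src_ip, dst_ip, body) rows, oldest first."""
    cur = conn.execute(
        "SELECT captured_at, src_ip, dst_ip, body FROM messages"
        " WHERE user_id = ? ORDER BY captured_at",
        (user_id,),
    )
    return cur.fetchall()
```

Note that the timestamp recorded this way is the time the line was read from the pipe, not the time the packet crossed the wire, which is the buffering caveat again.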

Hope this helps,
Nick Vatamaniuc






Ed Leafe wrote:
> I've been approached by a local business that has been advised that
> they need to start capturing and archiving their instant messaging in
> order to comply with Sarbanes-Oxley. The company is largely PC, but
> has a significant number of Macs running OS X, too.
>
> 	Googling around quickly turns up IM Grabber for the PC, which would
> seem to be just what they need. But there is no equivalent to be
> found for OS X. So if anyone knows of any such product, please let me
> know and there will be no need for the rest of this post.
>
> 	But assuming that there is no such product, would it be possible to
> create something in Python, using the socket or a similar module?
> They have a number of servers that provide NAT for each group of
> machines; I was thinking that something on those servers could
> capture all traffic on port 5190 and write it to disk. Is this
> reasonable, or am I being too simplistic in my approach?
> 
> -- Ed Leafe
> -- http://leafe.com
> -- http://dabodev.com



