Searching through two logfiles in parallel?

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Jan 8 18:40:01 EST 2013


On 8 January 2013 19:16, darnold <darnold992000 at yahoo.com> wrote:
> i don't think in iterators (yet), so this is a bit wordy.
> same basic idea, though: for each message (set of parameters), build a
> list of transactions consisting of matching send/receive times.

The advantage of an iterator-based solution is that it avoids loading
both log files into memory in their entirety.
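For example, a generator can yield parsed records one line at a time as
the file is read. This is only a sketch: parse() and the line layout
(timestamp first, then the parameters) are assumptions, not the
original poster's actual format.

```python
def parse(line):
    # Hypothetical format: "timestamp params...", timestamp first.
    timestamp, _, params = line.partition(' ')
    return timestamp, params

def parsed_lines(logfile):
    """Lazily yield (timestamp, params) pairs, skipping blank lines.

    Iterating over a file object reads one line at a time, so the
    whole file never needs to be in memory at once.
    """
    for line in logfile:
        line = line.strip()
        if line:
            yield parse(line)
```

Passing an open file object (or any iterable of lines) to
parsed_lines() then lets you process each record as it arrives.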

[SNIP]
>
> results = {}
>
> for line in sendData.split('\n'):
>     if not line.strip():
>         continue
>
>     timestamp, params = parse(line)
>     if params not in results:
>         results[params] = [{'sendTime': timestamp, 'receiveTime':
> None}]
>     else:
>         results[params].append({'sendTime': timestamp, 'receiveTime':
> None})
[SNIP]

This kind of logic is made a little easier (and more efficient) with a
collections.defaultdict instead of a plain dict, since it saves
checking whether the key is already present. Example:

>>> import collections
>>> results = collections.defaultdict(list)
>>> results
defaultdict(<type 'list'>, {})
>>> results['asd'].append(1)
>>> results
defaultdict(<type 'list'>, {'asd': [1]})
>>> results['asd'].append(2)
>>> results
defaultdict(<type 'list'>, {'asd': [1, 2]})
>>> results['qwe'].append(3)
>>> results
defaultdict(<type 'list'>, {'qwe': [3], 'asd': [1, 2]})


Oscar



More information about the Python-list mailing list