sorting tuples...

Bengt Richter bokr at oz.net
Sat Sep 17 13:57:27 EDT 2005


On 17 Sep 2005 06:41:08 -0700, nidhog at gmail.com wrote:

>Hello guys,
>
>I made a script that extracts strings from a binary file. It works.
>
>My next problem is sorting those strings.
>
>Output is like:
>
>---- snip ----
>200501221530
>John
>*** long string here ***
>
>200504151625
>Clyde
>*** clyde's long string here ***
>
>200503130935
>Jeremy
>*** jeremy string here ****
>---- snip ----
>
>How can I go about sorting this list based on the date string that
>marks the start of each message?
>
>Should I be using lists, dictionaries or tuples?
>
>What should I look into?
>
>Is there a way to generate variables in a loop? Like:
>
>x=0
>while (x<10):
>    # assign variable-x = [...list...]
>    x = x+1
>
>Thanks.
>
Assuming each group of strings is a run of non-blank lines, with blank lines
separating the groups, and using StringIO as a line iterable to stand in for
your source of lines, you could do this (not tested beyond what you see ;-)

 >>> from StringIO import StringIO
 >>> lines = StringIO("""\
 ... 200501221530
 ... John
 ... *** long string here ***
 ...
 ... 200504151625
 ... Clyde
 ... *** clyde's long string here ***
 ...
 ... 200503130935
 ... Jeremy
 ... *** jeremy string here ****
 ... """)
 >>>
 >>> from itertools import groupby
 >>> for t in sorted(tuple(g) for k, g in groupby(lines,
 ...                     lambda line:line.strip()!='') if k):
 ...     print t
 ...
 ('200501221530\n', 'John\n', '*** long string here ***\n')
 ('200503130935\n', 'Jeremy\n', '*** jeremy string here ****\n')
 ('200504151625\n', 'Clyde\n', "*** clyde's long string here ***\n")

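The lambda computes a grouping key, and groupby collects consecutive lines into
one group for as long as that key value doesn't change. So this splits the input
into runs of non-blank lines and runs of blank lines, and the "if k" throws out
the blank-line groups. sorted() then compares the tuples element by element, so
the date line at the front of each tuple decides the order. Since the dates look
like fixed-width YYYYMMDDHHMM digit strings, plain string order is also date order.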
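
In case it helps anyone on Python 3 (the session above is Python 2), here is a
rough sketch of the same grouping and sorting. The names SAMPLE, is_nonblank and
sorted_groups are made up for illustration, and io.StringIO again just stands in
for the real source of lines.

```python
from io import StringIO
from itertools import groupby

# Same sample data as in the session above.
SAMPLE = """\
200501221530
John
*** long string here ***

200504151625
Clyde
*** clyde's long string here ***

200503130935
Jeremy
*** jeremy string here ****
"""

def is_nonblank(line):
    # Grouping key: True for lines with content, False for blank lines.
    return line.strip() != ''

def sorted_groups(lines):
    # groupby yields (key, run) pairs for consecutive lines sharing a key.
    # Keep only the non-blank runs, freeze each run into a tuple, then sort.
    # Tuples compare element by element, so the date line (element 0) decides.
    return sorted(tuple(g) for k, g in groupby(lines, is_nonblank) if k)

groups = sorted_groups(StringIO(SAMPLE))
for t in groups:
    print(t)
```

If two groups could share the same date and you would rather not have the name
line break the tie, sorted(..., key=lambda t: t[0]) sorts on the date line alone
and keeps equal dates in their original order.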

Obviously you could do something else with the sorted line tuples t, e.g.,

 >>> lines.seek(0)
(just needed that to rewind the StringIO data here)

 >>> for t in sorted(tuple(g) for k, g in groupby(lines,
 ...                     lambda line:line.strip()!='') if k):
 ...     width = max(map(lambda x:len(x.rstrip()), t))
 ...     topbot = '+-%s-+'%('-'*width)
 ...     print topbot
 ...     for line in t: print '| %s |' % line.rstrip().ljust(width)
 ...     print topbot
 ...     print
 ...
 +--------------------------+
 | 200501221530             |
 | John                     |
 | *** long string here *** |
 +--------------------------+

 +-----------------------------+
 | 200503130935                |
 | Jeremy                      |
 | *** jeremy string here **** |
 +-----------------------------+

 +----------------------------------+
 | 200504151625                     |
 | Clyde                            |
 | *** clyde's long string here *** |
 +----------------------------------+

Or of course you can just print the sorted groups bare:

 >>> lines.seek(0)
 >>> for t in sorted(tuple(g) for k, g in groupby(lines,
 ...                     lambda line:line.strip()!='') if k):
 ...     print ''.join(t)
 ...
 200501221530
 John
 *** long string here ***

 200503130935
 Jeremy
 *** jeremy string here ****

 200504151625
 Clyde
 *** clyde's long string here ***

 >>>

If your line groups are not delimited by blank lines, or the source contains
other non-blank lines that you don't want, you will have to either change the
source or replace the lambda with a key function that returns one value for the
lines to include (True, if you want to keep using "if k" as above) and another
(False) for the lines to exclude.
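
As a sketch of what such a key function might look like, suppose (purely as an
assumption, this is not your actual format) that the groups were separated by
lines of dashes instead of blank lines. The function name keep and the sample
list are hypothetical. Written for Python 3, though nothing here should depend
on that.

```python
from itertools import groupby

# Hypothetical input: groups separated by lines of dashes, not blank lines.
lines = [
    "200504151625\n", "Clyde\n", "-----\n",
    "200501221530\n", "John\n", "-----\n",
    "200503130935\n", "Jeremy\n",
]

def keep(line):
    # True for lines to include; False for blank lines and for separator
    # lines that consist of nothing but dashes.
    s = line.strip()
    return s != '' and s.strip('-') != ''

# Same pattern as before: only the key function has changed.
groups = sorted(tuple(g) for k, g in groupby(lines, keep) if k)
for t in groups:
    print(''.join(t))
```

Everything else stays the same, since groupby only cares that the key value
flips between the lines you want and the lines you don't.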

HTH

Regards,
Bengt Richter


