toy list processing problem: collect similar terms

sln at netherlands.com sln at netherlands.com
Wed Oct 6 13:52:19 EDT 2010


On Sat, 25 Sep 2010 21:05:13 -0700 (PDT), Xah Lee <xahlee at gmail.com> wrote:

>here's a interesting toy list processing problem.
>
>I have a list of lists, where each sublist is labelled by
>a number. I need to collect together the contents of all sublists
>sharing
>the same label. So if I have the list
>
>((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
>r) (5 s t))
>
>where the first element of each sublist is the label, I need to
>produce:
>
>output:
>((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
[snip]

>anyone care to give a solution in Python, Perl, javascript, or other
>lang? am guessing the scheme solution can be much improved... perhaps
>using some lib but that seems to show scheme is pretty weak if the lib
>is non-standard.
>

Crossposting to Lisp, Python and Perl because the weird list of lists looks
like Lisp or something else, and you mention other languages so I'm throwing
this out for Perl.

It appears this string you have there is actually list syntax in another language.
If it is, its the job of the language to parse the data out. Why then do you
want to put it into another language form? At runtime, once the data is in variables,
dictated by the syntax, you can do whatever data manipulation you want
(combining arrays, etc..).

So, in the spirit of a preprocessor, given that the text is balanced, with proper closure,
ie:   ( (data) (data) )    is ok.
      ( data (data) )      is not ok.

the below does simple text manipulation, joining like labeled sublists, without going into
the runtime guts of internalizing the data itself. Internally, this is too simple.

-sln
-----------------
Alternate input:
(
  (
    (0 a b) (1 c d) (2 e f )
  )
  (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t)
)
------------------
use strict;
use warnings;

my $input = <<EOI;
((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r)
 (5 s t))
EOI
my $output = $input;

my $regxout = qr/
  ( (?: \( \s* [^()]+ \s* \) (\s*) )+ )
/x;


$output =~ 
s{ $regxout }
 {
    my ( $list, $format ) = ( $1, $2 );
    my ( %hseen,
         @order,
         $replace
    );
    while ($list =~  /\(\s* (\S+) \s* (.+?) \s*\)/xsg) {
        if ( exists $hseen{$1} ) {
            $hseen{$1} .= " $2";
            next;
        }
        push @order, $1;
        $hseen{$1} = $2;
    }
    for my $id (@order) {
        $replace .= "($hseen{$id}) ";
    }
    $replace =~ s/ $//;
    $replace . $format
 }xeg;

print "Input  -\n$input\n";
print "Output -\n$output";

__END__

Input  -
((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r)
 (5 s t))

Output -
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))





More information about the Python-list mailing list