joining rows

Tim Chase python.list at tim.thechases.com
Sat Dec 29 10:22:52 EST 2007


> A            1
> A            2
> A            3
> B            1
> C            2
> D            3
> D            4
> The result should be
> 
> A    1|2|3
> B    1
> C    2
> D    3|4
> 
> What should I do to get my results

Well, it depends on whether the resulting order matters.  If not,
you can use KM's suggestion of a dictionary:

  results = {}
  for line in open('in.txt'):
    # each row is "key<TAB>value"
    k,v = line.rstrip('\n').split('\t')
    results.setdefault(k, []).append(v)
  # a plain dict hands back its keys in arbitrary order
  for k,v in results.iteritems():
    print k, '|'.join(v)
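
For anyone on Python 3, here is a rough sketch of the same dictionary
idea.  The function name join_rows and the inline sample list are just
placeholders of mine (standing in for reading in.txt), and it assumes
exactly one tab between key and value.  Since Python 3.7 a dict keeps
its keys in insertion order, so the output follows each key's first
appearance:

```python
def join_rows(lines, sep='|'):
    """Collect every value seen for a key, wherever the key appears."""
    results = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        k, v = line.split('\t')  # assumes a single tab per row
        results.setdefault(k, []).append(v)
    # dicts preserve insertion order in Python 3.7+
    return ['%s %s' % (k, sep.join(v)) for k, v in results.items()]

# hypothetical sample standing in for the contents of in.txt
sample = ['A\t1\n', 'A\t2\n', 'A\t3\n', 'B\t1\n', 'C\t2\n', 'D\t3\n', 'D\t4\n']
for row in join_rows(sample):
    print(row)
```

Note that this still merges rows with the same key even when other
keys sit between them.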

If, however, order matters, you have to buffer the current run of
rows and flush it whenever the key changes, so that only adjacent
rows get joined.  It makes a difference when your input looks like

  A   1
  B   2
  A   3

which should yield its own input, rather than "A 1|3".  In this case

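  last_key = ''
  values = []
  for line in open('in.txt'):
    k,v = line.rstrip('\n').split('\t')
    if last_key != k:
      # key changed: flush the finished run, start a new one
      if last_key:
        print last_key, '|'.join(values)
      last_key = k
      values = [v]
    else:
      # same key as the previous row: keep buffering
      values.append(v)
  # flush whatever run is still buffered at end of file
  if last_key:
    print last_key, '|'.join(values)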

should do the job.  Which, if you like, can be reduced to a sed
one-liner (GNU sed, since it leans on \t and \+)

  sed ':a;N;s/^\([^\t]\+\)\(.*\)\n\1\t\+\(.*\)/\1\2|\3/;ta;P;D'

...if it doesn't make your modem hang up on you, or if line-noise
assembly-language is your thing ;)
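
A gentler alternative, as a sketch only: itertools.groupby groups
consecutive items that share a key, which is the same run-by-run
behaviour as the loop above.  This is written for Python 3; the name
join_adjacent_rows and the sample list are placeholders of mine, and
it again assumes one tab between key and value.

```python
from itertools import groupby

def join_adjacent_rows(lines, sep='|'):
    """Merge only runs of consecutive rows that share a key."""
    # split each non-blank line into a [key, value] pair
    pairs = (line.rstrip('\n').split('\t') for line in lines if line.strip())
    return [
        '%s %s' % (key, sep.join(v for _, v in group))
        for key, group in groupby(pairs, key=lambda pair: pair[0])
    ]

# hypothetical sample: the two A rows are not adjacent, so they stay apart
interleaved = ['A\t1\n', 'B\t2\n', 'A\t3\n']
for row in join_adjacent_rows(interleaved):
    print(row)
```

On sorted input it gives the same result as the dictionary approach;
on the interleaved sample it hands back its own input.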

-tkc









