Converting a sed / grep / awk / ... bash pipeline into Python

hofer blabla at dungeon.de
Tue Sep 2 13:36:50 EDT 2008


Hi,

Something I have to do very often is filtering / transforming
line-based file contents and storing the result in an array or a
dictionary.

Very often the functionality already exists in the form of a shell
script using sed / awk / grep, ...
and I would like to have the same implementation in my script.

What's a compact, efficient way in Python (no intermediate arrays
generated, regexps compiled only once)
to write this kind of 'pipeline'?
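
To make the two requirements concrete, here is a minimal sketch of what
"no intermediate arrays" and "regexps compiled only once" could look like
with generator expressions. The names HASH_COMMENT and numbered_lines are
made up for illustration; the endless input source is only there to show
that nothing is collected into a list along the way.

```python
import re
from itertools import count, islice

# Compiled a single time, then reused for every line (illustrative name).
HASH_COMMENT = re.compile(r"#.*")


def numbered_lines():
    """Hypothetical endless input, standing in for a very large file."""
    for n in count(1):
        yield "%d # comment %d\n" % (n, n)


# Each stage is a generator expression: it pulls one line at a time from
# the stage before it and never builds an intermediate list.
stripped = (HASH_COMMENT.sub("", line) for line in numbered_lines())
nonblank = (line for line in stripped if line.strip())

# Only three lines are ever read from the endless source.
first_three = list(islice(nonblank, 3))
print(first_three)
```

Because the source never ends, an implementation that built an
intermediate array would not terminate; the chained generators stop as
soon as three results have been produced.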

Example 1 (in bash), annotated with comments (and thus not working if
copied / pasted):
#-------------------------------------------------------------------------------------------
cat file \                   ### read from file
| sed 's/\/\/.*//' \        ### remove '//' comments
| sed 's/#.*//' \           ### remove '#' comments
| grep -v '^\s*$'  \        ### get rid of empty lines
| awk '{ print $1 + $2 " " $2 }' \ ### knowing that all remaining lines always contain at least
\                                  ### two integers, calculate sum and 'keep' second number
| grep '^42 ' \               ### keep lines for which sum is 42
| awk '{ print $2 }'         ### print number

Same example in Perl:
# I guess (but didn't try) that the Perl example will create more intermediate
# data structures than necessary.
# Ideally the Python implementation shouldn't do this, but just 'chain' iterators.
#-------------------------------------------------------------------------------------------
my $filename= "file";
open(my $fh, '<', $filename) or die "failed opening file $filename: $!";

# order of 'pipeline' is syntactically reversed (compared to the shell script)
my @numbers =
    map { $_->[1] }               # extract num 2
    grep { $_->[0] == 42       }  # keep lines with result 42
    map { [ $_->[0]+$_->[1],$_->[1] ]  }  # calculate sum of first two nums and keep second num
    map { [ split(' ',$_,3) ]  }  # split by white space
    grep { ! ($_ =~ /^\s*$/) }    # remove empty lines
    map { $_ =~ s/#.*//    ; $_}  # strip '#' comments
    map { $_ =~ s/\/\/.*// ; $_}  # strip '//' comments
    <$fh>;
print "Numbers are:\n",join("\n",@numbers),"\n";
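
For comparison, here is one way the same chain might be written in Python
with generator expressions, in the same order as the shell pipeline. This
is only a sketch: the function name numbers_summing_to, the pattern names
and the sample input are invented for illustration, and it assumes (as in
the awk step) that every remaining line starts with at least two integers.

```python
import re

# Compiled once at import time (illustrative names, not from the post).
SLASH_COMMENT = re.compile(r"//.*")
HASH_COMMENT = re.compile(r"#.*")


def numbers_summing_to(lines, target=42):
    """Lazily yield the second integer of each line whose first two sum to target."""
    lines = (SLASH_COMMENT.sub("", line) for line in lines)  # remove '//' comments
    lines = (HASH_COMMENT.sub("", line) for line in lines)   # remove '#' comments
    lines = (line for line in lines if line.strip())         # get rid of empty lines
    fields = (line.split(None, 2) for line in lines)         # split by white space
    # Assumption: the first two fields are always integers.
    pairs = ((int(f[0]) + int(f[1]), int(f[1])) for f in fields)  # (sum, second number)
    return (second for total, second in pairs if total == target)  # keep sum == target


# Hypothetical sample input standing in for the lines of 'file'.
sample = [
    "40 2 // sum is 42\n",
    "# only a comment\n",
    "\n",
    "1 2\n",
    "0 42 extra # trailing\n",
]
numbers = list(numbers_summing_to(sample))
print("Numbers are:")
for number in numbers:
    print(number)
```

Any iterable of lines can be passed in, so an open file object would work
in place of the sample list. Nothing is read until the final generator is
consumed, and only the last list() call materializes a result.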

Thanks in advance for any suggestions on how to code this (keeping the
comments).


H






