how to extract columns like awk $1 $5

Dan Valentine nobody at invalid.domain
Sat Jan 8 00:05:45 EST 2005


On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:

> Is there a simple way to extract words separated by a space in python
> the way I do it in awk '{print $4 $5}'? I am sure there should be some
> but I don't know it.

i guess it depends on how faithfully you want to reproduce awk's behavior
and options.

as several people have mentioned, strings have the split() method for
simple tokenization, but blindly indexing into the resulting list
raises an IndexError when the field doesn't exist.  out-of-range indexes
are no problem for awk; it just returns an empty string without complaint.
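
a minimal sketch of that difference, using a made-up sample line. the
variable names are just for illustration:

```python
line = "a b c d e"
fields = line.split()   # five fields, indexed 0..4

# awk's $6 on this line would simply be empty; python raises instead,
# so the lookup has to be guarded
try:
    sixth = fields[5]
except IndexError:
    sixth = ""
```

here sixth ends up as an empty string, which is what awk would have
handed back for $6.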

note that the index bases are slightly different: python sequences
start with index 0, while awk's fields begin with $1.  there IS a $0,
but it means the entire unsplit line.

the split() method accepts a separator argument, which can be used to
replicate awk's -F option / FS variable.
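
one thing to watch, sketched below with made-up sample strings: split()
with no argument treats runs of whitespace as a single separator (close
to awk's default FS), while an explicit " " splits on every single space.

```python
# no argument: runs of whitespace count as one separator
spaced = "a   b  c".split()

# explicit " ": every single space splits, leaving empty fields behind
single = "a   b  c".split(" ")

# a passwd-style record (invented for this example), roughly what
# awk -F: '{print $1}' would pick out
record = "dan:x:1000:1000:Dan:/home/dan:/bin/sh"
login = record.split(":")[0]
```

so for awk-like default behavior, leaving the separator out is probably
the closer match; pass one only when you would have used -F.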

so, if you want to closely approximate awk's behavior without fear of
exceptions, you could try a small function like this:


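def awk_it(instring, index, delimiter=None):
    # index 0 (or anything below it) returns the whole line, like awk's $0
    if index <= 0:
        return instring
    try:
        # awk fields are 1-based, python sequences are 0-based;
        # delimiter=None splits on runs of whitespace, like awk's default FS
        return instring.split(delimiter)[index - 1]
    except IndexError:
        return ""  # awk gives an empty string for a missing field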


>>> print(awk_it("a b c d e", 0))
a b c d e

>>> print(awk_it("a b c d e", 1))
a

>>> print(awk_it("a b c d e", 5))
e

>>> print(awk_it("a b c d e", 6))
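
to get back to the original question, here is a rough sketch of
awk '{print $4 $5}' over several lines. the helper name field() and the
sample lines are made up for this example; it is one way to do it, not
the only one.

```python
def field(line, n, sep=None):
    """Return awk-style field n (1-based); n == 0 gives the whole line."""
    if n == 0:
        return line
    parts = line.split(sep)
    # a missing field is an empty string, as in awk
    return parts[n - 1] if 1 <= n <= len(parts) else ""

# made-up input; in a real script these would come from a file or stdin
lines = [
    "one two three four five",
    "short line",
]

# awk's '{print $4 $5}' joins the two fields with nothing in between
results = [field(l, 4) + field(l, 5) for l in lines]
```

results holds "fourfive" for the first line and an empty string for the
second, since "short line" has no fourth or fifth field.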


- dan


