convert script awk in python

Avi Gross avigross at verizon.net
Wed Mar 24 12:00:03 EDT 2021


Alan,

Back when the various UNIX utilities came along (they were later included in
other operating environments like Linux, the Mac OS and even Microsoft's), the
paradigm was a bit different: many kinds of tasks were done with a pipeline of
small, focused utilities. You mentioned sed, which at first seems like a very
simple tool, but if you look again, it can replace lots of other tools, mostly
because you can write powerful one-liners with it. AWK was, in some sense,
even more powerful and can emulate many of the others.

But that came at a cost compared to some modern languages where, by loading a
few modules, you can do much of the same work in fewer passes over the data.

I am not sure if I mentioned it here, but I was once on a project that
stored all kinds of billing information in primitive text files using a
vertical bar as the field separator. My boss, who was not really a programmer,
started analyzing the data fairly primitively and ended up writing huge shell
scripts (ksh, I think) that remotely went to our computers around the world,
gathered the files and processed them through pipelines that often had 10 or
more stages as he selectively broke each line into parts, removed some, and so
on. He would use /bin/echo, cut, grep, sed and the like. The darn thing ran
for hours, which was fine when it was running at midnight in Missouri, but not
so much when it ran at the same time in countries like Japan and Israel, where
the users were awake. I got lots of complaints and showed him how his entire
mess could be replaced mostly by a single AWK script that completed in minutes.
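To make the "fewer passes" point concrete, here is a minimal Python sketch of
doing the split, filter and aggregate steps in one loop instead of a long
pipeline of cut, grep and sed. The record layout (customer|country|amount|status),
the VOID status and the function name total_by_country are all hypothetical,
made up for illustration; the real billing format is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable


def total_by_country(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the amount per country, skipping voided records.

    Hypothetical record layout: customer|country|amount|status
    """
    totals: Dict[str, float] = defaultdict(float)
    # One pass over the data: each line is split, filtered and aggregated
    # as it is read, rather than flowing through cut, grep and sed in turn.
    for line in lines:
        fields = line.rstrip("\n").split("|")  # roughly the cut -d'|' step
        if len(fields) != 4:                   # skip blank or malformed lines
            continue
        _customer, country, amount, status = fields
        if status == "VOID":                   # roughly a grep -v stage
            continue
        totals[country] += float(amount)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "alice|US|10.50|OK\n",
        "bob|JP|7|VOID\n",
        "carol|US|4.5|OK\n",
        "dave|IL|3|OK\n",
    ]
    print(total_by_country(sample))  # {'US': 15.0, 'IL': 3.0}
```

The same loop works on any iterable of strings, so in practice you would pass
it an open file or sys.stdin instead of a list.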

Of course, now, with fast internet connections and modern languages that can
run threads in parallel, it would probably complete in seconds. Maybe I would
have translated that AWK to Python after all, but these days I am studying
Kotlin, so maybe ...

As I see it, many languages involve a trade-off. AWK's decision to allow a
variable to be used without any form of declaration was a feature, but it
could easily lead to errors if you misspelled something. But look at Python.
You can make a variable hold anything just by assigning to it. If you spell
it wrong later when putting something else in it, no problem: you now have two
variables. If you try to read the value of a variable that was never assigned,
you get an error (a NameError). Many more strongly typed languages would catch
more potential errors. If you store an int in a variable and later mistakenly
put a string under the same variable name, Python is happy. That can be a GOOD
feature for programmers, but it will not catch some errors.
Initializing variables to 0 really only makes sense for numeric variables.
When a language allows all kinds of "objects", you might need an
object-specific default initialization, and for some objects that makes no
sense. As you note, the POSIX-compliant versions of AWK also initialize
variables to empty strings when needed.
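A short Python sketch of the behaviours described above: a misspelled
assignment quietly creates a new name, a name can be rebound to a value of
another type, reading a never-assigned name raises NameError, and
collections.defaultdict gives awk-array-style auto-initialization with a
default chosen per type. All variable names here are made up for illustration,
and note that defaultdict only covers dictionary keys, not plain variables.

```python
from collections import defaultdict

# A misspelled name on assignment silently creates a second variable.
total = 0
totl = total + 5          # meant "total"; Python just binds a new name
assert total == 0

# The same name can be rebound to a value of a different type.
value = 42
value = "forty-two"
assert isinstance(value, str)

# Reading a name that was never assigned raises NameError
# (awk would quietly give you 0 or "").
try:
    never_assigned
except NameError:
    caught = True
else:
    caught = False

# awk-style auto-initialization for dictionary keys, default chosen per type.
counts = defaultdict(int)    # missing keys start at 0
names = defaultdict(str)     # missing keys start at ""
groups = defaultdict(list)   # missing keys start at []

counts["113"] += 1
names["113"] += "C"
groups["113"].append((1.0, 2.0, 3.0))

print(counts["113"], repr(names["113"]), groups["113"])
# 1 'C' [(1.0, 2.0, 3.0)]
```

The factory passed to defaultdict is exactly the "object-specific default":
int for counters, str for text, list for collections.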

But I wonder how much languages like AWK are still used to write new programs
compared to the time when they were really useful. So many people now live
within a single GUI application rather than working at a textual level in a
shell, where many problems can be solved rapidly with a few smaller tools,
often in a pipeline.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Alan Gauld via Python-list
Sent: Wednesday, March 24, 2021 5:28 AM
To: python-list at python.org
Subject: Re: convert script awk in python

On 23/03/2021 14:40, Avi Gross via Python-list wrote:

> $1 == 113 {
>     if (x || y || z)
>         print "More than one type $8 atom.";
>     else {
>         x = $2; y = $3; z = $4;
>         istep++;
>     }
> }
> 
> I am a tad concerned as to whether any of the variables x, y or z have 
> been defined at this point.

They haven't been; they are using awk's auto-initialization feature.
The variables are defined in this bit of code. The first time we see
$1 == 113, we define the variables. On subsequent appearances we print the
warning.

> far as I know has not been called. Weird. Maybe awk is allowing an 
> uninitialized variable to be tested for in your code but if so, you 
> need to be cautious how you do this in python.

It's standard behaviour in any POSIX-compliant awk: variables are
initialised to empty strings/arrays or zero as appropriate to first use.
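Since the thread is about converting this rule to Python, here is a minimal
sketch of one way to do it, assuming whitespace-separated input lines as awk
splits them by default. Python has no auto-initialization, so the state is set
up explicitly. The function name scan_atoms and the duplicate counter are
hypothetical additions for illustration, not part of the original script.

```python
from typing import Iterable, Optional, Tuple

Coords = Tuple[str, str, str]


def scan_atoms(lines: Iterable[str]) -> Tuple[Optional[Coords], int, int]:
    """Mimic the quoted awk rule: keep the first record whose $1 is 113."""
    coords: Optional[Coords] = None  # awk would start x, y, z as empty/zero
    istep = 0                        # awk would auto-initialize this to 0
    duplicates = 0                   # hypothetical extra: count the warnings
    for line in lines:
        fields = line.split()        # like awk's default whitespace splitting
        if not fields:
            continue
        try:
            # awk compares $1 == 113 numerically when $1 looks like a number
            is_113 = float(fields[0]) == 113
        except ValueError:
            is_113 = False
        if not is_113:
            continue
        if coords is not None:
            print("More than one type $8 atom.")
            duplicates += 1
        else:
            # Missing fields are "" in awk, so pad before taking $2, $3, $4.
            padded = fields + [""] * 3
            coords = (padded[1], padded[2], padded[3])
            istep += 1
    return coords, istep, duplicates


if __name__ == "__main__":
    data = ["113 1.0 2.0 3.0", "42 9 9 9", "113 4.0 5.0 6.0"]
    print(scan_atoms(data))  # prints the warning once, then the result tuple
```

One design choice differs from the awk original: this sketch uses an explicit
None sentinel instead of testing x || y || z, so a first 113 record whose
coordinate fields are empty still counts as "seen" here, whereas awk would
treat it as unset and assign again on the next 113 record.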

The original AWK book, which covers nawk, has already been mentioned.
I'll add the O'Reilly book "sed & awk", which covers the POSIX version and
includes several extensions not covered in the original book. (It also
covers sed, but that's irrelevant here.)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


-- 
https://mail.python.org/mailman/listinfo/python-list


