convert script awk in python

Avi Gross avigross at verizon.net
Tue Mar 23 10:40:01 EDT 2021


Alberto,

To convert any algorithm to python (or anything else) you have to understand
it. Do you know what AWK is doing? And does the darn thing work already in
awk? Why do you need to convert it? My suspicion is that it has errors and
if so, it is NOT about converting at all.

I will not solve this for you except to outline what it does.

AWK is designed to read in data and organize it into fields based on rules.
It then tests a list of patterns against each record (usually a line) and
performs the actions attached to them.

The beginning of the program you show simply defines helper functions before
the main program begins. You will need to translate them individually. I
would not have written and formatted it the way you show.

function sq(x) {
    return x * x;
}

function dist(x1, y1, z1, x2, y2, z2) {
    return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}

function print_distances() {
    if (na == 0)
        print "No type 8 atoms.";
    else {
        min = 1000;
        for (a = 0; a < na; a++) {
            d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
#            printf "%7.5f ", d;
            if (d < min) min = d;
        }
        printf "%6i    %7.5f\n", istep, min;
        x = y = z = 0;
        delete pos;
        na = 0;
    }
}

OK so far? You need to either adapt the above as needed or completely redo
the algorithms in python, but first you must recognize what each piece
does.
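
For what it is worth, the numeric helpers might be sketched in python
roughly as below. This is only a sketch under my own assumptions: the names
min_distance and format_result are made up for illustration, and keeping
the positions in a list of (x, y, z) tuples instead of the awk array pos[]
is my choice, not something the script dictates.

```python
import math


def sq(x):
    """Square of x, as in the awk helper of the same name."""
    return x * x


def dist(x1, y1, z1, x2, y2, z2):
    """Euclidean distance between two points in 3-D."""
    return math.sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2))


def min_distance(ref, positions):
    """Smallest distance from the point ref to any point in positions.

    Hypothetical helper: ref is an (x, y, z) tuple and positions is a
    non-empty list of such tuples, standing in for the awk array pos[].
    """
    return min(dist(*ref, *p) for p in positions)


def format_result(istep, d):
    """Same layout as the awk format "%6i    %7.5f", minus the newline."""
    return "%6d    %7.5f" % (istep, d)
```

One difference to be aware of: the awk loop starts with min = 1000, so it
can never report anything larger than 1000, while min() here has no such
cap.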

Now the main AWK call is hidden and looks like:

awk 'PROGRAM' $1.lammpstrj > $1_mindist.txt

The above says that AWK is called with a filename the shell script makes
using an argument to the script. The shell also sends the standard output
from awk to a second file with a name based on the same argument. So your
python program needs to open the same file but write to standard output OR
you can rewrite it any way you please to get the same result.
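
A minimal sketch of that plumbing, assuming you keep the shell redirection
and let python write to standard output. The function names here are
hypothetical, and the body only copies lines through as a placeholder for
the real work.

```python
import sys


def trajectory_name(stem):
    """Build the input file name the shell script passes to awk."""
    return f"{stem}.lammpstrj"


def copy_lines(stem, out=sys.stdout):
    """Open the trajectory file and send each line to out.

    A placeholder: the real processing replaces the write() call.
    """
    with open(trajectory_name(stem)) as infile:
        for line in infile:
            out.write(line)
```

You would then call something like copy_lines(sys.argv[1]) from a script
and redirect its output to a file in the shell, exactly as the awk version
does.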

The rest of the program is patterns and actions. But AWK is doing all kinds
of things for you invisibly that you now need to do explicitly in python.

You need to open a file and read a line at a time in a loop. The line must
be parsed into fields the same way AWK would have done. There probably is a
simple package that can do that or you do it by hand.
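
Doing it by hand is not much work if the awk script uses the default field
separator, as yours appears to. A sketch, with helper names I made up:
str.split() with no argument splits on runs of whitespace and ignores
leading and trailing blanks, which is what awk does by default.

```python
def split_fields(line):
    """Split a line on runs of whitespace, like awk's default separator."""
    return line.split()


def get_field(fields, n):
    """Return awk's $n (counting from 1), or "" if the line is too short."""
    return fields[n - 1] if 1 <= n <= len(fields) else ""
```

Note that awk counts fields from 1 while python lists count from 0, so $2
is fields[1] and $8 is fields[7].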

Now the rest of the program is patterns like:

$1 == 113

$8 == 10

And a more complex one that I defer. The first has an action to perform on
any line whose first field equals 113. The second wants an eighth field
equal to 10. In python, once you have figured out what those fields are
(as well as the other fields) you need to test using something like an "if"
statement for that condition and do what action follows. Here is the
condition then action for the first clause:

$1 == 113 {
    if (x || y || z)
        print "More than one type $8 atom.";
    else {
        x = $2; y = $3; z = $4;
        istep++;
    }
}
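
One thing to watch when you write the test in python: the fields you split
off are strings. As far as I understand awk's rules, a field that looks
like a number is compared numerically, so both "113" and "113.0" would
satisfy $1 == 113. A sketch of a helper (the name is hypothetical) that
behaves that way for the cases your script cares about:

```python
def field_equals(text, number):
    """True if text parses as a number equal to number.

    Text that is not numeric (for example "ITEM:") or is empty simply
    fails to match.
    """
    try:
        return float(text) == number
    except ValueError:
        return False
```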

I am a tad concerned as to where any of the variables x, y or z have been
defined at this point. I have not seen a BEGIN {...} pattern/action or
anywhere these have been initialized; the only other place they are set is
in print_distances(), which has not been called yet. AWK allows this: an
uninitialized variable tests as zero or an empty string, so the condition is
simply false the first time through. Python will raise an error if you read
a name that was never assigned, so you need to be cautious how you do this
in python.
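
In python you would set those variables up before the loop starts. Here is
one possible sketch, assuming you gather the script's globals into a single
object; the class State and the function on_atom_113 are names I invented,
and the list pos takes over the job of both pos[] and the counter na.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Everything the awk script keeps in global variables."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    istep: int = 0
    pos: list = field(default_factory=list)  # replaces pos[] and na


def on_atom_113(state, fields):
    """Action for the `$1 == 113` clause; fields is the split line."""
    if state.x or state.y or state.z:
        print("More than one type $8 atom.")
    else:
        state.x, state.y, state.z = (float(v) for v in fields[1:4])
        state.istep += 1
```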

Be that as it may, I am explaining what I think I see. PATTERN ACTION.

So the next pattern/action is:

$8 == 10 {
    pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
    na += 1;
}
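
The awk array indexed by a counter and a letter has no direct twin in
python, but a list of tuples does the same job. A sketch, with a made-up
function name; len(positions) then plays the role of na.

```python
def on_type_10(positions, fields):
    """Action for the `$8 == 10` clause: remember this atom's position."""
    positions.append((float(fields[1]), float(fields[2]), float(fields[3])))
```

When the awk code does "delete pos; na = 0;" the python equivalent would be
positions.clear().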

And the next one is a bit more complex:

/^ITEM: ATOMS/ && na != 0 { print_distances(); }

It does pattern matching on the original line and asks for the text being
looked for to be at the beginning of the line. So you need to learn how to
ask python to match that pattern. And note the && joins two Boolean parts.
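
As a sketch of that test, assuming the positions are kept in a list as
suggested by the array in the previous clause (the names below are
hypothetical):

```python
import re

# The ^ anchors the match at the start of the line, as in the awk pattern.
FRAME_HEADER = re.compile(r"^ITEM: ATOMS")


def starts_new_frame(line, positions):
    """Python version of the test `/^ITEM: ATOMS/ && na != 0`."""
    return FRAME_HEADER.search(line) is not None and len(positions) != 0
```

For a fixed prefix like this one, line.startswith("ITEM: ATOMS") would do
the same without the re module.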

All the above patterns are checked for on every line and you need to know
what AWK does once it matches. Does the code above ask you to move to the
next line when it matches or do every applicable action?
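
To make the distinction concrete: none of the actions shown uses awk's
"next" statement, so as far as I can tell every pattern is tried on every
line. In python that means a series of separate "if" statements and not an
if/elif chain. A sketch that only reports which clauses would fire, using
plain string comparison for brevity (the function name is made up):

```python
def matching_clauses(fields, line):
    """Return the names of all clauses that fire for one line.

    Each test is a separate `if`, not `elif`, so more than one can fire.
    """
    fired = []
    if len(fields) >= 1 and fields[0] == "113":
        fired.append("first clause")
    if len(fields) >= 8 and fields[7] == "10":
        fired.append("second clause")
    if line.startswith("ITEM: ATOMS"):
        fired.append("third clause")
    return fired
```

A line such as "113 1.0 2.0 3.0 0 0 0 10" triggers both the first and the
second clause.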

So you have been in a loop and when you reach the end of the file, you need
to get out of the loop. Only then does the final pattern match:

END                       { print_distances(); }
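
The overall shape, then, is a loop followed by one last call. A bare sketch
with hypothetical names, where per_line stands for all the pattern tests
and at_end stands for whatever the END clause does:

```python
def run(lines, per_line, at_end):
    """Call per_line(line) for each input line, then at_end() once.

    Mirrors awk's main loop followed by its END clause.
    """
    for line in lines:
        per_line(line.rstrip("\n"))
    at_end()
```

Because an open file is itself iterable line by line, you can pass the file
object directly as lines.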

I hope some of that was helpful. To do the full job you need to do way more
than translate how AWK does something like use curly braces for grouping
while python uses indentations. AWK is designed to be a filter in the ways
described above and does many helpful things behind the scenes. You need to
do all that work for yourself in python.

Of course, having said all that, I know this is a common enough problem that
people have solved by making modules that do awk-like activities in python
but I have no experience there. Had you done a search for something like
this, an answer might have presented itself that involves less work for you,
unless this is homework that needs to be done by you:

https://www.google.com/search?q=python+awk+module&sxsrf=ALeKk03gD2jZYJkZ0cGvzbKlErWzQJ5Spw%3A1616510303610&source=hp&ei=X_1ZYJPDIobI5gKMk4CACA&iflsig=AINFCbYAAAAAYFoLb50VZVAododj5tTkC9AtICpv08Aw&oq=python+awk+module&gs_lcp=Cgdnd3Mtd2l6EAMyBggAEBYQHjoHCCMQ6gIQJzoHCC4Q6gIQJzoECCMQJzoFCAAQsQM6CwguELEDEMcBEKMCOggIABCxAxCDAToCCAA6BQguELEDUNobWLhGYIFIaAFwAHgAgAFWiAH2CJIBAjE3mAEAoAEBqgEHZ3dzLXdperABCg&sclient=gws-wiz&ved=0ahUKEwjT7q2T0sbvAhUGpFkKHYwJAIAQ4dUDCAk&uact=5




-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of alberto
Sent: Tuesday, March 23, 2021 7:32 AM
To: python-list at python.org
Subject: convert script awk in python

Hi everyone, I have an awk script that calculates minimum distances between
points

## atom type frag - atom type surface
#!/bin/bash

FILE1=$1.lammpstrj

if [ -f $FILE1 ];
then

awk 'function sq(x) {
    return x * x;
}
function dist(x1, y1, z1, x2, y2, z2) {
    return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}
function print_distances() {
    if (na == 0)
        print "No type 8 atoms.";
    else {
        min = 1000;
        for (a = 0; a < na; a++) {
            d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
#            printf "%7.5f ", d;
            if (d < min) min = d;
        }
        printf "%6i    %7.5f\n", istep, min;
        x = y = z = 0;
        delete pos;
        na = 0;
    }
}
$1 == 113 {
    if (x || y || z)
        print "More than one type $8 atom.";
    else {
        x = $2; y = $3; z = $4;
        istep++;
    }
}
$8 == 10 {
    pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
    na += 1;
}
/^ITEM: ATOMS/ && na != 0 { print_distances(); }
END                       { print_distances(); }
' $1.lammpstrj > $1_mindist.txt
fi

where $1 is a particular atom and $8 is another type of atom

How could I prepare a python script?

regards

A
--
https://mail.python.org/mailman/listinfo/python-list


