[SciPy-User] Molecular Index Calculation

David Mikolas david.mikolas1 at gmail.com
Sun Apr 16 11:56:17 EDT 2017


I escaped a cluttered Fortran/C/Pascal/Basic/Perl jumble by "living" in
stackoverflow. Mostly by reading questions and answers, and learning how
not to get my questions closed in milliseconds. Break your program into
small pieces, and look for how each piece might be done differently.

If you try to find ways to do things without ever manually checking the
sizes of arrays, you'll naturally be guided towards "pythonic" methods and
techniques. There is almost always a better way.

Computers are so fast these days, don't worry about ultimate speed until
you really hit a brick wall. Letting python, numpy, or scipy methods handle
housekeeping for you means you are usually defaulting to very well written
compiled routines.

So for example, let python do the thinking...

r = np.sqrt((diff ** 2).sum(2))
r[r < 0.0001] = 0.0001
# or just
r = np.maximum(np.sqrt((diff ** 2).sum(2)), 0.0001)

# and

Zij = Z * Z[:, None]
triangle = np.triu(np.ones_like(Zij), k=1)
I = Zij*trangle.sum()

# or just

I = np.triu(Z * Z[:, None], k=1).sum()


On Sun, Apr 16, 2017 at 8:25 PM, Stephen P. Molnar <s.molnar at sbcglobal.net>
wrote:

> On 04/15/2017 04:07 PM, Hjalmar Turesson wrote:
>
>> Hi Stephen,
>>
>>
>> I see that N = 10 (ie. the length of Z), and that r is indexed by i and
>> j, which run up to N. However, r is only a 5x5 array. Is this correct?
>>
>> Otherwise, something like this might work:
>>
>> import numpy as np
>>
>> start=1
>> finish=31
>> points=300
>> s = np.linspace(start, finish, points)
>>
>> MASS =  np.array([12.011, 1.008, 79.9, 35.453, 18.998])
>>
>> r = np.array([[0., 2.059801, 3.60937686, 3.32591826, 2.81569212],
>>                [2.059801, 0., 4.71452879, 4.45776445, 4.00467382],
>>                [3.60937686, 4.71452879, 0., 5.66500917, 5.26602175],
>>                [3.32591826, 4.45776445, 5.66500917, 0., 5.02324896],
>>                [2.81569212, 4.00467382, 5.26602175, 5.02324896, 0.]])
>>
>> r[r == 0.] = 0.0001
>>
>> Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0,
>> 153.0])
>>
>> N = Z.size
>>
>> I = np.zeros(s.shape)
>>
>> for i in range(1, N):
>>      for j in range(j):
>>          I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j])
>>
>>
>> Do you have an example of the correct output?
>>
>> Best,
>> Hjalmar
>>
>> On Sat, Apr 15, 2017 at 6:29 AM, Stephen P. Molnar
>> <s.molnar at sbcglobal.net <mailto:s.molnar at sbcglobal.net>> wrote:
>>
>>     I am a newcomer to Python and am attempting to translate a FORTRAN
>>     program that I wrote about 20 years ago into Python, specifically
>> v-3.6.
>>
>>     Let me say that I am not asking someone to write the program for me,
>>     but only to point me is the right direction.
>>
>>     So far, I have bumbled my way to the point that I can get all of the
>>     input data resulting from a quantum mechanical calculation of a very
>>     simple organic molecule in to a Python program, but am encountering
>>     problems with processing the data.
>>
>>     The equation that I a want to evaluate (the one that I programmed in
>>     FORTRAN) is equation (7) in the attached scan of one of the pages of
>>     the following document.
>>
>>     Stephen P. Molnar and James W. King, Theory and Applications of the
>>     Integrated Molecular Transform and the Normalized Molecular Moment
>>     Structure Descriptors: QSAR and QSPR Paradigms, Int. J Quantum
>>     Chem., 85, 662 (2001).
>>
>>     the equation is
>>
>>     I(s) = sum from i=2toN sum from j=1toi-1 Z_sub_i*Z_sub_j *
>>
>>         sin  s*r_sub_ij
>>        -----------------  Equation (7)
>>             s*r_sub_ij
>>
>>     What I have managed to do so far is for a simple organic molecule
>>     containing 5 atoms (CHFClBr):
>>
>>     import numpy as np
>>     import numpy, pandas, scipy.spatial.distance as dist
>>
>>     r=[]        #distances between pairs of atoms
>>     z = []      #atomic numbers of atoms
>>     Z_dist = [] #products of successive atomic numbers
>>     I = []      #vector calculated in equation  (7)
>>
>>     start=1
>>     finish=31
>>     points=300
>>     s = np.linspace(start,finish,points)
>>
>>
>>     name = input("Enter Molecule ID: ")
>>     name = str(name)
>>     name_in = name+'.dat'
>>
>>     df = pandas.read_table(name_in, skiprows=2, sep=" ",
>>     skipinitialspace=True)
>>     z = numpy.array(df["ZA"])
>>     N =  numpy.ma.size(z)  ## of atoms in molecule
>>     a = numpy.array(df[["X", "Y", "Z"]])
>>     dist.squareform(dist.pdist(a, "euclidean"))
>>     anrows, ancols = np.shape(a)
>>     a_new = a.reshape(anrows, 1, ancols)
>>     diff = a_new - a
>>
>>     r = (diff ** 2).sum(2)
>>     r = np.sqrt(r)
>>
>>     for j in range(0,N-1):
>>           for k in range(j+1,N):
>>               Z_diff = (z[j]*z[k])
>>
>>
>>     For the test molecule I get the following:
>>
>>     MASS =  [ 12.011   1.008  79.9    35.453  18.998]  (Atomic weights
>>     of atoms)
>>
>>     r:  [ 0.          2.059801    3.60937686  3.32591826  2.81569212]
>>           [ 2.059801    0.          4.71452879  4.45776445  4.00467382]
>>           [ 3.60937686  4.71452879  0.          5.66500917  5.26602175]
>>           [ 3.32591826  4.45776445  5.66500917  0.          5.02324896]
>>           [ 2.81569212  4.00467382  5.26602175  5.02324896 0
>>     <tel:02324896%20%200>.        ]
>>
>>
>>     z:
>>
>>     6.0
>>     210.0
>>     102.0
>>     54.0
>>     35.0
>>     17.0
>>     9.0
>>     595.0
>>     315.0
>>     153.0
>>
>>     I have checked these calculations with a spreadsheet calculation,
>>     consequently I'm correct as far as I have gotten.
>>
>>     However, my problem is calculating the value of I for the 300
>>     x-coordinate values required for the molecular index calculation.
>>
>>     The problem that I am currently having is calculating the product of
>>     the pre-sine function and the sine term for the 300 values of s.
>>
>>     Unfortunately, the wealth of  function in Python makes it difficult
>>     for me to ascertain just how to proceed.  It seems that I have the
>>     problem of seeing the trees because of the size of the forest.  At
>>     this point Goggling only makes my confusion deeper.  Any pointers in
>>     the correct direction will be greatly appreciated.
>>
>>     Thanks in advance.
>>
>>     --
>>     Stephen P. Molnar, Ph.D.                Life is a fuzzy set
>>     www.molecular-modeling.net <http://www.molecular-modeling.net>
>>              Stochastic and multivariate
>>     (614)312-7528 (c)
>>     Skype: smolnar1
>>
>>
>>     _______________________________________________
>>     SciPy-User mailing list
>>     SciPy-User at python.org <mailto:SciPy-User at python.org>
>>     https://mail.python.org/mailman/listinfo/scipy-user
>>     <https://mail.python.org/mailman/listinfo/scipy-user>
>>
>>
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at python.org
>> https://mail.python.org/mailman/listinfo/scipy-user
>>
>> Hjalmar
>
> Thanks for your response.
>
> I'm sorry that my original request wasn't a bit clearer.  I presume that
> the paper I attached to the request help was deleted before the email was
> posted to the list.
>
> Here is how I incorporated your suggestion in the code.
>
> import numpy as np
> import numpy, pandas, scipy.spatial.distance as dist
>
> r = []
> s=[]
> z = []
> Z = []
> I = []
>
> start=1
> finish=31
> points=300
> s = np.linspace(start,finish,points)
> np.savetxt('Number of Points',s,delimiter=' ')
>
> name = input("Enter Molecule ID: ")
> name = str(name)
> name_in = name+'.dat'
>
> df = pandas.read_table(name_in, skiprows=2, sep=" ", skipinitialspace=True)
> Z = numpy.array(df["ZA"])
> print('z = ',z)
> N =  numpy.ma.size(z)
> a = numpy.array(df[["X", "Y", "Z"]])
> dist.squareform(dist.pdist(a, "euclidean"))
> anrows, ancols = np.shape(a)
> a_new = a.reshape(anrows, 1, ancols)
> diff = a_new - a
>
> r = (diff ** 2).sum(2)
> r = np.sqrt(z)
>
> r[r == 0.] = 0.0001
>
> N = Z.size
>
> I = np.zeros(s.shape)
>
> for i in range(1, N):
>     for j in range(j):
>         I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j])
>         print('I: ',I)
>
> np.savetxt('I',I,delimiter=' ')
>
> The for j in range(j): has a marginal note (I'm using the Spyder IDE)
> "undefined name 'y' " and I is a vector with three hundred 0.0 terms.
>
> also, please note that I removed 'Z = np.array([6.0, 210.0, 102.0, 54.0,
> 35.0, 17.0, 9.0, 595.0, 315.0, 153.0])' from your suggestion as that is the
> result of the per-sine term 'Z[i] * Z[j]'
>
> Regards,
>
>         Steve
>
>
> --
> Stephen P. Molnar, Ph.D.                Life is a fuzzy set
> www.molecular-modeling.net              Stochastic and multivariate
> (614)312-7528 (c)
> Skype: smolnar1
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at python.org
> https://mail.python.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20170416/db3ab513/attachment.html>


More information about the SciPy-User mailing list