[scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems

Bill Ross ross at cgl.ucsf.edu
Tue Mar 28 13:07:57 EDT 2017


I think I saw it in the Deep Learning book: http://www.deeplearningbook.org/

Bill

On 3/28/17 9:48 AM, Henrique C. S. Junior wrote:
> @Tommaso, this is something like Internal Coordinates[1], right?
> @Bill, thanks for the hint, I'll definitely take a look at this.
>
> [1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry)
>
> On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <ross at cgl.ucsf.edu> wrote:
>
>     Image processing deals with xy coordinates by (as I understand)
>     training on multiple transformed copies of the raw data, in the
>     form of translations and rotations in the 2D space. If training
>     with 3D data, there would be that much more translating and
>     rotating to do, in order to divorce the learning from incidental
>     position and orientation.
>
>     Bill
>
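A minimal sketch of the augmentation idea described above, carried over
to 3D coordinates. It assumes NumPy is available; the function names
(random_rotation_matrix, augment), the translation range, and the
three-atom geometry are hypothetical and only for illustration.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Draw a random proper 3D rotation from the QR factors of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:      # flip one axis so det = +1 (rotation, no reflection)
        q[:, 0] = -q[:, 0]
    return q

def augment(coords, n_copies, seed=0):
    """Return n_copies randomly rotated and translated copies of an (N, 3) array."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        rotation = random_rotation_matrix(rng)
        shift = rng.uniform(-5.0, 5.0, size=3)   # arbitrary translation range
        copies.append(coords @ rotation.T + shift)
    return np.stack(copies)

# Hypothetical three-atom geometry (arbitrary units), for illustration only.
molecule = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
augmented = augment(molecule, n_copies=4)
print(augmented.shape)
```

Every copy keeps the same interatomic distances, so a model trained on
such copies sees the same molecule in many positions and orientations.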
>
>     On 3/27/17 4:35 PM, Tommaso Costanzo wrote:
>>     Dear Henrique,
>>     I am sorry for the poor email I wrote before. What I was saying
>>     is simply that if you use the coordinates from an .xyz file as
>>     "features", then the machine learning model will learn at which
>>     coordinates certain atoms occur, so you can only make
>>     predictions about coordinates. However, if I understood
>>     correctly, the "features" that determine the coupling J are
>>     distance, angle, and number of electrons. These properties can
>>     definitely be derived from the XYZ file with simple geometric
>>     calculations, and the number of electrons depends on the type
>>     of atom. So, instead of using the XYZ file as input for
>>     scikit-learn, I was suggesting that you calculate the angles,
>>     distances, and numbers of electrons in advance (with other
>>     software or directly in Python) and use the newly calculated
>>     matrix as input for scikit-learn. In this case the machine will
>>     learn how J(AB) varies as a function of angle, distance, and
>>     number of electrons.
>>     For example
>>
>>     distance   angle   n el.
>>     1          90      1
>>     1          90      1
>>     2          90      1
>>     ...        ...     ...
>>
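A minimal sketch of how one such row could be computed directly in
Python with NumPy. The thread does not say which angle is meant; the
sketch assumes the angle A-X-B at a bridging atom X between the two
magnetic centers. The function name pair_features and the coordinates
are hypothetical.

```python
import numpy as np

def pair_features(center_a, center_b, bridge, n_unpaired):
    """Return [A-B distance, A-bridge-B angle in degrees, unpaired electrons]."""
    a, b, x = (np.asarray(p, dtype=float) for p in (center_a, center_b, bridge))
    distance = np.linalg.norm(a - b)
    v1, v2 = a - x, b - x                      # vectors from the bridge to each center
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return [distance, angle, n_unpaired]

# Hypothetical coordinates (arbitrary units), for illustration only.
row = pair_features(center_a=[1.0, 0.0, 0.0],
                    center_b=[0.0, 1.0, 0.0],
                    bridge=[0.0, 0.0, 0.0],
                    n_unpaired=1)
print(row)
```

Because the row has a fixed length regardless of how many atoms the
molecule contains, rows from different molecules can be stacked into
one feature matrix.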
>>     If you are using supervised learning you will have to add a 4th
>>     column (in reality a separate column vector) with your J(AB), on
>>     which you can train your model and then predict the unknown samples
>>
>>     For example
>>     distance   angle   n el.   J(AB)
>>     1          90      1       1
>>     1          90      1       1
>>     2          90      1       0.5
>>     ...        ...     ...     ...
>>
>>     Now if you train the model on the second matrix and then try
>>     to predict the first one, you should expect a result like:
>>
>>     1
>>     1
>>     0.5
>>
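The toy example above can be sketched in scikit-learn as follows. The
choice of DecisionTreeRegressor is an assumption made for illustration
(the thread names no estimator); any regressor with fit and predict
would be used the same way.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Feature matrix from the example: distance, angle, number of electrons.
X_train = np.array([[1.0, 90.0, 1.0],
                    [1.0, 90.0, 1.0],
                    [2.0, 90.0, 1.0]])
# Known J(AB) values, kept as a separate target vector.
y_train = np.array([1.0, 1.0, 0.5])

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# Predicting the same three rows reproduces the training targets.
predictions = model.predict(X_train)
print(predictions)
```

Note that X and y are passed separately to fit, which is why the J(AB)
column is "in reality a separate column vector".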
>>     Of course in this case the "features" to predict are identical
>>     to the training ones, hence the example is completely
>>     unrealistic. However, I hope that it helps to explain what I
>>     meant in the previous email.
>>     If you want, you can contact me directly at this email address,
>>     and I hope that you got additional hints from Robert, who seems
>>     to be even more knowledgeable than me.
>>
>>     Sincerely
>>     Tommaso
>>
>>
>>
>>     2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior
>>     <henriquecsj at gmail.com>:
>>
>>         Dear Tommaso, thank you for your kind reply.
>>         I know I have a lot to study before actually starting any
>>         code and that's why any suggestion is so valuable.
>>         So, you're suggesting that a simplification of the system
>>         using only the paramagnetic centers can be a good approach?
>>         (I'm not sure if I understood it correctly).
>>         My main idea was, at first, to try to represent the systems
>>         as realistically as possible (using coordinates). I know that
>>         the software will not know what a bond is or what an
>>         intermolecular interaction is but, let's say, after including
>>         1000s of examples in the training, I was expecting that (as
>>         an example) finding a C at 0.000 and an H at 1.000 should
>>         start to "make sense" because it leads to an experimental
>>         trend. And I totally agree that my way of representing the
>>         system is not the best.
>>
>>         Thank you so much for all the help.
>>
>>         On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo
>>         <tommaso.costanzo01 at gmail.com> wrote:
>>
>>             Dear Henrique,
>>
>>
>>             I agree with Robert on the use of a supervised algorithm,
>>             and I would also suggest that you try a semi-supervised
>>             one if you have trouble labeling your data.
>>
>>
>>             Moreover, as a chemist I think that the input you are
>>             thinking of using is not in the best form for machine
>>             learning, because you are trying to predict the coupling
>>             J(AB) but in the feature space you have only coordinates
>>             (XYZ). What I suggest is to generate the pairs of atoms
>>             externally and then use a matrix of the form (Mx3), where
>>             M is the number of atom pairs for which you want to
>>             predict J and 3 is the number of features of each pair
>>             (distance, angle, unpaired electrons). For a supervised
>>             approach you will need a training set where J is known,
>>             so your training data will be of the form Mx4, and the
>>             fourth column will be the J you know.
>>
>>             Hope that this is clear, if not I will be happy to help more
>>
>>
>>             Sincerely
>>
>>             Tommaso
>>
>>
>>             2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior
>>             <henriquecsj at gmail.com>:
>>
>>                 Dear Robert, thank you. Yes, I'd like to talk about
>>                 some specifics on the project.
>>                 Thank you again.
>>
>>                 On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater
>>                 <rdslater at gmail.com> wrote:
>>
>>                     You definitely can use some of the tools in
>>                     scikit-learn for supervised machine learning.
>>                     The real trick will be how well your training
>>                     set represents the systems you later want to
>>                     predict. All of the various regression
>>                     algorithms would be of some value and you may
>>                     even consider an ensemble to help generalize.
>>                     There will be some important questions to
>>                     answer--what kind of loss function do you want to
>>                     look at?  I assumed regression (continuous
>>                     response) but it could also be
>>                     classification--paramagnetic, diamagnetic,
>>                     ferromagnetic, etc...
>>
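A minimal sketch of the ensemble-regression route mentioned above,
scored with cross-validation under a mean-absolute-error loss. The data
are synthetic and the response function is arbitrary, with no physical
meaning; RandomForestRegressor and the error metric are assumptions
chosen for illustration, not recommendations from the thread.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic features: distance, angle (degrees), number of unpaired electrons.
distance = rng.uniform(2.5, 6.0, n_samples)
angle = rng.uniform(80.0, 180.0, n_samples)
n_unpaired = rng.integers(1, 6, n_samples)
X = np.column_stack([distance, angle, n_unpaired])

# Arbitrary made-up response plus noise; it carries no physical meaning.
y = np.cos(np.radians(angle)) * np.exp(-distance / 3.0) * n_unpaired
y = y + rng.normal(scale=0.01, size=n_samples)

# An ensemble of decision trees, evaluated on 5 held-out folds.
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```

scikit-learn reports this score negated so that larger is always
better. For the classification variant, RandomForestClassifier with
class labels in y would take the place of the regressor.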
>>                     Another task to think about might be dimension
>>                     reduction.
>>                     There is no guarantee you will get fantastic
>>                     results--every problem is unique and much will
>>                     depend on exactly what you want out of the
>>                     solution--it may be that we get '10%' accuracy at
>>                     best--for some systems that is quite good, for
>>                     others it is horrible.
>>
>>                     If you'd like to talk specifics, feel free to
>>                     contact me at this email.  I have a background in
>>                     magnetism (PhD in magnetic multilayers--I was in
>>                     physics, but as you are probably aware chemistry
>>                     and physics blend in this area) and have a fairly
>>                     good knowledge of scikit-learn and machine
>>                     learning.
>>
>>
>>
>>                     On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S.
>>                     Junior <henriquecsj at gmail.com> wrote:
>>
>>                         I'm a chemist with some rudimentary
>>                         programming skills (getting started with
>>                         python) and in the middle of the year I'll be
>>                         starting a Ph.D. project that uses computers
>>                         to describe magnetism in molecular systems.
>>
>>                         Most of the time I get my results after
>>                         several simulations and experiments, so I
>>                         know that one of the hardest tasks in
>>                         molecular magnetism is to predict the nature
>>                         of magnetic interactions. That's why I'll try
>>                         to tackle this problem with Machine Learning
>>                         (because such interactions depend, basically,
>>                         on distances, angles and number of unpaired
>>                         electrons). The idea is to feed the computer
>>                         a large training set (with number of unpaired
>>                         electrons, XYZ coordinates of each molecule
>>                         and experimental magnetic couplings) and see
>>                         if it can predict the magnetic couplings
>>                         (J(AB)) of new systems:
>>
>>                         (see example in the attached image)
>>
>>                         Can Scikit-Learn handle the task, knowing
>>                         that the matrices used to represent atomic
>>                         coordinates will probably have different
>>                         numbers of rows (because some molecules have
>>                         more atoms than others)? Or is this a job
>>                         better suited to other software or another
>>                         approach?
>>
>>
>>                         -- 
>>                         *Henrique C. S. Junior*
>>                         Industrial Chemist - UFRRJ
>>                         M. Sc. Inorganic Chemistry - UFRRJ
>>                         Data Processing Center - PMP
>>                         Visite o Mundo Químico
>>                         <http://mundoquimico.com.br>
>>
>>                         _______________________________________________
>>                         scikit-learn mailing list
>>                         scikit-learn at python.org
>>                         <mailto:scikit-learn at python.org>
>>                         https://mail.python.org/mailman/listinfo/scikit-learn
>>                         <https://mail.python.org/mailman/listinfo/scikit-learn>
>>
>>
>>
>>             -- 
>>             Please do NOT send Microsoft Office Attachments:
>>             http://www.gnu.org/philosophy/no-word-attachments.html
>>             <http://www.gnu.org/philosophy/no-word-attachments.html>
>>