[Numpy-discussion] The mu.py script will keep running and never end.

Mon Oct 12 08:37:28 EDT 2020

On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
<evgeny.burovskiy at gmail.com> wrote:
>
> On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
> <evgeny.burovskiy at gmail.com> wrote:
> >
> > The script seems to be computing the particle numbers for an array of chemical potentials.
> >
> > Two ways of speeding it up, both likely simpler than using dask:
> >
> > First: use numpy
> >
> > 1. Move constructing mu_all out of the loop (np.linspace)
> > 2. Arrange the integrands into a 2d array
> > 3. np.trapz along an axis which corresponds to a single integrand array
> > (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)
>
>
> Roughly like this:
> https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
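
For readers without access to the gist, the three numpy steps quoted above
can be sketched roughly as follows. This is an illustrative reconstruction,
not the contents of the linked gist. The energy grid, the DOS, kT and the
helper name particle_numbers() are made-up stand-ins; the real script reads
its data from files.

```python
import numpy as np

def fermi(E, mu, kT):
    """Fermi-Dirac occupation; works elementwise on arrays via broadcasting."""
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

def particle_numbers(energy, dos, mu_all, kT):
    """Integrate dos * fermi over energy for every mu at once (hypothetical helper)."""
    # Shape (n_mu, n_energy): one integrand row per chemical potential.
    integrand = dos[np.newaxis, :] * fermi(energy[np.newaxis, :],
                                           mu_all[:, np.newaxis], kT)
    # Trapezoid rule written out by hand along the energy axis.
    dE = np.diff(energy)
    return 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * dE, axis=1)

# Stand-in data (assumptions, not the data used by mu.py).
energy = np.linspace(-11.0, 20.0, 2001)
dos = np.exp(-0.5 * (energy - 5.0) ** 2)
kT = 0.025

# Step 1: build the array of chemical potentials once, outside any loop.
mu_all = np.linspace(10.3, 10.9, 60, endpoint=False)

# Steps 2 and 3: 2d integrand array, then integrate along the energy axis.
n_all = particle_numbers(energy, dos, mu_all, kT)
result = np.column_stack((mu_all, n_all))
print(result)
```

Note that the 2d integrand holds n_mu * n_energy floats at once, so for long
energy grids the memory cost may matter as much as the loop overhead saved.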

I've compared the real execution time of your version above with that
of the original script modified to call fermi() directly, without
wrapping it in vectorize(). Surprisingly, the latter is faster than
the former; see the timings below:

$ time python fermi_integrate_np.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    1m8.797s
user    0m47.204s
sys    0m27.105s
$ time python mu.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    0m38.829s
user    0m41.541s
sys    0m3.399s

So I think the benchmark dataset you used for testing the code's
efficiency may not be appropriate. What's your view on these test
results?

Regards,
HY

>
>
>
> > Second:
> >
> > Move the loop into cython.
> >
> >
> >
> >
> > Sun, 11 Oct 2020, 9:32 Hongyi Zhao <hongyi.zhao at gmail.com>:
> >>
> >> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana <andrea.gavana at gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
> >> >>
> >> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana <andrea.gavana at gmail.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana <andrea.gavana at gmail.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
> >> >> >>>
> >> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern <robert.kern at gmail.com> wrote:
> >> >> >>> >
> >> >> >>> > You don't need to use vectorize() on fermi(). fermi() will work just fine on arrays and should be much faster.
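
A small sketch of the point above, assuming that exp in mu.py is numpy's exp
(the script itself is not reproduced here, so this is an assumption). The
values of energy, mu and kT are made up. np.vectorize is documented as a
convenience wrapper that is essentially a Python-level loop, so calling
fermi() on the whole array at once should give the same numbers with far
fewer Python calls.

```python
import numpy as np

def fermi(E, mu, kT):
    # Same formula as in the traceback, written with numpy's exp so that
    # E may be a scalar or an array.
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Made-up inputs for illustration only.
energy = np.linspace(-11.0, 20.0, 5)
mu, kT = 10.3, 0.025

# Direct call: one pass over the whole array inside numpy.
direct = fermi(energy, mu, kT)

# Wrapped call: np.vectorize invokes fermi once per element from Python.
wrapped = np.vectorize(fermi)(energy, mu, kT)

same = np.allclose(direct, wrapped)
print(same)
```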
> >> >> >>>
> >> >> >>> Yes, it really does the trick. See the following for the benchmark
> >> >> >>> based on your suggestion:
> >> >> >>>
> >> >> >>> $ time python mu.py
> >> >> >>> [-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84
> >> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >> >>>
> >> >> >>> real    0m41.056s
> >> >> >>> user    0m43.970s
> >> >> >>> sys    0m3.813s
> >> >> >>>
> >> >> >>>
> >> >> >>> But are there any ways to further improve/increase efficiency?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I believe it will get a bit better if you don’t column_stack an array 6000 times - maybe pre-allocate your output first?
> >> >> >>
> >> >> >> Andrea.
> >> >> >
> >> >> >
> >> >> >
> >> >> > I’m sorry, scratch that: I saw a ghost whitespace in front of your column_stack call, which made me think you were stacking your results many times over. That is not the case.
> >> >>
> >> >> I'm still not clear on your solution to this problem. Could you
> >> >> please post the corresponding snippet of your enhancement here?
> >> >
> >> >
> >> > I have no solution. I originally thought you were calling “column_stack” 6000 times in the loop, but that is not the case; I was mistaken. My apologies for that.
> >> >
> >> > The timings of your approach are highly dependent on the size of your “energy” and “DOS” arrays -
> >>
> >> The size of the “energy” and “DOS” arrays is problem-related and
> >> shouldn't be reduced arbitrarily.
> >>
> >> > not to mention calling trapz 6000 times in a loop.
> >>
> >> I'm currently thinking about parallelizing the execution of the for
> >> loop, say, with joblib <https://github.com/joblib/joblib>, but I still
> >> haven't figured out the corresponding code. If you have some
> >> experience with this type of solution, could you please give me some
> >> more hints?
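
One possible shape for such a parallel loop is sketched below. It swaps in
the standard library's ThreadPoolExecutor instead of joblib, only to keep the
example dependency-free; joblib's Parallel and delayed could take the place
of the executor. The data, kT, and the helper names integrate_chunk() and
particle_numbers_parallel() are hypothetical. The idea is to split the mu
values into chunks and integrate each chunk with array operations; NumPy
releases the GIL inside many array operations, so threads may overlap, but
whether this beats the plain loop has to be measured on the real data.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fermi(E, mu, kT):
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

def integrate_chunk(energy, dos, mu_chunk, kT):
    """Trapezoid integral of dos * fermi over energy for each mu in mu_chunk."""
    # Shape (len(mu_chunk), len(energy)); chunking bounds the temporary's size.
    integrand = dos * fermi(energy, mu_chunk[:, np.newaxis], kT)
    dE = np.diff(energy)
    return 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * dE, axis=1)

def particle_numbers_parallel(energy, dos, mu_all, kT, n_chunks=4, max_workers=2):
    chunks = np.array_split(mu_all, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        parts = list(pool.map(lambda c: integrate_chunk(energy, dos, c, kT), chunks))
    return np.concatenate(parts)

# Stand-in data (assumptions, not the data used by mu.py).
energy = np.linspace(-11.0, 20.0, 2001)
dos = np.exp(-0.5 * (energy - 5.0) ** 2)
mu_all = np.linspace(10.3, 10.9, 40, endpoint=False)
kT = 0.025

parallel = particle_numbers_parallel(energy, dos, mu_all, kT)
print(np.column_stack((mu_all, parallel)))
```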
> >>
> >> >  Maybe there’s a better way to do it with another approach, but at the moment I can’t think of one...
> >> >
> >> >>
> >> >>
> >> >> Regards,
> >> >> HY
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>> Regards,
> >> >> >>> HY
> >> >> >>>
> >> >> >>> >
> >> >> >>> > On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
> >> >> >>> >>
> >> >> >>> >> Hi,
> >> >> >>> >>
> >> >> >>> >> My environment is Ubuntu 20.04 with Python 3.8.3 managed by pyenv. I
> >> >> >>> >> am trying to run the script
> >> >> >>> >> <https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/mu.py>,
> >> >> >>> >> but it keeps running and never ends. When I use 'Ctrl + c' to
> >> >> >>> >> terminate it, it gives the following output:
> >> >> >>> >>
> >> >> >>> >> $ python mu.py
> >> >> >>> >> [-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84
> >> >> >>> >> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >> >>> >>
> >> >> >>> >> I had to terminate it, and obtained the following traceback:
> >> >> >>> >>
> >> >> >>> >> ^CTraceback (most recent call last):
> >> >> >>> >>   File "mu.py", line 38, in <module>
> >> >> >>> >>     integrand=DOS*fermi_array(energy,mu,kT)
> >> >> >>> >>   File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >> >>> >> line 2108, in __call__
> >> >> >>> >>     return self._vectorize_call(func=func, args=vargs)
> >> >> >>> >>   File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >> >>> >> line 2192, in _vectorize_call
> >> >> >>> >>     outputs = ufunc(*inputs)
> >> >> >>> >>   File "mu.py", line 8, in fermi
> >> >> >>> >>     return 1./(exp((E-mu)/kT)+1)
> >> >> >>> >> KeyboardInterrupt
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Any help or hints on this problem would be highly appreciated.
> >> >> >>> >>
> >> >> >>> >> Regards,
> >> >> >>> >> --
> >> >> >>> >> Hongyi Zhao <hongyi.zhao at gmail.com>
> >> >> >>> >> _______________________________________________
> >> >> >>> >> NumPy-Discussion mailing list
> >> >> >>> >> NumPy-Discussion at python.org
> >> >> >>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >> >> >>> >
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> --
> >> >> >>> Hongyi Zhao <hongyi.zhao at gmail.com>
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Hongyi Zhao <hongyi.zhao at gmail.com>
> >> >
> >>
> >>
> >>
> >> --
> >> Hongyi Zhao <hongyi.zhao at gmail.com>

-- 
Hongyi Zhao <hongyi.zhao at gmail.com>