[Numpy-discussion] The mu.py script will keep running and never end.

Hongyi Zhao hongyi.zhao at gmail.com
Mon Oct 12 10:49:25 EDT 2020


On Mon, Oct 12, 2020 at 10:41 PM Andrea Gavana <andrea.gavana at gmail.com> wrote:
>
> Hi,
>
> On Mon, 12 Oct 2020 at 16.22, Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
>>
>> On Mon, Oct 12, 2020 at 9:33 PM Andrea Gavana <andrea.gavana at gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > On Mon, 12 Oct 2020 at 14:38, Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
>> >>
>> >> On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
>> >> <evgeny.burovskiy at gmail.com> wrote:
>> >> >
>> >> > On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
>> >> > <evgeny.burovskiy at gmail.com> wrote:
>> >> > >
>> >> > > The script seems to be computing the particle numbers for an array of chemical potentials.
>> >> > >
>> >> > > Two ways of speeding it up, both likely simpler than using dask:
>> >> > >
>> >> > > First: use numpy
>> >> > >
>> >> > > 1. Move constructing mu_all out of the loop (np.linspace)
>> >> > > 2. Arrange the integrands into a 2d array
>> >> > > 3. np.trapz along an axis which corresponds to a single integrand array
>> >> > > (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)
>> >> >
>> >> >
>> >> > Roughly like this:
>> >> > https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
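
For reference, a minimal sketch of steps 1-3 above, with a made-up
energy grid and DOS standing in for the real data (the actual
implementation is in the gist):

import numpy as np

kT = 0.025                               # toy temperature, an assumption
E = np.linspace(0.0, 20.0, 2001)         # toy energy grid
dos = np.sqrt(np.maximum(E - 9.0, 0.0))  # toy density of states

# 1. Construct all chemical potentials at once, outside any loop.
mu_all = np.linspace(10.3, 10.9, 601)    # fewer points than mu.py uses,
                                         # to keep the sketch light

# 2. Broadcast to a 2d array of integrands with shape (n_mu, n_E),
#    one row per chemical potential.  exp() overflows harmlessly to
#    inf for E far above mu, which gives occupation 0.
with np.errstate(over="ignore"):
    occ = 1.0 / (np.exp((E[None, :] - mu_all[:, None]) / kT) + 1.0)
integrand = dos[None, :] * occ

# 3. One np.trapz call, integrating every row along the energy axis.
#    (Writing out the trapezoid formula by hand would shave off the
#    remaining np.trapz overhead.)
N = np.trapz(integrand, E, axis=1)

print(np.column_stack([mu_all, N]))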
>> >>
>> >> I've compared the real execution time of your version above against
>> >> the original method in the python script, which uses fermi() directly
>> >> without wrapping it in vectorize(). Very surprisingly, the latter is
>> >> more efficient than the former; see the following for more info:
>> >>
>> >> $ time python fermi_integrate_np.py
>> >> [[1.03000000e+01 4.55561775e+17]
>> >>  [1.03001000e+01 4.55561780e+17]
>> >>  [1.03002000e+01 4.55561786e+17]
>> >>  ...
>> >>  [1.08997000e+01 1.33654085e+21]
>> >>  [1.08998000e+01 1.33818034e+21]
>> >>  [1.08999000e+01 1.33982054e+21]]
>> >>
>> >> real    1m8.797s
>> >> user    0m47.204s
>> >> sys    0m27.105s
>> >> $ time python mu.py
>> >> [[1.03000000e+01 4.55561775e+17]
>> >>  [1.03001000e+01 4.55561780e+17]
>> >>  [1.03002000e+01 4.55561786e+17]
>> >>  ...
>> >>  [1.08997000e+01 1.33654085e+21]
>> >>  [1.08998000e+01 1.33818034e+21]
>> >>  [1.08999000e+01 1.33982054e+21]]
>> >>
>> >> real    0m38.829s
>> >> user    0m41.541s
>> >> sys    0m3.399s
>> >>
>> >> So I think the benchmark dataset you used to test the code's
>> >> efficiency is not really appropriate. What is your view on these
>> >> test results?
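
As a side note on methodology: "time python script.py" also measures
interpreter startup, imports, and printing the big array. A sketch of
timing only the computation with timeit, using a stand-in fermi() since
the real functions aren't reproduced here:

import timeit
import numpy as np

def fermi(e, mu, kT=0.025):
    # stand-in Fermi-Dirac occupation, not the exact one from mu.py
    return 1.0 / (np.exp((e - mu) / kT) + 1.0)

e = np.linspace(10.0, 11.0, 100_000)

# best of five repeats, 100 calls each, computation only
t = min(timeit.repeat(lambda: fermi(e, 10.5), number=100, repeat=5))
print(f"{t / 100 * 1e3:.3f} ms per call")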
>> >
>> >
>> >
>> > Evgeni has provided an interesting example of how to speed up your code - granted, he used toy data, but the improvement is real. As far as I can see, you haven't specified how big your DOS etc. vectors are, so it's not obvious how to draw any conclusions. I find it highly puzzling that his implementation appears to be slower than your original code.
>> >
>> > In any case, if performance is paramount for you, then I would suggest you move in the direction Evgeni was proposing, i.e. shifting your implementation to C/Cython or Fortran/f2py.
>>
>> If so, I think that C/Fortran-based implementations should be more
>> efficient than ones using Cython/f2py.
>
>
> That is not what I meant. What I meant is: write the time-consuming part of your code in C or Fortran and then bridge it to Python using Cython or f2py.

I understand your meaning, but for such a small job, why not do the
whole thing in pure C/Fortran if we must bother with them at all?
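
For what it's worth, the bridging Andrea describes can stay very small.
A minimal ctypes sketch, assuming a hypothetical fermi() written in C
and compiled to a shared library beforehand:

import ctypes

# Hypothetical C side, compiled separately:
#
#   /* fermi.c */
#   #include <math.h>
#   double fermi(double e, double mu, double kT) {
#       return 1.0 / (exp((e - mu) / kT) + 1.0);
#   }
#
#   $ cc -O2 -shared -fPIC -o libfermi.so fermi.c

lib = ctypes.CDLL("./libfermi.so")
lib.fermi.restype = ctypes.c_double
lib.fermi.argtypes = [ctypes.c_double] * 3

print(lib.fermi(10.5, 10.3, 0.025))

# For real speed the loop over energies and potentials should also live
# in C; otherwise per-call ctypes overhead dominates.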

All the best,
-- 
Hongyi Zhao <hongyi.zhao at gmail.com>

