Multiwfn forum

Multiwfn official website: http://sobereva.com/multiwfn. Multiwfn forum in Chinese: http://bbs.keinsci.com/wfn. E-mail of admin: sobereva[at]sina.com

#1 2022-12-15 13:31:45

i.s.ger
Member
Registered: 2020-12-01
Posts: 55

Patch: omp collapse(2) in grid.f90

Dear Tian,

As I mentioned in the topic http://sobereva.com/wfnbbs/viewtopic.php?id=732, I found a way to get a speed-up.

The patch is attached below. It mainly benefits machines with a large number of threads. A similar change could probably be applied throughout the code.

Multiwfn_collapse.patch.txt
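
For readers without the attachment, the general idea behind `collapse(2)` can be sketched as follows. This is a C analogue written for illustration only, not the actual patch to grid.f90 (which is Fortran); `point_value`, `fill_grid`, `max_items_per_thread` and the grid layout are hypothetical names and assumptions. The clause merges the two outer loops into a single iteration space, so the work is split into `nz*ny` pieces instead of `nz`, which matters when the thread count is comparable to `nz`.

```c
#include <stddef.h>

/* Hypothetical stand-in for an expensive per-point function such as
   electron density; this is not Multiwfn code. */
static double point_value(int i, int j, int k) {
    return (double)i + 10.0 * (double)j + 100.0 * (double)k;
}

/* Fill a flat grid indexed as grid[(k*ny + j)*nx + i].
   collapse(2) merges the k and j loops into one iteration space of
   nz*ny items that OpenMP distributes over the threads.
   When compiled without OpenMP support the pragma is ignored and the
   loops run serially with the same result. */
void fill_grid(double *grid, int nx, int ny, int nz) {
    #pragma omp parallel for collapse(2)
    for (int k = 0; k < nz; k++) {
        for (int j = 0; j < ny; j++) {
            for (int i = 0; i < nx; i++) {
                grid[((size_t)k * ny + j) * nx + i] = point_value(i, j, k);
            }
        }
    }
}

/* Largest number of work items any one thread receives when `items`
   equal-cost items are split as evenly as possible over `threads`. */
int max_items_per_thread(int items, int threads) {
    return (items + threads - 1) / threads;
}
```

As a rough illustration with made-up sizes: for `ny = nz = 100` and 96 threads, parallelizing only the outer loop leaves some threads with `max_items_per_thread(100, 96)` = 2 planes (200 rows) while others get 1, whereas collapsing gives every thread at most `max_items_per_thread(10000, 96)` = 105 rows.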

I tested the effect of the patch on 704atoms.wfn. The plot below shows the speed-up for different numbers of cores; the black line is ideal scaling. With the patch, scaling stays ideal up to 26 cores, whereas before it was ideal only up to about 19 (?).

collapse.png

To probe scalability further, I would probably need a larger system (or a slower computer): near 32 cores, the run time is already only about 5 seconds without collapse and about 3 seconds with `collapse(2)`.

Best regards,
Igor

#2 2022-12-16 06:02:06

sobereva
Tian Lu (Multiwfn developer)
From: Beijing
Registered: 2017-09-11
Posts: 1,622

Re: Patch: omp collapse(2) in grid.f90

Dear Igor,

Thanks, I'll check and test it shortly. I have just been infected with COVID-19 and my productivity has been greatly affected, so it may take me longer to respond...

Best regards,

Tian

#3 2022-12-18 09:38:33

sobereva
Tian Lu (Multiwfn developer)
From: Beijing
Registered: 2017-09-11
Posts: 1,622

Re: Patch: omp collapse(2) in grid.f90

Dear Igor,

collapse(2) is really fantastic! Your patch has been merged into the official source code.

I tested 704atoms.wfn on my dual AMD EPYC 7R32 (96 physical cores) server. With the new version, calculating high quality grid data of electron density and ELF takes 2 s and 6 s, respectively, while the old version takes 5 s and 20 s. The speed-up from collapse(2) on a server with a large number of cores is surprisingly high!
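
The speed-up implied by these timings can be worked out directly. The helper below is a trivial sketch (`speedup` is a name introduced here for illustration); since the reported times are rounded to whole seconds, the resulting factors are only rough.

```c
/* Speed-up factor: old wall clock time divided by new wall clock time. */
double speedup(double old_seconds, double new_seconds) {
    return old_seconds / new_seconds;
}
```

With the numbers above, `speedup(5.0, 2.0)` gives 2.5 for electron density and `speedup(20.0, 6.0)` gives roughly 3.3 for ELF.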

However, I removed "if(mod(ifinish,256)==0)", because otherwise I observe the following after the calculation:

Calculation of grid data took up wall clock time         2 s-]   99.89 %     /

That is, the progress bar does not reach 100%. My brief test showed that removing "if(mod(ifinish,256)==0)" doesn't detectably hurt performance, at least on my 8-core notebook and 96-core server.
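
Why a throttle of this kind can leave the bar short of 100% can be sketched as follows. This is a C illustration under the assumption that the bar shows finished/total and is redrawn only when the counter is a multiple of the stride; it is not Multiwfn's actual code, and `last_shown_percent` and `should_redraw` are hypothetical names. The second function shows one possible alternative that keeps the throttle but forces a final update; the change actually merged was simply to remove the condition.

```c
/* Percentage shown by the last progress update when the bar is redrawn
   only if the finished-item counter is a multiple of `stride`.
   Assumes the bar displays finished/total (an assumption, not
   Multiwfn's actual code). */
double last_shown_percent(long total, long stride) {
    long last = (total / stride) * stride; /* largest multiple of stride <= total */
    return 100.0 * (double)last / (double)total;
}

/* Alternative throttle: redraw every `stride` items and also on the
   very last item, so the final update is always at 100%. */
int should_redraw(long finished, long total, long stride) {
    return finished % stride == 0 || finished == total;
}
```

For a made-up total of 10000 items and a stride of 256, the last redraw happens at item 9984, so the bar stops at 99.84%; it only ends at exactly 100% when the total happens to be a multiple of 256.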

Best regards,

Tian
