
Potential Speed Issues

Posted: Wed 19 Feb 2020, 18:19
by GeraldCarda
I am just revisiting openEMS after some time (~2 years). I did a fresh install from the git sources and ran:

./update_openEMS.sh ~/opt/openEMS --with-hyp2mat --with-CTB --with-MPI

This is the output of the compiled binary:

----------------------------------------------------------------------
| openEMS 64bit -- version v0.0.35-62-gbb235b2
| (C) 2010-2018 Thorsten Liebig <thorsten.liebig@gmx.de> GPL license
----------------------------------------------------------------------
Used external libraries:
CSXCAD -- Version: v0.6.2-88-g3818a03
hdf5 -- Version: 1.10.0
compiled against: HDF5 library version: 1.10.0-patch1
tinyxml -- compiled against: 2.6.2
fparser
boost -- compiled against: 1_65_1
vtk -- Version: 6.3.0
compiled against: 6.3.0

Usage: openEMS <FDTD_XML_FILE> [<options>...]


As a next step I ran some small PCB structures (using hyp2mat). The simulation runs as expected, but at what seems to be a low speed. The system I am using is a 24-core machine (NUMA, 2x12 cores). I end up with a speed of around 16 MC/s. Using 'htop' I can see 24 openEMS processes, but the load average is only about 4 and each core shows a utilization of only about 20%.

Does anybody have an idea what is going wrong here? Is there a way to check that MPI works correctly?
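For example, would a basic test of the MPI installation itself already tell me something? I am thinking of something like the following (assuming OpenMPI or a similar implementation is on the PATH; the process count is arbitrary):

mpirun --version
mpirun -n 4 hostname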

Best regards,
Gerald

Re: Potential Speed Issues

Posted: Thu 20 Feb 2020, 02:28
by Hale_812
I use it on Windows, but I would try these solver parameters first:

--engine=multithreaded %engine using compressed operator + sse vector extensions + MPI + multithreading
--numThreads=<n> %Force use n threads for multithreaded engine
--dump-statistics %dump simulation statistics to 'openEMS_run_stats.txt' and 'openEMS_stats.txt'
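For example, a call could look like this (untested on my side; the XML file name and the thread count are just placeholders to experiment with):

openEMS my_simulation.xml --engine=multithreaded --numThreads=6 --dump-statistics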

Re: Potential Speed Issues

Posted: Fri 21 Feb 2020, 11:39
by GeraldCarda
Thanks for your hints. I did some tests using different values for '--numThreads'. Here are my results:

Threads,MC/s,CPU (%)
1,26.8,100
2,43.3,185
4,57.5,340
6,64.4,430
8,62.3,560
10,60.8,630
12,57.3,720
14,55.1,780
16,50.4,820
18,44.4,840
20,33.7,720
22,23.6,640
24,18.8,550

I created a plot too, but did not find an easy way to attach it (you may just copy/paste the above table into Calc/Excel and recreate the diagram). The highest MC/s is reached with 6 threads. This matches the number of physical cores on a single socket of the machine I am using. The limiting factor seems to be the data (memory) sharing at the boundary areas of the partitioned problem. I have not looked at the code yet to verify this assumption.

For now I will just use --numThreads=6 for my tests.

There may be other options, like binding each thread to a fixed core id, etc. Any ideas are welcome.

Regards,
Gerald

Re: Potential Speed Issues

Posted: Fri 21 Feb 2020, 17:25
by thorsten
Hello Gerald,

yes, the FDTD method is mostly memory-bandwidth limited, and letting 24 threads fight for the limited bandwidth is not good.
Limiting one simulation to a single socket may actually help too, and you may be able to run two openEMS simulations simultaneously this way...
But I have never had a multi-socket system and I'm not sure how to limit a process to a specific socket; I'm pretty sure it is possible on Linux, though.
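Maybe something like numactl can do it? Just a guess, I have not tried it myself; the XML file names and the thread count are only placeholders:

numactl --cpunodebind=0 --membind=0 openEMS sim_socket0.xml --engine=multithreaded --numThreads=6
numactl --cpunodebind=1 --membind=1 openEMS sim_socket1.xml --engine=multithreaded --numThreads=6

Each call would bind the engine threads and their memory allocations to one NUMA node, so the two simulations should not compete for the same memory controller.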

regards
Thorsten