OpenEMS multi-processor performance

Garias
Posts: 14
Joined: Mon 13 Feb 2017, 19:11

Re: OpenEMS multi-processor performance

Post by Garias » Sun 28 May 2017, 21:35

Hi Thorsten,

This setup cost me £1500. It was put together from used parts, pulled from decommissioned servers and bought from reliable sellers in the US and EU. The only new part was the chassis/PSU (EUR 470, from Germany); all the other stuff is not "last generation" but one step behind, as prices drop considerably if you are not looking for the last bit of performance (e.g. I can neither afford nor need Opteron 6300s; the 6200s are just fine for the project).

I have never used commercial SW (except Touchstone running on DOS 6.0 when I was at university about 25 years ago).

At the moment, yes, I'm exploring subwavelength structures by brute processing power, meshing at 1/150 wavelength or even finer, in order to explore metamaterial-based scatterers (with features/geometries of 1/10 wavelength). Lens antennas, transmit arrays and radomes are my distant goal.
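
To put the 1/150 wavelength figure in perspective, here is a rough sketch of how the cell count grows with mesh resolution. It assumes a uniform mesh and a hypothetical 2 x 2 x 4 wavelength box; neither is taken from the actual model, and the function names are made up for illustration.

```python
def cells_per_axis(length_in_wavelengths: float, cells_per_wavelength: int) -> int:
    """Number of mesh cells along one axis, assuming a uniform mesh."""
    return round(length_in_wavelengths * cells_per_wavelength)


def total_cells(size_in_wavelengths, cells_per_wavelength: int) -> int:
    """Total cell count of a box whose edge lengths are given in wavelengths."""
    nx, ny, nz = (cells_per_axis(s, cells_per_wavelength) for s in size_in_wavelengths)
    return nx * ny * nz


# ASSUMPTION: a hypothetical 2 x 2 x 4 wavelength simulation box.
box = (2.0, 2.0, 4.0)

for resolution in (15, 75, 150):
    millions = total_cells(box, resolution) / 1e6
    print(f"1/{resolution} wavelength -> {millions:.3f} million cells")
```

The cell count grows with the cube of the resolution, and since the FDTD timestep shrinks along with the smallest cell, the total work grows faster still.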

Yes, at the first setup stage the code manages to use all 64 cores (on the 4 sockets) at pretty much 100%. You probably didn't know that? Well done!
But yes, after that stage almost all the cores are parked, and they (or the scheduler) take turns running that single thread, one core at a time. I hope this is useful information for you; I'm glad to contribute. I can only guess at the reason.
I have heard that HFSS does that as well, hence I thought this was a limitation inherent to the FDTD setup process (regardless of platform).

If nothing can be done to avoid these "one-thread" stages, well, that's a pity, because I love openEMS and its scripting flexibility (using equations instead of choosing from a rigid menu of shapes) and, unlike what you might think, I'm afraid I don't have the kind of money for commercial SW like HFSS or Empire (EMPIRE XPU - 6 Months Rental - Gold Edition €8,000.00, and this is just "Rental"...).

The FDTD stage (after you invite us to grab a coffee... :-)) and its MCPS performance are not a concern, although extra speed is always welcome; it is the setup that takes quite long because of these one-thread stages. (So, no plans for speed/parallel processing there, right?)
(anyway I can only give this project a few hours a week, so the code can run while I'm at work)

Again thank you for developing this tool and the fun I get with it!

German
Last edited by Garias on Sun 28 May 2017, 23:01, edited 2 times in total.

thorsten
Posts: 1393
Joined: Mon 27 Jun 2011, 12:26

Re: OpenEMS multi-processor performance

Post by thorsten » Sun 28 May 2017, 22:43

[quote="Garias"]So, absolute no chance of getting these "single-thread" phases exploiting multi-core platforms?[/quote]
Are we talking about the setup stage? Because I never really cared too much about these stages, as they are a tiny fraction of the main setup work.
And then again, 99% of the time should be spent in the FDTD engine (after all coefficients have been calculated during setup), and that always runs with all cores enabled.
The only phase during simulation where this changes again is when e.g. large field dumps have to be processed... Do you have those?
That said, please clarify which stages you are stuck in and for how long. Maybe use the "-v" (or "-vv") option for RunOpenEMS to get more details during setup...
How many cells does your simulation have? How long does the setup take and how fast does the simulation run? E.g. MC/s and TS/s?
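
For reference, the two speed figures asked about here are tied together by the cell count. A small sketch, assuming MC/s means million cell updates per second and TS/s means timesteps per second; the function names and the numbers are illustrative and not taken from any actual run.

```python
def speed_mcells_per_s(num_cells: int, timesteps_per_s: float) -> float:
    """Engine speed in MC/s from the cell count and the timestep rate (TS/s)."""
    return num_cells * timesteps_per_s / 1e6


def engine_runtime_s(num_cells: int, num_timesteps: int, mcells_per_s: float) -> float:
    """Time spent in the FDTD engine alone; setup time is not included."""
    return num_cells * num_timesteps / (mcells_per_s * 1e6)


# ASSUMPTION: hypothetical run with 10 million cells at 20 timesteps per second.
cells = 10_000_000
speed = speed_mcells_per_s(cells, 20.0)
print(f"{speed:.0f} MC/s")

# ASSUMPTION: 30000 timesteps until the end criterion is reached.
print(f"{engine_runtime_s(cells, 30_000, speed) / 60:.0f} minutes in the engine")
```

Comparing the engine time from such an estimate with the measured setup time shows quickly whether the single-threaded setup is really the dominant cost.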

regards
Thorsten

Garias
Posts: 14
Joined: Mon 13 Feb 2017, 19:11

Re: OpenEMS multi-processor performance

Post by Garias » Mon 29 May 2017, 01:10

Hi Thorsten,

Yes, the setup stage. But again, this is one of those "insane" cases, as I'm meshing at 1/75 wavelength and this is a really exploratory sort of "cellular" dielectric structure composed of little cubes of variable dielectric constant (this will actually be made by sub-sub-wavelength perforations in a titanium dioxide dielectric substrate).

I recently pasted some of what I'm trying to do into one of your tutorial examples ("Horn"), using the waveguide feed geometry only (I'm not deploying the horn itself, although some of its variables may still be defined but unused) and keeping all the other stuff. The "Horn" example is a good test bed for what I'm trying to achieve, which is stuffing a complex metamaterial dielectric into a waveguide.

I agree that openEMS might not be intended for such a purpose. However, it does it very well, and with a bit of patience (and other things to do in the meantime) it is perfectly stable for such an undertaking.

Attached are the code (guide_lens_xz_ln_v05_min_epsilon_1.txt) and the metamaterial script it uses (zx_ln_lens_05.zip), which is what I'm talking about. Want to give it a try?

The geometry might not make much sense, but I'm exploring the resources needed for such subwavelength structures rather than the geometries themselves.

(You will recognize lines from your "horn" tutorial example, plus the use of affine transforms to position things in the right place. It is a bit "patchy" and I hate to show this because it is just an intermediate tool, not elegant, and a few things are not correct, but it should give you an idea.)

Best Regards
Attachments
guide_lens_xz_ln_v05_min_epsilon_1.txt (9.98 KiB)
zx_ln_lens_05.zip (143.29 KiB)

Garias
Posts: 14
Joined: Mon 13 Feb 2017, 19:11

Re: OpenEMS multi-processor performance

Post by Garias » Sun 04 Jun 2017, 21:07

[quote="thorsten"]
I just do not have the time and resources for engine optimization as these tools do...
Furthermore my goal with openEMS was never the fastest speed possible, but an engine that allows an easy way to extend with new experimental features.
But this open and flexible approach does not go well with as fast as possible as you can imagine...
If raw speed is what you care about you really should consider a commercial solver?

Let me know what you think...
[/quote]
Hi Thorsten (I'm not sure if I'm quoting you correctly above):

I'm trying to find out whether I or a colleague of mine could even consider something along the lines of what Empire does, which seems to be some sort of "multiple time stepping and parallelization on all built-in cores" (runtime compilation, multiple time-stepping and exploiting NUMA, i.e. non-uniform memory access).

What sort of reading would you advise? And how feasible is this endeavor, at least up to a certain extent? Obviously, I'm aware this should be the work of a team of experts and perhaps years of development.

The reason I'm looking for speed is that in the long run I'd like to run Monte Carlo or an evolutionary algorithm (such as a genetic algorithm) on the results/objectives.

BTW, thanks for your help with "SetMaterialWeight" in the other thread. Much appreciated.

German

thorsten
Posts: 1393
Joined: Mon 27 Jun 2011, 12:26

Re: OpenEMS multi-processor performance

Post by thorsten » Mon 05 Jun 2017, 16:38

[quote="Garias"]I'm trying to find out whether I or a colleague of mine could even consider something along the lines of what Empire does, which seems to be some sort of "multiple time stepping and parallelization on all built-in cores" (runtime compilation, multiple time-stepping and exploiting NUMA, i.e. non-uniform memory access).[/quote]
Unfortunately the openEMS engine is designed in a way that makes all these approaches really not applicable. For example, everything except the pure/basic engine is done in engine extensions; that concept alone is not really compatible with that kind of speedup approach.
If you wanted to do something like this, I think you would need to create your own new FDTD engine. And since everything is kind of linked, you would pretty much create your own new openEMS. Which would of course be fine with me :)

I'm of course not sure in what context you want to do your research, but most (or all?) commercial solvers have special offers for students or university research too.
And Empire, for example, has a scripting interface too. I used that in the past and my openEMS interface was a bit inspired by it.

regards
Thorsten

Rishabh
Posts: 3
Joined: Mon 07 Jan 2019, 13:05

Re: OpenEMS multi-processor performance

Post by Rishabh » Wed 16 Jan 2019, 18:34

[quote="thorsten"]Unfortunately the openEMS engine is designed in a way that makes all these approaches really not applicable. For example, everything except the pure/basic engine is done in engine extensions; that concept alone is not really compatible with that kind of speedup approach.[/quote]

Hi,

I was wondering if there could be a way to modify (completely, if required) the basic engine such that multi-time stepping can be done while making only a few changes to the extensions. If you believe that this indeed has potential, I could implement the changes in the basic engine and start testing with the simplest extension first.
If successful, this would allow the remaining extensions to be sped up without significant changes.

What are the points in the interfaces that you think can be looked into to get started with this approach? What kind of problems do you expect when trying to do this?


Regards,
Rishabh

Hale_812
Posts: 171
Joined: Fri 13 May 2016, 02:54

Re: OpenEMS multi-processor performance

Post by Hale_812 » Thu 17 Jan 2019, 06:58

Garias, let me express my scepticism about the configuration. It is exactly the configuration I always try to avoid. There are too many SLOW cores, too little RAM, and too few memory channels per core.
This machine is good for fast, repetitive solving of short equations, working on small chunks of huge data (big data, real-time imaging-radar filters, neural networks...), but not good for large wave simulations.

In addition, you should remember that AMD's FPU has better sin/tan precision (error distribution), but it is slower at the same clock.
Then, I had serious delays on a dual-processor Xeon 8+8 (E5-2687W) with 4x16 GB per CPU, and in a quad-processor configuration, with more RAM I/O transactions, they will be much greater.

So, I guess this machine was not built for openEMS in the first place, but for general computation algorithms, and there could be 8x Tesla K80 in it... am I wrong?

OK, openEMS is good... But there are other tools. Even ADS/EMPro has serious FDTD with adaptive nested meshing. And investments in ADS kits will more than pay off in future.

thorsten
Posts: 1393
Joined: Mon 27 Jun 2011, 12:26

Re: OpenEMS multi-processor performance

Post by thorsten » Thu 17 Jan 2019, 08:55

Hello Hale, I'm not sure what you are trying to accomplish, but your post sounds very negative to me and I do not really like that. Please give constructive feedback or none at all. If you want to buy a commercial FDTD solver with extremely expensive GPUs, please go ahead, but this is not what openEMS is about.

Back on topic:
[quote="Rishabh"]could be a way to modify (completely, if required) the basic engine[/quote]
Don't look at the basic engine (other than to see how it is done). You will need to create one that is closer to the multithreaded+compression+SSE engine, but using e.g. AVX(2) instead.
As for the extensions, I'm not sure if they would really fit in there in any way. I think you have to make yourself very familiar with how the FDTD engine works first and then decide how to proceed...
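
As a starting point for getting familiar with what an FDTD engine does per timestep, here is a minimal 1D leapfrog sketch in plain Python. It is not openEMS code and uses none of its classes; the function name, the normalized units and the Gaussian source are assumptions chosen purely for illustration.

```python
import math


def run_fdtd_1d(num_cells: int = 200, num_steps: int = 150, courant: float = 0.5):
    """Leapfrog update of E and H on a staggered 1D grid (normalized units)."""
    ez = [0.0] * num_cells        # E-field on integer grid points
    hy = [0.0] * (num_cells - 1)  # H-field on half grid points

    for step in range(num_steps):
        # Half-step 1: update H from the spatial difference of E.
        for i in range(num_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])

        # Half-step 2: update E from the spatial difference of H.
        # The two end points are never touched and stay zero (PEC walls).
        for i in range(1, num_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])

        # Soft Gaussian pulse injected in the middle of the grid.
        ez[num_cells // 2] += math.exp(-(((step - 30) / 10.0) ** 2))

    return ez


fields = run_fdtd_1d()
print("peak |Ez| after the run:", max(abs(v) for v in fields))
```

Within each half-step every cell update depends only on values from the previous half-step, which is what makes these loops candidates for multithreading and for SIMD instructions such as SSE or AVX.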

regards
Thorsten

Hale_812
Posts: 171
Joined: Fri 13 May 2016, 02:54

Re: OpenEMS multi-processor performance

Post by Hale_812 » Fri 18 Jan 2019, 00:44

t>but your post sounds very negative to me and I do not really like that. Please give constructive feedback
I am commenting on the discussion about "HFSS or Empire" that you started yourself:
t>If you have these kind of resources (money) it would make sense to think about a commercial FDTD solver (e.g. Empire)?
So I am not sure why it sounds negative to you. What I am saying is the exact opposite of negative. Speaking of ADS/EMPro, it is like Apple. Can you agree that it is also a great tool for circuit simulation, PCB and chassis design, and SI/PI and EMC? Not many companies can offer such an environment, if you are ready to spend THAT much money. That does not make the others bad, or open software any less great for academic and freelance work.
But my main point was about the multi-CPU configuration mentioned, which I feel is irrational for wave simulations in general.
t>FDTD is pretty much not limited by the speed of your cpu, but the speed/bandwidth of your memory. That means, sometimes less threads fighting for the limited memory bandwidth can be better...
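
The memory bandwidth argument in that last quote can be sketched with a simple saturation model. Everything numeric here is an assumption for illustration: the bytes moved per cell, the usable bandwidth and the per-thread speed are not measured openEMS values, and the function names are hypothetical.

```python
def bandwidth_bound_mcells_per_s(bandwidth_gb_s: float, bytes_per_cell: float) -> float:
    """Upper bound on speed (MC/s) if each cell update must move bytes_per_cell bytes."""
    return bandwidth_gb_s * 1e9 / bytes_per_cell / 1e6


def estimated_speed(threads: int, per_thread_mcells_per_s: float,
                    bound_mcells_per_s: float) -> float:
    """Speed scales with the thread count until the bandwidth bound is hit."""
    return min(threads * per_thread_mcells_per_s, bound_mcells_per_s)


# ASSUMPTION: 6 field values read and written plus 12 coefficients read,
# 4 bytes each, with no cache reuse -> (6 * 2 + 12) * 4 = 96 bytes per cell.
BYTES_PER_CELL = 96

# ASSUMPTION: 48 GB/s of usable memory bandwidth and 40 MC/s per thread.
bound = bandwidth_bound_mcells_per_s(48.0, BYTES_PER_CELL)

for threads in (4, 8, 16, 32, 64):
    print(threads, "threads ->", estimated_speed(threads, 40.0, bound), "MC/s")
```

In this simple model the extra threads merely stop helping once the bound is reached; it does not capture the contention that can make additional threads actively hurt.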

Michael
Posts: 9
Joined: Fri 29 Mar 2019, 18:41

Re: OpenEMS multi-processor performance

Post by Michael » Mon 01 Apr 2019, 16:31

Hello Thorsten,

this discussion has become somewhat diffuse, so I'm not sure if I understand it correctly.
I would like to help speed up the simulation engine, but it sounds as if openEMS is not ready for this.

In my opinion openEMS already has all the necessary features available. (Why is it still version 0.0.35?)
Now simulation time should be the most important subject.
Do you have plans for where to go with openEMS in general?

In your opinion, what is the best strategy for speeding up the simulation?
And would you like to see this in OpenEMS?
Or to ask the other way around, what kind of changes would you accept?

Anyway, a lot of thanks for this project!
