Possible speed optimization for PEC?

Discussion about new features and development support

Moderator: thorsten

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

Possible speed optimization for PEC?

Post by frankst » Thu 15 Jun 2017, 14:32

Hi Thorsten,

Is it possible that openEMS always calculates fields for all cells, even the ones which are a priori known to be field free, e.g. cells within PEC?

Wouldn't it be possible to exclude such cells from calculation?
For example, during the setup phase, PEC cells that are fully surrounded by other PEC cells could be marked to be excluded from the calculation.

Cheers
Frank

thorsten
Posts: 1425
Joined: Mon 27 Jun 2011, 12:26

Re: Possible speed optimization for PEC?

Post by thorsten » Fri 16 Jun 2017, 08:24

Hello Frank,

This could be done, of course. But keep in mind that any "if" clause inside the main engine is computationally very expensive. Since the coefficients and fields are zero, the CPU is very quick to compute the answer, and I feel that it is faster to just run over these cells than to have more complicated loops or (never do it) if statements in your core.
I even have cases where I force a multiplication with zero instead of using an if clause, just because that is much faster...
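To make the trade-off concrete, here is a minimal sketch of the two styles. It is not openEMS code: the function names, the flat vectors, and the simplified update rule are assumptions chosen only for illustration. Variant A tests a flag per cell; variant B folds the flag into a coefficient that is 0 for PEC cells and 1 elsewhere, so the loop body has no branch. Both give the same result for finite field values; which one is faster depends on compiler and CPU and would have to be measured.

```cpp
#include <cstddef>
#include <vector>

// Variant A (hypothetical): query a flag for every cell inside the hot loop.
void update_with_branch(std::vector<float>& field,
                        const std::vector<float>& source,
                        const std::vector<bool>& is_pec)
{
    for (std::size_t i = 0; i < field.size(); ++i) {
        if (is_pec[i])
            field[i] = 0.0f;            // data-dependent branch
        else
            field[i] += source[i];
    }
}

// Variant B (hypothetical): the flag is folded into a coefficient,
// 0 for PEC cells and 1 for all other cells, so no branch is needed.
void update_with_mask(std::vector<float>& field,
                      const std::vector<float>& source,
                      const std::vector<float>& coeff)
{
    for (std::size_t i = 0; i < field.size(); ++i)
        field[i] = coeff[i] * (field[i] + source[i]);
}
```

The second loop does the same arithmetic for every cell, which also makes it a straightforward candidate for vectorization.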

Speed optimization is not a trivial thing unfortunately...

I hope I understood your question correctly...

regards
Thorsten

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

Re: Possible speed optimization for PEC?

Post by frankst » Fri 16 Jun 2017, 08:44

Hi Thorsten,

I hope I understand your answer correctly ;-)

I agree that an extra "if" would add to the computation time.
But I wonder if skipping PEC cells would not reduce the amount of data that has to be read from and written to memory.
And since memory bandwidth is the limit (not CPU speed), I would guess that an overall speed improvement would result.

Cheers
Frank

thorsten
Posts: 1425
Joined: Mon 27 Jun 2011, 12:26

Re: Possible speed optimization for PEC?

Post by thorsten » Fri 16 Jun 2017, 09:05

But I wonder if skipping PEC cells
Yes, of course, but how would you skip them without querying a flag in an if-clause?

If you can skip complete mesh areas, you should not mesh them in the first place. If the PEC is inside the domain, you need if-clauses.
What do you have in mind for skipping the cells without an (expensive) flag query?

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

Re: Possible speed optimization for PEC?

Post by frankst » Fri 16 Jun 2017, 09:36

Hi Thorsten,

Since I have to cut out structures of different sizes from a block of PEC, I end up having a lot of PEC within the meshed region.
Certainly, I would prefer not to waste my mesh for PEC, but I haven't found a better solution.

Of course, an "if" is required that checks for a flag.
This flag could be created once during the setup phase.
During the simulation phase, querying this flag costs some CPU time but might require less memory bandwidth than trying to calculate unnecessary fields.
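A minimal sketch of such a setup-phase pass is shown below. Everything in it is an assumption made for illustration and not taken from openEMS: the `Grid` helper, the flat cell layout with z as the fastest index, and the criterion itself (a PEC cell counts as interior if its six face neighbours are PEC too). Cells on the domain boundary are never marked in this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grid helper: nx * ny * nz cells in one flat vector.
struct Grid {
    std::size_t nx, ny, nz;
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return (i * ny + j) * nz + k;
    }
};

// Returns 1 for every PEC cell whose six face neighbours are also PEC,
// 0 for all other cells. Runs once, before time stepping starts.
std::vector<char> mark_interior_pec(const Grid& g, const std::vector<char>& is_pec)
{
    std::vector<char> interior(is_pec.size(), 0);
    for (std::size_t i = 1; i + 1 < g.nx; ++i)
        for (std::size_t j = 1; j + 1 < g.ny; ++j)
            for (std::size_t k = 1; k + 1 < g.nz; ++k) {
                const std::size_t c = g.index(i, j, k);
                interior[c] = is_pec[c]
                    && is_pec[g.index(i - 1, j, k)] && is_pec[g.index(i + 1, j, k)]
                    && is_pec[g.index(i, j - 1, k)] && is_pec[g.index(i, j + 1, k)]
                    && is_pec[g.index(i, j, k - 1)] && is_pec[g.index(i, j, k + 1)];
            }
    return interior;
}
```

Whether testing this flag during time stepping actually saves more memory traffic than it costs in branching is the open question of this thread; the sketch only shows that building the flag is cheap.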

Obviously, the possible benefit (or drawback) depends on how openEMS loops over the mesh cells and how it accesses the stored fields per mesh cell.
I tried to understand the code, but unfortunately I have not made a lot of progress.
Maybe you can give me a hint where to start?

Edit:
I am just speculating.
Depending on the openEMS internals, it might be possible to completely remove unnecessary PEC cells from memory once they have been identified.
No "if" required.
But this really depends on how openEMS internally handles data.


Cheers
Frank

thorsten
Posts: 1425
Joined: Mon 27 Jun 2011, 12:26

Re: Possible speed optimization for PEC?

Post by thorsten » Sat 17 Jun 2017, 19:13

During the simulation phase, querying this flag costs some CPU but might require less memory bandwidth than trying to calculate unnecessary fields.
It's not that simple, I fear. Any if clause is expensive because the compiler does not know in advance what it is going to need, so the cache is cleared and memory reloaded every single time.
Additionally, openEMS uses SSE, which processes 4 cells at once. Furthermore, careful memory alignment is important so that larger chunks of memory are copied at once. If some of these cells were not needed, skipping them would just mess everything up or at least not help at all... I'm not saying that there is no room for improvement (there is a lot), and I know quite a few possible improvements, but none of them are that simple...
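For readers unfamiliar with SSE, the following sketch shows what "4 cells at once" can look like. It is not the openEMS engine: the function name, the coefficient names `c_self` and `c_curl`, and the simplified update rule are assumptions. It uses unaligned loads so it works on any pointer; aligned loads (`_mm_load_ps`) would require 16-byte aligned memory. On targets without SSE the sketch falls back to a scalar loop.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical update: volt[i] = c_self[i] * volt[i] + c_curl[i] * curl[i].
// n must be a multiple of 4 in this sketch.
void update_block(float* volt, const float* c_self, const float* c_curl,
                  const float* curl, std::size_t n)
{
#ifdef SKETCH_HAS_SSE
    for (std::size_t i = 0; i < n; i += 4) {
        // Each __m128 register holds four consecutive cells.
        const __m128 self_term = _mm_mul_ps(_mm_loadu_ps(c_self + i),
                                            _mm_loadu_ps(volt + i));
        const __m128 curl_term = _mm_mul_ps(_mm_loadu_ps(c_curl + i),
                                            _mm_loadu_ps(curl + i));
        _mm_storeu_ps(volt + i, _mm_add_ps(self_term, curl_term));
    }
#else
    for (std::size_t i = 0; i < n; ++i)   // scalar fallback
        volt[i] = c_self[i] * volt[i] + c_curl[i] * curl[i];
#endif
}
```

A PEC cell with both coefficients set to zero simply produces zero in its lane. Skipping a single cell inside a group of four would need extra masking or shuffling, which gives an idea of why holes in the data do not combine well with this kind of loop.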
Depending on the openEMS internals, it might be possible to completely remove unnecessary PEC cells from memory once they have been identified.
openEMS internally uses a simple 3D array to avoid any weird jumping around in memory. Adding holes etc. is a bad idea (see above).
Believe me, it is faster and more efficient to just run over the PEC cells, anything else just hurts the speed.
There are other options to improve the speed, e.g. moving to AVX2 (instead of SSE), doing the E and H updates at once rather than as two steps, multiple time-stepping, and many more.
But only the first one would really be an option, as all the others would most likely just destroy the whole concept of the engine and its engine extensions...
Another route could be to use a GPU-FDTD, but that again would require a complete rewrite of all engines and extensions...
I just do not have the resources (read: time) for something like that...

If speed is really important for you, check out Empire XPU ;)

openEMS was designed with flexibility and ease of maintenance in mind. I was able to achieve both. Speed was only third (or less) priority.
I did as much as I could come up with, but I always made sure it does not get in the way of flexibility and good code maintainability.
Going for as much speed as you can get (at any cost) is just not compatible with those goals, I feel.
But again, if anyone has good ideas on how to improve the speed without sacrificing the primary goals, keep the ideas coming ;)

best regards
Thorsten

thorsten
Posts: 1425
Joined: Mon 27 Jun 2011, 12:26

Re: Possible speed optimization for PEC?

Post by thorsten » Sat 17 Jun 2017, 19:28

I tried to understand the code, but unfortunately I have not made a lot of progress.
Maybe you can give me a hint where to start?
There are multiple engines with different levels of optimization and complexity:

Simple Engine:
(a very basic and simple engine, easy to understand, good for testing/implementing new features)
https://github.com/thliebig/openEMS/blo ... e.cpp#L108

SSE Engine:
(using SSE CPU instructions)
https://github.com/thliebig/openEMS/blo ... se.cpp#L75

Compressed SSE Engine:
(using a compressed set of coefficients to save memory, e.g. all PEC coefficients thus exist only once)
https://github.com/thliebig/openEMS/blo ... ed.cpp#L40

Compressed Multi-threading SSE Engine:
(This one just uses the "compressed SSE engine" in partial areas and syncs everything properly...)
https://github.com/thliebig/openEMS/blo ... thread.cpp

By default the last one is used. And I doubt that anyone uses the other ones any more. But they can be handy if you test new features...
If you want to, you could have a look at how to replace the SSE with AVX or even AVX2. But that would need to be optional, as not all (older) CPUs support AVX(2)...
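The idea behind the compressed coefficients (every distinct set is stored only once, and each cell refers to it by index) can be sketched as follows. This is only an illustration of the general technique and not the openEMS implementation: the type names, the four-coefficients-per-cell layout, and the use of `std::map` for deduplication are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using CoeffSet = std::array<float, 4>;   // hypothetical: four coefficients per cell

struct CompressedCoeffs {
    std::vector<CoeffSet> table;             // each distinct set stored once
    std::vector<std::uint32_t> cell_to_set;  // one index into 'table' per cell
};

// Builds the table and the per-cell indices from uncompressed coefficients.
CompressedCoeffs compress(const std::vector<CoeffSet>& per_cell)
{
    CompressedCoeffs out;
    std::map<CoeffSet, std::uint32_t> seen;  // set -> position in 'table'
    out.cell_to_set.reserve(per_cell.size());
    for (const CoeffSet& c : per_cell) {
        auto it = seen.find(c);
        if (it == seen.end()) {
            const auto id = static_cast<std::uint32_t>(out.table.size());
            it = seen.emplace(c, id).first;
            out.table.push_back(c);
        }
        out.cell_to_set.push_back(it->second);
    }
    return out;
}
```

In a structure with large PEC blocks, all PEC cells would map to the same table entry, so the coefficient data read for them stays small even though the cells themselves are still visited.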

regards
Thorsten

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

Re: Possible speed optimization for PEC?

Post by frankst » Tue 20 Jun 2017, 09:32

Hi Thorsten,

Thank you for the explanations and for pointing out where to look in the code.

Cheers
Frank

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

Re: Possible speed optimization for PEC?

Post by frankst » Tue 27 Jun 2017, 12:31

Hi Thorsten,

I had a look at the code.
If I understand your engines correctly, in UpdateVoltages and UpdateCurrents you have three nested for loops cycling over the grid.
You use "pos" for the three indices to get the correct cell within the 3d mesh. Correct?
But what is "shift" in UpdateVoltages?
For example, you set "shift[2]=pos[2]".
And a few lines later you use pos[2]-shift[2] as index.
Shouldn't that always result in index 0?

Cheers
Frank

Edit: O.k., I just saw that "shift" is a bool.
So, it effectively takes only values 0 and 1, right?
But I still do not fully get what you do with it.
I guess some averaging over neighboring cells.
But in this case shouldn't shift be -1 or 1?

Edit2: I'm making progress ;-)
Due to C++ conversion rules, "shift" will be 1 if pos != 0, i.e. almost always.
Only if pos == 0 is shift 0.
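The conversion rule can be checked with a few lines of standalone C++. The helper names below are made up for this illustration and do not appear in openEMS; only the pattern of assigning an index to a bool and then subtracting it is taken from the discussion above.

```cpp
// Index of the lower neighbour along one axis, computed without an if.
inline unsigned int lower_neighbour(unsigned int pos)
{
    const bool shift = pos;   // implicit conversion: false for 0, true otherwise
    return pos - shift;       // the bool is promoted to 0 or 1
}

// Hypothetical use: difference between a cell and its lower neighbour.
// At pos == 0 the cell is compared with itself, so the result is 0.
inline float backward_difference(const float* f, unsigned int pos)
{
    return f[pos] - f[lower_neighbour(pos)];
}
```

So `pos - shift` is `pos - 1` everywhere except at the lower boundary, where it stays at 0 and the index never becomes negative.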

thorsten
Posts: 1425
Joined: Mon 27 Jun 2011, 12:26

Re: Possible speed optimization for PEC?

Post by thorsten » Tue 27 Jun 2017, 19:21

Yes, exactly. The engine needs the 4 values around the edge: in both orthogonal directions, at pos[d] and pos[d]-1...

But why am I doing it like this? ;)
