Possible speed optimization for PEC?
Moderator: thorsten
Possible speed optimization for PEC?
Hi Thorsten,
Is it possible that openEMS always calculates fields for all cells, even the ones which are a priori known to be field free, e.g. cells within PEC?
Wouldn't it be possible to exclude such cells from calculation?
For example, during setup phase PEC cells which are fully surrounded by other PEC cells could be marked to be excluded from calculation.
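A minimal sketch of such a setup-phase marking, not taken from the openEMS source. The `Grid` struct, the flat `isPec` array and the function name `markSkippable` are hypothetical, and "fully surrounded" is assumed here to mean that all six face neighbours are PEC as well:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grid: isPec holds one flag per cell, with z as the fastest index.
struct Grid {
    std::size_t nx, ny, nz;
    std::vector<bool> isPec;  // size nx * ny * nz
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        return (x * ny + y) * nz + z;
    }
};

// Run once during setup: a cell is marked skippable if it is PEC and
// all six face neighbours are PEC too. Cells on the outer boundary of
// the grid are never marked, which keeps the sketch conservative.
std::vector<bool> markSkippable(const Grid& g) {
    std::vector<bool> skip(g.isPec.size(), false);
    for (std::size_t x = 1; x + 1 < g.nx; ++x)
        for (std::size_t y = 1; y + 1 < g.ny; ++y)
            for (std::size_t z = 1; z + 1 < g.nz; ++z)
                skip[g.index(x, y, z)] =
                    g.isPec[g.index(x, y, z)] &&
                    g.isPec[g.index(x - 1, y, z)] && g.isPec[g.index(x + 1, y, z)] &&
                    g.isPec[g.index(x, y - 1, z)] && g.isPec[g.index(x, y + 1, z)] &&
                    g.isPec[g.index(x, y, z - 1)] && g.isPec[g.index(x, y, z + 1)];
    return skip;
}
```

Whether the resulting flags can be used without slowing down the update loop is a separate question.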
Cheers
Frank
Re: Possible speed optimization for PEC?
Hello Frank,
this could be done, of course. But keep in mind that any "if" clause inside the main engine is computationally very expensive. Since the coefficients and fields are zero, the CPU is very quick to compute the result, and I feel that it is faster to just run over these cells than to have more complicated loops or (never do it) if statements in your core.
I even have cases where instead of an if clause I force a multiplication with zero instead, just because that is much faster...
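To illustrate the two approaches, here is a sketch under stated assumptions, not openEMS code. The array names `volt`, `vv`, `vi`, `curl` and `isPec` are hypothetical, and the update is reduced to a single flat loop. Which variant is actually faster depends on compiler and hardware; the sketch only shows that both give the same result when the PEC fields start at zero:

```cpp
#include <cstddef>
#include <vector>

// Variant A: skip PEC cells by testing a per-cell flag.
void updateWithBranch(std::vector<float>& volt,
                      const std::vector<float>& vv,
                      const std::vector<float>& vi,
                      const std::vector<float>& curl,
                      const std::vector<unsigned char>& isPec) {
    for (std::size_t i = 0; i < volt.size(); ++i) {
        if (isPec[i]) continue;  // data-dependent branch in the inner loop
        volt[i] = vv[i] * volt[i] + vi[i] * curl[i];
    }
}

// Variant B: no flag at all. PEC cells simply carry vv == vi == 0,
// so the multiplication forces their field to zero.
void updateBranchless(std::vector<float>& volt,
                      const std::vector<float>& vv,
                      const std::vector<float>& vi,
                      const std::vector<float>& curl) {
    for (std::size_t i = 0; i < volt.size(); ++i)
        volt[i] = vv[i] * volt[i] + vi[i] * curl[i];
}
```

Variant B has a uniform loop body, which is generally easier for a compiler to vectorize.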
Speed optimization is not a trivial thing unfortunately...
I hope I understood your question correctly...
regards
Thorsten
Re: Possible speed optimization for PEC?
Hi Thorsten,
I hope I understand your answer correctly.
I agree that an extra "if" would add to the computation time.
But I wonder if skipping PEC cells should not reduce the amount of data that has to be read and written to the memory.
And since memory bandwidth is the limit (not CPU speed), I would guess that an overall speed improvement would result.
Cheers
Frank
Re: Possible speed optimization for PEC?
Quote: "But I wonder if skipping PEC cells"
Yes, of course, but how do you skip cells without querying a flag in an if-clause?
If you can skip complete mesh areas, you should not mesh them in the first place; if they are inside the domain, you need if-clauses.
What do you have in mind on how to skip the cells without an (expensive) flag query?
Re: Possible speed optimization for PEC?
Hi Thorsten,
Since I have to cut out structures of different sizes from a block of PEC, I end up having a lot of PEC within the meshed region.
Certainly, I would prefer not to waste my mesh for PEC, but I haven't found a better solution.
Of course, an "if" is required that checks for a flag.
This flag could be created once during setup phase.
During the simulation phase, querying this flag costs some CPU but might require less memory bandwidth than trying to calculate unnecessary fields.
Obviously, the possible benefit (or drawback) depends on how openEMS loops over the mesh cells and how it accesses the stored fields per mesh cell.
I tried to understand the code, but unfortunately I have not made much progress.
Maybe you can give me a hint where to start?
Edit:
I am just speculating.
Depending on the openEMS internals, it might be possible to completely remove unnecessary PEC cells from memory once they have been identified.
No "if" required.
But this really depends on how openEMS internally handles data.
Cheers
Frank
Re: Possible speed optimization for PEC?
Quote: "During the simulation phase, querying this flag costs some CPU but might require less memory bandwidth than trying to calculate unnecessary fields."
It's not that simple, I fear. Any if clause is expensive because the compiler does not know in advance what it is going to need, thus clearing the cache and reloading memory every single time.
Additionally, openEMS uses SSE, that is, it processes 4 cells at once. Furthermore, careful memory alignment is important so that larger chunks of memory are copied at once. If some of these cells were not needed, skipping them would just mess up everything or at least not help at all... I'm not saying that there is no room for improvement (there is a lot), and I know quite a few options, but none of them are that simple...
Quote: "Depending on the openEMS internals, it might be possible to completely remove unnecessary PEC cells from memory once they have been identified."
Internally, openEMS uses a simple 3D array to avoid any weird jumping around in memory. Adding holes etc. is a bad idea (see above).
Believe me, it is faster and more efficient to just run over the PEC cells; anything else just hurts the speed.
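One consequence of processing 4 cells at once can be sketched in plain C++. This is an illustration only, assuming cells are processed in consecutive groups of four along a flat array; `kLanes` and `countSkippableGroups` are hypothetical names. A group could only be left out if all four of its cells are skippable, so isolated PEC cells save nothing:

```cpp
#include <cstddef>
#include <vector>

// Number of cells handled by one (assumed) 128-bit single-precision operation.
constexpr std::size_t kLanes = 4;

// Count the complete groups of kLanes consecutive cells in which every
// cell is marked skippable. A trailing partial group is ignored.
std::size_t countSkippableGroups(const std::vector<bool>& skippable) {
    std::size_t groups = 0;
    for (std::size_t base = 0; base + kLanes <= skippable.size(); base += kLanes) {
        bool all = true;
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            all = all && skippable[base + lane];
        if (all) ++groups;
    }
    return groups;
}
```

Comparing this count with the total number of groups gives a rough upper bound on what skipping could save for a given geometry, before any cost of the skipping logic itself.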
There are other options to improve the speed, e.g. moving to AVX2 (instead of SSE), doing the E and H updates at once instead of as two steps, multiple time-stepping, and many more.
But only the first one would really be an option, as all the others would most likely just destroy the whole concept of the engine and its engine extensions...
Another route could be to use a GPU-FDTD, but that again would require a complete rewrite of all engines and extensions...
I just do not have the resources (read: time) for something like that...
If speed is really important for you, check out Empire XPU.

openEMS was designed with flexibility and ease of maintenance in mind. I was able to achieve both. Speed was only the third (or lower) priority.
I did as much as I could come up with, but I always made sure it does not get in the way of flexibility and good code maintainability.
Going for as much speed as you can get (at any cost) is just not compatible with those goals, I feel.
But again, if anyone has good ideas on how to improve the speed without sacrificing the primary goals, keep the ideas coming.

best regards
Thorsten
Re: Possible speed optimization for PEC?
Quote: "I tried to understand the code, but unfortunately I have not made much progress. Maybe you can give me a hint where to start?"
There are multiple engines (with different levels of optimization and complexity):
Simple Engine:
(a very basic and simple engine, easy to understand, good for testing/implementing new features)
https://github.com/thliebig/openEMS/blo ... e.cpp#L108
SSE Engine:
(using SSE CPU instructions)
https://github.com/thliebig/openEMS/blo ... se.cpp#L75
Compressed SSE Engine:
(using a compressed set of coefficients to save memory, e.g. all PEC coefficients thus exist only once)
https://github.com/thliebig/openEMS/blo ... ed.cpp#L40
Compressed Multi-threading SSE Engine:
(This one just uses the "compressed SSE engine" in partial areas and syncs everything properly...)
https://github.com/thliebig/openEMS/blo ... thread.cpp
By default, the last one is used, and I doubt that anyone uses the other ones any more. But they can be handy if you test new features...
If you want to, you could have a look at how to replace SSE with AVX or even AVX2. But that would need to be optional, as not all (older) CPUs support AVX(2)...
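The idea of storing each coefficient set only once can be sketched as follows. This is a scalar illustration under my own assumptions, not the layout used by the compressed engine; `CompressedCoeffs` and `compress` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Each distinct coefficient value is stored once; every cell keeps only
// a small index into that table.
struct CompressedCoeffs {
    std::vector<float> unique;             // distinct coefficient values
    std::vector<std::uint32_t> cellToId;   // per-cell index into `unique`
    float at(std::size_t cell) const { return unique[cellToId[cell]]; }
};

CompressedCoeffs compress(const std::vector<float>& perCell) {
    CompressedCoeffs out;
    std::map<float, std::uint32_t> seen;   // value -> index in `unique`
    out.cellToId.reserve(perCell.size());
    for (float c : perCell) {
        auto it = seen.find(c);
        if (it == seen.end()) {
            it = seen.emplace(c, static_cast<std::uint32_t>(out.unique.size())).first;
            out.unique.push_back(c);
        }
        out.cellToId.push_back(it->second);
    }
    return out;
}
```

With this scheme, a large block of PEC cells, whose coefficients are all zero, contributes a single table entry no matter how many cells it covers.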
regards
Thorsten
Re: Possible speed optimization for PEC?
Hi Thorsten,
Thank you for the explanations and for pointing out where to look in the code.
Cheers
Frank
Re: Possible speed optimization for PEC?
Hi Thorsten,
I had a look at the code.
If I understand your engines correctly, in UpdateVoltages and UpdateCurrents you have three nested for loops cycling over the grid.
You use "pos" for the three indices to get the correct cell within the 3d mesh. Correct?
But what is "shift" in UpdateVoltages?
For example, you set "shift[2]=pos[2]".
And a few lines later you use pos[2]-shift[2] as index.
Shouldn't that always result in index 0?
Cheers
Frank
Edit: O.k., I just saw that "shift" is a bool.
So, it effectively takes only values 0 and 1, right?
But I still do not fully get what you do with it.
I guess some averaging over neighboring cells.
But in this case shouldn't shift be -1 or 1?
Edit2: I'm making progress.
Due to C++ rules, "shift" will be 1 if pos != 0, i.e. almost always.
Only if pos == 0 is "shift" 0.
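The conversion rule can be checked in isolation with a small sketch. The function name `lowerNeighbour` is hypothetical; only the `bool` conversion itself is standard C++:

```cpp
// Index of the lower neighbour along one axis, clamped at the boundary.
// Converting an unsigned value to bool yields true for any nonzero value
// and false for zero; in arithmetic, true counts as 1 and false as 0.
unsigned lowerNeighbour(unsigned pos) {
    bool shift = pos;     // implicit conversion: pos != 0
    return pos - shift;   // pos - 1 in the interior, 0 at pos == 0
}
```

So `pos - shift` never becomes negative, and at the lower boundary a cell is simply paired with itself.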
Re: Possible speed optimization for PEC?
Yes exactly. The engine needs the 4 values around the edge. In both orthogonal directions at pos[d] and pos[d]-1...
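As a rough sketch of how the four surrounding values enter the update of one x-directed edge (an illustration under stated assumptions, not the engine code): `Field3D`, `updateVoltX` and the coefficient names `vv` and `vi` are hypothetical here, and the sign convention follows the usual curl, dHz/dy - dHy/dz.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense 3D field stored in one flat array, z fastest.
struct Field3D {
    std::size_t nx, ny, nz;
    std::vector<float> data;
    Field3D(std::size_t nx_, std::size_t ny_, std::size_t nz_, float init = 0.0f)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, init) {}
    float& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[(x * ny + y) * nz + z];
    }
    float at(std::size_t x, std::size_t y, std::size_t z) const {
        return data[(x * ny + y) * nz + z];
    }
};

// Update the x-directed edge voltages from the currents circulating
// around each edge: currZ at y and y-1, currY at z and z-1.
void updateVoltX(Field3D& voltX, const Field3D& currY, const Field3D& currZ,
                 const Field3D& vv, const Field3D& vi) {
    for (std::size_t x = 0; x < voltX.nx; ++x)
        for (std::size_t y = 0; y < voltX.ny; ++y)
            for (std::size_t z = 0; z < voltX.nz; ++z) {
                bool sy = y;  // 0 on the lower y boundary, 1 otherwise
                bool sz = z;  // 0 on the lower z boundary, 1 otherwise
                float curl = currZ.at(x, y, z) - currZ.at(x, y - sy, z)
                           - currY.at(x, y, z) + currY.at(x, y, z - sz);
                voltX.at(x, y, z) = vv.at(x, y, z) * voltX.at(x, y, z)
                                  + vi.at(x, y, z) * curl;
            }
}
```

On the lower boundaries the shifted and unshifted terms coincide and cancel, so no special case is needed inside the loop.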
But why am I doing it like this?
