Re: Possible speed optimization for PEC?

Posted: Wed 28 Jun 2017, 09:11
by frankst
Hi Thorsten,

I am slightly surprised that you just use floats.
Double precision is not required?

It seems for multi-threading you cut the mesh in x direction only.
That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Why do you need the extra treatment of pos[2]=0?
Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

Cheers
Frank

Re: Possible speed optimization for PEC?

Posted: Wed 28 Jun 2017, 21:59
by thorsten
> I am slightly surprised that you just use floats.
> Double precision is not required?

No, you really do not need it... I think no numerical (EM?) software uses double precision. It's just slow and you do not really gain accuracy.

> It seems for multi-threading you cut the mesh in x direction only.
> That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Yes.

> Why do you need the extra treatment of pos[2]=0?
> Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
> Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

It applies to all three directions: pos[0], pos[1] and pos[2]. The reason is simple: I must not use a negative index, but an if statement would be horribly slow. That's why I use this trick with the bool (0 or 1) as the shift.
That will of course always result in a zero for pos[0/1/2]==0, but the fields at the boundaries are locked to zero in any case.
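
A minimal sketch of the bool-as-shift idea described in this post, assuming a plain 1D array instead of the real field arrays. The function name backwardDifference and the array are hypothetical and only illustrate the index arithmetic; this is not the openEMS code itself.

```cpp
#include <vector>

// Hypothetical 1D stand-in for a field array; illustrative names only.
// Returns field[pos] minus its lower neighbour without branching on pos == 0.
float backwardDifference(const std::vector<float>& field, unsigned int pos)
{
    // The comparison yields a bool that converts to 0 or 1:
    // shift is 0 at pos == 0 and 1 everywhere else, so pos - shift never underflows.
    const unsigned int shift = (pos > 0);

    // At pos == 0 this reads field[0] - field[0], which is always zero.
    return field[pos] - field[pos - shift];
}
```

At pos == 0 the difference collapses to zero, which matches the remark that the boundary fields are locked to zero anyway, so the wrong-looking result there does no harm.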

Re: Possible speed optimization for PEC?

Posted: Thu 29 Jun 2017, 12:15
by frankst
Hi Thorsten,

I get the idea of the pos=0 treatment for x and y.
But for z you do it totally differently.
It seems in z direction you first loop and then mix first cell (pos[2]=0) and last cell (pos[2]=numVectors-1).
Even more confusing, UpdateVoltages and UpdateCurrents loop differently over pos[2].

Cheers
Frank

Re: Possible speed optimization for PEC?

Posted: Thu 29 Jun 2017, 19:44
by thorsten
Are you talking about the SSE engine?

Yes, this is much more complex, as four values are always combined in one vector.
But I'm not sure I can follow your confusion...

regards
Thorsten

Re: Possible speed optimization for PEC?

Posted: Fri 30 Jun 2017, 09:44
by frankst
Hi Thorsten,

Indeed, I am talking about the SSE engine.

I am looking at
https://github.com/thliebig/openEMS/blo ... ressed.cpp
Despite looking simpler,
https://github.com/thliebig/openEMS/blo ... ne_sse.cpp
doesn't improve my understanding.
Surely,
https://github.com/thliebig/openEMS/blo ... engine.cpp
is (rather) straightforward.

Trying to put my question in other words:
For example, why does the SSE engine (UpdateVoltages) use
f4_curr[1][pos[0]][pos[1]][0].v
and
f4_curr[1][pos[0]][pos[1]][numVectors-1].v
to calculate the new
f4_volt[0][pos[0]][pos[1]][0].v?

Cheers
Frank

Re: Possible speed optimization for PEC?

Posted: Fri 30 Jun 2017, 15:10
by thorsten
Well as I said, how the data is organized for SSE instructions is even more complex...

In this case it helps to see how the data is accessed:
https://github.com/thliebig/openEMS/blo ... _sse.h#L38

That means the data in z-direction is spread over the array differently. The first quarter of the data sits in the first position of every vector, the second quarter in the second position, and so on...
See:

f4_curr[n][pos[0]][pos[1]][pos[2]%numVectors].f[pos[2]/numVectors];
This is organized very differently from the simple engine... again, of course, to increase the speed...
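
To make the mapping above concrete, here is a small sketch of the index arithmetic only. It assumes four floats per vector and uses the hypothetical names VectorSlot, slotOf and zIndexOf, none of which come from openEMS.

```cpp
// Illustrative only: where a z-index lands in the vectorized layout.
struct VectorSlot
{
    unsigned int vec;   // index of the 4-float vector along z
    unsigned int lane;  // position (0..3) inside that vector
};

// Mapping quoted in the post: vector = pos2 % numVectors, lane = pos2 / numVectors.
VectorSlot slotOf(unsigned int pos2, unsigned int numVectors)
{
    return { pos2 % numVectors, pos2 / numVectors };
}

// Inverse mapping: recover the z-index from a vector index and a lane.
unsigned int zIndexOf(VectorSlot s, unsigned int numVectors)
{
    return s.lane * numVectors + s.vec;
}
```

With numVectors = 5 (20 cells in z), z-index 5 maps to vector 0, lane 1, while its lower neighbour, z-index 4, maps to vector 4, lane 0. So under this mapping the z-1 neighbours of the values in vector 0 sit in vector numVectors-1, one lane lower. That would be consistent with an update of vector 0 reading vector numVectors-1.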