## Possible speed optimization for PEC?

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

### Re: Possible speed optimization for PEC?

Hi Thorsten,

I am slightly surprised that you just use floats.
Double precision is not required?

It seems for multi-threading you cut the mesh in x direction only.
That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Why do you need the extra treatment of pos[2]=0?
Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

Cheers
Frank

thorsten
Posts: 1438
Joined: Mon 27 Jun 2011, 12:26

### Re: Possible speed optimization for PEC?

> I am slightly surprised that you just use floats.
> Double precision is not required?

No, you really do not need it... I think no numerical (EM?) software uses double precision. It's just slow and you really do not gain accuracy.

> It seems for multi-threading you cut the mesh in x direction only.
> That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Yes.

> Why do you need the extra treatment of pos[2]=0?
> Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
> Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

It's for all three directions: pos[0], pos[1] and pos[2]. The reason is simple: I must not use a negative index, but an if statement would be horribly slow. That's why I use this trick with the bool (0 or 1) as shift.
That will of course always result in a zero for pos[0/1/2]==0, but the fields at the boundaries are locked to zero in any case.
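
To make the shift trick concrete, here is a minimal 1D sketch. It is not taken from the openEMS sources; the function name `backward_difference` and the plain `std::vector<float>` field are assumptions for illustration only. The bool `(i > 0)` converts to 0 or 1, so the index `i - shift` never goes negative and no if statement is needed in the loop body.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for one field component along one axis.
// Computes diff[i] = curr[i] - curr[i-1] without branching on i == 0.
std::vector<float> backward_difference(const std::vector<float>& curr)
{
    std::vector<float> diff(curr.size());
    for (std::size_t i = 0; i < curr.size(); ++i)
    {
        // shift is 0 at i == 0 and 1 everywhere else,
        // so i - shift is always a valid (non-negative) index.
        const std::size_t shift = (i > 0);
        diff[i] = curr[i] - curr[i - shift];
    }
    return diff;
}
```

At `i == 0` the expression collapses to `curr[0] - curr[0]`, which is the "always zero" result mentioned above; that is harmless as long as the boundary value is forced to zero anyway.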

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

### Re: Possible speed optimization for PEC?

Hi Thorsten,

I get the idea of the pos=0 treatment for x and y.
But for z you do it totally differently.
It seems that in z direction you first loop and then mix the first cell (pos[2]=0) and the last cell (pos[2]=numVectors-1).
Even more confusing, UpdateVoltages and UpdateCurrents loop differently over pos[2].

Cheers
Frank

thorsten
Posts: 1438
Joined: Mon 27 Jun 2011, 12:26

### Re: Possible speed optimization for PEC?

Are you talking about the SSE engine?

Yes, this is much more complex, as four values are always combined in one vector.

regards
Thorsten

frankst
Posts: 32
Joined: Thu 21 Jul 2016, 11:36

### Re: Possible speed optimization for PEC?

Hi Thorsten,

Indeed, I am talking about the SSE engine.

I am looking at
https://github.com/thliebig/openEMS/blo ... ressed.cpp
Despite looking simpler,
https://github.com/thliebig/openEMS/blo ... ne_sse.cpp
doesn't improve my understanding.
Surely,
https://github.com/thliebig/openEMS/blo ... engine.cpp
is (rather) straightforward.

Trying to put my question in other words:
For example, why do you use, in the SSE engine (UpdateVoltages),
f4_curr[1][pos[0]][pos[1]][0].v
and
f4_curr[1][pos[0]][pos[1]][numVectors-1].v
to calculate the new
f4_volt[0][pos[0]][pos[1]][0].v?

Cheers
Frank

thorsten
Posts: 1438
Joined: Mon 27 Jun 2011, 12:26

### Re: Possible speed optimization for PEC?

Well, as I said, how the data is organized for SSE instructions is even more complex...

In this case it helps to see how the data is accessed:
https://github.com/thliebig/openEMS/blo ... _sse.h#L38

That means the data in z-direction is spread over the array differently. The first quarter of all z values sits in the first position of each vector, the second quarter in the second position, and so on...
See:

``f4_curr[n][pos[0]][pos[1]][pos[2]%numVectors].f[pos[2]/numVectors];``
This is organized quite differently from the simple engine... again, of course, to increase the speed...
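
To see what this layout means for the z-1 neighbour, here is a small sketch that only follows the indexing expression above. It is not openEMS code; `Slot`, `slot_of` and `lower_neighbour` are hypothetical names, and I assume the z extent is exactly 4*numVectors.

```cpp
#include <cstddef>

// Hypothetical helper type: which vector along z, and which of its 4 lanes.
struct Slot { std::size_t vec; std::size_t lane; };

// Map a z index to (vector, lane), following the expression
// [pos[2] % numVectors].f[pos[2] / numVectors].
Slot slot_of(std::size_t z, std::size_t numVectors)
{
    return { z % numVectors, z / numVectors };
}

// Find where the z-1 neighbour of the cell stored in slot s lives.
// Returns false for z == 0, which has no lower neighbour.
bool lower_neighbour(Slot s, std::size_t numVectors, Slot& out)
{
    const std::size_t z = s.lane * numVectors + s.vec; // invert the mapping
    if (z == 0) return false;
    out = slot_of(z - 1, numVectors);
    return true;
}
```

For any vector index above 0, the neighbour is simply the previous vector with the same lane. For vector 0, lanes 1 to 3 find their neighbours in vector numVectors-1 at lanes 0 to 2, and lane 0 (z=0) has none. Under these assumptions, that would be why an update of the first vector has to reach into the last vector with the lanes shifted by one.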