### Re: Possible speed optimization for PEC?

Posted: **Wed 28 Jun 2017, 09:11**

by **frankst**

Hi Thorsten,

I am slightly surprised that you just use floats.

Double precision is not required?

It seems for multi-threading you cut the mesh in x direction only.

That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Why do you need the extra treatment of pos[2]=0?

Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).

Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

Cheers

Frank

### Re: Possible speed optimization for PEC?

Posted: **Wed 28 Jun 2017, 21:59**

by **thorsten**

> I am slightly surprised that you just use floats.
>
> Double precision is not required?

No, you really do not need it... I think no numerical (EM?) software uses double precision. It's just slow and you really do not gain accuracy.

> It seems for multi-threading you cut the mesh in x direction only.
>
> That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Yes
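To illustrate the x-only split confirmed here, the following is a minimal sketch of cutting the x index range into contiguous slabs, one per thread, while each thread would still loop over the full y and z ranges. The function name `split_x` and the even-split rule are assumptions for illustration, not the actual openEMS partitioning code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split the x index range [0, numX) into contiguous half-open slabs
// [start, end), one per thread. Each thread would then iterate over its
// own x slab and over the complete y and z ranges.
// NOTE: split_x is a hypothetical helper, not taken from openEMS.
std::vector<std::pair<unsigned, unsigned>> split_x(unsigned numX, unsigned numThreads)
{
    assert(numThreads > 0);
    std::vector<std::pair<unsigned, unsigned>> slabs;
    const unsigned base = numX / numThreads;  // minimum slab width
    const unsigned rest = numX % numThreads;  // first `rest` slabs get one extra line
    unsigned start = 0;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        const unsigned len = base + (t < rest ? 1u : 0u);
        slabs.emplace_back(start, start + len);
        start += len;
    }
    return slabs;
}
```

Splitting along one axis only keeps each thread's slab contiguous in the outermost loop index, so only the slab faces in x touch data owned by a neighbouring thread.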

> Why do you need the extra treatment of pos[2]=0?
>
> Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
>
> Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

It's for all 3 directions: pos[0], pos[1] and pos[2]. The reason is simple: I must not use a negative index, but an if statement would be horribly slow. That's why I use this trick with the bool (0 or 1) as shift.

That will of course always result in a zero for pos[0/1/2]==0, but the fields at the boundaries are locked to zero in any case.
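The shift trick can be sketched in one dimension as follows. This is a simplified illustration, not the openEMS update equation: `backward_diff` is a hypothetical function that only shows how a bool used as an index offset avoids both the negative index and the branch.

```cpp
#include <cstddef>
#include <vector>

// Backward difference d[i] = a[i] - a[i-1] without an if on i == 0.
// `shift` is 0 at the lower boundary and 1 everywhere else, so the index
// i - shift never goes below zero. At i == 0 the result is a[0] - a[0] = 0,
// which is harmless when the boundary value is forced to zero anyway.
// NOTE: backward_diff is a hypothetical name used for illustration only.
std::vector<float> backward_diff(const std::vector<float>& a)
{
    std::vector<float> d(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const bool shift = (i > 0);   // converts to 0 or 1 when used as an offset
        d[i] = a[i] - a[i - shift];
    }
    return d;
}
```

For the input `{1, 4, 9}` this yields `{0, 3, 5}`: the first entry is zero because the cell is differenced against itself.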

### Re: Possible speed optimization for PEC?

Posted: **Thu 29 Jun 2017, 12:15**

by **frankst**

Hi Thorsten,

I get the idea of the pos=0 treatment for x and y.

But for z you do it totally differently.

It seems in z direction you first loop and then mix first cell (pos[2]=0) and last cell (pos[2]=numVectors-1).

Even more confusing, UpdateVoltages and UpdateCurrents loop differently over pos[2].

Cheers

Frank

### Re: Possible speed optimization for PEC?

Posted: **Thu 29 Jun 2017, 19:44**

by **thorsten**

Are you talking about the SSE engine?

Yes, this is much more complex, as 4 values are always combined into one vector.

But I'm not sure I can follow your confusion...

regards

Thorsten

### Re: Possible speed optimization for PEC?

Posted: **Fri 30 Jun 2017, 09:44**

by **frankst**

Hi Thorsten,

Indeed, I am talking about the SSE engine.

I am looking at

https://github.com/thliebig/openEMS/blo ... ressed.cpp

Despite looking simpler,

https://github.com/thliebig/openEMS/blo ... ne_sse.cpp

doesn't improve my understanding.

Surely,

https://github.com/thliebig/openEMS/blo ... engine.cpp

is (rather) straightforward.

Trying to put my question in other words:

For example, why do you use, in the SSE engine (UpdateVoltages),

f4_curr[1][pos[0]][pos[1]][0].v

and

f4_curr[1][pos[0]][pos[1]][numVectors-1].v

to calculate the new

f4_volt[0][pos[0]][pos[1]][0].v?

Cheers

Frank

### Re: Possible speed optimization for PEC?

Posted: **Fri 30 Jun 2017, 15:10**

by **thorsten**

Well as I said, how the data is organized for SSE instructions is even more complex...

In this case it helps to see how the data is accessed:

https://github.com/thliebig/openEMS/blo ... _sse.h#L38

That means the data in z-direction is spread over the array differently. The first quarter of all z data sits in the first position of each vector, the second quarter in the second position, and so on...

See:

`f4_curr[n][pos[0]][pos[1]][pos[2]%numVectors].f[pos[2]/numVectors];`

A much different organization than in the simple engine... again, of course, to increase the speed...
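The access formula above can be turned into a small sketch showing where a given z index lands, and where its z-1 neighbour lives. The names `VecIndex`, `to_vec_index` and `lower_z_neighbour` are hypothetical and not part of openEMS; the arithmetic follows only from the quoted indexing expression, where z = lane * numVectors + vec.

```cpp
#include <cassert>

// Position of one z cell inside the vectorised array:
// `vec` selects the vector along z, `lane` selects one of its 4 floats.
// NOTE: all names here are hypothetical, for illustration only.
struct VecIndex { unsigned vec; unsigned lane; };

// Mirrors the quoted access: [pos[2] % numVectors].f[pos[2] / numVectors]
VecIndex to_vec_index(unsigned z, unsigned numVectors)
{
    assert(numVectors > 0);
    return { z % numVectors, z / numVectors };
}

// Slot holding the z-1 neighbour. For vec > 0 it is simply the previous
// vector, same lane. For vec == 0 it wraps to the LAST vector, one lane
// lower, because z-1 = (lane-1)*numVectors + (numVectors-1).
// The cell z == 0 (vec 0, lane 0) has no lower neighbour.
VecIndex lower_z_neighbour(VecIndex p, unsigned numVectors)
{
    assert(p.vec > 0 || p.lane > 0);
    if (p.vec > 0)
        return { p.vec - 1, p.lane };
    return { numVectors - 1, p.lane - 1 };
}
```

Under this layout, updating vector 0 needs values from vector numVectors-1 with the lanes shifted by one, which would be consistent with `f4_curr[1][pos[0]][pos[1]][numVectors-1].v` appearing in the update of `f4_volt[0][pos[0]][pos[1]][0].v`.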