Hi Thorsten,

I am slightly surprised that you just use floats.

Double precision is not required?

It seems for multi-threading you cut the mesh in x direction only.

That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Why do you need the extra treatment of pos[2]=0?

Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).

Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

Cheers

Frank

## Possible speed optimization for PEC?

**Moderator:** thorsten

### Re: Possible speed optimization for PEC?

> I am slightly surprised that you just use floats.
>
> Double precision is not required?

No, you really do not need it... I think no numerical (EM?) software uses double precision. It's just slow and you really do not gain accuracy.

> It seems for multi-threading you cut the mesh in x direction only.
>
> That means, every thread cycles over part of the x mesh but the full y and z meshes. Right?

Yes.

> Why do you need the extra treatment of pos[2]=0?
>
> Edit: Actually, also pos[0]=0 and pos[1]=0 get extra treatment (shift=0).
>
> Hence, the question is not just why you need some extra treatment, but also why you need different extra treatment.

It's for all 3 directions pos[0], pos[1] and pos[2]. The reason is simple: I must not use a negative index, but an if statement would be horribly slow. That's why I use this trick with the bool (0 or 1) as shift.

That will of course always result in a zero for pos[0/1/2]==0, but the fields at the boundaries are locked to zero in any case.
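The shift trick described above can be illustrated with a minimal 1-D sketch. This is not the openEMS code: the function name `backward_difference` and the plain `std::vector<float>` arrays are assumptions made only for illustration. It shows how a bool converted to 0 or 1 keeps the index `i - shift` non-negative without an `if`, and why the result at index 0 is always zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for one field update (illustrative names only).
// The update needs curr[i] - curr[i - 1], but i - 1 is invalid at i == 0.
void backward_difference(const std::vector<float>& curr, std::vector<float>& diff)
{
    assert(diff.size() == curr.size());
    for (std::size_t i = 0; i < curr.size(); ++i) {
        // shift is 0 at the lower boundary and 1 everywhere else,
        // so the index i - shift never goes below zero.
        const std::size_t shift = (i > 0);
        // At i == 0 this computes curr[0] - curr[0], which is exactly zero.
        diff[i] = curr[i] - curr[i - shift];
    }
}
```

At the lower boundary the difference collapses to zero, which is harmless if the boundary field is forced to zero anyway, as stated in the reply.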

### Re: Possible speed optimization for PEC?

Hi Thorsten,

I get the idea of the pos=0 treatment for x and y.

But for z you do it totally differently.

It seems that in z direction you first loop and then mix the first cell (pos[2]=0) and the last cell (pos[2]=numVectors-1).

Even more confusing, UpdateVoltages and UpdateCurrents loop differently over pos[2].

Cheers

Frank

### Re: Possible speed optimization for PEC?

Are you talking about the SSE engine?

Yes, this is much more complex, as 4 values are always combined in one vector.

But I'm not sure I can follow your confusion...

regards

Thorsten

### Re: Possible speed optimization for PEC?

Hi Thorsten,

Indeed, I am talking about the SSE engine.

I am looking at

https://github.com/thliebig/openEMS/blo ... ressed.cpp

Despite looking simpler,

https://github.com/thliebig/openEMS/blo ... ne_sse.cpp

doesn't improve my understanding.

Surely,

https://github.com/thliebig/openEMS/blo ... engine.cpp

is (rather) straightforward.

Trying to put my question in other words:

For example, why does the SSE engine (UpdateVoltages) use

f4_curr[1][pos[0]][pos[1]][0].v

and

f4_curr[1][pos[0]][pos[1]][numVectors-1].v

to calculate the new

f4_volt[0][pos[0]][pos[1]][0].v?

Cheers

Frank

### Re: Possible speed optimization for PEC?

Well as I said, how the data is organized for SSE instructions is even more complex...

In this case it helps to see how the data is accessed:

https://github.com/thliebig/openEMS/blo ... _sse.h#L38

That means, the data in z-direction is spread over the array differently. The first quarter of all data sits in the first position of each vector, the second quarter in the second position, and so on...

See:

`f4_curr[n][pos[0]][pos[1]][pos[2]%numVectors].f[pos[2]/numVectors];`

A much different organization than for the simple engine... again, of course, to increase the speed...
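To make the index expression concrete, here is a small sketch of the same mapping from a z index to a vector index and a slot inside that vector. The `LanePos` struct and the `locate` function are hypothetical helpers, not part of openEMS; only the `% numVectors` and `/ numVectors` arithmetic comes from the expression above.

```cpp
#include <cassert>
#include <cstddef>

// Position of one z sample in the interleaved layout (illustrative type).
struct LanePos {
    std::size_t vec;   // index of the 4-float vector: z % numVectors
    std::size_t lane;  // slot inside that vector:     z / numVectors
};

// Same mapping as the quoted expression; numVectors is the number of
// 4-float vectors along z, so z must be below 4 * numVectors.
LanePos locate(std::size_t z, std::size_t numVectors)
{
    assert(numVectors > 0 && z < 4 * numVectors);
    return { z % numVectors, z / numVectors };
}
```

With `numVectors = 8`, for example, z = 8 lands in vector 0, slot 1, while its lower neighbour z = 7 lands in vector 7 (that is, `numVectors-1`), slot 0. So the z-1 neighbour of anything stored in vector 0 lives in the last vector, one slot lower. That would explain why an update of the `[0]` vector reads both `[0]` and `[numVectors-1]`, though this reading is inferred from the index arithmetic alone.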