Raspberry Pi Cluster - could it run openEMS?
Moderators: thorsten, sebastian
-
- Posts: 31
- Joined: Thu 23 May 2019, 18:05
Raspberry Pi Cluster - could it run openEMS?
Hi,
We'd like to build some hardware optimised for running openEMS in a distributed way. To build up towards this, I am wondering if it is possible to install openEMS on Raspberry Pis configured on the same network. We'd be prepared not to have AppCSXCAD running, so graphics aren't a problem - visual verification of the meshes etc. can be done on a different machine. The aim of this is to discover how we can use e.g. parallel, MPI etc. to improve simulation times of a batch of antenna designs.
Any thoughts would be welcome.
-
- Posts: 31
- Joined: Thu 23 May 2019, 18:05
Re: Raspberry Pi Cluster - could it run openEMS?
Just in case anyone is wondering, I got openEMS running on a Raspberry Pi 3B. It's not fast on its own but if you want to know how I did it, please say. The key parts were on the SSE vs NEON / x86 vs ARM side. I learned far more about architecture differences than I wanted to!
Re: Raspberry Pi Cluster - could it run openEMS?
Well, that sounds interesting. Did you have to make changes to openEMS? If so, can we maybe include them in the main openEMS repositories to make this easier in the future?
-
- Posts: 31
- Joined: Thu 23 May 2019, 18:05
Re: Raspberry Pi Cluster - could it run openEMS?
I had to make some changes, and this obviously hasn't been thoroughly tested. I used the patch antenna example without any AppCSXCAD visualisation and I did not attempt to post-process any of the results (so I can't tell yet if they made sense!).
I used a Raspberry Pi 3B. I installed Linux Ubuntu MATE, as I thought that this would be as close as possible to a "normal" OS for Octave/openEMS. My first fail was when I forgot to install Octave. All the insight into my troubleshooting came from the log files that are made for each build - this was a great resource when I was trying to see what I needed to change to make openEMS work on ARM.
The first thing the build complained about before giving up was "xmmintrin.h". This appears in FDTD/engine_sse.cpp and FDTD/engine_multithread.cpp. I googled xmmintrin and found this is linked to x86 SSE instructions, whereas ARM uses NEON instead (SIMD might be the common term to describe what SSE and NEON are but I'm a noob at this - just recording what I noticed in case it helps someone else here). I found a blog post on porting from x86 to ARM, and this led to a repo for a new header file you can swap into any file that uses xmmintrin.h or another header called emmintrin.h (which appears in FDTD/engine_sse_compressed.cpp). The instructions for this new header are simple: I just put the new SSE2NEON.h header file in the openEMS FDTD folder where engine_sse.cpp etc. are found. All the other files in the repo are for testing the header, which I did not do, so I ignored them. Then you need to comment out the #include for xmmintrin.h and emmintrin.h and replace with/add:
Code: Select all
#include "SSE2NEON.h"
Code: Select all
-mfpu=neon
Note that the <> (as in <xmmintrin.h>) has been replaced with "" because you are looking for the local version of this header file. Please note I did not add the -mfpu=neon flag for g++/gcc because I don't know which file to put that in!
I really don't know if this made a big difference because then I ran into my next problem. SSE2NEON.h has only implemented some (many, but not all) of the conversions between SSE instructions and their NEON equivalents, so I fell down when I next tried to build. Two x86 intrinsics are missing from SSE2NEON.h: _mm_getcsr and _mm_setcsr. openEMS uses these in FDTD/engine_sse.cpp and FDTD/engine_multithread.cpp; see the Intel instruction reference for _mm_getcsr. I went back to the blog post and looked at the reference the author based his header file on, which was an Intel repo and blog post about porting in the reverse direction - ARM to x86. Apparently you cannot just reverse the instruction mapping, and I couldn't be sure how to implement _mm_getcsr and _mm_setcsr without breaking something. I was too scared to try for now.
But I also noticed something else. Apparently all this fuss is about handling "denormals", and I noticed a few places say that ARM handles "denormals" in the way we want without having to translate. So I looked at the openEMS code, commented out the _mm_getcsr and _mm_setcsr lines and tried to build again. It appeared to work and I managed to run the patch antenna example. The Octave monitor looked like it was behaving normally, so at the moment I assume this was successful, although I haven't visualised the results yet.
A warning from the gcc docs about using NEON:
"If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision."
Do the developers think this is an issue? As I said, because I didn't know where to use it, I didn't even use this flag, so I don't know if it is relevant!
If one of the openEMS developers can solve the _mm_getcsr and _mm_setcsr problem or confirm that they can be ignored for ARM, that would be wonderful, as I think some people would like to try lower power HPC on ARM SBCs.
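For anyone retracing this, here is a minimal sketch of one way to keep the _mm_getcsr/_mm_setcsr calls on x86 while compiling them out everywhere else, instead of commenting them out by hand. It assumes the calls are only there to switch on the flush-to-zero (0x8000) and denormals-are-zero (0x0040) bits of the MXCSR register. The names enable_denormal_flushing and SKETCH_HAS_MXCSR are made up for illustration and are not part of openEMS or SSE2NEON.h. On ARM this version deliberately does nothing, so it does not answer whether the default ARM/NEON denormal behaviour is acceptable.

```cpp
#include <cassert>
#include <cstdint>

// Only pull in the SSE header when the compiler targets x86 with SSE.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_MXCSR 1   // hypothetical macro, for this sketch only
#else
#define SKETCH_HAS_MXCSR 0
#endif

// MXCSR control bits (x86 only).
constexpr std::uint32_t kFlushToZero      = 0x8000;  // FTZ: denormal results become zero
constexpr std::uint32_t kDenormalsAreZero = 0x0040;  // DAZ: denormal inputs are read as zero

// Hypothetical helper: returns true if the bits were set, false if the
// platform has no MXCSR register and nothing was done.
inline bool enable_denormal_flushing()
{
#if SKETCH_HAS_MXCSR
    const unsigned int old_csr = _mm_getcsr();               // read current setting
    _mm_setcsr(old_csr | kFlushToZero | kDenormalsAreZero);  // write it back with FTZ and DAZ set
    assert((_mm_getcsr() & kFlushToZero) != 0);              // sanity check: the write took effect
    return true;
#else
    return false;  // ARM and others: no MXCSR, so this is a no-op
#endif
}
```

Calling enable_denormal_flushing() once at engine start-up would then be safe on both architectures, which is easier to keep in a shared source tree than lines that are commented out for one platform only.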
Re: Raspberry Pi Cluster - could it run openEMS?
I would put the -mfpu=neon in the CMakeLists.txt of openEMS:
Code: Select all
diff -c openEMS/openEMS/CMakeLists.txt.orig openEMS/openEMS/CMakeLists.txt
*** openEMS/openEMS/CMakeLists.txt.orig 2019-10-18 09:40:04.320275767 +0100
--- openEMS/openEMS/CMakeLists.txt 2019-10-18 09:45:48.475160390 +0100
***************
*** 147,152 ****
--- 147,153 ----
INCLUDE_DIRECTORIES (${VTK_INCLUDE_DIR})
#set(CMAKE_CXX_FLAGS "-msse -march=native")
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -march=native")
# independent tool
ADD_SUBDIRECTORY( nf2ff )
If you want to optimize for Raspberry Pi 3/4, try "-march=armv8-a -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -O2".
openEMS has two kinds of engines: basic and sse. The basic code is not accelerated but runs everywhere; the sse code runs only on Intel. If you remove all files with Intel-specific sse code from the CMakeFiles and #ifdef all Intel-specific code in openems.cc, you get an openEMS which is not accelerated, but runs on all architectures.
I did that once: removed everything which was Intel-specific, with the idea of getting the basic engine running first, and later adding code optimized for NEON. All of openEMS compiled cleanly on the Raspberry Pi. I could run a tutorial right up to RunOpenEMS(). But when I ran RunOpenEMS() in Matlab, the openEMS process itself crashed in libtinyxml, in Parse_XML_FDTDSetup IIRC.
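The idea of #ifdef-ing the Intel-specific code and falling back to the basic engine could look roughly like the sketch below. This is not the actual openEMS source: EngineKind, kSseCompiledIn, pick_engine and engine_name are hypothetical names, and the only assumption is that the build knows at compile time whether SSE is available.

```cpp
#include <cassert>
#include <string>

// Hypothetical engine identifiers, for illustration only.
enum class EngineKind { Basic, Sse };

// Decide at compile time whether the accelerated code can be built at all.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
constexpr bool kSseCompiledIn = true;
#else
constexpr bool kSseCompiledIn = false;   // e.g. ARM: only the basic engine exists
#endif

// Fall back to the basic engine when the requested one was not compiled in.
inline EngineKind pick_engine(EngineKind requested)
{
    if (requested == EngineKind::Sse && !kSseCompiledIn)
        return EngineKind::Basic;
    return requested;
}

// Human-readable name, e.g. for a log line.
inline std::string engine_name(EngineKind kind)
{
    assert(kind == EngineKind::Basic || kind == EngineKind::Sse);
    return kind == EngineKind::Sse ? "sse" : "basic";
}
```

With something like this, a request for the sse engine on a Raspberry Pi would quietly run the basic engine instead of failing to compile or link, and a NEON engine could later be added as a third enumerator.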
Re: Raspberry Pi Cluster - could it run openEMS?
I've made compiling the (accelerated) sse code conditional. The resulting source compiles and runs openEMS on the Raspberry Pi:
Code: Select all
----------------------------------------------------------------------
| openEMS 32bit -- version v0.0.35-45-gde23172
| (C) 2010-2018 Thorsten Liebig <thorsten.liebig@gmx.de> GPL license
----------------------------------------------------------------------
Used external libraries:
CSXCAD -- Version: v0.6.2-85-g55899d0
hdf5 -- Version: 1.10.4
compiled against: HDF5 library version: 1.10.4
tinyxml -- compiled against: 2.6.2
fparser
boost -- compiled against: 1_67
vtk -- Version: 6.3.0
compiled against: 6.3.0
Create FDTD operator
Create a steady state detection using a period of 1e-07 s
Operartor::CalcECOperator: Decreasing timestep by 0.1% to 1.92308e-09 (1.92583e-09) to match periodic signal
FDTD simulation size: 21x21x41 --> 18081 FDTD cells
FDTD timestep is: 1.92308e-09 s; Nyquist rate: 25 timesteps @1.04e+07 Hz
Excitation signal period is: 51 timesteps (1e-07s)
Max. number of timesteps: 100 ( --> 0.961538 * Excitation signal period)
openEMS::SetupFDTD: Warning, max. number of timesteps is smaller than three times the excitation signal period.
You may want to choose a higher number of max. timesteps...
Create FDTD engine
Running FDTD engine... this may take a while... grab a cup of coffee?!?
Time for 100 iterations with 18081 cells : 0.361227 sec
Speed: 5.00544 MCells/s
However, the resulting binary is very limited in features. The included patch is more like a list of items that need looking at to get a functional openEMS on ARM.
- At this moment there is no acceleration. Only the "basic" engine runs.
- At this moment there are no cylindrical coordinates, only rectangular. The engines for cylindrical coordinates all use sse. Am I wrong or is there no "basic" engine for cylindrical coordinates?
- CSXcad and paraview need looking at. The Raspberry Pi Qt libraries provide OpenGL ES ("Embedded Systems"). You can find Qt5, compiled with desktop OpenGL, here: https://github.com/koendv/qt5-opengl-raspberrypi. Consider compiling CSXcad / paraview against these Qt libraries.
CSXcad and paraview give error messages:
Code: Select all
libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
invoking AppCSXCAD, exit to continue script...
libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-pi'
QCSXCAD - disabling editing
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
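For comparing boards, the "Speed" figure in the openEMS log can be recomputed from the numbers printed just before it: cells times iterations, divided by the run time, in millions. The sketch below does only that arithmetic; grid_cells and mcells_per_second are illustrative names, not openEMS functions.

```cpp
#include <cassert>

// Number of FDTD cells in a rectangular grid, e.g. 21 x 21 x 41.
inline long grid_cells(long nx, long ny, long nz)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz;
}

// Throughput in MCells/s: (cells * iterations) / seconds / 1e6.
inline double mcells_per_second(long cells, long iterations, double seconds)
{
    assert(cells > 0 && iterations > 0 && seconds > 0.0);
    return static_cast<double>(cells) * static_cast<double>(iterations) / seconds / 1.0e6;
}
```

With the values from the log (21x21x41 = 18081 cells, 100 iterations, 0.361227 s), mcells_per_second(grid_cells(21, 21, 41), 100, 0.361227) comes out at roughly 5.005, which agrees with the 5.00544 MCells/s that openEMS printed.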
- Attachments
-
openems-raspbian.patch
- patch
- (17.8 KiB) Downloaded 339 times
-
openems-build.txt
- build notes
- (1.55 KiB) Downloaded 358 times
-
- csxcad
- 2019-10-21-122738_1920x1200_scrot.png (102.17 KiB) Viewed 7010 times
-
- Posts: 31
- Joined: Thu 23 May 2019, 18:05
Re: Raspberry Pi Cluster - could it run openEMS?
Hi, kdv, I firstly wanted to say thank you and sorry for not saying thank you earlier. I hadn't seen your posts for some reason until I returned to this topic this summer.
I have tried your patch and can get it to build successfully. I tried a basic example, based on the patch antenna example, that doesn't attempt to visualise the model in AppCSXCAD or plot any graphs (so that it can be run from the terminal for easy testing), but I had a strange printout. I can get this kind of expected initial output:
Code: Select all
warning: lines: 1@-12.5 2@-8.7762 3@-7.5349
found resolution decrease smaller than ratio: 0.33333 < 1/1.75=0.57143
warning: called from
CheckMesh at line 59 column 13
SmoothMeshLines at line 121 column 5
bench_patch_1 at line 77 column 8
args = "patch_ant.xml" --numThreads=1
but then there is no cout for the openEMS "splash", like:
Code: Select all
----------------------------------------------------------------------
| openEMS 32bit -- version v0.0.35-45-gde23172
| (C) 2010-2018 Thorsten Liebig <thorsten.liebig@gmx.de> GPL license
----------------------------------------------------------------------
Used external libraries:
CSXCAD -- Version: v0.6.2-85-g55899d0
hdf5 -- Version: 1.10.4
compiled against: HDF5 library version: 1.10.4
tinyxml -- compiled against: 2.6.2
fparser
boost -- compiled against: 1_67
vtk -- Version: 6.3.0
compiled against: 6.3.0
Create FDTD operator
Create a steady state detection using a period of 1e-07 s
Operartor::CalcECOperator: Decreasing timestep by 0.1% to 1.92308e-09 (1.92583e-09) to match periodic signal
FDTD simulation size: 21x21x41 --> 18081 FDTD cells
FDTD timestep is: 1.92308e-09 s; Nyquist rate: 25 timesteps @1.04e+07 Hz
Excitation signal period is: 51 timesteps (1e-07s)
Max. number of timesteps: 100 ( --> 0.961538 * Excitation signal period)
openEMS::SetupFDTD: Warning, max. number of timesteps is smaller than three times the excitation signal period.
You may want to choose a higher number of max. timesteps...
Create FDTD engine
Running FDTD engine... this may take a while... grab a cup of coffee?!?
Time for 100 iterations with 18081 cells : 0.361227 sec
Speed: 5.00544 MCells/s
And then it finishes with:
Code: Select all
RunFDTD: Warning: Max. number of timesteps was reached before the end-criteria of -50dB was reached...
You may want to choose a higher number of max. timesteps...
...so cerr looks like it is working.
So then I tried with my method (#include "SSE2NEON.h" etc.) and I had the same problem. Do you know any reason why cout might not be working for this?
Using the patch antenna example I get the same issue (although there are warnings/errors from vtk7 before it launches and displays well in AppCSXCAD) and then:
Code: Select all
----------------------------------------------------------------------
| nf2ff, near-field to far-field transformation for openEMS
| (C) 2012-2014 Thorsten Liebig <thorsten.liebig@gmx.de> GPL license
----------------------------------------------------------------------
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred in init_gl_context
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'line' object
warning: function "h5readatt_octave" not found, trying to run "setup"
warning: called from
ReadHDF5Attribute at line 15 column 9
ReadNF2FF at line 22 column 12
CalcNF2FF at line 140 column 7
Patch_Antenna at line 189 column 7
setting up openEMS matlab/octave interface
compiling oct files
HDF5 library path found at: /usr/lib/arm-linux-gnueabihf/hdf5/serial/libhdf5.so
/usr/lib/arm-linux-gnueabihf/hdf5/openmpi
HDF5 include path found at: /usr/include/hdf5/serial
sh: 2: /usr/lib/arm-linux-gnueabihf/hdf5/openmpi: Permission denied
warning: mkoctfile: building exited with failure status
radiated power: Prad = 1.6686e-26 Watt
directivity: Dmax = 6.7544 dBi
efficiency: nu_rad = 95.6953 %
...suggesting that cout is working from some files. Finally, the patch antenna example crashes with:
Code: Select all
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred in init_gl_context
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'line' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'line' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid enumerant' (1280) occurred drawing 'text' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'line' object
warning: opengl_renderer: Error 'invalid operation' (1282) occurred drawing 'line' object
terminate called after throwing an instance of 'octave::interrupt_exception'
fatal: caught signal Aborted -- stopping myself...
Aborted
...when I press Ctrl+C.
I'm not very interested in the rendering of plots or mesh models, so that side isn't important, but if anyone (@Thorsten?) might be able to offer some insight into the failure to show progress during FDTD, I'd be very grateful. Even getting the final MC/s score up and visible would be helpful for now.
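One thing that may be worth ruling out here (a guess, not a diagnosis of the openEMS behaviour) is output buffering: in standard C++, std::cerr flushes after every write, while std::cout does not by default, so text sent to cout can sit in a buffer when the program's output is captured by another process. The sketch below only shows that difference and a progress line that flushes explicitly; flushes_after_every_write and report_progress are illustrative names and do not come from openEMS.

```cpp
#include <cassert>
#include <iostream>

// True if the stream has the unitbuf flag set, i.e. it flushes after each output operation.
inline bool flushes_after_every_write(const std::ostream& os)
{
    return (os.flags() & std::ios_base::unitbuf) != 0;
}

// Hypothetical progress line that does not rely on the stream's own buffering policy.
inline void report_progress(std::ostream& os, int timestep, int max_timesteps)
{
    assert(max_timesteps > 0);
    os << "timestep " << timestep << " of " << max_timesteps << '\n';
    os.flush();  // push the line out immediately, even if stdout is a pipe
}
```

If buffering did turn out to be involved, calling something like report_progress(std::cout, n, max) or setting std::cout << std::unitbuf once at start-up would make cout behave like cerr in this respect. If the splash is still missing after that, the cause lies elsewhere.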