Raspberry Pi Cluster - could it run openEMS?

smerrett79
Posts: 27
Joined: Thu 23 May 2019, 18:05

Raspberry Pi Cluster - could it run openEMS?

Post by smerrett79 » Fri 21 Jun 2019, 15:15

Hi,

We'd like to build some hardware optimised for running openEMS in a distributed way. As a step towards this, I am wondering whether it is possible to install openEMS on Raspberry Pis connected to the same network. We'd be prepared to do without AppCSXCAD, so graphics aren't a problem: visual verification of the meshes etc. can be done on a different machine. The aim is to discover how we can use e.g. parallel, MPI etc. to reduce simulation times for a batch of antenna designs.

Any thoughts would be welcome.

smerrett79
Posts: 27
Joined: Thu 23 May 2019, 18:05

Re: Raspberry Pi Cluster - could it run openEMS?

Post by smerrett79 » Sat 22 Jun 2019, 23:15

Just in case anyone is wondering, I got openEMS running on a Raspberry Pi 3B. It's not fast on its own, but if you want to know how I did it, please say. The key parts were on the SSE vs NEON / x86 vs ARM side. I learned far more about architecture differences than I wanted to!

thorsten
Posts: 1380
Joined: Mon 27 Jun 2011, 12:26

Re: Raspberry Pi Cluster - could it run openEMS?

Post by thorsten » Sun 23 Jun 2019, 10:19

Well, sounds interesting. Did you have to make changes to openEMS? If so, can we maybe include them in the main openEMS repositories to make this easier in the future?

smerrett79
Posts: 27
Joined: Thu 23 May 2019, 18:05

Re: Raspberry Pi Cluster - could it run openEMS?

Post by smerrett79 » Mon 24 Jun 2019, 11:08

I had to make some changes, and they obviously haven't been thoroughly tested. I used the patch antenna example without any AppCSXCAD visualisation, and I did not attempt to post-process any of the results (so I can't tell yet whether they make sense!).

I used a Raspberry Pi 3B with Ubuntu MATE installed, as I thought this would be as close as possible to a "normal" OS for Octave/openEMS. My first failure was forgetting to install Octave (oops!). All the insight for my troubleshooting came from the log files written for each build; these were a great resource when I was trying to see what I needed to change to make openEMS work on ARM.

The first thing the build complained about before giving up was "xmmintrin.h", which is included in FDTD/engine_sse.cpp and FDTD/engine_multithread.cpp. I googled xmmintrin and found that it provides the x86 SSE intrinsics, whereas ARM uses NEON instead (SIMD is the common term for what SSE and NEON both provide, but I'm a noob at this, so I'm just recording what I noticed in case it helps someone else here).

I found a blog post on porting from x86 to ARM, and it led to a repo with a replacement header that you can include in any file that uses xmmintrin.h or emmintrin.h (the latter appears in FDTD/engine_sse_compressed.cpp). The instructions for this header are simple: I just put the SSE2NEON.h file in the openEMS FDTD folder, where engine_sse.cpp etc. are found. All the other files in the repo are for testing the header; I did not run them and ignored them. Then you need to comment out the #include lines for xmmintrin.h and emmintrin.h and replace them with (or add):

Code: Select all

#include "SSE2NEON.h"
Note that the angle brackets (as in <xmmintrin.h>) have been replaced with double quotes, because you want the compiler to find the local copy of this header file. Please note that I did not add the g++/gcc flag

Code: Select all

-mfpu=neon
because I don't know which file to put that in!
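Rather than commenting the original #include lines out, the swap can be made conditional so the same source file still builds on x86. A minimal sketch, assuming the sse2neon header is saved as SSE2NEON.h next to the engine sources; `add4` is a hypothetical helper for illustration, not openEMS code:

```cpp
// Pick the native SSE headers on x86 and the SSE-to-NEON translation header on ARM.
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2; also pulls in <xmmintrin.h> (SSE)
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include "SSE2NEON.h"    // assumed local copy of the sse2neon header
#else
#error "Neither SSE2 nor NEON is enabled for this target"
#endif

#include <cassert>

// Hypothetical helper: element-wise sum of four floats written with SSE
// intrinsics, which SSE2NEON.h maps to NEON instructions on ARM.
inline void add4(const float* a, const float* b, float* out)
{
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);   // unaligned load of a[0..3]
    __m128 vb = _mm_loadu_ps(b);   // unaligned load of b[0..3]
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

On 32-bit ARM, GCC only defines `__ARM_NEON` when NEON is enabled (for example with -mfpu=neon), so without that flag this sketch stops at the #error, which makes a missing flag obvious.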

I really don't know whether this made a big difference, because I then ran into my next problem. SSE2NEON.h implements many, but not all, of the conversions from SSE instructions to their NEON equivalents, so the next build failed. Two x86 intrinsics are missing from SSE2NEON.h: _mm_getcsr and _mm_setcsr. openEMS uses these in FDTD/engine_sse.cpp and FDTD/engine_multithread.cpp (see Intel's intrinsics reference entry for _mm_getcsr). I went back to the blog post and looked at the reference he based his header file on, which was an Intel repo and blog post about porting in the reverse direction, ARM to x86. Apparently you cannot simply reverse the instruction mapping, and I wasn't sure how to implement _mm_getcsr and _mm_setcsr without breaking something, so I was too scared to try for now.

But I also noticed something else. Apparently all this fuss is about handling "denormals", and a few places say that ARM already handles denormals the way we want without any translation. So I commented out the _mm_getcsr and _mm_setcsr lines in the openEMS code and tried to build again. It appeared to work, and I managed to run the patch antenna example. The Octave monitor looked like it was behaving normally, so for now I assume this was successful, although I haven't visualised the results yet.
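Instead of commenting the two lines out, the platform difference could be hidden behind a small function. A sketch under the assumption (following the denormal reasoning above) that openEMS uses these calls to switch on flush-to-zero (FTZ) and denormals-are-zero (DAZ); `enable_flush_to_zero` and `flush_to_zero_active` are hypothetical names, and the ARM branches set the architecture's FZ bit (bit 24 of FPSCR on 32-bit ARM, of FPCR on AArch64) but have not been tested on a Pi:

```cpp
#include <cassert>   // assert() for the usage check
#include <cfloat>    // FLT_MIN
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr
#endif

// Hypothetical portable replacement for the _mm_getcsr/_mm_setcsr lines.
// Assumption: they OR the FTZ (bit 15) and DAZ (bit 6) bits, 0x8040, into MXCSR.
inline void enable_flush_to_zero()
{
#if defined(__SSE__)
    _mm_setcsr(_mm_getcsr() | 0x8040u);
#elif defined(__aarch64__)
    // AArch64: FZ is bit 24 of FPCR and covers scalar and NEON instructions.
    std::uint64_t fpcr;
    __asm__ volatile("mrs %0, fpcr" : "=r"(fpcr));
    fpcr |= (std::uint64_t{1} << 24);
    __asm__ volatile("msr fpcr, %0" : : "r"(fpcr));
#elif defined(__arm__) && defined(__ARM_FP)
    // 32-bit ARM: FZ is bit 24 of FPSCR and affects scalar VFP instructions.
    std::uint32_t fpscr;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr |= (std::uint32_t{1} << 24);
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
#endif
    // Any other target: keep the default IEEE 754 denormal handling.
}

// Runtime probe: FLT_MIN * 0.5f is a denormal unless flush-to-zero is active.
inline bool flush_to_zero_active()
{
    volatile float tiny = FLT_MIN;        // volatile prevents constant folding
    volatile float result = tiny * 0.5f;
    return result == 0.0f;
}
```

On 32-bit ARM, NEON arithmetic flushes denormals regardless of this bit, which fits the observation that the build worked with the two lines commented out; setting FZ mainly makes scalar VFP code behave the same way. The setting applies to the calling thread's floating-point state.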

A warning from the GCC docs about using NEON:
"If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision."
Do the developers think this is an issue? As I said, I didn't know where to put this flag, so I didn't use it, and I don't know whether the warning is even relevant!
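Whether losing denormals matters could be checked empirically by counting how many field values are subnormal in the first place. A small diagnostic sketch; `count_subnormals` is a hypothetical helper, and the field is assumed to be available as a flat std::vector<float>:

```cpp
#include <cassert>   // assert() for the usage check
#include <cmath>     // std::fpclassify, FP_SUBNORMAL
#include <cstddef>
#include <vector>

// Count entries that are subnormal: non-zero but smaller in magnitude than
// FLT_MIN (2^-126, about 1.18e-38). These are the values that flush-to-zero
// (for results) and denormals-are-zero (for inputs) replace with 0.
std::size_t count_subnormals(const std::vector<float>& field)
{
    std::size_t n = 0;
    for (float v : field)
        if (std::fpclassify(v) == FP_SUBNORMAL)
            ++n;
    return n;
}
```

A count that stays at or near zero during a run would suggest that the precision warning quoted above has little practical effect for that simulation.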

If one of the openEMS developers can solve the _mm_getcsr / _mm_setcsr problem, or confirm that these calls can be ignored on ARM, that would be wonderful, as I think some people would like to try low-power HPC on ARM SBCs.

kdv
Posts: 7
Joined: Tue 07 Aug 2012, 22:16

Re: Raspberry Pi Cluster - could it run openEMS?

Post by kdv » Fri 18 Oct 2019, 11:15

I would put -mfpu=neon in the openEMS CMakeLists.txt:

Code: Select all

diff -c openEMS/openEMS/CMakeLists.txt.orig openEMS/openEMS/CMakeLists.txt
*** openEMS/openEMS/CMakeLists.txt.orig	2019-10-18 09:40:04.320275767 +0100
--- openEMS/openEMS/CMakeLists.txt	2019-10-18 09:45:48.475160390 +0100
***************
*** 147,152 ****
--- 147,153 ----
  INCLUDE_DIRECTORIES (${VTK_INCLUDE_DIR})
  
  #set(CMAKE_CXX_FLAGS "-msse -march=native")
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -march=native")
  
  # independent tool
  ADD_SUBDIRECTORY( nf2ff )
  
If you want to optimize for the Raspberry Pi 3/4, try "-march=armv8-a -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -O2".
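To confirm that flags such as -mfpu=neon or -march=native actually reached the compiler, the predefined architecture macros can be checked from inside the build. A minimal sketch relying only on GCC/Clang predefined macros; `simd_target` is a hypothetical helper:

```cpp
#include <cassert>   // assert() for the usage check
#include <string>

// Report which SIMD extension the compiler is targeting for this translation unit.
std::string simd_target()
{
#if defined(__aarch64__)
    return "AArch64 (NEON enabled by default)";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "32-bit ARM with NEON";
#elif defined(__arm__)
    return "32-bit ARM without NEON (check -mfpu)";
#elif defined(__SSE2__)
    return "x86 with SSE2";
#else
    return "no SSE2/NEON detected";
#endif
}
```

On a 64-bit (AArch64) OS, GCC does not accept -mfpu at all and NEON is enabled by default, so the -mfpu flags above only apply to 32-bit (armhf) installs.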

openEMS has two kinds of engines: basic and SSE. The basic code is not accelerated but runs everywhere; the SSE code runs only on Intel (x86). If you remove all files with Intel-specific SSE code from the CMake files and #ifdef out all Intel-specific code in openems.cpp, you get an openEMS that is not accelerated but runs on all architectures.
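With the SSE sources excluded, a request for the SSE engine could be downgraded to the basic engine instead of failing. A sketch with hypothetical names (`Engine`, `select_engine`, `kHaveSseEngine`); it does not reproduce openEMS's actual engine enum or command-line handling:

```cpp
#include <cassert>   // assert() for the usage check

// Hypothetical engine kinds, mirroring the "basic" and "sse" engines described above.
enum class Engine { Basic, SSE };

// The SSE engines need SSE2 (emmintrin.h is used by the compressed variant),
// so only advertise them when the compiler targets SSE2.
#if defined(__SSE2__)
constexpr bool kHaveSseEngine = true;
#else
constexpr bool kHaveSseEngine = false;  // SSE sources excluded from the build
#endif

// Honour the requested engine where possible; otherwise fall back to the
// portable, unaccelerated basic engine instead of aborting.
Engine select_engine(Engine requested)
{
    if (requested == Engine::SSE && !kHaveSseEngine)
        return Engine::Basic;
    return requested;
}
```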

I did that once: I removed everything Intel-specific, with the idea of getting the basic engine running first and adding NEON-optimized code later. All of openEMS compiled cleanly on the Raspberry Pi, and I could run a tutorial right up to RunOpenEMS(). But when I called RunOpenEMS() in MATLAB, the openEMS process itself crashed in libtinyxml, in Parse_XML_FDTDSetup if I recall correctly.

kdv
Posts: 7
Joined: Tue 07 Aug 2012, 22:16

Re: Raspberry Pi Cluster - could it run openEMS?

Post by kdv » Mon 21 Oct 2019, 13:04

I've made compiling the (accelerated) SSE code conditional. The resulting source compiles, and openEMS runs on the Raspberry Pi.
However, the resulting binary is very limited in features. The included patch is more of a list of items that need looking at to get a functional openEMS on ARM.

Code: Select all

 ---------------------------------------------------------------------- 
 | openEMS 32bit -- version v0.0.35-45-gde23172
 | (C) 2010-2018 Thorsten Liebig <thorsten.liebig@gmx.de>  GPL license
 ---------------------------------------------------------------------- 
	Used external libraries:
		CSXCAD -- Version: v0.6.2-85-g55899d0
		hdf5   -- Version: 1.10.4
		          compiled against: HDF5 library version: 1.10.4
		tinyxml -- compiled against: 2.6.2
		fparser
		boost  -- compiled against: 1_67
		vtk -- Version: 6.3.0
		       compiled against: 6.3.0

Create FDTD operator
Create a steady state detection using a period of 1e-07 s
Operartor::CalcECOperator: Decreasing timestep by 0.1% to 1.92308e-09 (1.92583e-09) to match periodic signal
FDTD simulation size: 21x21x41 --> 18081 FDTD cells 
FDTD timestep is: 1.92308e-09 s; Nyquist rate: 25 timesteps @1.04e+07 Hz
Excitation signal period is: 51 timesteps (1e-07s)
Max. number of timesteps: 100 ( --> 0.961538 * Excitation signal period)
openEMS::SetupFDTD: Warning, max. number of timesteps is smaller than three times the excitation signal period. 
	You may want to choose a higher number of max. timesteps... 
Create FDTD engine
Running FDTD engine... this may take a while... grab a cup of coffee?!?
Time for 100 iterations with 18081 cells : 0.361227 sec
Speed: 5.00544 MCells/s 
CSXCAD and ParaView give error messages:

Code: Select all

libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
invoking AppCSXCAD, exit to continue script...
libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-pi'
QCSXCAD - disabling editing
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
libGL error: failed to create dri screen
libGL error: failed to load driver: vc4
- At the moment there is no acceleration; only the "basic" engine runs.
- At the moment there are no cylindrical coordinates, only rectangular. The engines for cylindrical coordinates all use SSE. Am I wrong, or is there no "basic" engine for cylindrical coordinates?
- CSXCAD and ParaView need looking at. The Raspberry Pi Qt libraries are built for OpenGL ES ("Embedded Systems"). You can find Qt5 compiled with desktop OpenGL here: https://github.com/koendv/qt5-opengl-raspberrypi. Consider compiling CSXCAD / ParaView against these Qt libraries.
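The "Speed" line in the run log above can be reproduced from the other logged numbers, which is handy when comparing a Pi against other cluster nodes. A sketch assuming openEMS defines the figure as cells x timesteps / wall-clock time, in millions of cells per second (the logged numbers are consistent with that); `mcells_per_second` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MCells/s: cell updates per second divided by one million.
double mcells_per_second(std::uint64_t cells, std::uint64_t timesteps, double seconds)
{
    assert(seconds > 0.0);  // a zero or negative duration is a caller error
    return static_cast<double>(cells) * static_cast<double>(timesteps) / seconds / 1e6;
}
```

With the logged values (21x21x41 = 18081 cells, 100 timesteps, 0.361227 s) this gives about 5.0054 MCells/s, matching the 5.00544 reported on the Raspberry Pi.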
Attachments
openems-raspbian.patch (patch, 17.8 KiB)
openems-build.txt (build notes, 1.55 KiB)
2019-10-21-122738_1920x1200_scrot.png (CSXCAD screenshot, 102.17 KiB)
