On my PC with an overclocked Ryzen 9 5950X and dual-channel 3600 MHz DDR4 RAM, a memory benchmark measured 15-20 GB/s. The same system has an RTX 3070 GPU with a quoted memory bandwidth of 448 GB/s, over 20x (!) that of the CPU (not counting caches, which both processors have).
The fastest I could get openEMS to run on this CPU was with 8 threads (out of 32), maxing out at only 16% CPU usage; the simulation is clearly memory-bandwidth-bound.
I have a little bit of experience messing with CUDA, so I spent a day working on a new CUDA-based engine: https://github.com/aWZHY0yQH81uOYvH/openEMS-CUDA
It uses "managed" memory, which makes life a lot easier in interfacing with the rest of the non-GPU code (the GPU driver will copy back and forth over PCIe seamlessly based on access to the same pointers). So far, I've only ported the field propagation routines (no extensions), and it's very inefficient with GPU utilization, with a huge amount of overhead from kernel launches and other stuff. The code on that repo is literally the first thing that worked.
To compare against the CPU, I commented out everything but the main field updates in the multithreaded CPU engine so both engines would do the same work. With my test file of 132k grid cells, the CPU ran at ~2300 MC/s (million cells per second) and the GPU at ~1400 MC/s. Not great, but I also observed the following:
- One CPU core and all the GPU cores were pinned at 100%, presumably because the engine launches two kernels per iteration with one thread per grid cell, creating and destroying billions of short-lived GPU threads per second (see the sketch after this list).
- NVIDIA Nsight showed reads occupying 43% of the GPU memory bus and writes occupying 12%. I assume reads are so much higher because of a huge number of cache misses from the threads being spread across thousands of GPU cores.
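To make the launch-overhead problem concrete, here's roughly the shape of the hot loop (identifiers and launch geometry are my own illustration, not the repo's actual code): two tiny kernels per timestep, each spawning one thread per cell, so a small grid spends a large fraction of wall time on launch overhead rather than field updates.

```cpp
#include <cuda_runtime.h>

// Stub update kernels standing in for the real voltage/current updates.
__global__ void UpdateVoltages(float *volt, const float *curr, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) volt[i] += 0.5f * curr[i];  // placeholder math, not real FDTD
}

__global__ void UpdateCurrents(float *curr, const float *volt, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) curr[i] += 0.5f * volt[i];  // placeholder math, not real FDTD
}

int main() {
    const int numCells = 132000;  // grid size from the test above
    const int numTS = 10000;      // number of timesteps
    float *volt, *curr;
    cudaMallocManaged(&volt, numCells * sizeof(float));
    cudaMallocManaged(&curr, numCells * sizeof(float));
    cudaMemset(volt, 0, numCells * sizeof(float));
    cudaMemset(curr, 0, numCells * sizeof(float));

    const int threads = 256;
    const int blocks = (numCells + threads - 1) / threads;

    for (int ts = 0; ts < numTS; ts++) {
        // Two launches per timestep, one thread per cell: at ~10k
        // timesteps/s this creates and destroys billions of short-lived
        // threads per second, and each launch adds host-side overhead.
        UpdateVoltages<<<blocks, threads>>>(volt, curr, numCells);
        UpdateCurrents<<<blocks, threads>>>(curr, volt, numCells);
    }
    cudaDeviceSynchronize();  // busy-waiting here can pin one CPU core

    cudaFree(volt);
    cudaFree(curr);
    return 0;
}
```

The usual ways to amortize this are fusing work into fewer, larger kernels, or recording the per-timestep launches into a CUDA Graph and replaying it, though I haven't tried either here.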
I'm going to be resuming my undergrad studies soon, so I won't have a lot of time to continue work on this at the moment. I wanted to share my findings so far so others can play with it, or tell me I'm wasting my time because I don't know what I'm doing.