My God, It’s Full of Stars: Implementing the implicit Euler-Method with STL

Welcome back to another exciting part of the n-body-problem post series. This time we are going to implement the Euler-Method to make the tests we defined in the last post pass. If you just dropped in or need to refresh your memory, see the links to the previous posts below. I also stumbled upon a very good presentation of the n-body-problem topic. Even though it's quite mathy, it serves as good background information.

  1. My God, It’s Full Of Stars: The N-Body-Problem Series
  2. My God, It’s Full Of Stars: And then there was CMake
  3. My God, It’s Full Of Stars: Testing Point Mass Attraction with Catch2

In the last post we implemented two simple tests, but while implementing the solver I realized we clearly need a few more. Not only to test the solver: we will also need some sort of Vector2D (later we might need a 3D variant), because we have to store and manipulate the position, velocity, and acceleration of a point mass, and all of these properties can be expressed as 2-dimensional vectors in a 2D space. To be useful, Vector2D has to provide a couple of typical operations, such as addition, subtraction, and multiplication and division by scalars. Additionally, we need to perform some negative tests for comparing Vector2D instances. We would also need to test the division-by-zero case, which is asserted by operator/(), but as far as I found out, Catch2 can't handle such cases in version 2.6.1, which we are using. This is how the test of Vector2D looks.

With the Vector2D class, we fulfill our tests and make them pass. The operator overloads provide arithmetic operations between mathematical 2-dimensional vectors and scalars. Important to note is that the equality operator==() compares the absolute difference of both vectors. If they are equal, the result of the subtraction will be close to zero, which is checked by comparing it against std::numeric_limits<double>::min(). This utility template represents the smallest positive normalized value of a specific type and is part of the limits header. An alternative would be to check the relation between both values, but for the moment this would cause too many follow-up issues, such as division by zero. Bruce Dawson suggests a comparison against an absolute-epsilon-based value for comparisons against zero, which is what we are doing with std::numeric_limits<double>::min().
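A minimal sketch of such a Vector2D could look like the following. The member names and exact signatures are assumptions for illustration, not the project's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of a 2-dimensional vector with the operations described above.
struct Vector2D {
    double x = 0.0;
    double y = 0.0;

    Vector2D operator+(const Vector2D &rhs) const { return {x + rhs.x, y + rhs.y}; }
    Vector2D operator-(const Vector2D &rhs) const { return {x - rhs.x, y - rhs.y}; }
    Vector2D operator*(double scalar) const { return {x * scalar, y * scalar}; }
    Vector2D operator/(double scalar) const {
        assert(scalar != 0.0);  // the division-by-zero case mentioned above
        return {x / scalar, y / scalar};
    }

    // Absolute-epsilon comparison: the component-wise difference of equal
    // vectors is close to zero, so we compare it against
    // std::numeric_limits<double>::min().
    bool operator==(const Vector2D &rhs) const {
        return std::abs(x - rhs.x) <= std::numeric_limits<double>::min() &&
               std::abs(y - rhs.y) <= std::numeric_limits<double>::min();
    }
};
```

Note that this comparison only works reliably for values that are exactly representable or differences that truly collapse to zero; for general floating-point comparisons a relative epsilon would be needed, as discussed in Bruce Dawson's article.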

We also need to extend the solver tests, which now run a couple of performance tests as well. We do this because we want to roughly determine how efficient our algorithm is. Additionally, this gives us the ability to compare different algorithms against each other later on. According to the paper of Christoph Schäfer, we expect a computational complexity of around O(N^2). Even though the PERFORMANCE clause is in a beta state, it works quite well.
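The idea behind such a scaling check can be sketched framework-agnostically (the real tests use Catch2's facilities; the function below is a hypothetical stand-in that times a dummy pairwise workload): if the algorithm is O(N^2), doubling N should roughly quadruple the runtime.

```cpp
#include <chrono>
#include <cstddef>

// Time a dummy O(N^2) pairwise workload, returning seconds.
double timePairwiseWork(std::size_t n) {
    volatile double sink = 0.0;  // prevent the compiler from removing the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sink = sink + 1.0;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing, say, `timePairwiseWork(1000)` against `timePairwiseWork(2000)` should show a ratio of roughly 4, which is exactly the kind of relation we want the performance tests to expose.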

Because we will need to generate a random number of point masses, we also introduce a simple ParticleBuilder utility class, based on an adapted and simplified builder pattern with a fluent interface, which can generate a single particle or any arbitrary number of particles needed. In our case, we use it to generate our benchmark particles. The basic principle of the builder pattern is that each method configuring a specific property of a point mass returns a reference to the builder itself. This way we can chain all (or only some, if necessary) of the methods together and build the point mass at the end. The actual generation of the point mass happens at the very end of the definition process, which is called lazy initialization. To generate the random values of the point mass properties we use, as suggested by Vorbrodt's article about random number generators, std::random_device as the seed of a std::mt19937 Mersenne-Twister engine, which produces the actual random numbers we need. The random numbers are uniformly distributed between 1 and the number of point masses to generate.
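A stripped-down sketch of this fluent builder, assuming a simple Particle value type (all names here are illustrative assumptions, not the project's actual interface):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical minimal particle value type for the sketch.
struct Particle {
    double mass = 1.0;
    double posX = 0.0, posY = 0.0;
};

class ParticleBuilder {
public:
    // Each setter returns *this so the calls can be chained (fluent interface).
    ParticleBuilder &mass(double m) { mass_ = m; return *this; }
    ParticleBuilder &position(double x, double y) { posX_ = x; posY_ = y; return *this; }

    // The particle is only created at the end of the chain (lazy initialization).
    Particle build() const { return {mass_, posX_, posY_}; }

    // Generate `count` particles with uniformly distributed random properties,
    // seeding a std::mt19937 from std::random_device as described above.
    std::vector<Particle> build(std::size_t count) const {
        std::random_device seed;
        std::mt19937 engine{seed()};
        std::uniform_real_distribution<double> dist{1.0, static_cast<double>(count)};
        std::vector<Particle> particles;
        particles.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            particles.push_back({dist(engine), dist(engine), dist(engine)});
        return particles;
    }

private:
    double mass_ = 1.0;
    double posX_ = 0.0, posY_ = 0.0;
};
```

A single particle is then built as `ParticleBuilder{}.mass(5.0).position(1.0, 2.0).build();`, while `ParticleBuilder{}.build(1000)` produces the benchmark population.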

The Particle class is then basically straightforward. The only important question we have to settle is how we determine whether two particles are actually the same. This can't be done by simply comparing their references, because we never alter the particles themselves but produce new ones after each operation; all particles are handled as immutable. Because of this, we need to assign a unique ID to every point mass, which we do by simply assigning a class-level IDCounter to the id member and incrementing IDCounter afterward.
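The ID scheme boils down to something like this sketch (again, names are assumptions for illustration):

```cpp
#include <cstddef>

// Each newly constructed Particle grabs the current counter value as its
// unique ID; copies keep the ID, so derived immutable instances still
// compare equal to their origin.
class Particle {
public:
    Particle() : id_{IDCounter++} {}

    // Two particles are "the same" iff they carry the same ID.
    bool operator==(const Particle &rhs) const { return id_ == rhs.id_; }

    std::size_t id() const { return id_; }

private:
    static std::size_t IDCounter;  // shared across all instances
    std::size_t id_;
};

std::size_t Particle::IDCounter = 0;
```

Note that this is not thread-safe as written; if particles were ever created concurrently, the counter would need to become a std::atomic.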

Last but not least, we have the actual implementation of the Euler-Method. The Solver class is initialized with the necessary time step \epsilon and has only one simple interface method, which takes a std::vector<Particle> as parameter and returns the result after the computation is done. To execute its computation, the solve method uses the methods calculateAcceleration, calculateVelocity, and calculatePosition, which all have basically the same interface. The essence of the computation, calculating the acceleration of a point mass caused by the gravitational force of all the other point masses, is done inside the static function AccumulateAcceleration. You might notice that we strongly focused on using functionality provided by the STL; there is not even a single raw for loop. I did it this way because I had the idea of using the execution policies of the STL later on, but then realized that they are not yet supported by libc++ (P0024R2).
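Condensed into a single method, one explicit Euler step could be sketched as follows. The structure loosely follows the post's description, but the exact signatures, the particle layout, and the use of the plain gravitational constant are assumptions:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Particle {
    double mass;
    double x, y;    // position
    double vx, vy;  // velocity
    double ax, ay;  // acceleration
};

class Solver {
public:
    explicit Solver(double epsilon) : epsilon_{epsilon} {}

    std::vector<Particle> solve(std::vector<Particle> particles) const {
        // Accumulate the acceleration each particle receives from all others: O(N^2).
        std::transform(particles.begin(), particles.end(), particles.begin(),
                       [&particles](Particle p) {
                           p.ax = p.ay = 0.0;
                           std::for_each(particles.begin(), particles.end(),
                                         [&p](const Particle &other) {
                                             const double dx = other.x - p.x;
                                             const double dy = other.y - p.y;
                                             const double r2 = dx * dx + dy * dy;
                                             if (r2 == 0.0) return;  // skip self-interaction
                                             const double invR3 = 1.0 / (r2 * std::sqrt(r2));
                                             p.ax += G * other.mass * dx * invR3;
                                             p.ay += G * other.mass * dy * invR3;
                                         });
                           return p;
                       });
        // Explicit Euler update: positions from the old velocities, then velocities.
        std::for_each(particles.begin(), particles.end(), [this](Particle &p) {
            p.x += p.vx * epsilon_;
            p.y += p.vy * epsilon_;
            p.vx += p.ax * epsilon_;
            p.vy += p.ay * epsilon_;
        });
        return particles;
    }

private:
    static constexpr double G = 6.674e-11;  // gravitational constant
    double epsilon_;                        // time step
};
```

The inner std::for_each over all other particles, nested inside the std::transform over all particles, is exactly where the quadratic cost discussed below comes from.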

We have accomplished quite a bit up to this point. We now have a simple but working algorithm that computes a solution for one discrete time step of the n-body-problem. But how does our first draft perform? First of all, all tests are passing, great. And we are even able to calculate with a reasonable number of point masses. But we have to remind ourselves that we need around 45 seconds for only one time step with 25.6K point masses. That's quite a lot. The numbers alone already suggest that the computational complexity of the Euler-Method is O(N^2), and the diagram below shows it even more drastically. And it's totally true: if we exchanged the std::transform of the calculateAcceleration method and the std::for_each algorithm of the AccumulateAcceleration function with simple for loops, we would easily see that we have a nested loop over N point masses. The rest is just math: N \cdot N = N^2.
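Written out as plain loops (a sketch of the structure, not the actual code), the quadratic interaction count becomes obvious:

```cpp
#include <cstddef>

// Every one of the N point masses accumulates a contribution from every
// other one, so the pair count grows as N * (N - 1), i.e. O(N^2).
std::size_t countInteractions(std::size_t n) {
    std::size_t interactions = 0;
    for (std::size_t i = 0; i < n; ++i)      // outer loop: each particle...
        for (std::size_t j = 0; j < n; ++j)  // ...visits every particle
            if (i != j)                      // the real code skips self-interaction
                ++interactions;
    return interactions;
}
```

At 25.6K point masses that is roughly 655 million pair evaluations per time step, which puts the measured ~45 seconds into perspective.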


Up to this point, we focused as much as possible on readability, and I think we could do even more. But in the next post I would like to focus on two particular issues: the efficiency of the algorithm and parallelization. If you would like to download the project at this state, feel free to get v0.4.0.

Did you like this post?

What are your thoughts?

Feel free to comment and share the post.


10 thoughts on “My God, It’s Full of Stars: Implementing the implicit Euler-Method with STL”

  1. From my many years coding 3D computer graphics I remember an optimization you could employ when comparing vector lengths. Instead of computing the length using a `std::sqrt` call, compare the lengths squared. Have a method called lengthSquared that just returns (x*x + y*y). Because squared lengths of identical vectors are… identical 🙂 this way is faster since it skips the expensive `std::sqrt` call 😉


  2. Why are you using a std::for_each in Solver::AccumulateAcceleration instead of a range-based for? The only case where I could see std::for_each winning over a range-based for is if you plan on using the parallel algorithms. Also, what are you using as a testing framework?


  3. Hi, first off thanks for the interesting articles; it’s interesting to see numerical algorithms in a C++ context.

    A couple questions/notes: The title includes “implicit Euler-Method”, but this seems to be explicit Euler. Is the intention to use an implicit time stepper eventually, or is this just a typo in the title?

    You mention parallelization as the next step for speed-up, but as you noted the runtime with this approach is O(N^2). Wouldn’t it be better to work on improving the scalability first with something like Barnes-Hut or a Fast Multipole Method for O(N log N) or O(N) scaling? Parallelization doesn’t really solve the underlying problem.

    A final note/question: Why is your measured performance so poor? I thought your benchmarked times seemed odd so I threw together a simple Matlab script:

    Runtime for 12.8k particles was 1.74 seconds, 25.6k particles was 6.3 seconds; this is nearly 7 times faster than your benchmarks. Perhaps it is worth looking at using a Linear Algebra package instead of the STL for fast vector/matrix operations?


    1. Yes, you’re right, it’s a typo. Yes, the algorithm has very poor performance; Barnes-Hut would be much better. But I want to first provide an “optimized” brute force approach and then give the possibility to switch algorithms later on. The poor performance comes, by the way, from several problems I plan to address in the next particle post (e.g. the operator overloading is prohibiting optimizations done by the compiler).


      1. Ah, that clears things up, thanks! I look forward to the next article; my C++ knowledge is below average, so seeing the discussion on compiler behavior will be illuminating 🙂 I’d be curious too to see some profiling results to see where the code spends most of its time. Keep up the interesting posts!

