Devblog 30: The Stopwatch Strikes Back

Month Nineteen

We often worry about time. Usually it’s a gnawing anxiety that we aren’t spending hours wisely, or that we might be a few minutes late to something important. But have you ever worried about a millisecond? One friend is a professional athlete, and even he measures success in seconds. Not once have I heard him explain that he plans to improve by half a millisecond. Nor would I take him seriously if he did.

And yet, here we are, worrying about milliseconds. It all began when we realised that a group of one hundred units was slowing the game down to 10-20 Frames Per Second (FPS). Considering that a hundred units had been benchmarked last year at 130-160 FPS, this was somewhat concerning.

So Jamie and I began our investigation, at first with a bit of pair programming. Unity's profiling tools provide a visual breakdown of CPU usage, which shows where to start; suspicions can then be confirmed by simply ripping features out and seeing if that helps. Unfortunately, the vast majority of the investigation is spent combing through the code with C#'s Stopwatch class, which lets developers measure how long an operation has taken in milliseconds (thousandths of a second).
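As a minimal sketch of the technique, here is how a suspect piece of code can be timed with System.Diagnostics.Stopwatch. The busy loop stands in for whatever is being measured (a simulation tick, say); everything else is the stock .NET API.

```csharp
using System;
using System.Diagnostics;

class TickTimer
{
    static void Main()
    {
        var stopwatch = Stopwatch.StartNew();

        // Stand-in for the suspect code being measured,
        // e.g. one simulation tick for all units.
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;

        stopwatch.Stop();

        // Elapsed.TotalMilliseconds gives fractional milliseconds,
        // which matters when chasing sub-millisecond costs.
        Console.WriteLine($"Took {stopwatch.Elapsed.TotalMilliseconds:F3} ms");
    }
}
```

In practice it pays to time the same code many times and take a median, since a single measurement is noisy.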

Optimisation can be summed up using an old environmentalist slogan: 'reduce, reuse, recycle'. The biggest problem is code that repeats work. This often happens during initial development, when it may not be obvious how time-critical an operation is. Outcomes are especially bad when iterating over the same list multiple times, so the most important thing is to reduce the number of loops.
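A toy illustration of the point, with an invented Unit type: two separate passes over the same list can usually be folded into one.

```csharp
using System.Collections.Generic;

class LoopReduction
{
    // Hypothetical unit data, invented for the example.
    public struct Unit { public int Health; public bool Moving; }

    // Before: two passes over the same list.
    public static (int, int) TwoPasses(List<Unit> units)
    {
        int wounded = 0;
        foreach (var u in units) if (u.Health < 50) wounded++;
        int moving = 0;
        foreach (var u in units) if (u.Moving) moving++;
        return (wounded, moving);
    }

    // After: one pass gathers both counts.
    public static (int, int) OnePass(List<Unit> units)
    {
        int wounded = 0, moving = 0;
        foreach (var u in units)
        {
            if (u.Health < 50) wounded++;
            if (u.Moving) moving++;
        }
        return (wounded, moving);
    }
}
```

Both versions are O(n), but the single-pass version touches each element once, which matters when the list is long and the loop runs every tick.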

Recycling code via caching is good practice, but also complex. Generally, the smaller the object, the more efficient it is to reuse. In one example, we found it was twice as fast to cache (and reuse) an array with four items instead of creating a new array each time. Large collections, however, are expensive to reset, because clearing them is O(n), so it may be better to create a new collection instead. The only way to know is by testing.
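A hedged sketch of the four-item case: the class and method names are invented, but the pattern is simply allocating a small scratch array once and reusing it on every call, so no per-call garbage is produced.

```csharp
class NeighbourScratch
{
    // Allocated once and reused; four slots, as in the example above.
    private readonly int[] scratch = new int[4];

    public int SumNeighbours(int a, int b, int c, int d)
    {
        // Overwriting the slots is cheap; allocating a new int[4]
        // every call would hand the garbage collector extra work.
        scratch[0] = a; scratch[1] = b; scratch[2] = c; scratch[3] = d;

        int total = 0;
        for (int i = 0; i < scratch.Length; i++) total += scratch[i];
        return total;
    }
}
```

Note this makes the method non-reentrant: the shared scratch array means it must not be called from multiple threads at once, which is part of the complexity mentioned above.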

Programmers must remember that these decisions are never free, and are always a trade-off: spending more memory for less CPU time, or vice versa. Discarding more items makes the garbage collector work harder to reclaim old resources, which is itself a non-trivial cost.

Another somewhat obvious surprise involved deep references. The more often code must follow references, especially ones external to the object, the more expensive access becomes. It may not be immediately obvious that this is happening, but the simplest way to think about it is to count the number of dots in a statement: more dots, more references, more expense. Caching references locally may not seem worthwhile, but often is. Consider the following example from within a loop:

            int newWorldX = Self.Coordinate.worldX + direction.worldX;
            int newWorldY = Self.Coordinate.worldY + direction.worldY;
            int newWorldZ = Self.Coordinate.worldZ + direction.worldZ;

That doesn’t seem too bad. But let’s cache the Coordinate object:

            Coordinate selfCoordinate = Self.Coordinate;
            int newWorldX = selfCoordinate.worldX + direction.worldX;
            int newWorldY = selfCoordinate.worldY + direction.worldY;
            int newWorldZ = selfCoordinate.worldZ + direction.worldZ;

Now, count the dots. Nine and seven respectively. Doesn’t seem like much, but that change improved performance by between 0.1 and 0.2ms. Of course, a superior solution would be to avoid calculating new coordinates each time entirely, which requires Coordinate objects to cache their neighbours. With modern computing, that may be an acceptable cost in terms of memory.
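That neighbour-caching idea could look something like the following. This is a sketch under assumptions: the Coordinate class here is invented for illustration (only its worldX/worldY/worldZ fields appear in the post), and the grid is assumed to fill in each coordinate's neighbours once, up front.

```csharp
// Hypothetical Coordinate with precomputed neighbours, trading
// memory (one array per coordinate) for CPU (no per-tick arithmetic).
class Coordinate
{
    public readonly int worldX, worldY, worldZ;
    private Coordinate[] neighbours;

    public Coordinate(int x, int y, int z)
    {
        worldX = x; worldY = y; worldZ = z;
    }

    // Called once by the grid after all coordinates exist.
    public void CacheNeighbours(Coordinate[] precomputed)
    {
        neighbours = precomputed;
    }

    // Movement code now does a single array lookup instead of
    // three additions and three property accesses per step.
    public Coordinate Neighbour(int direction) => neighbours[direction];
}
```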

Now, some of you will read 0.1ms and think I’ve gone mad (perhaps just descending further into madness). A hundred microseconds, Richard, really? One phrase used by programmers is a helpful guide: “premature optimisation is the root of all evil” (attributed to Sir Tony Hoare and popularised by Donald Knuth). However, in our case this isn’t premature, and we really do need the simulation code to be as performant as possible.

With careful attention to detail (reducing, reusing, recycling), we have improved the median time per tick to simulate 100 units [at rest] from 8ms to 2.5ms. The biggest improvement was probably fixing spatial partitioning, which was returning far too many neighbours. But that could never be enough, and every line of code involved in unit movement has to be examined with our reliable friend the stopwatch.

Further optimisation requires fewer calculations per second. In his excellent development blog post ‘Optimizing 30,000+ Ships in Realtime in C#’, Chris Park provides many wise tips and tricks. The most important is probably that a ship’s rotation is only calculated four times a second, and players didn’t notice the difference. At any rate, the simulation doesn’t need to calculate rotation every tick. This will further improve performance from 2.5ms, making a sub-1ms result achievable.
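The rotation-throttling trick can be sketched as an accumulator that only lets the expensive update run a few times per second. Everything here (the Ship class, field names, the 0.25-second interval) is illustrative rather than actual game code.

```csharp
// Hypothetical ship that recomputes rotation only four times a second,
// even though Tick is called far more often by the simulation.
class Ship
{
    private const float RotationInterval = 0.25f; // seconds between updates
    private float timeSinceRotation;

    public float Rotation { get; private set; }

    public void Tick(float deltaTime, float targetRotation)
    {
        timeSinceRotation += deltaTime;
        if (timeSinceRotation < RotationInterval) return; // skip this tick

        timeSinceRotation = 0f;
        Rotation = targetRotation; // the 'expensive' work runs only when due
    }
}
```

The saving scales with tick rate: at 60 ticks per second, the rotation work runs 4 times instead of 60, and players reportedly never notice.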

In other news, I am delighted to report that Norn Industries’ application for funding from local government (The Pixel Mill) has been successful. This funding will allow us to hire another programmer, and to extend pre-publisher development into 2022. Success would have been impossible without Rory Clifford’s help; the Pixel Mill has been an invaluable resource.

Hiring is a relief, as Jamie and I need another programmer with games development experience. As Norn Industries is a small organisation, our hiring policy is like the British Army’s Special Air Service: you can’t sign up; we will ask you.