Devblog 32: Synchronisation Solved (for now)

Month Twenty One

This month the team finally resolved a significant technical challenge and have (so far) fixed our network determinism problems. This is the proud (and nervous) conclusion of over a year’s work on multiple interrelated systems: lockstep, movement, combat, fog of war, user interface, etc. Additionally, Jamie has integrated weapon art (and many other animations besides), while James and I worked to refactor the terrain system (preparing for load/save features).

One day we were sat around a table trying to figure out why our games were desyncing. Normally a multiplayer game must be synchronised, which is to say, the game state must be exactly the same for each player. But something happened, and our game states diverged. For example, on one person’s screen an agent decided to move, on another they decided to shoot. When this happens the game has desynced and is unplayable.

At first we suspected this was due to combat. Group movement wasn’t causing any problems, but as soon as fights began the game state would fall apart. While we understood the likely problem region, this involved so many systems that trying to isolate the real cause was non-trivial.

What we needed was more data. A lot more data. We began by printing out the game state (each agent’s position, rotation, target id, combat state, etc.) and comparing log files, but this manual process was slow and prone to human error.
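To illustrate the kind of state dump described above, here is a minimal sketch in Python. It is not our actual logging code: the `AgentState` fields and the line format are assumptions, named after the values listed in the paragraph. The idea is one line per agent per lockstep tick, written in a stable order so that two machines in sync produce identical text.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    # Hypothetical fields, named after the values mentioned in the post.
    agent_id: int
    x: float
    y: float
    rotation: float
    target_id: int  # -1 when the agent has no target (an assumption)
    combat_state: str


def format_state_line(tick: int, agent: AgentState) -> str:
    """Render one agent's state for one lockstep tick as a single log line."""
    # repr() of a float round-trips exactly, so two machines only produce
    # the same text when the underlying values are identical.
    return (
        f"tick={tick} agent={agent.agent_id} "
        f"pos=({agent.x!r},{agent.y!r}) rot={agent.rotation!r} "
        f"target={agent.target_id} combat={agent.combat_state}"
    )


def dump_tick(tick: int, agents: list[AgentState]) -> list[str]:
    """Return the log lines for one tick, sorted by agent id."""
    # Sorting keeps line order independent of how each machine stores agents.
    ordered = sorted(agents, key=lambda a: a.agent_id)
    return [format_state_line(tick, a) for a in ordered]
```

Writing floats with their full precision matters here: rounding them for readability could hide exactly the small differences that start a desync.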

What we really needed was some help, and so I created a command line application called Olaf (I’d been playing Northgard). Olaf’s job is to automate game state checks, which it does by comparing multiple log files, line by line. As soon as an anomaly occurs, Olaf tells us exactly what happened.

Olaf finds a problem.
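For the curious, the core of a line-by-line log comparison can be sketched in a few lines of Python. This is an illustration of the idea rather than Olaf's real source; the function names `load_log` and `first_divergence` are made up for this example.

```python
from itertools import zip_longest


def load_log(path: str) -> list[str]:
    """Read a log file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def first_divergence(logs: list[list[str]]):
    """Return (line_number, lines) for the first mismatch, or None if all agree.

    `logs` holds one list of lines per machine. A log that ends early is
    padded with None, so a missing line also counts as a mismatch.
    """
    rows = zip_longest(*logs, fillvalue=None)
    for line_number, lines in enumerate(rows, start=1):
        if any(line != lines[0] for line in lines[1:]):
            return line_number, lines
    return None
```

Stopping at the first mismatch is deliberate in this sketch: once the game state has diverged, nearly every later line differs too, so the earliest difference is usually the most useful one.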

However, diagnosis was often complicated by the fact that when one thing failed, a lot of other things failed along with it. What at first appeared to be an issue with movement was later revealed to be an issue with the agent’s combat state. We then realised that combat targeting was the issue, and it looked like fog of war was involved.

The fog of war system was producing different results on different machines, and so James and Jamie investigated. After much deliberation and testing, Jamie tracked down the issue: fog rendering wasn’t synchronised with the lockstep frame rate. That meant each machine was updating fog of war at a different rate, and so agents were registering targets as visible on one machine and not on another.
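The general principle can be sketched as follows. This is a simplified Python illustration, not our engine code: the `Simulation` class, the one-dimensional positions, and the method names are all assumptions made for the example. Visibility that gameplay depends on is recomputed only on the lockstep tick, while rendering merely reads the result.

```python
class Simulation:
    """Toy model: visibility changes only on lockstep ticks."""

    def __init__(self, sight_range: float):
        self.sight_range = sight_range
        self.tick_count = 0
        # Ids of enemies currently visible; targeting logic would read this.
        self.visible: set[int] = set()

    def lockstep_tick(self, viewer_pos: float, enemies: dict[int, float]) -> None:
        """Advance one lockstep frame and recompute visibility."""
        self.tick_count += 1
        self.visible = {
            enemy_id
            for enemy_id, pos in enemies.items()
            if abs(pos - viewer_pos) <= self.sight_range
        }

    def render_frame(self) -> list[int]:
        """Called as often as the machine can draw; reads state, never writes."""
        return sorted(self.visible)
```

With this split, a fast machine might call `render_frame` many times between ticks and a slow one only once, yet both see the same `visible` set at the same tick, so targeting decisions match.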

Once we were satisfied that synchronising the fog seemed to have fixed the issue, Jamie and I conducted an online test. He hosted a game from home, and I joined remotely via a direct IP connection. We then had a battle, stopped the game, and exchanged log files. Olaf then compared 130,000 lines of output, finding no issues.

Well, that was a relief! With synchronisation assured, it won’t be long until we can begin internal playtesting.