Optimizing a Distributed DB for Game Developers
This article originally appeared in the Cockroach Labs blog.
You’ve built a cool multiplayer game. Now how do you scale it?
Distributed SQL databases are an increasingly popular choice for game devs because they make scaling and fault tolerance easy without sacrificing transactional consistency. But as developers make the switch to distributed databases, there are opportunities for optimization that can be easy to overlook if they haven’t adapted to the distributed mindset.
Chris Molozian, the CEO and co-founder of Heroic Labs, knows a lot about optimizing distributed databases for game development. Heroic Labs has made gaming infrastructure its business, and its Nakama gaming servers are powering games from studios such as mobile gaming powerhouse Zynga and PC strategy gaming titan Paradox.
Nakama servers use CockroachDB, a distributed SQL database, for “all core data.” And that’s a lot of data: currently, the largest game using Heroic Labs infrastructure has about 300 million players. At that kind of scale, even the tiniest optimizations can make a huge difference.
In a recent conversation with Cockroach Labs Principal Product Evangelist Jim Walker, Molozian highlighted two examples of the subtleties of distributed database optimization for game developers.
Garbage collection for game developers
“One area that’s always a challenge is garbage collection,” Molozian says. Developers tend to think about this issue at the programming language level, but it’s something that needs to be considered at the database level as well.
This is particularly critical for scaling games, Molozian says, because of the write workloads that arise when storing core game data like a game save file or progress data.
Game devs, Molozian says, “will take [that data] out of the game engine and create it as a tree structure. It’ll be serialized to JSON, usually, and then they’ll want to write that [to the database]. They’ll be writing that very, very frequently.”
This can lead to very hot rows within the database. To maintain ACID compliance, the SQL engine needs to keep track of which version of that player progress data is current. When rows are updated so frequently that many versions of the same row accumulate on disk, performance can suffer unless garbage collection has been tuned to minimize the amount of outdated data the tables in question are storing.
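As a rough back-of-the-envelope sketch of why this matters (this is not a CockroachDB API; the function names, the 20 KB payload, and the write rate are illustrative assumptions), the amount of outdated data kept for a hot row grows with both the write rate and the length of the garbage collection window:

```python
def estimate_stale_versions(writes_per_minute: float, gc_ttl_seconds: int) -> int:
    """Rough count of outdated versions of one row kept on disk.

    Assumes every write creates a new row version and that a version only
    becomes eligible for collection once it is older than the GC window.
    Real collection is not instantaneous, so treat this as a lower bound.
    """
    return int(writes_per_minute * gc_ttl_seconds / 60)


def estimate_stale_bytes(writes_per_minute: float, gc_ttl_seconds: int,
                         payload_bytes: int) -> int:
    """Approximate bytes of outdated data retained for that one row."""
    return estimate_stale_versions(writes_per_minute, gc_ttl_seconds) * payload_bytes


if __name__ == "__main__":
    # Hypothetical workload: a 20 KB save blob written every 10 seconds,
    # compared under a 4-hour and a 10-minute GC window.
    for ttl in (4 * 3600, 600):
        versions = estimate_stale_versions(6, ttl)
        stale = estimate_stale_bytes(6, ttl, 20_000)
        print(f"ttl={ttl}s versions={versions} stale_bytes={stale}")
```

Under these assumed numbers, a single active player accounts for about 1,440 outdated versions with the longer window and about 60 with the shorter one.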
CockroachDB allows developers to tune garbage collection parameters at the cluster, database, or table level, but this is something that game developers may not think to optimize until after they notice their game’s performance is suffering as they scale.
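In CockroachDB, the garbage collection window is set through the `gc.ttlseconds` variable of a replication zone. The helper below is a minimal sketch that only builds the SQL statement for each level; the helper name, the `player_progress` table, and the 600-second value are hypothetical, and the right value depends on your workload. Note that shortening the window also shortens how far back historical queries can read and how often incremental backups need to run, so check the documentation before changing it.

```python
import re

# Only plain identifiers are accepted, to keep the sketch safe from injection.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_NAMED_LEVELS = {"table": "TABLE", "database": "DATABASE"}


def gc_ttl_statement(level: str, name: str, ttl_seconds: int) -> str:
    """Build a CONFIGURE ZONE statement that sets gc.ttlseconds.

    level: "cluster" (the default zone), "database", or "table".
    name:  database or table name; ignored for "cluster".
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if level == "cluster":
        target = "RANGE default"
    elif level in _NAMED_LEVELS:
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
        target = f"{_NAMED_LEVELS[level]} {name}"
    else:
        raise ValueError(f"unknown level: {level!r}")
    return f"ALTER {target} CONFIGURE ZONE USING gc.ttlseconds = {ttl_seconds};"


if __name__ == "__main__":
    # Hypothetical hot table holding frequently rewritten save data.
    print(gc_ttl_statement("table", "player_progress", 600))
```

Scoping the change to the one hot table, rather than the whole cluster, leaves the window for every other table untouched.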
“You’ll have to know a little bit about the underlying details in the way the row versioning is managed in CockroachDB to be able to optimize for this kind of gaming use case,” Molozian says.
In other words: while a distributed SQL database such as CockroachDB offers the same ACID compliance as a traditional single-instance relational database, it achieves that compliance in a different way due to its distributed nature. That’s something that game developers will have to keep in mind as they optimize to get the best performance for their game’s workloads.
Why use UUIDs for game development
Another area where game developers sometimes need to adapt their thinking to a distributed approach is in row (and thus player) identification.
Game devs can be hesitant to use UUIDs because they can seem like a waste of storage – why should I be using 16 bytes to uniquely identify a player when I could accomplish the same thing in eight bytes? The inclination can be to simply use an incrementing identifier, so that the first player to sign up for your game has the id 1, the second is 2, etc.
This approach doesn’t work well in distributed systems because as you scale, you’ll end up with a single “hot spot” node doing the incrementing, which can create performance bottlenecks. UUIDs enable you to generate unique IDs without losing the advantages of the database being distributed.
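A minimal sketch of the difference, using Python's standard `uuid` module (the `new_player_id` helper is a hypothetical name): any game server can mint an ID locally, with no shared counter to coordinate through. CockroachDB can also generate these values itself, for example with a `gen_random_uuid()` column default.

```python
import uuid


def new_player_id() -> uuid.UUID:
    """Generate a random (version 4) UUID without contacting any other node."""
    return uuid.uuid4()


if __name__ == "__main__":
    player_id = new_player_id()
    print(player_id)               # a different value on every run
    print(len(player_id.bytes))    # 16 bytes, versus 8 for a 64-bit integer

    # Simulate many servers generating IDs independently; collisions are
    # so improbable that none are expected in practice.
    ids = {new_player_id() for _ in range(100_000)}
    print(len(ids))
```

The extra eight bytes per key are the price of removing the single incrementing node from the write path.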
But for game developers, Molozian says, the benefits go beyond that. Using UUIDs can also provide an advantage for common game-related tasks such as finding opponents for a multiplayer game:
Because [a UUID v4 is] a pseudo-randomly generated identifier, you can dive into a dataset using keyset WHERE clauses that dive straight into the positioning on the B-tree index and end up range scanning, say, 100 or 150 pseudo-randomly selected user accounts, which works really well when you’re trying to then do a little bit of in-memory filtering to say ‘Find me ten other opponents that I can play against.’
You can actually turn the consequence of storing it with UUIDs into a benefit in the way you design your APIs. That’s something we’ve done [with our Nakama servers].
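The idea in the quote can be sketched in Python as follows. This is an illustration under stated assumptions, not Nakama's actual implementation: the sorted list stands in for the primary key index, the function names, the `ratings` lookup, and the rating-gap filter are all hypothetical, and the equivalent query would look something like `SELECT id FROM users WHERE id > $1 ORDER BY id LIMIT 150`.

```python
import bisect
import uuid


def sample_candidates(sorted_ids, limit=150, pivot=None):
    """Emulate a keyset range scan starting just after a random pivot key.

    sorted_ids plays the role of the ordered primary key index.
    """
    if pivot is None:
        pivot = uuid.uuid4()  # random entry point into the key space
    start = bisect.bisect_right(sorted_ids, pivot)
    page = sorted_ids[start:start + limit]
    if len(page) < limit:
        # Ran off the end of the index: wrap around to the beginning,
        # without re-reading keys that are already in the page.
        page = page + sorted_ids[:min(limit - len(page), start)]
    return page


def find_opponents(sorted_ids, ratings, me, want=10, max_gap=100, pivot=None):
    """Filter the scanned candidates in memory and keep up to `want` matches."""
    candidates = sample_candidates(sorted_ids, pivot=pivot)
    matches = [
        pid for pid in candidates
        if pid != me and abs(ratings[pid] - ratings[me]) <= max_gap
    ]
    return matches[:want]


if __name__ == "__main__":
    # Hypothetical player base with made-up skill ratings.
    players = sorted(uuid.uuid4() for _ in range(5_000))
    ratings = {pid: 1000 + (pid.int % 500) for pid in players}
    me = players[0]
    print(find_opponents(players, ratings, me))
```

Because the keys are already uniformly random, a short range scan from a random starting point behaves like a random sample, with no need to sort or shuffle the whole table.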
To learn more about how Heroic Labs is powering truly massive social gaming experiences with CockroachDB, check out our case study. And don’t miss the webinar recording for more discussion of game development with a distributed mindset!
About the author
Charlie is a former teacher, tech journalist, and filmmaker who’s now combined those three professions into writing and making videos about databases and application development (and occasionally messing with NLP and Python to create weird things in his spare time).