Nakama deployments

A Nakama deployment is a fully dedicated infrastructure stack, isolated from all other deployments.

Overview #

When you provision a Nakama deployment, the following stack is deployed for you:

  • A new meshed Nakama cluster (Nakama on Heroic Cloud)
  • A new Postgres-compatible database
  • Various initialization scripts
  • Network isolation boundaries
  • A new SSL certificate, provisioned and attached to a new dedicated load balancer on the cloud platform
  • Log aggregation rules, which direct the underlying shared logging infrastructure
  • Metrics scraping rules, which are applied to the underlying shared metrics storage
  • Alerting rules, optionally applied when a support tier with an SLA is purchased

Creating a deployment #

When creating a Nakama deployment, you make four choices. The first three are permanent:

  1. Deployment type is either development or production. This is a permanent choice—it can’t be changed later without recreating the deployment entirely, which will require manual data migration. See Scaling for the differences.
  2. Deployment zone is the region where your instance will run. Choose a zone close to your players to minimize latency. The zone can’t be changed after creation.
  3. Instance name must be unique and becomes part of your DNS hostname. Names may contain dashes (for example, mygame-nakama-prod). Once set, the instance name can’t be changed.
  4. Nakama version follows semantic versioning. The -r suffix indicates revisions specific to Heroic Cloud patches. Unless advised otherwise by the support team, always use the latest -r release.

Plan your naming convention and region strategy before creating production deployments.
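Since the instance name ends up in DNS, it's worth checking candidates before creation. A minimal sketch, assuming standard DNS-label constraints (the exact rules Heroic Cloud enforces may differ); `isValidInstanceName` is a hypothetical helper:

```typescript
// Hypothetical pre-flight check for an instance name. Assumes standard
// DNS-label rules, since the name becomes part of your hostname:
// lowercase letters, digits, and dashes; must start and end with an
// alphanumeric character; labels are capped at 63 characters.
function isValidInstanceName(name: string): boolean {
  return /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/.test(name);
}
```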

Deployment zones #

Heroic Cloud runs on both GCP and AWS.

GCP regions:

Region            Zone identifier
US East           us-east1
EU West           eu-west1
Asia North East   asia-northeast3

AWS regions (available on request—see below):

Region            Zone identifier
US East           us-east-1
EU West           eu-west-2

If you need to run on AWS specifically, contact Heroic Labs and explain why your deployment requires that cloud provider.

Your deployment hostname will always be provided by Heroic Cloud. Custom DNS records aren’t supported due to domain verification rules around SSL certificates.

Nakama versions and upgrades #

When you create a builder or trigger a build, you select the Nakama version to compile against. The version is baked into the image, so upgrading Nakama means building a new image against the newer version and deploying it.

The -r suffix (for example, 3.21.1-r3) indicates revisions specific to Heroic Cloud patches. Unless you’ve been in contact with the support team, always use the latest -r release.
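As a sketch of how these version strings break down (`parseNakamaVersion` and `latestRevision` are hypothetical helpers, not part of any SDK):

```typescript
// Hypothetical parser for Heroic Cloud version strings such as "3.21.1-r3".
// The base version follows semantic versioning; the -r suffix is the
// Heroic Cloud patch revision.
function parseNakamaVersion(v: string): { base: string; revision: number } {
  const match = /^(\d+\.\d+\.\d+)(?:-r(\d+))?$/.exec(v);
  if (!match) throw new Error(`unrecognized version: ${v}`);
  return { base: match[1], revision: match[2] ? Number(match[2]) : 0 };
}

// Pick the newest -r revision among releases sharing the same base version.
function latestRevision(versions: string[], base: string): string {
  const revs = versions
    .map(parseNakamaVersion)
    .filter((p) => p.base === base)
    .map((p) => p.revision);
  return `${base}-r${Math.max(...revs)}`;
}
```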

You can run different Nakama versions across environments. For example, your development instance might run a newer version while production stays on the current stable release until you’ve validated the upgrade in QA.

Version support policy #

Heroic Cloud officially supports the last 4 official released versions (not including revisions) of Nakama. For example, if the current latest version is 3.37, versions 3.34 through 3.37 are fully supported. Older versions may continue to work because Nakama doesn’t break backwards compatibility, but they’re unsupported by Heroic Cloud.
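The support window can be sketched as a simple check, assuming the window covers the current minor version plus the three before it within the same major line:

```typescript
// Sketch of the support-window check: the current minor release plus the
// three before it are supported (a window of 4). Version parsing here is
// simplified to major.minor; revisions are ignored per the policy.
function isSupportedVersion(version: string, latest: string, window = 4): boolean {
  const parse = (v: string) => {
    const [major, minor] = v.split(".").map(Number);
    return { major, minor };
  };
  const a = parse(version);
  const b = parse(latest);
  if (a.major !== b.major) return false; // assume support within one major line
  return b.minor - a.minor >= 0 && b.minor - a.minor < window;
}
```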

Downgrading Nakama #

Version downgrades require intervention by the Heroic Cloud team as automated downgrading could cause issues with data integrity (for example, dropping database tables). If you need to roll back to an older Nakama version, contact Heroic Labs (support@heroiclabs.com).

Connecting your game client #

Each deployment gets a unique hostname in the format {instance-name}.{zone}.heroiclabs.com on port 443 (TLS encrypted). You also get two authentication keys:

  • Server key is used by game clients to authenticate with the Nakama server. Anyone with this key can connect to your backend as a client. Treat it as a shared secret between your game client and the server.
  • Runtime (HTTP) key is used for server-to-server API calls. This key grants higher-privilege access than the server key, including the ability to call administrative runtime functions. It should never be included in game client code.

Both keys are sensitive credentials. Leaking the server key means anyone can connect to your backend. Leaking the runtime key means anyone can call administrative APIs. For this reason, access to these keys is gated behind the Secrets permission. Only users and service users with this permission can view or copy them. See Access control.

SDK code snippets are available for JavaScript, C# / Unity, and Godot. For full client SDK guides, API references, and connection examples, see the Nakama documentation.
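As a minimal sketch of the connection details above (the instance name and zone are placeholders; the SDK call is shown as a comment because the exact constructor arguments depend on your SDK version):

```typescript
// Build the deployment's base hostname from the format described above:
// {instance-name}.{zone}.heroiclabs.com, served on port 443 with TLS.
function deploymentHost(instanceName: string, zone: string): string {
  return `${instanceName}.${zone}.heroiclabs.com`;
}

const host = deploymentHost("mygame-nakama-prod", "us-east1"); // placeholder values

// With the Nakama JavaScript SDK this would feed the client constructor,
// using your server key (never the runtime key) on the client:
//
//   import { Client } from "@heroiclabs/nakama-js";
//   const client = new Client("your-server-key", host, "443", true);
```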

Nakama Console #

Each deployment includes a separate management console for inspecting players, storage objects, leaderboards, matches, and other Nakama features directly. This is useful for debugging, customer support, and verifying that your server logic is working correctly.

Deploying images #

Deployments use a rolling update strategy: Nakama instances restart one at a time, so the deployment stays available during updates.

Development-tier deployments run on a single node. Image changes on development-tier instances cause a brief period of downtime. Production deployments without high availability are also subject to downtime during image changes.

If a deploy fails (for example, if the new image crashes on startup), the rolling update stops and the healthy Nakama instances continue serving traffic. Deploy a known-good image to recover.

CrashLoopBackOff #

If a Nakama instance crashes on startup, the system enters a CrashLoopBackOff state: the instance is killed, restarted, crashes again, and so on, with increasing delays between restart attempts. This prevents a crash from consuming resources in a tight loop.

You don’t need to wait for the system to settle before taking action. Update your deployment configuration, deploy a different image, or make other changes while CrashLoopBackOff is in progress.

A 60-minute timeout applies before the operation is marked as failed. If the deployment hasn’t stabilized within 60 minutes, deploy a known-good image to recover.

Common causes of CrashLoopBackOff include misconfigured environment variables, a startup crash in your custom server code, or an incompatible Nakama version. Check the deployment logs first to identify the cause.
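The backoff behavior can be sketched as follows; the actual delay values the platform uses aren’t documented here, so these numbers are illustrative:

```typescript
// Illustrative restart-delay schedule for a crash loop: each failed restart
// doubles the wait, up to a cap, so a crashing instance doesn't consume
// resources in a tight loop.
function restartDelays(attempts: number, baseSeconds = 10, capSeconds = 300): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseSeconds * Math.pow(2, i), capSeconds));
  }
  return delays;
}
```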

Runtime variables #

Environment variables are injected into your Nakama processes. Use these for feature flags, API keys, secrets, or any setting that differs between environments. Changing a runtime variable requires a rolling reboot to take effect.

Nakama configurations #

Custom settings that override Nakama defaults. Configure these through the dashboard.

Some configuration values are treated as secret configs. Secret configs are stored separately from regular configuration values but are presented to Nakama as part of the regular configuration—they’re merged with your standard config at runtime. This means Nakama sees a single unified configuration, while sensitive values remain protected. Secret configs are set through the dashboard UI and are typically used for IAP settings, runtime.env values, or other credentials that shouldn’t be visible to all users with Edit access.
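The merge described above can be sketched as a plain object merge, assuming secret values take precedence when a key exists in both:

```typescript
// Sketch of the merge described above: secret configs are stored apart but
// presented to Nakama as one unified configuration. Later spreads win, so a
// secret value overrides a standard value with the same key (an assumption
// about precedence, not documented behavior).
function mergeConfig(
  standard: Record<string, unknown>,
  secrets: Record<string, unknown>,
): Record<string, unknown> {
  return { ...standard, ...secrets };
}
```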

Changing most configuration values requires a rolling reboot to take effect, since the Nakama processes need to restart with the new settings. See the Rolling reboot section below.

Delete protection #

Prevents accidental deletion of the deployment. When enabled, the deployment can’t be deleted until protection is explicitly turned off. Enable this for every production instance; there’s no reason not to.

Rolling reboot #

Restarts each Nakama instance one at a time, maintaining availability throughout. Use this after runtime variable changes or to recover from an unhealthy state. Scaling changes also trigger a rolling reboot automatically.

During a rolling reboot, Heroic Cloud routes new incoming connections only to nodes that aren’t being restarted. Existing connections on a restarting node drop when that node shuts down.

Clients must implement reconnection logic. No mechanism exists to transparently move an existing WebSocket connection from one Nakama node to another. When a client’s connection drops during a reboot, the client is expected to reconnect to the server. Standard SDK reconnection patterns are sufficient for this.
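A minimal reconnect loop looks something like this; most Nakama SDKs offer equivalent built-in reconnection, so treat this as a sketch of the pattern rather than required code:

```typescript
// Minimal reconnect loop with exponential backoff and jitter. The connect()
// callback is a stand-in for your SDK's socket connect call.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 5,
  baseMs = 500,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return true; // connected
    } catch {
      // Back off before retrying so a rebooting node isn't hammered.
      const delay = baseMs * Math.pow(2, attempt) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  return false; // caller should surface a connection error to the player
}
```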

The duration of a rolling reboot depends on the shutdown grace period configured in your Nakama configuration. The grace period is the time Nakama waits before forcefully shutting down, to allow in-flight operations to complete. The default is approximately 10 seconds. If you’ve configured a longer grace period (for example, 60 minutes for authoritative matches), each node takes that long to cycle. Rolling reboots can’t be cancelled.
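The relevant setting is Nakama’s shutdown grace period; a sketch of the config fragment, assuming the standard `shutdown_grace_sec` key:

```yaml
# Sketch of the relevant Nakama config value (set through the dashboard on
# Heroic Cloud). shutdown_grace_sec controls how long the server waits for
# in-flight work before forcing shutdown; each node in a rolling reboot can
# take up to this long to cycle.
shutdown_grace_sec: 30
```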

Authoritative multiplayer and reboots #

If you have authoritative server-side match logic running inside Nakama and a node reboots, in-flight matches on that node will be interrupted. To handle this gracefully in Nakama on Heroic Cloud, use two Nakama features in combination:

  1. Match terminate signal in your match handler: when triggered, your logic can create a new match on another node and forward the new match connection information to the clients, allowing them to reconnect transparently.
  2. Grace period shutdown configuration: gives in-flight matches enough time to complete the migration before the node shuts down.

Heroic Cloud systems are aware of both the match terminate signal and the grace period shutdown configuration. New connections are automatically routed only to non-rebooting nodes during the process. This combination allows you to transparently migrate match state between nodes on reboot.
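The handoff can be sketched as follows. The interfaces are local stand-ins for the Nakama runtime types, and `OP_MATCH_MIGRATED` is a hypothetical opcode; real handlers implement the runtime’s match terminate hook and use its match-creation API:

```typescript
// Simplified sketch of the migration described above, with local stand-in
// types so the logic is visible on its own.
interface MatchState {
  label: string;
  // ...game-specific state
}

interface Runtime {
  matchCreate(module: string, params: Record<string, unknown>): string; // returns new match ID
}

interface Dispatcher {
  broadcast(opCode: number, data: string): void;
}

const OP_MATCH_MIGRATED = 100; // hypothetical opcode your clients listen for

function onMatchTerminate(nk: Runtime, dispatcher: Dispatcher, state: MatchState): string {
  // Create a replacement match (the platform places it on a healthy node)
  // and tell connected clients where to reconnect.
  const newMatchId = nk.matchCreate("my_match_module", { label: state.label });
  dispatcher.broadcast(OP_MATCH_MIGRATED, JSON.stringify({ matchId: newMatchId }));
  return newMatchId;
}
```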

Authoritative match state migration using match terminate is only available in Nakama on Heroic Cloud.

Scaling #

Scaling is entirely platform-managed. The first step—going from 1 CPU to 2 CPUs—always results in high availability with two Nakama nodes. Beyond 2 CPUs, the cluster topology depends on availability and capacity at the time of scaling.

Nakama nodes and the database scale independently. Scale up or down at any time. Scaling triggers a rolling reboot.

The database can be scaled up but not down: a minimum CPU count is tied to the disk provisioned for that instance, so scaling up raises the floor permanently. Plan your database tier carefully.

See Scaling for tiers, costs, and timing.

Operations queue #

When you trigger an action that takes time to complete—such as deploying an image, scaling a deployment, triggering a build, or running a data export—it enters the operations queue. Operations run one at a time per resource. If you trigger a second operation while the first is still running, it queues behind it.
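The one-at-a-time-per-resource behavior can be sketched as a promise chain per resource:

```typescript
// Sketch of per-resource serialization: operations on the same resource
// chain behind each other, while operations on different resources run
// independently.
class OperationQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue(resource: string, op: () => Promise<void>): Promise<void> {
    const tail = this.tails.get(resource) ?? Promise.resolve();
    // Run the next op whether the previous one succeeded or failed, since a
    // failed operation leaves the resource in its previous state.
    const next = tail.then(op, op);
    this.tails.set(resource, next.catch(() => {}));
    return next;
  }
}
```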

Each operation shows its current status (pending, in progress, completed, or failed), who triggered it, and when. If an operation fails, the resource remains in its previous state—retry or take corrective action.

Heroic Cloud sends email notifications when operations complete successfully or fail, so you don’t need to watch the queue for results.

Monitoring #

Built-in time-series charts cover load balancer request count (by HTTP status code), Nakama CPU and memory utilization (per node), database CPU utilization, and database query load. A top database queries view surfaces the most expensive SQL queries with an impact indicator.

These built-in metrics are good for day-to-day monitoring. For custom dashboards, alerting, and long-term retention, use the metric exporting add-on to feed data into your own Prometheus/Grafana stack. See Metric exporting.

Logs #

Deployment logs are available in UTC with full-text search, severity filtering, and date range selection. Logs can be exported on demand. For continuous log shipping to your own infrastructure, see Log exporting.

Individual log lines are truncated at 5KB per line. If your game module produces log output longer than this (for example, large JSON payloads or verbose stack traces), the line will be cut off. Structure your logging to stay within this limit.
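One way to stay under the limit is to split oversized payloads yourself before logging. A sketch, assuming roughly one byte per character (multi-byte UTF-8 would need a byte-aware split):

```typescript
// Hypothetical guard for the 5KB-per-line limit: split an oversized payload
// into multiple lines rather than letting the platform truncate it.
const MAX_LINE_BYTES = 5 * 1024;

function chunkLogLine(line: string, maxBytes = MAX_LINE_BYTES): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < line.length; i += maxBytes) {
    chunks.push(line.slice(i, i + maxBytes));
  }
  return chunks.length > 0 ? chunks : [""];
}
```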

Data export #

Download a complete snapshot of your PostgreSQL database at any time. This is your data. Only one export can run at a time, and download links expire. See Data exporting for the full guide.

Audit #

Every user action on the deployment is logged: who deployed an image, who changed configuration, who triggered a reboot. This is scoped to the individual deployment. For the organization-wide audit log, see Audit log.

Billing #

Nakama deployment usage is measured at intervals throughout the day. The highest CPU count recorded during a given day is used to calculate that day’s charge. Billing is settled at the end of each day. See Billing for details.
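The daily charge rule can be sketched as follows; the per-CPU rate is a made-up placeholder, not a real Heroic Cloud price:

```typescript
// Sketch of the daily charge rule above: the day's charge is based on the
// highest CPU count observed across that day's usage samples.
function dailyCharge(cpuSamples: number[], ratePerCpuPerDay: number): number {
  const peakCpus = Math.max(...cpuSamples);
  return peakCpus * ratePerCpuPerDay;
}
```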

Permissions #

Each deployment has its own permissions that control who can view, edit, deploy, delete, export, scale, and access secrets. Resource-level permissions override title-level and organization-level settings. This is how you lock down production while keeping dev and QA open. See Access control.