What happens when you make a move on lichess.org?

@dreis_sw | September 23, 2024

Have you ever wondered what goes on behind the scenes every time you make a move on your favorite online chess platform?

Lichess.org is a popular, free, and open-source chess platform that attracts millions of players worldwide. Its seamless, real-time gameplay experience must be powered by a robust backend infrastructure. In this post, we'll peek behind the curtain and explore the technical processes involved when you play a move.

Inspecting Network Activity with Chrome DevTools

To understand the flow of data when making a move, we'll start by utilizing Chrome DevTools, particularly the Network tab, which allows us to monitor communication between the client (your browser) and the server.

WebSocket Connections

The first notable network activity is a WebSocket connection to a URL similar to:

wss://socket2.lichess.org/play/H5uHz0egyvIA/v6?sri=bt6QzcyOiZg5&v=0

The wss scheme indicates a WebSocket connection encrypted with TLS.

No surprises here: WebSockets are the obvious choice for real-time browser apps like online chess because they provide full-duplex communication, so the client and server can push updates to each other instantly without the overhead of repeated HTTP requests.

A few notable parameters in the URL:

H5uHz0egyvIA: The unique game ID, which tells the server which game we're connecting to.
v: An incrementing counter representing the game's version from the client's perspective. More on this later.
sri: A random string generated on each page load, possibly for session tracking or security purposes.
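
To make the URL layout concrete, here is a minimal sketch that pulls those three values out of the example URL. The PlaySocketUrl type and the parsePlaySocketUrl function are hypothetical names made up for illustration, and the path layout is inferred only from the URL shown above, not from the Lichess source.

```scala
import java.net.URI

// Hypothetical holder for the URL parts discussed above (not a Lichess type).
final case class PlaySocketUrl(host: String, gameId: String, sri: String, version: Int)

def parsePlaySocketUrl(url: String): Option[PlaySocketUrl] =
  val uri = URI.create(url)
  // The path looks like /play/<id>/v6; keep only the non-empty segments.
  val segments = Option(uri.getPath).getOrElse("").split('/').filter(_.nonEmpty).toList
  // The query looks like sri=...&v=...; turn it into a key -> value map.
  val params: Map[String, String] =
    Option(uri.getQuery).getOrElse("").split('&').toList
      .map(_.split("=", 2))
      .collect { case Array(key, value) => key -> value }
      .toMap
  segments match
    case "play" :: id :: _ =>
      for
        sri <- params.get("sri")
        v   <- params.get("v").flatMap(_.toIntOption)
      yield PlaySocketUrl(uri.getHost, id, sri, v)
    case _ => None
```

Returning an Option keeps the sketch honest about URLs that do not follow the observed shape: anything without a /play/ prefix, an sri, or a numeric v yields None.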

Let's observe the network activity when we make a move.

Local Player's Turn

When we make a move, several packets of data are exchanged. Here's the first packet, sent by us:

// Sent at 22:51:35.280
{ 
  "t": "move",
  "d": {
    "u": "d2d4",
    "l": 32,
    "a": 1
  }
}

Clear enough, right? Let's break it down:

t: "move": Indicates the type of message.
d: Data payload containing:
- u: The move in UCI[1] format.
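- l: Not obvious from the payload alone. Rather than a length, this is most likely the client's measured lag (latency) in milliseconds.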
- a: Acknowledgment counter to track message acknowledgments.
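
The u field is easy to take apart. The sketch below splits a standard UCI move into its origin square, destination square, and optional promotion piece. UciMove and parseUci are hypothetical names, and the parser only covers standard chess moves; variant-specific notations are deliberately left out.

```scala
// Hypothetical representation of a standard UCI move (illustration only).
final case class UciMove(from: String, to: String, promotion: Option[Char])

def parseUci(uci: String): Option[UciMove] =
  // A square is a file letter a-h followed by a rank digit 1-8.
  def isSquare(s: String): Boolean =
    s.length == 2 && s(0) >= 'a' && s(0) <= 'h' && s(1) >= '1' && s(1) <= '8'
  val (from, rest) = uci.splitAt(2)
  val (to, promo)  = rest.splitAt(2)
  if !isSquare(from) || !isSquare(to) then None
  else
    promo match
      case "" => Some(UciMove(from, to, None))
      // A trailing q, r, b or n marks the piece a pawn promotes to.
      case p if p.length == 1 && "qrbn".contains(p) => Some(UciMove(from, to, Some(p.head)))
      case _ => None
```

For the move above, parseUci("d2d4") yields a move from d2 to d4 with no promotion.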

We then receive the following message:

// Received at 22:51:35.312 
{ 
  "t": "ack",
  "d": 1
}

This message from the server acknowledges that it has received our move. Notice that the d field contains the same acknowledgment counter we sent in our move message, in this case, 1.
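
One plausible use of this counter is to let the client remember which messages are still unconfirmed. The sketch below is a toy model of that idea: AckTracker is a hypothetical class, the JSON omits the l field, and the notion of resending unacknowledged messages is an assumption rather than something visible in the captured traffic.

```scala
import scala.collection.mutable

// Toy model of the ack handshake; names and resend policy are assumptions.
final class AckTracker:
  private var counter = 0
  // Messages sent but not yet acknowledged, keyed by their "a" counter.
  private val pending = mutable.LinkedHashMap.empty[Int, String]

  /** Registers an outgoing move and returns the JSON text to send. */
  def sendMove(uci: String): String =
    counter += 1
    val json = s"""{"t":"move","d":{"u":"$uci","a":$counter}}"""
    pending(counter) = json
    json

  /** Handles {"t":"ack","d":n}: the message with counter n is confirmed. */
  def onAck(n: Int): Unit =
    pending.remove(n)
    ()

  /** Messages still waiting for an ack, in the order they were sent. */
  def unacknowledged: List[String] = pending.values.toList
```

After sendMove("d2d4") followed by onAck(1), the unacknowledged list is empty again, which mirrors the exchange captured above.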

Right after, we receive the following message:

// Received at 22:51:35.312 
{
  "t": "move",
  "v": 1,
  "d": {
    "uci": "d2d4",
    "san": "d4",
    "fen": "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR",
    "ply": 1,
    "clock": {
      "white": 300,
      "black": 300
    }
  }
}

This message provides details about the move we just made and the updated game state.

uci: Move in UCI[1] format.
san: Move in SAN[2] format.
fen: FEN[3] string representing the position after the move. Note that only the piece-placement field is included here.
ply: Number of half-moves made in the game.
clock: Remaining time for each player. Since the game just started, both players have 300 seconds.
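
To see how the fen value maps onto the board, the following sketch expands the piece-placement field into eight rows of eight squares. The expandPlacement function is a hypothetical helper written for this post; it assumes a well-formed placement string and does no validation.

```scala
// Expands the piece-placement field of a FEN string into 8 ranks of 8 squares.
// Ranks are listed from rank 8 down to rank 1; digits stand for runs of empty
// squares, written here as '.'.
def expandPlacement(fen: String): Vector[String] =
  val placement = fen.takeWhile(_ != ' ') // ignore any trailing FEN fields
  placement.split('/').toVector.map(rank =>
    rank.flatMap(c => if c.isDigit then "." * c.asDigit else c.toString)
  )
```

Applied to the fen from the message above, the fifth row (rank 4) comes out as "...P....", which is the pawn that just arrived on d4.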

In summary, when a move is made, the client sends a move message, receives an acknowledgment, and then receives a detailed update about the move and the game's new state.

Opponent's Turn

When the opponent makes a move, we receive a similar packet from the server.

// Received at 22:51:43.489 
{
  "t": "move",
  "v": 2,
  "d": {
    "uci": "d7d5",
    "san": "d5",
    "fen": "rnbqkbnr/ppp1pppp/8/3p4/3P4/8/PPP1PPPP/RNBQKBNR",
    "ply": 2,
    "dests": {
      "c2": "c3c4",
      "g2": "g3g4"
      // .. additional possible moves
    },
    "clock": {
      "white": 300,
      "black": 300
    }
  }
}

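One key difference is the dests field, which lists the moves available to us in the new position. Each key is an origin square and each value is the concatenation of the squares that piece can move to, so "c2": "c3c4" means the c2 pawn can go to c3 or c4. The client uses this to highlight the possible moves on the chessboard after our opponent's move.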

While these moves could be calculated client-side, providing them server-side ensures consistency (especially for complex or esoteric chess variants) and optimizes performance on clients with limited processing capabilities or energy restrictions.
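
A client could unpack dests along these lines. This is a minimal sketch: decodeDests and isLegal are hypothetical helpers, and the two-characters-per-square grouping is inferred from the sample message rather than from a documented format.

```scala
// Splits each destinations string, such as "c3c4", into two-character squares.
def decodeDests(dests: Map[String, String]): Map[String, List[String]] =
  dests.view.mapValues(_.grouped(2).toList).toMap

// A move is allowed if its target appears among the origin's destinations.
def isLegal(dests: Map[String, List[String]], from: String, to: String): Boolean =
  dests.get(from).exists(_.contains(to))
```

With the sample above, the c2 entry decodes to the squares c3 and c4, so a board UI would only need a map lookup to decide which squares to highlight when a piece is picked up.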

Now that we've analysed the network activity, let's look into the backend architecture that powers these real-time interactions.

Lichess's Architecture

Lichess's real-time playing system is built around two services, both written in Scala:

lila: Core service that manages game logic, state, user interactions, and other fundamental functionalities. Without it, the lichess.org website would be unable to function.
lila-ws: Specialized service responsible for handling WebSocket connections, acting as a bridge between the client and lila. This service knows as little as possible about chess.

Architecture Overview

From the lila-ws readme:

lila <-> redis <-> lila-ws <-> websocket <-> client

As this implies, lila and lila-ws communicate through Redis, while lila-ws manages the WebSocket connections with clients.

If lila is momentarily down, lila-ws can still handle WebSocket connections and maintain real-time communication with clients (perhaps features like chat would still work!). Conversely, if lila-ws is down, the lichess.org website will still be online, but you won't be able to play games.

This separation also allows for independent scaling of the two components.

Communication using Redis Pub/Sub

lila-ws publishes the move event to a Redis Pub/Sub channel. lila is subscribed to that channel, so it receives the event and processes the move.

Redis Pub/Sub offers at-most-once delivery. This means that each message is delivered no more than once to each subscriber, if at all. If a subscriber is disconnected or fails while processing a message, that message is lost and not re-delivered. The benefit is that Redis does not retain Pub/Sub messages after pushing them to the current subscribers, which keeps memory usage low. The cost is that message loss is possible.

Redis Streams could be used to provide at-least-once delivery, ensuring that messages are not lost even if a subscriber fails. However, this would increase memory usage and complexity.
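
The at-most-once behaviour can be illustrated without Redis at all. The sketch below is an in-memory stand-in, not a Redis client: ToyChannel is a hypothetical class that only models the fire-and-forget semantics described above, where a message published while nobody is subscribed is simply gone.

```scala
import scala.collection.mutable

// In-memory stand-in for a Pub/Sub channel. This is NOT Redis, only a model
// of fire-and-forget delivery: nothing is stored for absent subscribers.
final class ToyChannel[A]:
  private val subscribers = mutable.ListBuffer.empty[A => Unit]

  def subscribe(handler: A => Unit): Unit =
    subscribers += handler
    ()

  /** Delivers to whoever is subscribed right now; returns the receiver count. */
  def publish(msg: A): Int =
    subscribers.foreach(handler => handler(msg))
    subscribers.size
```

Publishing before any subscriber exists returns 0 and the message is never seen by a later subscriber. A stream-like design would instead append each message to a log that consumers read at their own pace, which is where the extra memory and complexity come from.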

Eventual Data Persistence with MongoDB

While lila primarily stores game states in MongoDB, it reduces database load by not saving every single move immediately. Instead, it accumulates moves and flushes them to the database after a delay, unless a significant event such as the end of the game triggers an immediate flush. This strategy lightens the load on the database while maintaining data integrity.

def save(progress: Progress): Funit =
  // Saves the latest game state to a variable. Used for basing the following moves on.
  set(progress.game)
  
  // Prepares the dirty progress to be saved to the database.
  dirtyProgress = dirtyProgress.fold(progress.dropEvents)(_.withGame(progress.game)).some
  
  // If there are special events, such as a resignation, the game state is flushed.
  if shouldFlushProgress(progress) then flushProgress() 
  
  // Otherwise, the game state is scheduled to be stored to the DB after a delay.
  else fuccess(scheduleFlushProgress())

Joining a Game In Progress

As mentioned earlier, when a player connects, they provide the v parameter, which tells the system the latest version of the game they know about. Since the game state in MongoDB syncs up eventually (and your opponent might move just as you join), a player might initially get a state that's a bit behind the latest action.

To handle this, lila-ws uses a trusty java.util.concurrent.ConcurrentHashMap:

Key: Game ID.
Value: List of ordered and versioned events for that game.

This setup keeps track of all events for any ongoing game and clears them once the game wraps up. It helps players reconnect to an active game by giving them all the events they need from point v onwards, without missing any or doubling up. Understandably, it has to be a concurrency-safe data structure, since multiple threads can be serving multiple players at the same time.
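
The idea can be sketched as follows. This is a simplified model and not the lila-ws code: VersionedEvent and EventHistory are hypothetical names, there is no cap on history size, and it assumes that a client reporting version v needs every event with a version greater than v.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical event type: a version number plus the raw JSON payload.
final case class VersionedEvent(version: Int, json: String)

final class EventHistory:
  private val games = new ConcurrentHashMap[String, Vector[VersionedEvent]]()

  /** Appends an event; compute performs the read-modify-write atomically per key. */
  def add(gameId: String, event: VersionedEvent): Unit =
    games.compute(gameId, (_, events) => Option(events).fold(Vector(event))(_ :+ event))
    ()

  /** Events the client has not seen yet, assumed to be those with version > v. */
  def since(gameId: String, v: Int): Vector[VersionedEvent] =
    Option(games.get(gameId)).getOrElse(Vector.empty).filter(_.version > v)

  /** Drops the history once the game is over. */
  def stop(gameId: String): Unit =
    games.remove(gameId)
    ()
```

Storing an immutable Vector as the map value means a reader always sees a consistent snapshot, while compute ensures that two threads appending to the same game cannot overwrite each other's events.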

Wrapping Up

We are now ready to sum up the process of making a move in Lichess:

Client Connection: The client establishes a WebSocket connection to lila-ws at a URL like wss://socket2.lichess.org/play/....
Sending a Move: When a player makes a move, the client sends a move message to lila-ws containing the move's UCI string and an acknowledgment counter.
Acknowledgment: lila-ws responds with an acknowledgment (ack) to confirm that the move has been received.
Publishing to Redis: lila-ws publishes the move event to a Redis Pub/Sub channel that lila is subscribed to.
Updating Game State: lila receives the move, updates the game state accordingly, and (eventually) stores it in a MongoDB database. An updated game state is then sent back through lila-ws to the client.
Client Receives Update: The client receives the updated game state, reflecting the new move and any changes in the game's status.

Thank you for reading! If you have any questions or further insights, feel free to reach out.

1. UCI (Universal Chess Interface): A protocol for communicating with chess engines. Its move notation, the origin square followed by the destination square, is easy for programs to process.

2. SAN (Standard Algebraic Notation): A human-readable format for chess moves. Compared to UCI, interpreting a SAN move requires knowledge of the position and the rules, since it usually omits the origin square.

3. FEN (Forsyth-Edwards Notation): Encodes the entire game state, including piece positions, turn order, castling rights, and more.