Stock Exchange Systems Design Case Study

This article is a case study in systems design. The premise of the project is fictitious, but the concepts demonstrate how you might design a system like Robinhood or Fidelity.

This particular case study is presented from a Full Stack perspective and dives into details of both the back end and front end.

Product Requirements

Background

We need to design a mobile-friendly trading app that allows a user to buy and sell an asset.

We are making use of an exchange API, so the technical complexities of creating and executing an order book are not covered in this design.

User Stories

As a user, I want to be able to log into a demo account to access the MVP.

As a user, I want to be able to view a list of the top stocks by market cap.

As a user, I want to be able to see a list of the assets I own.

As a user, I want to be able to select an asset and view the real-time price.

As a user, I want to be able to place a limit or market order to buy the asset.

As a user, I want to be able to place a limit or market order to sell the asset (if I already own it, short-selling is not supported yet).

Non-Functional Requirements

All major modern browsers should be supported, except for Internet Explorer.

The first release will be an MVP, so we do not need to worry too much about scalability yet, although we should make sure that we can support the scale in the future.

Our target audience is located in the US in one region.
We do not need to support internationalization or worry too much about global availability for the MVP.
We should balance designing for the scale that we intend to reach in the future with the MVP timeline.
We will make use of an existing auth-as-a-service provider for the MVP, such as Auth0 or Firebase.

The MVP should be installable on a mobile phone.

Acceptable to install through a bookmark flow, but in the future the app should be in the app store.
The design should be mobile-first and will not be used on a desktop during the MVP. Responsive design should be used to support desktop use in the future.

Latency is important: we do not want a user to be surprised by sudden price swings, and we should make our best effort to secure the best available price.

Prices should be updated in real-time so that a user can make a decision based on confidence in the current market price.

User Interaction Considerations

Loading States

Skeleton on first load for App, Page, components, form inputs in order to minimize Cumulative Layout Shift once data is loaded.

Error States

Include multiple levels of error boundaries for App, Page, Component, etc.
Provide human-readable reasons for failure and provide resolution via retry or submitting a support request.

Animations / Motion - Page transitions

Performance - Must abide by core web vital recommended ranges

Minimize Cumulative Layout Shift (less than 0.1)
Largest Contentful Paint (less than 2.5 seconds)
First Input Delay (delay before the browser starts handling the first interaction - 100 ms or less)

Accessibility

WCAG 2.1 AA compliance is important.
An accessibility auditing tool should be used.
Semantic HTML elements should be used.
WAI-ARIA best practices should be followed.
The application should make use of skip links to skip over navigation content for screen readers.

Internationalization - We do not need to worry about internationalization yet, but will in the future.

The Submit Order button should be disabled immediately to avoid sending multiple requests and should show a loading indicator while the request is being processed.
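The submit guard described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `createSubmitGuard` is a hypothetical helper, and the `submit` callback stands in for whatever function sends the order request. While a request is pending, repeated calls return the same promise instead of sending another request, and `pending` is the flag the UI would bind to the button's disabled state and loading indicator.

```typescript
// Hypothetical helper: wraps an async submit function so that only one
// request can be in flight at a time.
function createSubmitGuard<T>(submit: () => Promise<T>) {
  let inFlight: Promise<T> | null = null;

  return {
    // True while a request is being processed; bind this to the
    // button's `disabled` attribute and to the loading indicator.
    get pending(): boolean {
      return inFlight !== null;
    },
    run(): Promise<T> {
      if (inFlight) return inFlight; // repeated clicks reuse the pending request
      const clear = () => {
        inFlight = null; // re-enable the button on success or failure
      };
      const request = submit();
      request.then(clear, clear);
      inFlight = request;
      return request;
    },
  };
}
```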

The user should see the correct number of shares that they own per asset on the Home Page.

High-level Design Mocks

Technical Requirements

Tech Stack

Front End

ReactJS

TypeScript

React Query for data-fetching and client side caching

Back End

TypeScript / NodeJS - (for post MVP we may consider a lower-level language like Go or Rust.)

PostgreSQL - ACID transactions are required for balances and transactions.

Redis Queue - For supporting pub / sub async queues.

Nginx - For load balancing.

Kubernetes & Docker - For container orchestration.

Estimated Traffic

The system must be able to handle 100 million requests per day for the one region.


~100,000,000 requests per day / ~30,000 seconds in a trading day
= ~3,300 requests per second, rounded to ~3,000
Up to 10x during peaks = ~30,000 requests per second

NOTE: This is approximately 1/10th of the scale of the final application.
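The back-of-the-envelope estimate above can be reproduced with a few lines of arithmetic. The constant names are illustrative only. Note that 30,000 seconds is the rounded figure used in the estimate; a 6.5-hour trading session is 23,400 seconds, so the real average would be somewhat higher.

```typescript
const REQUESTS_PER_DAY = 100_000_000;
const TRADING_SECONDS_PER_DAY = 30_000; // rounded figure from the estimate
const PEAK_MULTIPLIER = 10;

// Average load over the trading day, before rounding down to ~3,000.
const averageRps = REQUESTS_PER_DAY / TRADING_SECONDS_PER_DAY;

// Peak load, assuming peaks of up to 10x the average.
const peakRps = averageRps * PEAK_MULTIPLIER;

console.log(Math.round(averageRps)); // 3333
console.log(Math.round(peakRps)); // 33333
```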

Back End System Design Considerations

Architecture

In order to ensure high availability and reliability, the following architecture is proposed.

Architecture Breakdown

Load Balancer

To ensure high levels of availability during peaks, a load balancer is used. The load balancer uses a weighted round robin algorithm to distribute traffic to our API servers. The API server is responsible for routing to a pub/sub topic in an asynchronous task queue.
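To illustrate how weighted round robin distributes traffic, here is a minimal sketch. It uses the "smooth" variant of the algorithm, which is an assumption made for illustration; the design does not specify a variant. The server names and weights are hypothetical.

```typescript
type Server = { name: string; weight: number };

// Smooth weighted round robin: on every pick, each server's running score
// grows by its weight; the highest score wins and is then reduced by the
// total weight, which spreads picks for heavy servers across the cycle.
function createWeightedRoundRobin(servers: Server[]): () => string {
  if (servers.length === 0) throw new Error("at least one server is required");
  const total = servers.reduce((sum, s) => sum + s.weight, 0);
  const current = servers.map(() => 0);

  return () => {
    let best = 0;
    for (let i = 0; i < servers.length; i++) {
      current[i] += servers[i].weight;
      if (current[i] > current[best]) best = i;
    }
    current[best] -= total;
    return servers[best].name;
  };
}

// Usage: a server with weight 5 receives 5 of every 7 requests.
const pick = createWeightedRoundRobin([
  { name: "api-a", weight: 5 },
  { name: "api-b", weight: 1 },
  { name: "api-c", weight: 1 },
]);
const sequence = Array.from({ length: 7 }, () => pick());
console.log(sequence.join(","));
```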

Task Queue

With the potential for thousands of orders placed every second, the trades table will grow quickly. For that reason, we need to figure out a robust way to execute our trades and update our tables making sure that we fulfill the considerations above for availability, while also maintaining strong consistency for balance updates.

We can design this part of the system using the pub/sub pattern with an asynchronous task queue framework, which gives us the following qualities:

At-least-once delivery, ordered by when each trade was placed.

Ensures that we only process a single trade at a time for a given user.

When a customer makes a trade, the API server will write a row to the database and create a message that gets routed to a specific topic for that customer.

Given the nature of a market order, we don’t know the exact dollar amount of a trade until the trade has executed, so we will not return a response to the user until the trade has succeeded or failed.
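The two queue properties described above (at-least-once delivery and one trade at a time per user) can be modeled in memory as follows. This is a simplified sketch and not the API of Redis or any specific queue framework; the `PerUserTopics` class, its methods, and the `orders.<userId>` topic naming are all hypothetical.

```typescript
type OrderMessage = { tradeId: string; userId: string };

class PerUserTopics {
  private queues = new Map<string, OrderMessage[]>();
  private inFlight = new Set<string>();

  // Route each message to a topic dedicated to its user.
  publish(msg: OrderMessage): string {
    const topic = `orders.${msg.userId}`;
    const queue = this.queues.get(topic) ?? [];
    queue.push(msg);
    this.queues.set(topic, queue);
    return topic;
  }

  // Hand out the oldest message, but only if no other message for this
  // topic is being processed: one trade at a time per user.
  take(topic: string): OrderMessage | undefined {
    if (this.inFlight.has(topic)) return undefined;
    const head = this.queues.get(topic)?.[0];
    if (head) this.inFlight.add(topic);
    return head;
  }

  // The message is removed only after the worker confirms success.
  ack(topic: string): void {
    if (!this.inFlight.delete(topic)) return;
    this.queues.get(topic)?.shift();
  }

  // On failure the message stays at the head and is delivered again,
  // which is what makes delivery at-least-once.
  nack(topic: string): void {
    this.inFlight.delete(topic);
  }
}
```

Because a failed message can be delivered more than once, the worker that executes the trade would need to be idempotent, for example by checking the trade's status in the database before calling the exchange.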

Worker Cluster

The worker cluster subscribes to pub/sub messages from the asynchronous task queue and calls the exchange API when a message is received. The system makes use of etcd to handle leader election to ensure that a new leader is chosen if a node goes offline. This ensures high availability given the high volume of traffic we are expecting during peaks. For example, if our leader node goes offline for any reason, another node will take its place and an alert will be created in an observability system.

After a trade is executed successfully, the database will be updated, including the balance of the user and the status of the trade. Given our rough estimates above, and assuming a safe load of approximately 300 requests per second per machine, we estimate that we’ll need approximately 10 worker and task queue instances for the average load of ~3,000 requests per second, and approximately 100 to absorb peaks of ~30,000 requests per second.
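The sizing arithmetic can be checked with a short calculation. The sketch below uses the figure of ~300 requests per second per machine together with the traffic estimates from earlier in the document; `instancesNeeded` is a hypothetical helper name.

```typescript
// Number of machines required to serve a given load without exceeding
// the safe per-machine throughput.
function instancesNeeded(requestsPerSecond: number, safeRpsPerMachine: number): number {
  return Math.ceil(requestsPerSecond / safeRpsPerMachine);
}

console.log(instancesNeeded(3_000, 300)); // 10 for the average load
console.log(instancesNeeded(30_000, 300)); // 100 for a 10x peak
```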

Database

The proposed features of this application are write-intensive and require strong consistency. A SQL database should be used to benefit from data normalization and ACID transactions.

Data Entities

API Endpoints

Errors

There are many reasons for errors in this system, including:

Unauthorized / Unauthenticated
Exchange is closed
Trade exceeds balance
Server Error

Reason: Human friendly reason for the error
Status Code: 5XX | 4XX
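One way to model this error format is sketched below. The `reason` and `statusCode` fields follow the format above, but the specific status code chosen for each error, the error kind names, and the wording of each reason are assumptions made for illustration.

```typescript
type ApiError = { reason: string; statusCode: number };

type ErrorKind =
  | "unauthenticated"
  | "unauthorized"
  | "exchange_closed"
  | "insufficient_balance"
  | "server_error";

// Assumed mapping from error kind to status code and human-friendly reason.
const ERRORS: Record<ErrorKind, ApiError> = {
  unauthenticated: { statusCode: 401, reason: "Please log in to continue." },
  unauthorized: { statusCode: 403, reason: "You do not have access to this account." },
  exchange_closed: { statusCode: 409, reason: "The exchange is currently closed." },
  insufficient_balance: { statusCode: 422, reason: "This trade exceeds your available balance." },
  server_error: { statusCode: 500, reason: "Something went wrong on our side. Please try again." },
};

function toApiError(kind: ErrorKind): ApiError {
  return ERRORS[kind];
}

// 5XX errors can be offered a retry; 4XX errors need a different resolution.
function isRetryable(error: ApiError): boolean {
  return error.statusCode >= 500;
}
```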

Auth

Authorization: Bearer <TOKEN>

Exchange

It’s assumed that we are utilizing an exchange API to interact with the exchange.

Assets

GET Top Assets

URL: /api/v1/assets/top?limit=5&offset=0
Inputs:

limit - the number of top assets to return per page
offset - the number of assets to skip before the first result of the page

Outputs:

Paginated list of top assets.


{
  data: [{ name: 'Alphabet Inc.', ticker: 'GOOG', ... }],
  meta: { total: 100, offset: 0, limit: 5 }
}
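On the client, the paginated response could be typed and navigated as sketched below, using the `limit` and `offset` inputs and the `meta` object described above. The type names and the `topAssetsUrl` and `nextOffset` helpers are hypothetical.

```typescript
type Asset = { name: string; ticker: string };
type PageMeta = { total: number; offset: number; limit: number };
type Page<T> = { data: T[]; meta: PageMeta };

// Build the request URL for one page of top assets.
function topAssetsUrl(limit: number, offset: number): string {
  return `/api/v1/assets/top?limit=${limit}&offset=${offset}`;
}

// Offset of the next page, or null when the current page is the last one.
function nextOffset(meta: PageMeta): number | null {
  const next = meta.offset + meta.limit;
  return next < meta.total ? next : null;
}

const firstPage: Page<Asset> = {
  data: [{ name: "Alphabet Inc.", ticker: "GOOG" }],
  meta: { total: 100, offset: 0, limit: 5 },
};
console.log(topAssetsUrl(5, nextOffset(firstPage.meta) ?? 0));
```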

ℹ️

NOTE: see below for a tradeoff discussion regarding the choice of pagination technology.

GET Search Assets

URL: /api/v1/assets?search=AAPL
Inputs:

search - The ticker to search

NOTE: post MVP we should support a fuzzy search and introduce a typeahead component.

Outputs:

Asset information


{
  data: { name: 'Apple Inc.', ticker: 'AAPL', ... },
}

GET My Assets

URL: /api/v1/my-assets?limit=5&offset=0
Inputs:

limit - the number of owned assets to return per page
offset - the number of assets to skip before the first result of the page

Outputs:

Paginated list of the assets the user owns.


{
  data: [{ name: 'Alphabet Inc.', ticker: 'GOOG', shares: 10, ... }],
  meta: { total: 5, offset: 0, limit: 5 }
}

Transactions

GET Asset Price


const socket = new WebSocket('wss://ws.fakeexchange.com');

// Connection opened -> subscribe to price updates for a symbol
socket.addEventListener('open', function (event) {
  socket.send(JSON.stringify({ type: 'subscribe', symbol: 'AAPL' }));
});

// Listen for messages
socket.addEventListener('message', function (event) {
  console.log('Message from server ', event.data);
});

// Unsubscribe from price updates for the given symbol
const unsubscribe = function (symbol) {
  socket.send(JSON.stringify({ type: 'unsubscribe', symbol: symbol }));
};

ℹ️

See below for a tradeoff discussion regarding streaming technologies

Place Order - POST /api/v1/transactions/place-order

Inputs:

body (Content-Type: application/json)


type Body = {
  symbol: string, // e.g. 'AAPL'
  type: 'market' | 'limit',
  side: 'buy' | 'sell',
  limitPrice?: number, // only used for limit orders
}

Outputs:

Success - Status 200
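Before an order is sent, the client (and again the server) could validate the request body. The sketch below is illustrative: the `OrderBody` type follows the request body described above with `symbol` typed as a plain string, `validateOrder` is a hypothetical helper, and the rule that market orders must not carry a `limitPrice` is an assumption.

```typescript
type OrderBody = {
  symbol: string;
  type: "market" | "limit";
  side: "buy" | "sell";
  limitPrice?: number;
};

// Returns a list of human-readable problems; an empty list means valid.
function validateOrder(body: OrderBody): string[] {
  const errors: string[] = [];
  if (body.symbol.trim() === "") {
    errors.push("symbol is required");
  }
  if (body.type === "limit") {
    if (body.limitPrice === undefined || !(body.limitPrice > 0)) {
      errors.push("limit orders require a positive limitPrice");
    }
  } else if (body.limitPrice !== undefined) {
    // Assumed rule: a market order executes at the market price.
    errors.push("market orders must not set limitPrice");
  }
  return errors;
}
```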

Account

As mentioned above, it’s assumed that we are using an existing service for Auth, so we will not go into the design for now and will follow the design docs of the service we choose (Auth0, Firebase, etc.).

Front End Components & State Hierarchy

Reusable Components

<Button />

<ToggleButton />

<List />

<Modal />

<Error />

<Input />

<Heading />

<Typography />

Front End Optimizations

Due to the dynamic nature of the application, it makes sense to use client-side rendering, although we may consider using a framework like NextJS to have the flexibility to add server-rendered components in the future, for example for marketing pages or a news feed.

Caching

While the design above focuses on the write transactions, we can add several types of caches to improve read performance.

React Query’s caching mechanism would work well for maintaining a client-side cache.
In the future, a server-side caching technology like Redis may be warranted, especially if we add more read-intensive features.

We should make use of React Query to pre-fetch data needed for a screen when the user indicates that they are about to navigate to that screen.
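React Query covers pre-fetching through its query client. To show the underlying idea without the library, here is a dependency-free sketch; `PrefetchCache` and its methods are hypothetical, and staleness and error handling are deliberately left out.

```typescript
// Stores the promise for each key so that a request started on hover or
// touch is reused when the screen actually renders.
class PrefetchCache {
  private entries = new Map<string, Promise<unknown>>();

  // Call when the user signals intent to navigate (hover, focus, touch).
  prefetch<T>(key: string, fetcher: () => Promise<T>): void {
    if (!this.entries.has(key)) {
      this.entries.set(key, fetcher());
    }
  }

  // Call when the screen renders; reuses the prefetched request if present.
  get<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
    this.prefetch(key, fetcher);
    return this.entries.get(key) as Promise<T>;
  }
}
```

In the application itself, the same behavior would come from React Query rather than a hand-written cache, which also adds staleness tracking and retries.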

Lazy Loading

We should make use of lazy loading and code chunking to only load static assets when they are required.
We should make use of React.lazy together with React’s Suspense API to load component code on demand and show a fallback while it loads.

PWA App Manifest

We should make use of an App Manifest to allow installation of the application on mobile phones.
Stretch goal: we should add offline support using a Service Worker so that the user is alerted if the app goes offline. This would improve the user experience by not serving them a blank page and by alerting them that they are unable to submit orders until their internet connection is restored.

Image optimization

While we don’t have a lot of images, if in the future we do add images for companies, users, etc. we should consider various methods for image optimization such as:

Image sprites, compression (gzip or Brotli help with SVGs; raster formats such as JPEG, PNG, and WebP are already compressed), providing different image sizes depending on the device, and making use of a CDN.

Tradeoffs

Cursor vs. Offset Pagination

There are two options for how we could manage pagination for the top assets list. Below we will explore the pros and cons of each.

Cursor Pagination

Cursor-based pagination is a method to paginate long lists of data by maintaining a pointer to the last record received by a client. On subsequent requests, the cursor is passed, and the records that come after that cursor are returned to the client. Cursor pagination works very well for infinite scrolling, such as a news feed.


SELECT *
FROM top_assets
WHERE id > '236UWIrPdkjY2FQ1pluzGm6amXs' -- ID of last asset
ORDER BY id ASC
LIMIT 10;

Offset Pagination

Offset pagination is a common technique used for pagination that involves passing two query parameters to an endpoint containing the limit and offset. For example:


-- GET https://api.com/top-assets?limit=5&offset=0
SELECT *
FROM top_assets
ORDER BY id ASC
LIMIT 5
OFFSET 0;

Approach	Pros	Cons
Cursor Pagination	Much better database performance for large datasets; works well for infinite scrolling	Unable to jump to a specific page
Offset Pagination	Allows jumping to different pages; works well for page-based pagination	Poor database performance for large datasets; has sequencing issues for data that is updated regularly
Decision

Because we are only dealing with 5 assets at a time and the data will not be changing frequently, offset pagination makes sense. This has the added benefit of allowing a user to skip ahead to a page. For example, they could navigate to the end of the list to see the last 5 assets.

Web sockets vs. Server-Sent Events (SSEs) vs. Polling

Web sockets

Web sockets are built on top of TCP/IP and can be used to create a real time two way communication channel between the client and server.

Polling

Polling uses HTTP requests called at an interval to update data in a near real-time manner. Long polling is a subtype of polling that keeps the request open until new data is available.

Server-Sent Events

Server-Sent Events provide one-way communication from the server to the client using the web’s EventSource API. Clients subscribe to an event source with a callback that is run when that event is sent.

Approach	Pros	Cons
Long Polling	Makes use of HTTP/2 and HTTP/3; easy to parallelize with multiplexing; lower memory footprint on the client; easier to implement on both the client and server	Not truly real time; unidirectional communication; more resource intensive on the server; message ordering can be an issue; doesn’t handle multiple clients well
Web Sockets	Reduced payload size due to lack of HTTP headers; supports full duplexing and real-time two-way communication; faster than HTTP; support on multiple platforms	Limited browser support; more complex to implement; maintains a persistent connection; higher memory overhead on the client
Server-Sent Events	Less resource intensive than web sockets and long polling; very easy to implement	Requires HTTP/2 or HTTP/3, otherwise you run the risk of having too many connections open; unidirectional communication only; less platform support (for example, requires a polyfill on mobile)

Decision

For this use case, web sockets make sense to update the real time price of an asset in the UI. The best user experience involves near real-time pricing updates so that users have confidence in the market pricing when they make a trade.

During the window where a user is placing a trade, the price will be updated frequently. Because we only have one use case and we need to support instantaneous updates, it makes sense to rely on web sockets. Server-Sent Events would also work for one-way price updates; however, subscribing and unsubscribing would require separate HTTP requests, each carrying its own headers. Long polling would not be instantaneous enough.