TheStage AI Platform: Self-hosted Instances

Use case

If you have high-performance servers already in place, you can connect these self-hosted servers to TheStage AI’s platform. This integration allows you to leverage all the features of the platform on your private hardware, enabling efficient management and distribution of AI workloads across both rented and self-hosted resources, maximizing utilization and minimizing costs.

Self-hosted Server vs. Self-hosted Instance

To connect an existing server (self-hosted server) to TheStage AI’s computational cluster, you need to create a digital representation or profile for the server on TheStage AI platform (self-hosted instance). This profile acts as a bridge between the server’s physical resources and other rented or linked servers on TheStage AI, allowing for efficient management of tasks and resources.

Connecting a Self-hosted Server

Warning

Currently, TheStage AI supports servers equipped with NVIDIA GPUs and running Ubuntu.

To connect a self-hosted server to TheStage AI’s computational cluster, follow these steps:

  1. Set Up on TheStage AI: Begin by creating a self-hosted instance within TheStage AI. Initially, this instance will be marked as “awaiting setup”.

  2. Configure Your Server: Download the file containing the JWT token for TheStage AI worker daemon and place it in /etc/thestage-daemon-token on your server.

  3. Launch the Worker Daemon: Get TheStage AI worker daemon up and running on your server. The self-hosted instance status will be changed to “running”.

1. Creating a self-hosted instance entry

To create a self-hosted instance entry:

  1. Log in to your account and navigate to the Self-hosted Instances section.

  2. Press the “Add instance” button.

  3. Fill out the required fields and press the “Next step” button.

  4. Important: Download the file with the daemon’s JWT token before closing this window. Once closed, you cannot return to this step.

  5. A self-hosted instance will be created and marked as “awaiting setup” while you configure your self-hosted server.

2. Configuring your server and launching the daemon

To configure your server:

  1. Make sure to install Docker and Git on your server. There are no specific version requirements for this software.

  2. Confirm that your server has an active internet connection.

  3. Ensure the file containing the daemon’s JWT token is correctly placed at /etc/thestage-daemon-token. The JWT token is generated when you create a self-hosted instance on TheStage AI, and the file containing it can be downloaded during the self-hosted instance setup.

  4. Start the Worker Daemon: Run the following commands in your server’s terminal to connect it to TheStage AI:

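# Make sure the keyring directory exists before downloading the repository key
sudo mkdir -p /etc/apt/keyrings
sudo curl https://thestage-daemon.s3.amazonaws.com/gpg.public.dearmored -o /etc/apt/keyrings/thestage.gpg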
echo 'deb [signed-by=/etc/apt/keyrings/thestage.gpg] https://thestage.jfrog.io/artifactory/thestage-ai generic main' | sudo tee /etc/apt/sources.list.d/thestage.list
sudo apt update -yq

# This command installs and starts TheStage AI daemon
sudo apt install -yq thestage-daemon

  5. To check the daemon’s status, use:

systemctl status thestage-daemon

For the Tech-Savvy:

You can download the binary file to run it in any environment:

sudo curl https://thestage-daemon.s3.amazonaws.com/thestage-daemon_$(curl https://thestage-daemon.s3.amazonaws.com/version.txt)_$(uname -m).deb -o /usr/local/bin/thestage-daemon
sudo chmod +x /usr/local/bin/thestage-daemon
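
The download command above nests a version lookup and an architecture lookup inside the URL. The following sketch splits those steps apart so each one can be inspected on its own. The function names `daemon_url` and `download_daemon` are hypothetical, and the URL layout is taken only from the command above.

```shell
BASE_URL="https://thestage-daemon.s3.amazonaws.com"

# Builds the artifact URL from a version string and a machine architecture.
daemon_url() {
  printf '%s/thestage-daemon_%s_%s.deb\n' "$BASE_URL" "$1" "$2"
}

# Looks up the published version and this machine's architecture, then
# downloads the artifact to the given destination path.
# Not called automatically; invoke it explicitly with a destination.
download_daemon() {
  local dest="$1" version
  # -f makes curl fail on HTTP errors instead of saving an error page
  version="$(curl -fsSL "$BASE_URL/version.txt")" || return 1
  curl -fsSL "$(daemon_url "$version" "$(uname -m)")" -o "$dest"
}
```

For example, `daemon_url 1.2.3 x86_64` prints the URL for a made-up version 1.2.3 without touching the network, which is handy for checking the address before downloading.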
  6. Once activated, the daemon scans the server’s hardware and generates a unique identifier. This identifier, along with hardware details like RAM size, disk capacity, CPU count and model, number of CPU cores, GPU details, etc., is sent to TheStage AI. This action sets the self-hosted instance to “running” on TheStage AI and associates it with the unique ID, enabling the daemon to manage container tasks.
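
The prerequisites from steps 1 to 3 can be checked before the install commands are run. The sketch below is one possible preflight check, not part of TheStage AI tooling: the function names are hypothetical, and it assumes the token file holds a single compact JWT (three dot-separated base64url segments) on its first line. It checks the format only and does not verify the token.

```shell
# Succeeds if the given command is available on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the token file exists, is non-empty, and its first line looks
# like a compact JWT (assumption: the file holds the bare token).
check_token_file() {
  local path="${1:-/etc/thestage-daemon-token}"
  [ -s "$path" ] || return 1
  head -n 1 "$path" | tr -d '\r' |
    grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'
}

# Prints one line per prerequisite; returns non-zero if any check fails.
preflight() {
  local token_path="${1:-/etc/thestage-daemon-token}" status=0 cmd
  for cmd in docker git; do
    if has_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      status=1
    fi
  done
  if check_token_file "$token_path"; then
    echo "ok: token file"
  else
    echo "missing or malformed: $token_path"
    status=1
  fi
  return "$status"
}
```

Running `sudo sh -c '. ./preflight.sh; preflight'` (with the functions saved in a file named, for illustration, preflight.sh) reports each prerequisite. Internet connectivity is not covered here; it can be probed separately, for instance by requesting the repository host used in the install commands.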

3. Updating the daemon

In some cases, you will need to update TheStage AI worker daemon running on your server(s) connected to TheStage AI platform. You can do it manually or in an unattended way.

Manual upgrade

To check if there is a new version of TheStage AI worker daemon, use the following command in the console:

sudo apt show thestage-daemon

To manually update the daemon, use the command:

sudo apt-get install --only-upgrade thestage-daemon

To check the daemon’s status, use:

systemctl status thestage-daemon
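
To decide whether an upgrade is needed, it can help to compare the installed version with the candidate version that apt would install. The sketch below parses the output of `apt-cache policy`, a standard apt command; the function name `policy_status` is hypothetical.

```shell
# Reads `apt-cache policy <package>` output on stdin and prints one of:
#   not-installed, up-to-date, upgrade-available
policy_status() {
  awk '
    $1 == "Installed:" { installed = $2 }
    $1 == "Candidate:" { candidate = $2 }
    END {
      if (installed == "" || installed == "(none)") print "not-installed"
      else if (installed == candidate) print "up-to-date"
      else print "upgrade-available"
    }'
}

# Usage on a server with the repository configured (run `sudo apt update` first):
#   apt-cache policy thestage-daemon | policy_status
```

The function only reads text, so it can be tried against saved output before being used in a script.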

Unattended upgrade

  1. In the file /etc/apt/apt.conf.d/50unattended-upgrades add the Unattended-Upgrade::Origins-Pattern block or amend it if it already exists so it looks like the following:

Unattended-Upgrade::Origins-Pattern {
        "site=thestage.jfrog.io";
}

  2. Run crontab -e as root and add the following line in the editor that opens:

0 0 * * * export DEBIAN_FRONTEND=noninteractive; apt update -yq && apt install -yq thestage-daemon

This will update the daemon once every day at 00:00. If another timing is preferable, adjust the schedule accordingly.
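
The first two fields of a crontab line are the minute and the hour, in that order. As a sketch of how to adjust the schedule, the hypothetical helper below prints the crontab line for a given hour and minute. It exports DEBIAN_FRONTEND so that the setting applies to both apt commands.

```shell
# Succeeds if the argument is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

# Prints a root crontab line that upgrades the daemon daily at HOUR:MINUTE.
cron_line() {
  local hour="$1" minute="${2:-0}"
  is_uint "$hour" && is_uint "$minute" || {
    echo "usage: cron_line HOUR [MINUTE]" >&2
    return 1
  }
  [ "$hour" -le 23 ] && [ "$minute" -le 59 ] || {
    echo "time out of range" >&2
    return 1
  }
  # crontab field order: minute hour day-of-month month day-of-week
  printf '%s %s * * * export DEBIAN_FRONTEND=noninteractive; apt update -yq && apt install -yq thestage-daemon\n' \
    "$minute" "$hour"
}
```

For example, `cron_line 3 30` prints a line that runs the upgrade every day at 03:30; the output can be pasted into the editor opened by crontab -e.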

Health Checks

The daemon regularly sends health checks to TheStage AI, which tracks the timing of these updates. If updates cease, TheStage AI marks the self-hosted instance as “terminated”.
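
If an instance unexpectedly shows as “terminated”, a first step is to confirm locally that the daemon service is still running. The sketch below wraps `systemctl is-active`, a standard systemd command, and uses the unit name thestage-daemon from the status command shown earlier. The function names and the SYSTEMCTL override are hypothetical and exist only to keep the example easy to test.

```shell
# Command used to query systemd; can be overridden, e.g. for testing.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Prints the unit state reported by `systemctl is-active`
# (for example "active", "inactive" or "failed"), or "unknown".
daemon_state() {
  local unit="${1:-thestage-daemon}" state
  # is-active exits non-zero when the unit is not active, so ignore the status
  state="$("$SYSTEMCTL" is-active "$unit" 2>/dev/null)" || true
  echo "${state:-unknown}"
}

# Succeeds only when the unit is active; otherwise explains the consequence.
check_daemon() {
  local state
  state="$(daemon_state "${1:-thestage-daemon}")"
  if [ "$state" = "active" ]; then
    echo "daemon is active"
  else
    echo "daemon is $state; health checks are not being sent" >&2
    return 1
  fi
}
```

Calling `check_daemon` from a monitoring script gives an early local signal, independent of the status shown on the platform.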

Altering Self-hosted Server

Every time TheStage AI daemon launches, such as after a server reboot, it conducts a fresh hardware scan and generates a new unique identifier based on the current hardware configuration. This new identifier is then sent to TheStage AI for verification against the previously stored identifier for that server. Any discrepancy, indicating a change in hardware, results in the server being disconnected from TheStage AI’s compute cluster and the instance status being set to “terminated”.

Important: Should you modify the hardware of your self-hosted server connected to TheStage AI and want to keep it connected, you will need to reconfigure the self-hosted instance in your TheStage AI account. This also applies if you replace a connected self-hosted server with a new one.
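
To illustrate why any hardware change breaks the match, the sketch below derives an identifier by hashing a list of hardware descriptors. This is only an illustration of the idea: the daemon’s actual algorithm and inputs are not documented here, and the function names are hypothetical.

```shell
# Reads hardware descriptor lines on stdin and prints a SHA-256 hex digest.
# Sorting first makes the result independent of the order of the lines.
hw_fingerprint() {
  LC_ALL=C sort | sha256sum | cut -d ' ' -f 1
}

# Compares a stored identifier with a freshly computed one.
verify_fingerprint() {
  local stored="$1" current="$2"
  if [ "$stored" = "$current" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}

# Example input on a real server (assumed choice of descriptors):
#   { lscpu | grep 'Model name'; nproc; grep MemTotal /proc/meminfo;
#     nvidia-smi --query-gpu=name,memory.total --format=csv,noheader; } | hw_fingerprint
```

Because the digest covers every descriptor, replacing a GPU or adding RAM changes the result, which mirrors the disconnect behaviour described above.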

Termination

Stopping TheStage AI worker daemon or altering the server’s hardware disconnects the self-hosted server from TheStage AI computational cluster.

Self-hosted instances with “awaiting setup” and “terminated” statuses can be deleted.

Statuses

Possible statuses of self-hosted instances and their meanings:

  • awaiting setup: A self-hosted instance has been created and is awaiting configuration of your self-hosted server.

  • online: Your self-hosted server is connected to TheStage AI platform.

  • terminated: Your self-hosted server has been disconnected from TheStage AI platform.