TheStage Apple SDK

Attention

Access to the Apple SDK requires an API token from the TheStage AI Platform.

License: See LICENSE

Overview

On-device speech, language and audio inference for iOS and macOS on Apple Silicon. The SDK ships compiled CoreML and MLX engines through HuggingFace, auto-detects the best backend per device (ANE / GPU / CPU), and exposes a unified infer / infer_stream API for every pipeline. No server in the hot path.

The SDK is distributed as a pre-built xcframework with a SwiftPM wrapper for native Swift apps and a Flutter plugin for iOS. Both surfaces share the same on-disk model cache and the same infer / infer_stream API.

Prerequisites

| Requirement | Minimum | Tested with |
|---|---|---|
| macOS | 15.0 | 15.6 |
| iOS | 18.0 | 18.6 |
| Xcode | 16.0 | 26.1 |
| Swift | 6.0 | 6.2.1 |
| Flutter (only for the Flutter examples) | 3.24 | 3.38.7 |
| Dart | 3.5 | 3.10.7 |
| Hardware | Apple Silicon Mac or physical iPhone / iPad | |

The Simulator is not supported — MLX requires Metal on real hardware. The Flutter plugin and the two Flutter example apps are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.

For the Flutter path you also need a Flutter toolchain (brew install flutter, then flutter config --enable-swift-package-manager).
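The Simulator restriction and the minimum OS versions above can be checked before any SDK call is made. The following is a minimal sketch that uses only standard Swift features (`#if targetEnvironment(simulator)` and `#available`); the helper names `unsupportedReason` and `osMeetsMinimum` are hypothetical and are not part of the SDK.

```swift
/// Returns a human-readable reason the current build cannot run the SDK,
/// or nil when the documented prerequisites do not rule it out.
/// Hypothetical helper, not an SDK API.
func unsupportedReason(isSimulator: Bool, osIsRecentEnough: Bool) -> String? {
    if isSimulator {
        return "The Simulator is not supported; use a physical device or an Apple Silicon Mac."
    }
    if !osIsRecentEnough {
        return "iOS 18.0 or macOS 15.0 is required."
    }
    return nil
}

// Compile-time check: true only when building for a simulator target.
#if targetEnvironment(simulator)
let isSimulatorBuild = true
#else
let isSimulatorBuild = false
#endif

/// Runtime check against the minimum versions listed in the prerequisites.
func osMeetsMinimum() -> Bool {
    if #available(iOS 18.0, macOS 15.0, *) {
        return true
    }
    return false
}

if let reason = unsupportedReason(isSimulator: isSimulatorBuild,
                                  osIsRecentEnough: osMeetsMinimum()) {
    print("Skipping SDK initialization: \(reason)")
}
```

Keeping the decision in a pure function makes it easy to unit-test, while the compile-time and runtime checks only supply its inputs.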

Obtaining an API Token

Every SDK call begins with initialize(apiToken:). You need a valid API token from the TheStage AI Platform before any pipeline will load.

  1. Sign in at app.thestage.ai.

  2. Go to Profile → API tokens tab.

  3. Click Generate API token, add a description, and press Generate token.

  4. Copy the token immediately — you will not be able to view it again after leaving the page.

For the full walkthrough with screenshots, see SSH Keys and API Tokens.

Once you have a token, pass it to the SDK at startup:

Swift:

import TheStageSDK

let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")

Flutter:

import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';

await TheStageFlutterSDK.initialize(api_token: 'th_…');

The token is validated once on first model start, then the SDK runs fully offline. If the device is temporarily disconnected, a 7-day grace window allows continued use without re-validation.

Attention

Keep your API token secret. Never commit it to source control. For Flutter apps, use --dart-define-from-file=secrets.json and add secrets.json to .gitignore. For macOS Swift apps, read it from an environment variable (e.g. TS_API_TOKEN).
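For a macOS Swift app, reading the token from the environment might look like the sketch below. It relies only on Foundation's `ProcessInfo`; the `apiToken(from:variable:)` helper and the `TokenError` type are hypothetical names introduced for illustration, and `th_example` in the usage note is a placeholder rather than a real token format.

```swift
import Foundation

/// Hypothetical error type for this sketch; not part of the SDK.
enum TokenError: Error, Equatable {
    case missing(variable: String)
}

/// Looks up the API token in an environment dictionary.
/// The dictionary is a parameter so the lookup can be tested
/// without touching the real process environment.
func apiToken(
    from environment: [String: String] = ProcessInfo.processInfo.environment,
    variable: String = "TS_API_TOKEN"
) throws -> String {
    // Treat an unset, empty, or whitespace-only value as missing.
    let value = environment[variable]?
        .trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
    guard !value.isEmpty else {
        throw TokenError.missing(variable: variable)
    }
    return value
}

do {
    let token = try apiToken()
    // Pass `token` to initialize(apiToken:) here; never print or log the value itself.
    print("Token loaded (\(token.count) characters).")
} catch {
    print("Set TS_API_TOKEN before launching: \(error)")
}
```

Calling `try apiToken(from: ["TS_API_TOKEN": "th_example"])` exercises the same path in a test without exporting a real token.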

Example Apps

The repository ships ready-to-run examples. Each has its own README.md with setup and run instructions — clone the repo and follow the guide for the example that matches your use case.

| Example | Platform | Description |
|---|---|---|
| macos_swift_tts | macOS (Swift) | Minimal streaming TTS demo. Runs from the terminal with swift run; no Xcode or device needed. Start here to hear the SDK work in under a minute. |
| tts_front_stream | iOS (Flutter) | Streaming neural TTS demo app for iPhone. Push text in, hear audio out in real time. |
| voice_agent | iOS (Flutter) | Full voice assistant loop: mic → VAD → STT → LLM → streaming TTS with barge-in. Requires an OpenAI API key for the LLM. |

Note

The Simulator is not supported — MLX requires Metal on real hardware. Flutter examples are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.

Integration: Native Swift (SwiftPM)

In Xcode: File → Add Package Dependencies…, paste the repo URL, and add the TheStageSDK product to your target. Or in Package.swift:

.package(url: "https://github.com/TheStageAI/AppleSDK.git", from: "1.0.0")

Then:

import TheStageSDK

let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")

let llm = try await TheStageLLM(
    engines_path: "TheStageAI/Qwen3-0.6B",
    on_load_progress: { p in
        print("[\(p.model)] \(p.phase) \(Int(p.fraction * 100))%")
    }
)

let result = llm.infer(
    prompt: "Give me a one-line haiku about Swift.",
    max_new_tokens: 64
)
print(result.text)

Every pipeline (TheStageLLM, WhisperPipeline, NeuTTSMultilingualPipeline, NeuTTSNanoPipeline) shares the same constructor shape. Prefer the singleton TheStageAI.shared.start_model(...) / infer(model_name:input_json:) flow when you want managed model lifecycle and JSON dispatch (e.g. when driving the SDK from Flutter). Both flows share the same on-disk cache and emit the same LoadProgress events.

Integration: Flutter Plugin

The plugin bundles the native framework — nothing to build or link. Three steps:

1. Add the git: dependency in your app’s pubspec.yaml, pinned to a tag:

dependencies:
  thestage_apple_sdk:
    git:
      url: https://github.com/TheStageAI/AppleSDK.git
      path: plugin/thestage_apple_sdk
      ref: v1.0.0

2. Configure the iOS project once: enable SwiftPM and set the deployment target to iOS 18.0+:

flutter config --enable-swift-package-manager
# then in Xcode: Runner target → General → Minimum Deployments → iOS 18.0

3. Use it (flutter pub get, then run on a physical device):

import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';

await TheStageFlutterSDK.initialize(api_token: 'th_…');

await TheStageFlutterSDK.start_model(
  model_name: 'llm',
  engines_path: 'TheStageAI/Qwen3-0.6B',
);

final result = await TheStageFlutterSDK.infer(
  model_name: 'llm',
  input_json: {
    'prompt': 'Give me a one-line haiku about Swift.',
    'max_new_tokens': 64,
  },
);
print(result[0]['text']);

Swift / Flutter Parity

The Swift singleton (TheStageAI.shared) and the Flutter TheStageFlutterSDK mirror each other one-to-one. Pipeline constructors (TheStageLLM(...), WhisperPipeline(...), etc.) are Swift-only — Dart consumers always go through the JSON path.

| Operation | Swift | Flutter (Dart) |
|---|---|---|
| Initialize | try await TheStageAI.shared.initialize(apiToken: "...") | await TheStageFlutterSDK.initialize(api_token: '...') |
| Start a model | try await ai.start_model(model_name:engines_path:config:on_load_progress:) | await TheStageFlutterSDK.start_model(model_name:, engines_path:, config:) |
| Stop a model | _ = try ai.stop_model(model_name: "llm") | await TheStageFlutterSDK.stop_model(model_name: 'llm') |
| Single-shot inference | try ai.infer(model_name:input_json:) -> [[String: Any]] | await TheStageFlutterSDK.infer(model_name:, input_json:) -> List<Map<String, dynamic>> |
| Streaming inference | try ai.infer_stream(model_name:input_json:) -> AsyncStream<InferenceStreamChunk> | TheStageFlutterSDK.infer_stream(model_name:, input_json:, stream_id:?) -> Stream<Map<String, dynamic>> |
| Push text into a TTS stream | streamer.send(text); streamer.stop_stream() | await TheStageFlutterSDK.send(stream_id:, text:); await TheStageFlutterSDK.finish_stream(stream_id:) |
| Cancel a running stream | streamer.stop_stream() | await TheStageFlutterSDK.stop_stream(stream_id:) |
| Load progress | on_load_progress: LoadProgressHandler? on start_model / constructors | Global stream TheStageFlutterSDK.on_progress ({model_name, phase, progress}) |
| Audio buffer type | [Float] | Float32List (never Float64List) |

Note

The Swift initializer is initialize(apiToken:) (camelCase), while the Flutter call is initialize(api_token:) (snake_case).
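On the JSON path, requests are dictionaries and results come back as `[[String: Any]]`. The sketch below shows one way to prepare such a payload and read the `text` field back, using only Foundation. It does not call the SDK: `mockResult` is a stand-in for a real response, and `firstText(in:)` is a hypothetical helper. The keys `prompt`, `max_new_tokens` and `text` are taken from the examples earlier in this document.

```swift
import Foundation

// Request payload with the same keys as the LLM examples earlier in this document.
let inputJSON: [String: Any] = [
    "prompt": "Give me a one-line haiku about Swift.",
    "max_new_tokens": 64,
]

// Confirm the payload contains only JSON-compatible values before dispatching it.
let payloadIsValid = JSONSerialization.isValidJSONObject(inputJSON)

/// Pulls the "text" field out of the first element of an infer-style result.
/// Returns nil when the result is empty or the field is not a string.
func firstText(in result: [[String: Any]]) -> String? {
    result.first?["text"] as? String
}

// Stand-in for what infer(model_name:input_json:) might return; not real SDK output.
let mockResult: [[String: Any]] = [["text": "placeholder reply"]]

print(payloadIsValid, firstText(in: mockResult) ?? "<no text>")
```

Because the values are typed as `Any`, a conditional cast such as `as? String` avoids a crash if a pipeline returns a different shape than expected.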

Load Progress

All public loaders accept an optional on_load_progress: LoadProgressHandler. The handler is called as loading moves through four phases, with a fraction in 0...1 that never decreases:

| Phase | Fraction band | Notes |
|---|---|---|
| downloading | 0.00 – 0.70 | HuggingFace repo download (skipped on cache hit) |
| extracting | 0.70 – 0.85 | Bundle unpack to local cache (skipped on cache hit) |
| loading | 0.85 – 0.99 | Pipeline construction |
| ready | 1.00 (terminal) | Emitted on success only |
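The fraction bands in the table above can be expressed as a small lookup, which is handy for driving a progress bar label. This is a sketch under assumptions: the `LoadPhase` enum and both functions are hypothetical, and the SDK reports the phase itself in its progress events, so deriving it from the fraction is only an approximation of the table.

```swift
/// Phase names copied from the table above. Hypothetical type, not the SDK's own.
enum LoadPhase: String {
    case downloading, extracting, loading, ready
}

/// Maps an overall fraction in 0...1 to the phase whose band contains it.
/// Values outside 0...1 are clamped first.
func phase(forFraction fraction: Double) -> LoadPhase {
    let clamped = min(max(fraction, 0.0), 1.0)
    if clamped >= 1.0 { return .ready }
    if clamped >= 0.85 { return .loading }
    if clamped >= 0.70 { return .extracting }
    return .downloading
}

/// Formats one progress line in the style "[model] phase NN%".
func progressLine(model: String, fraction: Double) -> String {
    let percent = Int((fraction * 100).rounded())
    return "[\(model)] \(phase(forFraction: fraction).rawValue) \(percent)%"
}

print(progressLine(model: "llm", fraction: 0.5))
```

Rounding before converting to `Int` avoids off-by-one percentages caused by floating-point representation.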

Audio I/O Contract

All audio crossing the public SDK surface is mono PCM passed as [Float], with samples normalized to [-1.0, 1.0]. The sample rate depends on the pipeline:

| Pipeline | Direction | Sample rate | Frame / chunking |
|---|---|---|---|
| SileroVAD | input | 16 000 Hz | exactly 512 samples per infer (32 ms); stateful |
| WhisperPipeline | input | 16 000 Hz | any length; auto-split into 10 s windows |
| NeuTTSMultilingualPipeline / NeuTTSNanoPipeline | output | 24 000 Hz | streamer emits per-sentence chunks; batch emits one full [Float] |
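Microphone capture often arrives as 16-bit integers, so it has to be normalized and framed before it matches the contract above. The sketch below shows one way to do that in plain Swift. All three helpers are hypothetical, and zero-padding the final partial frame is an assumption; buffering the remainder until the next capture callback may suit a stateful VAD better.

```swift
/// Converts 16-bit PCM samples to Float samples in [-1.0, 1.0).
func normalized(_ pcm: [Int16]) -> [Float] {
    pcm.map { Float($0) / 32768.0 }
}

/// Splits samples into fixed-size frames (512 by default, the VAD frame size).
/// The final partial frame is zero-padded; this padding is an assumption of the sketch.
func frames(of samples: [Float], size: Int = 512) -> [[Float]] {
    precondition(size > 0, "frame size must be positive")
    return stride(from: 0, to: samples.count, by: size).map { (start: Int) -> [Float] in
        let end = min(start + size, samples.count)
        var frame = Array(samples[start..<end])
        frame.append(contentsOf: [Float](repeating: 0, count: size - frame.count))
        return frame
    }
}

/// Duration in seconds of a mono buffer at the given sample rate.
func duration(ofSampleCount count: Int, sampleRate: Double) -> Double {
    Double(count) / sampleRate
}

let mic: [Int16] = [0, 16384, -32768, 32767]
let vadFrames = frames(of: normalized(mic))
print(vadFrames.count, duration(ofSampleCount: 512, sampleRate: 16_000))
```

The same `duration(ofSampleCount:sampleRate:)` helper applies to TTS output by passing 24 000 as the sample rate.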

Secrets

The Flutter example apps read tokens at build time via String.fromEnvironment(...) and --dart-define-from-file=secrets.json. Each ships a secrets.example.json template — copy it to secrets.json and fill in your keys. secrets.json is covered by .gitignore; real keys never belong in source. The macOS example reads TS_API_TOKEN from the environment instead.