TheStage Apple SDK
Attention
Access to the Apple SDK requires an API token from the TheStage AI Platform.
Overview
On-device speech, language and audio inference for iOS and macOS on Apple Silicon.
The SDK ships compiled CoreML and MLX engines through HuggingFace, auto-detects the
best backend per device (ANE / GPU / CPU), and exposes a unified infer / infer_stream
API for every pipeline. No server in the hot path.
The SDK is distributed as a pre-built xcframework with a SwiftPM wrapper for native
Swift apps and a Flutter plugin for iOS. Both surfaces share the same on-disk model cache
and the same infer / infer_stream API.
Prerequisites
| Requirement | Minimum | Tested with |
|---|---|---|
| macOS | 15.0 | 15.6 |
| iOS | 18.0 | 18.6 |
| Xcode | 16.0 | 26.1 |
| Swift | 6.0 | 6.2.1 |
| Flutter (only for the Flutter examples) | 3.24 | 3.38.7 |
| Dart | 3.5 | 3.10.7 |
| Hardware | Apple Silicon Mac or physical iPhone / iPad | n/a |
The Simulator is not supported — MLX requires Metal on real hardware. The Flutter plugin and the two Flutter example apps are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.
For the Flutter path you also need a Flutter toolchain
(brew install flutter, then flutter config --enable-swift-package-manager).
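Because the Simulator restriction above only shows up at run time, an app may want to fail fast with a clear message. The following is a minimal sketch using Swift's standard `targetEnvironment(simulator)` compilation condition; `unsupportedEnvironmentReason` is a hypothetical helper for illustration and is not part of the SDK.

```swift
/// True when this binary was compiled for the Simulator.
/// `targetEnvironment(simulator)` is a standard Swift compilation condition.
func isRunningInSimulator() -> Bool {
    #if targetEnvironment(simulator)
    return true
    #else
    return false
    #endif
}

/// Hypothetical helper (not an SDK API): returns a message explaining why
/// on-device inference is unavailable, or nil when the environment looks usable.
func unsupportedEnvironmentReason(
    isSimulator: Bool = isRunningInSimulator()
) -> String? {
    guard isSimulator else { return nil }
    return "The Simulator is not supported: MLX requires Metal on real hardware."
}

if let reason = unsupportedEnvironmentReason() {
    print(reason)
} else {
    print("Environment OK, safe to initialize the SDK.")
}
```

Running this check before the SDK is initialized lets the app show its own explanation instead of whatever error the first model load would otherwise surface.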
Obtaining an API Token
Call initialize(apiToken:) before any other SDK call. You need a valid API token
from the TheStage AI Platform before any pipeline will load.
1. Sign in at app.thestage.ai.
2. Go to Profile → API tokens tab.
3. Click Generate API token, add a description, and press Generate token.
4. Copy the token immediately; you will not be able to view it again after leaving the page.
For the full walkthrough with screenshots, see SSH Keys and API Tokens.
Once you have a token, pass it to the SDK at startup:
Swift:
import TheStageSDK
let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")
Flutter:
import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';
await TheStageFlutterSDK.initialize(api_token: 'th_…');
The token is validated once on first model start, then the SDK runs fully offline. If the device is temporarily disconnected, a 7-day grace window allows continued use without re-validation.
Attention
Keep your API token secret. Never commit it to source control. For Flutter
apps, use --dart-define-from-file=secrets.json and add secrets.json
to .gitignore. For macOS Swift apps, read it from an environment variable
(e.g. TS_API_TOKEN).
Example Apps
The repository ships ready-to-run examples. Each has its own README.md with
setup and run instructions — clone the repo and follow the guide for the example
that matches your use case.
| Example | Platform | Description |
|---|---|---|
| | macOS (Swift) | Minimal streaming TTS demo. Runs from the terminal with … |
| | iOS (Flutter) | Streaming neural TTS demo app for iPhone. Push text in, hear audio out in real time. |
| | iOS (Flutter) | Full voice assistant loop: mic → VAD → STT → LLM → streaming TTS with barge-in. Requires an OpenAI API key for the LLM. |
Note
The Simulator is not supported — MLX requires Metal on real hardware. Flutter examples are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.
Integration: Native Swift (SwiftPM)
In Xcode: File → Add Package Dependencies…, paste the repo URL, and add the
TheStageSDK product to your target. Or in Package.swift:
.package(url: "https://github.com/TheStageAI/AppleSDK.git", from: "1.0.0")
Then:
import TheStageSDK

let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")

let llm = try await TheStageLLM(
    engines_path: "TheStageAI/Qwen3-0.6B",
    on_load_progress: { p in
        print("[\(p.model)] \(p.phase) \(Int(p.fraction * 100))%")
    }
)

let result = llm.infer(
    prompt: "Give me a one-line haiku about Swift.",
    max_new_tokens: 64
)
print(result.text)
Every pipeline (TheStageLLM, WhisperPipeline, NeuTTSMultilingualPipeline,
NeuTTSNanoPipeline) shares the same constructor shape. Prefer the singleton
TheStageAI.shared.start_model(...) / infer(model_name:input_json:) flow when you
want lifecycle management and JSON dispatch (e.g. when driving the SDK from Flutter).
Both flows share the same on-disk cache and the same LoadProgress events.
Integration: Flutter Plugin
The plugin bundles the native framework — nothing to build or link. Three steps:
1. Add the git: dependency in your app’s pubspec.yaml, pinned to a tag:
dependencies:
  thestage_apple_sdk:
    git:
      url: https://github.com/TheStageAI/AppleSDK.git
      path: plugin/thestage_apple_sdk
      ref: v1.0.0
2. Configure the iOS project once: enable SwiftPM and set the deployment target to iOS 18.0+:
flutter config --enable-swift-package-manager
# then in Xcode: Runner target → General → Minimum Deployments → iOS 18.0
3. Use it (flutter pub get, then run on a physical device):
import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';

await TheStageFlutterSDK.initialize(api_token: 'th_…');

await TheStageFlutterSDK.start_model(
  model_name: 'llm',
  engines_path: 'TheStageAI/Qwen3-0.6B',
);

final result = await TheStageFlutterSDK.infer(
  model_name: 'llm',
  input_json: {
    'prompt': 'Give me a one-line haiku about Swift.',
    'max_new_tokens': 64,
  },
);
print(result[0]['text']);
Swift / Flutter Parity
The Swift singleton (TheStageAI.shared) and the Flutter TheStageFlutterSDK
mirror each other one-to-one. Pipeline constructors (TheStageLLM(...),
WhisperPipeline(...), etc.) are Swift-only — Dart consumers always go through the
JSON path.
| Operation | Swift | Flutter (Dart) |
|---|---|---|
| Initialize | | |
| Start a model | | |
| Stop a model | | |
| Single-shot inference | | |
| Streaming inference | | |
| Push text into a TTS stream | | |
| Cancel a running stream | | |
| Load progress | | Global stream |
| Audio buffer type | | |
Note
The Swift initializer is initialize(apiToken:) (camelCase), while the Flutter
call is initialize(api_token:) (snake_case).
Load Progress
All public loaders accept an optional on_load_progress: LoadProgressHandler that
fires through four phases with a monotonic fraction in 0...1:
| Phase | Fraction band | Notes |
|---|---|---|
| | 0.00 – 0.70 | HuggingFace repo download (skipped on cache hit) |
| | 0.70 – 0.85 | Bundle unpack to local cache (skipped on cache hit) |
| | 0.85 – 0.99 | Pipeline construction |
| | 1.00 (terminal) | Emitted on success only |
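To show how the fraction bands above could drive a progress label, here is a minimal, self-contained sketch that does not call the SDK. The band names (`download`, `unpack`, `construct`, `done`) are hypothetical labels chosen for illustration, and treating each band as half-open (lower bound inclusive) is an assumption about boundary handling.

```swift
/// Hypothetical labels for the four bands; the SDK's own phase names may differ.
enum ProgressBand: String {
    case download, unpack, construct, done
}

/// Maps a fraction in 0...1 to a band. Each band is treated as half-open
/// (lower bound inclusive), which is an assumption about the boundaries.
func band(for fraction: Double) -> ProgressBand {
    let f = min(max(fraction, 0.0), 1.0)  // clamp out-of-range input
    if f >= 1.0 { return .done }
    if f >= 0.85 { return .construct }
    if f >= 0.70 { return .unpack }
    return .download
}

/// Formats a one-line status string, e.g. "download 50%" for 0.5.
func statusLine(for fraction: Double) -> String {
    let f = min(max(fraction, 0.0), 1.0)
    // Round before converting so values like 0.29 do not truncate to 28.
    return "\(band(for: f).rawValue) \(Int((f * 100).rounded()))%"
}

for f in [0.0, 0.5, 0.75, 0.9, 1.0] {
    print(statusLine(for: f))
}
```

In an app, a function like `statusLine(for:)` could be called from the on_load_progress handler with the reported fraction. Since the terminal 1.00 value is emitted on success only, a UI should not assume it will always arrive.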
Audio I/O Contract
All audio crossing the public SDK surface uses PCM [Float], mono, samples
normalized to [-1.0, 1.0]. Sample rate depends on the pipeline:
| Pipeline | Direction | Sample rate | Frame / chunking |
|---|---|---|---|
| | input | 16 000 Hz | exactly 512 samples per … |
| | input | 16 000 Hz | any length; auto-split into 10 s windows |
| | output | 24 000 Hz | streamer emits per-sentence chunks; batch emits one full … |
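Microphone capture often arrives as 16-bit integers in arbitrary buffer sizes, so an app usually has to adapt it to the contract above. The sketch below is one way to do that in plain Swift, without calling the SDK: it converts Int16 PCM to normalized [Float] and cuts a mono buffer into fixed 512-sample frames. Both function names are hypothetical, and zero-padding the last frame is an assumption; an app could instead hold the remainder until more audio arrives.

```swift
/// Converts 16-bit PCM samples to Float samples in [-1.0, 1.0].
/// Dividing by 32768 maps Int16.min to exactly -1.0.
func normalizedFloats(from pcm16: [Int16]) -> [Float] {
    pcm16.map { Float($0) / 32768.0 }
}

/// Splits a mono buffer into fixed-size frames, zero-padding the final frame.
/// Zero-padding is an assumption made for this sketch.
func fixedFrames(from samples: [Float], frameSize: Int = 512) -> [[Float]] {
    precondition(frameSize > 0, "frameSize must be positive")
    var frames: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + frameSize, samples.count)
        var frame = Array(samples[start..<end])
        if frame.count < frameSize {
            let padding = [Float](repeating: 0, count: frameSize - frame.count)
            frame.append(contentsOf: padding)
        }
        frames.append(frame)
        start = end
    }
    return frames
}

// One second at 16 000 Hz is 31.25 frames of 512 samples, so 32 after padding.
let oneSecond = [Float](repeating: 0, count: 16_000)
let frames = fixedFrames(from: oneSecond)
print(frames.count, frames.last?.count ?? 0)
```

Each resulting frame has exactly 512 samples, which matches the first input row of the table. For the pipeline that accepts any length, the normalized buffer can be passed through without framing.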
Secrets
The Flutter example apps read tokens at build time via String.fromEnvironment(...)
and --dart-define-from-file=secrets.json. Each ships a secrets.example.json
template — copy it to secrets.json and fill in your keys. secrets.json is
covered by .gitignore; real keys never belong in source. The macOS example reads
TS_API_TOKEN from the environment instead.
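For the environment-variable approach, a minimal sketch of reading TS_API_TOKEN with Foundation's `ProcessInfo` might look like the following. `loadAPIToken` and `TokenError` are hypothetical names introduced here for illustration; the macOS example in the repository may read the variable differently.

```swift
import Foundation

/// Error type for the hypothetical token loader below (not an SDK type).
enum TokenError: Error, Equatable {
    case missing(String)
}

/// Reads a token from the given environment dictionary, trimming whitespace.
/// The environment is injectable so the function can be tested without
/// touching the real process environment.
func loadAPIToken(
    named name: String = "TS_API_TOKEN",
    from environment: [String: String] = ProcessInfo.processInfo.environment
) throws -> String {
    let raw = environment[name] ?? ""
    let token = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !token.isEmpty else { throw TokenError.missing(name) }
    return token
}

do {
    let token = try loadAPIToken()
    // Never print the token itself; its length is enough for a sanity check.
    print("Loaded API token (\(token.count) characters)")
} catch {
    print("Set TS_API_TOKEN before launching: \(error)")
}
```

The returned string can then be passed to initialize(apiToken:) in place of a hard-coded literal, which keeps the token out of source control.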