TheStage Apple SDK

Attention

Access to the Apple SDK requires an API token from the TheStage AI Platform. Pricing and Device Seats are by arrangement — open a Service Request at app.thestage.ai/contact. See TheStage Apple SDK Product Terms and Licensing & Device Identity.

GitHub: TheStageAI/AppleSDK

License / Product Terms: TheStage Apple SDK Product Terms (shipped LICENSE points here)

Overview

On-device speech, language and audio inference for iOS and macOS on Apple Silicon. The SDK ships compiled CoreML and MLX engines through HuggingFace, auto-detects the best backend per device (ANE / GPU / CPU), and exposes a unified infer / infer_stream API for every pipeline. No server in the hot path.

The SDK is distributed as a pre-built xcframework with a SwiftPM wrapper for native Swift apps and a Flutter plugin for iOS. Both surfaces share the same on-disk model cache and the same infer / infer_stream API.

Prerequisites

Requirement	Minimum	Tested with
macOS	15.0	15.6
iOS	18.0	18.6
Xcode	16.0	26.1
Swift	6.0	6.2.1
Flutter (only for the Flutter examples)	3.24	3.38.7
Dart	3.5	3.10.7
Hardware	Apple Silicon Mac or physical iPhone / iPad	—

The Simulator is not supported — MLX requires Metal on real hardware. The Flutter plugin and the two Flutter example apps are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.

For the Flutter path you also need a Flutter toolchain (brew install flutter, then flutter config --enable-swift-package-manager).

Obtaining an API Token

Call initialize(apiToken:) (initialize(api_token:) in Flutter) once at startup, before any other SDK call. You need a valid API token from the TheStage AI Platform before any pipeline will load.

1. Sign in at app.thestage.ai.
2. Go to Profile → API tokens tab.
3. Click Generate API token, add a description, and press Generate token.
4. Copy the token immediately; you will not be able to view it again after leaving the page.

For the full walkthrough with screenshots, see SSH Keys and API Tokens.

Once you have a token, pass it to the SDK at startup:

Swift:

import TheStageSDK

let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")

Flutter:

import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';

await TheStageFlutterSDK.initialize(api_token: 'th_…');

The token is validated once on first model start, then the SDK runs fully offline. If the device is temporarily disconnected, a 7-day grace window allows continued use without re-validation.

Attention

Keep your API token secret. Never commit it to source control. For Flutter apps, use --dart-define-from-file=secrets.json and add secrets.json to .gitignore. For macOS Swift apps, read it from an environment variable (e.g. TS_API_TOKEN).
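
As a minimal sketch of the environment-variable approach for macOS Swift apps, the helper below reads TS_API_TOKEN and fails loudly when it is missing or blank. The names loadAPIToken and MissingTokenError are hypothetical and exist only for illustration; they are not part of the SDK.

```swift
import Foundation

/// Hypothetical error type: thrown when the token variable is unset or blank.
struct MissingTokenError: Error, CustomStringConvertible {
    let variable: String
    var description: String {
        "Environment variable \(variable) is not set or is empty"
    }
}

/// Returns the token stored in `variable`, trimmed of surrounding whitespace.
/// `environment` is injectable so the lookup can be tested without touching
/// the real process environment.
func loadAPIToken(
    from variable: String = "TS_API_TOKEN",
    environment: [String: String] = ProcessInfo.processInfo.environment
) throws -> String {
    let raw = environment[variable] ?? ""
    let token = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !token.isEmpty else {
        throw MissingTokenError(variable: variable)
    }
    return token
}
```

The returned string can then be handed to the SDK at startup, for example try await TheStageAI.shared.initialize(apiToken: loadAPIToken()). Failing early keeps a missing token from surfacing later as a model-load error.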

Example Apps

The repository ships ready-to-run examples. Each has its own README.md with setup and run instructions — clone the repo and follow the guide for the example that matches your use case.

Example	Platform	Description
macos_swift_tts	macOS (Swift)	Minimal streaming TTS demo. Runs from the terminal with `swift run` — no Xcode, no device. Start here to hear the SDK work in under a minute.
tts_front_stream	iOS (Flutter)	Streaming neural TTS demo app for iPhone. Push text in, hear audio out in real time.
voice_agent	iOS (Flutter)	Full voice assistant loop: mic → VAD → STT → LLM → streaming TTS with barge-in. Requires an OpenAI API key for the LLM.

Note

The Simulator is not supported — MLX requires Metal on real hardware. Flutter examples are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.

Integration: Native Swift (SwiftPM)

In Xcode: File → Add Package Dependencies…, paste the repo URL, and add the TheStageSDK product to your target. Or in Package.swift:

.package(url: "https://github.com/TheStageAI/AppleSDK.git", from: "1.0.0")
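
For orientation, a complete manifest built around that dependency line might look like the sketch below. The package and target name MyApp is hypothetical, the platform minimums are taken from the Prerequisites table, and the package identity AppleSDK is an assumption derived from the repository URL.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp",  // hypothetical package name
    // Minimums from the Prerequisites table: macOS 15.0, iOS 18.0.
    platforms: [.macOS(.v15), .iOS(.v18)],
    dependencies: [
        .package(url: "https://github.com/TheStageAI/AppleSDK.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",  // hypothetical target name
            dependencies: [
                // Assumption: the package identity is "AppleSDK",
                // i.e. the last path component of the repository URL.
                .product(name: "TheStageSDK", package: "AppleSDK")
            ]
        )
    ]
)
```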

Then:

import TheStageSDK

let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")

let llm = try await TheStageLLM(
    engines_path: "TheStageAI/Qwen3-0.6B",
    on_load_progress: { p in
        print("[\(p.model)] \(p.phase) \(Int(p.fraction * 100))%")
    }
)

let result = llm.infer(
    prompt: "Give me a one-line haiku about Swift.",
    max_new_tokens: 64
)
print(result.text)

Every pipeline (TheStageLLM, WhisperPipeline, NeuTTSMultilingualPipeline, NeuTTSNanoPipeline) shares the same constructor shape. Prefer the singleton flow, TheStageAI.shared.start_model(...) followed by infer(model_name:input_json:), when you want the SDK to manage model lifecycle and dispatch requests as JSON (e.g. when driving the SDK from Flutter). Both flows share the same on-disk cache and emit the same LoadProgress events.

Integration: Flutter Plugin

The plugin bundles the native framework — nothing to build or link. Three steps:

1. Add the git: dependency in your app’s pubspec.yaml, pinned to a tag:

dependencies:
  thestage_apple_sdk:
    git:
      url: https://github.com/TheStageAI/AppleSDK.git
      path: plugin/thestage_apple_sdk
      ref: v1.0.0

2. Configure the iOS project once: enable SwiftPM and set the deployment target to iOS 18.0+:

flutter config --enable-swift-package-manager
# then in Xcode: Runner target → General → Minimum Deployments → iOS 18.0

3. Run flutter pub get, then call the plugin from your app on a physical device:

import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';

await TheStageFlutterSDK.initialize(api_token: 'th_…');

await TheStageFlutterSDK.start_model(
  model_name: 'llm',
  engines_path: 'TheStageAI/Qwen3-0.6B',
);

final result = await TheStageFlutterSDK.infer(
  model_name: 'llm',
  input_json: {
    'prompt': 'Give me a one-line haiku about Swift.',
    'max_new_tokens': 64,
  },
);
print(result[0]['text']);

Swift / Flutter Parity

The Swift singleton (TheStageAI.shared) and the Flutter TheStageFlutterSDK mirror each other one-to-one. Pipeline constructors (TheStageLLM(...), WhisperPipeline(...), etc.) are Swift-only — Dart consumers always go through the JSON path.

Operation	Swift	Flutter (Dart)
Initialize	`try await TheStageAI.shared.initialize(apiToken: "...")`	`await TheStageFlutterSDK.initialize(api_token: '...')`
Start a model	`try await ai.start_model(model_name:engines_path:config:on_load_progress:)`	`await TheStageFlutterSDK.start_model(model_name:, engines_path:, config:)`
Stop a model	`_ = try ai.stop_model(model_name: "llm")`	`await TheStageFlutterSDK.stop_model(model_name: 'llm')`
Single-shot inference	`try ai.infer(model_name:input_json:) -> [[String: Any]]`	`await TheStageFlutterSDK.infer(model_name:, input_json:) -> List<Map<String, dynamic>>`
Streaming inference	`try ai.infer_stream(model_name:input_json:) -> AsyncStream<InferenceStreamChunk>`	`TheStageFlutterSDK.infer_stream(model_name:, input_json:, stream_id:?) -> Stream<Map<String, dynamic>>`
Push text into a TTS stream	`streamer.send(text); streamer.stop_stream()`	`await TheStageFlutterSDK.send(stream_id:, text:); await TheStageFlutterSDK.finish_stream(stream_id:)`
Cancel a running stream	`streamer.stop_stream()`	`await TheStageFlutterSDK.stop_stream(stream_id:)`
Load progress	`on_load_progress: LoadProgressHandler?` on `start_model` / constructors	Global stream `TheStageFlutterSDK.on_progress` (`{model_name, phase, progress}`)
Audio buffer type	`[Float]`	`Float32List` (never `Float64List`)

Note

The Swift initializer is initialize(apiToken:) (camelCase), while the Flutter call is initialize(api_token:) (snake_case).

Load Progress

All public loaders accept an optional on_load_progress: LoadProgressHandler that fires through four phases with a monotonic fraction in 0...1:

Phase	Fraction band	Notes
`downloading`	0.00 – 0.70	HuggingFace repo download (skipped on cache hit)
`extracting`	0.70 – 0.85	Bundle unpack to local cache (skipped on cache hit)
`loading`	0.85 – 0.99	Pipeline construction
`ready`	1.00 (terminal)	Emitted on success only
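
The bands in the table can be sketched as plain Swift for UI code or unit tests that only see the overall fraction. LoadPhase, phase(forFraction:) and progressLine are hypothetical stand-ins written for this example, not SDK types; the SDK reports the phase itself through the handler. The table lists shared boundary values (0.70, 0.85), and this sketch assigns each boundary to the later phase as an assumption.

```swift
import Foundation

/// Hypothetical stand-in for the phase names listed in the table.
enum LoadPhase: String {
    case downloading, extracting, loading, ready
}

/// Maps an overall fraction in 0...1 to the phase whose band contains it.
/// Boundary values go to the later phase (an assumption of this sketch).
func phase(forFraction fraction: Double) -> LoadPhase {
    switch fraction {
    case ..<0.70: return .downloading
    case ..<0.85: return .extracting
    case ..<1.00: return .loading
    default: return .ready
    }
}

/// Renders a one-line text progress bar for a model load.
func progressLine(model: String, fraction: Double, width: Int = 20) -> String {
    let clamped = min(max(fraction, 0), 1)
    let filled = Int((clamped * Double(width)).rounded(.down))
    let bar = String(repeating: "#", count: filled)
        + String(repeating: ".", count: width - filled)
    let name = phase(forFraction: clamped).rawValue
    return "[\(model)] \(name) [\(bar)] \(Int(clamped * 100))%"
}
```

Clamping the fraction first keeps the bar well-formed even if a caller passes a value slightly outside 0...1.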

Audio I/O Contract

All audio crossing the public SDK surface uses PCM [Float], mono, samples normalized to [-1.0, 1.0]. Sample rate depends on the pipeline:

Pipeline	Direction	Sample rate	Frame / chunking
`SileroVAD`	input	16 000 Hz	exactly 512 samples per `infer` (32 ms); stateful
`WhisperPipeline`	input	16 000 Hz	any length; auto-split into 10 s windows
`NeuTTSMultilingualPipeline` / `NeuTTSNanoPipeline`	output	24 000 Hz	streamer emits per-sentence chunks; batch emits one full `[Float]`
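
To illustrate the input side of this contract, the sketch below converts 16-bit PCM into normalized Float samples and cuts them into 512-sample frames of the size SileroVAD expects. Both helpers are hypothetical and use only the standard library. Zero-padding the final partial frame is an assumption of this sketch; the SDK documentation above does not say how a short tail should be handled.

```swift
import Foundation

/// Converts signed 16-bit PCM to Float samples in [-1.0, 1.0).
func normalize(_ pcm: [Int16]) -> [Float] {
    pcm.map { Float($0) / 32768.0 }
}

/// Splits samples into fixed-size frames. The last frame is zero-padded
/// to `size` (an assumption of this sketch, not documented SDK behavior).
func frames(_ samples: [Float], size: Int = 512) -> [[Float]] {
    precondition(size > 0, "frame size must be positive")
    return stride(from: 0, to: samples.count, by: size).map { start in
        let end = min(start + size, samples.count)
        var frame = Array(samples[start..<end])
        frame.append(contentsOf: [Float](repeating: 0, count: size - frame.count))
        return frame
    }
}
```

At 16 000 Hz each 512-sample frame covers 512 / 16 000 = 0.032 s, which matches the 32 ms figure in the table. Audio captured at another rate would need resampling to 16 000 Hz first, which this sketch does not cover.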

Secrets

The Flutter example apps read tokens at build time via String.fromEnvironment(...) and --dart-define-from-file=secrets.json. Each ships a secrets.example.json template — copy it to secrets.json and fill in your keys. secrets.json is covered by .gitignore; real keys never belong in source. The macOS example reads TS_API_TOKEN from the environment instead.

Also see

Voice Agent — end-to-end voice assistant
Benchmarks — M2 Max NPU release numbers
Speaker embedding — enroll / cosine verify
Logging & crash breadcrumbs — session logs and support breadcrumbs
Licensing & Device Identity — device seats; links to Product Terms
TheStage Apple SDK Product Terms — commercial terms (pricing: contact TheStage AI)
Agent pack: llms.txt in the SDK docs tree (machine-readable page / symbol index)
