Maestro Actuation And Control Stack

This document defines how Arqon Maestro performs actions in the world.

It is the formal stack reference for control and execution boundaries across desktop, browser, and mixed-interface environments.

Core Position

Maestro is a Voice Operating System, not a personal assistant shell.

The actuation model must therefore be:

deterministic first
layered by capability
explicit about fallback order
fail-closed when policy or identity is uncertain

AGO Boundary

Maestro and Nexus are separate sibling AGOs:

Maestro owns spoken operating, command interpretation, and governed actuation
Nexus owns assistant continuity, personal context, and long-horizon guidance

Nexus must not absorb Maestro's operating control boundary.

Why A Multi-Layer Stack

No single automation technology can reliably cover:

global desktop surfaces
structured web app automation
visually dynamic/OCR-only interfaces
mixed native + browser + remote UI environments

The stack is intentionally layered so each layer is used for what it is best at.

Layered Actuation Model

Layer 0: Native/Semantic Control First

Use deterministic semantic control whenever available:

explicit command grammars
MCP-routed skill execution
app/tool APIs and structured selectors

This is the default lane for trustworthy operating behavior.
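As an illustration of the first item, an explicit command grammar can be sketched as a fixed table of patterns that either matches an utterance exactly or rejects it. The patterns, the skill names (`app.open`, `browser.switch_tab`), and the `parse` helper are hypothetical and are not taken from Maestro's actual grammar.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    skill: str   # hypothetical skill identifier
    args: dict   # named values captured from the utterance


# Hypothetical grammar: each entry maps one exact phrase shape to one skill.
GRAMMAR = [
    (re.compile(r"open (?P<app>[a-z ]+)"), "app.open"),
    (re.compile(r"switch to tab (?P<index>\d+)"), "browser.switch_tab"),
]


def parse(utterance: str) -> Optional[Command]:
    """Return the matching command, or None when no rule matches in full."""
    # Normalize case and whitespace so equal phrases always parse the same way.
    text = " ".join(utterance.lower().split())
    for pattern, skill in GRAMMAR:
        match = pattern.fullmatch(text)
        if match:
            return Command(skill, match.groupdict())
    return None  # no guessing: unmatched utterances are not acted on
```

Because `fullmatch` is used rather than a partial search, an utterance that only resembles a command is rejected instead of being loosely interpreted.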

Layer 1: Talon For Global Desktop Control

Talon is the adopted global desktop control layer for cross-application OS interaction, input synthesis, and broad desktop reach.

Use when:

command intent targets native desktop surfaces
no stronger app-specific semantic interface exists
low-friction global control is required

Layer 2: Playwright For Structured Web Automation

Playwright is the adopted structured browser/web automation layer.

Use when:

the target is a web page/app with stable DOM semantics
robust selector-driven control is possible
repeatable browser automation is needed

Layer 3: UI.Vision For Visual/OCR Fallback

UI.Vision is retained as a bounded visual/OCR fallback, not the primary universal controller.

Use when:

semantic control is unavailable or insufficient
the interface is visually rendered or selector-hostile
OCR/image matching is required for progress

Control Selection Policy

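Actuation should try control paths in this order:

1. native/semantic control
2. structured browser automation (Playwright)
3. global desktop control (Talon) when semantic routes are unavailable
4. visual/OCR fallback (UI.Vision) only when higher-confidence paths fail

This order differs from the layer numbering: Playwright (Layer 2) is tried before Talon (Layer 1).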

Architectural rule:

prefer native/semantic control before visual fallback
do not bypass higher-confidence layers without explicit reason
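The selection order and the no-bypass rule can be sketched as a small function. This is a minimal illustration under stated assumptions, not Maestro's routing code: the `Target` capability flags, `is_usable`, and `select_layer` are hypothetical names, and real availability checks would be richer than a boolean per layer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Layer(Enum):
    NATIVE_SEMANTIC = "native/semantic control"
    PLAYWRIGHT = "structured browser automation (Playwright)"
    TALON = "global desktop control (Talon)"
    UI_VISION = "visual/OCR fallback (UI.Vision)"


# Order taken from the policy: highest-confidence path first.
SELECTION_ORDER = (
    Layer.NATIVE_SEMANTIC,
    Layer.PLAYWRIGHT,
    Layer.TALON,
    Layer.UI_VISION,
)


@dataclass(frozen=True)
class Target:
    """Hypothetical summary of what a command wants to act on."""
    has_semantic_interface: bool = False  # grammar, MCP skill, or app API
    has_stable_dom: bool = False          # web target with usable selectors
    is_native_desktop: bool = False       # reachable through OS-level input
    is_visually_matchable: bool = False   # OCR/image matching could progress


def is_usable(layer: Layer, target: Target) -> bool:
    """Map each layer to the (assumed) capability flag it depends on."""
    return {
        Layer.NATIVE_SEMANTIC: target.has_semantic_interface,
        Layer.PLAYWRIGHT: target.has_stable_dom,
        Layer.TALON: target.is_native_desktop,
        Layer.UI_VISION: target.is_visually_matchable,
    }[layer]


def select_layer(
    target: Target,
    bypass: FrozenSet[Layer] = frozenset(),
    bypass_reason: Optional[str] = None,
) -> Optional[Layer]:
    """Return the first usable layer in policy order, or None if none applies."""
    if bypass and not bypass_reason:
        # Skipping a higher-confidence layer must be justified explicitly.
        raise ValueError("bypassing a layer requires an explicit reason")
    for layer in SELECTION_ORDER:
        if layer not in bypass and is_usable(layer, target):
            return layer
    return None  # nothing applies: the caller should refuse, not improvise
```

For example, `select_layer(Target(has_stable_dom=True, is_visually_matchable=True))` picks Playwright even though UI.Vision could also act on the target, because the visual fallback is only reached when the earlier paths are unusable or explicitly bypassed with a reason.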

Security And Identity

Voice identity is a first-class security signal in this stack.

High-impact actions should be gated by:

speaker-aware authorization checks where policy requires them
mode-aware capability constraints
fail-closed refusal when identity/policy confidence is insufficient

This applies across all actuation layers, including fallback layers.
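A fail-closed gate of this kind can be sketched as follows. Everything named here is an assumption made for illustration: the `ActionRequest` fields, the mode table, and the `0.9` confidence threshold are hypothetical and do not describe Maestro's actual policy values.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative values only; real thresholds and mode tables are policy decisions.
MIN_SPEAKER_CONFIDENCE = 0.9
ALLOWED_ACTIONS_BY_MODE = {
    "command": {"open_app", "delete_file"},
    "dictation": {"insert_text"},
}


@dataclass(frozen=True)
class ActionRequest:
    action: str
    mode: str
    high_impact: bool
    speaker_confidence: Optional[float]  # None: speaker check gave no result


def authorize(request: ActionRequest) -> bool:
    """Return True only when every applicable check passes; refuse otherwise."""
    allowed = ALLOWED_ACTIONS_BY_MODE.get(request.mode)
    if allowed is None or request.action not in allowed:
        return False  # unknown mode, or action outside the mode's capabilities
    if request.high_impact:
        if request.speaker_confidence is None:
            return False  # identity unknown: fail closed
        return request.speaker_confidence >= MIN_SPEAKER_CONFIDENCE
    return True
```

The function has no branch that grants access by default when information is missing. An unknown mode or an absent speaker score leads to refusal, which is the fail-closed behavior described above, and the same gate would sit in front of every actuation layer rather than only the primary one.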

STT/TTS Contract Position

STT and TTS remain provider contracts:

STT: dual-profile routing (command-fast, dictation-accurate) with provider flexibility
TTS: persona-driven broker with fallback behavior and provider abstraction

Adopted providers are implementation defaults, not lock-in points.
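One way to picture these contracts is as structural interfaces that any provider can satisfy. The sketch below is hypothetical throughout: `SttProvider`, `TtsProvider`, `SttRouter`, `TtsBroker`, and the stand-in providers are invented names, and only the two profile names come from the text above.

```python
from typing import Dict, Protocol, Sequence


class SttProvider(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TtsProvider(Protocol):
    def speak(self, text: str, persona: str) -> bytes: ...


STT_PROFILES = ("command-fast", "dictation-accurate")


class SttRouter:
    """Routes audio to whichever provider is configured for a profile."""

    def __init__(self, providers: Dict[str, SttProvider]) -> None:
        missing = [p for p in STT_PROFILES if p not in providers]
        if missing:
            raise ValueError(f"no provider configured for: {missing}")
        self._providers = dict(providers)

    def transcribe(self, profile: str, audio: bytes) -> str:
        return self._providers[profile].transcribe(audio)


class TtsBroker:
    """Tries providers in order and falls back when one fails."""

    def __init__(self, providers: Sequence[TtsProvider]) -> None:
        self._providers = list(providers)

    def speak(self, text: str, persona: str) -> bytes:
        last_error = None
        for provider in self._providers:
            try:
                return provider.speak(text, persona)
            except Exception as error:  # a failed provider triggers fallback
                last_error = error
        raise RuntimeError("all TTS providers failed") from last_error


# Stand-in providers used only to exercise the contracts.
class EchoStt:
    def __init__(self, tag: str) -> None:
        self.tag = tag

    def transcribe(self, audio: bytes) -> str:
        return f"{self.tag}:{audio.decode()}"


class FailingTts:
    def speak(self, text: str, persona: str) -> bytes:
        raise ConnectionError("provider down")


class PlainTts:
    def speak(self, text: str, persona: str) -> bytes:
        return f"[{persona}] {text}".encode()
```

Swapping a provider then means passing a different object to `SttRouter` or `TtsBroker`; callers that depend only on the contract do not change.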

Shell And Runtime Boundary

Shell and runtime are intentionally separated:

shell hosts UX, tray, and operator controls
runtime owns hot path, routing, governance, and actuation policy

Shell migration (Electron to Tauri) must not change actuation contracts.
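The separation can be illustrated with a shell that depends only on a narrow runtime contract. This is a hypothetical sketch: `RuntimeContract`, `ActuationResult`, `Runtime.submit`, and `Shell` are invented for illustration, and the runtime body is a placeholder rather than real routing or governance logic.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ActuationResult:
    accepted: bool
    detail: str


class RuntimeContract(Protocol):
    """The only surface a shell is allowed to call (hypothetical)."""

    def submit(self, utterance: str) -> ActuationResult: ...


class Runtime:
    """Placeholder runtime; real routing and policy checks would live here."""

    def submit(self, utterance: str) -> ActuationResult:
        text = utterance.strip()
        if not text:
            return ActuationResult(False, "empty utterance")
        return ActuationResult(True, f"routed: {text}")


class Shell:
    """Hosts UX only; it forwards input and never decides how to actuate."""

    def __init__(self, name: str, runtime: RuntimeContract) -> None:
        self.name = name
        self._runtime = runtime

    def on_operator_input(self, utterance: str) -> ActuationResult:
        return self._runtime.submit(utterance)


runtime = Runtime()
electron_shell = Shell("electron", runtime)
tauri_shell = Shell("tauri", runtime)
```

Under this arrangement both shells produce the same result for the same input, since all actuation decisions are made behind the contract.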

Rejected Architecture

The stack rejects a single-tool control model.

Reasons:

brittle across heterogeneous targets
poor reliability in dynamic UIs
weak governance boundaries
high operational coupling and lock-in risk

Decision Cross-References

This document is the detailed rationale reference for:

ADM-040 through ADM-047 in the decision log
the Voice OS identity and layered control commitments