Roadmap¶
ArqonHPO follows semantic versioning. This document outlines our vision, planned features, and development priorities.
Current: v0.3.x (Stable)¶
Status: ✅ Released (January 2026)
The production-ready foundation for real-time hyperparameter optimization.
Core Capabilities¶
- PCR Algorithm — The Probe-Classify-Refine pipeline automatically selects strategies based on landscape structure
- Python Bindings —
ArqonSolverfor batch optimization,ArqonProbefor distributed LDS sampling - CLI & Observability — TUI, web Dashboard, Prometheus metrics, structured logging
- Safety Executor — Constitutional guarantees with Guardrails, Rollback Policy, and audit trail
- 91% Test Coverage — Comprehensive unit and integration tests
What Makes v0.3 Production-Ready¶
| Requirement | Status |
|---|---|
| Deterministic execution | ✅ ChaCha8Rng |
| Bounded memory | ✅ Ring buffers |
| Recoverable state | ✅ Snapshot + rollback |
| Observable | ✅ Metrics, traces, audit |
| Documented | ✅ 50+ pages |
Next: v0.4.0 (Q1 2026)¶
Status: 🔨 In Development
The "Infrastructure Phase" — making ArqonHPO deployable anywhere with comprehensive observability.
New Features¶
Adaptive Nelder-Mead with Restart Detection¶
The current Nelder-Mead implementation can stall when the simplex degenerates. v0.4 adds automatic restart detection:
// Detects when simplex volume collapses below threshold
// Automatically reinitializes with exploration phase
let nm = NelderMead::with_restart(config, restart_threshold);
- Problem: Simplex can collapse to a line in high dimensions
- Solution: Monitor simplex volume, trigger restart when degenerate
- Benefit: More robust convergence on ill-conditioned surfaces
Multi-Objective Optimization (MOO)¶
Support for Pareto-front optimization with multiple objectives:
config = {
"objectives": ["latency", "throughput"], # Both minimize
"mode": "pareto"
}
solver = ArqonSolver(json.dumps(config))
candidates = solver.ask() # Returns Pareto-optimal candidates
- Algorithm: NSGA-II style dominance sorting
- Output: Pareto front with configurable size
- Trade-off: User selects from Pareto set
ArqonShip v1.0 (Self-Healing CI)¶
The ArqonShip integration reaches v1.0 with:
- Automatic rollback on CI failure
- Bisection to find breaking commits
- Artifact correlation with test results
- Slack/Discord notifications
Infrastructure¶
Docker Images¶
Official multi-platform images:
docker pull ghcr.io/novelbytelabs/arqonhpo:0.4
docker pull ghcr.io/novelbytelabs/arqonhpo:0.4-slim # ~20MB
Platforms: linux/amd64, linux/arm64, darwin/arm64
Helm Chart¶
Production Kubernetes deployment:
helm repo add arqon https://novelbytelabs.github.io/charts
helm install arqon arqon/arqonhpo \
--set config.budget=1000 \
--set monitoring.enabled=true
Includes: ServiceMonitor, PodDisruptionBudget, HPA
OpenTelemetry Integration¶
Full OTel support for distributed tracing:
- Traces: ask/tell spans with parameter context
- Metrics: OTel metrics alongside Prometheus
- Logs: Structured logs with trace correlation
Future: v1.0.0 (Q2 2026)¶
Status: 🎯 Planned
The "Scale Phase" — GPU acceleration and distributed optimization for massive parameter spaces.
GPU-Accelerated Strategies¶
CUDA Backend¶
TPE kernel density estimation on GPU:
- Speedup: 100x for dim > 100
- Memory: Configurable batch size
- Fallback: Graceful CPU fallback
Metal Backend (Apple Silicon)¶
Native M1/M2/M3 support via Metal:
Distributed Optimization¶
Sharded Workers¶
Horizontal scaling across nodes:
# arqon-distributed.yaml
mode: distributed
coordinator:
address: arqon-coordinator:9090
workers:
count: 16
strategy: round_robin
sharding:
probe: by_index # ArqonProbe sharding
refine: by_region # Spatial partitioning
- Coordinator: Central state, Raft consensus
- Workers: Stateless evaluators
- Communication: gRPC with protobuf
Fault Tolerance¶
Automatic recovery from worker failures:
- Checkpoint every N evaluations
- Replay from last checkpoint
- Worker health checks with timeout
Full Bayesian TPE¶
The current TPE uses point estimates. v1.0 adds full Bayesian treatment:
- Prior: Configurable Gaussian or Student-t
- Acquisition: EI, PI, UCB, Thompson Sampling
- Uncertainty: Credible intervals on predictions
Ecosystem Integrations¶
| Integration | Status | Description |
|---|---|---|
| Ray Tune | ✅ v0.3 | Custom searcher |
| MLflow | ✅ v0.3 | Tracking plugin |
| W&B | 🎯 v1.0 | Callback + sweeps |
| Optuna | 🎯 v1.0 | Sampler adapter |
| Ax | 🎯 v1.0 | Backend provider |
Long-Term Vision (v2.0+)¶
ArqonHPO as Infrastructure¶
Our north star: Optimization should be as reliable as a database.
| Database Property | ArqonHPO Equivalent |
|---|---|
| ACID transactions | Atomic config updates |
| Point-in-time recovery | Snapshot rollback |
| Replication | Distributed coordinators |
| Query optimization | Auto-strategy selection |
Zero-Configuration Optimization¶
Fully automatic strategy selection:
# v2.0 vision: just specify the objective
solver = ArqonSolver.auto(budget=100)
# ArqonHPO infers bounds, selects strategy, tunes hyperparams
Sub-Microsecond Overhead¶
Target: <1μs latency for ask() in hot path:
| Version | ask() Latency |
|---|---|
| v0.3 | ~3ms |
| v0.4 | ~500μs |
| v1.0 | ~100μs |
| v2.0 | <1μs |
Constitutional AI Safety¶
Extending constitutional safety to AI-in-the-loop:
constitution:
max_regression_before_human: 3
human_approval_required_for:
- production_deploy
- safety_override
audit_retention: 90d
What We're NOT Building (Anti-Roadmap)¶
Clarity on scope:
| Feature | Why Not |
|---|---|
| Neural Architecture Search | Out of scope — use ArqonNAS |
| AutoML end-to-end | Arqon AutoGrokML |
| GUI configuration | CLI/API-first design |
| Cloud-hosted service | Self-hosted only (for now) |
Feature Requests & Voting¶
We prioritize based on community feedback:
- GitHub Discussions — Propose new features
- Vote on existing — 👍 reaction on issues
- Contribute — PRs welcome for roadmap items
Version Comparison¶
| Feature | v0.3 | v0.4 | v1.0 |
|---|---|---|---|
| PCR Algorithm | ✅ | ✅ | ✅ |
| Safety Executor | ✅ | ✅ | ✅ |
| Python Bindings | ✅ | ✅ | ✅ |
| Docker Images | ❌ | ✅ | ✅ |
| Helm Chart | ❌ | ✅ | ✅ |
| OpenTelemetry | ❌ | ✅ | ✅ |
| Multi-Objective | ❌ | ✅ | ✅ |
| GPU Acceleration | ❌ | ❌ | ✅ |
| Distributed | ❌ | ❌ | ✅ |
| Bayesian TPE | ❌ | ❌ | ✅ |
Release Timeline¶
gantt
title ArqonHPO Release Timeline
dateFormat YYYY-MM
section Releases
v0.3 (Current) :done, v03, 2026-01, 2026-01
v0.4 Development :active, v04, 2026-01, 2026-03
v0.4 Release :milestone, 2026-03, 0d
v1.0 Development : v10, 2026-03, 2026-06
v1.0 Release :milestone, 2026-06, 0d Next Steps¶
- Changelog — Detailed version history
- Contributing — How to help
- About — Our philosophy