Optimization Strategies¶

Different landscapes require different navigators.

A smooth, convex valley is best descended by a geometric solver. A noisy, chaotic mountain range requires a probabilistic explorer. ArqonHPO does not force you to choose one or the other—it chooses for you, dynamically.

The PCR (Probe-Classify-Refine) algorithm continuously analyzes the topology of your objective function. It calculates the fractal dimension of the response surface and automatically hot-swaps the underlying strategy to match the terrain. This allows ArqonHPO to transition seamlessly from global exploration to local polishing without human intervention.

Strategy Overview¶

Strategy	Best For	Characteristics
Nelder-Mead	Smooth, unimodal functions	Fast convergence, low overhead, requires structure
TPE	Noisy, expensive evaluations	Robust to noise, handles multimodal landscapes

Nelder-Mead Simplex¶

The Nelder-Mead algorithm is a derivative-free optimizer that maintains a simplex of N+1 points in N dimensions.

Algorithm Overview¶

Nelder-Mead iteratively transforms a simplex (a geometric shape with N+1 vertices in N dimensions) to find the minimum. The algorithm uses four operations:

Reflection: Move the worst point through the centroid of the remaining points
Expansion: If reflection finds a good point, extend further in that direction
Contraction: If reflection fails, pull the worst point toward the centroid
Shrink: If contraction fails, shrink the entire simplex toward the best point

Each iteration replaces the worst point with a better one, gradually moving the simplex toward the optimum.

Default Coefficients¶

ArqonHPO uses standard Nelder-Mead coefficients (verified from nelder_mead.rs):

Coefficient	Value	Purpose
`alpha` (α)	1.0	Reflection coefficient
`gamma` (γ)	2.0	Expansion coefficient
`rho` (ρ)	0.5	Contraction coefficient
`sigma` (σ)	0.5	Shrink coefficient

Convergence Criteria¶

The algorithm considers itself converged when the simplex diameter falls below a tolerance threshold—indicating that all points have collapsed to essentially the same location.

When Nelder-Mead Works Well¶

✅ Smooth, continuous objective functions
✅ Unimodal landscapes (single minimum)
✅ Continuous parameters
✅ Low to moderate dimensionality
✅ Deterministic evaluations (no noise)

When Nelder-Mead Struggles¶

❌ Noisy evaluations (can get confused by measurement noise)
❌ Multimodal landscapes (may converge to local minima)
❌ High-dimensional spaces (simplex becomes inefficient)
❌ Discontinuous or non-smooth functions

Warm-Starting from Probe¶

When PCR classifies a problem as "Structured," Nelder-Mead is initialized using the best points from the Probe phase. This creates an initial simplex centered on promising regions rather than starting blindly.

Tree-structured Parzen Estimator (TPE)¶

TPE is a Bayesian optimization algorithm that models the relationship between parameters and objective values probabilistically.

Algorithm Overview¶

Unlike traditional Bayesian optimization that models P(y|x), TPE models P(x|y)—the probability of parameters given "good" or "bad" outcomes.

Split observations: Divide all evaluated points into "good" (below a threshold) and "bad" (above threshold) groups based on objective values
Fit density estimators: Build kernel density estimates (KDEs) for both groups
Sample candidates: Draw samples that maximize the ratio l(x)/g(x), where l(x) is the "good" density and g(x) is the "bad" density
Evaluate and update: Evaluate the best candidates and add results to history

Bandwidth Selection¶

TPE uses kernel density estimation, which requires choosing an appropriate bandwidth (smoothing parameter). ArqonHPO supports multiple bandwidth rules (verified from tpe.rs):

Rule	Formula	Use Case
Scott's Rule (default)	σ = 1.06 × stddev × n^(-1/5)	Standard choice, adapts to distribution
Silverman's Rule	σ = 0.9 × min(stddev, IQR/1.34) × n^(-1/5)	More robust to outliers
Fixed	Percentage of range	Legacy behavior

When TPE Works Well¶

✅ Noisy objective functions (measurement noise, stochastic training)
✅ Expensive evaluations (makes efficient use of samples)
✅ Multimodal landscapes (explores multiple regions)
✅ Mixed or categorical parameters (handles discrete choices)

When TPE Is Less Efficient¶

⚠️ Higher per-candidate computational overhead
⚠️ Slower convergence on smooth, well-behaved functions
⚠️ Small sample sizes (needs sufficient data to build good density models)

Online Optimization¶

TPE is the strategy used by ask_one() for online/real-time optimization. Its incremental nature makes it well-suited for streaming scenarios where evaluations arrive one at a time.

Automatic Strategy Selection¶

ArqonHPO's PCR algorithm automatically selects the appropriate strategy based on landscape analysis:

graph TD
    A[Probe Phase] --> B[Classify Phase]
    B --> C{Residual Decay α}
    C -->|α > 0.5| D[Nelder-Mead]
    C -->|α ≤ 0.5| E[TPE]

    D --> F[Refine Phase]
    E --> F

Classification Logic¶

The ResidualDecayClassifier examines how objective values improve across the probe samples. Functions with exploitable structure show geometric decay in residuals, while noisy or multimodal functions show irregular patterns.

Signal	Interpretation	Strategy
High α (> 0.5)	Geometric residual decay	Nelder-Mead
Low α (≤ 0.5)	Flat or irregular residuals	TPE

For details on the classification algorithm, see PCR Algorithm.

Multi-Start Nelder-Mead¶

Not Currently Used by PCR

Multi-Start NM is available in the codebase but is not currently selected by the PCR classifier. It remains available for potential future use or manual configuration.

For multimodal functions with multiple local minima, Multi-Start NM launches several simplex searches from different starting points.

How It Works¶

Probe phase samples the landscape
Identify multiple promising regions
Launch independent NM searches per region
Return the best result across all runs

This approach is more expensive than single NM but can escape local minima.

Strategy Comparison¶

Aspect	Nelder-Mead	TPE
Mechanism	Geometric simplex operations	Probabilistic density modeling
Per-step overhead	Minimal	Higher (density estimation)
Sample efficiency on smooth functions	Excellent	Good
Noise tolerance	Poor	Excellent
Incremental/online use	Difficult (needs full simplex)	Natural fit
Warm-start support	Yes (from probe points)	Yes (from all history)

Next Steps¶

PCR Algorithm

How Probe-Classify-Refine works

Batch vs. Online

When to use ask() vs. ask_one()

Probe Deep Dive

Mathematics of the sampling phase