Optimization Strategies¶
Different landscapes require different navigators.
A smooth, convex valley is best descended by a geometric solver. A noisy, chaotic mountain range requires a probabilistic explorer. ArqonHPO does not force you to choose one or the other—it chooses for you, dynamically.
The PCR (Probe-Classify-Refine) algorithm continuously analyzes the topology of your objective function. It calculates the fractal dimension of the response surface and automatically hot-swaps the underlying strategy to match the terrain. This allows ArqonHPO to transition seamlessly from global exploration to local polishing without human intervention.
Strategy Overview¶
| Strategy | Best For | Characteristics |
|---|---|---|
| Nelder-Mead | Smooth, unimodal functions | Fast convergence, low overhead, requires structure |
| TPE | Noisy, expensive evaluations | Robust to noise, handles multimodal landscapes |
Nelder-Mead Simplex¶
The Nelder-Mead algorithm is a derivative-free optimizer that maintains a simplex of N+1 points in N dimensions.
Algorithm Overview¶
Nelder-Mead iteratively transforms a simplex (a geometric shape with N+1 vertices in N dimensions) to find the minimum. The algorithm uses four operations:
- Reflection: Move the worst point through the centroid of the remaining points
- Expansion: If reflection finds a good point, extend further in that direction
- Contraction: If reflection fails, pull the worst point toward the centroid
- Shrink: If contraction fails, shrink the entire simplex toward the best point
Each iteration replaces the worst point with a better one, gradually moving the simplex toward the optimum.
Default Coefficients¶
ArqonHPO uses standard Nelder-Mead coefficients (verified from nelder_mead.rs):
| Coefficient | Value | Purpose |
|---|---|---|
alpha (α) | 1.0 | Reflection coefficient |
gamma (γ) | 2.0 | Expansion coefficient |
rho (ρ) | 0.5 | Contraction coefficient |
sigma (σ) | 0.5 | Shrink coefficient |
Convergence Criteria¶
The algorithm considers itself converged when the simplex diameter falls below a tolerance threshold—indicating that all points have collapsed to essentially the same location.
When Nelder-Mead Works Well¶
- ✅ Smooth, continuous objective functions
- ✅ Unimodal landscapes (single minimum)
- ✅ Continuous parameters
- ✅ Low to moderate dimensionality
- ✅ Deterministic evaluations (no noise)
When Nelder-Mead Struggles¶
- ❌ Noisy evaluations (can get confused by measurement noise)
- ❌ Multimodal landscapes (may converge to local minima)
- ❌ High-dimensional spaces (simplex becomes inefficient)
- ❌ Discontinuous or non-smooth functions
Warm-Starting from Probe¶
When PCR classifies a problem as "Structured," Nelder-Mead is initialized using the best points from the Probe phase. This creates an initial simplex centered on promising regions rather than starting blindly.
Tree-structured Parzen Estimator (TPE)¶
TPE is a Bayesian optimization algorithm that models the relationship between parameters and objective values probabilistically.
Algorithm Overview¶
Unlike traditional Bayesian optimization that models P(y|x), TPE models P(x|y)—the probability of parameters given "good" or "bad" outcomes.
- Split observations: Divide all evaluated points into "good" (below a threshold) and "bad" (above threshold) groups based on objective values
- Fit density estimators: Build kernel density estimates (KDEs) for both groups
- Sample candidates: Draw samples that maximize the ratio l(x)/g(x), where l(x) is the "good" density and g(x) is the "bad" density
- Evaluate and update: Evaluate the best candidates and add results to history
Bandwidth Selection¶
TPE uses kernel density estimation, which requires choosing an appropriate bandwidth (smoothing parameter). ArqonHPO supports multiple bandwidth rules (verified from tpe.rs):
| Rule | Formula | Use Case |
|---|---|---|
| Scott's Rule (default) | σ = 1.06 × stddev × n^(-1/5) | Standard choice, adapts to distribution |
| Silverman's Rule | σ = 0.9 × min(stddev, IQR/1.34) × n^(-1/5) | More robust to outliers |
| Fixed | Percentage of range | Legacy behavior |
When TPE Works Well¶
- ✅ Noisy objective functions (measurement noise, stochastic training)
- ✅ Expensive evaluations (makes efficient use of samples)
- ✅ Multimodal landscapes (explores multiple regions)
- ✅ Mixed or categorical parameters (handles discrete choices)
When TPE Is Less Efficient¶
- ⚠️ Higher per-candidate computational overhead
- ⚠️ Slower convergence on smooth, well-behaved functions
- ⚠️ Small sample sizes (needs sufficient data to build good density models)
Online Optimization¶
TPE is the strategy used by ask_one() for online/real-time optimization. Its incremental nature makes it well-suited for streaming scenarios where evaluations arrive one at a time.
Automatic Strategy Selection¶
ArqonHPO's PCR algorithm automatically selects the appropriate strategy based on landscape analysis:
graph TD
A[Probe Phase] --> B[Classify Phase]
B --> C{Residual Decay α}
C -->|α > 0.5| D[Nelder-Mead]
C -->|α ≤ 0.5| E[TPE]
D --> F[Refine Phase]
E --> F Classification Logic¶
The ResidualDecayClassifier examines how objective values improve across the probe samples. Functions with exploitable structure show geometric decay in residuals, while noisy or multimodal functions show irregular patterns.
| Signal | Interpretation | Strategy |
|---|---|---|
| High α (> 0.5) | Geometric residual decay | Nelder-Mead |
| Low α (≤ 0.5) | Flat or irregular residuals | TPE |
For details on the classification algorithm, see PCR Algorithm.
Multi-Start Nelder-Mead¶
Not Currently Used by PCR
Multi-Start NM is available in the codebase but is not currently selected by the PCR classifier. It remains available for potential future use or manual configuration.
For multimodal functions with multiple local minima, Multi-Start NM launches several simplex searches from different starting points.
How It Works¶
- Probe phase samples the landscape
- Identify multiple promising regions
- Launch independent NM searches per region
- Return the best result across all runs
This approach is more expensive than single NM but can escape local minima.
Strategy Comparison¶
| Aspect | Nelder-Mead | TPE |
|---|---|---|
| Mechanism | Geometric simplex operations | Probabilistic density modeling |
| Per-step overhead | Minimal | Higher (density estimation) |
| Sample efficiency on smooth functions | Excellent | Good |
| Noise tolerance | Poor | Excellent |
| Incremental/online use | Difficult (needs full simplex) | Natural fit |
| Warm-start support | Yes (from probe points) | Yes (from all history) |
Next Steps¶
How Probe-Classify-Refine works
When to use ask() vs. ask_one()
Mathematics of the sampling phase