Cookbook: ML Model Tuning
Tune sklearn or PyTorch hyperparameters with TPE.
The PCR Algorithm for ML
ArqonHPO is known for fast simulation tuning, but it also excels at ML tuning through the PCR (Probe-Classify-Refine) algorithm:
- Probe: Scans the hyperparameter space.
- Classify: ResidualDecayClassifier detects that ML loss surfaces are chaotic/noisy (slow residual decay, $\alpha \le 0.5$).
- Refine: Automatically selects TPE (Tree-structured Parzen Estimator) instead of Nelder-Mead.
When probe samples show flat or irregular residual patterns (no geometric decay), ArqonHPO classifies the landscape as Chaotic and selects TPE:
- α ≤ 0.5 → Many local optima, noisy evaluations → TPE
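- TPE models the "good" observations with a density l(x) and the "bad" observations with a density g(x), both built by kernel density estimation
- New samples are chosen to maximize Expected Improvement (EI), which in TPE amounts to favoring points where the ratio l(x)/g(x) is high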
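The l(x)/g(x) idea above can be sketched in a few lines of NumPy. This is a minimal one-dimensional illustration, not ArqonHPO's implementation: the helper names `gaussian_kde_pdf` and `tpe_suggest`, the `gamma` split fraction, the candidate count, and the fixed bandwidth are all assumptions made for the example.

```python
import numpy as np


def gaussian_kde_pdf(x, centers, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate at the points x."""
    x = np.asarray(x, dtype=float)[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    z = (x - centers) / bandwidth
    norm = centers.shape[1] * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def tpe_suggest(xs, losses, low, high, rng, gamma=0.25, n_candidates=64):
    """Suggest one new point from past (x, loss) pairs; lower loss is better.

    Needs more observations than the "good" group holds, so that the
    "bad" group is not empty.
    """
    xs = np.asarray(xs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(losses)
    n_good = max(2, int(np.ceil(gamma * len(xs))))
    good, bad = xs[order[:n_good]], xs[order[n_good:]]

    bandwidth = 0.1 * (high - low)  # fixed bandwidth, chosen only for brevity

    # Sample candidates from l(x): pick a "good" point, then add kernel noise.
    candidates = rng.choice(good, size=n_candidates)
    candidates = candidates + rng.normal(0.0, bandwidth, size=n_candidates)
    candidates = np.clip(candidates, low, high)

    # Rank candidates by l(x) / g(x); the highest ratio is the suggestion.
    l = gaussian_kde_pdf(candidates, good, bandwidth)
    g = gaussian_kde_pdf(candidates, bad, bandwidth)
    return float(candidates[np.argmax(l / (g + 1e-12))])


# Toy usage: a noisy 1-D objective with its minimum near x = 2.
rng = np.random.default_rng(0)
low, high = -5.0, 5.0
xs = list(rng.uniform(low, high, size=10))
losses = [(x - 2.0) ** 2 + rng.normal(0.0, 0.5) for x in xs]
for _ in range(30):
    x_new = tpe_suggest(xs, losses, low, high, rng)
    xs.append(x_new)
    losses.append((x_new - 2.0) ** 2 + rng.normal(0.0, 0.5))
print(f"best observed x: {xs[int(np.argmin(losses))]:.2f}")
```

Because the suggestion depends only on how observations rank into "good" and "bad" groups, a single noisy evaluation shifts the densities slightly rather than redirecting the search, which is one reason this style of model suits noisy loss surfaces.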
Example: Sklearn RandomForest
import json

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

from arqonhpo import ArqonSolver

# Data
X, y = load_iris(return_X_y=True)


def objective(params):
    # Cast to int: sklearn expects integer values for both hyperparameters
    clf = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=42,
    )
    # Mean 3-fold cross-validation accuracy (higher is better)
    score = cross_val_score(clf, X, y, cv=3).mean()
    return -score  # Negate so the solver minimizes negative accuracy


# Config
config = {
    "seed": 42,
    "budget": 100,
    "probe_ratio": 0.3,  # More probing to detect noise
    "bounds": {
        "n_estimators": {"min": 10, "max": 200},
        "max_depth": {"min": 2, "max": 20},
    },
}

solver = ArqonSolver(json.dumps(config))
best = {"value": float("inf")}

# Ask/tell loop: request a batch, evaluate it, report the results
while True:
    batch = solver.ask()
    if batch is None:  # Budget exhausted
        break

    results = []
    for params in batch:
        loss = objective(params)
        if loss < best["value"]:
            best = {"params": params, "value": loss}
        results.append({
            "eval_id": 0,
            "params": params,
            "value": loss,
            "cost": 0.5,
        })
    solver.tell(json.dumps(results))

print(f"Best: n_estimators={int(best['params']['n_estimators'])}, max_depth={int(best['params']['max_depth'])}")
print(f"Accuracy: {-best['value']:.4f}")
# Example output:
# Best: n_estimators=120, max_depth=8
# Accuracy: 0.9667
Why TPE?
TPE builds probabilistic models of "good" and "bad" regions of the hyperparameter space, making it robust to noise and efficient at exploration.
Scott's Rule Bandwidth
ArqonHPO uses Scott's Rule for adaptive kernel bandwidth in TPE:
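The formula is not reproduced in this section. A common one-dimensional statement of the rule, assumed here rather than taken from ArqonHPO's source, sets the kernel bandwidth $h$ from the sample standard deviation $\hat{\sigma}$ and the number of samples $n$:

$$
h = 1.06\,\hat{\sigma}\,n^{-1/5}
$$

The constant 1.06 is the usual normal-reference value; some libraries drop it and use $\hat{\sigma}\,n^{-1/5}$ directly, so the exact factor ArqonHPO applies should be checked against its documentation.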
This provides:
- Automatic adaptation: Bandwidth shrinks as more samples are collected
- Data-driven scaling: Uses sample standard deviation, not fixed percentages
- Asymptotic optimality: Minimizes the asymptotic mean integrated squared error for Gaussian kernels when the underlying density is itself close to Gaussian
Compared to fixed bandwidth (e.g., 10% of range):
| Method | Pros | Cons |
|---|---|---|
| Scott's Rule | Adapts to data distribution, optimal for smooth densities | May under-smooth in tails |
| Fixed 10% | Simple, predictable | Ignores data structure, often suboptimal |
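To make the comparison concrete, the sketch below computes both bandwidths for growing sample sizes. It assumes the common one-dimensional form of the rule, 1.06 × sample standard deviation × n^(-1/5); the function names `scott_bandwidth` and `fixed_bandwidth` are hypothetical and are not part of ArqonHPO's API.

```python
import numpy as np


def scott_bandwidth(samples):
    """Rule-of-thumb bandwidth: 1.06 * sample std * n^(-1/5) (assumed form)."""
    samples = np.asarray(samples, dtype=float)
    sigma = samples.std(ddof=1)  # sample standard deviation
    return 1.06 * sigma * samples.size ** (-1.0 / 5.0)


def fixed_bandwidth(low, high, fraction=0.10):
    """Fixed bandwidth as a fraction of the parameter range."""
    return fraction * (high - low)


# Illustrative data: "good" n_estimators values clustered around 120
# inside the [10, 200] range used in the example config.
rng = np.random.default_rng(0)
low, high = 10.0, 200.0
for n in (10, 100, 1000):
    samples = np.clip(rng.normal(120.0, 15.0, size=n), low, high)
    print(
        f"n={n:4d}  scott={scott_bandwidth(samples):6.2f}  "
        f"fixed={fixed_bandwidth(low, high):6.2f}"
    )
```

The fixed bandwidth stays at 10% of the range no matter how many samples arrive or how tightly they cluster, while the data-driven bandwidth follows the spread of the samples and narrows as n grows.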
ArqonHPO defaults to Scott's Rule but supports alternatives via BandwidthRule: