Benchmark Methodology
This page documents how the benchmarks were conducted so that the results can be reproduced.
Hardware
| Component | Specification |
|---|---|
| CPU | AMD Ryzen 9 5950X (16 cores, 32 threads) |
| RAM | 64 GB DDR4-3600 |
| Storage | NVMe SSD (Samsung 980 Pro) |
| OS | Ubuntu 22.04 LTS |
Software Versions
| Software | Version |
|---|---|
| Rust | 1.82.0 |
| Python | 3.12.0 |
| ArqonHPO | 0.3.0 |
| Optuna | 3.5.0 |
| NumPy | 1.26.0 |
Benchmark Configuration
Time-Bounded Benchmarks
- Budget: 5 seconds wall-clock time
- Warmup: 1 second (excluded from measurement)
- Iterations: 5 runs, median reported
- RNG Seed: 42 (deterministic)
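The configuration above can be sketched as a small harness. This is a minimal illustration, not the project's actual benchmark code: the function names `run_time_bounded` and `median_of_runs` are hypothetical, a plain random search stands in for the optimizer under test, and the search range of [-5, 5] is an assumption that the source does not specify.

```python
import random
import statistics
import time


def run_time_bounded(objective, dim, budget_s=5.0, warmup_s=1.0, seed=42):
    """Run one time-bounded benchmark; return (trials_completed, best_value).

    Random search is a stand-in for the optimizer being measured.
    """
    rng = random.Random(seed)  # fixed seed, matching the configuration above

    def sample():
        # Assumed search range; the real bounds are not documented here.
        return [rng.uniform(-5.0, 5.0) for _ in range(dim)]

    # Warmup phase: evaluate, but exclude from counts and timing.
    warmup_end = time.perf_counter() + warmup_s
    while time.perf_counter() < warmup_end:
        objective(sample())

    # Measured phase: count trials completed within the wall-clock budget.
    trials, best = 0, float("inf")
    deadline = time.perf_counter() + budget_s
    while time.perf_counter() < deadline:
        best = min(best, objective(sample()))
        trials += 1
    return trials, best


def median_of_runs(objective, dim, runs=5, **kwargs):
    """Repeat the run and report the median trial count and median best value."""
    results = [run_time_bounded(objective, dim, **kwargs) for _ in range(runs)]
    median_trials = statistics.median(t for t, _ in results)
    median_best = statistics.median(b for _, b in results)
    return median_trials, median_best
```

Using `time.perf_counter()` rather than `time.time()` keeps the budget tied to a monotonic clock, so system clock adjustments cannot shorten or lengthen a run.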
Test Functions
| Function | Dimensions | Characteristics |
|---|---|---|
| Sphere | 10 | Smooth, unimodal |
| Rosenbrock | 10 | Smooth, narrow valley |
| Rastrigin | 10 | Highly multimodal, regularly spaced local minima |
| Ackley | 10 | Multimodal, nearly flat outer region with a deep central basin |
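For reference, the four functions can be written in a few lines each. The sketch below uses the common textbook definitions in pure Python; it is an assumption that the benchmark suite uses these exact forms and constants, so check the repository before comparing values directly.

```python
import math


def sphere(x):
    """Sum of squares; global minimum 0 at the origin."""
    return sum(v * v for v in x)


def rosenbrock(x):
    """Narrow curved valley; global minimum 0 at (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rastrigin(x):
    """Cosine-modulated bowl with many local minima; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(
        v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x
    )


def ackley(x):
    """Nearly flat outer region with a deep central basin; global minimum 0 at the origin."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return (
        -20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
        - math.exp(mean_cos)
        + 20.0
        + math.e
    )
```

All four are deterministic and essentially free to evaluate at 10 dimensions, which is why throughput dominates the time-bounded results.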
Measurement Protocol
- Isolation: Benchmarks run on dedicated hardware with no other workloads running
- CPU Governor: Set to `performance` mode
- Hyperthreading: Enabled
- Metrics:
  - Trials completed per second
  - Time to best value
  - Final objective value
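The three metrics can be derived from a per-trial log. The sketch below assumes a hypothetical log format, a list of `(elapsed_seconds, objective_value)` pairs in completion order, and assumes minimization. It also interprets "time to best value" as the elapsed time at which the final best value was first reached, which the source does not define explicitly.

```python
def summarize_run(trials, budget_s):
    """Compute the three reported metrics from one run's trial log.

    trials: list of (elapsed_seconds, objective_value), in completion order.
    budget_s: measured wall-clock budget in seconds (warmup excluded).
    """
    if not trials:
        raise ValueError("no trials recorded")

    best_value = float("inf")
    time_to_best = 0.0
    for elapsed, value in trials:
        # Strict comparison keeps the first time the best value was reached.
        if value < best_value:
            best_value, time_to_best = value, elapsed

    return {
        "trials_per_second": len(trials) / budget_s,
        "time_to_best_s": time_to_best,
        "final_value": best_value,
    }
```

For example, `summarize_run([(0.1, 5.0), (0.2, 3.0), (0.3, 4.0)], budget_s=5.0)` reports the best value 3.0 as reached at 0.2 seconds.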
Reproduction Steps
```bash
# Clone and build
git clone https://github.com/novelbytelabs/ArqonHPO.git
cd ArqonHPO
cargo build --release
# Run benchmarks
cargo bench --bench optimizer_comparison
# Compare with Optuna
python benchmarks/optuna_comparison.py
```
Data Files
Raw benchmark data is available in `docs/docs/benchmarks/benchmark_data.csv`.
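A quick way to inspect the raw data is to read it with Python's standard `csv` module. The column layout is not documented on this page, so the sketch below only assumes that the file has a header row and makes no assumption about column names; `load_rows` is a hypothetical helper.

```python
import csv


def load_rows(path):
    """Return (column_names, rows) from a CSV file with a header row.

    Each row is a dict mapping column name to the raw string value.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    columns, rows = load_rows("docs/docs/benchmarks/benchmark_data.csv")
    print("columns:", columns)
    print("rows:", len(rows))
```

Values are returned as strings; convert numeric columns explicitly once the actual schema is known.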
Limitations
- Single machine: Results may vary on different hardware
- Python GIL: Optuna measurements include Python overhead
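- Function cost: Benchmarks use objective functions that evaluate almost instantly ($\approx 0$ ms per evaluation)
For expensive functions (>1 s per evaluation), evaluation time dominates the budget, so Optuna's more sample-efficient sampling may outweigh ArqonHPO's advantage in trial volume.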
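A simple throughput model shows why evaluation cost matters. If each trial costs a fixed optimizer overhead plus the evaluation time, the number of trials in a budget is roughly budget / (overhead + evaluation cost). The sketch below uses purely hypothetical overhead figures chosen for illustration; they are not measurements of ArqonHPO or Optuna.

```python
def trials_in_budget(budget_s, overhead_s, eval_cost_s):
    """Approximate trials completed when each trial costs overhead + evaluation."""
    return budget_s / (overhead_s + eval_cost_s)


def throughput_ratio(fast_overhead_s, slow_overhead_s, eval_cost_s):
    """How many times more trials the lower-overhead optimizer completes."""
    return (slow_overhead_s + eval_cost_s) / (fast_overhead_s + eval_cost_s)


if __name__ == "__main__":
    # Hypothetical per-trial overheads, for illustration only.
    fast, slow = 1e-5, 1e-3
    for eval_cost in (0.0, 0.001, 0.1, 1.0):
        ratio = throughput_ratio(fast, slow, eval_cost)
        print(f"eval cost {eval_cost:>6} s -> throughput ratio {ratio:.3f}")
```

Under this model the ratio shrinks toward 1 as evaluation cost grows, so with evaluations of a second or more both optimizers complete nearly the same number of trials and sampling quality decides the outcome.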