benchmark_comparison_pipeline

Multi-run benchmark comparison and analysis pipeline.

Provides tools for comparing results across multiple benchmark runs, supporting systematic evaluation of model improvements and parameter-tuning effects, as well as cross-validation analysis. Results can be aggregated at the global, group, and individual-target levels.
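To make the three aggregation levels concrete, here is a minimal sketch in plain Python. It does not use this module's API: the record layout `(run, group, target, score)`, the run and group names, and the helpers `aggregate` and `compare` are all hypothetical, chosen only to illustrate the idea of comparing two runs globally, per group, and per target.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (run name, group name, target name, score).
RECORDS = [
    ("baseline", "group_a", "T1", 0.60),
    ("baseline", "group_a", "T2", 0.70),
    ("baseline", "group_b", "T3", 0.50),
    ("tuned", "group_a", "T1", 0.66),
    ("tuned", "group_a", "T2", 0.68),
    ("tuned", "group_b", "T3", 0.62),
]

# Position of the grouping key in each record; None means "pool everything".
LEVEL_INDEX = {"global": None, "group": 1, "target": 2}


def aggregate(records, level):
    """Return the mean score for each (key, run) pair at the given level."""
    key_index = LEVEL_INDEX[level]
    buckets = defaultdict(list)
    for record in records:
        run, score = record[0], record[3]
        key = "all" if key_index is None else record[key_index]
        buckets[(key, run)].append(score)
    return {pair: mean(scores) for pair, scores in buckets.items()}


def compare(records, level, base_run, new_run):
    """Return new_run minus base_run for every key present in both runs."""
    means = aggregate(records, level)
    keys = sorted({key for key, _ in means})
    return {
        key: means[(key, new_run)] - means[(key, base_run)]
        for key in keys
        if (key, base_run) in means and (key, new_run) in means
    }


if __name__ == "__main__":
    for level in ("global", "group", "target"):
        print(level, compare(RECORDS, level, "baseline", "tuned"))
```

Reading the three levels side by side is the point of the exercise: in this made-up data the global mean improves, while the target-level view shows that one target (`T2`) got slightly worse.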

The comparison pipeline operates on existing benchmark results, allowing retrospective analysis without re-running expensive computations.
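As an illustration of retrospective analysis, the sketch below compares two runs purely from files already on disk. The storage format is an assumption made for this example (one JSON file per run, mapping target name to score), and `load_runs` and `score_deltas` are hypothetical helpers, not functions of this module.

```python
import json
import tempfile
from pathlib import Path


def load_runs(results_dir):
    """Load every *.json file in results_dir as one run: {target: score}.

    Assumed layout: the file stem is the run name.
    """
    runs = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as handle:
            runs[path.stem] = json.load(handle)
    return runs


def score_deltas(runs, base_run, new_run):
    """Return new minus base score for targets present in both runs."""
    base, new = runs[base_run], runs[new_run]
    shared = sorted(base.keys() & new.keys())
    return {target: new[target] - base[target] for target in shared}


if __name__ == "__main__":
    # Stand-in for results saved by earlier benchmark runs.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "baseline.json").write_text(json.dumps({"T1": 0.5, "T2": 0.75}))
        Path(tmp, "tuned.json").write_text(json.dumps({"T1": 0.75, "T3": 0.9}))
        runs = load_runs(tmp)
        print(score_deltas(runs, "baseline", "tuned"))
```

Restricting the comparison to targets shared by both runs is a design choice of this sketch; it avoids treating a target that was simply absent from one run as a regression.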