Kaira Benchmarking System
The Kaira benchmarking system provides standardized benchmarks for evaluating communication system components and deep learning models. This system enables fair comparison of different approaches and reproducible performance evaluation.
Overview
The benchmarking system consists of:
Base classes for creating custom benchmarks
Standard benchmarks for common communication tasks
Metrics for evaluating performance
Runners for executing benchmarks in different modes
Configuration management for reproducible experiments
CLI tool for command-line usage
Quick Start
Basic usage with the new organized results system:
from kaira.benchmarks import get_benchmark, StandardRunner, BenchmarkConfig
# Create a benchmark
ber_benchmark = get_benchmark("ber_simulation")(modulation="bpsk")
# Configure the benchmark
config = BenchmarkConfig(
    snr_range=list(range(-5, 11)),
    num_bits=100000
)
# Run the benchmark with automatic result organization
runner = StandardRunner()
result = runner.run_benchmark(ber_benchmark, **config.to_dict())
# Results are automatically saved to organized directory structure
print(f"BER results: {result.metrics['ber_simulated']}")
# Save all collected results under a named experiment and get the saved file paths
saved_files = runner.save_all_results(experiment_name="ber_evaluation")
print(f"Results saved to: {saved_files}")
Traditional usage (still supported):
# Manual result saving
result.save("benchmark_result.json")
Available Benchmarks
Standard Communication Benchmarks
ber_simulation: Bit Error Rate simulation for various modulation schemes
channel_capacity: Shannon channel capacity calculations
throughput_test: System throughput evaluation
latency_test: System latency measurement
model_complexity: Model computational complexity analysis
Custom Benchmarks
You can create custom benchmarks by inheriting from BaseBenchmark:
from kaira.benchmarks import BaseBenchmark, register_benchmark
@register_benchmark("my_benchmark")
class MyBenchmark(BaseBenchmark):
    def setup(self, **kwargs):
        super().setup(**kwargs)
        # Initialize benchmark state here
    def run(self, **kwargs):
        # Run the benchmark and return its metrics as a dictionary
        return {"success": True, "metric_value": 42}
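The name-based lookup used by register_benchmark and get_benchmark can be pictured as a decorator that stores classes in a dictionary. The following is a minimal standalone sketch of that pattern, not Kaira's actual implementation; the names _REGISTRY, register, get, and ToyBenchmark are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a benchmark registry; not Kaira's actual code.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the decorated class under `name`."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"benchmark {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls  # hand the class back unchanged
    return decorator


def get(name: str) -> Optional[type]:
    """Look up a registered class; returns None if the name is unknown."""
    return _REGISTRY.get(name)


@register("toy_benchmark")
class ToyBenchmark:
    """Toy benchmark that only mimics the shape of a run() result."""

    def __init__(self, scale: int = 1) -> None:
        self.scale = scale

    def run(self) -> dict:
        return {"success": True, "metric_value": 42 * self.scale}


# Retrieve the class by name, instantiate it, and run it.
benchmark_cls = get("toy_benchmark")
result = benchmark_cls(scale=2).run()
print(result)
```

Because the lookup returns the class rather than an instance, constructor arguments are supplied at the call site, which mirrors the get_benchmark("ber_simulation")(modulation="bpsk") form used in the Quick Start.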
Configuration
Predefined Configurations
fast: Quick testing configuration
accurate: High-accuracy configuration for publication results
comprehensive: Full evaluation with all metrics
gpu: GPU-optimized configuration
minimal: Minimal configuration for CI/CD
Custom Configuration:
config = BenchmarkConfig(
    name="my_config",
    num_trials=10,
    snr_range=list(range(-10, 16)),
    device="cuda",
    verbose=True
)
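Configurations are passed to runners through to_dict() and can be rebuilt with from_dict() or from_json(). The round trip can be sketched with a small dataclass. ToyConfig below is a hypothetical miniature, not Kaira's BenchmarkConfig, and its to_json helper is an assumption added for the demonstration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ToyConfig:
    """Hypothetical miniature of a benchmark configuration."""

    name: str = "default"
    seed: int = 42
    num_trials: int = 1
    snr_range: List[float] = field(default_factory=lambda: [0.0, 5.0, 10.0])

    def to_dict(self) -> dict:
        # asdict() copies the fields into a plain dictionary.
        return asdict(self)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(self.to_dict(), sort_keys=True)

    @classmethod
    def from_dict(cls, data: dict) -> "ToyConfig":
        return cls(**data)

    @classmethod
    def from_json(cls, text: str) -> "ToyConfig":
        return cls.from_dict(json.loads(text))


original = ToyConfig(name="my_config", num_trials=10, snr_range=[-10.0, 0.0, 15.0])
restored = ToyConfig.from_json(original.to_json())
print(restored == original)
```

Storing the serialized configuration next to the results is one way to make an experiment repeatable later.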
Benchmark Execution
Sequential Execution:
runner = StandardRunner(verbose=True)
result = runner.run_benchmark(benchmark, **config.to_dict())
Parallel Execution:
runner = ParallelRunner(max_workers=4)
results = runner.run_benchmarks(benchmarks, **config.to_dict())
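The API reference describes ParallelRunner as a thread-pool runner. A minimal sketch of that idea using only the standard library is shown here; run_in_parallel and make_task are hypothetical helpers, and whether ParallelRunner returns results in input order is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_parallel(tasks: List[Callable[[], Dict]], max_workers: int = 4) -> List[Dict]:
    """Run zero-argument callables on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Collecting in submission order keeps results aligned with `tasks`.
        return [future.result() for future in futures]


def make_task(name: str, value: int) -> Callable[[], Dict]:
    """Build a toy 'benchmark' that returns a small metrics dictionary."""
    def task() -> Dict:
        return {"name": name, "metric_value": value * value}
    return task


results = run_in_parallel([make_task("a", 2), make_task("b", 3), make_task("c", 4)])
print(results)
```

Threads help most when the work releases the interpreter lock (I/O or native tensor operations); pure-Python computation gains little from this approach.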
Benchmark Suites:
suite = BenchmarkSuite("My Suite")
suite.add_benchmark(benchmark1)
suite.add_benchmark(benchmark2)
results = runner.run_suite(suite, **config.to_dict())
Comparison and Analysis:
runner = ComparisonRunner()
results = runner.run_comparison(
    [benchmark1, benchmark2],
    "Algorithm Comparison",
    **config.to_dict()
)
Metrics and Analysis
Standard Metrics
The StandardMetrics class provides common communication system metrics:
Bit Error Rate (BER)
Block Error Rate (BLER)
Signal-to-Noise Ratio (SNR)
Mutual Information
Throughput
Latency statistics
Channel capacity
Confidence intervals
Example:
from kaira.benchmarks import StandardMetrics
ber = StandardMetrics.bit_error_rate(transmitted, received)
snr = StandardMetrics.signal_to_noise_ratio(signal, noise)
capacity = StandardMetrics.channel_capacity(snr_db=10.0)
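For reference, the textbook definitions behind three of these metrics can be written in plain Python. This is a sketch of the standard formulas on integer sequences, not Kaira's tensor-based implementation; the function names below are hypothetical, and the capacity is expressed in bits per second for a bandwidth in hertz, which may differ from the units StandardMetrics uses.

```python
import math
from typing import Sequence


def bit_error_rate(transmitted: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(transmitted) != len(received) or not transmitted:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(t != r for t, r in zip(transmitted, received))
    return errors / len(transmitted)


def block_error_rate(transmitted: Sequence[int], received: Sequence[int], block_size: int) -> float:
    """Fraction of blocks that contain at least one bit error."""
    if len(transmitted) != len(received) or not transmitted:
        raise ValueError("sequences must be non-empty and of equal length")
    if block_size <= 0 or len(transmitted) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    num_blocks = len(transmitted) // block_size
    bad_blocks = 0
    for start in range(0, len(transmitted), block_size):
        end = start + block_size
        if list(transmitted[start:end]) != list(received[start:end]):
            bad_blocks += 1
    return bad_blocks / num_blocks


def shannon_capacity(snr_db: float, bandwidth: float = 1.0) -> float:
    """Shannon capacity C = B * log2(1 + SNR), with SNR converted from dB."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth * math.log2(1.0 + snr_linear)


sent = [0, 1, 1, 0]
seen = [0, 1, 0, 0]
print(bit_error_rate(sent, seen))       # one of four bits differs
print(block_error_rate(sent, seen, 2))  # one of two blocks has an error
print(shannon_capacity(snr_db=10.0))    # log2(1 + 10)
```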
Results Management
Kaira provides an organized results management system that automatically structures benchmark results in a clean directory hierarchy.
Results Directory Structure
The benchmark system creates the following directory structure:
results/
├── benchmarks/      # Individual benchmark results
│   ├── experiment_name/
│   └── benchmark_files.json
├── suites/          # Benchmark suite results
│   ├── suite_name/
│   └── summary.json
├── experiments/     # Experimental runs
├── comparisons/     # Comparative studies
├── archives/        # Archived old results
├── configs/         # Configuration files
├── logs/            # Execution logs
└── summaries/       # Summary reports
Using the Results Manager
The new results management system provides automated organization:
from kaira.benchmarks import StandardRunner, BenchmarkResultsManager
# Create a results manager rooted at 'my_results/' (the default base directory is 'results/')
results_manager = BenchmarkResultsManager("my_results")
# Create a runner with the results manager
runner = StandardRunner(results_manager=results_manager)
# Run benchmarks - results are automatically saved and organized
result = runner.run_benchmark(benchmark, experiment_name="my_experiment")
# Results are automatically saved to:
# my_results/benchmarks/my_experiment/benchmark_name_timestamp_id.json
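The path pattern in the comment above (base directory, category, experiment name, then a file named from the benchmark name, a timestamp, and an ID) can be sketched as follows. build_result_path is a hypothetical helper; the timestamp format and the 8-character ID are assumptions and may not match what BenchmarkResultsManager produces.

```python
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_result_path(
    base_dir: str,
    category: str,
    experiment_name: str,
    benchmark_name: str,
    timestamp: Optional[datetime] = None,
    short_id: Optional[str] = None,
) -> Path:
    """Compose base/category/experiment/name_timestamp_id.json."""
    timestamp = timestamp or datetime.now()
    short_id = short_id or uuid.uuid4().hex[:8]
    stamp = timestamp.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    filename = f"{benchmark_name}_{stamp}_{short_id}.json"
    return Path(base_dir) / category / experiment_name / filename


# Fixed timestamp and ID so the example is reproducible.
example_path = build_result_path(
    "my_results",
    "benchmarks",
    "my_experiment",
    "ber_simulation",
    timestamp=datetime(2024, 1, 2, 3, 4, 5),
    short_id="abc12345",
)
print(example_path.as_posix())
```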
Manual Results Management
You can also manage results manually:
# Save individual result with automatic organization
results_manager = BenchmarkResultsManager()
saved_path = results_manager.save_benchmark_result(
    result,
    category="benchmarks",
    experiment_name="my_experiment"
)
# Save suite results
saved_files = results_manager.save_suite_results(
    results_list,
    suite_name="performance_suite",
    experiment_name="my_experiment"
)
# List available results
all_results = results_manager.list_results()
experiment_results = results_manager.list_results(
    category="benchmarks",
    experiment_name="my_experiment"
)
# Load results
result = results_manager.load_benchmark_result(result_path)
Loading and Analysis
# Load results using the results manager
results_manager = BenchmarkResultsManager()
result_paths = results_manager.list_results(category="benchmarks")
for path in result_paths:
    result = results_manager.load_benchmark_result(path)
    print(f"Result: {result.name}, Time: {result.execution_time:.2f}s")
# Create comparison reports
comparison_path = results_manager.create_comparison_report(
    result_paths[:3],
    "algorithm_comparison"
)
Results Maintenance
The system includes maintenance features for long-term management:
# Archive old results (older than 30 days)
results_manager.archive_old_results(days_old=30)
# Clean up empty directories
results_manager.cleanup_empty_directories()
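Age-based archiving of this kind can be sketched with the standard library by comparing each file's modification time against a cutoff. archive_old_files is a hypothetical helper, not the code behind archive_old_results; using modification time as the age criterion and mirroring the relative path under archives/ are both assumptions.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List


def archive_old_files(base_dir: Path, days_old: int = 30) -> List[Path]:
    """Move *.json files older than `days_old` days into base_dir/archives."""
    cutoff = time.time() - days_old * 86400
    archive_dir = base_dir / "archives"
    moved: List[Path] = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(base_dir.rglob("*.json")):
        if archive_dir in path.parents:
            continue  # already archived
        if path.stat().st_mtime < cutoff:
            target = archive_dir / path.relative_to(base_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


# Demonstration in a temporary directory with one old and one recent file.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "benchmarks").mkdir()
    old_file = base / "benchmarks" / "old.json"
    new_file = base / "benchmarks" / "new.json"
    old_file.write_text("{}")
    new_file.write_text("{}")
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old_file, (forty_days_ago, forty_days_ago))  # backdate the file
    moved = archive_old_files(base, days_old=30)
    moved_names = [p.relative_to(base).as_posix() for p in moved]
    new_still_there = new_file.exists()

print(moved_names, new_still_there)
```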
Command Line Interface
The kaira-benchmark CLI tool provides easy access to benchmarks:
# List available benchmarks
kaira-benchmark --list
# Run a single benchmark
kaira-benchmark --benchmark ber_simulation --config fast
# Run multiple benchmarks in parallel
kaira-benchmark --benchmark ber_simulation throughput_test --parallel
# Run benchmark suite
kaira-benchmark --suite --config comprehensive --output ./results
# Custom parameters
kaira-benchmark --benchmark ber_simulation --snr-range -5 10 --num-bits 50000
Best Practices
Use appropriate configurations for your use case (fast for development, accurate for publications)
Set random seeds for reproducible results:
config = BenchmarkConfig(seed=42)
Save raw data for important experiments:
config = BenchmarkConfig(save_raw_data=True)
Use confidence intervals for statistical analysis:
config = BenchmarkConfig(
    calculate_confidence_intervals=True,
    confidence_level=0.95
)
Monitor memory usage for large experiments:
config = BenchmarkConfig(memory_limit_mb=8192)
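To make the confidence-interval practice concrete, the sketch below computes an interval for the mean of repeated trial results. It uses a normal approximation from the standard library, which is a simplification chosen for this example; Kaira's confidence_interval may use a different method (for instance a t-distribution), and mean_confidence_interval is a hypothetical name.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_confidence_interval(samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the sample mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean + z * std_err


trial_values = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = mean_confidence_interval(trial_values, confidence=0.95)
print(f"mean is within [{low:.3f}, {high:.3f}] at the 95% level")
```

With only a handful of trials the normal approximation understates the interval width, so more trials (num_trials) give a more trustworthy interval.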
Examples
See the examples/benchmarks/ directory for comprehensive examples:
basic_usage.py: Basic benchmark usage
comparison_example.py: Comparing different approaches
custom_benchmark.py: Creating custom benchmarks
demo_new_results_system.py: New results management system demonstration
Results Management Example
The demo_new_results_system.py example demonstrates the complete workflow:
# Create and configure results manager
results_manager = BenchmarkResultsManager("example_results")
# Run benchmarks with automatic result organization
runner = StandardRunner(results_manager=results_manager)
# Create and run benchmark suites
suite = BenchmarkSuite("Performance Suite")
# ... add benchmarks to suite
results = runner.run_suite(suite, experiment_name="demo_experiment")
# Results are automatically organized in structured directories
API Reference
Kaira Benchmarking System.
This module provides standardized benchmarks for evaluating communication system components and deep learning models in Kaira.
- class kaira.benchmarks.BaseBenchmark(name: str, description: str = '')
Bases: ABC
Base class for all benchmarks.
- __init__(name: str, description: str = '')
Initialize base benchmark.
- Parameters:
name – Name of the benchmark
description – Description of what the benchmark tests
- execute(**kwargs) -> BenchmarkResult
Execute the full benchmark pipeline.
- class kaira.benchmarks.BenchmarkResult(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>)
Bases: object
Container for benchmark results.
- __init__(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>) -> None
- benchmark_id: str
- name: str
- description: str
- execution_time: float
- timestamp: str
- class kaira.benchmarks.BenchmarkSuite(name: str, description: str = '')
Bases: object
Collection of benchmarks that can be run together.
- __init__(name: str, description: str = '')
Initialize benchmark suite.
- Parameters:
name – Name of the benchmark suite
description – Description of the suite
- add_benchmark(benchmark: BaseBenchmark) -> None
Add a benchmark to the suite.
- run_all(**kwargs) -> List[BenchmarkResult]
Run all benchmarks in the suite.
- class kaira.benchmarks.BenchmarkRegistry
Bases: object
Registry for managing benchmark classes and instances.
- __init__()
- classmethod create_benchmark(name: str, **kwargs) -> BaseBenchmark | None
Create an instance of a registered benchmark.
- classmethod get(name: str) -> Type[BaseBenchmark] | None
Get a registered benchmark class.
- classmethod register(name: str, benchmark_class: Type[BaseBenchmark]) -> None
Register a benchmark class.
- kaira.benchmarks.get_benchmark(name: str) -> Type[BaseBenchmark] | None
Get a registered benchmark class.
- kaira.benchmarks.create_benchmark(name: str, **kwargs) -> BaseBenchmark | None
Create an instance of a registered benchmark.
- class kaira.benchmarks.StandardMetrics
Bases: object
Collection of standard metrics for communication system evaluation.
- __init__()
- static bit_error_rate(transmitted: Tensor, received: Tensor) -> float
Calculate Bit Error Rate (BER).
- static block_error_rate(transmitted: Tensor, received: Tensor, block_size: int) -> float
Calculate Block Error Rate (BLER).
- static channel_capacity(snr_db: float, bandwidth: float = 1.0) -> float
Calculate Shannon channel capacity.
- static computational_complexity(model: Module, input_shape: tuple) -> Dict[str, Any]
Estimate computational complexity of a PyTorch model.
- static confidence_interval(data: Tensor, confidence: float = 0.95) -> tuple
Calculate confidence interval for data.
- static mutual_information(x: Tensor, y: Tensor, bins: int = 50) -> float
Estimate mutual information between two variables.
- class kaira.benchmarks.StandardRunner(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)
Bases: object
Standard sequential benchmark runner.
- __init__(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)
Initialize standard benchmark runner.
- Parameters:
verbose – Whether to print verbose output
save_results – Whether to save results automatically
results_manager – Custom results manager (creates default if None)
- run_benchmark(benchmark: BaseBenchmark, **kwargs) -> BenchmarkResult
Run a single benchmark.
- run_suite(suite: BenchmarkSuite, **kwargs) -> List[BenchmarkResult]
Run a benchmark suite.
- class kaira.benchmarks.ParallelRunner(max_workers: int | None = None, verbose: bool = True)
Bases: object
Parallel benchmark runner using thread pool.
- __init__(max_workers: int | None = None, verbose: bool = True)
Initialize parallel benchmark runner.
- Parameters:
max_workers – Maximum number of worker threads (None for default)
verbose – Whether to print verbose output
- run_benchmarks(benchmarks: List[BaseBenchmark], **kwargs) -> List[BenchmarkResult]
Run multiple benchmarks in parallel.
- class kaira.benchmarks.ComparisonRunner(verbose: bool = True)
Bases: object
Runner for comparing multiple benchmarks on the same task.
- __init__(verbose: bool = True)
Initialize comparison runner.
- Parameters:
verbose – Whether to print verbose output
- get_comparison_summary(comparison_name: str) -> Dict[str, Any]
Get summary of comparison results.
- run_comparison(benchmarks: List[BaseBenchmark], comparison_name: str, **kwargs) -> Dict[str, BenchmarkResult]
Run comparison between multiple benchmarks.
- class kaira.benchmarks.ParametricRunner(verbose: bool = True)
Bases: object
Runner for sweeping parameters across benchmarks.
- __init__(verbose: bool = True)
Initialize parametric runner.
- Parameters:
verbose – Whether to print verbose output
- run_parameter_sweep(benchmark: BaseBenchmark, parameter_grid: Dict[str, List[Any]]) -> Dict[str, List[BenchmarkResult]]
Run benchmark with parameter sweep.
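A parameter_grid maps each parameter name to a list of candidate values. The reference above does not state whether run_parameter_sweep takes the full Cartesian product or varies one parameter at a time, so the sketch below shows only the Cartesian-product interpretation as an assumption; expand_grid is a hypothetical helper.

```python
import itertools
from typing import Any, Dict, Iterator, List


def expand_grid(parameter_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword-argument dictionary per combination of grid values."""
    keys = list(parameter_grid)
    for values in itertools.product(*(parameter_grid[key] for key in keys)):
        yield dict(zip(keys, values))


grid = {"modulation": ["bpsk", "qpsk"], "num_bits": [1000, 10000]}
combinations = list(expand_grid(grid))
for params in combinations:
    print(params)
```

Each yielded dictionary could then be passed as keyword arguments to a single benchmark run.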
- class kaira.benchmarks.BenchmarkConfig(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>)
Bases: object
Configuration for benchmark execution.
- __init__(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>) -> None
- batch_size: int = 1000
- block_length: int = 1000
- calculate_confidence_intervals: bool = True
- code_rate: float = 0.5
- compile_model: bool = False
- confidence_level: float = 0.95
- description: str = ''
- device: str = 'auto'
- classmethod from_dict(config_dict: Dict[str, Any]) -> BenchmarkConfig
Create config from dictionary.
- classmethod from_json(json_str: str) -> BenchmarkConfig
Create config from JSON string.
- classmethod load(filepath: str | Path) -> BenchmarkConfig
Load configuration from file.
- model_precision: str = 'float32'
- name: str = 'default'
- num_trials: int = 1
- num_workers: int = 1
- output_directory: str = './benchmark_results'
- save_plots: bool = True
- save_raw_data: bool = False
- save_results: bool = True
- seed: int = 42
- verbose: bool = True
- kaira.benchmarks.get_config(name: str) -> BenchmarkConfig
Get a predefined configuration.
- class kaira.benchmarks.BenchmarkResultsManager(base_dir: str | Path = 'results')
Bases: object
Manages benchmark results with improved directory structure and organization.
- __init__(base_dir: str | Path = 'results')
Initialize the results manager.
- Parameters:
base_dir – Base directory for storing all benchmark results
- archive_old_results(days_old: int = 30) -> None
Archive benchmark results older than specified days.
- Parameters:
days_old – Number of days after which to archive results
- create_comparison_report(result_paths: List[Path], report_name: str) -> Path
Create a comparison report from multiple benchmark results.
- Parameters:
result_paths – List of paths to benchmark result files
report_name – Name for the comparison report
- Returns:
Path to the generated report
- list_results(category: str | None = None, experiment_name: str | None = None) -> List[Path]
List available benchmark result files.
- Parameters:
category – Specific category to list (benchmarks, suites, etc.)
experiment_name – Specific experiment to list
- Returns:
List of result file paths (excludes summary files and comparison reports)
- load_benchmark_result(filepath: str | Path) -> BenchmarkResult
Load a benchmark result from file.
- save_benchmark_result(result: BenchmarkResult, category: str = 'benchmarks', experiment_name: str | None = None, add_timestamp: bool = True) -> Path
Save a single benchmark result with improved organization.
- Parameters:
result – The benchmark result to save
category – Category (benchmarks, suites, experiments, etc.)
experiment_name – Optional experiment name for grouping
add_timestamp – Whether to add timestamp to filename
- Returns:
Path to the saved file
- save_suite_results(results: List[BenchmarkResult], suite_name: str, experiment_name: str | None = None) -> Dict[str, Path]
Save multiple benchmark results from a suite.
- Parameters:
results – List of benchmark results
suite_name – Name of the benchmark suite
experiment_name – Optional experiment name
- Returns:
Dictionary mapping result names to file paths
- class kaira.benchmarks.BenchmarkVisualizer(figsize: tuple = (10, 6), dpi: int = 100)
Bases: object
Visualizer for benchmark results.
- __init__(figsize: tuple = (10, 6), dpi: int = 100)
Initialize visualizer.
- Parameters:
figsize – Figure size in inches (width, height)
dpi – Figure resolution
- create_benchmark_report(results_file: str, output_dir: str = 'benchmark_plots')
Create a comprehensive visual report from benchmark results.
- Parameters:
results_file – Path to JSON file containing benchmark results
output_dir – Directory to save plots
- plot_benchmark_summary(results_file: str, save_path: str | None = None) -> Figure
Plot summary of multiple benchmark results.
- Parameters:
results_file – Path to JSON file containing benchmark results
save_path – Optional path to save the figure
- Returns:
Matplotlib figure object
- plot_ber_curve(results: Dict[str, Any], save_path: str | None = None) -> Figure
Plot BER vs SNR curve.
- Parameters:
results – Benchmark results containing SNR and BER data
save_path – Optional path to save the figure
- Returns:
Matplotlib figure object
- plot_coding_gain(results: Dict[str, Any], save_path: str | None = None) -> Figure
Plot coding gain vs SNR.
- Parameters:
results – Benchmark results containing coding gain data
save_path – Optional path to save the figure
- Returns:
Matplotlib figure object
- plot_constellation(constellation: Tensor, received_symbols: Tensor | None = None, save_path: str | None = None) -> Figure
Plot constellation diagram.
- Parameters:
constellation – Ideal constellation points
received_symbols – Optional received symbols to overlay
save_path – Optional path to save the figure
- Returns:
Matplotlib figure object