Kaira Benchmarking System

The Kaira benchmarking system provides standardized benchmarks for evaluating communication system components and deep learning models. This system enables fair comparison of different approaches and reproducible performance evaluation.

Overview

The benchmarking system consists of:

Base classes for creating custom benchmarks
Standard benchmarks for common communication tasks
Metrics for evaluating performance
Runners for executing benchmarks in different modes
Configuration management for reproducible experiments
CLI tool for command-line usage

Quick Start

Basic usage with the new organized results system:

from kaira.benchmarks import get_benchmark, StandardRunner, BenchmarkConfig

# Create a benchmark
ber_benchmark = get_benchmark("ber_simulation")(modulation="bpsk")

# Configure the benchmark
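config = BenchmarkConfig(
    snr_range=list(range(-5, 11))
)

# Run the benchmark with automatic result organization.
# num_bits is not listed among the BenchmarkConfig fields, so it is passed
# directly as a keyword argument to run_benchmark.
runner = StandardRunner()
result = runner.run_benchmark(ber_benchmark, num_bits=100000, **config.to_dict())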

# Results are automatically saved to organized directory structure
print(f"BER results: {result.metrics['ber_simulated']}")

# Access saved results using the results manager
saved_files = runner.save_all_results(experiment_name="ber_evaluation")
print(f"Results saved to: {saved_files}")

Traditional usage (still supported):

# Manual result saving
result.save("benchmark_result.json")

Available Benchmarks

Standard Communication Benchmarks

ber_simulation: Bit Error Rate simulation for various modulation schemes
channel_capacity: Shannon channel capacity calculations
throughput_test: System throughput evaluation
latency_test: System latency measurement
model_complexity: Model computational complexity analysis
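To make the ber_simulation entry above more concrete, the following is a minimal sketch of what a BPSK bit error rate simulation over an AWGN channel estimates, compared against the textbook closed form 0.5 * erfc(sqrt(Eb/N0)). It uses only the standard library and is not Kaira's implementation; the function names `bpsk_theoretical_ber` and `bpsk_simulated_ber` are hypothetical.

```python
import math
import random


def bpsk_theoretical_ber(ebn0_db: float) -> float:
    """Closed-form BPSK bit error rate over AWGN for a given Eb/N0 in dB."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def bpsk_simulated_ber(ebn0_db: float, num_bits: int, seed: int = 42) -> float:
    """Monte Carlo estimate of the BPSK bit error rate over AWGN."""
    rng = random.Random(seed)  # local generator keeps the run reproducible
    ebn0 = 10 ** (ebn0_db / 10)
    # Unit-energy symbols: the noise variance per real dimension is N0 / 2.
    noise_std = math.sqrt(1 / (2 * ebn0))
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, noise_std)
        decided = 1 if received > 0 else 0
        errors += decided != bit
    return errors / num_bits


theory = bpsk_theoretical_ber(4.0)
simulated = bpsk_simulated_ber(4.0, num_bits=50000)
print(f"theory={theory:.5f} simulated={simulated:.5f}")
```

With enough bits the simulated value should settle close to the theoretical one; the gap shrinks roughly with the square root of the number of simulated bits.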

Custom Benchmarks

You can create custom benchmarks by inheriting from BaseBenchmark:

from kaira.benchmarks import BaseBenchmark, register_benchmark

@register_benchmark("my_benchmark")
class MyBenchmark(BaseBenchmark):
    def setup(self, **kwargs):
        super().setup(**kwargs)
        # Initialize benchmark

    def run(self, **kwargs):
        # Run benchmark and return metrics
        return {"success": True, "metric_value": 42}
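How a name-based registration decorator of this kind can work is sketched below. This is a self-contained illustration, not Kaira's actual code: `_REGISTRY`, `register`, `lookup`, and `EchoBenchmark` are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, Optional, Type

# Hypothetical module-level registry mapping names to benchmark classes.
_REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Return a class decorator that stores the class under `name`."""

    def decorator(cls: Type) -> Type:
        _REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


def lookup(name: str) -> Optional[Type]:
    """Return the registered class, or None if the name is unknown."""
    return _REGISTRY.get(name)


@register("echo")
class EchoBenchmark:
    """Toy benchmark that reports the keyword arguments it received."""

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        return {"success": True, "received": dict(kwargs)}


benchmark_cls = lookup("echo")
metrics = benchmark_cls().run(num_bits=8)
print(metrics)
```

Returning the class unchanged from the decorator means the decorated class can still be imported and instantiated directly, while the registry offers a second, string-based way to reach it.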

Configuration

Predefined Configurations

fast: Quick testing configuration
accurate: High-accuracy configuration for publication results
comprehensive: Full evaluation with all metrics
gpu: GPU-optimized configuration
minimal: Minimal configuration for CI/CD

Custom Configuration:

config = BenchmarkConfig(
    name="my_config",
    num_trials=10,
    snr_range=list(range(-10, 16)),
    device="cuda",
    verbose=True
)
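A configuration object like this is typically a dataclass that can be serialized and restored, which is what makes an experiment repeatable later. The sketch below shows a JSON round trip under that assumption; `SketchConfig` is a hypothetical, reduced stand-in, and its default `snr_range` is invented for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchConfig:
    """Hypothetical, reduced configuration used only for this illustration."""

    name: str = "default"
    seed: int = 42
    num_trials: int = 1
    # Mutable defaults need a factory so instances do not share one list.
    snr_range: List[float] = field(default_factory=lambda: [0.0, 5.0, 10.0])
    custom_params: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize all fields to a JSON string."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, json_str: str) -> "SketchConfig":
        """Rebuild a configuration from a JSON string."""
        return cls(**json.loads(json_str))


original = SketchConfig(name="my_config", num_trials=10, snr_range=[-10.0, 0.0, 15.0])
restored = SketchConfig.from_json(original.to_json())
print(restored)
```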

Benchmark Execution

Sequential Execution:

runner = StandardRunner(verbose=True)
result = runner.run_benchmark(benchmark, **config.to_dict())

Parallel Execution:

from kaira.benchmarks import ParallelRunner

runner = ParallelRunner(max_workers=4)
results = runner.run_benchmarks(benchmarks, **config.to_dict())
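The API reference describes the parallel runner as thread-pool based. The general pattern can be sketched with the standard library as follows; `run_in_parallel` and `make_task` are hypothetical helpers, and the toy tasks stand in for real benchmarks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_in_parallel(
    tasks: List[Callable[..., Dict[str, Any]]],
    max_workers: int = 4,
    **kwargs: Any,
) -> List[Dict[str, Any]]:
    """Run every task with the same keyword arguments and keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, **kwargs) for task in tasks]
        # Collecting in submission order keeps results aligned with `tasks`.
        return [future.result() for future in futures]


def make_task(label: str) -> Callable[..., Dict[str, Any]]:
    """Build a toy task standing in for a benchmark's run method."""

    def task(**kwargs: Any) -> Dict[str, Any]:
        return {"name": label, "num_trials": kwargs.get("num_trials", 1)}

    return task


results = run_in_parallel([make_task("a"), make_task("b")], max_workers=2, num_trials=3)
print(results)
```

Threads help most when the work releases the interpreter lock (I/O, or numerical libraries running native code); pure-Python loops gain little from this pattern.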

Benchmark Suites:

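from kaira.benchmarks import BenchmarkSuite, StandardRunner

suite = BenchmarkSuite("My Suite")
suite.add_benchmark(benchmark1)
suite.add_benchmark(benchmark2)

# run_suite is provided by StandardRunner
runner = StandardRunner()
results = runner.run_suite(suite, **config.to_dict())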

Comparison and Analysis:

from kaira.benchmarks import ComparisonRunner

runner = ComparisonRunner()
results = runner.run_comparison(
    [benchmark1, benchmark2],
    "Algorithm Comparison",
    **config.to_dict()
)

Metrics and Analysis

Standard Metrics

The StandardMetrics class provides common communication system metrics:

Bit Error Rate (BER)
Block Error Rate (BLER)
Signal-to-Noise Ratio (SNR)
Mutual Information
Throughput
Latency statistics
Channel capacity
Confidence intervals

Example:

from kaira.benchmarks import StandardMetrics

ber = StandardMetrics.bit_error_rate(transmitted, received)
snr = StandardMetrics.signal_to_noise_ratio(signal, noise)
capacity = StandardMetrics.channel_capacity(snr_db=10.0)
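For readers who want to see the definitions behind these calls, here is a pure-Python sketch of three of the metrics using their textbook formulas. It works on plain sequences and floats rather than tensors, the function names are hypothetical, and whether StandardMetrics computes them in exactly this way is an assumption.

```python
import math
from typing import Sequence


def bit_error_rate(transmitted: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(transmitted) != len(received):
        raise ValueError("sequences must have the same length")
    if len(transmitted) == 0:
        raise ValueError("sequences must not be empty")
    errors = sum(t != r for t, r in zip(transmitted, received))
    return errors / len(transmitted)


def snr_in_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two average powers."""
    return 10 * math.log10(signal_power / noise_power)


def shannon_capacity(snr_db: float, bandwidth: float = 1.0) -> float:
    """Shannon capacity C = B * log2(1 + SNR), with SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth * math.log2(1 + snr_linear)


ber = bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0])
snr = snr_in_db(signal_power=100.0, noise_power=1.0)
capacity = shannon_capacity(snr_db=0.0)
print(ber, snr, capacity)
```

With a bandwidth of 1.0 the capacity is effectively a spectral efficiency in bits per second per hertz; passing a bandwidth in hertz gives bits per second.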

Results Management

Kaira provides an organized results management system that automatically structures benchmark results in a clean directory hierarchy.

Results Directory Structure

The benchmark system creates the following directory structure:

results/
├── benchmarks/          # Individual benchmark results
│   ├── experiment_name/
│   └── benchmark_files.json
├── suites/             # Benchmark suite results
│   ├── suite_name/
│   └── summary.json
├── experiments/        # Experimental runs
├── comparisons/        # Comparative studies
├── archives/          # Archived old results
├── configs/           # Configuration files
├── logs/              # Execution logs
└── summaries/         # Summary reports

Using the Results Manager

The new results management system provides automated organization:

from kaira.benchmarks import StandardRunner, BenchmarkResultsManager

# Create a results manager rooted at 'my_results/' (the default base directory is 'results/')
results_manager = BenchmarkResultsManager("my_results")

# Create a runner with the results manager
runner = StandardRunner(results_manager=results_manager)

# Run benchmarks - results are automatically saved and organized
result = runner.run_benchmark(benchmark, experiment_name="my_experiment")

# Results are automatically saved to:
# my_results/benchmarks/my_experiment/benchmark_name_timestamp_id.json
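The path pattern in the comment above can be reproduced with a few lines of pathlib. This is a sketch only: `build_result_path` is a hypothetical helper, and the timestamp format and the eight-character identifier are assumptions rather than the format Kaira uses.

```python
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_result_path(
    base_dir: str,
    category: str,
    benchmark_name: str,
    experiment_name: Optional[str] = None,
    timestamp: Optional[datetime] = None,
    short_id: Optional[str] = None,
) -> Path:
    """Compose base/category[/experiment]/name_timestamp_id.json."""
    directory = Path(base_dir) / category
    if experiment_name:
        directory = directory / experiment_name
    # Assumed formats: sortable timestamp and a short random identifier.
    stamp = (timestamp or datetime.now()).strftime("%Y%m%d_%H%M%S")
    ident = short_id or uuid.uuid4().hex[:8]
    return directory / f"{benchmark_name}_{stamp}_{ident}.json"


path = build_result_path(
    "my_results",
    "benchmarks",
    "ber_simulation",
    experiment_name="my_experiment",
    timestamp=datetime(2024, 1, 2, 3, 4, 5),
    short_id="abc12345",
)
print(path.as_posix())
```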

Manual Results Management

You can also manage results manually:

# Save individual result with automatic organization
results_manager = BenchmarkResultsManager()
saved_path = results_manager.save_benchmark_result(
    result,
    category="benchmarks",
    experiment_name="my_experiment"
)

# Save suite results
saved_files = results_manager.save_suite_results(
    results_list,
    suite_name="performance_suite",
    experiment_name="my_experiment"
)

# List available results
all_results = results_manager.list_results()
experiment_results = results_manager.list_results(
    category="benchmarks",
    experiment_name="my_experiment"
)

# Load results
result = results_manager.load_benchmark_result(result_path)

Loading and Analysis

# Load results using the results manager
results_manager = BenchmarkResultsManager()
result_paths = results_manager.list_results(category="benchmarks")

for path in result_paths:
    result = results_manager.load_benchmark_result(path)
    print(f"Result: {result.name}, Time: {result.execution_time:.2f}s")

# Create comparison reports
comparison_path = results_manager.create_comparison_report(
    result_paths[:3],
    "algorithm_comparison"
)
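Because results are saved as JSON, they can also be inspected without the results manager. The sketch below loads every JSON file in a directory and picks the fastest run; `load_results` and `fastest` are hypothetical helpers, and the two files written here contain made-up values purely for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_results(directory: Path) -> List[Dict[str, Any]]:
    """Read every JSON file in `directory`, sorted by file name."""
    return [json.loads(path.read_text()) for path in sorted(directory.glob("*.json"))]


def fastest(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the result with the smallest execution time."""
    return min(results, key=lambda item: item["execution_time"])


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Two made-up result files, for illustration only.
    (folder / "a.json").write_text(json.dumps({"name": "a", "execution_time": 1.5}))
    (folder / "b.json").write_text(json.dumps({"name": "b", "execution_time": 0.5}))
    loaded = load_results(folder)
    best = fastest(loaded)

print(f"Fastest: {best['name']} ({best['execution_time']:.2f}s)")
```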

Results Maintenance

The system includes maintenance features for long-term management:

# Archive old results (older than 30 days)
results_manager.archive_old_results(days_old=30)

# Clean up empty directories
results_manager.cleanup_empty_directories()
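Age-based archiving usually comes down to comparing each file's modification time against a cutoff and moving the older files. A minimal sketch of that idea follows; `archive_old_files` is a hypothetical helper and does not reflect how BenchmarkResultsManager is implemented.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 24 * 60 * 60


def archive_old_files(source_dir: Path, archive_dir: Path, days_old: int = 30) -> List[Path]:
    """Move JSON files not modified within `days_old` days into `archive_dir`."""
    cutoff = time.time() - days_old * SECONDS_PER_DAY
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source_dir.glob("*.json")):
        if path.stat().st_mtime < cutoff:
            target = archive_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "benchmarks"
    source.mkdir()
    old_file = source / "old.json"
    new_file = source / "new.json"
    old_file.write_text("{}")
    new_file.write_text("{}")
    # Back-date one file by 40 days so it falls behind the 30-day cutoff.
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(old_file, (forty_days_ago, forty_days_ago))
    moved = archive_old_files(source, Path(tmp) / "archives", days_old=30)
    moved_names = [path.name for path in moved]
    remaining_names = sorted(path.name for path in source.iterdir())

print(moved_names, remaining_names)
```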

Command Line Interface

The kaira-benchmark CLI tool provides easy access to benchmarks:

# List available benchmarks
kaira-benchmark --list

# Run a single benchmark
kaira-benchmark --benchmark ber_simulation --config fast

# Run multiple benchmarks in parallel
kaira-benchmark --benchmark ber_simulation throughput_test --parallel

# Run benchmark suite
kaira-benchmark --suite --config comprehensive --output ./results

# Custom parameters
kaira-benchmark --benchmark ber_simulation --snr-range -5 10 --num-bits 50000

Best Practices

Use appropriate configurations for your use case (fast for development, accurate for publications)
Set random seeds for reproducible results:
```
config = BenchmarkConfig(seed=42)
```

Save raw data for important experiments:

config = BenchmarkConfig(save_raw_data=True)

Use confidence intervals for statistical analysis:

config = BenchmarkConfig(
    calculate_confidence_intervals=True,
    confidence_level=0.95
)

Monitor memory usage for large experiments:

config = BenchmarkConfig(memory_limit_mb=8192)
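For the confidence interval recommendation above, the following sketch shows one common way to compute an interval for the mean of repeated trials, using a normal approximation from the standard library. The function name is hypothetical, and the method Kaira uses internally is not stated here; for small sample counts a Student's t interval would be wider and more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def normal_confidence_interval(
    samples: Sequence[float], confidence: float = 0.95
) -> Tuple[float, float]:
    """Two-sided interval for the mean, using a normal approximation."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    center = mean(samples)
    std_error = stdev(samples) / math.sqrt(len(samples))
    # For 95% confidence this is the 97.5th percentile, about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * std_error, center + z * std_error


low, high = normal_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```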

Examples

See the examples/benchmarks/ directory for comprehensive examples:

basic_usage.py: Basic benchmark usage
comparison_example.py: Comparing different approaches
custom_benchmark.py: Creating custom benchmarks
demo_new_results_system.py: New results management system demonstration

Results Management Example

The demo_new_results_system.py example demonstrates the complete workflow:

# Create and configure results manager
results_manager = BenchmarkResultsManager("example_results")

# Run benchmarks with automatic result organization
runner = StandardRunner(results_manager=results_manager)

# Create and run benchmark suites
suite = BenchmarkSuite("Performance Suite")
# ... add benchmarks to suite
results = runner.run_suite(suite, experiment_name="demo_experiment")

# Results are automatically organized in structured directories

API Reference

Kaira Benchmarking System.

This module provides standardized benchmarks for evaluating communication system components and deep learning models in Kaira.

class kaira.benchmarks.BaseBenchmark(name: str, description: str = '')[source]

Bases: ABC

Base class for all benchmarks.

__init__(name: str, description: str = '')[source]

Initialize base benchmark.

Parameters:

name – Name of the benchmark
description – Description of what the benchmark tests

execute(**kwargs) → BenchmarkResult[source]: Execute the full benchmark pipeline.

abstractmethod run(**kwargs) → Dict[str, Any][source]: Run the benchmark and return metrics.

abstractmethod setup(**kwargs) → None[source]: Setup benchmark environment.

teardown() → None[source]: Clean up after benchmark.
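The relationship between these four methods can be illustrated with a small stand-alone sketch of a setup, run, teardown pipeline. It is an assumption about the general shape of such a pipeline, not Kaira's implementation: `SketchBenchmark` and `CountingBenchmark` are hypothetical classes, and the sketch returns a plain dictionary instead of a BenchmarkResult.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchBenchmark(ABC):
    """Hypothetical stand-in for a benchmark base class."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def setup(self, **kwargs: Any) -> None:
        """Prepare resources before the measured run."""

    @abstractmethod
    def run(self, **kwargs: Any) -> Dict[str, Any]:
        """Do the work and return a metrics dictionary."""

    def teardown(self) -> None:
        """Release resources; the default does nothing."""

    def execute(self, **kwargs: Any) -> Dict[str, Any]:
        """Call setup, time run, and always call teardown."""
        self.setup(**kwargs)
        start = time.perf_counter()
        try:
            metrics = self.run(**kwargs)
            elapsed = time.perf_counter() - start
        finally:
            self.teardown()
        return {"name": self.name, "metrics": metrics, "execution_time": elapsed}


class CountingBenchmark(SketchBenchmark):
    """Toy benchmark that records the order of lifecycle calls."""

    def __init__(self) -> None:
        super().__init__("counting")
        self.calls: List[str] = []

    def setup(self, **kwargs: Any) -> None:
        self.calls.append("setup")

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append("run")
        return {"total": sum(range(kwargs.get("n", 10)))}

    def teardown(self) -> None:
        self.calls.append("teardown")


bench = CountingBenchmark()
outcome = bench.execute(n=5)
print(bench.calls, outcome["metrics"])
```

Placing teardown in a `finally` clause means cleanup still happens when `run` raises, which matters for benchmarks that hold files or device memory.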

class kaira.benchmarks.BenchmarkResult(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>)[source]

Bases: object

Container for benchmark results.

__init__(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>) → None

save(filepath: str | Path) → None[source]: Save result to JSON file.

to_dict() → Dict[str, Any][source]: Convert result to dictionary.

to_json() → str[source]: Convert result to JSON string.

benchmark_id: str

name: str

description: str

metrics: Dict[str, Any]

execution_time: float

timestamp: str

metadata: Dict[str, Any]

class kaira.benchmarks.BenchmarkSuite(name: str, description: str = '')[source]

Bases: object

Collection of benchmarks that can be run together.

__init__(name: str, description: str = '')[source]

Initialize benchmark suite.

Parameters:

name – Name of the benchmark suite
description – Description of the suite

add_benchmark(benchmark: BaseBenchmark) → None[source]: Add a benchmark to the suite.

get_summary() → Dict[str, Any][source]: Get summary statistics for all results.

run_all(**kwargs) → List[BenchmarkResult][source]: Run all benchmarks in the suite.

save_results(directory: str | Path) → None[source]: Save all results to a directory.

class kaira.benchmarks.BenchmarkRegistry[source]

Bases: object

Registry for managing benchmark classes and instances.

__init__()

classmethod clear() → None[source]: Clear all registered benchmarks.

classmethod create_benchmark(name: str, **kwargs) → BaseBenchmark | None[source]: Create an instance of a registered benchmark.

classmethod get(name: str) → Type[BaseBenchmark] | None[source]: Get a registered benchmark class.

classmethod list_available() → List[str][source]: List all available benchmark names.

classmethod register(name: str, benchmark_class: Type[BaseBenchmark]) → None[source]: Register a benchmark class.

kaira.benchmarks.register_benchmark(name: str)[source]: Decorator to register a benchmark class.

kaira.benchmarks.get_benchmark(name: str) → Type[BaseBenchmark] | None[source]: Get a registered benchmark class.

kaira.benchmarks.list_benchmarks() → List[str][source]: List all available benchmark names.

kaira.benchmarks.create_benchmark(name: str, **kwargs) → BaseBenchmark | None[source]: Create an instance of a registered benchmark.

class kaira.benchmarks.StandardMetrics[source]

Bases: object

Collection of standard metrics for communication system evaluation.

__init__()

static bit_error_rate(transmitted: Tensor, received: Tensor) → float[source]: Calculate Bit Error Rate (BER).

static block_error_rate(transmitted: Tensor, received: Tensor, block_size: int) → float[source]: Calculate Block Error Rate (BLER).

static channel_capacity(snr_db: float, bandwidth: float = 1.0) → float[source]: Calculate Shannon channel capacity.

static computational_complexity(model: Module, input_shape: tuple) → Dict[str, Any][source]: Estimate computational complexity of a PyTorch model.

static confidence_interval(data: Tensor, confidence: float = 0.95) → tuple[source]: Calculate confidence interval for data.

static latency_statistics(latencies: Tensor) → Dict[str, float][source]: Calculate latency statistics.

static mutual_information(x: Tensor, y: Tensor, bins: int = 50) → float[source]: Estimate mutual information between two variables.

static signal_to_noise_ratio(signal: Tensor, noise: Tensor) → float[source]: Calculate Signal-to-Noise Ratio (SNR) in dB.

static throughput(bits_transmitted: int, time_elapsed: float) → float[source]: Calculate throughput in bits per second.
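Of the metrics listed above, block error rate is the one whose definition is least obvious from the signature: a block counts as erroneous if any bit inside it differs. A pure-Python sketch of that definition follows. It operates on plain sequences rather than tensors, the function is a hypothetical stand-in, and requiring the length to be a multiple of `block_size` is an assumption made for the example.

```python
from typing import Sequence


def block_error_rate(
    transmitted: Sequence[int], received: Sequence[int], block_size: int
) -> float:
    """Fraction of blocks containing at least one bit error."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(transmitted) != len(received):
        raise ValueError("sequences must have the same length")
    if len(transmitted) == 0 or len(transmitted) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    num_blocks = len(transmitted) // block_size
    block_errors = 0
    for index in range(num_blocks):
        start = index * block_size
        end = start + block_size
        # A single differing bit is enough to mark the whole block as failed.
        if list(transmitted[start:end]) != list(received[start:end]):
            block_errors += 1
    return block_errors / num_blocks


bler = block_error_rate(
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 1, 1],
    block_size=4,
)
print(bler)
```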

class kaira.benchmarks.StandardRunner(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)[source]

Bases: object

Standard sequential benchmark runner.

__init__(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)[source]

Initialize standard benchmark runner.

Parameters:

verbose – Whether to print verbose output
save_results – Whether to save results automatically
results_manager – Custom results manager (creates default if None)

run_benchmark(benchmark: BaseBenchmark, **kwargs) → BenchmarkResult[source]: Run a single benchmark.

run_suite(suite: BenchmarkSuite, **kwargs) → List[BenchmarkResult][source]: Run a benchmark suite.

save_all_results(experiment_name: str | None = None) → Dict[str, Path][source]

Save all results using the results manager.

Parameters: experiment_name – Optional experiment name for grouping results
Returns: Dictionary mapping result names to saved file paths

class kaira.benchmarks.ParallelRunner(max_workers: int | None = None, verbose: bool = True)[source]

Bases: object

Parallel benchmark runner using thread pool.

__init__(max_workers: int | None = None, verbose: bool = True)[source]

Initialize parallel benchmark runner.

Parameters:

max_workers – Maximum number of worker threads (None for default)
verbose – Whether to print verbose output

run_benchmarks(benchmarks: List[BaseBenchmark], **kwargs) → List[BenchmarkResult][source]: Run multiple benchmarks in parallel.

class kaira.benchmarks.ComparisonRunner(verbose: bool = True)[source]

Bases: object

Runner for comparing multiple benchmarks on the same task.

__init__(verbose: bool = True)[source]

Initialize comparison runner.

Parameters: verbose – Whether to print verbose output

get_comparison_summary(comparison_name: str) → Dict[str, Any][source]: Get summary of comparison results.

run_comparison(benchmarks: List[BaseBenchmark], comparison_name: str, **kwargs) → Dict[str, BenchmarkResult][source]: Run comparison between multiple benchmarks.

class kaira.benchmarks.ParametricRunner(verbose: bool = True)[source]

Bases: object

Runner for sweeping parameters across benchmarks.

__init__(verbose: bool = True)[source]

Initialize parametric runner.

Parameters: verbose – Whether to print verbose output

run_parameter_sweep(benchmark: BaseBenchmark, parameter_grid: Dict[str, List[Any]]) → Dict[str, List[BenchmarkResult]][source]: Run benchmark with parameter sweep.
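One common way to interpret a parameter grid of this shape is as a Cartesian product of the listed values. The sketch below expands a grid that way with itertools; `expand_grid` is a hypothetical helper, and whether ParametricRunner combines parameters like this or sweeps them one at a time is not stated in this reference.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(parameter_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword-argument dictionary per combination of values."""
    names = list(parameter_grid)
    for values in product(*(parameter_grid[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(
    expand_grid({"modulation": ["bpsk", "qpsk"], "snr_db": [0, 10]})
)
for combo in combinations:
    print(combo)
```

The number of combinations is the product of the list lengths, so grids with many parameters grow quickly.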

class kaira.benchmarks.BenchmarkConfig(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>)[source]

Bases: object

Configuration for benchmark execution.

__init__(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>) → None

batch_size: int = 1000

block_length: int = 1000

calculate_confidence_intervals: bool = True

code_rate: float = 0.5

compile_model: bool = False

confidence_level: float = 0.95

description: str = ''

device: str = 'auto'

classmethod from_dict(config_dict: Dict[str, Any]) → BenchmarkConfig[source]: Create config from dictionary.

classmethod from_json(json_str: str) → BenchmarkConfig[source]: Create config from JSON string.

get(key: str, default: Any = None) → Any[source]: Get configuration parameter.

classmethod load(filepath: str | Path) → BenchmarkConfig[source]: Load configuration from file.

memory_limit_mb: float | None = None

model_precision: str = 'float32'

name: str = 'default'

num_trials: int = 1

num_workers: int = 1

output_directory: str = './benchmark_results'

save(filepath: str | Path) → None[source]: Save configuration to file.

save_plots: bool = True

save_raw_data: bool = False

save_results: bool = True

seed: int = 42

timeout_seconds: float | None = None

to_dict() → Dict[str, Any][source]: Convert config to dictionary.

to_json() → str[source]: Convert config to JSON string.

update(**kwargs) → None[source]: Update configuration parameters.

verbose: bool = True

snr_range: List[float]

custom_params: Dict[str, Any]

kaira.benchmarks.get_config(name: str) → BenchmarkConfig[source]: Get a predefined configuration.

kaira.benchmarks.list_configs() → List[str][source]: List available predefined configurations.

class kaira.benchmarks.BenchmarkResultsManager(base_dir: str | Path = 'results')[source]

Bases: object

Manages benchmark results with improved directory structure and organization.

__init__(base_dir: str | Path = 'results')[source]

Initialize the results manager.

Parameters: base_dir – Base directory for storing all benchmark results

archive_old_results(days_old: int = 30) → None[source]

Archive benchmark results older than specified days.

Parameters: days_old – Number of days after which to archive results

cleanup_empty_directories() → None[source]: Remove empty directories in the results structure.

create_comparison_report(result_paths: List[Path], report_name: str) → Path[source]

Create a comparison report from multiple benchmark results.

Parameters:

result_paths – List of paths to benchmark result files
report_name – Name for the comparison report

Returns:

Path to the generated report

list_results(category: str | None = None, experiment_name: str | None = None) → List[Path][source]

List available benchmark result files.

Parameters:

category – Specific category to list (benchmarks, suites, etc.)
experiment_name – Specific experiment to list

Returns:

List of result file paths (excludes summary files and comparison reports)

load_benchmark_result(filepath: str | Path) → BenchmarkResult[source]: Load a benchmark result from file.

save_benchmark_result(result: BenchmarkResult, category: str = 'benchmarks', experiment_name: str | None = None, add_timestamp: bool = True) → Path[source]

Save a single benchmark result with improved organization.

Parameters:

result – The benchmark result to save
category – Category (benchmarks, suites, experiments, etc.)
experiment_name – Optional experiment name for grouping
add_timestamp – Whether to add timestamp to filename

Returns:

Path to the saved file

save_suite_results(results: List[BenchmarkResult], suite_name: str, experiment_name: str | None = None) → Dict[str, Path][source]

Save multiple benchmark results from a suite.

Parameters:

results – List of benchmark results
suite_name – Name of the benchmark suite
experiment_name – Optional experiment name

Returns:

Dictionary mapping result names to file paths

class kaira.benchmarks.BenchmarkVisualizer(figsize: tuple = (10, 6), dpi: int = 100)[source]

Bases: object

Visualizer for benchmark results.

__init__(figsize: tuple = (10, 6), dpi: int = 100)[source]

Initialize visualizer.

Parameters:

figsize – Figure size in inches (width, height)
dpi – Figure resolution

create_benchmark_report(results_file: str, output_dir: str = 'benchmark_plots')[source]

Create a comprehensive visual report from benchmark results.

Parameters:

results_file – Path to JSON file containing benchmark results
output_dir – Directory to save plots

plot_benchmark_summary(results_file: str, save_path: str | None = None) → Figure[source]

Plot summary of multiple benchmark results.

Parameters:

results_file – Path to JSON file containing benchmark results
save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_ber_curve(results: Dict[str, Any], save_path: str | None = None) → Figure[source]

Plot BER vs SNR curve.

Parameters:

results – Benchmark results containing SNR and BER data
save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_coding_gain(results: Dict[str, Any], save_path: str | None = None) → Figure[source]

Plot coding gain vs SNR.

Parameters:

results – Benchmark results containing coding gain data
save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_constellation(constellation: Tensor, received_symbols: Tensor | None = None, save_path: str | None = None) → Figure[source]

Plot constellation diagram.

Parameters:

constellation – Ideal constellation points
received_symbols – Optional received symbols to overlay
save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_latency_distribution(results: Dict[str, Any], save_path: str | None = None) → Figure[source]

Plot latency distribution.

Parameters:

results – Benchmark results containing latency data
save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_throughput_comparison(results: Dict[str, Any], save_path: str | None = None) → Figure[source]

Plot throughput comparison.

Parameters:

results – Benchmark results containing throughput data
save_path – Optional path to save the figure

Returns:

Matplotlib figure object