Kaira Benchmarking System

The Kaira benchmarking system provides standardized benchmarks for evaluating communication system components and deep learning models. This system enables fair comparison of different approaches and reproducible performance evaluation.

Overview

The benchmarking system consists of:

  • Base classes for creating custom benchmarks

  • Standard benchmarks for common communication tasks

  • Metrics for evaluating performance

  • Runners for executing benchmarks in different modes

  • Configuration management for reproducible experiments

  • A CLI tool (kaira-benchmark) for running benchmarks from the command line

Quick Start

Basic usage with the new organized results system:

from kaira.benchmarks import get_benchmark, StandardRunner, BenchmarkConfig

# Create a benchmark
ber_benchmark = get_benchmark("ber_simulation")(modulation="bpsk")

# Configure the benchmark
config = BenchmarkConfig(
    snr_range=list(range(-5, 11)),
    num_bits=100000
)

# Run the benchmark with automatic result organization
runner = StandardRunner()
result = runner.run_benchmark(ber_benchmark, **config.to_dict())

# Results are automatically saved to organized directory structure
print(f"BER results: {result.metrics['ber_simulated']}")

# Access saved results using the results manager
saved_files = runner.save_all_results(experiment_name="ber_evaluation")
print(f"Results saved to: {saved_files}")

Traditional usage (still supported):

# Manual result saving
result.save("benchmark_result.json")

Available Benchmarks

Standard Communication Benchmarks

  • ber_simulation: Bit Error Rate simulation for various modulation schemes

  • channel_capacity: Shannon channel capacity calculations

  • throughput_test: System throughput evaluation

  • latency_test: System latency measurement

  • model_complexity: Model computational complexity analysis

Custom Benchmarks

You can create custom benchmarks by inheriting from BaseBenchmark:

from kaira.benchmarks import BaseBenchmark, register_benchmark

@register_benchmark("my_benchmark")
class MyBenchmark(BaseBenchmark):
    def setup(self, **kwargs):
        super().setup(**kwargs)
        # Initialize benchmark

    def run(self, **kwargs):
        # Run benchmark and return metrics
        return {"success": True, "metric_value": 42}
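The decorator-based registration above follows a common registry pattern. The sketch below shows one way such a registry can work, using only the standard library. The names `_REGISTRY`, `register`, `lookup`, and `ToyBenchmark` are hypothetical and are not part of Kaira; the library's actual registry may behave differently.

```python
from typing import Callable, Dict, Optional

# Hypothetical module-level registry mapping names to classes (not Kaira's own).
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the decorated class under `name`."""

    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls  # hand the class back unchanged so it stays directly usable

    return decorator


def lookup(name: str) -> Optional[type]:
    """Return the class registered under `name`, or None if it is unknown."""
    return _REGISTRY.get(name)


@register("toy_benchmark")
class ToyBenchmark:
    def run(self, **kwargs):
        return {"success": True, "metric_value": 42}


# Look the class up by name, instantiate it, and run it.
benchmark_cls = lookup("toy_benchmark")
metrics = benchmark_cls().run()
```

Returning the class unchanged from the decorator means the class can still be imported and instantiated directly, while the name-based lookup supports tools that select benchmarks from strings.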

Configuration

Predefined Configurations

  • fast: Quick testing configuration

  • accurate: High-accuracy configuration for publication results

  • comprehensive: Full evaluation with all metrics

  • gpu: GPU-optimized configuration

  • minimal: Minimal configuration for CI/CD

Custom Configuration:

config = BenchmarkConfig(
    name="my_config",
    num_trials=10,
    snr_range=list(range(-10, 16)),
    device="cuda",
    verbose=True
)
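Configurations like the one above are typically serialized so an experiment can be repeated later. The following is a minimal sketch of a dictionary and JSON round trip with a plain dataclass. `SketchConfig` is a hypothetical stand-in with a few BenchmarkConfig-like fields and does not reproduce the library's implementation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchConfig:
    """Hypothetical stand-in carrying a handful of BenchmarkConfig-like fields."""

    name: str = "default"
    seed: int = 42
    num_trials: int = 1
    snr_range: List[float] = field(default_factory=lambda: [0.0, 5.0, 10.0])
    custom_params: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchConfig":
        return cls(**data)

    @classmethod
    def from_json(cls, text: str) -> "SketchConfig":
        return cls.from_dict(json.loads(text))


# Serialize a configuration and rebuild it from the JSON text.
config = SketchConfig(name="my_config", num_trials=10, snr_range=list(range(-10, 16)))
restored = SketchConfig.from_json(config.to_json())
```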

Benchmark Execution

Sequential Execution:

runner = StandardRunner(verbose=True)
result = runner.run_benchmark(benchmark, **config.to_dict())

Parallel Execution:

from kaira.benchmarks import ParallelRunner

runner = ParallelRunner(max_workers=4)
results = runner.run_benchmarks(benchmarks, **config.to_dict())

Benchmark Suites:

from kaira.benchmarks import BenchmarkSuite, StandardRunner

suite = BenchmarkSuite("My Suite")
suite.add_benchmark(benchmark1)
suite.add_benchmark(benchmark2)

runner = StandardRunner()
results = runner.run_suite(suite, **config.to_dict())

Comparison and Analysis:

from kaira.benchmarks import ComparisonRunner

runner = ComparisonRunner()
results = runner.run_comparison(
    [benchmark1, benchmark2],
    "Algorithm Comparison",
    **config.to_dict()
)

Metrics and Analysis

Standard Metrics

The StandardMetrics class provides common communication system metrics:

  • Bit Error Rate (BER)

  • Block Error Rate (BLER)

  • Signal-to-Noise Ratio (SNR)

  • Mutual Information

  • Throughput

  • Latency statistics

  • Channel capacity

  • Confidence intervals

Example:

from kaira.benchmarks import StandardMetrics

ber = StandardMetrics.bit_error_rate(transmitted, received)
snr = StandardMetrics.signal_to_noise_ratio(signal, noise)
capacity = StandardMetrics.channel_capacity(snr_db=10.0)
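For readers who want to see the underlying formulas, the sketch below computes three of these metrics in plain Python on lists rather than tensors. It is an illustration of the textbook definitions (error fraction, power ratio in dB, and Shannon capacity C = B·log2(1 + SNR)), not Kaira's implementation; the function names mirror the API only for readability.

```python
import math
from typing import Sequence


def bit_error_rate(transmitted: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(transmitted) != len(received) or not transmitted:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(t != r for t, r in zip(transmitted, received))
    return errors / len(transmitted)


def signal_to_noise_ratio_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Ratio of mean signal power to mean noise power, expressed in dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


def channel_capacity(snr_db: float, bandwidth: float = 1.0) -> float:
    """Shannon capacity B * log2(1 + SNR), with the SNR converted from dB."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth * math.log2(1.0 + snr_linear)


ber = bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0])  # one error in four bits
snr = signal_to_noise_ratio_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1])
capacity = channel_capacity(snr_db=0.0)  # linear SNR of 1 gives log2(2)
```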

Results Management

Kaira provides an organized results management system that automatically structures benchmark results in a clean directory hierarchy.

Results Directory Structure

The benchmark system creates the following directory structure:

results/
├── benchmarks/          # Individual benchmark results
│   ├── experiment_name/
│   └── benchmark_files.json
├── suites/             # Benchmark suite results
│   ├── suite_name/
│   └── summary.json
├── experiments/        # Experimental runs
├── comparisons/        # Comparative studies
├── archives/          # Archived old results
├── configs/           # Configuration files
├── logs/              # Execution logs
└── summaries/         # Summary reports
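The layout above can be reproduced with a few lines of `pathlib`. The sketch below creates the top-level category directories in a temporary location; `create_results_tree` and `CATEGORIES` are hypothetical names, and the results manager may create its directories differently (for example, lazily).

```python
import tempfile
from pathlib import Path
from typing import List

# Top-level categories taken from the directory listing above.
CATEGORIES = (
    "benchmarks", "suites", "experiments", "comparisons",
    "archives", "configs", "logs", "summaries",
)


def create_results_tree(base_dir: Path) -> List[Path]:
    """Create one sub-directory per category under base_dir and return them."""
    created = []
    for category in CATEGORIES:
        path = Path(base_dir) / category
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


# Build the tree inside a throwaway directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    tree = create_results_tree(Path(tmp) / "results")
    names = sorted(p.name for p in tree)
    all_exist = all(p.is_dir() for p in tree)
```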

Using the Results Manager

The new results management system provides automated organization:

from kaira.benchmarks import StandardRunner, BenchmarkResultsManager

# Create a results manager rooted at 'my_results/' (the default base directory is 'results/')
results_manager = BenchmarkResultsManager("my_results")

# Create a runner with the results manager
runner = StandardRunner(results_manager=results_manager)

# Run benchmarks - results are automatically saved and organized
result = runner.run_benchmark(benchmark, experiment_name="my_experiment")

# Results are automatically saved to:
# my_results/benchmarks/my_experiment/benchmark_name_timestamp_id.json

Manual Results Management

You can also manage results manually:

# Save individual result with automatic organization
results_manager = BenchmarkResultsManager()
saved_path = results_manager.save_benchmark_result(
    result,
    category="benchmarks",
    experiment_name="my_experiment"
)

# Save suite results
saved_files = results_manager.save_suite_results(
    results_list,
    suite_name="performance_suite",
    experiment_name="my_experiment"
)

# List available results
all_results = results_manager.list_results()
experiment_results = results_manager.list_results(
    category="benchmarks",
    experiment_name="my_experiment"
)

# Load results
result = results_manager.load_benchmark_result(result_path)

Loading and Analysis

# Load results using the results manager
results_manager = BenchmarkResultsManager()
result_paths = results_manager.list_results(category="benchmarks")

for path in result_paths:
    result = results_manager.load_benchmark_result(path)
    print(f"Result: {result.name}, Time: {result.execution_time:.2f}s")

# Create comparison reports
comparison_path = results_manager.create_comparison_report(
    result_paths[:3],
    "algorithm_comparison"
)

Results Maintenance

The system includes maintenance features for long-term management:

# Archive old results (older than 30 days)
results_manager.archive_old_results(days_old=30)

# Clean up empty directories
results_manager.cleanup_empty_directories()
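Age-based archiving of this kind is usually driven by file modification times. The sketch below moves JSON files older than a cutoff into an archive directory. `archive_old_files` is a hypothetical helper, and the use of modification time and a flat archive folder are assumptions made for illustration, not a description of how `archive_old_results` works internally.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 86400


def archive_old_files(source_dir: Path, archive_dir: Path, days_old: int = 30) -> List[Path]:
    """Move *.json files last modified more than days_old days ago."""
    cutoff = time.time() - days_old * SECONDS_PER_DAY
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source_dir.glob("*.json")):
        if path.stat().st_mtime < cutoff:
            target = archive_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


# Demonstrate with one artificially aged file and one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "benchmarks"
    source.mkdir()
    old_file = source / "old_result.json"
    new_file = source / "new_result.json"
    old_file.write_text("{}")
    new_file.write_text("{}")

    # Backdate the first file by 40 days so it falls past the cutoff.
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(old_file, (forty_days_ago, forty_days_ago))

    moved = archive_old_files(source, Path(tmp) / "archives", days_old=30)
    moved_names = [p.name for p in moved]
    remaining_names = sorted(p.name for p in source.iterdir())
```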

Command Line Interface

The kaira-benchmark CLI tool provides easy access to benchmarks:

# List available benchmarks
kaira-benchmark --list

# Run a single benchmark
kaira-benchmark --benchmark ber_simulation --config fast

# Run multiple benchmarks in parallel
kaira-benchmark --benchmark ber_simulation throughput_test --parallel

# Run benchmark suite
kaira-benchmark --suite --config comprehensive --output ./results

# Custom parameters
kaira-benchmark --benchmark ber_simulation --snr-range -5 10 --num-bits 50000

Best Practices

  1. Use appropriate configurations for your use case (fast for development, accurate for publications)

  2. Set random seeds for reproducible results:

    config = BenchmarkConfig(seed=42)
    
  3. Save raw data for important experiments:

    config = BenchmarkConfig(save_raw_data=True)
    
  4. Use confidence intervals for statistical analysis:

    config = BenchmarkConfig(
        calculate_confidence_intervals=True,
        confidence_level=0.95
    )
    
  5. Monitor memory usage for large experiments:

    config = BenchmarkConfig(memory_limit_mb=8192)
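Practices 2 and 4 above (fixed seeds and confidence intervals) can be illustrated together. The sketch below draws seeded samples and computes a confidence interval for their mean using a normal approximation from the standard library. The helper name and the normal approximation are assumptions for illustration; the method Kaira uses for its intervals is not stated here.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def confidence_interval(samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean of samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided quantile of the standard normal for the requested level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean + z * std_err


# A fixed seed makes the simulated measurements repeatable across runs.
rng = random.Random(42)
samples = [rng.gauss(1.0, 0.1) for _ in range(1000)]

low, high = confidence_interval(samples, confidence=0.95)
wide_low, wide_high = confidence_interval(samples, confidence=0.99)
```

A higher confidence level yields a wider interval for the same data, which is why the level should be reported alongside the interval.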
    

Examples

See the examples/benchmarks/ directory for comprehensive examples:

  • basic_usage.py: Basic benchmark usage

  • comparison_example.py: Comparing different approaches

  • custom_benchmark.py: Creating custom benchmarks

  • demo_new_results_system.py: New results management system demonstration

Results Management Example

The demo_new_results_system.py example demonstrates the complete workflow:

from kaira.benchmarks import BenchmarkResultsManager, BenchmarkSuite, StandardRunner

# Create and configure results manager
results_manager = BenchmarkResultsManager("example_results")

# Run benchmarks with automatic result organization
runner = StandardRunner(results_manager=results_manager)

# Create and run benchmark suites
suite = BenchmarkSuite("Performance Suite")
# ... add benchmarks to suite
results = runner.run_suite(suite, experiment_name="demo_experiment")

# Results are automatically organized in structured directories

API Reference

Kaira Benchmarking System.

This module provides standardized benchmarks for evaluating communication system components and deep learning models in Kaira.

class kaira.benchmarks.BaseBenchmark(name: str, description: str = '')

Bases: ABC

Base class for all benchmarks.

__init__(name: str, description: str = '')

Initialize base benchmark.

Parameters:
  • name – Name of the benchmark

  • description – Description of what the benchmark tests

execute(**kwargs) -> BenchmarkResult

Execute the full benchmark pipeline.

abstractmethod run(**kwargs) -> Dict[str, Any]

Run the benchmark and return metrics.

abstractmethod setup(**kwargs) -> None

Setup benchmark environment.

teardown() -> None

Clean up after benchmark.
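The setup, run, and teardown methods suggest a template-method lifecycle. The sketch below shows one plausible shape for such a pipeline: `execute` calls `setup`, times `run`, and always calls `teardown`. `SketchBenchmark` and `EchoBenchmark` are hypothetical classes, and the ordering and timing details are assumptions, since the reference above does not spell out what `execute` does internally.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchBenchmark(ABC):
    """Hypothetical base class illustrating a setup/run/teardown pipeline."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def setup(self, **kwargs) -> None:
        """Prepare resources before the measured run."""

    @abstractmethod
    def run(self, **kwargs) -> Dict[str, Any]:
        """Do the measured work and return a metrics dictionary."""

    def teardown(self) -> None:
        """Release resources; optional for subclasses."""

    def execute(self, **kwargs) -> Dict[str, Any]:
        self.setup(**kwargs)
        try:
            start = time.perf_counter()
            metrics = self.run(**kwargs)
            elapsed = time.perf_counter() - start  # time only the run step
        finally:
            self.teardown()  # clean up even if run() raises
        return {"name": self.name, "metrics": metrics, "execution_time": elapsed}


class EchoBenchmark(SketchBenchmark):
    """Records the order of lifecycle calls so the flow can be inspected."""

    def __init__(self) -> None:
        super().__init__("echo")
        self.calls: List[str] = []

    def setup(self, **kwargs) -> None:
        self.calls.append("setup")

    def run(self, **kwargs) -> Dict[str, Any]:
        self.calls.append("run")
        return {"success": True, "num_kwargs": len(kwargs)}

    def teardown(self) -> None:
        self.calls.append("teardown")


bench = EchoBenchmark()
outcome = bench.execute(num_bits=1000)
```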

class kaira.benchmarks.BenchmarkResult(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>)

Bases: object

Container for benchmark results.

__init__(benchmark_id: str, name: str, description: str, metrics: Dict[str, Any], execution_time: float, timestamp: str, metadata: Dict[str, Any] = <factory>) -> None

save(filepath: str | Path) -> None

Save result to JSON file.

to_dict() -> Dict[str, Any]

Convert result to dictionary.

to_json() -> str

Convert result to JSON string.

benchmark_id: str
name: str
description: str
metrics: Dict[str, Any]
execution_time: float
timestamp: str
metadata: Dict[str, Any]

class kaira.benchmarks.BenchmarkSuite(name: str, description: str = '')

Bases: object

Collection of benchmarks that can be run together.

__init__(name: str, description: str = '')

Initialize benchmark suite.

Parameters:
  • name – Name of the benchmark suite

  • description – Description of the suite

add_benchmark(benchmark: BaseBenchmark) -> None

Add a benchmark to the suite.

get_summary() -> Dict[str, Any]

Get summary statistics for all results.

run_all(**kwargs) -> List[BenchmarkResult]

Run all benchmarks in the suite.

save_results(directory: str | Path) -> None

Save all results to a directory.

class kaira.benchmarks.BenchmarkRegistry

Bases: object

Registry for managing benchmark classes and instances.

__init__()

classmethod clear() -> None

Clear all registered benchmarks.

classmethod create_benchmark(name: str, **kwargs) -> BaseBenchmark | None

Create an instance of a registered benchmark.

classmethod get(name: str) -> Type[BaseBenchmark] | None

Get a registered benchmark class.

classmethod list_available() -> List[str]

List all available benchmark names.

classmethod register(name: str, benchmark_class: Type[BaseBenchmark]) -> None

Register a benchmark class.

kaira.benchmarks.register_benchmark(name: str)

Decorator to register a benchmark class.

kaira.benchmarks.get_benchmark(name: str) -> Type[BaseBenchmark] | None

Get a registered benchmark class.

kaira.benchmarks.list_benchmarks() -> List[str]

List all available benchmark names.

kaira.benchmarks.create_benchmark(name: str, **kwargs) -> BaseBenchmark | None

Create an instance of a registered benchmark.

class kaira.benchmarks.StandardMetrics

Bases: object

Collection of standard metrics for communication system evaluation.

__init__()

static bit_error_rate(transmitted: Tensor, received: Tensor) -> float

Calculate Bit Error Rate (BER).

static block_error_rate(transmitted: Tensor, received: Tensor, block_size: int) -> float

Calculate Block Error Rate (BLER).

static channel_capacity(snr_db: float, bandwidth: float = 1.0) -> float

Calculate Shannon channel capacity.

static computational_complexity(model: Module, input_shape: tuple) -> Dict[str, Any]

Estimate computational complexity of a PyTorch model.

static confidence_interval(data: Tensor, confidence: float = 0.95) -> tuple

Calculate confidence interval for data.

static latency_statistics(latencies: Tensor) -> Dict[str, float]

Calculate latency statistics.

static mutual_information(x: Tensor, y: Tensor, bins: int = 50) -> float

Estimate mutual information between two variables.

static signal_to_noise_ratio(signal: Tensor, noise: Tensor) -> float

Calculate Signal-to-Noise Ratio (SNR) in dB.

static throughput(bits_transmitted: int, time_elapsed: float) -> float

Calculate throughput in bits per second.
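As a rough illustration of the latency and throughput metrics listed above, the sketch below computes summary statistics and a bits-per-second rate with the standard library. The dictionary keys and the choice of population standard deviation are assumptions; the keys returned by `StandardMetrics.latency_statistics` are not documented here.

```python
import statistics
from typing import Dict, Sequence


def latency_statistics(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for latency samples (keys are illustrative)."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "std": statistics.pstdev(ordered),
    }


def throughput(bits_transmitted: int, time_elapsed: float) -> float:
    """Bits per second; rejects non-positive durations."""
    if time_elapsed <= 0:
        raise ValueError("time_elapsed must be positive")
    return bits_transmitted / time_elapsed


stats = latency_statistics([1.0, 2.0, 3.0, 4.0, 10.0])
rate = throughput(bits_transmitted=1_000_000, time_elapsed=0.5)
```

The single 10 ms outlier pulls the mean above the median in this data, which is one reason to report both.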

class kaira.benchmarks.StandardRunner(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)

Bases: object

Standard sequential benchmark runner.

__init__(verbose: bool = True, save_results: bool = True, results_manager: BenchmarkResultsManager | None = None)

Initialize standard benchmark runner.

Parameters:
  • verbose – Whether to print verbose output

  • save_results – Whether to save results automatically

  • results_manager – Custom results manager (creates default if None)

run_benchmark(benchmark: BaseBenchmark, **kwargs) -> BenchmarkResult

Run a single benchmark.

run_suite(suite: BenchmarkSuite, **kwargs) -> List[BenchmarkResult]

Run a benchmark suite.

save_all_results(experiment_name: str | None = None) -> Dict[str, Path]

Save all results using the results manager.

Parameters:

experiment_name – Optional experiment name for grouping results

Returns:

Dictionary mapping result names to saved file paths

class kaira.benchmarks.ParallelRunner(max_workers: int | None = None, verbose: bool = True)

Bases: object

Parallel benchmark runner using thread pool.

__init__(max_workers: int | None = None, verbose: bool = True)

Initialize parallel benchmark runner.

Parameters:
  • max_workers – Maximum number of worker threads (None for default)

  • verbose – Whether to print verbose output

run_benchmarks(benchmarks: List[BaseBenchmark], **kwargs) -> List[BenchmarkResult]

Run multiple benchmarks in parallel.
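Since the runner is described as thread-pool based, the general technique can be sketched with `concurrent.futures`. The helper `run_in_parallel` and the toy tasks below are hypothetical; this is not the runner's actual code, only an illustration of submitting independent jobs to a pool and collecting results in submission order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Task = Callable[[], Dict[str, Any]]


def run_in_parallel(tasks: List[Task], max_workers: Optional[int] = None) -> List[Dict[str, Any]]:
    """Run zero-argument tasks on a thread pool, keeping submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Reading results in list order preserves the original task order,
        # regardless of which task finishes first.
        return [future.result() for future in futures]


def make_task(label: str, n: int) -> Task:
    """Build a toy workload that sums squares below n."""

    def task() -> Dict[str, Any]:
        return {"name": label, "sum_of_squares": sum(i * i for i in range(n))}

    return task


results = run_in_parallel([make_task("small", 10), make_task("large", 1000)], max_workers=4)
```

In CPython, threads mainly help when the work releases the global interpreter lock (for example I/O or native numerical kernels), so the speed-up from a thread pool depends on the workload.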

class kaira.benchmarks.ComparisonRunner(verbose: bool = True)

Bases: object

Runner for comparing multiple benchmarks on the same task.

__init__(verbose: bool = True)

Initialize comparison runner.

Parameters:

verbose – Whether to print verbose output

get_comparison_summary(comparison_name: str) -> Dict[str, Any]

Get summary of comparison results.

run_comparison(benchmarks: List[BaseBenchmark], comparison_name: str, **kwargs) -> Dict[str, BenchmarkResult]

Run comparison between multiple benchmarks.

class kaira.benchmarks.ParametricRunner(verbose: bool = True)

Bases: object

Runner for sweeping parameters across benchmarks.

__init__(verbose: bool = True)

Initialize parametric runner.

Parameters:

verbose – Whether to print verbose output

run_parameter_sweep(benchmark: BaseBenchmark, parameter_grid: Dict[str, List[Any]]) -> Dict[str, List[BenchmarkResult]]

Run benchmark with parameter sweep.
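One common reading of a parameter grid is the Cartesian product of its value lists. The sketch below expands a grid that way with `itertools.product`. Whether `run_parameter_sweep` uses the full product or varies one parameter at a time is not stated in this reference, so treat this as an assumption; `expand_grid` is a hypothetical helper.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(parameter_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword dictionary per combination of grid values."""
    keys = list(parameter_grid)
    for values in product(*(parameter_grid[key] for key in keys)):
        yield dict(zip(keys, values))


grid = {"modulation": ["bpsk", "qpsk"], "snr_db": [0, 5, 10]}
combinations = list(expand_grid(grid))
```

Each dictionary in `combinations` could then be passed as keyword arguments to a single benchmark run.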

class kaira.benchmarks.BenchmarkConfig(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>)

Bases: object

Configuration for benchmark execution.

__init__(name: str = 'default', description: str = '', seed: int = 42, device: str = 'auto', num_trials: int = 1, timeout_seconds: float | None = None, verbose: bool = True, save_results: bool = True, output_directory: str = './benchmark_results', save_plots: bool = True, save_raw_data: bool = False, batch_size: int = 1000, num_workers: int = 1, memory_limit_mb: float | None = None, snr_range: List[float] = <factory>, block_length: int = 1000, code_rate: float = 0.5, model_precision: str = 'float32', compile_model: bool = False, calculate_confidence_intervals: bool = True, confidence_level: float = 0.95, custom_params: Dict[str, Any] = <factory>) -> None

batch_size: int = 1000
block_length: int = 1000
calculate_confidence_intervals: bool = True
code_rate: float = 0.5
compile_model: bool = False
confidence_level: float = 0.95
description: str = ''
device: str = 'auto'

classmethod from_dict(config_dict: Dict[str, Any]) -> BenchmarkConfig

Create config from dictionary.

classmethod from_json(json_str: str) -> BenchmarkConfig

Create config from JSON string.

get(key: str, default: Any = None) -> Any

Get configuration parameter.

classmethod load(filepath: str | Path) -> BenchmarkConfig

Load configuration from file.

memory_limit_mb: float | None = None
model_precision: str = 'float32'
name: str = 'default'
num_trials: int = 1
num_workers: int = 1
output_directory: str = './benchmark_results'

save(filepath: str | Path) -> None

Save configuration to file.

save_plots: bool = True
save_raw_data: bool = False
save_results: bool = True
seed: int = 42
timeout_seconds: float | None = None

to_dict() -> Dict[str, Any]

Convert config to dictionary.

to_json() -> str

Convert config to JSON string.

update(**kwargs) -> None

Update configuration parameters.

verbose: bool = True
snr_range: List[float]
custom_params: Dict[str, Any]

kaira.benchmarks.get_config(name: str) -> BenchmarkConfig

Get a predefined configuration.

kaira.benchmarks.list_configs() -> List[str]

List available predefined configurations.

class kaira.benchmarks.BenchmarkResultsManager(base_dir: str | Path = 'results')

Bases: object

Manages benchmark results with improved directory structure and organization.

__init__(base_dir: str | Path = 'results')

Initialize the results manager.

Parameters:

base_dir – Base directory for storing all benchmark results

archive_old_results(days_old: int = 30) -> None

Archive benchmark results older than specified days.

Parameters:

days_old – Number of days after which to archive results

cleanup_empty_directories() -> None

Remove empty directories in the results structure.

create_comparison_report(result_paths: List[Path], report_name: str) -> Path

Create a comparison report from multiple benchmark results.

Parameters:
  • result_paths – List of paths to benchmark result files

  • report_name – Name for the comparison report

Returns:

Path to the generated report

list_results(category: str | None = None, experiment_name: str | None = None) -> List[Path]

List available benchmark result files.

Parameters:
  • category – Specific category to list (benchmarks, suites, etc.)

  • experiment_name – Specific experiment to list

Returns:

List of result file paths (excludes summary files and comparison reports)

load_benchmark_result(filepath: str | Path) -> BenchmarkResult

Load a benchmark result from file.

save_benchmark_result(result: BenchmarkResult, category: str = 'benchmarks', experiment_name: str | None = None, add_timestamp: bool = True) -> Path

Save a single benchmark result with improved organization.

Parameters:
  • result – The benchmark result to save

  • category – Category (benchmarks, suites, experiments, etc.)

  • experiment_name – Optional experiment name for grouping

  • add_timestamp – Whether to add timestamp to filename

Returns:

Path to the saved file

save_suite_results(results: List[BenchmarkResult], suite_name: str, experiment_name: str | None = None) -> Dict[str, Path]

Save multiple benchmark results from a suite.

Parameters:
  • results – List of benchmark results

  • suite_name – Name of the benchmark suite

  • experiment_name – Optional experiment name

Returns:

Dictionary mapping result names to file paths

class kaira.benchmarks.BenchmarkVisualizer(figsize: tuple = (10, 6), dpi: int = 100)

Bases: object

Visualizer for benchmark results.

__init__(figsize: tuple = (10, 6), dpi: int = 100)

Initialize visualizer.

Parameters:
  • figsize – Figure size in inches (width, height)

  • dpi – Figure resolution

create_benchmark_report(results_file: str, output_dir: str = 'benchmark_plots')

Create a comprehensive visual report from benchmark results.

Parameters:
  • results_file – Path to JSON file containing benchmark results

  • output_dir – Directory to save plots

plot_benchmark_summary(results_file: str, save_path: str | None = None) -> Figure

Plot summary of multiple benchmark results.

Parameters:
  • results_file – Path to JSON file containing benchmark results

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_ber_curve(results: Dict[str, Any], save_path: str | None = None) -> Figure

Plot BER vs SNR curve.

Parameters:
  • results – Benchmark results containing SNR and BER data

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_coding_gain(results: Dict[str, Any], save_path: str | None = None) -> Figure

Plot coding gain vs SNR.

Parameters:
  • results – Benchmark results containing coding gain data

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_constellation(constellation: Tensor, received_symbols: Tensor | None = None, save_path: str | None = None) -> Figure

Plot constellation diagram.

Parameters:
  • constellation – Ideal constellation points

  • received_symbols – Optional received symbols to overlay

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_latency_distribution(results: Dict[str, Any], save_path: str | None = None) -> Figure

Plot latency distribution.

Parameters:
  • results – Benchmark results containing latency data

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object

plot_throughput_comparison(results: Dict[str, Any], save_path: str | None = None) -> Figure

Plot throughput comparison.

Parameters:
  • results – Benchmark results containing throughput data

  • save_path – Optional path to save the figure

Returns:

Matplotlib figure object