New Results Management System Demo

This example demonstrates the new organized results management system in Kaira, showcasing automatic directory structuring, experiment naming, suite management, result comparison, and maintenance features.

The results management system provides:

Automatic directory organization for benchmark results
Experiment naming and metadata tracking
Suite-level result aggregation and comparison
Result maintenance and cleanup utilities
Comprehensive result analysis and reporting

Setting up the Environment

First, let’s import the necessary modules and create our demonstration benchmark.

import time

import numpy as np

from kaira.benchmarks.base import BaseBenchmark, BenchmarkSuite
from kaira.benchmarks.results_manager import BenchmarkResultsManager
from kaira.benchmarks.runners import StandardRunner

# Set random seed for reproducibility
np.random.seed(42)

Creating a Custom Benchmark

Let’s create a simple benchmark class for demonstration purposes.

class ExampleBenchmark(BaseBenchmark):
    """Example benchmark for demonstration purposes."""

    def __init__(self, name: str, description: str = "", delay: float = 0.1):
        super().__init__(name, description)
        self.delay = delay

    def setup(self, **kwargs) -> None:
        """Setup benchmark environment."""
        super().setup(**kwargs)

    def run(self, **kwargs) -> dict:
        """Run the benchmark and return metrics."""
        # Simulate benchmark execution
        time.sleep(self.delay)

        # Return some example metrics
        return {
            "throughput": 1000 / self.delay,  # Operations per second
            "latency": self.delay,  # Seconds
            "success": True,
            "memory_usage": 100 + self.delay * 50,  # MB
            "accuracy": 0.95 + (0.05 * (1 - self.delay)),  # Percentage
        }
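
To see what these formulas produce, the following standalone sketch evaluates them for the three delays used later in this example, without importing Kaira. The helper name `example_metrics` is hypothetical and exists only for illustration.

```python
def example_metrics(delay: float) -> dict:
    """Mirror the metric formulas from ExampleBenchmark.run (illustration only)."""
    return {
        "throughput": 1000 / delay,
        "latency": delay,
        "success": True,
        "memory_usage": 100 + delay * 50,
        "accuracy": 0.95 + 0.05 * (1 - delay),
    }


# Evaluate the formulas for the delays used by the suite benchmarks
for delay in (0.1, 0.2, 0.3):
    metrics = example_metrics(delay)
    print(
        f"delay={delay:.1f}s -> throughput={metrics['throughput']:.0f}, "
        f"memory_usage={metrics['memory_usage']:.0f}, accuracy={metrics['accuracy']:.3f}"
    )
```

With these formulas, a longer delay lowers both the reported throughput and the reported accuracy, which gives the three suite benchmarks visibly different metrics.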

Demonstrating Basic Results Management

Let’s start with the basic usage of the results management system.

def demonstrate_basic_usage():
    """Demonstrate basic usage of the new results system."""
    print("=" * 60)
    print("1. Basic Benchmark Results Management")
    print("=" * 60)

    # Create a results manager
    results_manager = BenchmarkResultsManager("example_results")

    # Create and run a simple benchmark
    benchmark = ExampleBenchmark("Performance Test", "Example benchmark for testing", delay=0.2)
    result = benchmark.execute()

    # Save the result
    saved_path = results_manager.save_benchmark_result(result, category="benchmarks", experiment_name="demo_experiment")

    print("Saved benchmark result to:", saved_path)

    # List available results
    results = results_manager.list_results(category="benchmarks")
    print(f"Found {len(results)} benchmark results")

    return results_manager
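
The saved path printed by this function (see the output further down) combines the category, the experiment name, and a file name built from the benchmark name, a timestamp, and a short identifier. The sketch below reproduces that layout with the standard library only. It is an assumption about the naming scheme inferred from the example output, not Kaira's actual implementation; `build_result_path` and its parameters are hypothetical.

```python
import uuid
from datetime import datetime
from pathlib import Path


def build_result_path(base_dir: str, category: str, experiment_name: str, benchmark_name: str) -> Path:
    """Build <base>/<category>/<experiment>/<Name>_<timestamp>_<id>.json (hypothetical helper)."""
    safe_name = benchmark_name.replace(" ", "_")  # Assumed: spaces become underscores
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # Assumed timestamp layout
    short_id = uuid.uuid4().hex[:8]  # Assumed: 8 hex characters to keep names unique
    return Path(base_dir) / category / experiment_name / f"{safe_name}_{timestamp}_{short_id}.json"


path = build_result_path("example_results", "benchmarks", "demo_experiment", "Performance Test")
print(path)
```

Putting the timestamp and a random suffix in the file name means repeated runs of the same benchmark under the same experiment do not overwrite each other.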

Suite Management Features

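The results system also handles benchmark suites: a StandardRunner configured with a results manager runs every benchmark in a suite and saves the results under the given experiment name.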

def demonstrate_suite_management(results_manager):
    """Demonstrate benchmark suite management."""
    print("\n" + "=" * 60)
    print("2. Benchmark Suite Management")
    print("=" * 60)

    # Create a benchmark suite
    suite = BenchmarkSuite("Performance Suite", "Collection of performance benchmarks")

    # Add multiple benchmarks to the suite
    benchmarks = [
        ExampleBenchmark("Fast Benchmark", "Quick test", delay=0.1),
        ExampleBenchmark("Medium Benchmark", "Medium test", delay=0.2),
        ExampleBenchmark("Slow Benchmark", "Thorough test", delay=0.3),
    ]

    for benchmark in benchmarks:
        suite.benchmarks.append(benchmark)

    # Run the suite using the StandardRunner
    runner = StandardRunner(verbose=True, results_manager=results_manager)
    suite_results = runner.run_suite(suite, experiment_name="demo_experiment")

    print(f"\nSuite completed with {len(suite_results)} results")

    # The results are automatically saved by the runner
    suite_files = results_manager.list_results(category="suites")
    print(f"Found {len(suite_files)} suite-related files")

Result Comparison and Analysis

The results manager can build a comparison report from several saved results and load individual results back for inspection.

def demonstrate_comparison_and_analysis(results_manager):
    """Demonstrate result comparison and analysis features."""
    print("\n" + "=" * 60)
    print("3. Result Comparison and Analysis")
    print("=" * 60)

    # Get all available results
    all_results = results_manager.list_results()

    if len(all_results) >= 2:
        # Create a comparison report
        comparison_path = results_manager.create_comparison_report(all_results[:3], "demo_comparison")  # Compare first 3 results
        print("Created comparison report:", comparison_path)

        # Load and display a result
        sample_result = results_manager.load_benchmark_result(all_results[0])
        print("\nSample result:", sample_result.name)
        print(f"  Execution time: {sample_result.execution_time:.3f}s")
        print(f"  Key metrics: {sample_result.metrics}")
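
For a sense of what comparing results involves, the sketch below ranks several metric dictionaries by a chosen metric. It works on plain dictionaries shaped like the metrics returned by `ExampleBenchmark.run`. The function `rank_by_metric` is hypothetical, and the format of the report written by `create_comparison_report` is not shown in this example, so nothing here should be read as that format.

```python
def rank_by_metric(results: dict, metric: str, higher_is_better: bool = True) -> list:
    """Return (name, value) pairs sorted from best to worst for one metric."""
    pairs = [(name, metrics[metric]) for name, metrics in results.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=higher_is_better)


# Metric values follow the formulas in ExampleBenchmark.run for delays 0.1, 0.2, 0.3
results = {
    "Medium Benchmark": {"throughput": 1000 / 0.2, "latency": 0.2},
    "Slow Benchmark": {"throughput": 1000 / 0.3, "latency": 0.3},
    "Fast Benchmark": {"throughput": 1000 / 0.1, "latency": 0.1},
}

# Lower latency is better, so sort ascending for this metric
for name, value in rank_by_metric(results, "latency", higher_is_better=False):
    print(f"{name}: {value:.1f}s")
```

Making the sort direction an explicit argument matters because the example mixes metrics where higher is better (throughput, accuracy) with metrics where lower is better (latency, memory usage).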

Maintenance and Cleanup Features

The results system includes maintenance tools to keep your results organized.

def demonstrate_maintenance_features(results_manager):
    """Demonstrate maintenance and cleanup features."""
    print("\n" + "=" * 60)
    print("4. Maintenance and Cleanup")
    print("=" * 60)

    # Archive old results (in a real scenario, you'd set a meaningful days_old value)
    print("Archiving old results...")
    results_manager.archive_old_results(days_old=0)  # Archive everything for demo

    # Clean up empty directories
    print("Cleaning up empty directories...")
    results_manager.cleanup_empty_directories()

    # Show directory structure
    print(f"\nFinal directory structure in {results_manager.base_dir}:")
    for item in sorted(results_manager.base_dir.rglob("*")):
        if item.is_dir():
            print(f"  📁 {item.relative_to(results_manager.base_dir)}/")
        else:
            print(f"  📄 {item.relative_to(results_manager.base_dir)}")
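
The two maintenance steps can be pictured as moving files older than a cutoff into an archive folder and then deleting directories left empty. The following standard-library sketch does this in a temporary directory. It is not Kaira's implementation: `archive_older_than` and `remove_empty_dirs` are hypothetical helpers, and the sketch deliberately skips the archive folder itself so that already archived files are not moved again.

```python
import shutil
import tempfile
import time
from pathlib import Path


def archive_older_than(base_dir: Path, days_old: float, archive_name: str = "archives") -> int:
    """Move files older than `days_old` days into base_dir/archive_name, keeping relative paths."""
    archive_dir = base_dir / archive_name
    cutoff = time.time() - days_old * 86400
    moved = 0
    # Materialize the listing first because files are moved during the loop
    for path in list(base_dir.rglob("*")):
        if not path.is_file() or archive_dir in path.parents:
            continue  # Skip directories and anything already archived
        if path.stat().st_mtime <= cutoff:
            target = archive_dir / path.relative_to(base_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved += 1
    return moved


def remove_empty_dirs(base_dir: Path) -> int:
    """Delete empty directories below base_dir, deepest first."""
    removed = 0
    for path in sorted(base_dir.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if path.is_dir() and not any(path.iterdir()):
            path.rmdir()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    result_file = base / "benchmarks" / "demo_experiment" / "result.json"
    result_file.parent.mkdir(parents=True)
    result_file.write_text("{}")

    moved = archive_older_than(base, days_old=0)  # Archive everything, as in the demo
    removed = remove_empty_dirs(base)
    remaining = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))
    print(f"moved={moved}, removed={removed}")
    print(remaining)
```

Processing directories deepest first lets a parent directory be removed in the same pass once its only child has been deleted.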

Running the Complete Demo

Let’s run through all the demonstration functions to see the full system in action.

def main():
    """Main demonstration function."""
    print("Kaira Benchmark Results Management Demo")
    print("This script demonstrates the new organized benchmark results system.")

    try:
        # 1. Basic usage
        results_manager = demonstrate_basic_usage()

        # 2. Suite management
        demonstrate_suite_management(results_manager)

        # 3. Comparison and analysis
        demonstrate_comparison_and_analysis(results_manager)

        # 4. Maintenance features
        demonstrate_maintenance_features(results_manager)

        print("\n" + "=" * 60)
        print("Demo completed successfully!")
        print("=" * 60)
        print("\nKey benefits of the new system:")
        print("• Organized directory structure")
        print("• Automatic file naming and timestamping")
        print("• Suite-level result management")
        print("• Built-in comparison and analysis tools")
        print("• Maintenance and archiving features")

        print("\nCheck the 'example_results' directory to see the organized structure.")

    except Exception as e:
        print("Error during demonstration:", e)
        import traceback

        traceback.print_exc()

Execute the demonstration

if __name__ == "__main__":
    main()

Kaira Benchmark Results Management Demo
This script demonstrates the new organized benchmark results system.
============================================================
1. Basic Benchmark Results Management
============================================================
Saved benchmark result to: example_results/benchmarks/demo_experiment/Performance_Test_20250615_021750_edf9f43b.json
Found 1 benchmark results

============================================================
2. Benchmark Suite Management
============================================================
Running benchmark suite: Performance Suite
  3 benchmarks to run
  [1/3] Fast Benchmark
Running benchmark: Fast Benchmark
  ✓ Completed in 0.10s
  [2/3] Medium Benchmark
Running benchmark: Medium Benchmark
  ✓ Completed in 0.20s
  [3/3] Slow Benchmark
Running benchmark: Slow Benchmark
  ✓ Completed in 0.30s

Suite completed with 3 results
Found 0 suite-related files

============================================================
3. Result Comparison and Analysis
============================================================
Created comparison report: example_results/comparisons/demo_comparison_comparison.json

Sample result: Performance Test
  Execution time: 0.200s
  Key metrics: {'throughput': 5000.0, 'latency': 0.2, 'success': True, 'memory_usage': 110.0, 'accuracy': 0.99}

============================================================
4. Maintenance and Cleanup
============================================================
Archiving old results...
Cleaning up empty directories...

Final directory structure in example_results:
  📁 archives/
  📁 archives/archives/
  📁 archives/archives/benchmarks/
  📁 archives/archives/benchmarks/demo_experiment/
  📄 archives/archives/benchmarks/demo_experiment/Performance_Test_20250615_021750_edf9f43b.json
  📁 archives/archives/comparisons/
  📄 archives/archives/comparisons/demo_comparison_comparison.json
  📁 archives/archives/example_results/
  📁 archives/archives/example_results/suites/
  📁 archives/archives/example_results/suites/demo_experiment/
  📁 archives/archives/example_results/suites/demo_experiment/Performance Suite/
  📄 archives/archives/example_results/suites/demo_experiment/Performance Suite/Fast_Benchmark_d8618f87.json
  📄 archives/archives/example_results/suites/demo_experiment/Performance Suite/Medium_Benchmark_6903c0f6.json
  📄 archives/archives/example_results/suites/demo_experiment/Performance Suite/Slow_Benchmark_868f2b48.json
  📁 archives/archives/suites/
  📁 archives/archives/suites/demo_experiment/
  📁 archives/archives/suites/demo_experiment/Performance Suite/
  📄 archives/archives/suites/demo_experiment/Performance Suite/summary.json

============================================================
Demo completed successfully!
============================================================

Key benefits of the new system:
• Organized directory structure
• Automatic file naming and timestamping
• Suite-level result management
• Built-in comparison and analysis tools
• Maintenance and archiving features

Check the 'example_results' directory to see the organized structure.

Summary

This example demonstrated the comprehensive results management system in Kaira:

Organized Structure: Automatic directory organization for different result types
Metadata Tracking: Automatic timestamping and experiment naming
Suite Management: Handling collections of related benchmarks
Comparison Tools: Built-in result comparison and analysis features
Maintenance: Archiving and cleanup utilities to manage result storage

The results management system ensures that your benchmark data is organized, accessible, and maintainable over time, making it easier to track performance trends and compare different approaches.
