A comprehensive, modular stress-testing suite designed for high-performance MQTT brokers. Developed in Python
with uv, it validates latency, concurrency, and
stability thresholds.
The ProtoMQ Benchmarking Suite was built on a key realization: performance validation should be broker-agnostic. Although developed alongside ProtoMQ, the suite is entirely decoupled from the server implementation.
It speaks the standard MQTT protocol to any broker, making it a valuable tool for testing any MQTT implementation (Mosquitto, EMQX, VerneMQ, etc.) against the same rigorous scenarios used to optimize ProtoMQ.
Each benchmark targets a specific stress vector, from connection churn to payload size variance.
Verifies sub-millisecond round-trip latency under steady-state load with 100+ concurrent connections.
Metrics: p50/p99 Latency, Connection Time, Memory (RSS)
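As a sketch of how the p50/p99 figures could be derived from collected round-trip samples (the `percentile` helper and the sample values are illustrative, not the suite's actual implementation):

```python
import statistics

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile over a sorted copy of the samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

# Hypothetical round-trip latencies in milliseconds from one steady-state run.
latencies_ms = [0.42, 0.38, 0.55, 0.47, 0.91, 0.44, 0.40, 0.52, 0.61, 0.39]

report = {
    "p50_latency_ms": percentile(latencies_ms, 50),
    "p99_latency_ms": percentile(latencies_ms, 99),
    "mean_latency_ms": statistics.fmean(latencies_ms),
}
```

With so few samples the p99 collapses onto the worst observation; a real run would aggregate thousands of samples per connection.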
Simulates 10,000 clients connecting in bursts and publishing simultaneously. Tests massive fan-out and connection pressure.
Metrics: Fan-out Time, Connection Failures, Message Loss, Peak CPU/Memory
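One way to drive "bursts" deterministically is to precompute a connection schedule; the `burst_schedule` helper below is a hypothetical sketch, not the suite's API:

```python
def burst_schedule(total_clients: int, burst_size: int, interval_s: float):
    """Yield (start_time_s, client_ids) for each connection burst."""
    for burst_index, start in enumerate(range(0, total_clients, burst_size)):
        ids = list(range(start, min(start + burst_size, total_clients)))
        yield burst_index * interval_s, ids

# 10,000 clients connecting in bursts of 500, one burst every 250 ms.
schedule = list(burst_schedule(10_000, 500, 0.25))
```

Each entry can then be handed to an async task group that opens the listed client connections at the scheduled offset.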
10-minute endurance test at 10,000 msg/s, monitoring long-term stability and performance degradation.
Metrics: Messages/sec, p99 Latency Stability, Memory Growth, Avg CPU
Stress-tests the topic matching engine with complex overlapping patterns like sensor/+/temp.
Metrics: Topic Matching Latency (µs), Routing Correctness, Peak Memory
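The semantics being stress-tested are the standard MQTT wildcard rules: `+` matches exactly one topic level, `#` matches the remainder. A minimal reference matcher (illustrative only; it omits spec corner cases such as `$`-prefixed topics):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """Minimal MQTT topic-filter match supporting '+' and a trailing '#'."""
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                      # multi-level wildcard: matches the rest
            return True
        if i >= len(t_parts):             # filter is longer than the topic
            return False
        if f != "+" and f != t_parts[i]:  # '+' matches any single level
            return False
    return len(f_parts) == len(t_parts)
```

A broker's matching engine replaces this linear scan with a trie or similar index; the benchmark measures how well that index holds up under many overlapping filters.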
Compares binary vs JSON processing overhead. Measures efficiency gains of the Protobuf layer.
Metrics: Bandwidth Savings %, Decoding Latency, CPU Overhead
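The bandwidth-savings metric boils down to comparing encoded sizes. The sketch below uses `struct` as a stand-in for the Protobuf layer (the telemetry fields and format are hypothetical), since a fixed-width binary encoding exhibits the same size advantage over JSON text:

```python
import json
import struct

# Hypothetical telemetry reading.
reading = {"device_id": 4021, "temp_c": 21.5, "humidity_pct": 48.0}

json_bytes = json.dumps(reading, separators=(",", ":")).encode()
# uint32 + two float32s stand in for a Protobuf-encoded message.
binary_bytes = struct.pack("<Iff", reading["device_id"],
                           reading["temp_c"], reading["humidity_pct"])

savings_pct = 100.0 * (1 - len(binary_bytes) / len(json_bytes))
```

JSON repeats every field name in every message, while binary encodings carry only values (plus, for Protobuf, compact field tags), which is where the savings come from.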
100,000 rapid connect/disconnect cycles. Essential for detecting socket leaks in edge/mobile scenarios.
Metrics: Connection Rate (conn/s), Memory Leak (MB), FD Leak Count
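Leak detection here reduces to sampling resource counters before and after the churn run and flagging growth beyond a tolerance. A hedged sketch (the counter names and tolerances are illustrative):

```python
def leak_report(before: dict, after: dict, tolerances: dict) -> dict:
    """Flag any counter whose growth across the run exceeds its tolerance."""
    return {
        key: {
            "growth": after[key] - before[key],
            "leak_suspected": after[key] - before[key] > tolerances.get(key, 0),
        }
        for key in before
    }

# Hypothetical samples taken around 100,000 connect/disconnect cycles.
before = {"rss_mb": 142.0, "open_fds": 37}
after = {"rss_mb": 149.5, "open_fds": 37}
report = leak_report(before, after, {"rss_mb": 5.0, "open_fds": 0})
```

A non-zero FD tolerance is rarely justified: after all clients disconnect, the broker's descriptor count should return exactly to its baseline.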
Tests performance across a spectrum of payloads, from 10-byte telemetry to 64KB binary images.
Metrics: Throughput vs Size, p99 Latency per Size, Memory Scaling
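A payload spectrum like this is typically swept geometrically rather than linearly, so small and large sizes get equal coverage. A sketch of generating the sweep (the growth factor is an assumption, not the suite's setting):

```python
def payload_sweep(min_bytes: int = 10, max_bytes: int = 64 * 1024, factor: int = 4):
    """Geometric sweep of payload sizes, always ending at max_bytes."""
    sizes = []
    size = min_bytes
    while size < max_bytes:
        sizes.append(size)
        size *= factor
    sizes.append(max_bytes)
    return sizes

sizes = payload_sweep()
# → [10, 40, 160, 640, 2560, 10240, 40960, 65536]
```

Per-size throughput and p99 latency can then be recorded against each entry to expose where memory scaling or fragmentation kicks in.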
The suite leverages Python 3.11+ and the uv package manager for lightning-fast environment setup and execution.
A common/ library provides shared logic for connection tracking, resource
monitoring (CPU/Memory), and a standardized BenchmarkRunner. This allows new
scenarios to be added by writing only the logic for the specific test case.
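The shape of that shared runner might look like the following sketch (class and method names are assumptions, not the suite's actual `common/` API):

```python
import time
from abc import ABC, abstractmethod

class BenchmarkRunner(ABC):
    """Shared harness: subclasses supply only the scenario-specific logic."""

    name = "unnamed"

    @abstractmethod
    def run_scenario(self) -> dict:
        """Execute the scenario and return raw metric values."""

    def execute(self) -> dict:
        # Shared concerns (timing here; connection tracking and CPU/memory
        # sampling would hook in the same way) wrap the scenario logic.
        start = time.perf_counter()
        metrics = self.run_scenario()
        metrics["wall_time_s"] = time.perf_counter() - start
        return {"benchmark": self.name, "metrics": metrics}

class NoopBenchmark(BenchmarkRunner):
    """Trivial scenario demonstrating the subclassing contract."""
    name = "noop"

    def run_scenario(self) -> dict:
        return {"messages_sent": 0}

result = NoopBenchmark().execute()
```

Keeping measurement in the base class means every scenario reports the same envelope of metrics for free.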
Raw performance numbers are only useful when they can be measured against expectations. The suite therefore includes a thresholding mechanism that converts benchmark results into actionable status reports.
Every metric is defined with a target "direction" (lower is better for latency, higher is better for
throughput). The ThresholdChecker automatically interprets the results based
on these semantics.
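A minimal stand-in for that interpretation logic (the function name and "pass"/"warn"/"fail" statuses are illustrative, not the ThresholdChecker's actual interface):

```python
def check_metric(value: float, spec: dict) -> str:
    """Interpret a metric value against its threshold spec.

    direction "lower" means lower is better (latency): exceeding "max" fails,
    exceeding "warn" warns. direction "higher" inverts the comparisons
    (throughput, connection counts).
    """
    if spec.get("direction", "lower") == "lower":
        if "max" in spec and value > spec["max"]:
            return "fail"
        if "warn" in spec and value > spec["warn"]:
            return "warn"
    else:
        if "min" in spec and value < spec["min"]:
            return "fail"
        if "warn" in spec and value < spec["warn"]:
            return "warn"
    return "pass"
```

For example, a 0.7 ms p99 against `{"max": 0.8, "warn": 0.6, "direction": "lower"}` warns without failing, while 9,400 connections against `{"min": 9500, "direction": "higher"}` fails outright.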
Because thresholds are declared in thresholds.json files, the suite is ideal
for CI/CD pipelines. A build can be automatically failed if a commit introduces a performance regression,
ensuring the "sub-millisecond" promise of ProtoMQ is never broken.
# Example threshold definitions (thresholds.json)
{
  "p99_latency_ms": { "max": 0.8, "warn": 0.6, "direction": "lower" },
  "concurrent_connections": { "min": 9500, "direction": "higher" }
}