SipPulse AI telemetry: every parameter explained
SipPulse AI delivers per-call telemetry via signed webhooks. Here is what every event type and metric means, with the open example viewer at /telemetry.

Most voice agent vendors will not show you their numbers. The reason is simple: the numbers are usually not great. Industry-median voice agent latency in 2026 sits at 1.4 to 1.7 seconds, with 10 percent of calls over 3 seconds, while the human conversation expectation is closer to 200ms. Telemetry is the discipline that makes those numbers visible. SipPulse AI emits every voice agent event as a signed webhook, with per-stage latency, per-call counters and structured payload, so customers can pipe the data into any observability stack and see exactly how each call performed. This post walks through every event type, every metric, and how to consume them, with a public example viewer at our telemetry page.
Why telemetry matters for voice agents
A voice agent has more failure modes than a chat agent. The user can interrupt mid-response, the network can drop a packet, the STT can misrecognize a word, the LLM can take 800ms longer than expected, the TTS can stutter on a long sentence. Without telemetry you find out about each of these from a customer complaint, days after the fact, with no ability to reproduce or debug.
Voice agent observability flips the model. Every call emits structured events that capture per-stage timing, model identity and conversation state. Aggregate them and you see trends. Inspect a single call and you see exactly which stage was slow. The same data feeds latency dashboards, alerts on regressions, and Auto QA pipelines that score conversations after the fact.
How telemetry is delivered: signed webhooks
SipPulse AI does not ship a built-in dashboard you have to log into. Telemetry is delivered as webhooks: every call event POSTs a JSON payload to an endpoint you configure. You consume the events in your own observability stack (Grafana, Datadog, a custom service, anything that can receive HTTP).
The webhooks are signed with HMAC using a shared secret, with a timestamp guard against replay attacks. If the timestamp is too old or the signature does not match, the event is rejected. This is the standard webhook security pattern, the same one used by Stripe and GitHub.
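A minimal verification sketch in TypeScript (Node) shows the pattern. The signing scheme here, an HMAC-SHA256 over `<timestamp>.<body>` with header-supplied timestamp and hex signature, is an assumption modeled on the Stripe-style convention; the exact header names and signature format for SipPulse AI may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300; // reject events older than 5 minutes

// Recompute the HMAC over "<timestamp>.<rawBody>" and compare it to the
// received signature in constant time. Returns false on stale timestamps.
function verifyWebhook(
  rawBody: string,
  timestamp: number, // Unix seconds, taken from the webhook headers
  signature: string, // hex-encoded HMAC-SHA256 from the headers
  secret: string,
  now: number = Math.floor(Date.now() / 1000),
): boolean {
  // Timestamp guard against replay attacks
  if (Math.abs(now - timestamp) > TOLERANCE_SECONDS) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  // Constant-time comparison to avoid timing side channels
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note that verification must run against the raw request body, before any JSON parsing, or whitespace differences will break the signature.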
The architectural choice (webhooks instead of a polled API or a hosted dashboard) is intentional. Customers want telemetry in their own stack, with their existing alerting, retention and access controls. A vendor dashboard is one more login for the on-call engineer. A webhook is just another data source.
Event types: what fires when
Every call generates a stream of events with these types:
- voice.start_call: emitted when a call connects. Includes participant info, agent ID and timestamp
- voice.end_call: emitted when the call disconnects. Includes the per-call usage summary with all latency aggregates
- llm.completion: emitted for each LLM turn. Includes prompt, response, model, token counts and per-turn TTFT
- stt.transcription: emitted as transcription results stream in. Includes partial and final transcripts
- tts.synthesis: emitted for each TTS request. Includes voice model, text and TTFB
- thread.created: emitted when a new conversation thread starts. Useful for grouping calls into sessions
- thread.closed: emitted when a thread ends. Useful for session-level analytics
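On the consumer side, the event stream above might be typed as a discriminated union. The field names beyond the event types (`call_id`, `timestamp`, `payload`) are assumptions for illustration; the real payload schemas may differ.

```typescript
// Hypothetical shapes for the SipPulse AI event stream; only the event
// type names come from the documentation above.
type EventType =
  | "voice.start_call"
  | "voice.end_call"
  | "llm.completion"
  | "stt.transcription"
  | "tts.synthesis"
  | "thread.created"
  | "thread.closed";

interface TelemetryEvent {
  type: EventType;
  call_id: string; // assumed correlation key across a call's events
  timestamp: string; // ISO 8601
  payload: Record<string, unknown>;
}

// Route rolled-up events to analytics, granular events to debug storage.
function isAggregateEvent(e: TelemetryEvent): boolean {
  return e.type === "voice.end_call" || e.type === "thread.closed";
}
```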
The voice.end_call event is where most aggregate analytics live, because it carries the rolled-up usage summary for the whole call. The granular events (llm.completion, stt.transcription, tts.synthesis) are where you see per-turn behavior for debugging.
The latency parameters that matter
The following metrics ship in the voice.end_call payload (and in the aggregate stats); they are the ones that matter for voice agent observability:
- llm_ttft_ms: average LLM time-to-first-token across the call. The number that drives perceived latency. Target: under 400ms.
- llm_latency_ms: average total LLM duration per turn. Larger than TTFT because it includes the full generation. Matters for total turn time.
- tts_latency_ms: average TTS time-to-first-byte. The time from "TTS requested" to "first audio chunk available". Target: under 150ms.
- eou_latency_ms: end-of-utterance detection latency. How long after the user stops talking before the agent decides the turn is over. Target: under 300ms.
- conv_latency_ms: average conversation latency, the weighted total round trip across the call. Target: under 800ms.
- session_duration: total call length in seconds. The denominator for cost and retention metrics.
Per-provider counters round out the picture: llm_requests and tts_requests count how many times each model was called in the session. The model field identifies which LLM and TTS were in use, which matters when you A/B test different models on the same flows.
Aggregate stats and call counters
The example telemetry viewer surfaces aggregate stats across all recent calls:
- total_events and total_calls for volume
- avg_llm_latency, avg_tts_latency, avg_conv_latency, avg_eou_latency and avg_llm_ttft for latency trends
- avg_duration for session length
The aggregates are computed from the same raw webhook payloads. In your own stack you would compute them on a rolling window (last hour, last day, last 7 days) and alert on regressions. The viewer shows a fixed window so visitors can see the metrics on real demo traffic.
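The rolling-window computation is straightforward. This sketch assumes you have flattened each voice.end_call webhook into a record with an arrival timestamp and its conv_latency_ms; the same shape works for any of the latency fields.

```typescript
// One flattened voice.end_call record, as a consumer might store it.
interface EndCallRecord {
  timestamp: number; // Unix milliseconds, when the event was received
  conv_latency_ms: number;
}

// Average conversation latency over a rolling window (e.g. the last hour).
// Returns null when the window is empty so callers can distinguish
// "no traffic" from "zero latency".
function rollingAvgConvLatency(
  records: EndCallRecord[],
  windowMs: number,
  now: number = Date.now(),
): number | null {
  const recent = records.filter((r) => now - r.timestamp <= windowMs);
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, r) => sum + r.conv_latency_ms, 0);
  return total / recent.length;
}
```

In a real pipeline this would be a windowed query in your metrics store rather than an in-memory scan, but the semantics are the same.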
Security and retention
The HMAC signature on every webhook is enforced server-side. The timestamp guard rejects events older than a few minutes, preventing replay. The example viewer's underlying storage retains events for 72 hours and auto-prunes anything older, which is enough to debug a recent regression without becoming a long-term data store. Production deployments customize retention to match their compliance posture (LGPD, GDPR, PCI as applicable).
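The 72-hour auto-prune can be sketched in a few lines. In the example viewer this is a DELETE against the SQLite table; here it is shown, under that assumption, against an in-memory array.

```typescript
const RETENTION_MS = 72 * 60 * 60 * 1000; // 72 hours, as in the example viewer

interface StoredEvent {
  received_at: number; // Unix milliseconds
  body: string; // raw JSON payload
}

// Drop everything older than the retention window. Run on a timer or on
// each write; either way the store never grows past the window.
function prune(events: StoredEvent[], now: number = Date.now()): StoredEvent[] {
  return events.filter((e) => now - e.received_at <= RETENTION_MS);
}
```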
For the webhook receiver itself, the same standards apply: TLS for transport, HMAC verification before processing, and rate limiting against malformed payloads.
How to consume the webhooks
Our open example viewer shows what consuming the webhooks looks like end to end. The viewer is a small Next.js app whose webhook endpoint:
- Receives the webhook POST
- Verifies the HMAC signature and timestamp
- Parses the payload and persists the event to a local SQLite table
- Renders aggregate stats and a recent-events table
The implementation is open. Treat it as a starter, not as a production dashboard. Most customers replace it with their own pipeline that pushes events to Grafana, Datadog, BigQuery or whatever observability stack they already operate.
For an even faster path, point the webhook at any HTTP receiver you already operate, validate the signature, and the events start flowing.
Read also
- Evaluating voice AI agents in production: WER, MOS, latency
- Voice AI agent architecture: STT, LLM, TTS and the latency budget
- Turn detection, barge-in and interruption handling
Conclusion
Voice agent telemetry is the line between a vendor that markets latency and a vendor that proves it. SipPulse AI delivers per-call events to your stack via signed webhooks, with the per-stage parameters that actually drive customer experience. Visit our example telemetry viewer to see live events, try the demo to generate your own, or contact our team to wire telemetry into your observability pipeline.