SipPulse AI telemetry: every parameter explained
SipPulse AI delivers per-call telemetry via signed webhooks. Here is what every event type and metric means, with the open example viewer at /telemetry.

Most voice agent vendors will not show you their numbers. The reason is simple: the numbers are usually not great. Industry-median voice agent latency in 2026 sits at 1.4 to 1.7 seconds, with 10 percent of calls over 3 seconds, while the human conversation expectation is closer to 200ms. Telemetry is the discipline that makes those numbers visible. SipPulse AI emits every voice agent event as a signed webhook, with per-stage latency, per-call counters and structured payload, so customers can pipe the data into any observability stack and see exactly how each call performed. This post walks through every event type, every metric, and how to consume them, with a public example viewer at our telemetry page.
Why telemetry matters for voice agents
A voice agent has more failure modes than a chat agent. The user can interrupt mid-response, the network can drop a packet, the STT can misrecognize a word, the LLM can take 800ms longer than expected, the TTS can stutter on a long sentence. Without telemetry you find out about each of these from a customer complaint, days after the fact, with no ability to reproduce or debug.
Voice agent observability flips the model. Every call emits structured events that capture per-stage timing, model identity and conversation state. Aggregate them and you see trends. Inspect a single call and you see exactly which stage was slow. The same data feeds latency dashboards, alerts on regressions, and Auto QA pipelines that score conversations after the fact.
How telemetry is delivered: signed webhooks
SipPulse AI does not ship a built-in dashboard you have to log into. Telemetry is delivered as webhooks: every call event POSTs a JSON payload to an endpoint you configure. You consume the events in your own observability stack (Grafana, Datadog, a custom service, anything that can receive HTTP).
The webhooks are signed with HMAC using a shared secret, with a timestamp guard against replay attacks. If the timestamp is too old or the signature does not match, the event is rejected. This is the standard webhook security pattern, the same one used by Stripe and GitHub.
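A minimal verification sketch in TypeScript (Node) shows the pattern. The signing scheme here, an HMAC-SHA256 over `<timestamp>.<body>` with header-supplied timestamp and hex signature, is an assumption modeled on the Stripe-style convention; the exact header names and signature format for SipPulse AI may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300; // reject events older than 5 minutes

// Recompute the HMAC over "<timestamp>.<rawBody>" and compare it to the
// received signature in constant time. Returns false on stale timestamps.
function verifyWebhook(
  rawBody: string,
  timestamp: number, // Unix seconds, taken from the webhook headers
  signature: string, // hex-encoded HMAC-SHA256 from the headers
  secret: string,
  now: number = Math.floor(Date.now() / 1000),
): boolean {
  // Timestamp guard against replay attacks
  if (Math.abs(now - timestamp) > TOLERANCE_SECONDS) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  // Constant-time comparison to avoid timing side channels
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note that verification must run against the raw request body, before any JSON parsing, or whitespace differences will break the signature.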
The architectural choice (webhooks instead of a polled API or a hosted dashboard) is intentional. Customers want telemetry in their own stack, with their existing alerting, retention and access controls. A vendor dashboard is one more login for the on-call engineer. A webhook is just another data source.
Event types: what fires when
Every call generates a stream of events with these types:
- voice.start_call: emitted when a call connects. Includes participant info, agent ID and timestamp
- voice.end_call: emitted when the call disconnects. Includes the per-call usage summary with all latency aggregates
- llm.completion: emitted for each LLM turn. Includes prompt, response, model, token counts and per-turn TTFT
- stt.transcription: emitted as transcription results stream in. Includes partial and final transcripts
- tts.synthesis: emitted for each TTS request. Includes voice model, text and TTFB
- thread.created: emitted when a new conversation thread starts. Useful for grouping calls into sessions
- thread.closed: emitted when a thread ends. Useful for session-level analytics
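On the consumer side, the event stream above might be typed as a discriminated union. The field names beyond the event types (`call_id`, `timestamp`, `payload`) are assumptions for illustration; the real payload schemas may differ.

```typescript
// Hypothetical shapes for the SipPulse AI event stream; only the event
// type names come from the documentation above.
type EventType =
  | "voice.start_call"
  | "voice.end_call"
  | "llm.completion"
  | "stt.transcription"
  | "tts.synthesis"
  | "thread.created"
  | "thread.closed";

interface TelemetryEvent {
  type: EventType;
  call_id: string; // assumed correlation key across a call's events
  timestamp: string; // ISO 8601
  payload: Record<string, unknown>;
}

// Route rolled-up events to analytics, granular events to debug storage.
function isAggregateEvent(e: TelemetryEvent): boolean {
  return e.type === "voice.end_call" || e.type === "thread.closed";
}
```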
The voice.end_call event is where most aggregate analytics live, because it carries the rolled-up usage summary for the whole call. The granular events (llm.completion, stt.transcription, tts.synthesis) are where you see per-turn behavior for debugging.
The latency parameters that matter
The following metrics ship in the voice.end_call payload (and in the aggregate stats); they are the ones that matter for voice agent observability:
- llm_ttft_ms: average LLM time-to-first-token across the call. The number that drives perceived latency. Target: under 400ms.
- llm_latency_ms: average total LLM duration per turn. Larger than TTFT because it includes the full generation. Matters for total turn time.
- tts_latency_ms: average TTS time-to-first-byte. The time from "TTS requested" to "first audio chunk available". Target: under 150ms.
- eou_latency_ms: end-of-utterance detection latency. How long after the user stops talking before the agent decides the turn is over. Target: under 300ms.
- conv_latency_ms: average conversation latency, the weighted total round trip across the call. Target: under 800ms.
- session_duration: total call length in seconds. The denominator for cost and retention metrics.
Per-provider counters round out the picture: llm_requests and tts_requests count how many times each model was called in the session. The model field identifies which LLM and TTS were in use, which matters when you A/B test different models on the same flows.
Aggregate stats and call counters
The example telemetry viewer surfaces aggregate stats across all recent calls:
- total_events and total_calls for volume
- avg_llm_latency, avg_tts_latency, avg_conv_latency, avg_eou_latency and avg_llm_ttft for latency trends
- avg_duration for session length
The aggregates are computed from the same raw webhook payloads. In your own stack you would compute them on a rolling window (last hour, last day, last 7 days) and alert on regressions. The viewer shows a fixed window so visitors can see the metrics on real demo traffic.
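The rolling-window computation is straightforward. This sketch assumes you have flattened each voice.end_call webhook into a record with an arrival timestamp and its conv_latency_ms; the same shape works for any of the latency fields.

```typescript
// One flattened voice.end_call record, as a consumer might store it.
interface EndCallRecord {
  timestamp: number; // Unix milliseconds, when the event was received
  conv_latency_ms: number;
}

// Average conversation latency over a rolling window (e.g. the last hour).
// Returns null when the window is empty so callers can distinguish
// "no traffic" from "zero latency".
function rollingAvgConvLatency(
  records: EndCallRecord[],
  windowMs: number,
  now: number = Date.now(),
): number | null {
  const recent = records.filter((r) => now - r.timestamp <= windowMs);
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, r) => sum + r.conv_latency_ms, 0);
  return total / recent.length;
}
```

In a real pipeline this would be a windowed query in your metrics store rather than an in-memory scan, but the semantics are the same.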
Security and retention
The HMAC signature on every webhook is enforced server-side. The timestamp guard rejects events older than a few minutes, preventing replay. The example viewer's underlying storage retains events for 72 hours and auto-prunes anything older, which is enough to debug a recent regression without becoming a long-term data store. Production deployments customize retention to match their compliance posture (LGPD, GDPR, PCI as applicable).
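The 72-hour auto-prune can be sketched in a few lines. In the example viewer this is a DELETE against the SQLite table; here it is shown, under that assumption, against an in-memory array.

```typescript
const RETENTION_MS = 72 * 60 * 60 * 1000; // 72 hours, as in the example viewer

interface StoredEvent {
  received_at: number; // Unix milliseconds
  body: string; // raw JSON payload
}

// Drop everything older than the retention window. Run on a timer or on
// each write; either way the store never grows past the window.
function prune(events: StoredEvent[], now: number = Date.now()): StoredEvent[] {
  return events.filter((e) => now - e.received_at <= RETENTION_MS);
}
```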
For the webhook receiver itself, the same standards apply: TLS for transport, HMAC verification before processing, and rate limiting against malformed payloads.
How to consume the webhooks
Our open example viewer shows what consuming the webhooks looks like end to end. The viewer is a small Next.js app whose webhook endpoint:
- Receives the webhook POST
- Verifies the HMAC signature and timestamp
- Parses the payload and persists the event to a local SQLite table
- Renders aggregate stats and a recent-events table
The implementation is open. Treat it as a starter, not as a production dashboard. Most customers replace it with their own pipeline that pushes events to Grafana, Datadog, BigQuery or whatever observability stack they already operate.
For an even faster path, point the webhook at any HTTP receiver you already operate, validate the signature, and the events start flowing.
Read also
- Evaluating voice AI agents in production: WER, MOS, latency
- Voice AI agent architecture: STT, LLM, TTS and the latency budget
- Turn detection, barge-in and interruption handling
Conclusion
Voice agent telemetry is the line between a vendor that markets latency and a vendor that proves it. SipPulse AI delivers per-call events to your stack via signed webhooks, with the per-stage parameters that actually drive customer experience. Visit our example telemetry viewer to see live events, try the demo to generate your own, or contact our team to wire telemetry into your observability pipeline.