Year
2025
Type
FUN
Category
Security
DEV Duration
2 - 3 Weeks
The project began with a detailed investigation into the call control and monitoring capabilities of the voice orchestration platform VAPI. Initial research showed that VAPI's documentation extensively covered streaming audio packets after a call had ended but offered little guidance on streaming audio in real time during an active call. This gap prompted an exploration of alternative mechanisms and protocols for low-latency live audio streaming and playback in web-based applications.
The architecture is built around a Node.js environment, which provides a scalable and efficient backend for handling WebSocket connections, call control commands, and audio data streaming. The frontend is a simple, responsive web application built with standard HTML, CSS, and JavaScript, which keeps it easy to use and accessible across browsers. A critical component of the design is the AudioWorkletProcessor, which allows real-time, low-latency audio processing directly within the user's browser.
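To make the AudioWorkletProcessor's role concrete, here is a minimal sketch of one way a worklet could play back streamed audio. It is not the project's actual code: `PcmRingBuffer`, `CallAudioProcessor`, the `'call-audio-processor'` name, and the two-second buffer size are all hypothetical, and the main thread is assumed to post `Float32Array` chunks to the processor's port.

```javascript
// Hypothetical sketch: a ring buffer feeding an AudioWorkletProcessor.
// None of these names come from the project itself.

class PcmRingBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.readIndex = 0;
    this.writeIndex = 0;
    this.size = 0; // number of unread samples
  }

  // Append samples. When the buffer is full, the oldest samples are
  // overwritten so playback stays close to real time.
  write(samples) {
    for (let i = 0; i < samples.length; i++) {
      this.data[this.writeIndex] = samples[i];
      this.writeIndex = (this.writeIndex + 1) % this.data.length;
      if (this.size < this.data.length) {
        this.size++;
      } else {
        this.readIndex = (this.readIndex + 1) % this.data.length;
      }
    }
  }

  // Fill `target` with buffered samples and pad with silence on underrun.
  // Returns the number of real samples copied.
  read(target) {
    const available = Math.min(this.size, target.length);
    for (let i = 0; i < available; i++) {
      target[i] = this.data[this.readIndex];
      this.readIndex = (this.readIndex + 1) % this.data.length;
    }
    target.fill(0, available);
    this.size -= available;
    return available;
  }
}

// AudioWorkletProcessor and registerProcessor exist only inside an
// AudioWorkletGlobalScope, so this guard lets the file load elsewhere too.
if (typeof AudioWorkletProcessor !== 'undefined') {
  class CallAudioProcessor extends AudioWorkletProcessor {
    constructor() {
      super();
      // `sampleRate` is a global in the worklet scope; 2 s is an assumed size.
      this.buffer = new PcmRingBuffer(sampleRate * 2);
      // Assumption: the main thread posts Float32Array chunks.
      this.port.onmessage = (event) => this.buffer.write(event.data);
    }

    process(inputs, outputs) {
      const channel = outputs[0][0];
      if (channel) this.buffer.read(channel);
      return true; // keep the processor alive
    }
  }
  registerProcessor('call-audio-processor', CallAudioProcessor);
}
```

On the page, a module like this would be loaded with `audioContext.audioWorklet.addModule(...)`, instantiated with `new AudioWorkletNode(audioContext, 'call-audio-processor')`, and fed by calling `node.port.postMessage(chunk)` for each chunk received over the WebSocket. Overwriting the oldest samples when the buffer fills is a deliberate trade-off in this sketch: it favours low latency over completeness.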
Development centred on reliably streaming and processing live call audio with minimal latency. The first step was setting up a Node.js backend to manage WebSocket connections and handle VAPI’s call control API. The main point of friction was configuring the audio sample rate (in Hz) so that the streamed audio reproduced the source 1:1.
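The sample-rate friction can be illustrated with a short sketch. If the stream's sample rate differs from the rate the browser's audio context runs at, playback typically comes out pitch-shifted or at the wrong speed, so the samples need converting first. The code below assumes the call audio arrives as 16-bit signed mono PCM in an `Int16Array`. That format, the function names, and the example rates are assumptions for illustration, not details from the project or from VAPI's documentation.

```javascript
// Hypothetical helpers: the input format and both names are assumptions.

// Convert 16-bit signed PCM samples to the [-1, 1) floats Web Audio uses.
function pcm16ToFloat32(pcm) {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}

// Resample with linear interpolation. This is simple and fast but does no
// anti-alias filtering, so it is a sketch rather than a production resampler.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] + (input[i1] - input[i0]) * frac;
  }
  return out;
}

// Example: a 4-sample chunk at an assumed 8000 Hz, converted for a
// 16000 Hz output.
const exampleChunk = new Int16Array([0, 16384, -32768, 0]);
const examplePlayable = resampleLinear(pcm16ToFloat32(exampleChunk), 8000, 16000);
```

In practice the target rate would be read from the audio context's `sampleRate` property rather than hard-coded, and the source rate would come from whatever the audio stream actually delivers.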
The concept was intended for educational and internal use only. Monitoring or recording calls without the explicit or implied consent of both parties is illegal in many jurisdictions. The project therefore highlighted something more alarming than "call control" and "monitoring": real, nefarious use cases in which this framework could be put to illegal use.