Real-Time Voice Systems: Design and Architecture in 5 Levels

Voice systems have advanced rapidly in recent years, but most implementations still stop at demos: simple combinations of Speech-to-Text, language models, and Text-to-Speech that work in controlled environments but fail when facing real-world conditions. This talk proposes a different approach: understanding voice systems as an architecture that evolves through maturity levels, from basic prototypes to real-time production-ready systems. Through a 5-level framework, we'll walk the full path of a Conversational AI system: from integrating basic components, through orchestration challenges (streaming, latency, turn-taking), to less obvious but critical problems like audio quality, robustness, and user experience, reaching real-time architectures with technologies like LiveKit, and finally exploring where the future is headed with end-to-end systems and multimodal agents. The talk is based on real experience building voice systems in production and focuses on engineering decisions more than specific tools. Attendees will leave with a clear understanding of how to design modern voice systems with Python, what problems to anticipate, and how to structure their own architectures to build world-class conversational experiences.

Want to know more?

Join PyCon Colombia newsletter and get a complete overview of our events, speakers and community participation.