two people using the SIDR system. The real time visualization in the background screen shows a pie chart of the ration of speaking time of each participant
two people using the SIDR system


Deep Learning-Based Real-Time Speaker Identification

Consider each of our individual voices as a flashlight to illuminate how we project ourselves in society and how much sonic space we give ourselves or others. Thus, turn-taking computation through speaker recognition systems has been used as a tool to understand social situations or work meetings. We present SIDR, a deep learning-based, real-time speaker recognition system designed to be used in real-world settings. The system is resilient to noise, and adapts to room acoustics, different languages, and overlapping dialogues. While existing systems require the use of several microphones for each speaker or the need to couple video and sound recordings for accurate recognition of a speaker, SIDR only requires a medium-quality microphone or computer-embedded microphone.


Clément Duhart



Spring 2017