Robust End-to-end Speaker Diarization with Conformer and Additive Margin Penalty

April 25, 2023
Research

Traditionally, a speaker diarization system has multiple components to extract and cluster speaker embeddings. However, end-to-end diarization is more desirable because it allows a single model to be optimized, in contrast to the multiple components of a traditional setup. Moreover, end-to-end diarization systems are capable of handling overlapped speech. The recently proposed self-attentive end-to-end diarization model with encoder-decoder based attractors (EEND-EDA) can process speech from an unknown number of speakers and has reported performance comparable to traditional systems. In this work, we aim to improve the EEND-EDA model. First, we increase the robustness of the model by incorporating an additive margin penalty to minimize the intra-class variance. Second, we propose to replace the Transformer encoders with Conformer encoders to capture local information. Third, we propose to use convolutional subsampling and upsampling instead of manual subsampling only. Our proposed improvements yield a 21.6% relative reduction in diarization error rate (DER) on the full evaluation set of track 2 of the DIHARD III challenge.
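The abstract does not spell out how the additive margin penalty enters the training loss. As a rough illustration only, the sketch below assumes per-frame, per-speaker activity logits trained with binary cross-entropy, and shifts each logit against its own label by a fixed margin before the loss is computed, in the spirit of additive-margin softmax. The function names (`margin_logit`, `margin_bce`) and the default margin value of 0.35 are hypothetical choices for this example and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(z, y):
    """Binary cross-entropy for one logit z and one label y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def margin_logit(z, y, m):
    """Shift the logit against its label by an additive margin m.

    Assumption for this sketch: active frames (y == 1) have m subtracted,
    inactive frames (y == 0) have m added, so the model must clear the
    decision boundary by at least m to reach the same loss.
    """
    return z - m if y == 1 else z + m


def margin_bce(logits, labels, m=0.35):
    """Mean binary cross-entropy over margin-penalized logits."""
    total = sum(bce(margin_logit(z, y, m), y) for z, y in zip(logits, labels))
    return total / len(logits)


# Hypothetical logits and speaker-activity labels for three frames.
logits = [2.0, -1.5, 0.3]
labels = [1, 0, 1]
plain_loss = margin_bce(logits, labels, m=0.0)
penalized_loss = margin_bce(logits, labels, m=0.35)
```

With a margin of zero the function reduces to ordinary binary cross-entropy. With a positive margin every example incurs a larger loss for the same logit, which pushes same-class activations further from the decision boundary during training. The margin would only be applied at training time; inference would use the unshifted logits.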