Incorporating Prior Knowledge Into Speaker Diarization and Linking for Identifying Common Speaker

October 5, 2022

Speaker diarization and linking discovers “who spoke when” across recordings without any speaker enrollment. Diarization is performed on each recording separately, and linking then combines the clusters that belong to the same speaker across recordings. This two-step approach, however, propagates errors from the diarization step to the linking step. In situations where a single speaker is known to appear across a given set of recordings, this paper aims to locate that common speaker using only the prior knowledge of his or her existence, without any enrollment data for the speaker. We propose a Pairwise Common Speaker Identification (PCSI) method that, in contrast to the two-step approach, takes the existence of a common speaker into account. We further show that PCSI can be used to reduce the errors introduced in the diarization step of the two-step approach. Our experiments are performed on a corpus synthesised from the AMI corpus and on an in-house conversational telephony corpus of Sichuanese mixed with Mandarin. We show up to 7.68% relative improvement in time-weighted equal error rate over a state-of-the-art x-vector diarization and linking system.
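To make the role of the prior concrete, the sketch below illustrates one simple way a "there is exactly one common speaker" assumption could be used when linking two already-diarized recordings: instead of thresholding similarities, it picks the single most similar pair of clusters. This is not the PCSI method from the paper, whose details are not given here. The function names, the centroid-plus-cosine scoring, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def most_similar_cluster_pair(clusters_a, clusters_b):
    """Return (label_a, label_b, score) for the best cross-recording match.

    Each argument maps a diarization cluster label to its list of speaker
    embeddings. Hypothetical helper: it assumes exactly one speaker is
    shared by the two recordings, so it takes the argmax over all pairs
    rather than applying a similarity threshold.
    """
    best = None
    for label_a, vecs_a in clusters_a.items():
        centre_a = centroid(vecs_a)
        for label_b, vecs_b in clusters_b.items():
            score = cosine(centre_a, centroid(vecs_b))
            if best is None or score > best[2]:
                best = (label_a, label_b, score)
    return best


# Toy embeddings (illustrative values, not real x-vectors).
rec1 = {"spk0": [[1.0, 0.1], [0.9, 0.0]], "spk1": [[0.0, 1.0]]}
rec2 = {"spkA": [[-1.0, 0.2]], "spkB": [[0.95, 0.05]]}
pair = most_similar_cluster_pair(rec1, rec2)
print(pair[0], pair[1])
```

In this toy case the clusters `spk0` and `spkB` are selected as the common speaker. Note that the sketch still depends on the per-recording clusters being correct, which is exactly the error propagation the abstract describes.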

Dr. Albert Lam

Chief Scientist & CTO

B.Eng. (2005), Ph.D. (2010), HKU. Senior Member of IEEE. Croucher research fellow. Adjunct Assistant Professor in EEE, HKU. Post-doc, UC Berkeley. Research Assistant Professor, HKBU and HKU.
