Subspace Based Sequence Discriminative Training of LSTM Acoustic Models with Feed-Forward Layers

Dr. Albert Lam
October 5, 2022
Research

State-of-the-art automatic speech recognition (ASR) systems use sequence discriminative training for improved performance over the frame-level cross-entropy (CE) criterion. Even though sequence discriminative training improves long short-term memory (LSTM) recurrent neural network (RNN) acoustic models (AMs), it is not clear whether these systems achieve optimal performance, due to overfitting. This paper investigates the effect of state-level minimum Bayes risk (sMBR) training on LSTM AMs and shows that the conventional way of performing sMBR, updating all LSTM parameters, is not optimal. We investigate two methods to improve the performance of sequence discriminative training of LSTM AMs. First, additional feed-forward (FF) layers are included between the last LSTM layer and the output layer, so that these FF layers may benefit more from sMBR training. Second, a subspace is estimated as an interpolation of rank-1 matrices when performing sMBR for the LSTM layers of the AM. Our methods are evaluated on the benchmark AMI single distant microphone (SDM) task. We find that the proposed approaches provide a 1.6% absolute improvement over a strong sMBR-trained LSTM baseline.
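To make the two ideas in the abstract concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' implementation): an LSTM acoustic model with extra FF layers before the output, and a layer whose weight update is restricted to an interpolation of rank-1 matrices. All class names, layer sizes, and the number of rank-1 terms K are illustrative assumptions, and for brevity the subspace idea is shown on a linear layer rather than on the LSTM weight matrices treated in the paper.

```python
# Illustrative sketch only; names, sizes and K are assumptions, not the paper's setup.
import torch
import torch.nn as nn


class LSTMAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, n_lstm=3, n_ff=2, n_states=4000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=n_lstm, batch_first=True)
        # Additional FF layers between the last LSTM layer and the output layer.
        ff = []
        for _ in range(n_ff):
            ff += [nn.Linear(hidden, hidden), nn.ReLU()]
        self.ff = nn.Sequential(*ff)
        self.out = nn.Linear(hidden, n_states)

    def forward(self, x):
        h, _ = self.lstm(x)          # (batch, time, hidden)
        return self.out(self.ff(h))  # per-frame HMM-state logits


class Rank1SubspaceLinear(nn.Module):
    """Keeps a frozen weight W0 and learns a correction sum_k a_k * u_k v_k^T,
    i.e. an interpolation of rank-1 matrices, instead of updating W directly."""
    def __init__(self, base: nn.Linear, k=8):
        super().__init__()
        out_dim, in_dim = base.weight.shape
        self.register_buffer("W0", base.weight.detach().clone())
        self.bias = nn.Parameter(base.bias.detach().clone())
        self.u = nn.Parameter(torch.randn(k, out_dim) * 0.01)  # rank-1 left factors
        self.v = nn.Parameter(torch.randn(k, in_dim) * 0.01)   # rank-1 right factors
        self.a = nn.Parameter(torch.zeros(k))                   # interpolation weights

    def forward(self, x):
        # W = W0 + sum_k a_k * outer(u_k, v_k)
        delta = torch.einsum("k,ko,ki->oi", self.a, self.u, self.v)
        return nn.functional.linear(x, self.W0 + delta, self.bias)
```

In this sketch, sequence training would freeze W0 and update only u, v, a (together with the added FF layers), which captures the spirit of restricting the sMBR update to a small subspace rather than moving all LSTM parameters.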


B.Eng. (2005), Ph.D. (2010), HKU. Senior Member of the IEEE. Croucher Research Fellow. Adjunct Assistant Professor in EEE, HKU. Post-doc, UC Berkeley. Research Assistant Professor, HKBU and HKU.