Conformer-based Speech Recognition with Linear Nyström Attention and Rotary Position Embedding

April 25, 2023
Research

Self-attention has become an important component of end-to-end (E2E) automatic speech recognition (ASR). Recently, the Convolution-augmented Transformer (Conformer) with relative positional encoding (RPE) achieved state-of-the-art performance. However, the computational and memory complexity of self-attention grows quadratically with the input sequence length, and the effect can be significant for the Conformer encoder when it processes longer sequences. In this work, we propose to replace self-attention with a linear-complexity Nyström attention, which is a low-rank approximation of the attention scores based on the Nyström method. In addition, we propose to use Rotary Position Embedding (RoPE) with Nyström attention, since RPE is of quadratic complexity. Moreover, we show that models can be made even lighter by removing self-attention sub-layers from the top encoder layers without any drop in performance. Furthermore, we demonstrate that the convolutional sub-layers in Conformer can effectively recover the information lost due to the Nyström approximation.
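
To make the low-rank idea concrete, the following is a minimal single-head NumPy sketch of Nyström attention. It is an illustration under stated assumptions, not the implementation used in this work: the landmarks are taken as segment means of the queries and keys (as in the Nyströmformer formulation), the pseudo-inverse is computed exactly with `np.linalg.pinv`, and the function names `segment_means`, `nystrom_attention`, and `full_attention` are hypothetical. RoPE and the Conformer convolution sub-layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def segment_means(x, num_landmarks):
    # Assumption: landmarks are means over equal-length, contiguous segments.
    n, d = x.shape
    if n % num_landmarks != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    return x.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)


def full_attention(q, k, v):
    # Standard softmax attention: builds an (n, n) score matrix.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v


def nystrom_attention(q, k, v, num_landmarks):
    # Approximates softmax(Q K^T / sqrt(d)) V with three small softmax
    # matrices, so no (n, n) matrix is ever formed.
    scale = 1.0 / np.sqrt(q.shape[-1])
    q_l = segment_means(q, num_landmarks)   # (m, d) landmark queries
    k_l = segment_means(k, num_landmarks)   # (m, d) landmark keys
    f = softmax(q @ k_l.T * scale)          # (n, m)
    a = softmax(q_l @ k_l.T * scale)        # (m, m)
    b = softmax(q_l @ k.T * scale)          # (m, n)
    # Multiply right-to-left so every intermediate has at most n * m entries.
    return f @ (np.linalg.pinv(a) @ (b @ v))


rng = np.random.default_rng(0)
n, d, m = 16, 4, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
approx = nystrom_attention(q, k, v, num_landmarks=m)
exact = full_attention(q, k, v)
print(approx.shape, float(np.abs(approx - exact).max()))
```

With `m` landmarks and sequence length `n`, the cost is linear in `n` for fixed `m`. In the limiting case `m = n` each segment holds a single frame, the three matrices coincide with the full attention matrix, and the approximation reduces to full attention up to numerical error.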