MODERN TRENDS IN THE DEVELOPMENT OF SPEECH RECOGNITION SYSTEMS
Keywords:
automatic speech recognition, hidden Markov models, end-to-end, neural networks, CTC.
Abstract
This article presents the main ideas, advantages, and disadvantages of models based on hidden
Markov models with Gaussian mixture models (HMM-GMM) and of end-to-end models, and indicates that end-to-end
modeling is a developing area in the field of speech recognition. A review of studies conducted in this subject area
shows that end-to-end speech recognition systems can achieve results comparable to those of standard systems
using hidden Markov models, while using a simpler configuration and running faster both
in training and in decoding. An analytical review of the varieties of end-to-end systems for automatic speech
recognition is presented, namely models based on connectionist temporal classification (CTC), attention
mechanisms, and conditional random fields (CRF), and theoretical comparisons are made. Finally, their respective
advantages and disadvantages and the possible future development of these systems are indicated.