Abstract Multimodal speech recognition has proved to be one of the most promising approaches to robust speech recognition, especially when the acoustic signal is corrupted by noise. Because the visual signal is unaffected by acoustic noise, it provides additional information that can enhance recognition accuracy in noisy environments. When the SNR of the acoustic signal is low, visual cues can compensate for the degraded audio and thus significantly improve recognition accuracy. A critical stage in designing a robust speech recognition system is choosing a reliable classification method from the large variety of existing classification techniques. This research introduces an Audio-Visual Speech Recognition (AVSR) model that uses both the audio and the visual speech modality to improve recognition accuracy in clean and noisy environments.