Author: Kabel, Samia Abd EL Moneim. / Title: Utilization of Deep Learning Techniques for Speech Signal Analysis

Title

Utilization of Deep Learning
Techniques for Speech Signal Analysis /

Author

Kabel, Samia Abd EL Moneim.

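Preparation committee

Researcher / Samia Abd El Moneim Omar Kabel (سامية عبد المنعم عمر قابل)

Supervisor / Mohamed Mohamed Abd El Salam Nassar (محمد محمد عبد السلام نصار)

Examiner / Ashraf Abd El Moneim Khalaf (أشرف عبد المنعم خلف)

Examiner / Moawad Ibrahim Dessouky (معوض ابراهيم دسوقي)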

Subject

Speech processing systems. Signal analysis. Learning.

Publication date

2020

Number of pages

110 p.

Language

English

Degree

Doctorate

Specialization

Electrical and Electronic Engineering

Approval date

13/12/2020

Place of approval

Menoufia University - Faculty of Electronic Engineering - Department of Electronics and Electrical Communications Engineering

Index

Only 14 pages are available for public view

Abstract

This thesis is mainly concerned with text-independent Speaker Recognition
(SR). Generally, Automatic Speaker Recognition (ASR) systems can be
classified into two main categories: text-dependent SR and text-independent SR.
In text-dependent SR, all speakers are required to use the same sentence in both
the training and testing phases. In text-independent SR, on the other hand,
speakers are free to use any sentences in the training and testing phases. The SR
process in general depends on the extraction of features from the speech signals.
The text-independent SR task is harder to implement than the text-dependent SR
task. Two approaches for text-independent SR are proposed in this thesis.
The first proposal extracts features from the speech signals and uses a Long
Short-Term Memory Recurrent Neural Network (LSTM-RNN) to identify the
speakers. The utilized features are Mel Frequency Cepstral Coefficients (MFCCs),
spectrum magnitude bins, and log spectrum magnitude bins. The second proposal
depends on the generation of spectrogram images from the speech signal patches.
These spectrogram images are utilized in the classification process with a
Convolutional Neural Network (CNN).
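
As a rough illustration of the spectrum-based features and spectrogram patches mentioned above, the following Python sketch splits a signal into patches and computes magnitude and log-magnitude spectrograms with NumPy. The frame length, hop size, patch length, sampling rate, and function names are illustrative assumptions and are not taken from the thesis; a synthetic tone stands in for speech.

```python
import numpy as np

def split_signal(signal, patch_len):
    """Cut a 1-D signal into non-overlapping patches (assumed patching scheme)."""
    num_patches = len(signal) // patch_len
    return [signal[p * patch_len:(p + 1) * patch_len] for p in range(num_patches)]

def magnitude_spectrogram(patch, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) array of spectrum magnitude bins."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(patch) - frame_len) // hop
    frames = np.stack([
        patch[i * hop:i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def log_spectrogram(mag, eps=1e-10):
    """Log spectrum magnitude bins; eps avoids log(0)."""
    return np.log(mag + eps)

# Synthetic 1 s, 440 Hz tone at an assumed 8 kHz sampling rate (placeholder for speech).
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)

patches = split_signal(signal, patch_len=2000)
spectrograms = [magnitude_spectrogram(p) for p in patches]
log_specs = [log_spectrogram(m) for m in spectrograms]
```

Each element of `log_specs` is a two-dimensional array that could be rendered as an image and passed to a CNN, or fed frame by frame to an LSTM-RNN.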
Reverberation is a severe effect that occurs in closed rooms. A speech
classification system is proposed to classify speech signals as reverberant or
non-reverberant using the LSTM-RNN and the CNN. The effects of noise,
reverberation, and interference are considered in this study. Moreover, speech
enhancement techniques such as spectral subtraction and wavelet denoising are
considered in this thesis to enhance the performance of the SR process. These
enhancement methods are used as pre-processing steps prior to the ASR system.
In addition, the Radon Transform (RT) is used for better representation of speech
signals in the presence of noise, as it is robust to the noise effect. The Radon
projection of the spectrogram of the speech signal is obtained at different
orientations (angles), and a Discrete Cosine Transform (DCT) is then applied to
the Radon projection. The performance of the ASR system with Radon features is
compared to that with MFCCs and spectrum features. Also, the effect of
interference on the ASR system is studied. The interference effect is cancelled
with a signal separation algorithm that is used as a pre-processing step prior to
the ASR system to boost its performance. For pattern security of the SR system,
cancellable SR is presented in this thesis with an approach that depends on
spectrogram patch selection based on a user-specific key. The cancellable pattern
is used to protect user privacy and increase security.
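
To make the idea of enhancement as a pre-processing step more concrete, here is a minimal Python sketch of basic magnitude spectral subtraction. It is a generic textbook variant, not necessarily the one used in the thesis. It assumes that the first few frames contain noise only, and the frame length, spectral floor, and function name are illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=10, floor=0.01):
    """Basic magnitude spectral subtraction with 50% overlap-add.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    hop = frame_len // 2
    # Periodic Hann window: overlapping copies at hop = frame_len / 2 sum to 1.
    window = np.hanning(frame_len + 1)[:-1]
    num_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([
        noisy[i * hop:i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Average noise magnitude per frequency bin, estimated from the leading frames.
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the estimate and keep a small spectral floor to avoid negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesize with the noisy phase and overlap-add the frames.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    enhanced = np.zeros(len(noisy))
    for i in range(num_frames):
        enhanced[i * hop:i * hop + frame_len] += clean_frames[i]
    return enhanced

# Synthetic example: silence followed by a tone, plus white noise (placeholder for speech).
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
clean[:2000] = 0.0
noisy = clean + 0.1 * rng.standard_normal(fs)
enhanced = spectral_subtraction(noisy)
```

In a pipeline of the kind described, `enhanced` would replace `noisy` as the input to feature extraction. The first and last half-frames are tapered by the window in this simplified version.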
Simulation results demonstrate the high efficiency of the proposed approaches for
text-independent SR with the enhancement methods, Radon-based features, and
blind signal separation. Also, the results reveal that the suggested cancellable
approach is practical and satisfies the desired criteria of renewability and
security (the template can be changed if it is compromised) and high
performance (close to the performance of the system with the original
template).
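
The renewability property can be illustrated with a small Python sketch of key-based patch selection. The abstract does not specify how the user-specific key drives the selection, so the scheme below (hashing the key into a seed for a pseudo-random choice of patch indices) and the names `select_patches` and `cancellable_template` are hypothetical.

```python
import hashlib
import random

def select_patches(num_patches, num_selected, user_key):
    """Pick patch indices deterministically from a user-specific key (assumed scheme)."""
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(range(num_patches), num_selected)

def cancellable_template(patches, num_selected, user_key):
    """Build a template from the key-selected subset of spectrogram patches."""
    indices = select_patches(len(patches), num_selected, user_key)
    return [patches[i] for i in indices]

# Placeholder patches: integers stand in for spectrogram patch arrays.
patches = list(range(64))
template_a = cancellable_template(patches, 16, "key-A")
template_a_again = cancellable_template(patches, 16, "key-A")
template_b = cancellable_template(patches, 16, "key-B")
```

The same key always reproduces the same template, while issuing a new key yields a different selection. This is the sense in which a compromised template can be replaced.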