Author: El-Manfaloty, Rania Abdou Gaber. / Title: Investigation about Employing GMM for Voice Conversion Techniques for Arabic Spoken Words

Title

Investigation about Employing GMM for Voice Conversion Techniques for Arabic Spoken Words

Author

El-Manfaloty, Rania Abdou Gaber.

Preparation committee

Researcher / Rania Abdou Gaber El-Manfaloty

rania-elmanfaloty@yahoo.com

Supervisor / El-Sayed Ahmed Youssef

Supervisor / Noha Othman Korany

nokorany@hotmail.com

Supervisor / Mona Hamed Lotfy

Examiner / Said El-Sayed El-Khamy

Examiner / El-Sayed Mahmoud El-Rabiee

Subject

Voice Conversion - Techniques.

Publication date

2013.

Number of pages

108 p.

Language

English

Degree

Doctorate

Specialization

Electrical and Electronic Engineering

Approval date

1/12/2012

Place of approval

Alexandria University - Faculty of Engineering - Electrical Engineering

Abstract

This thesis employs a Gaussian Mixture Model (GMM) for voice conversion of Arabic spoken words and compares it with a second technique, Pitch Synchronous Overlap Add (PSOLA) with resampling, which depends on pitch shifting. It also proposes the use of two compression techniques to compress the residual, which otherwise requires a large amount of storage.

The first technique is based on transforming the spectral envelope, which is represented by line spectral frequency (LSF) coefficients. The transformation function is implemented using a joint density Gaussian Mixture Model trained on aligned LSFs. Several residual prediction techniques (copying source residuals, copying reference residuals, and residual selection) are used to predict the target LPC residuals. The first technique is also implemented using MFCC instead of LSF.

The second technique is PSOLA and resampling. It performs pitch shifting using time-domain PSOLA and then resamples the signal to return it to its original length.

The two techniques are investigated for some Arabic spoken words that contain the three vowels (a, e, o), and subjective and objective evaluations are then used to evaluate and compare them. These evaluations show that the first technique, using LSF features with residual selection or using MFCC, gives better results than the second technique.

The residual selection method in the first technique requires a large amount of storage, so the Multi-pulse Excitation Model and the Wavelet Transform are used to compress the residual before storing it. This achieves a space saving of between 73% and 89% with good quality for the transformed Arabic word.

This thesis also proposes a new technique for voice conversion between genders called Dynamic Pitch Shifting (DPS). The proposed technique aims to minimize storage in the voice conversion system by eliminating the need to save the target residual signal; only the pitch mark positions or the pitch periods of the target signal are saved.
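
The abstract's joint density GMM transformation function can be illustrated with a minimal sketch. It uses the conversion rule commonly associated with joint density GMMs, a posterior-weighted sum of per-component linear regressions, reduced to scalar features for readability. This is an illustration under stated assumptions, not the thesis's implementation: the names `gaussian_pdf` and `convert`, the dictionary keys, and all parameter values are hypothetical, and real LSF or MFCC features would be vectors with full covariance matrices.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a scalar source feature x to a predicted target feature.

    Each component is a dict holding hypothetical parameters of a joint
    (source, target) Gaussian: weight, mu_x, mu_y, var_x, cov_yx.
    """
    # Weighted likelihood of x under the source marginal of each component.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        # Conditional mean of the target given x within this component.
        y += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y


# Illustrative (made-up) two-component model.
model = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_yx": 0.0},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 1.0, "cov_yx": 0.0},
]
converted = convert(0.0, model)
```

In a full system the component parameters would be estimated from time-aligned source and target feature pairs, and the same rule would be applied frame by frame with vector means and covariance matrices in place of the scalars used here.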