Only 14 pages are available for public view |
Abstract This thesis employs a Gaussian Mixture Model (GMM) for voice conversion of Arabic spoken words and compares it with another technique, based on pitch shifting, that combines PSOLA and resampling. It also proposes two compression techniques for the residual, which otherwise requires a large amount of storage. The first technique transforms the spectral envelope, which is represented by Line Spectral Frequency (LSF) coefficients. The transformation function is implemented using a joint-density Gaussian Mixture Model that is trained on aligned LSF vectors. Several residual prediction techniques (copying the source residual, copying a reference residual, and residual selection) are used to predict the target LPC residuals. The first technique is also implemented using MFCC features instead of LSF. The second technique is Pitch Synchronous Overlap Add (PSOLA) with resampling: it shifts the pitch using time-domain PSOLA and then resamples the signal to return it to its original length. The two techniques are investigated on Arabic spoken words that contain the three vowels (a, e, o), and subjective and objective evaluations are used to compare them. These evaluations show that the first technique, using LSF features with residual selection, or MFCC features, gives better results than the second technique. Because the residual selection method in the first technique requires a large amount of storage, the Multi-pulse Excitation Model and the Wavelet Transform are used to compress the residual before storing it. This achieves a storage saving of between 73% and 89% with good quality for the transformed Arabic words. This thesis also proposes a new technique for voice conversion between genders called Dynamic Pitch Shifting (DPS). 
The proposed technique aims to minimize storage in the voice conversion system by eliminating the need to save the target residual signal; only the pitch mark positions or the pitch periods of the target signal are saved. |
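
To make the joint-density GMM transformation mentioned in the abstract more concrete, the following is a minimal sketch of the standard conversion function: a posterior-weighted sum of per-component linear regressions from source to target features. It is not taken from the thesis. It assumes scalar features for brevity (real LSF or MFCC vectors need mean vectors and covariance matrices), and the function names and dictionary keys are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a source feature x to the target space with a joint-density GMM.

    Each component is a dict with these (hypothetical) keys:
      weight  mixture weight
      mean_x  source mean
      mean_y  target mean
      var_xx  source variance
      cov_yx  covariance between target and source
    """
    # Posterior probability of each component given the source feature only.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_xx"]) for c in components
    ]
    total = sum(likelihoods)  # assumed non-zero, i.e. x is not far from every component
    posteriors = [lik / total for lik in likelihoods]

    # Posterior-weighted sum of the per-component linear regressions.
    return sum(
        p * (c["mean_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mean_x"]))
        for p, c in zip(posteriors, components)
    )
```

With a single component the function reduces to an ordinary linear regression of the target feature on the source feature. With several components, the posterior weights blend the local regressions smoothly across the feature space.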