Author: Rabie, Mai Ezzel-din./ Title: Enhanced Best Tree Encoding (BTE) Model using adapted wavelet filter /


Title

Enhanced Best Tree Encoding (BTE) Model using adapted wavelet filter /

Author

Rabie, Mai Ezzel-din.

Committee

Researcher / Mai Ezzel-din Rabie Abdel-Fattah

Supervisor / Amr Mohamed Refaat Gody

Supervisor / Rania Ahmed Abdel-Azeem Abul Seoud

Examiner / Galal Ezzel-din Nadim

Examiner / Ahmed Hassan Madian

Subject

BTE. Telecommunications. Electronics.

Publication date

2015.

Number of pages

160 p.

Language

English

Degree

Master's

Specialization

Electrical and Electronic Engineering

Date of approval

17/6/2015

Place of approval

Fayoum University - Faculty of Engineering - Department of Electrical Engineering


Abstract

Manually time-aligning a speech waveform with its corresponding phonetic sequence is a time-consuming task. This research introduces a fully automated, speaker-independent recognition system for isolated English phones, based on the Wavelet Packets Best Tree Encoding (WPBTE) feature. WPBTE is used to locate phoneme boundaries along a speech utterance. A comparison with the Mel-Frequency Cepstral Coefficients (MFCC) speech feature on the same problem is provided. Hidden Markov Models (HMMs) and Gaussian mixtures are used to build the statistical models throughout this research, and the HTK software toolkit is used to implement the model.
Best Tree Encoding (BTE) is a new algorithm for the Automatic Speech Recognition (ASR) problem. BTE essentially acts as a spectrum analyzer: it relies on wavelet packets to project the signal power onto predefined filter banks. The feature components are then encoded into digital form using a particular entropy method and a particular digital encoding procedure. In this research, BTE is developed further by adding two key factors to the BTE process: the Mel scale (MS) and baseband bandwidth mapping (BM).
This research provides a baseline performance evaluation for vocabulary-independent phone recognition (without grammar) of English using the Vid-TIMIT database. Vid-TIMIT consists of 43 speakers (19 female and 24 male) reciting short sentences. The database was recorded in a noisy environment (mostly computer fan noise) and is not hand-verified. A total of 15643 phone segments are used for testing and evaluating the newly proposed features. HMM is used as the recognition engine via the HTK toolkit because of its popularity in ASR. The results are evaluated by comparison with MFCC on the same database. Although the proposed model gives the same recognition efficiency as MFCC on the same testing database, it saves almost 70% of the storage required for the MFCC feature vector.
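As a rough illustration of the spectrum-analyzer idea in the abstract (wavelet packets projecting signal power onto a bank of sub-bands), the following minimal Python sketch computes per-band energies from a full wavelet packet tree. It is not the thesis's BTE implementation: the Haar filter, the tree depth, and the function names `haar_split`, `wavelet_packet_leaves`, and `band_energies` are all assumptions chosen for brevity, and the entropy-based best-tree selection, the digital encoding, and the Mel-scale and bandwidth mapping steps are not shown.

```python
import math


def haar_split(x):
    """One Haar analysis step (assumed filter): return (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(x, depth):
    """Full wavelet packet tree: every node is split, not only the approximation.

    Leaves are returned in natural (filter-path) order, which is not the same
    as increasing-frequency order.
    """
    if len(x) % (2 ** depth) != 0:
        raise ValueError("frame length must be divisible by 2**depth")
    nodes = [list(x)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def band_energies(x, depth):
    """Energy (sum of squared coefficients) in each of the 2**depth leaf bands."""
    return [sum(c * c for c in leaf) for leaf in wavelet_packet_leaves(x, depth)]


if __name__ == "__main__":
    frame = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical 8-sample frame
    print(band_energies(frame, depth=2))
```

Because the Haar transform is orthonormal, the band energies add up to the energy of the input frame, so the vector can be read as a distribution of signal power over sub-bands. A practical system would presumably use a longer wavelet filter and frames of real speech.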