Author: Salama, Heba Salama Abdo./ Title: Building A Spoken Arabic Corpus for Egyptian Children :


Title

Building A Spoken Arabic Corpus for Egyptian Children :

Author

Salama, Heba Salama Abdo.

Thesis committee

Researcher / هبة سلامة عبده سلامة

Supervisor / سامح سعد أبو المجد الأنصارى

Examiner / هناء عبد الفتاح سالم

Examiner / الحسين على يحيى

Subject

Phonology. Linguistics. English Language -- Usage.

Publication date

2015.

Number of pages

115 p.

Language

English

Degree

Master's

Specialization

Acoustics and Ultrasonics

Date of award

1/1/2015

Awarding institution

Alexandria University - Faculty of Arts - Phonetics


Abstract

The thesis has three aims. The first is to build a spoken Arabic corpus for Egyptian children. The second is to provide an annotation scheme for the transcribed data. The third is to use the CLAN program for the linguistic analysis of child transcripts.

The proposed corpus is a collection of longitudinal child language data based on spontaneous conversations. The corpus text files were transcribed from 10 children (5 boys and 5 girls) aged from 1;6 (one year, six months) to 4 years, drawing on about 5 hours of recordings (6 GB) and 330 hours of transcription work. The recordings provide a large amount of useful data for linguistic, psychological, and acoustic research. Audio data is stored in WAV format. Noise is removed from the WAV files using Audacity, and the files are resampled to 48 kHz with Cool Edit to reduce their size so that they can be run in CLAN. Broad phonemic transcription is done manually using CHILDES (Child Language Data Exchange System) Unicode, IPA symbols, and CHAT-format transcription codes. Annotation is done manually for 2,701 words from only one child. The corpus contains nearly 25,645 utterances, based on the audio files of the five boys and five girls.

Linguistic annotation of the corpus allows better exploration of the development of grammatical constructions and their usage. At the end of the thesis, applications of several linguistic analysis commands are presented. The analyses include frequency counts, word searches, co-occurrence analyses, MLU (mean length of utterance) counts, and analyses of specified pairs of utterances. The thesis demonstrates how easily the database can be accessed for many research purposes.
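The abstract mentions frequency counts and MLU counts, which the thesis runs with CLAN commands. As a rough illustration only, the Python sketch below reads a few CHAT-style main-tier lines (such as `*CHI:`) and computes word frequencies and a word-based MLU for one speaker. The sample utterances (invented romanized Egyptian Arabic, not corpus data), the helper names `speaker_utterances`, `word_frequencies` and `mlu_in_words`, and the simplified handling of CHAT codes are all hypothetical. This is not CLAN's FREQ or MLU command, and it counts words, whereas MLU is conventionally measured in morphemes.

```python
import re
from collections import Counter

# Hypothetical CHAT-style sample: main tiers start with "*SPEAKER:" and a tab.
# The utterances are invented placeholders, not data from the corpus.
SAMPLE = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\tfeen el-kora ?
*CHI:\tel-kora hena .
*CHI:\tmama kora .
*CHI:\tkora .
@End
"""

# Utterance terminators are excluded from word counts (simplified assumption).
TERMINATORS = {".", "?", "!"}


def speaker_utterances(chat_text, speaker="CHI"):
    """Yield a list of word tokens for each main-tier line of `speaker`."""
    prefix = f"*{speaker}:"
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            body = line[len(prefix):].strip()
            # Drop bracketed CHAT codes such as [/] or [= ...] (simplified).
            body = re.sub(r"\[[^\]]*\]", " ", body)
            tokens = [t for t in body.split() if t not in TERMINATORS]
            if tokens:
                yield tokens


def word_frequencies(utterances):
    """Count how often each word token occurs across all utterances."""
    counts = Counter()
    for utt in utterances:
        counts.update(utt)
    return counts


def mlu_in_words(utterances):
    """Mean number of words per utterance (0.0 if there are no utterances)."""
    utts = list(utterances)
    if not utts:
        return 0.0
    return sum(len(u) for u in utts) / len(utts)


if __name__ == "__main__":
    child = list(speaker_utterances(SAMPLE, "CHI"))
    print(word_frequencies(child))
    print(round(mlu_in_words(child), 2))
```

In this toy sample the child produces 5 words across 3 utterances, so the word-based MLU is 5/3. Keeping parsing, counting, and averaging in separate functions mirrors how frequency and MLU analyses are usually run as distinct passes over the same transcript.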