Title
Towards Machine Comprehension for Arabic Text
Author
Eid, Ahmad Magdy Ahmad Mahmoud.
Preparation committee
Researcher / احمد مجدى احمد محمود عيد
ahmed090742@alex-eng.edu.eg
Supervisor / نجوى مصطفى المكى
nagwamakky@gmail.com
Supervisor / خالد مجدى ناجى
knagi@alex.edu.eg
Examiner / محمد عبد الحميد اسماعيل
drmaismail@gmail.com
Examiner / صالح عبد الشكور الشهابى
Subject
Computer Engineering.
Publication date
2019.
Number of pages
50 p.
Language
English
Degree
Master's
Specialization
Engineering (miscellaneous)
Approval date
1/12/2019
Place of approval
Alexandria University - Faculty of Engineering - Computer and Systems Engineering

Abstract

Machine Comprehension (MC) is a relatively new area within the question answering (QA) discipline. It is an AI-complete task that requires a QA system to process a piece of text, comprehend it, and extract the span of text that answers the user's query. Much recent research on English builds end-to-end deep learning models, based on neural networks, that directly compute the deep semantic matching among questions, answers, and the corresponding passages. Deep learning gives state-of-the-art results for English MC. The Arabic language presents numerous challenges because of its complex structure, its morphological richness, and its lack of resources. The MC problem has not yet been addressed for Arabic, mainly due to the lack of Arabic MC datasets. In this work, an Arabic MC dataset is presented to the research community. The dataset is produced by translating the SQuAD v1.1 dataset and then applying a proposed approach that combines partial post-editing, semi-supervised learning, and validation. The proposed dataset consists of 44K validated question/answer pairs. To the best of our knowledge, this makes it the largest available high-quality Arabic machine comprehension dataset. Training state-of-the-art deep learning machine comprehension models on the proposed dataset gives promising results, despite the complexity of the Arabic language. For example, a BERT-based Arabic MC model achieves an Exact Match (EM) score of 72.88% and an F1 score of 80.28%. Given that BERT [1] was the state of the art on the English SQuAD leaderboard at the time of writing this thesis, and that human performance on SQuAD is 82.3% EM and 91.22% F1 according to the SQuAD leaderboard, this thesis recommends the BERT-based model trained on the proposed dataset as an effective Arabic MC model.
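The abstract reports results as Exact Match (EM) and F1 but does not spell out how they are computed. The following is a minimal sketch of SQuAD-style EM and token-level F1 for a single predicted answer span. It is an illustration, not the thesis's evaluation code: the function names, the whitespace-only normalization, and the example strings are all assumptions, and any Arabic-specific normalization the thesis may apply is not shown.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace.

    Assumption: simplified normalization. The official SQuAD v1.1 script
    also strips punctuation and English articles; that is omitted here.
    """
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized spans are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted span and a reference span."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the prediction has one extra leading token.
    print(exact_match("في عام 1969", "عام 1969"))
    print(f1_score("في عام 1969", "عام 1969"))
```

In this hypothetical example the prediction contains the full reference plus one extra token, so EM is 0 while F1 is 0.8 (precision 2/3, recall 1). This is why F1 figures are typically higher than EM figures for the same model. A dataset-level score would average these per-question values, taking the maximum over reference answers when a question has several.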