Title
An incrementally trainable statistical approach to information extraction based on token classification /
Publisher
Ahmed Montaser Hasan Ibrahem Farag ,
Author
Ahmed Montaser Hasan Ibrahem Farag
Publication date
2015
Number of pages
106 Leaves :
Index
Only 14 pages are available for public viewing


Abstract

Named Entity Recognition (NER) has become essential for improving the performance of many Natural Language Processing (NLP) tasks. The aim is to increase the accuracy with which named entities are identified and extracted. This thesis takes a first step toward extracting useful information for researchers interested in the Egyptian People’s Assembly by creating a new corpus of the Egyptian People’s Assembly and presenting a novel solution for Arabic Named Entity Recognition (ANER). The solution trains a Conditional Random Field (CRF) sequence-labeling model on a mix of features: morphological features, gazetteers, and character n-grams of the leading and trailing letters of words. The results show that, on the Egyptian People’s Assembly datasets, the mixed feature set achieves a better F-measure than the other feature sets evaluated.
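
To make the feature description in the abstract concrete, the following is a minimal sketch of how per-token features of this kind might be built before being passed to a CRF trainer. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the feature keys, the maximum n-gram length of 3, the context-word features, and the tiny gazetteer are all hypothetical, and morphological features are left out because the abstract does not say how they were derived.

```python
def affix_ngrams(word, max_n=3):
    """Leading and trailing character n-grams of a word (max_n is an assumed setting)."""
    feats = {}
    for n in range(1, max_n + 1):
        if len(word) >= n:
            feats[f"prefix{n}"] = word[:n]   # first n letters
            feats[f"suffix{n}"] = word[-n:]  # last n letters
    return feats


def token_features(tokens, i, gazetteer):
    """Feature dict for the token at position i (hypothetical feature keys)."""
    word = tokens[i]
    feats = {
        "word": word,
        # Gazetteer feature: is the token in a known-entity word list?
        "in_gazetteer": word in gazetteer,
        # Neighbouring words as context, with boundary markers (an added assumption).
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }
    feats.update(affix_ngrams(word))
    return feats


def sentence_features(tokens, gazetteer):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


# Illustrative input only: three Arabic tokens and a one-entry gazetteer.
example_tokens = ["مجلس", "الشعب", "المصري"]
example_gazetteer = {"الشعب"}
example_features = sentence_features(example_tokens, example_gazetteer)
```

Each sentence becomes a list of feature dictionaries, one per token, which would be paired with a label sequence of the same length when training a sequence-labeling model. The affix features let the model pick up cues such as the definite article "ال" at the start of a word without requiring a full morphological analysis.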