Author: El Defrawi, Mai Mahmoud Mahmoud. / Title: A Text Mining Model for the Arabic Comparative Statements

Title

A Text Mining Model for the Arabic Comparative Statements

Author

El Defrawi, Mai Mahmoud Mahmoud.

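Preparation committee

Researcher / Mai Mahmoud Mahmoud El Defrawi (مي محمود محمود الدفراوي)

Supervisor / Ahmed Sharaf El-Din Ahmed (أحمد شرف الدين احمد)

Supervisor / Marwa Saleh Farhan (مروه صالح فرحان)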

Subject

Computers and Information. Information Systems.

Publication date

2019.

Number of pages

93 p.

Language

English

Degree

Doctorate

Specialization

Information Systems

Date of approval

1/1/2019

Place of approval

Helwan University - Faculty of Computers and Information - Information Systems

Abstract

The rapid development of social media platforms allowed opinion mining
research to increase significantly. Opinion mining is the process of extracting subjective
information from opinions that imply a single sentiment. Comparative opinion mining is
a sub-field of opinion mining that deals with multi-sentiment opinions. Such opinions
are expressed by comparing several entities to each other. The sentiment of a
comparative relation is recognized by identifying the relation’s direction and thus the
preferred entity. Currently, work on comparative Arabic opinions ignores two of its
main tasks: the extraction of comparative relations and the identification of their
direction. This thesis therefore aims to provide a methodology that handles these two tasks
effectively and efficiently. The methodology processes Arabic comparative opinions
using both supervised and unsupervised machine learning (ML) techniques. It describes
the development of a rule-based methodology for the extraction of comparative
relations from Arabic opinions. The Proposed Methodology (PM) consists of two tasks:
(a) creating a rule base from sequential patterns and (b) matching new sentences against the rule
base. The rules created in task (a) are based on the Label Sequential Rules (LSR)
concept, with a new technique proposed for sequence selection and pattern creation.
This technique makes the created rules more focused, so they match sentences more
accurately. Besides the LSR-created rules, some manually created rules are added for
handling complex patterns and compound keywords. The methodology also proposes
the concept of general rules. General rules are generalized forms of comparative
opinions. They ensure that a match eventually occurs. For task (b), the PM
introduces a new algorithm. The algorithm uses an ML classification algorithm to select
the set of rules to apply first to a new sentence. The classification allows the
algorithm to search a smaller set of rules, unlike traditional covering
algorithms, in which all rules are searched for a match. A special keyword list is used
during the matching steps, both for exclusion from matching and for manually added
rules. The PM extracts comparative relations with 91% precision, 87%
recall, and 88% F-measure over the total entities and aspects extracted, and
outperforms the Conditional Random Field (CRF) algorithm for extraction.
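
The matching step in task (b) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the rule format, the names `Rule`, `match_rule`, `extract_relation` and `toy_classify`, the contiguous-window matching, and the English toy tokens are all assumptions made for illustration, and a keyword check stands in for the trained ML classifier. The sketch only shows the general idea of letting a classifier pick a small group of rules to try first, then falling back to general rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    # Hypothetical rule format: each slot is either a literal token that must
    # appear, or a label such as "$entity1" that captures one token.
    name: str
    pattern: List[str]


def match_rule(rule: Rule, tokens: List[str]) -> Optional[Dict[str, str]]:
    """Slide the pattern over the sentence; return captured labels or None.

    Simplification: the pattern must match a contiguous window of tokens.
    """
    n = len(rule.pattern)
    for start in range(len(tokens) - n + 1):
        captured: Dict[str, str] = {}
        for slot, tok in zip(rule.pattern, tokens[start:start + n]):
            if slot.startswith("$"):
                captured[slot[1:]] = tok   # label slot: capture the token
            elif slot != tok:
                break                      # literal slot mismatch: next window
        else:
            return captured
    return None


def extract_relation(
    tokens: List[str],
    rule_groups: Dict[str, List[Rule]],
    general_rules: List[Rule],
    classify: Callable[[List[str]], str],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Try only the rule group chosen by the classifier, then general rules."""
    group = classify(tokens)
    for rule in rule_groups.get(group, []) + general_rules:
        captured = match_rule(rule, tokens)
        if captured is not None:
            return rule.name, captured
    return None


# Toy data (assumed, English for readability; the thesis targets Arabic).
rule_groups = {
    "superiority": [
        Rule("better_than", ["$entity1", "is", "better", "than", "$entity2"]),
    ],
}
general_rules = [Rule("general", ["$keyword", "than", "$entity2"])]


def toy_classify(tokens: List[str]) -> str:
    # Stand-in for the ML classifier that selects which rules to try first.
    return "superiority" if "better" in tokens else "other"


specific = extract_relation(
    "phoneA is better than phoneB".split(), rule_groups, general_rules, toy_classify
)
fallback = extract_relation(
    "phoneA lasts longer than phoneB".split(), rule_groups, general_rules, toy_classify
)
```

Here `specific` is matched by the rule group the classifier selected, while `fallback` is matched only by the general rule, so fewer rules are scanned per sentence than in a search over the whole rule base.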