Title
Toward a Sentiment Arabic Semantic Lexical Database
Author
Mobarz, Hanaa Bayomi Ali.
Subject
Arabic Software Center
Publication Date
2015.
Number of Pages
216 p.

Abstract

Sentiment analysis has recently become a rapidly growing research area within text
mining and natural language processing. The increasing availability of online
resources and the popularity of rich, fast channels for sharing opinions, such as news
sites, online review sites, and personal blogs, have led customers, companies, and
governments to start analyzing and exploring these opinions.
Most current studies on this topic focus mainly on English texts, and very limited
resources are available for other languages such as Arabic. The challenge of sentiment
analysis, and of text mining in general, in Arabic arises from the complexity of the
language in terms of both structure and morphology. Arabic allows different variants
within each sentence type, and many parts of speech that are particular to Arabic are
possible. Furthermore, Arabic is a highly inflectional and derivational language with
many word forms and diacritics.
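
One concrete source of the word-form variation described above is optional diacritization: the same word may be written with or without short-vowel marks. The minimal Python sketch below shows a common normalization step that strips those marks before lexicon lookup. It is an illustration only; the abstract does not say whether or how the thesis normalizes text, and the function name `strip_diacritics` is hypothetical. The character range is standard Unicode (U+064B FATHATAN through U+0652 SUKUN, plus U+0640 TATWEEL).

```python
import re

# Arabic diacritics (tashkeel) from U+064B FATHATAN to U+0652 SUKUN,
# plus the tatweel (kashida) elongation character U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: remove tashkeel and tatweel so that vocalized
    and unvocalized spellings of a word map to the same surface form."""
    return _DIACRITICS.sub("", text)


if __name__ == "__main__":
    vocalized = "كَتَبَ"  # kataba, "he wrote", with a fatha on each letter
    plain = "كتب"
    print(strip_diacritics(vocalized) == plain)
```

The trade-off is precision for recall: forms that differ only in their diacritics, such as the verb كَتَبَ (kataba, "he wrote") and the plural noun كُتُب (kutub, "books"), collapse to the same key.
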
Sentiment analysis attempts to identify and analyze opinions and emotions. A common
point in almost any work on sentiment analysis is the need to identify which elements
of language contribute to expressing subjectivity in text. Such identification is
often accomplished by using a lexical resource that lists lexical items with their
general opinion-related properties (positivity, negativity, or objectivity). The
lexical items in a lexical resource can be either single words or multiword sequences.
For example, the term “” casts a positive connotation on its subject.
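
A lexical resource of this kind can be pictured as a mapping from single words or multiword sequences to opinion-related scores. The sketch below assumes a (positivity, negativity, objectivity) triple per entry and uses greedy longest-match lookup, so that a multiword entry takes precedence over its parts. The English tokens and score values are hypothetical placeholders for Arabic entries, not data from SentiRDI.

```python
from typing import Dict, List, Tuple

# (positivity, negativity, objectivity); illustrative values only.
Scores = Tuple[float, float, float]

# Hypothetical lexicon. Keys are token tuples so that multiword
# sequences can be stored alongside single words.
LEXICON: Dict[Tuple[str, ...], Scores] = {
    ("good",): (0.75, 0.0, 0.25),
    ("bad",): (0.0, 0.625, 0.375),
    ("not", "bad"): (0.5, 0.125, 0.375),
}
MAX_LEN = max(len(key) for key in LEXICON)


def match_lexicon(tokens: List[str]) -> List[Tuple[str, Scores]]:
    """Scan left to right, preferring the longest lexicon entry at each position."""
    matches: List[Tuple[str, Scores]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                matches.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return matches


print(match_lexicon("the food was not bad".split()))
```

Here the phrase "not bad" is matched as a single entry rather than as the negative word "bad", which is the practical reason for storing multiword sequences in the lexicon.
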
The objective of this thesis is to build a sentiment Arabic lexical semantic database
(SentiRDI) that records the prior polarity of each word together with its contextual
polarities and related phrases. To do so, we investigate the automatic recognition of
opinion-related properties of terms. The result is an Arabic semantic lexical resource
that can be used in sentiment analysis applications. We present a method for
determining term orientation and term subjectivity using semi-supervised techniques
based on the quantitative analysis of the synonyms of such terms.
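
The abstract does not detail the synonym-based method, so the following is only a generic sketch of semi-supervised orientation labeling: a few seed terms are labeled by hand, and each round labels every unlabeled term by the majority orientation of its already labeled synonyms. The synonym graph, the seeds, the round limit, and the rule that leftover terms are objective are all assumptions for illustration.

```python
from typing import Dict, Set


def propagate_orientation(
    synonyms: Dict[str, Set[str]],
    pos_seeds: Set[str],
    neg_seeds: Set[str],
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Hypothetical seed propagation over a (symmetric) synonym graph."""
    labels: Dict[str, str] = {t: "positive" for t in pos_seeds}
    labels.update({t: "negative" for t in neg_seeds})
    for _ in range(max_rounds):
        new_labels: Dict[str, str] = {}
        for term, syns in synonyms.items():
            if term in labels:
                continue
            # Count only synonyms labeled in earlier rounds.
            pos = sum(1 for s in syns if labels.get(s) == "positive")
            neg = sum(1 for s in syns if labels.get(s) == "negative")
            if pos > neg:
                new_labels[term] = "positive"
            elif neg > pos:
                new_labels[term] = "negative"
        if not new_labels:
            break  # converged: nothing new was labeled this round
        labels.update(new_labels)
    # Terms never reached from a seed are treated as objective (an assumption).
    for term in synonyms:
        labels.setdefault(term, "objective")
    return labels


SYNONYMS = {
    "excellent": {"good", "superb"}, "superb": {"excellent"}, "good": {"excellent"},
    "awful": {"bad", "terrible"}, "terrible": {"awful"}, "bad": {"awful"},
    "table": {"desk"}, "desk": {"table"},
}
print(propagate_orientation(SYNONYMS, {"good"}, {"bad"}))
```

Labels spread outward one synonym hop per round, so "superb" is reached only after "excellent" has been labeled, while the unconnected pair "table"/"desk" stays objective.
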
We present SentiRDI, a novel high-quality, high-coverage lexical resource in which
each of the 18,413 semantic fields in the RDI Lexical Semantic Data Base (RDILSDB),
which together cover over 150,000 words, has been automatically evaluated on the three
dimensions of positivity, negativity, and objectivity. As a lexical resource, SentiRDI
determines the prior polarity of each Arabic word.
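
Because SentiRDI scores semantic fields rather than individual words, a word's prior polarity can be read off the field it belongs to. The sketch assumes each word maps to a single field and that the prior polarity is the dimension with the highest score. The field IDs, the word-to-field table, and the scores are hypothetical; real RDILSDB entries may map a word to several fields.

```python
from typing import Dict, NamedTuple


class FieldScore(NamedTuple):
    positivity: float
    negativity: float
    objectivity: float


# Hypothetical semantic fields and scores (illustrative, not SentiRDI data).
FIELD_SCORES: Dict[str, FieldScore] = {
    "F_praise": FieldScore(0.8, 0.05, 0.15),
    "F_harm": FieldScore(0.0, 0.7, 0.3),
    "F_furniture": FieldScore(0.0, 0.0, 1.0),
}

# Hypothetical word-to-field mapping; English placeholders stand in for Arabic words.
WORD_TO_FIELD: Dict[str, str] = {
    "praise": "F_praise",
    "harm": "F_harm",
    "chair": "F_furniture",
}


def prior_polarity(word: str) -> str:
    """Return the highest-scoring dimension of the word's semantic field."""
    field = WORD_TO_FIELD.get(word)
    if field is None:
        return "unknown"  # word not covered by the lexicon
    s = FIELD_SCORES[field]
    ranked = {
        "positive": s.positivity,
        "negative": s.negativity,
        "neutral": s.objectivity,
    }
    return max(ranked, key=lambda label: ranked[label])


for w in ("praise", "harm", "chair", "xyz"):
    print(w, prior_polarity(w))
```

Scoring at the field level means that every word sharing a field inherits the same prior polarity, which is how a resource with 18,413 scored fields can cover a much larger vocabulary.
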
Sentiment analysis has long been a research hotspot in text mining. Because traditional
lexicon-based sentiment analysis methods cannot discover sentiment words with
satisfactory performance, we propose a relevant refinement of the task: recognizing
contextual polarity in Arabic at the phrase level. Contextual polarity is the polarity
of the expression in which a word appears, taking into account the context of the
sentence and document. This approach first determines whether an expression is polar
or neutral, and then classifies each polar expression further to determine its
polarity.
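
The two-step scheme above (polar versus neutral first, then the polarity of each polar expression) can be sketched with simple rules. Here step 1 checks whether the phrase contains any word with a non-neutral prior polarity, and step 2 sums prior polarities while flipping a word that directly follows a negator. This is a hypothetical rule-based stand-in, not the thesis's rules or learned models; the lexicon, the negator list, and the "mixed" label are assumptions.

```python
from typing import Dict, List

# Hypothetical prior-polarity lexicon (+1 positive, -1 negative);
# English placeholders stand in for Arabic entries.
PRIOR: Dict[str, int] = {"good": 1, "success": 1, "bad": -1, "failure": -1}
NEGATORS = {"not", "no", "never"}


def is_polar(tokens: List[str]) -> bool:
    """Step 1: an expression is polar if it contains any prior-polar word."""
    return any(t in PRIOR for t in tokens)


def contextual_polarity(tokens: List[str]) -> str:
    """Step 2, applied only to polar expressions: sum prior polarities,
    flipping the sign of a polar word that directly follows a negator."""
    if not is_polar(tokens):
        return "neutral"
    score = 0
    for i, t in enumerate(tokens):
        if t in PRIOR:
            negated = i > 0 and tokens[i - 1] in NEGATORS
            score += -PRIOR[t] if negated else PRIOR[t]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "mixed"  # polar words cancel out


for phrase in ("the plan was good", "the plan was not good", "the table is brown"):
    print(phrase, "->", contextual_polarity(phrase.split()))
```

The example shows why prior polarity alone is not enough: "good" is positive out of context, but the phrase "not good" is negative.
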
The corpus used is the Arabic version of the MPQA opinion corpus; the original corpus
consists of 535 English-language news articles from a variety of sources, manually
annotated for subjectivity. The corpus contains 9,700 sentences, 55% of which are
labeled subjective, while the rest are objective. We use this corpus to test SentiRDI
by extracting the prior and contextual polarity of each opinion word and using them as
features in five different machine learning classifiers.
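
To use prior and contextual polarity as classifier features, each sentence must be turned into a fixed-length vector. The sketch below assumes each opinion word arrives already tagged with both labels, counts them, and adds the share of words whose contextual label differs from the prior one. The feature layout and the `sentence_features` name are hypothetical; the abstract does not list the exact features or name the five classifiers.

```python
from collections import Counter
from typing import List, Tuple

# Each opinion word: (word, prior_polarity, contextual_polarity). The prior label
# is assumed to come from the lexicon and the contextual one from a phrase-level model.
Annotated = Tuple[str, str, str]
LABELS = ("positive", "negative", "neutral")


def sentence_features(words: List[Annotated]) -> List[float]:
    """Return [prior counts per label, contextual counts per label, flip ratio]."""
    prior = Counter(p for _, p, _ in words)
    context = Counter(c for _, _, c in words)
    flipped = sum(1 for _, p, c in words if p != c)
    vec = [float(prior[lab]) for lab in LABELS]
    vec += [float(context[lab]) for lab in LABELS]
    # Share of opinion words whose polarity changed in context.
    vec.append(flipped / len(words) if words else 0.0)
    return vec


example = [
    ("good", "positive", "negative"),  # e.g. negated in context
    ("bad", "negative", "negative"),
    ("table", "neutral", "neutral"),
]
print(sentence_features(example))
```

Vectors in this form can be passed to any standard classifier (for example, scikit-learn estimators), which is one way to compare several learners on the same feature set.
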
Part of this corpus was manually annotated for contextual polarity at the phrase level.
In total, all 18,678 subjective phrases in the 3,578 sentences of the MPQA corpus were
annotated. Contextual polarity rules for our corpus were then added to the annotations
of those phrases that contain instances of the opinion words in the sentiment Arabic
lexical semantic database (SentiRDI). The experiments were conducted on this part of
the MPQA corpus, and we achieved 90.05% and 84.6% classification accuracy for
phrase-level subjectivity and polarity, respectively.
Moreover, we studied the effect of the prior and contextual polarities of words on
sentence-level subjectivity and document-level polarity classification. Multiple
machine learning techniques were used for classification, and they showed a
significant improvement when using SentiRDI and the generated model, which can
automatically recognize contextual polarity at the Arabic phrase level. The
experiments were conducted on several Arabic sentiment corpora, and we achieved 86.5%
and 97.02% classification accuracy for sentence-level and document-level polarity,
respectively.
Finally, several classification effectiveness measures were used to evaluate the
performance of the proposed prototype and the effectiveness of the suggested feature
set: (1) precision, (2) recall, (3) F-measure (F-score), and (4) accuracy.
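
The four measures can be computed directly from gold and predicted labels. The sketch treats one label as the positive class for precision and recall and assumes "F-measure" means the balanced F1 score, F = 2PR / (P + R); the function name and label strings are illustrative.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Precision, recall, and F1 for the `positive` class, plus overall accuracy."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
print(evaluate(gold, pred, positive="pos"))
```

Accuracy counts agreement over all classes, while precision, recall, and F1 are computed per class; for multi-class polarity labels the per-class values are usually averaged.
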