Abstract
The rapid development of social media platforms has led to a significant increase in opinion mining research. Opinion mining is the process of extracting subjective information from opinions that imply a single sentiment. Comparative opinion mining is a sub-field of opinion mining that deals with multi-sentiment opinions. Such opinions are expressed by comparing several entities to each other. The sentiment of a comparative relation is recognized by identifying the relation's direction and thus the preferred entity. Currently, work on comparative Arabic opinions ignores two of its main tasks: the extraction of comparative relations and the identification of their direction. This thesis therefore aims to provide a methodology that handles these two tasks effectively and efficiently. The methodology processes Arabic comparative opinions using both supervised and unsupervised machine learning (ML) techniques. The thesis describes the development of a rule-based methodology for the extraction of comparative relations from Arabic opinions. The Proposed Methodology (PM) consists of two tasks: (a) creating a rule base from sequential patterns and (b) matching new sentences against the rule base. The rules created in task (a) are based on the Label Sequential Rules (LSR) concept, combined with a new technique for sequence selection and pattern creation. This technique makes the created rules more focused, so they match sentences more accurately. Besides the LSR-created rules, some manually created rules are added to handle complex patterns and compound keywords. The methodology also proposes the concept of general rules. General rules are the generalized form of comparative opinions. They ensure that a match eventually occurs. For task (b), the PM introduces a new algorithm. The algorithm uses an ML classification algorithm to select the set of rules that are applied first to new sentences.
The classification allows the algorithm to search through a smaller set of rules, whereas traditional covering algorithms search all rules for a match. A special keyword list is used during the matching steps, both for exclusion from matching and for the manually added rules. The PM extracts comparative relations with 91% precision, 87% recall and an 88% F-measure over the total entities and aspects extracted, and outperforms the Conditional Random Field (CRF) algorithm for extraction.
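
The matching task (b) can be illustrated with a minimal sketch. This is not the thesis's implementation: the rule format, the rule base, the general rule, and the names `match_rule`, `toy_classifier` and `extract` are all hypothetical. A keyword check stands in for the ML classifier, patterns are assumed to match contiguous tokens, and English tokens are used only for readability. The sketch shows the idea stated above: a classifier first narrows the search to a subset of rules, and a general rule serves as a fallback.

```python
from typing import Callable, Optional

# Hypothetical rule format: a rule is a list of items. Items starting
# with "$" are slots that capture one token; others must match exactly.
Rule = list[str]

# Hypothetical rule base, grouped by the class a classifier would predict.
RULE_BASE: dict[str, list[Rule]] = {
    "better_than": [["$E1", "is", "better", "than", "$E2"]],
    "worse_than": [["$E1", "is", "worse", "than", "$E2"]],
}

# Hypothetical general rule, tried only after the selected rules fail.
GENERAL_RULES: list[Rule] = [["$E1", "is", "$KEYWORD", "than", "$E2"]]


def match_rule(rule: Rule, tokens: list[str]) -> Optional[dict[str, str]]:
    """Match a rule against any contiguous window of tokens (an assumption)."""
    n = len(rule)
    for start in range(len(tokens) - n + 1):
        captured: dict[str, str] = {}
        for item, tok in zip(rule, tokens[start:start + n]):
            if item.startswith("$"):
                captured[item[1:]] = tok   # slot: capture the token
            elif item != tok:
                break                      # literal mismatch: next window
        else:
            return captured                # every item in the rule matched
    return None


def toy_classifier(tokens: list[str]) -> str:
    """Stand-in for the ML classifier: picks a rule group by keyword."""
    if "better" in tokens:
        return "better_than"
    if "worse" in tokens:
        return "worse_than"
    return "unknown"


def extract(sentence: str,
            classify: Callable[[list[str]], str] = toy_classifier
            ) -> Optional[dict[str, str]]:
    """Try the classifier-selected rules first, then the general rules."""
    tokens = sentence.lower().split()
    selected = RULE_BASE.get(classify(tokens), [])
    for rule in selected + GENERAL_RULES:
        result = match_rule(rule, tokens)
        if result is not None:
            return result
    return None
```

In this sketch only the rules in the predicted group are searched before the fallback, rather than the whole rule base. The single general rule here does not guarantee a match for every sentence; it only illustrates the fallback order.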