Author: Abo Shady, Doaa Hassan Ali. / Title: On Information Retrieval using Co-word Analysis and Data Mining Techniques


Title

On Information Retrieval using Co-word Analysis and Data Mining Techniques

Author

Abo Shady, Doaa Hassan Ali.

Preparation committee

Researcher / Doaa Hassan Ali Abo Shady

Supervisor / Fayed F. M. Ghaleb

Supervisor / El-Sayed A. Atlam

Supervisor / Dowlat A. El A. Mohamed

Publication date

2018

Number of pages

88 p.

Language

English

Degree

Master's

Specialization

Mathematics

Approval date

1/1/2018

Place of approval

Ain Shams University - Faculty of Science - Computer Science


Abstract

Information retrieval (IR) is the science of searching for information in documents,
searching for the documents themselves, searching for metadata that describe
documents, or searching within databases, whether stand-alone relational databases
or hypertext networked databases such as the Internet or the World Wide Web, for
text, sound, images, or data. Field association terms (FA terms) are the terms that
indicate each subject matter category in the classification scheme. In this thesis,
co-word analysis, which counts and analyzes the co-occurrence of keywords in the
publications on a given subject, is used to measure the relations among a selected
sample of FA terms in a common field.
The objectives of the thesis are to outline information retrieval, co-word analysis,
and power link analysis. The thesis reviews previous work on Retrieval Precision (RP)
and focuses on how the power link can be used as a tool to improve the field
association terms that the proposed algorithm extracts from a corpus.
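
The co-occurrence counting behind co-word analysis can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `cooccurrence_counts` and the sample keyword lists are assumptions made for illustration, and the sketch only counts how many publications contain each pair of keywords.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for each unordered keyword pair, the publications containing both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair is counted once per publication
        # and always appears in the same (alphabetical) order.
        unique = sorted(set(keywords))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per publication (not data from the thesis).
publications = [
    ["retrieval", "indexing", "ranking"],
    ["retrieval", "ranking"],
    ["indexing", "clustering"],
]
pair_counts = cooccurrence_counts(publications)
```

Pairs with higher counts, such as `("ranking", "retrieval")` in this toy input, indicate a stronger relation between the two terms.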
The thesis presents a modified method that produces an improved FA term
dictionary by using co-word and power link analysis. The modified method
calculates the levels of FA terms by giving different weights to terms
according to their position in the document.
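
A minimal sketch of position-dependent weighting follows, assuming a document split into title, abstract, and body. The numeric weights, the name `weighted_term_scores`, and the sample document are hypothetical; the abstract states only that the title receives the greatest weight, followed by the abstract and then the body.

```python
from collections import Counter

# Hypothetical weights: only the ordering title > abstract > body is from the abstract.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_term_scores(document):
    """Sum a section-dependent weight for every occurrence of each term."""
    scores = Counter()
    for section, weight in SECTION_WEIGHTS.items():
        # Naive whitespace tokenization; a real system would normalize terms further.
        for term in document.get(section, "").lower().split():
            scores[term] += weight
    return scores


# Hypothetical document (not data from the thesis).
doc = {
    "title": "Information retrieval",
    "abstract": "retrieval of documents",
    "body": "documents and retrieval systems",
}
scores = weighted_term_scores(doc)
```

With these assumed weights, a term that appears in all three parts accumulates more score than one that appears only in the body, which is the effect the modified method relies on.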
The proposed method uses the power link concept, together with modified rules,
to classify scientific papers into their proper fields. Instead of being treated
as a whole, a given document is divided into three parts, namely the title, the
abstract, and the body. Each term is given a weight that depends on where it
occurs in the document: the greatest weight is given to the title, then the
abstract, and then the body. Results on the data used show an improvement in
precision, recall, and F-measure for perfect FA terms (Level 1); with different
data, the proposed method can also give an improvement in Level 2 and Level 3.
The thesis is organized into four chapters:
Chapter 1: Reviews definitions and concepts related to information retrieval,
FA terms, and co-word analysis, and presents the relation between these fields.
This chapter also discusses the methods of IR system evaluation.
Chapter 2: Reviews power link analysis, a real-word spell checker based on
power links, the main steps of this method, and its applications in various
fields. It also presents the traditional algorithm for calculating the levels
of FA terms based on power link analysis and the methods for solving spelling
errors by using the concept of power link. This survey indicates that the
relation between these areas has not been studied before.
Chapter 3: Presents the modified algorithm for calculating the perfect FA
terms, and presents the Continuity and Transition theme used to detect the
different parts of every document. It also introduces the Python language,
which is used to implement the modified system. Finally, it presents the
experiments applied to a set of documents (scientific papers) and a comparison
between the traditional and proposed methods, which helps in evaluating the
system.
Chapter 4: Concludes the thesis and lists important future work.
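
The evaluation measures named in the abstract (precision, recall, and F-measure) can be sketched with their standard set-based definitions. The function name `precision_recall_f` and the sample term sets are assumptions for illustration, not the thesis's evaluation code or data.

```python
def precision_recall_f(extracted, relevant):
    """Return (precision, recall, F-measure) for a set of extracted terms."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical extracted and reference FA terms (not data from the thesis).
extracted_terms = {"a", "b", "c", "d"}
reference_terms = {"a", "b", "e"}
precision, recall, f_measure = precision_recall_f(extracted_terms, reference_terms)
```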