Author: Ismail, Aya Abd El-Aziz Ibrahim. / Title: On Text Analysis and Time Series Variation /


Title

On Text Analysis and Time Series Variation /

Author

Ismail, Aya Abd El-Aziz Ibrahim.

Preparation Committee

Researcher / Aya Abd El-Aziz Ibrahim Ismail

Supervisor / Fayed F. M. Ghaleb

Supervisor / El-Sayed A. Atlam

Supervisor / Azza A. Taha

Publication Date

2018

Number of Pages

117 p.

Language

English

Degree

Master's

Specialization

Mathematics

Approval Date

1/1/2018

Place of Approval

Ain Shams University - Faculty of Science - Computer Science

Index

Only 14 of 117 pages are available for public view

Abstract

With the advent of the World Wide Web, the significance of Information Retrieval (IR) has grown. IR is the process of searching for information relevant to the subjects that interest the user and then retrieving it. Recent years have seen an enormous increase in the amount of text available electronically on the Internet. These texts are widely analyzed into useful information, which can then be used for searching, clustering, classifying, summarizing and retrieving information. Typically, an IR system searches collections of data that are either unstructured or semi-structured. A user can obtain the needed information from a collection of documents by reading each document in full, keeping the relevant documents and discarding the others. The result of this retrieval may be good but not perfect, because the popularity of words in a given period of time is not taken into account in the search process. Time is also wasted in reading whole documents. To address these drawbacks and obtain highly effective retrieval results, a retrieval method based on time series variation using Field Association (FA) terms is proposed. FA terms are particular words or conceptual units that let a reader determine the field of a document without reading the whole document. In this thesis, we study the effect of time change on the frequencies of FA terms in a given period of time. Furthermore, a method for automatically evaluating the Stabilization (SB) classes of FA terms is proposed to improve the precision of a Decision Tree (DT). The SB classes indicate the popularity of a list of FA terms as it changes over time. The method is evaluated experimentally, using the Python programming language, on 1,245 files totaling 4.15 MB. The F-measure for the Increment, Fairly Steady and Decrement classes is 90.4%, 99.3% and 38.6%, respectively.
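As an illustration of how a per-class F-measure such as the figures above can be computed, the following is a minimal sketch. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant the thesis uses. The function name `f_measure_per_class` and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def f_measure_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    Assumption: F-measure means F1 = 2PR / (P + R).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[guess] += 1          # predicted class gets a false positive
            fn[truth] += 1          # true class gets a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels for illustration only (not data from the thesis).
y_true = ["Increment", "Steady", "Steady", "Decrement", "Steady", "Increment"]
y_pred = ["Increment", "Steady", "Steady", "Steady", "Steady", "Decrement"]
scores = f_measure_per_class(y_true, y_pred)

for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this toy run the rare class scores far lower than the common one, since a single misclassified sample costs a small class most of its precision or recall.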
Moreover, the problem of data being scattered unevenly among the classes is handled using two methods that improve the performance of the DT: a random sampling method and a data replication method. In the experimental results, the random sampling method gives an F-measure of 90.8% for the Increment class, 99.5% for the Steady class and 68.1% for the Decrement class, while the data replication method gives an F-measure of 93.6% for the Increment class, 99.8% for the Steady class and 75.7% for the Decrement class.
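The abstract does not describe how the two balancing methods work, so the sketch below shows one common reading of each, stated as assumptions: "random sampling" is taken to mean randomly drawing samples from the larger classes down to the size of the smallest class, and "data replication" is taken to mean duplicating randomly chosen samples of the smaller classes up to the size of the largest class. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_undersample(samples, seed=0):
    """Assumed 'random sampling': shrink every class to the smallest class size."""
    rng = random.Random(seed)
    groups = group_by_class(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))   # draw without replacement
    rng.shuffle(balanced)
    return balanced


def replicate_minority(samples, seed=0):
    """Assumed 'data replication': grow every class to the largest class size."""
    rng = random.Random(seed)
    groups = group_by_class(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)                                    # keep originals
        balanced.extend(rng.choices(group, k=target - len(group)))  # add copies
    rng.shuffle(balanced)
    return balanced


def class_counts(samples):
    """Count samples per label."""
    return Counter(label for _, label in samples)


# Toy imbalanced data for illustration only (not data from the thesis).
samples = (
    [((i,), "Steady") for i in range(6)]
    + [((i,), "Increment") for i in range(2)]
    + [((0,), "Decrement")]
)

under = random_undersample(samples)
over = replicate_minority(samples)
print(class_counts(under))
print(class_counts(over))
```

Under these assumptions, replication keeps every original sample while undersampling discards data from the larger classes. Either kind of resampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation set.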