Author: Doaa Saleh Ali / Title: An intelligent data clustering model for a real application

Title

An intelligent data clustering model for a real application

Publisher

Doaa Saleh Ali

Author

Doaa Saleh Ali

Preparation committee

Researcher / Doaa Saleh Ali

Supervisor / Mohamed Saleh

Supervisor / Mohamed Rasmy

Supervisor / Ayman Ghoneim

Publication date

2017

Number of pages

195 leaves

Language

English

Degree

Doctorate

Specialization

Computer Science (miscellaneous)

Approval date

21/3/2018

Place of approval

Cairo University - Faculty of Computers and Information - Operations Research and Decision Support

Abstract

Data clustering, an important unsupervised technique in data mining, aims to identify interesting distributions and patterns in the underlying data. Cluster validity indices are used to evaluate the performance of clustering models. Some recent research has used cluster validity indices as the objective functions in a multiobjective framework in order to improve clustering performance. An interesting research question, therefore, is how to further improve clustering performance via cluster validity indices. We address this question through three main contributions. First, using new combinations of cluster validity indices, we introduce two new multiobjective data clustering models for numerical and categorical data. The combination of cluster validity indices (i.e., objective functions) used in the proposed models is selected based on our literature review. In the experimental results, the proposed multiobjective data clustering models prove effective in improving clustering performance. However, when forming a new combination of cluster validity indices for a given dataset, it remains an open question which indices are best to use and how many indices the combination should contain. The second contribution of the dissertation addresses these questions by proposing a hybrid meta-heuristic clustering (HMHC) methodology for computing the best combination of cluster validity indices for a given dataset. The HMHC methodology demonstrates its ability to compute a different and better-performing combination of indices for each benchmark dataset. In addition, to reduce the complexity of the HMHC methodology, we introduce a way to filter the indices in the pool based on the data features of the dataset under consideration.
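As a rough illustration of treating cluster validity indices as objective functions, the sketch below scores candidate partitions with two simple indices and keeps only the non-dominated (Pareto-optimal) ones. The two indices (`compactness`, `separation`), the toy points, and the candidate labelings are assumptions chosen for illustration; they are not the indices or models proposed in the thesis.

```python
from math import dist


def centroids(points, labels):
    """Map each cluster label to the mean of its member points."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def compactness(points, labels):
    """Mean distance from each point to its cluster centroid (lower is better)."""
    centers = centroids(points, labels)
    return sum(dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)


def separation(points, labels):
    """Negated minimum distance between two centroids, so lower is better."""
    centers = list(centroids(points, labels).values())
    if len(centers) < 2:
        return 0.0  # a single cluster has no separation to measure
    return -min(
        dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )


def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points, candidates, objectives):
    """Return the candidate labelings whose score vectors are non-dominated."""
    scored = [(c, tuple(f(points, c) for f in objectives)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
# Two hypothetical candidate partitions of those four points.
candidates = [[0, 0, 1, 1], [0, 1, 0, 1]]
front = pareto_front(points, candidates, [compactness, separation])
print(front)  # prints [[0, 0, 1, 1]]
```

Both objectives are written so that lower is better, which keeps the dominance check uniform. In a real multiobjective setting the front would usually contain several trade-off partitions rather than a single winner.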
Finally, we provide recommendations for practitioners in the data clustering field, based on additional analyses of the experimental results using the concepts of the Shapley value and mutual information.
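The Shapley value attributes a joint payoff to individual contributors by averaging each one's marginal contribution over all possible orderings. A minimal sketch follows, assuming the "players" are validity indices and the payoff is some clustering-quality score for a combination of indices. The index names and scores below are made up for illustration and are not results from the thesis.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number. The cost grows
    factorially with the number of players, so this suits small pools only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical quality score for every combination of three indices.
toy_score = {
    frozenset(): 0.0,
    frozenset({"index_a"}): 0.5,
    frozenset({"index_b"}): 0.4,
    frozenset({"index_c"}): 0.1,
    frozenset({"index_a", "index_b"}): 0.7,
    frozenset({"index_a", "index_c"}): 0.6,
    frozenset({"index_b", "index_c"}): 0.5,
    frozenset({"index_a", "index_b", "index_c"}): 0.8,
}

phi = shapley_values(["index_a", "index_b", "index_c"], toy_score.__getitem__)
for name, share in phi.items():
    print(name, round(share, 4))
```

The shares always add up to the score of the full combination minus the score of the empty one, which is a quick sanity check on any implementation.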