Title
An intelligent data clustering model for a real application
Publisher
Doaa Saleh Ali
Author
Doaa Saleh Ali
Committee
Researcher / Doaa Saleh Ali
Supervisor / Mohamed Saleh
Supervisor / Mohamed Rasmy
Supervisor / Ayman Ghoneim
Publication date
2017
Number of pages
195 leaves
Language
English
Degree
Doctorate
Specialization
Computer Science (miscellaneous)
Approval date
21/3/2018
Place of approval
Cairo University - Faculty of Computers and Information - Operations Research and Decision Support

Abstract

Data clustering, an important unsupervised technique in data mining, aims to identify interesting distributions and patterns in the underlying data. Cluster validity indices are used to evaluate the performance of clustering models. Some recent research has used cluster validity indices as the objective functions in a multiobjective framework in order to improve clustering performance. An interesting research question, therefore, is how to further improve clustering performance via cluster validity indices. We address this question through three main contributions. First, using new combinations of cluster validity indices, we introduce two new multiobjective data clustering models for numerical and categorical data. The combination of cluster validity indices (i.e. objective functions) for the proposed models is selected based on our literature review. Experimental results show that the proposed multiobjective data clustering models are effective in improving clustering performance. However, when forming a new combination of cluster validity indices for a given dataset, two questions remain open: which indices are best to use, and what the best size of the combination is. The second contribution of the dissertation addresses these questions by proposing a hybrid meta-heuristic clustering (HMHC) methodology for computing the best combination of cluster validity indices for any given dataset. The HMHC methodology demonstrates its ability to compute a different and better-performing combination of indices for each benchmark dataset. To reduce the complexity of the HMHC methodology, we also introduce a way to filter the indices in the pool based on the data features of the dataset under consideration. Finally, we offer recommendations for practitioners in the data clustering field, derived from additional analyses of the experimental results using the concepts of the Shapley value and mutual information.
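
To make the idea of using cluster validity indices as objective functions concrete, the following is a minimal sketch in Python. It is not the thesis's method: the abstract does not name the indices or the optimizer used, so the two indices here (within-cluster compactness and centroid separation), the function names, and the toy data are all illustrative assumptions. The sketch scores candidate clusterings on both indices and keeps those that are not Pareto-dominated.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroids(points, labels):
    """Return {label: centroid} for a labelled set of points."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(g) for c in zip(*g)) for lab, g in groups.items()}


def compactness(points, labels):
    """Illustrative index 1: sum of squared distances to centroids (lower is better)."""
    cents = centroids(points, labels)
    return sum(dist(p, cents[lab]) ** 2 for p, lab in zip(points, labels))


def separation(points, labels):
    """Illustrative index 2: smallest centroid-to-centroid distance (higher is better).

    Assumes the labelling contains at least two clusters.
    """
    cents = list(centroids(points, labels).values())
    return min(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates, points):
    """Keep the candidate labellings whose objective vector is non-dominated."""
    # Separation is negated so that both objectives are minimised.
    scored = [(lab, (compactness(points, lab), -separation(points, lab)))
              for lab in candidates]
    return [lab for lab, s in scored if not any(dominates(t, s) for _, t in scored)]


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
candidates = [
    [0, 0, 1, 1],  # groups nearby points together
    [0, 1, 0, 1],  # mixes the two natural groups
]
front = pareto_front(candidates, points)
print(front)  # [[0, 0, 1, 1]]
```

In a full multiobjective clustering model, the candidate labellings would be generated and refined by a search procedure rather than listed by hand, and the choice of which indices to combine is exactly the question the HMHC methodology is described as addressing.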