Title
Enhancement of Decision Tree for Data Stream Mining
Author
Lefa, Mai Ebrahim Mohamed.
Preparation committee
Researcher / Mai Ebrahim Mohamed Lefa
Supervisor / حاتم محمد عبد القادر
Examiner / راشد خليل سالم
Examiner / حاتم محمد عبد القادر
Subject
Data mining. Data mining – Mathematical models. Streaming technology – Telecommunications.
Publication date
2023
Number of pages
920p. :
Language
English
Degree
Doctorate
Specialization
Information Systems
Date of approval
18/5/2023
Place of approval
Menoufia University - Faculty of Computers and Information - Department of Computer Systems
Index
Only 14 pages are available for public view

Abstract

Traditional machine learning (ML) algorithms model knowledge using static
datasets. Nowadays, there is an increasing demand for machine learning based
solutions that can handle very large volumes of data in the form of continuous
streams. The Very Fast Decision Tree (VFDT) is one of the most widely used data
stream mining (DSM) algorithms, even though it wastes a considerable amount of
energy on trivial calculations. The machine learning community has prioritized
accuracy and execution time when designing algorithms of this nature. Various
studies nonetheless treat energy usage as a crucial factor in assessing data
mining algorithms.
In this thesis, two new techniques are proposed to optimize the VFDT algorithm by
reducing wasted energy while maintaining accuracy. In the first proposed
method, certain fixed algorithm parameters are made dynamic after
analyzing each one separately and determining how much it helps to
reduce energy consumption in various cases within the algorithm. The second
approach is based on identifying the functions that are among the most
energy-consuming in the algorithm.
For the first proposed method, experiments were conducted on both the
basic algorithm and the proposed variant, using several different types of
datasets in the same application environment. Compared with the basic
algorithm, the proposed method significantly reduced energy consumption while
maintaining accuracy levels, especially on large, noise-free datasets. In the second
approach, experiments were conducted on real-world benchmark and synthetic
datasets to compare the proposed method with state-of-the-art algorithms from previous
work. The proposed algorithm performs considerably better and faster while using
less energy and maintaining accuracy, especially on datasets with large numbers
of instances and attributes.
Keywords: Big Data; Data stream mining; Classification; Very Fast Decision Tree
algorithm; Hoeffding bound; Energy consumption; Massive Online Analysis.
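
For readers unfamiliar with the Hoeffding bound named in the keywords, the following is a minimal sketch of the standard split test used by the basic VFDT algorithm. It does not reproduce the modifications proposed in the thesis. The function names, the example gain values, and the default tie threshold of 0.05 are illustrative assumptions, not values taken from the thesis.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of error, and n is the number of instances seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Basic VFDT-style split decision (hypothetical helper for illustration).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie threshold, meaning the two
    candidates are too close to be worth waiting for more data.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Illustrative numbers only: the bound tightens as more instances arrive.
    for n in (200, 1000, 5000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because the bound shrinks only as more instances arrive, how often this check is evaluated and how strict `delta` and the tie threshold are both affect how much computation a leaf performs before it splits.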