Author: Ibrahim, Arwa Zakaria Ahmed Fouad / Title: Improved Algorithm for Processing Real-Time Sensing Big Data

Title

Improved Algorithm for Processing Real-Time Sensing Big Data

Author

Ibrahim, Arwa Zakaria Ahmed Fouad

Thesis committee

Supervisor / Wael Abd El-Kader Awad

Supervisor / Ibrahim Mohamed Hanafy

Examiner / Khaled Mohamed Hosny

Examiner / Samir El-Desouky El-Mogy

Subject

Computer Science.

Publication date

2022.

Number of pages

91 p.

Language

English

Degree

Master's

Specialization

Multidisciplinary

Date of award

14/2/2022

Place of award

Port Said University - Faculty of Science, Port Said - Department of Mathematics and Computer Science.

Abstract

In recent years, there has been enormous growth in sensor data that must be analyzed and processed instantaneously, in real time. This has given rise to the term real-time sensing big data. Several environments, such as Apache Flink, exist to handle this problem; the real-time streaming environment used in this thesis is Apache Storm, which was created to process big data in real time. Apache Storm runs on a cluster formed of a Nimbus node, one or more Supervisor nodes, and ZooKeeper, whose main role is to coordinate the work between Nimbus and the Supervisors. Apache Storm analyzes the data entering specific applications, called topologies, which take the form of a directed acyclic graph (DAG). A topology is made up of spouts (stream sources) and bolts (processing units), and the connections between them route the incoming data stream using one of several stream groupings.
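To make the topology model concrete, the following Python sketch represents a small Word Count-style topology as a DAG of components and shows how two common stream groupings route tuples. It is a plain simulation, not the Apache Storm API: the component names, the helpers `is_acyclic`, `shuffle_targets`, and `fields_target`, and the use of CRC32 for key hashing are illustrative assumptions. Shuffle grouping is approximated here by round-robin cycling (Storm's own shuffle grouping randomizes while keeping tasks evenly loaded), and fields grouping sends every tuple with the same key to the same task.

```python
import itertools
import zlib
from collections import Counter

# Hypothetical Word Count-style topology modelled as a DAG:
# component -> list of (downstream component, stream grouping).
TOPOLOGY = {
    "sentence-spout": [("split-bolt", "shuffle")],
    "split-bolt": [("count-bolt", "fields")],
    "count-bolt": [],
}


def is_acyclic(graph):
    """Kahn's algorithm: True if every component can be topologically ordered."""
    indegree = {node: 0 for node in graph}
    for edges in graph.values():
        for target, _ in edges:
            indegree[target] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    ordered = 0
    while ready:
        node = ready.pop()
        ordered += 1
        for target, _ in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return ordered == len(graph)


def shuffle_targets(tuples, num_tasks):
    """Spread tuples evenly over tasks (round-robin stand-in for shuffle grouping)."""
    task_ids = itertools.cycle(range(num_tasks))
    return [(next(task_ids), t) for t in tuples]


def fields_target(word, num_tasks):
    """Route by key so the same word always reaches the same counting task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


if __name__ == "__main__":
    assert is_acyclic(TOPOLOGY)
    sentences = ["the cow jumped", "over the moon", "the end"]
    print(shuffle_targets(sentences, num_tasks=2))
    words = [w for s in sentences for w in s.split()]
    print(Counter((fields_target(w, 3), w) for w in words))
    # Every occurrence of "the" maps to one task id.
    print({fields_target(w, 3) for w in words if w == "the"})
```

Running the script confirms that the graph is acyclic and sends the three sentences to tasks 0, 1, 0. Every occurrence of "the" lands on the same counting task, which is what allows a count bolt to keep a correct per-word total.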
This thesis focuses on improving the performance of scheduling these topologies on the cluster. Apache Storm's default scheduler assigns tasks using a round-robin strategy that distributes them evenly across the cluster's nodes. Apache Storm can also use custom schedulers to improve performance. In this thesis, we combine two custom schedulers into a new hybrid scheduler with better performance.
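The round-robin idea behind the default scheduler can be illustrated with a short Python sketch. It is a simplified model, not Storm's actual scheduler code: `round_robin_schedule` and `tasks_per_node` are hypothetical helpers, and the slot ports starting at 6700 only echo Storm's default `supervisor.slots.ports` values. Slots are interleaved across nodes so that consecutive executors land on different machines.

```python
from collections import Counter


def round_robin_schedule(executors, nodes, slots_per_node, first_port=6700):
    """Map each executor to a (node, port) worker slot in round-robin order.

    Slots are interleaved across nodes, so the first len(nodes) executors
    are placed on different nodes before any node receives a second one.
    """
    slots = [
        (node, first_port + offset)
        for offset in range(slots_per_node)
        for node in nodes
    ]
    return {executor: slots[i % len(slots)] for i, executor in enumerate(executors)}


def tasks_per_node(assignment):
    """Count how many executors each node received."""
    return Counter(node for node, _port in assignment.values())


if __name__ == "__main__":
    executors = [f"executor-{i}" for i in range(7)]
    plan = round_robin_schedule(executors, ["node-a", "node-b", "node-c"], slots_per_node=2)
    for executor, slot in plan.items():
        print(executor, "->", slot)
    print(tasks_per_node(plan))
```

With seven executors and six slots, node-a receives three executors and the other two nodes two each. The assignment balances the number of executors per node but ignores how much CPU, memory, or network traffic each executor actually needs, which is one gap that workload-aware custom schedulers aim to close.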
Specifically, we aim to improve performance by reducing the complete latency and maximizing the number of tuples processed per second (throughput), while balancing the workload across the cluster nodes. Our proposed hybrid scheduler is compared with two other schedulers, the Workload scheduler and the A3 Storm scheduler, on four topologies: SOL, Rolling Count, Word Count, and Spike Detection. The results show that the proposed algorithm performs better: it reduces the complete latency and increases the throughput compared with the two other schedulers.
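The two performance metrics used in the evaluation, complete latency and throughput, can be computed as in the Python sketch below, together with one simple way to quantify workload balance. The sketch assumes that each tuple's spout-emit and final-ack timestamps are available; in Storm, complete latency is measured from when a spout emits a tuple until its whole tuple tree has been acknowledged. The `TupleRecord` type, the helper names, the max-to-mean imbalance ratio, and all numbers are hypothetical illustrations, not the thesis's measurement code or results.

```python
from dataclasses import dataclass


@dataclass
class TupleRecord:
    """Timestamps (ms) for one spout tuple; field names are hypothetical."""
    emitted_ms: float  # when the spout emitted the tuple
    acked_ms: float    # when the full tuple tree was acknowledged


def complete_latency_ms(records):
    """Mean time from spout emit to full acknowledgement."""
    return sum(r.acked_ms - r.emitted_ms for r in records) / len(records)


def throughput_per_sec(records, window_ms):
    """Tuples fully processed per second over a measurement window."""
    return len(records) / (window_ms / 1000.0)


def load_imbalance(load_per_node):
    """Busiest node's load divided by the mean load; 1.0 means perfectly balanced."""
    loads = list(load_per_node.values())
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Made-up timestamps for four tuples observed in a 2-second window.
    records = [TupleRecord(0, 12), TupleRecord(5, 15), TupleRecord(10, 30), TupleRecord(20, 28)]
    print(complete_latency_ms(records))  # 12.5
    print(throughput_per_sec(records, window_ms=2000))  # 2.0
    print(load_imbalance({"node-a": 30, "node-b": 20, "node-c": 10}))  # 1.5
```

A scheduler that lowers the first number, raises the second, and keeps the third close to 1.0 is improving on all three goals stated above.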