Title
IncReStore :
Author
Khalifa, Ahmed El-Morsy Saad.
Preparation committee
Researcher / Ahmed El-Morsy Saad El-Morsy Khalifa
ahmed.khalifa5@alex-eng.edu.eg
Supervisor / Nagwa Mostafa El-Makky
nagwamakky@gmail.com
Supervisor / Iman Ghandour
ielghand@yahoo.com
Examiner / Mohamed Said Helmy Abougabal
Examiner / Amany Anwar Ahmed
Subject
Computer Engineering.
Publication date
2017.
Number of pages
76 p. :
Language
English
Degree
Master's
Specialization
Systems and Control Engineering
Approval date
1/1/2017
Place of approval
Alexandria University - Faculty of Engineering - Computer and Systems

Abstract

Many applications in various industrial and research areas analyze large amounts of data. Big data analytics platforms such as MapReduce focus on distributed batch processing. High-level query languages such as Pig Latin are commonly used by MapReduce users to express complex queries easily. The query engines of these languages translate input queries into workflows of MapReduce jobs. Each job in such a workflow generates intermediate results that other jobs in the workflow consume as input. Storing these intermediate results and using them to answer parts of future queries can reduce the execution time of those queries. However, storing these intermediate results comes with two challenges. The first is choosing which intermediate results to materialize so that they can be exploited to rewrite future queries. The second is maintaining the materialized outputs when the data from which they were generated evolves. In this thesis, we present IncReStore, a system that materializes the outputs of parts of the queries it executes and reuses these outputs to answer all or part of the queries submitted to it in the future. IncReStore selects sub-queries of the input query and materializes their outputs using a monetary cost-based approach. Moreover, IncReStore maintains the materialized outputs. Thus, IncReStore is capable of computing queries on fast-growing datasets by materializing query outputs and maintaining them incrementally. Incremental updates of materialized outputs can be performed either lazily, when the materialized query outputs are used (opportunistic IncReStore), or eagerly, when the data evolves (active IncReStore). We have implemented IncReStore as an extension to Pig and Hadoop. Our experimental evaluation of IncReStore using the TPC-H benchmark shows the effectiveness of the proposed approaches.
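
The difference between lazy and eager incremental maintenance described in the abstract can be illustrated with a minimal Python sketch. This is not IncReStore's implementation (which extends Pig and Hadoop); the class name MaterializedCount, its methods, and the group-by-count example are hypothetical and chosen only for illustration. It assumes an append-only dataset and a stored count-per-key output.

```python
from collections import Counter


class MaterializedCount:
    """Toy stored output of a count-per-key sub-query (hypothetical, for illustration)."""

    def __init__(self, eager):
        self.eager = eager        # True: maintain on data arrival; False: maintain on use
        self.result = Counter()   # the materialized output
        self.pending = []         # appended records not yet folded into the result

    def append(self, records):
        """New records arrive in the base dataset."""
        self.pending.extend(records)
        if self.eager:
            self._refresh()       # eager ("active" style) maintenance

    def read(self):
        """A later query reuses the stored output instead of rescanning all data."""
        if self.pending:
            self._refresh()       # lazy ("opportunistic" style) maintenance
        return dict(self.result)

    def _refresh(self):
        # Incremental update: only the newly appended records are processed.
        self.result.update(self.pending)
        self.pending.clear()


lazy = MaterializedCount(eager=False)
lazy.append(["a", "b", "a"])
lazy_pending_before_read = len(lazy.pending)   # delta is still waiting
lazy_counts = lazy.read()                      # delta is applied here

active = MaterializedCount(eager=True)
active.append(["a", "b", "a"])
active_pending_after_append = len(active.pending)  # delta already applied
active_counts = active.read()
```

Both modes return the same counts; they differ only in when the cost of applying the new records is paid, either at read time or at arrival time.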