Abstract

In the Big Data era, data about our lives is captured, stored, processed, and used to change the world around us. This data is generated by many sources, such as the Web, IoT sensors, application server logs, social media, traffic surveillance, and mobile devices. Large volumes of high-speed data present new challenges that traditional database systems cannot resolve. Among the many ways to store and process large amounts of data, Hadoop is one of the most widely used platforms: it provides reliability and high availability, and can process thousands of terabytes of data in parallel across thousands of nodes.

When storing sensitive data, security plays an important role in ensuring its safety. When Hadoop was originally designed, security received little consideration. Its original purpose was to manage large amounts of public web data, so the confidentiality of the stored data was not a concern. Initially, users and services in Hadoop were not authenticated; since Hadoop was designed to run code on a distributed cluster of machines, anyone could submit and run code without proper authentication. Various frameworks have since been introduced to improve Hadoop security.

With data production and collection increasing daily, Hadoop serves as a platform for processing big data on a distributed system. A master node globally manages running jobs, while worker nodes process partitions of the data locally. Hadoop uses MapReduce as an effective computing model. However, Hadoop suffers from a high level of security vulnerability over hybrid and public clouds. Specifically, workers can fake results without actually processing their portions of the data. Several redundancy-based approaches have been proposed to counteract this risk.
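The master/worker division of labor described above can be illustrated with a minimal sketch of the MapReduce model, using a word-count job as the example; the partitioning and function names here are illustrative, not part of the thesis's implementation:

```python
from collections import defaultdict

def map_phase(partition):
    # Each worker emits (key, value) pairs for its local data partition.
    for line in partition:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # The reduce step aggregates the emitted values by key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Two "workers", each processing its own partition of the input.
partitions = [["big data on hadoop"], ["hadoop processes big data"]]
all_pairs = [pair for p in partitions for pair in map_phase(p)]
result = reduce_phase(all_pairs)
```

In a real Hadoop cluster the map tasks run on separate worker nodes and the master only schedules and collects; it is this delegation that lets a dishonest worker return fabricated (key, value) pairs without processing its partition.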
A replication mechanism is used to duplicate all or some of the tasks across multiple workers (nodes). A drawback of such approaches is that they generate high overhead on the cluster. Additionally, malicious workers can behave well for a long period of time and attack later. This thesis presents a novel model to enhance the security of the cloud environment against untrusted workers. A new component called the malicious workers trap (MWT) is developed to run on the master node and detect malicious (non-collusive and collusive) workers as they turn malicious and attack the system. An implementation built to test the proposed model and analyze system performance shows that the model can successfully detect malicious workers with minor processing overhead compared to vanilla MapReduce and the Verifiable MapReduce (V-MR) model. In addition, MWT maintains a balance between security and usability of the Hadoop cluster.
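The redundancy-based idea that MWT builds on can be sketched as follows: the master replicates a task to several workers and compares their results, flagging workers whose output disagrees with the majority. This is a generic sketch of replication-based result verification under assumed names; it is not the thesis's actual MWT algorithm:

```python
import hashlib
from collections import Counter

def digest(result: str) -> str:
    # Compare compact digests rather than full outputs to limit overhead.
    return hashlib.sha256(result.encode()).hexdigest()

def check_replicated_task(results_by_worker: dict) -> set:
    # Flag workers whose replica result disagrees with the majority.
    digests = {w: digest(r) for w, r in results_by_worker.items()}
    if len(set(digests.values())) == 1:
        return set()  # all replicas agree; no evidence of cheating
    majority = Counter(digests.values()).most_common(1)[0][0]
    return {w for w, d in digests.items() if d != majority}

# Three replicas of the same task; one worker returns a fake result.
suspects = check_replicated_task({"w1": "42", "w2": "42", "w3": "41"})
```

Note the limitation the abstract points out: this simple majority check fails against collusive workers (who agree on the same fake result) and against workers that behave well until trust is established, which is the gap MWT is designed to close.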