Title
Text Plagiarism Detection System using Machine Learning Techniques
Author
Awad, Ramy Gamal Mohamed.
Committee
Researcher / Ramy Gamal Mohamed Awad
Supervisor / نوال أحمد الفيشاوى
Examiner / محمد نور السيد أحمد
Examiner / محمد عبده بربار
Subject
Application software. Artificial intelligence.
Publication date
2022.
Number of pages
108 p.
Language
English
Degree
Master's
Specialization
Computer Science
Date of approval
1/3/2023
Place of approval
Menoufia University - Faculty of Computers and Information - Department of Computer Science Engineering

Abstract

Text plagiarism has spread widely in recent years and has become a common phenomenon in several fields, such as research manuscripts, textbooks, patents, and academic circles. This is due to the convenient access to a massive number of scholarly and educational papers available on the internet. Although much research has been devoted to addressing this phenomenon, detecting lexical, syntactic, and semantic text plagiarism remains a challenge. Many sentence similarity features have been used to detect plagiarism, but no single feature is discriminative enough to differentiate the similarity cases. This makes discovering all types of text plagiarism a challenging problem. This work aims to develop a reliable, high-performance plagiarism detection system through two proposed approaches.
First, a plagiarism detection system is proposed that extracts the most effective sentence similarity features and constructs a hyperplane equation over the selected features to distinguish the similarity cases with the highest accuracy. It consists of three phases. The first phase preprocesses the documents. The second phase follows two paths: the first is based on traditional paragraph-level comparison, and the second is based on the hyperplane equation computed using Support Vector Machine (SVM) and Chi-square techniques. The third phase extracts the best plagiarized segment.
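The second path of phase two can be illustrated with a minimal sketch, which is not the thesis's implementation. It assumes two simple lexical features (Jaccard and cosine similarity), scores a binarized feature against the class label with a 2x2 chi-square statistic, and evaluates a linear decision function of the form w·x + b. The function names, the binarization, and the values in `WEIGHTS` and `BIAS` are hypothetical; in practice the hyperplane parameters would come from a trained linear SVM.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase whitespace tokenization; stands in for real preprocessing."""
    return sentence.lower().split()


def jaccard(a, b):
    """Word-set overlap: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def chi_square(feature_on, labels):
    """Chi-square statistic of the 2x2 table: binarized feature vs. class label.

    A higher score means the feature is more strongly associated with the label.
    """
    pairs = list(zip(feature_on, labels))
    a = sum(1 for f, y in pairs if f and y)
    b = sum(1 for f, y in pairs if f and not y)
    c = sum(1 for f, y in pairs if not f and y)
    d = sum(1 for f, y in pairs if not f and not y)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical hyperplane parameters (assumed, not taken from the thesis).
WEIGHTS = [2.0, 2.0]
BIAS = -2.0


def decision(x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def is_plagiarized(s1, s2):
    t1, t2 = tokens(s1), tokens(s2)
    return decision([jaccard(t1, t2), cosine(t1, t2)]) > 0
```

Features with low chi-square scores would be dropped before the hyperplane is fitted, so the decision function only uses the features that separate the similarity cases.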
Second, all the features that reflect the different types of text similarity are computed and recorded in a new database called Text Similarity Feature (TSF). This database is intended for intelligent learning to solve text plagiarism detection problems. Using it, a reliable plagiarism detection system based on deep learning is also proposed. Different deep learning approaches, such as convolutional and recurrent neural network architectures, were considered during the construction of this system.
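The abstract does not give the TSF schema, so the following is only a sketch of how one record per sentence pair might be stored and reloaded for training. The class name `TSFRecord`, its field names, and the CSV layout are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TSFRecord:
    """One sentence pair and its similarity features (hypothetical schema)."""
    suspicious_id: str
    source_id: str
    lexical_sim: float
    syntactic_sim: float
    semantic_sim: float
    label: int  # 1 = plagiarized, 0 = not plagiarized


def write_tsf(records, handle):
    """Write records as CSV rows with a header line."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(TSFRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def read_tsf(handle):
    """Read CSV rows back into typed records."""
    return [
        TSFRecord(
            suspicious_id=row["suspicious_id"],
            source_id=row["source_id"],
            lexical_sim=float(row["lexical_sim"]),
            syntactic_sim=float(row["syntactic_sim"]),
            semantic_sim=float(row["semantic_sim"]),
            label=int(row["label"]),
        )
        for row in csv.DictReader(handle)
    ]
```

Keeping the features and the label in one flat record means the same file can feed either a classical classifier or a neural network without recomputing the similarities.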
In comparison with up-to-date ranking systems, the achieved results demonstrate the validity and effectiveness of the proposed system, which is based on long short-term memory (LSTM), for evaluating and improving the reliability of the plagiarism detection system.
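For readers unfamiliar with LSTM, the recurrence it applies at each time step can be sketched as follows. This is a simplified illustration with scalar input, hidden state, and cell state, whereas a real model would use vectors and learned weight matrices. The parameter values in `PARAMS` are hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    """One LSTM time step.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state and the new cell state.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h + b

    i = sigmoid(pre("input"))         # how much new information to store
    f = sigmoid(pre("forget"))        # how much old cell state to keep
    o = sigmoid(pre("output"))        # how much cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(sequence, params):
    """Feed a sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical parameters, chosen only so the example runs.
PARAMS = {
    "input": (1.0, 0.5, 0.0),
    "forget": (1.0, 0.5, 1.0),
    "output": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
```

In a plagiarism detector, the final hidden state would typically be passed to a small classification layer that outputs the plagiarized or not-plagiarized decision.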