Author: Elziky, Mayada Atef Ibrahim. / Title: Improving The Index-Based Techniques for Duplicate Record Detection

Title

Improving The Index-Based Techniques for Duplicate Record Detection

Author

Elziky, Mayada Atef Ibrahim.

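Preparation committee

Researcher / Mayada Atef Ibrahim Zaky (مياده عاطف ابراهيم زكى)

Supervisor / Amany Mahmoud Sarhan (امانى محمود سرحان)

Supervisor / Shereen Mostafa Elgokhy (شيرين مصطفى الجوخى)

Supervisor / Ahmed Hassan Yousef (احمد حسن يوسف)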

Subject

Computer and Control Engineering.

Publication date

2018.

Number of pages

83 p.

Language

English

Degree

Master's

Specialization

Computational Mechanics

Approval date

13/6/2018

Place of approval

Tanta University - Faculty of Engineering - Computer and Automatic Control Engineering

Abstract

Duplicate Record Detection (DRD) aims to reduce duplicate records in databases and thereby protect data integrity. It identifies records that refer to the same entity, whether they reside in the same database or in different ones. A variety of indexing techniques have been proposed to support DRD, and Q-gram indexing is among the most common. This thesis introduces a modification to the Q-gram indexing technique that improves the performance of the duplicate detection process by reducing both the running time and the number of comparisons. To simplify the back-end computations, the proposed technique converts Q-gram strings into numeric values using the ASCII codes of their characters. Indexing on these numeric values lowers the cost of Q-gram comparisons and speeds up the DRD process as a whole. Unlike existing approaches, the proposed technique is easier to implement and requires less memory. Two further variations of the proposed technique are introduced: the first uses a range for matching, and the second sorts words alphabetically within the same block to speed up matching. According to the experimental results, the three proposed techniques run 20-25% faster than the current Q-gram technique while remaining almost as accurate, which makes them suitable for DRD on large databases.
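
To make the idea of numeric Q-gram keys concrete, the following is a minimal sketch and not the thesis implementation. The abstract does not say how the ASCII codes are combined, so the base-256 encoding here is an assumption, as are the function names (`qgrams`, `qgram_to_int`, `build_index`, `candidate_pairs`), the choice of q = 2, and the rule that two records become a candidate pair when they share at least two keys. The sketch also assumes ASCII input, so that every character code is below 256.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the overlapping q-grams of a lowercased string."""
    s = text.lower()
    if len(s) < q:
        return [s] if s else []
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def qgram_to_int(gram):
    """Encode a q-gram as one integer from its character codes.

    Assumption: a base-256 positional encoding; the abstract only says
    that ASCII codes are used, not how they are combined.
    """
    value = 0
    for ch in gram:
        value = value * 256 + ord(ch)
    return value


def build_index(records, q=2):
    """Map each numeric q-gram key to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, value in records.items():
        for gram in qgrams(value, q):
            index[qgram_to_int(gram)].add(rec_id)
    return index


def candidate_pairs(index, min_shared=2):
    """Return record pairs that share at least min_shared numeric keys."""
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


# Hypothetical sample data: records 1 and 2 are likely duplicates.
records = {1: "smith", 2: "smyth", 3: "jones"}
index = build_index(records)
pairs = candidate_pairs(index)
```

In this sketch only the pairs returned by `candidate_pairs` would go on to a detailed comparison, and the index keys are integers, so lookups compare numbers instead of strings. The range-matching and sorted-block variations mentioned in the abstract are not shown.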