Title
Using deep nets for visual question answering /
Author
Samar Ibrahem Youssef Ghareeb Zekrallah,
Preparation committee
Researcher / Samar Ibrahem Youssef Ghareeb Zekrallah
Supervisor / Aboul-Ella Otifey Hassanien
Supervisor / Nour Eldeen Mahmoud Khalifa
Examiner / Aboul-Ella Otifey Hassanien
Examiner / Hesham Nabih ElMahdy
Subject
Information Technology
Publication date
2022.
Number of pages
86 leaves.
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Approval date
11/7/2022
Place of approval
Cairo University - Faculty of Computers and Information - Information Technology
Index
Only 14 pages are available for public view

Abstract

Visual question answering (VQA) is a challenging research area in which a model must
understand the semantics of an image together with the question asked about it in order to
infer the correct answer. The ability of a VQA model to generalize to new questions about
new images that it has not seen during training is called zero-shot capability. Good
evaluation metrics are also needed to compensate for dataset bias. In this thesis, the TDIUC
dataset is redistributed to test this capability and to apply such evaluation metrics.
In addition, transformer models for the VQA task take a long time to train. Substituting the
self-attention layers with FNet sublayers improves training speed by 24% and testing speed
by 12.7%, at a limited accuracy cost of 5.61%.
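
To make the substitution concrete, the following is a minimal sketch of the kind of token mixing an FNet sublayer performs: a discrete Fourier transform along the hidden dimension, then along the sequence dimension, keeping only the real part. It is an illustration under stated assumptions, not the implementation used in the thesis. The function names `dft` and `fourier_mixing` are hypothetical, the naive DFT is used only for readability (a practical model would call an FFT routine from a numerical library), and the residual connection and layer normalization that usually surround the sublayer are omitted.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a 1-D sequence (O(n^2), for clarity only)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fourier_mixing(x):
    """FNet-style mixing of a (seq_len, hidden_dim) list of token vectors.

    Hypothetical helper: applies a DFT along the hidden dimension, then along
    the sequence dimension, and returns the real part of the result.
    """
    # Transform each token vector along the hidden dimension.
    rows = [dft(token) for token in x]
    # Transform each hidden-dimension column along the sequence dimension.
    cols = [dft(list(col)) for col in zip(*rows)]
    # Transpose back to (seq_len, hidden_dim) and drop the imaginary part.
    return [[c.real for c in row] for row in zip(*cols)]


if __name__ == "__main__":
    # Two tokens with a hidden size of two (toy input, not thesis data).
    mixed = fourier_mixing([[1.0, 2.0], [3.0, 4.0]])
    print(mixed)
```

Unlike self-attention, this mixing step has no learned parameters and computes no pairwise attention scores between tokens, which is consistent with the faster training and testing reported above, while the accuracy cost reflects the less expressive mixing.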