Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model

Published

ECAI (2024)

Authors

Taehee Kim^a, Yeongjae Cho^b, Heejun Shin^a, Yohan Jo^b and Dongmyung Shin^a

Affiliations

^aRadisen Co. Ltd.
^bSeoul National University

Abstract

Visual question answering (VQA) is a task in which an image is given and a series of questions is asked about it. To build an efficient VQA algorithm, a large amount of QA data is required, which is very expensive to obtain. Generating synthetic QA pairs based on templates is a practical way to obtain data. However, VQA models trained on such data do not perform well on complex, human-written questions. To address this issue, we propose a new method called chain of QA for human-written questions (CoQAH). CoQAH utilizes a sequence of QA interactions between a large language model and a VQA model trained on synthetic data to reason and derive logical answers for human-written questions. We tested the effectiveness of CoQAH on two types of human-written VQA datasets, one of 3D-rendered images and one of chest X-ray images, and found that it achieved state-of-the-art accuracy on both. Notably, CoQAH outperformed general vision-language models, VQA models, and medical foundation models with no finetuning.
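The abstract describes CoQAH only at a high level, so the following is a minimal illustrative sketch of a chain-of-QA loop, not the authors' implementation. It assumes the language model repeatedly proposes a simple sub-question, the VQA model trained on synthetic data answers it, and the language model ends the chain once it can answer the human-written question. The names `chain_of_qa`, `toy_llm`, and `toy_vqa`, the `ANSWER:` prefix convention, and the `max_turns` limit are all hypothetical; the two toy functions are hard-coded stand-ins for real models.

```python
from typing import Callable, List, Tuple

# One (sub-question, sub-answer) pair per turn of the chain.
History = List[Tuple[str, str]]


def chain_of_qa(
    human_question: str,
    ask_llm: Callable[[str, History], str],
    ask_vqa: Callable[[str], str],
    max_turns: int = 5,
) -> Tuple[str, History]:
    """Run a QA chain between a language model and a VQA model.

    Assumption (not from the paper): the language model signals that it is
    done by returning a string that starts with "ANSWER:".
    """
    history: History = []
    for _ in range(max_turns):
        reply = ask_llm(human_question, history)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip(), history
        # Otherwise treat the reply as a sub-question for the VQA model.
        history.append((reply, ask_vqa(reply)))
    # The turn budget ran out before the language model committed to an answer.
    return "unknown", history


def toy_vqa(question: str) -> str:
    """Stand-in for a VQA model that only handles template-style questions."""
    answers = {
        "How many cubes are there?": "3",
        "How many spheres are there?": "1",
    }
    return answers.get(question, "unknown")


def toy_llm(human_question: str, history: History) -> str:
    """Stand-in for a language model with a fixed, scripted policy."""
    if len(history) == 0:
        return "How many cubes are there?"
    if len(history) == 1:
        return "How many spheres are there?"
    cubes, spheres = int(history[0][1]), int(history[1][1])
    return "ANSWER: yes" if cubes > spheres else "ANSWER: no"


answer, history = chain_of_qa(
    "Are there more cubes than spheres?", toy_llm, toy_vqa
)
print(answer)  # yes
```

In this sketch the VQA model never sees the human-written question itself; it only answers template-style sub-questions of the kind it was trained on, and the comparison step is left to the language model.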
