The question-and-answer system based on DPR system and LLAVA
Chen, Zisong (2024)
Bachelor's thesis
Chen, Zisong
2024
School of Engineering Science, Information Technology
All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2024052032978
Abstract
This thesis explores the integration of Dense Passage Retrieval (DPR) and the Large Language and Vision Assistant (LLaVA) model, with the aim of enhancing multimodal information retrieval systems. As digital content becomes increasingly diverse, complex user queries that combine images and text fall outside the scope of traditional text-based search engines. DPR is known for its deep semantic text retrieval capabilities, while LLaVA can convert visual content into rich text descriptions. This study examines how the two models can cooperate to improve the accuracy and efficiency of information retrieval.
The thesis is based mainly on a theoretical analysis of existing research. It outlines the advantages of combining DPR and LLaVA to handle multimodal queries and discusses different strategies for training the DPR and LLaVA models, such as the random, BM25, and gold-standard methods, to improve the discriminative power of the system.
Although this study does not provide new experimental data, it predicts the effects of integrating these technologies from a comprehensive review of the current literature. A simple system of a similar kind has also been implemented. The analysis indicates that this integration is expected to improve the retrieval of relevant information across different data types. It also provides a reference for future experimental studies to verify these predictions, which will help in developing advanced systems that can cope with the complexity of modern information needs.
