
Evaluating enterprise product recommendation chatbot using LLM : the case of easy selection

Chakma, Kanak (2025)

File: Mastersthesis_Chakma_Kanak.pdf (572.7Kb)


Master's thesis


School of Engineering Science, Information Technology

All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe20251215119533

Abstract

In recent years, the use of Large Language Models (LLMs) has grown significantly, particularly for evaluating open-domain chatbots. However, the evaluation of task-oriented chatbots, especially across different cognitive levels, has received limited attention. Cognitive levels refer to progressively more complex forms of reasoning. This study investigated how an LLM (GPT-4) performs as an evaluator of a task-oriented chatbot across three cognitive levels: remember, understand, and evaluate. A product-suggestion chatbot called Easy Selection served as the case study. GPT-4 was used to evaluate effectiveness and coherence, with coherence measured across the three cognitive levels. Agreement between GPT-4 and human raters was quantified using Cohen’s kappa, and the relationship between their scores was analysed using Spearman’s rho. The results indicate that GPT-4 aligned well with human evaluations at the lower cognitive levels. However, agreement and correlation decreased as cognitive complexity increased, indicating limitations in evaluating higher-level reasoning. In conclusion, LLMs can be effective evaluators at simpler cognitive levels, but they struggle as the complexity of reasoning increases. Further research with a larger sample size, improved chatbot models, or different prompting techniques could identify factors that improve the effectiveness of LLMs in evaluating task-oriented chatbots across complex cognitive processes.
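
The two statistics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the unweighted form of Cohen's kappa, the tie-averaged ranking for Spearman's rho, and the example scores are all assumptions made for illustration, and the thesis may have used a weighted kappa or a statistics library instead.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical 1-5 coherence scores for six chatbot responses.
    human_scores = [5, 4, 4, 2, 1, 3]
    gpt4_scores = [5, 4, 3, 2, 1, 3]
    print("kappa:", round(cohen_kappa(human_scores, gpt4_scores), 3))
    print("rho:", round(spearman_rho(human_scores, gpt4_scores), 3))
```

Kappa treats the scores as categories and corrects raw agreement for chance, while rho only asks whether the two raters order the responses similarly, so the two figures can diverge on the same data.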
Collections
  • Master's Theses (Diplomityöt ja Pro gradu -tutkielmat) [15267]
LUT University
P.O. Box 20
53851 Lappeenranta