
Evaluating the accuracy of large language models for text summarization in Finnish

Ataei, Maryam (2025)

mastersthesis_ataei_maryam.pdf (573.4Kb)

Master's thesis


School of Engineering Science, Computational Engineering

All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2025052251458

Abstract

This thesis investigates the performance of large language models in Finnish text summarization by comparing three models: Poro, DeepSeek, and OpenAI o3-mini. Given the high degree of morphological inflection and the relatively free word order characteristic of Finnish, conventional evaluation metrics often fail to capture content equivalence. To address this, the study adopts the chrF, BERTScore, and cosine similarity metrics, which have been shown to evaluate lexical and semantic similarity more reliably in low-resource, morphologically complex languages.
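The abstract does not state how these metrics were computed. As an illustration only, the sketch below shows one common way to score a Finnish model summary against a human-written reference with chrF, BERTScore, and embedding cosine similarity; the library choices (sacrebleu, bert-score, sentence-transformers), the embedding model name, and the placeholder texts are assumptions, not details taken from the thesis.

```python
# Illustrative sketch only: library choices and model names are assumptions,
# not taken from the thesis.
from sacrebleu.metrics import CHRF                            # chrF (character n-gram F-score)
from bert_score import score as bert_score                    # BERTScore
from sentence_transformers import SentenceTransformer, util   # embedding cosine similarity

reference = "Ihmisen kirjoittama tiivistelmä ..."   # human-written abstract (placeholder)
candidate = "Mallin tuottama tiivistelmä ..."       # model-generated summary (placeholder)

# chrF: character-level n-gram overlap, relatively robust to Finnish inflection
chrf_score = CHRF().sentence_score(candidate, [reference]).score

# BERTScore: semantic similarity from contextual token embeddings
_, _, f1 = bert_score([candidate], [reference], lang="fi")
bertscore_f1 = f1.item()

# Cosine similarity between sentence embeddings (multilingual model is an assumption)
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
emb = embedder.encode([candidate, reference], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()

print(f"chrF: {chrf_score:.2f}  BERTScore-F1: {bertscore_f1:.3f}  cosine: {cosine:.3f}")
```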

The evaluation is conducted on a curated set of Finnish academic articles. Each model-generated summary is compared against the corresponding human-written abstract. Furthermore, the impact of prompt engineering is examined by testing three prompt variations, ranging from generic to domain-specific, including expert-role conditioning. Empirical results show that DeepSeek consistently achieves the highest performance. OpenAI o3-mini performs moderately well, while Poro underperforms across all metrics. The structured prompt incorporating expert-role instructions yields the best results across all models, highlighting the importance of prompt formulation in low-resource contexts.
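The exact prompt wordings are not given in the abstract. The following sketch merely illustrates what a generic, a domain-specific, and an expert-role prompt variant could look like for Finnish summarization; the Finnish phrasings, the use of the OpenAI Python client, and the helper function are hypothetical.

```python
# Illustrative sketch only: the prompt texts, client usage, and helper below
# are assumptions made for illustration, not the prompts used in the thesis.
from openai import OpenAI

PROMPTS = {
    "generic": "Tiivistä seuraava teksti suomeksi.",                        # generic instruction
    "domain":  "Tiivistä seuraava tieteellinen artikkeli suomeksi "
               "korostaen tutkimuskysymystä, menetelmiä ja tuloksia.",       # domain-specific instruction
    "expert":  "Olet kokenut suomenkielinen tieteellinen toimittaja. "
               "Kirjoita artikkelista tiivis ja tarkka tiivistelmä suomeksi.",  # expert-role conditioning
}

def summarize(article_text: str, variant: str = "expert") -> str:
    """Generate a Finnish summary using one of the three prompt variants."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="o3-mini",  # model name as referenced in the abstract
        messages=[
            {"role": "system", "content": PROMPTS[variant]},
            {"role": "user", "content": article_text},
        ],
    )
    return response.choices[0].message.content
```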

These findings suggest that multilingual large language models, when guided with carefully designed prompts, can outperform smaller domain-specific models in Finnish summarization tasks. The study underscores the value of prompt engineering as a cost-effective strategy to enhance the performance of general-purpose models without additional fine-tuning.
Collections
  • Diplomityöt ja Pro gradu -tutkielmat [14083]