Improving AI-based ad creative checker with prompt logic and Semantic Kernel
Rais, Zeeshan (2025)
Master's thesis
School of Engineering Science, Computer Science
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe20251118109136
Abstract
This thesis studied how ad creative validation could be improved by modifying the validation pipeline. The system ran as an Azure Function, and two variants were compared: the older version called the model through a plain REST client, while the new one added Semantic Kernel, retrieval-augmented generation (RAG), and fallback rules.
The evaluation used two datasets processed by both variants, containing 160 and 463 creatives. With RAG enabled, accuracy improved from 86.2% to 92.6% on the 160-creative set and from 85.8% to 90.1% on the 463-creative set, a consistent gain across both dataset sizes. Overall errors also fell: both false positives and false negatives decreased. A few borderline cases remained due to retrieval near-misses, but the net effect was positive for these datasets.
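The reported figures follow from a standard confusion matrix: accuracy is the fraction of creatives where the verdict matched the ground truth. The counts below are purely illustrative placeholders for a 463-creative run, not the thesis data.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of creatives where the verdict matched the ground truth."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (illustrative only, not the thesis data).
baseline = dict(tp=250, tn=147, fp=33, fn=33)  # REST-client variant
with_rag = dict(tp=262, tn=155, fp=24, fn=22)  # RAG-enabled variant

print(f"baseline accuracy: {accuracy(**baseline):.3f}")
print(f"RAG accuracy:      {accuracy(**with_rag):.3f}")
```

Note that an accuracy gain alone does not show where the improvement came from; separating the change in false positives from the change in false negatives, as the thesis does, is what reveals the precision–recall trade-off.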
RAG was one of the key components. In many cases it helped by supplying extra context and improving advertiser and call-to-action (CTA) detection, but when the reference data was unreliable it introduced errors and increased false negatives. Dataset size and the imbalance between common and rare categories also influenced the results, so the figures should not yet be treated as final.
The study shows that AI-based validation can be improved, but the improvements also introduce new risks: every change in the pipeline shifts the error balance differently. Future work should test with larger and more balanced datasets, tune the precision–recall trade-off to the use case, and add explainability so decisions are easier to trust.
