
Quality of dataset for data-driven software vulnerability detection

Sah, Dharmendra (2024)

View/Open
Master_thesis__Sah_Dharmendra.pdf (1.567Mb)


Master's thesis


School of Engineering Science, Information Technology

All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe202501021060

Abstract

As modern software grows more intricate, the likelihood of security flaws also rises, which calls for automated methods of identification. To identify vulnerable Python code, this thesis proposes and assesses VUDENC, a deep learning model built on word2vec and LSTM. The work addresses limited data availability, noisy labels, and dataset imbalance by systematically filtering samples and removing label noise. The methodology is grounded in a structured literature review, which stresses that high-quality labeled datasets are a decisive factor in machine learning-based software vulnerability detection. The data was mined from Python repositories on GitHub and preprocessed to obtain a large dataset for model training and evaluation. The VUDENC model targets critical vulnerability classes, namely SQL injection and cross-site scripting, with the aim of automated and scalable integration into the software development life cycle (SDLC). Key findings show that the model performs effectively in terms of accuracy and precision when identifying code vulnerabilities. Integrating color-coded output into the model greatly improved usability for developers, making vulnerabilities considerably easier to find and fix. This work advances automated vulnerability detection by bringing machine learning techniques into practical, real-world applications. The results show great promise for integrating such tools into CI/CD pipelines as a means of proactive security. Future work includes extending the dataset to more programming languages and exploring advanced preprocessing techniques to make models more robust and generalizable across a broader set of coding environments.
Collections
  • Master's theses (Diplomityöt ja Pro gradu -tutkielmat) [15256]
LUT University
P.O. Box 20
53851 Lappeenranta
Contact | Privacy | Accessibility statement
 

 
