Time series classification
Omerovic, Ajdin (2022)
Master's thesis
School of Engineering Science, Chemical Engineering
All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2022060141915
Abstract
Time series classification is a subfield of machine learning with numerous real-world applications. Standard machine learning algorithms are typically not well suited to raw time series because the input data is temporally structured. Air pollution, with PM 2.5 as the main pollutant, is a major environmental challenge in urban areas. Software tools can be built to analyse and predict PM 2.5 concentration and to examine its relation to important meteorological factors. Machine learning (ML) techniques have considerably improved the analysis and classification of pollution and air quality data. Existing research has focused on different ML tools, algorithms, and classification approaches.
This work emphasizes the usefulness of ML algorithms for classifying PM 2.5 concentration from real-time data. PM 2.5 pollution is classified using the two most promising classification algorithms, and a comparative analysis between them is presented.
The work also includes a review of the existing literature that motivated it and a brief theoretical background for the methods used.
The dataset was obtained from the official website of the Indian government and contains 48,192 samples of PM 2.5 concentration measured hourly over a span of five years. The data is grouped into daily time series of 24 samples each. Outliers were found in the original dataset, so it was cleaned, which reduced it to 43,392 samples, or 1,808 time series. The optimal number of classes was determined by clustering the cleaned dataset, after which classification was performed.
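The grouping and cleaning steps described above can be sketched roughly as follows. This is a minimal illustration and not the thesis code: the function names, the toy data, and the `upper_limit` outlier rule are assumptions, since the abstract does not state how outliers were detected. The reported counts are at least consistent with whole days being dropped (4,800 samples, or 200 days of 24 samples).

```python
SAMPLES_PER_DAY = 24  # hourly measurements, one time series per day


def to_daily_series(hourly, samples_per_day=SAMPLES_PER_DAY):
    """Split a flat list of hourly values into consecutive daily series."""
    if len(hourly) % samples_per_day != 0:
        raise ValueError("number of samples is not a whole number of days")
    return [
        hourly[i:i + samples_per_day]
        for i in range(0, len(hourly), samples_per_day)
    ]


def drop_days_with_outliers(days, upper_limit):
    """Keep only days whose values all lie in [0, upper_limit].

    The threshold rule is a hypothetical stand-in for the cleaning step.
    """
    return [day for day in days if all(0 <= v <= upper_limit for v in day)]


# Toy data: three days of constant readings with one implausible spike.
hourly = [40.0] * 72
hourly[30] = 5000.0  # falls in the second day

days = to_daily_series(hourly)
clean = drop_days_with_outliers(days, upper_limit=1000.0)
print(len(days), len(clean))  # 3 2

# Arithmetic check on the counts reported in the abstract.
print(48192 // 24, 43392 // 24)  # 2008 1808
```

Dropping a whole day when any of its samples is an outlier keeps every remaining series at a fixed length of 24, which is what a fixed-input classifier expects.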
The ML algorithms used are the Support Vector Machine (SVM), one of the best-performing ML classification algorithms, and the Feedforward Neural Network, an intriguing newer option that has rarely been examined in existing studies.
The results show that the Support Vector Machine outperformed the Feedforward Neural Network, reaching an accuracy of 99.5%. The SVM model also maintained the same accuracy across different values of its tuning parameters, which indicates its reliability.
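A comparison of this kind could be set up along the following lines. This is only a sketch using scikit-learn on synthetic 24-feature data standing in for the daily series. The number of classes, the parameter grid, the network size, and the helper `holdout_accuracy` are all assumptions, not the settings used in the thesis, and the scores it produces say nothing about the reported 99.5%.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 600 "days" with 24 features each, 3 classes (assumed).
X, y = make_classification(
    n_samples=600, n_features=24, n_informative=8,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)


def holdout_accuracy(model):
    """Fit a scaled pipeline on the training split, score on the test split."""
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# SVM accuracy over a small, illustrative grid of tuning parameters.
svm_scores = {
    (C, gamma): holdout_accuracy(SVC(kernel="rbf", C=C, gamma=gamma))
    for C in (0.1, 1.0, 10.0)
    for gamma in ("scale", 0.01)
}

# Feedforward network with a single hidden layer (size chosen arbitrarily).
fnn_score = holdout_accuracy(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
)

for params, score in svm_scores.items():
    print("SVM", params, round(score, 3))
print("FNN", round(fnn_score, 3))
```

Looking at how much the SVM scores vary across the grid is one simple way to check the kind of parameter stability the abstract describes.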