Novel fuzzy k-nearest neighbor methods for effective classification and regression
Mailagaha Kumbure, Mahinda (2022-11-10)
Doctoral dissertation
Lappeenranta-Lahti University of Technology LUT
Acta Universitatis Lappeenrantaensis
School of Business and Management
School of Business and Management, Business Studies
All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-335-869-0
Abstract
With the advancement of technology in many areas, an immense amount of data has become available, and discovering patterns and trends in this data is a core interest of machine learning research. Machine learning, a form of artificial intelligence, provides a robust set of algorithms that iteratively learn from data to understand and analyze it as well as to predict future outcomes. The focus of this dissertation is on supervised machine learning techniques, namely classification and regression. In particular, the emphasis is on the fuzzy k-nearest neighbor (FKNN) algorithm, which has received substantial attention in classification problems due to its efficacy and flexibility.
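To make the underlying idea concrete, the following is a minimal sketch of the classical FKNN rule (Keller et al., 1985), which the dissertation builds on: each of the k nearest neighbors contributes a fuzzy membership weighted by its inverse distance raised to 2/(m-1). This is a simplified illustration with crisp neighbor memberships, not the dissertation's implementation; the function name and the small epsilon guard are assumptions for the sketch.

```python
import math
from collections import defaultdict

def fknn_predict(train_X, train_y, x, k=3, m=2.0):
    """Classic FKNN sketch: fuzzy class memberships for a test point x,
    built from its k nearest neighbors weighted by inverse distance
    raised to 2/(m-1). Neighbors carry crisp class memberships here."""
    # k nearest training points by Euclidean distance
    neighbors = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    memberships = defaultdict(float)
    total = 0.0
    for d, label in neighbors:
        # epsilon guards against a zero distance (exact match)
        w = 1.0 / (d ** (2.0 / (m - 1)) + 1e-12)
        memberships[label] += w
        total += w
    # normalize so memberships sum to 1, then pick the largest
    fuzzy = {c: v / total for c, v in memberships.items()}
    return max(fuzzy, key=fuzzy.get), fuzzy
```

The returned membership vector, not just the hard label, is what makes the rule "fuzzy": it conveys how confidently the point belongs to each class.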
In the context of classification, learning from data can be challenging for many algorithms due to uncertainties and inconsistencies in the data. A typical issue in most classification problems is that the class distributions in the data are imbalanced, meaning that the data points do not represent the classes of the class variable equally, which can significantly affect classification performance. Alongside class imbalance, some degree of class overlap, class noise, and outliers may also degrade a classifier's performance. Given these issues, research has continued to make classification algorithms, particularly nearest neighbor-based methods, more accurate and more capable. This remains a great challenge, however, because the performance and efficiency of learning algorithms rely heavily on the correct choice of model features and on data that is often affected by these issues. In this context, this research seeks to develop solution techniques based on the FKNN algorithm, particularly for class imbalance problems.
The multi-local power mean fuzzy k-nearest neighbor (MLPM-FKNN) classifier, which uses class-prototype local mean vectors instead of individual instances for creating memberships, is the first approach presented in this dissertation. It is demonstrated that the proposed MLPM-FKNN classifier achieves better classification results than the classical methods on real-world data sets, often with high values of k (the number of nearest neighbors chosen). In addition, the MLPM-FKNN classifier, combined with feature selection, is applied to create a hybrid feature selection model to forecast the intraday return of the S&P index. Furthermore, this work connects feature selection and prediction (performed by classification) so that the feature selection can produce a significant impact with the help of MLPM-FKNN classification. The second approach proposed is the Bonferroni mean-based fuzzy k-nearest neighbor (BM-FKNN) classifier, an extension of the MLPM-FKNN method that uses the Bonferroni mean instead of the power mean. The findings with one artificial and six real-world data sets underline the capability and effectiveness of this method in solving class imbalance problems compared to the original and several other competitive classifiers. The final contribution of this dissertation is a novel regression approach called the Minkowski distance-based fuzzy k-nearest neighbor regression (Md-FKNNreg) method. It is motivated by the fact that the ability of the FKNN method in regression settings had not previously been investigated, although the method has gained broad attention in the classification context. The principal advantage of this algorithm is that it assigns importance to the nearest neighbors using fuzzy weights based on their distances to the test instance, and hence makes more accurate predictions through a weighted average.
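The local-mean idea behind MLPM-FKNN can be sketched as follows: for each class, prototypes are formed as power means of that class's nearest neighbors of the test point, and memberships are then computed from distances to these prototypes rather than to individual points. This is a hedged, simplified illustration of the concept as described above; the exact prototype construction, function names, and the epsilon guard are assumptions, and the full method in the dissertation differs in detail.

```python
import math

def power_mean(vectors, p=1.0):
    """Elementwise power mean of equal-length vectors (p=1 gives the
    arithmetic mean)."""
    n = len(vectors)
    return tuple(
        (sum(v[j] ** p for v in vectors) / n) ** (1.0 / p)
        for j in range(len(vectors[0]))
    )

def mlpm_fknn_predict(train_X, train_y, x, k=3, p=1.0, m=2.0):
    """Sketch of the MLPM-FKNN idea: class memberships from distances
    to class-wise local power-mean prototypes (built from each class's
    k nearest neighbors of x) instead of from individual neighbors."""
    scores = {}
    for c in set(train_y):
        # k nearest neighbors of x within class c
        nbrs = sorted(
            (math.dist(x, xi), xi)
            for xi, yi in zip(train_X, train_y) if yi == c
        )[:k]
        # multi-local prototypes: power mean of the first i neighbors
        protos = [power_mean([xi for _, xi in nbrs[:i]], p)
                  for i in range(1, len(nbrs) + 1)]
        # fuzzy membership weight from distances to the prototypes
        scores[c] = sum(
            1.0 / (math.dist(x, pr) ** (2.0 / (m - 1)) + 1e-12)
            for pr in protos
        )
    return max(scores, key=scores.get)
```

Using class-wise prototypes rather than raw neighbors is what gives the method its robustness when a majority class would otherwise dominate the neighborhood of a minority-class point.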
Experimental results on real-world data show that Md-FKNNreg outperformed the benchmark models, highlighting its potential for both linear and non-linear regression problems.
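The regression variant described above can be sketched in a few lines: the prediction for a test point is the fuzzy-weighted average of its k nearest neighbors' targets, with weights derived from the Minkowski distance. This is a simplified illustration of the stated idea, not the dissertation's implementation; the function names and the epsilon guard are assumptions.

```python
import math

def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two vectors
    (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

def md_fknn_reg(train_X, train_y, x, k=3, p=2.0, m=2.0):
    """Sketch of Md-FKNNreg: predict the target of x as a weighted
    average of its k nearest neighbors' targets, with fuzzy weights
    proportional to the inverse Minkowski distance raised to 2/(m-1)."""
    neighbors = sorted(
        (minkowski(x, xi, p), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    # closer neighbors receive larger fuzzy weights
    weights = [1.0 / (d ** (2.0 / (m - 1)) + 1e-12) for d, _ in neighbors]
    return sum(w * yi for w, (_, yi) in zip(weights, neighbors)) / sum(weights)
```

Because the weights decay smoothly with distance, a nearby neighbor dominates the average, while distant neighbors contribute little, which is the mechanism behind the improved prediction accuracy claimed above.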
Collections
- Doctoral dissertations [1093]