Similarity based classification methods with different aggregation operators
Kurama, Onesfole (2017-12-14)
Doctoral dissertation
Kurama, Onesfole
14.12.2017
Lappeenranta University of Technology
Acta Universitatis Lappeenrantaensis
The permanent address of this publication is
https://urn.fi/URN:ISBN:978-952-335-164-6
Abstract
Modern digitalization has created a need for efficient classification methods that can deal with both large and small data sets. Classification is the central part of many pattern recognition and machine vision systems across a wide range of applications. In medical diagnosis and epidemiology-related research, classification is a crucial aspect: for example, diseases must be recognized and diagnosed before treatment can start. In such systems, the accuracy of the classifiers is essential, since even small improvements in accuracy can save human lives. Classifiers are also useful in quality control models, where faulty products must be recognized and removed from the assembly line; misclassifications in this area may cause heavy losses to the industries concerned. Currently, classifiers are also used in making on-line classification decisions, which normally require results within a short time interval.
The main objective of this thesis is to examine several different aggregation operators in the similarity classifier and how they affect classification accuracy. The suitability of the different aggregation methods is examined with several real-world data sets. Particular tasks include designing new algorithms, generalizing existing ones and implementing several new methods on real-world tasks. In some classification tasks, the attribute space is small: there are fewer measurements, yet decisions are still required. In other cases, the data sets are large and have many attributes, which may cause several challenges during classification. In large data sets, some of the attributes may be irrelevant and need to be neglected or even removed, so finding the relevant attributes is crucial. Large data sets bring other problems as well: computational time may increase, and tasks may require relatively more memory for storage and implementation. Reducing the attribute space is advantageous, since it can reduce the computational time of an algorithm and may also reduce the memory required.
In practical problems with real-world data sets, data analysis is always burdened with uncertainty, vagueness and imprecision. This calls for classification methods that can handle such data. In this research, classifiers from fuzzy set theory are examined further and emphasized.
The data sets used in this thesis were taken from the UCI machine learning data repository, where they are freely provided for research purposes. The main tool used in developing the new methods is MATLAB™, with all implementation code written in this software. The final outputs of this thesis are new similarity classifiers applying generalized aggregation processes. These include the similarity classifier with the generalized ordered weighted averaging (GOWA) operator, one with the weighted ordered weighted averaging (WOWA) operator, one with the Bonferroni mean variants, and the similarity classifier with an n-ary λ-averaging operator.
Results from these new classifiers show improvements in accuracy over existing similarity classifiers on some of the data sets used. However, it was observed that no single method is suitable for all classification tasks. Each method has its particular strengths over the other methods, depending on the data set.
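To illustrate what aggregating similarities with a GOWA operator can look like, the following Python sketch may help. It is not the thesis code (which was written in MATLAB) and not the author's exact formulation. It uses the standard GOWA definition, (sum_j w_j * b_j^lam)^(1/lam) with b_j the j-th largest argument, and makes several labeled assumptions: attributes are scaled to [0, 1], the per-attribute similarity is 1 - |x_i - v_i|, each class is represented by one hypothetical "ideal vector", and only lam > 0 is handled. The names gowa, classify, ideal_vectors and lam are illustrative.

```python
def gowa(values, weights, lam=1.0):
    """Generalized OWA: (sum_j w_j * b_j**lam) ** (1/lam).

    b_j is the j-th largest element of `values`; the weights must be
    non-negative and sum to 1. With lam = 1 this reduces to plain OWA.
    Assumption: values are non-negative and lam > 0.
    """
    if lam <= 0:
        raise ValueError("this sketch only handles lam > 0")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # reorder arguments, largest first
    total = sum(w * b ** lam for w, b in zip(weights, ordered))
    return total ** (1.0 / lam)


def classify(sample, ideal_vectors, weights, lam=1.0):
    """Return (best_label, scores) for one sample.

    Assumption: `ideal_vectors` maps each class label to one
    representative vector, and all attributes lie in [0, 1].
    """
    scores = {}
    for label, ideal in ideal_vectors.items():
        # Per-attribute similarity (assumed form): 1 - |x_i - v_i|.
        sims = [1.0 - abs(x - v) for x, v in zip(sample, ideal)]
        scores[label] = gowa(sims, weights, lam)
    best = max(scores, key=scores.get)  # class with the highest similarity
    return best, scores


if __name__ == "__main__":
    ideals = {"a": [0.1, 0.2], "b": [0.9, 0.8]}  # made-up ideal vectors
    label, scores = classify([0.2, 0.1], ideals, weights=[0.5, 0.5], lam=2.0)
    print(label, scores)
```

In this sketch the weight vector controls how much the largest or smallest per-attribute similarities count: putting all weight on the first position returns the maximum similarity, and putting it on the last position returns the minimum.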
Collections
- Doctoral dissertations [1064]