## Differential evolution based classification with pool of distances and aggregation operators

##### Koloseni, David (2015-05-15)

Väitöskirja

Koloseni, David

15.05.2015

Lappeenranta University of Technology

Acta Universitatis Lappeenrantaensis

**Julkaisun pysyvä osoite on**

http://urn.fi/URN:ISBN:978-952-265-783-1

#### Tiivistelmä

The objective of this thesis is to develop and generalize further the differential evolution based data classification method. For many years, evolutionary algorithms have been successfully applied to many classification tasks. Evolution algorithms are population based, stochastic search algorithms that mimic natural selection and genetics. Differential evolution is an evolutionary algorithm that has gained popularity because of its simplicity and good observed performance. In this thesis a differential evolution classifier with pool of distances is proposed, demonstrated and initially evaluated.

The differential evolution classifier is a nearest prototype vector based classifier that applies a global optimization algorithm, differential evolution, to determine the optimal values for all free parameters of the classifier model during the training phase of the classifier.

The differential evolution classifier applies the individually optimized distance measure for each new data set to be classified is generalized to cover a pool of distances. Instead of optimizing a single distance measure for the given data set, the selection of the optimal distance measure from a predefined pool of alternative measures is attempted systematically and automatically. Furthermore, instead of only selecting the optimal distance measure from a set of alternatives, an attempt is made to optimize the values of the possible control parameters related with the selected distance measure. Specifically, a pool of alternative distance measures is first created and then the differential evolution algorithm is applied to select the optimal distance measure that yields the highest classification accuracy with the current data. After determining the optimal distance measures for the given data set together with their optimal parameters, all determined distance measures are aggregated to form a single total distance measure. The total distance measure is applied to the final classification decisions.

The actual classification process is still based on the nearest prototype vector principle; a sample belongs to the class represented by the nearest prototype vector when measured with the optimized total distance measure. During the training process the differential evolution algorithm determines the optimal class vectors, selects optimal distance metrics, and determines the optimal values for the free parameters of each selected distance measure.

The results obtained with the above method confirm that the choice of distance measure is one of the most crucial factors for obtaining higher classification accuracy. The results also demonstrate that it is possible to build a classifier that is able to select the optimal distance measure for the given data set automatically and systematically. After finding optimal distance measures together with optimal parameters from the particular distance measure results are then aggregated to form a total distance, which will be used to form the deviation between the class vectors and samples and thus classify the samples.

This thesis also discusses two types of aggregation operators, namely, ordered weighted averaging (OWA) based multi-distances and generalized ordered weighted averaging (GOWA). These aggregation operators were applied in this work to the aggregation of the normalized distance values. The results demonstrate that a proper combination of aggregation operator and weight generation scheme play an important role in obtaining good classification accuracy.

The main outcomes of the work are the six new generalized versions of previous method called differential evolution classifier. All these DE classifier demonstrated good results in the classification tasks.

The differential evolution classifier is a nearest prototype vector based classifier that applies a global optimization algorithm, differential evolution, to determine the optimal values for all free parameters of the classifier model during the training phase of the classifier.

The differential evolution classifier applies the individually optimized distance measure for each new data set to be classified is generalized to cover a pool of distances. Instead of optimizing a single distance measure for the given data set, the selection of the optimal distance measure from a predefined pool of alternative measures is attempted systematically and automatically. Furthermore, instead of only selecting the optimal distance measure from a set of alternatives, an attempt is made to optimize the values of the possible control parameters related with the selected distance measure. Specifically, a pool of alternative distance measures is first created and then the differential evolution algorithm is applied to select the optimal distance measure that yields the highest classification accuracy with the current data. After determining the optimal distance measures for the given data set together with their optimal parameters, all determined distance measures are aggregated to form a single total distance measure. The total distance measure is applied to the final classification decisions.

The actual classification process is still based on the nearest prototype vector principle; a sample belongs to the class represented by the nearest prototype vector when measured with the optimized total distance measure. During the training process the differential evolution algorithm determines the optimal class vectors, selects optimal distance metrics, and determines the optimal values for the free parameters of each selected distance measure.

The results obtained with the above method confirm that the choice of distance measure is one of the most crucial factors for obtaining higher classification accuracy. The results also demonstrate that it is possible to build a classifier that is able to select the optimal distance measure for the given data set automatically and systematically. After finding optimal distance measures together with optimal parameters from the particular distance measure results are then aggregated to form a total distance, which will be used to form the deviation between the class vectors and samples and thus classify the samples.

This thesis also discusses two types of aggregation operators, namely, ordered weighted averaging (OWA) based multi-distances and generalized ordered weighted averaging (GOWA). These aggregation operators were applied in this work to the aggregation of the normalized distance values. The results demonstrate that a proper combination of aggregation operator and weight generation scheme play an important role in obtaining good classification accuracy.

The main outcomes of the work are the six new generalized versions of previous method called differential evolution classifier. All these DE classifier demonstrated good results in the classification tasks.

##### Kokoelmat

- Väitöskirjat [711]