Forest machinery productivity study with data mining
Haapalainen, Juho (2020)
Master's thesis
School of Engineering Science, Industrial Engineering and Management
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2020120499517
Abstract
In this thesis, multidimensional sensor data from Ponsse Oy’s harvesters were analyzed with data mining in order to 1) study the factors affecting harvesting productivity and 2) identify the work stages of a harvester. The data consisted of 9.6 million time-series observations collected from 58 sensors at 0.02-second intervals, corresponding to over 53 hours of harvesting work, during which more than 2,600 trees were felled.
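The quoted duration follows from the observation count and the sampling interval. A quick check, assuming each of the 9.6 million observations is one gap-free 0.02-second sample across all sensors (the abstract does not state this explicitly):

```python
# Sanity check of the recording duration quoted in the abstract.
# Assumption: the observations are evenly spaced, gap-free samples.
n_observations = 9_600_000
sampling_interval_s = 0.02

total_seconds = n_observations * sampling_interval_s
total_hours = total_seconds / 3600

print(f"{total_hours:.1f} hours of recorded work")
```

Under that assumption the result is about 53.3 hours, consistent with the "over 53 hours" stated above.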
Using the Python programming language, a comprehensive data preprocessing and feature extraction algorithm was developed for these data. The algorithm took the raw CSV files and used the sensor information on harvester motions to identify five work stages (felling, processing, moving, delays and other activities) from the time-series data. Simultaneously, by extracting a set of 17 explanatory variables, it gradually built a data frame in which each row corresponded to the temporal sequence during which an individual tree was felled and processed (including possible movement from the previous tree). To determine the most important factors affecting harvesting productivity, regression analysis was then conducted on this preprocessed dataset. First, after automated feature selection with backward elimination, OLS multiple regression was fitted with both standardized (μ = 0 and σ² = 1) and Box-Cox-transformed values. R-squared values of 0.74 and 0.84, respectively, were obtained for these two models, and their validity was studied with selected statistical tests, including the Koenker, Durbin–Watson and Jarque–Bera tests. In addition, Lasso regression was fitted, with the penalty parameter λ optimized by grid-search cross-validation, yielding an R-squared value of 0.77.
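The following is a minimal sketch of the kind of preprocessing described above, not the thesis's actual algorithm. The four signals in `Sample`, the rules in `classify`, and the cycle-splitting rule in `build_tree_rows` are all hypothetical simplifications chosen for illustration; the real data has 58 sensor channels and the thesis extracts 17 explanatory variables per tree.

```python
from itertools import groupby
from typing import NamedTuple

SAMPLE_INTERVAL_S = 0.02  # sampling interval stated in the abstract


class Sample(NamedTuple):
    # Hypothetical, simplified signals (not the real sensor names).
    felling_cut: bool   # felling cut in progress
    feeding: bool       # stem being delimbed or cross-cut
    crane_moving: bool  # crane in motion without a stem
    speed_m_s: float    # machine driving speed


def classify(s: Sample) -> str:
    """Map one sample to a work stage using illustrative rules only."""
    if s.felling_cut:
        return "felling"
    if s.feeding:
        return "processing"
    if s.speed_m_s > 0.1:
        return "moving"
    if s.crane_moving:
        return "other"
    return "delay"


def segment(samples):
    """Collapse consecutive samples of the same stage into (stage, seconds)."""
    labels = (classify(s) for s in samples)
    return [(stage, sum(1 for _ in run) * SAMPLE_INTERVAL_S)
            for stage, run in groupby(labels)]


def build_tree_rows(segments):
    """Build one row of stage durations per tree.

    Assumed rule: a tree is complete when a new felling starts after
    processing has been seen; time in between goes to the next tree.
    """
    rows, current, pending = [], {}, {}
    for stage, seconds in segments:
        if stage == "felling" and "processing" in current:
            rows.append(current)            # previous tree is complete
            current, pending = pending, {}  # e.g. moving to the next tree
        elif stage == "processing" and pending:
            # Processing resumed, so the pause belonged to the same tree.
            for name, t in pending.items():
                current[name] = current.get(name, 0.0) + t
            pending = {}
        is_work = stage in ("felling", "processing")
        target = current if is_work or "processing" not in current else pending
        target[stage] = target.get(stage, 0.0) + seconds
    if "processing" in current:
        rows.append(current)  # trailing time after the last tree is dropped
    return rows


# Tiny synthetic recording: two trees, with a short pause during the first.
moving = Sample(False, False, False, 0.8)
felling = Sample(True, False, False, 0.0)
processing = Sample(False, True, False, 0.0)
delay = Sample(False, False, False, 0.0)
samples = ([moving] * 100 + [felling] * 50 + [processing] * 200 + [delay] * 25
           + [processing] * 100 + [moving] * 150 + [felling] * 50
           + [processing] * 250)

rows = build_tree_rows(segment(samples))
for row in rows:
    print({stage: round(seconds, 2) for stage, seconds in row.items()})
```

Each resulting row holds the time spent in each stage for one tree, from which per-tree features such as the temporal shares of felling and processing could be derived.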
As a result of this thesis, eight factors affecting harvesting productivity were identified: the diameter of the felled tree, the temporal shares of felling and processing (i.e. delimbing and cross-cutting) of the total work time, average fuel consumption, tree species, inter-tree distance, crane movement complexity and the moving average of the harvesting productivity. By far the most important factor was the tree diameter (standardized coefficients from 0.73 to 0.77), whereas the other seven factors had coefficients from 0.05 to 0.23. Factors that did not seem to affect productivity include, for instance, altitude changes, the driving speed between trees and the time elapsed since the current felling work started.
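Standardized coefficients make predictors measured in different units comparable, which is what allows the diameter effect to be ranked against the others. The sketch below illustrates the idea with NumPy on purely synthetic data; the variable names, effect sizes and noise level are arbitrary assumptions and do not reproduce the thesis's data or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for two predictors on very different scales.
diameter_mm = rng.normal(250, 60, n)
distance_m = rng.normal(5, 2, n)
# Arbitrary illustrative relationship plus noise (not from the thesis).
productivity = 0.004 * diameter_mm - 0.03 * distance_m + rng.normal(0, 0.1, n)


def standardize(a):
    """Rescale each column to mean 0 and variance 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


X = standardize(np.column_stack([diameter_mm, distance_m]))
y = standardize(productivity)

# No intercept column is needed because all variables are centred.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared of the fit; y has mean 0, so the total sum of squares is sum(y^2).
r_squared = 1 - np.sum((y - X @ beta) ** 2) / np.sum(y ** 2)

print("standardized coefficients:", np.round(beta, 2))
print("R-squared:", round(float(r_squared), 2))
```

Because every variable has unit variance, each coefficient expresses the change in productivity, in standard deviations, per one standard deviation change in the predictor, so the absolute values can be compared directly.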