Minimal Spanning Tree Algorithm Applied to the Implementation of an Efficient Isolation Forest for Anomaly Detection
Material type: Article
Publication details: Lublin: Lublin University of Technology Publishing House, 2026
Description: 1 electronic resource (201 p.)
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 978-83-7947-665-7
Open Access: Unrestricted online access
This study introduces innovative anomaly detection algorithms based on the Isolation Forest (IF) method. It begins with an overview of the anomaly detection problem, including a review of general methods used in this field. The IF method is then discussed in detail, along with key concepts necessary for implementing the novel techniques. Studies on modifications of the IF method are reviewed, with particular emphasis on publications that improve or extend the basic IF algorithm. The study introduces five new anomaly detection algorithms. Two of them are attribute reduction methods used in data preprocessing, based on the clustering techniques k-Means and Fuzzy C-Means. A third method improves the selection of the attribute value at isolation nodes, using an optimized clustering algorithm based on the minimal spanning tree. The remaining two methods introduce new ways of constructing isolation trees, in which elements are isolated by merging them with the minimal spanning tree algorithm. The resulting isolation trees are characterized by two components of anomaly assessment: one associated with the depth of the element in the isolation tree and the other with the distance of the element from the nearest leaf node. The first of these two methods uses an assessment function that sums the two components. The second integrates the normalized components using a fuzzy rule block in the Takagi-Sugeno inference model. A comprehensive series of experiments is conducted to evaluate the newly proposed approaches and compare them with existing competitive techniques. The experiments involve 26 real-world datasets.
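As background for the minimal spanning tree on which several of the methods rely, the following is a minimal sketch of Prim's algorithm over a set of points. It is a generic textbook construction, not the book's isolation tree procedure; the function name `minimal_spanning_tree` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) edges, where i and j index `points`.
    Hypothetical helper for illustration; runs in O(n^2) time.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each point
    best_from = [-1] * n        # tree vertex that provides that link
    best_dist[0] = 0.0          # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the cheapest point that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the links from the newly added point to the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges
```

For example, `minimal_spanning_tree([(0, 0), (1, 0), (5, 0)])` links the two nearby points first and attaches the distant point through a single longer edge.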
Measures of classification quality that are independent of the detection threshold are computed, such as the area under the receiver operating characteristic curve and the area under the precision-recall curve. Additionally, optimal detection threshold values for the methods are determined, and measures dependent on this threshold are calculated, namely accuracy, precision, recall, specificity, false alarm ratio, and F1 measure. The results of these studies unequivocally confirm that the newly introduced solutions are highly effective. Other characteristics are also analyzed, including the response times of the algorithms in the training and evaluation phases, and a thorough analysis of hyperparameters is conducted. Adjusting the hyperparameters of the methods made it possible to identify ways of tuning the algorithms for specific tasks. To demonstrate the effectiveness of the new approaches in separating anomalous samples from normal ones, a graphical representation of the separability is presented, using normalized values of the assessment function of individual algorithms. To visualize the characteristics of the newly developed algorithms, four artificially generated, two-dimensional anomaly detection datasets are prepared. Heatmaps reflecting the assessment function values are developed, both for individual isolation trees and for complex forests consisting of one hundred trees. In addition, a graphical representation of the anomaly detection process is presented, using the optimally determined detection threshold. The presented results again confirm the very good detection properties of the newly introduced methods. In the final part of the study, the results of the experiments and analyses are synthesized, key conclusions are formulated, and prospects for future research are outlined.
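The threshold-dependent measures named above can be illustrated with a short sketch that derives them from confusion-matrix counts. This is an illustration under stated assumptions, not the book's evaluation code: the function name `threshold_metrics` is hypothetical, a sample is flagged as anomalous when its score is greater than or equal to the threshold, and the false alarm ratio is assumed to mean FP / (FP + TN), which may differ from the definition used in the book.

```python
def threshold_metrics(y_true, scores, threshold):
    """Compute threshold-dependent quality measures for anomaly detection.

    y_true: 1 for anomalous samples, 0 for normal ones.
    scores: anomaly scores, where higher means more anomalous (assumption).
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        # Assumed definition: share of normal samples wrongly flagged.
        "false_alarm_ratio": ratio(fp, fp + tn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

Under this assumed definition the false alarm ratio equals one minus the specificity, so the two values carry the same information.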
The directions for further development of the proposed techniques are inspired both by observation of the experimental results and by a deep understanding of the mechanisms underlying the discussed algorithms.
Creative Commons Licence: CC BY-NC-ND 4.0 https://creativecommons.org/licenses/by-nc-nd/4.0/
Language: English
Freely available e-book