# Algorithm::Classifier::IsolationForest

Isolation Forest (Liu, Ting & Zhou, 2008) detects anomalies by random partitioning rather than by modelling normal points. Each tree repeatedly splits the data at random, and points that are isolated after only a few splits are likely anomalies. The score is computed from the average isolation depth across many trees, normalised so that values approach 1 for anomalies and stay below 0.5 for normal points.

In extended mode the module implements the Extended Isolation Forest variant. Each split is a random hyperplane instead of an axis-aligned cut, which removes the rectangular, axis-aligned bias in the score field and tends to help on elongated or multi-modal data.

```perl
use Algorithm::Classifier::IsolationForest;

my @data = ([0.1, -0.2], [0.0, 0.1], [5.0, 6.0], ...);

# Classic, axis-parallel Isolation Forest
my $iforest = Algorithm::Classifier::IsolationForest->new(
    n_trees     => 100,
    sample_size => 256,
    seed        => 42,
);

$iforest->fit(\@data);

my $scores = $iforest->score_samples(\@data);   # arrayref, each in (0,1]
my $flags  = $iforest->predict(\@data, 0.6);    # arrayref of 0/1

# Save and reload
$iforest->save('model.json');
my $reloaded = Algorithm::Classifier::IsolationForest->load('model.json');

# Extended Isolation Forest (oblique hyperplane splits)
my $eif = Algorithm::Classifier::IsolationForest->new(mode => 'extended', seed => 42);
$eif->fit(\@data);
```

# Performance options

A handful of constructor and method-level knobs unlock measurable speedups for specific workloads. All of them are no-ops when the optional Inline::C backend is absent.

## `parallel_fit => N`: fork-based parallel training

Builds the `n_trees` trees across `N` forked workers (Unix-like platforms; no-op elsewhere). Each worker gets a derived RNG seed, so parallel fits are reproducible across runs at a fixed worker count. The trees *differ* from a serial fit with the same seed, however, because the RNG draws happen in a different order. Inference results are unaffected.

```perl
my $f = Algorithm::Classifier::IsolationForest->new(
    n_trees      => 200,
    sample_size  => 256,
    seed         => 42,
    parallel_fit => 4,      # 4 forked workers
)->fit(\@training_data);
```

## `pack_data`: score the same dataset many times faster

`pack_data` returns an opaque wrapper that the scoring methods accept directly, skipping the per-call walk over the arrayref-of-arrayrefs. Use it when the same dataset is scored repeatedly (interactive threshold tuning, dashboards, plotting that updates as parameters change).

```perl
my $packed = $f->pack_data(\@data);

my $scores   = $f->score_samples($packed);
my $flags    = $f->predict($packed, 0.6);
my ($s, $l)  = $f->score_predict_split($packed);   # two flat arrayrefs
```

## `score_predict_split`: get scores + labels without the AV-of-AVs

When you want both anomaly scores and 0/1 labels but don't need them paired together row-by-row, `score_predict_split` returns the two as flat arrayrefs and skips the ~`2 * n_pts` SV allocations that the classic `score_predict_samples` shape requires.

```perl
my ($scores, $labels) = $f->score_predict_split(\@data, 0.6);
```

# Native acceleration (Inline::C, OpenMP, SIMD)

The scoring hot path (`score_samples`, `predict`, `path_lengths`, `score_predict_samples`, `score_predict_split`) is automatically accelerated through [`Inline::C`](https://metacpan.org/pod/Inline::C) when it is installed and a working C compiler is present. On top of that:

* if the toolchain accepts `-fopenmp` and can link against `libgomp`, the per-point tree walk runs in parallel across all available CPU cores using OpenMP;
* on OpenMP 4.0+ compilers the extended-mode oblique dot product is vectorised via `#pragma omp simd`, which is substantially faster for high-feature-count extended models.

Detection happens once at module load and is cached under `_Inline/`. None of these dependencies are required: without them the module falls back to a pure-Perl implementation that produces identical results, just slower.

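To make concrete what the scoring methods return, here is a minimal pure-Perl sketch of the score normalisation defined in the 2008 Isolation Forest paper. It is an illustration under stated assumptions, not this module's source: `c_factor` and `anomaly_score` are hypothetical helper names, the harmonic number is approximated as ln(i) plus the Euler-Mascheroni constant, and the module may treat edge cases differently.

```perl
use strict;
use warnings;

# Euler-Mascheroni constant, used to approximate harmonic numbers.
use constant EULER_GAMMA => 0.5772156649015329;

# Hypothetical helper: expected path length of an unsuccessful binary
# search tree lookup over $n points,
#   c(n) = 2*H(n-1) - 2*(n-1)/n,  with H(i) approximated by ln(i) + gamma.
sub c_factor {
    my ($n) = @_;
    return 0 if $n <= 1;
    return 1 if $n == 2;
    return 2 * (log($n - 1) + EULER_GAMMA) - 2 * ($n - 1) / $n;
}

# Hypothetical helper: anomaly score s = 2 ** (-mean_depth / c(sample_size)).
sub anomaly_score {
    my ($mean_depth, $sample_size) = @_;
    my $c = c_factor($sample_size);
    die "sample_size must be at least 2\n" if $c <= 0;
    return 2 ** (-$mean_depth / $c);
}

# A point isolated after about 3 splits versus one needing about 12,
# both with the sample_size of 256 used in the examples above.
printf "%.3f\n", anomaly_score(3,  256);
printf "%.3f\n", anomaly_score(12, 256);
```

A point whose mean depth equals `c_factor($sample_size)` scores exactly 0.5, and shallower mean depths push the score toward 1. That is why a threshold such as the 0.6 passed to `predict` in the examples flags only points that are isolated noticeably earlier than average.
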
Check which backend is active on your machine:

```shell
iforest accel
```

Sample output on a host with everything wired up:

```
Algorithm::Classifier::IsolationForest acceleration status
Inline::C : available
OpenMP : available
SIMD : available
Active backend: Inline::C with OpenMP + SIMD
```

User code that wants to introspect the active backend can read three package variables:

```perl
$Algorithm::Classifier::IsolationForest::HAS_C        # 0/1
$Algorithm::Classifier::IsolationForest::HAS_OPENMP   # 0/1
$Algorithm::Classifier::IsolationForest::HAS_SIMD     # 0/1
```

# Install

## Source

```shell
perl Makefile.PL
make
make test
make install
```

## FreeBSD

```shell
pkg install p5-App-Cmd p5-File-Slurp p5-App-cpanminus \
    p5-Inline p5-Inline-C gcc
cpanm Algorithm::Classifier::IsolationForest
```

`gcc` ships with `libgomp` and provides the OpenMP runtime; the system clang does not by default. `p5-Inline-C` is what makes the C backend build at module load.

## Debian

```shell
apt-get install libapp-cmd-perl libfile-slurp-perl cpanminus \
    libinline-c-perl gcc
cpanm Algorithm::Classifier::IsolationForest
```

`libinline-c-perl` brings in `libinline-perl`. `gcc` pulls in `libgomp1` (the OpenMP runtime), which is what enables the parallel tree-walk. Both dependencies are optional: leave them out and the module installs and runs in pure-Perl mode.