# Algorithm::Classifier::IsolationForest

Isolation Forest (Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, 2008) detects anomalies by
random partitioning rather than by modelling normal points. Each tree recursively splits a
random subsample of the data on a randomly chosen feature, at a random value between that
feature's minimum and maximum. Anomalies are few and different, so they tend to be isolated
after only a few splits. The score is derived from the average isolation depth (path length)
across many trees and normalised so that values approach 1 for anomalies and fall well below
0.5 for normal points.
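
For reference, the original paper defines the score of a point $x$ against subsamples of $n$ points as follows. Here $h(x)$ is the number of edges from a tree's root to the leaf where $x$ ends up, $E[h(x)]$ is its average over all trees, and $c(n)$ is the average path length of an unsuccessful search in a binary search tree of $n$ nodes, used to normalise $h(x)$. This README does not spell out the module's exact normalisation, so treat the formula as the paper's definition rather than a description of this implementation.

$$
s(x, n) = 2^{-E[h(x)]/c(n)}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
$$

As $E[h(x)] \to 0$ the score tends to 1, when $E[h(x)] = c(n)$ it is exactly 0.5, and as $E[h(x)] \to n-1$ it tends to 0.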

With `mode => 'extended'` the module implements the Extended Isolation Forest variant. Each
split is a random hyperplane instead of an axis-aligned cut, which removes the rectangular,
axis-aligned artifacts in the score map and tends to help on elongated or multi-modal data.
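
A minimal pure-Perl sketch of one such split, following the branching rule from the Extended Isolation Forest paper (a point goes left when (x - p) · n ≤ 0): the normal vector n gets an independent standard-normal draw per coordinate, and the intercept p is drawn uniformly inside the data's bounding box. This is not this module's code; `randn`, `random_hyperplane` and `goes_left` are illustrative names, and only the fully extended case (every coordinate of n non-zero) is shown. Setting all but one coordinate of n to zero recovers an ordinary axis-parallel cut.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Standard normal draw via the Box-Muller transform (core Perl has no gaussian RNG).
sub randn {
    my $u1 = 1 - rand();                  # in (0, 1], avoids log(0)
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos(8 * atan2(1, 1) * $u2);   # 8*atan2(1,1) = 2*pi
}

# One oblique split for the rows in $rows: a random normal vector plus a
# random intercept point inside the bounding box of those rows.
sub random_hyperplane {
    my ($rows) = @_;
    my $dim = @{ $rows->[0] };
    my (@normal, @intercept);
    for my $j (0 .. $dim - 1) {
        my @col = map { $_->[$j] } @$rows;
        my ($lo, $hi) = (min(@col), max(@col));
        push @normal,    randn();
        push @intercept, $lo + rand($hi - $lo);
    }
    return (\@normal, \@intercept);
}

# Branching rule: a point goes to the left child when (x - p) . n <= 0.
sub goes_left {
    my ($x, $normal, $intercept) = @_;
    my $dot = 0;
    $dot += ($x->[$_] - $intercept->[$_]) * $normal->[$_] for 0 .. $#$x;
    return $dot <= 0;
}
```

Because the dot product mixes every feature, the cut can lie at any angle, which is what removes the axis-aligned banding in the score map.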

```perl
use strict;
use warnings;
use Algorithm::Classifier::IsolationForest;

# One arrayref per point, one number per feature.
my @data = ([0.1, -0.2], [0.0, 0.1], [5.0, 6.0]);   # ... more rows

# Classic, axis-parallel Isolation Forest
my $iforest = Algorithm::Classifier::IsolationForest->new(
    n_trees     => 100,
    sample_size => 256,
    seed        => 42,
);
$iforest->fit(\@data);

my $scores = $iforest->score_samples(\@data);  # arrayref, each in (0,1]
my $flags  = $iforest->predict(\@data, 0.6);    # arrayref of 0/1

# Save and reload
$iforest->save('model.json');
my $reloaded = Algorithm::Classifier::IsolationForest->load('model.json');

# Extended Isolation Forest (oblique hyperplane splits)
my $eif = Algorithm::Classifier::IsolationForest->new(
    mode => 'extended',
    seed => 42,
);
$eif->fit(\@data);
```
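
To make the numbers from `score_samples` less of a black box, here is a minimal pure-Perl sketch of the classic algorithm that does not use this module. For each tree it draws a subsample of size `psi`, follows random axis-parallel splits until the query point is isolated or a height limit of ceil(log2 psi) is reached, and converts the mean path length into the paper's `2 ** (-E[h(x)] / c(psi))` score. The helper names (`c_factor`, `path_length`, `anomaly_score`) are hypothetical. Instead of building and storing trees, it simulates only the branch the query point follows, which gives the same path-length distribution but is meant for illustration, not speed.

```perl
use strict;
use warnings;
use List::Util qw(min max shuffle);
use POSIX qw(ceil);

# c(n): average path length of an unsuccessful BST search over n points,
# used to normalise path lengths. H(i) ~ ln(i) + Euler's constant.
sub c_factor {
    my ($n) = @_;
    return 0 if $n <= 1;
    return 1 if $n == 2;
    return 2 * (log($n - 1) + 0.5772156649) - 2 * ($n - 1) / $n;
}

# Follow random axis-parallel splits until $x is isolated or the height
# limit is hit; unresolved leaves get the c(size) correction.
sub path_length {
    my ($x, $rows, $depth, $limit) = @_;
    my $n = @$rows;
    return $depth + c_factor($n) if $n <= 1 || $depth >= $limit;
    my $q   = int rand @$x;                          # random feature index
    my @col = map { $_->[$q] } @$rows;
    my ($lo, $hi) = (min(@col), max(@col));
    return $depth + c_factor($n) if $lo == $hi;      # no split possible
    my $p = $lo + rand($hi - $lo);                   # random split value
    my @side = $x->[$q] < $p
        ? grep { $_->[$q] <  $p } @$rows             # left branch
        : grep { $_->[$q] >= $p } @$rows;            # right branch
    return path_length($x, \@side, $depth + 1, $limit);
}

# Score = 2 ** (-mean path length / c(psi)); psi is the subsample size (>= 2).
sub anomaly_score {
    my ($x, $data, $n_trees, $psi) = @_;
    $psi = @$data if $psi > @$data;
    my $limit = ceil(log($psi) / log(2));
    my $total = 0;
    for (1 .. $n_trees) {
        my @sample = (shuffle @$data)[0 .. $psi - 1];   # without replacement
        $total += path_length($x, \@sample, 0, $limit);
    }
    return 2 ** (-($total / $n_trees) / c_factor($psi));
}
```

On a tight cluster plus one distant point, the distant point should receive a clearly higher score than a point in the middle of the cluster.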

# Performance options

A handful of constructor and method-level options speed up specific
workloads.  All of them are no-ops when the optional Inline::C backend
is absent.

## `parallel_fit => N` — fork-based parallel training

Builds the `n_trees` trees across `N` forked workers (Unix-like platforms;
a no-op elsewhere).  Each worker gets a derived RNG seed, so parallel fits
are reproducible across runs as long as the worker count stays the same.
The trees do, however, *differ* from a serial fit with the same seed,
because the RNG draws happen in a different order.  The option only affects
training: a parallel-fitted forest is scored exactly like a serially built one.

```perl
my $f = Algorithm::Classifier::IsolationForest->new(
    n_trees      => 200,
    sample_size  => 256,
    seed         => 42,
    parallel_fit => 4,       # 4 forked workers
)->fit(\@training_data);
```
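
The seeding behaviour can be illustrated with a small, self-contained sketch that does not use this module. It splits `$n_trees` across `$n_workers` forked children, seeds each child with `$seed + $w` (an assumed derivation; the module's actual scheme is not documented here), and has each child send back one random draw per tree through a pipe in place of building real trees. `parallel_draws` is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: spread $n_trees over $n_workers forked children.
# Worker $w seeds its RNG with $seed + $w (assumed scheme) and returns one
# rand() draw per "tree" it would build, standing in for real trees.
sub parallel_draws {
    my ($n_trees, $n_workers, $seed) = @_;
    my @children;
    for my $w (0 .. $n_workers - 1) {
        # Contiguous share; the first ($n_trees % $n_workers) workers take one extra.
        my $count = int($n_trees / $n_workers) + ($w < $n_trees % $n_workers ? 1 : 0);
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {                      # child process
            close $reader;
            srand($seed + $w);                # derived per-worker seed
            print {$writer} join(',', map { rand() } 1 .. $count), "\n";
            close $writer;                    # flushes the pipe
            POSIX::_exit(0);                  # skip inherited buffers/END blocks
        }
        close $writer;                        # parent keeps only the read end
        push @children, [ $pid, $reader ];
    }
    my @draws;
    for my $child (@children) {               # collect in worker order
        my ($pid, $reader) = @$child;
        my $line = <$reader>;
        close $reader;
        waitpid($pid, 0);
        die "worker $pid returned nothing\n" unless defined $line;
        chomp $line;
        push @draws, split /,/, $line;
    }
    return \@draws;
}
```

Two runs with the same seed and worker count produce identical draws, while changing the worker count changes which seed produces which draw: the same reason a parallel fit differs from a serial one.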

## `pack_data` — faster repeated scoring of the same dataset

`pack_data` returns an opaque wrapper that the scoring methods accept
directly, skipping the per-call walk over the arrayref-of-arrayrefs.
Use it when the same dataset is scored repeatedly (interactive threshold
tuning, dashboards, plotting that updates as parameters change).

```perl
my $packed = $f->pack_data(\@data);
my $scores = $f->score_samples($packed);
my $flags  = $f->predict($packed, 0.6);
my ($s, $l) = $f->score_predict_split($packed);  # two flat arrayrefs
```
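
The wrapper returned by `pack_data` is opaque and its layout is not documented here. As a hypothetical illustration of why packing once pays off, the sketch below flattens an arrayref-of-arrayrefs into a single buffer of native doubles with Perl's built-in `pack`, so later passes can read rows by byte offset instead of dereferencing every nested array again. `pack_rows` and `packed_row` are made-up names, not part of this module.

```perl
use strict;
use warnings;

my $DOUBLE = length pack('d', 0);    # bytes per native double (usually 8)

# Flatten rows into one contiguous buffer, remembering the row width.
sub pack_rows {
    my ($rows) = @_;
    return {
        n_rows => scalar @$rows,
        n_cols => scalar @{ $rows->[0] },
        buf    => pack('d*', map { @$_ } @$rows),
    };
}

# Read row $i back by byte offset, without the original nested arrays.
sub packed_row {
    my ($packed, $i) = @_;
    my $row_bytes = $packed->{n_cols} * $DOUBLE;
    return [ unpack('d*', substr($packed->{buf}, $i * $row_bytes, $row_bytes)) ];
}
```

A C routine can walk such a buffer as a plain `double` array, which is the kind of per-call work that repeated scoring of the same data avoids.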

## `score_predict_split` — get scores + labels without the AV-of-AVs

When you want both anomaly scores and 0/1 labels but don't need them
paired together row-by-row, `score_predict_split` returns the two as
flat arrayrefs and skips the ~`2 * n_pts` SV allocations that the
classic `score_predict_samples` shape requires.

```perl
my ($scores, $labels) = $f->score_predict_split(\@data, 0.6);
```
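
The difference between the two result shapes can be sketched in pure Perl. This is illustrative only: it assumes a point is labelled 1 when its score is at or above the threshold (the module's exact comparison is not stated here), and `paired_shape` and `split_shape` are hypothetical helpers. The paired form allocates one small array per point holding two scalars; the split form stores the same values in just two arrays.

```perl
use strict;
use warnings;

# Split shape: two flat arrayrefs (scores as given, labels derived).
# Assumption: a point is flagged when score >= threshold.
sub split_shape {
    my ($scores, $threshold) = @_;
    my @labels = map { $_ >= $threshold ? 1 : 0 } @$scores;
    return ($scores, \@labels);
}

# Paired shape: one [score, label] arrayref per point (AV-of-AVs).
sub paired_shape {
    my ($scores, $threshold) = @_;
    return [ map { [ $_, ($_ >= $threshold ? 1 : 0) ] } @$scores ];
}
```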

# Native acceleration (Inline::C, OpenMP, SIMD)

The scoring hot path (`score_samples`, `predict`, `path_lengths`,
`score_predict_samples`, `score_predict_split`) is automatically
accelerated through [`Inline::C`](https://metacpan.org/pod/Inline::C)
when it is installed and a working C compiler is present.  On top of
that:

* if the toolchain accepts `-fopenmp` and can link against `libgomp`,
  the per-point tree walk runs in parallel across all available CPU
  cores using OpenMP;
* on OpenMP 4.0+ compilers the extended-mode oblique dot product is
  vectorised via `#pragma omp simd`, which mostly benefits extended
  models with many features.

Detection happens once at module load and is cached under `_Inline/`.
None of these dependencies are required: without them the module falls
back to a pure-Perl implementation that produces identical results,
just slower.

Check which backend is active on your machine:

```shell
iforest accel
```

Sample output on a host with everything wired up:

```
Algorithm::Classifier::IsolationForest acceleration status
  Inline::C : available
  OpenMP    : available
  SIMD      : available

Active backend: Inline::C with OpenMP + SIMD
```

User code that wants to introspect the active backend can read three
package variables:

```perl
use Algorithm::Classifier::IsolationForest;

printf "C=%d OpenMP=%d SIMD=%d\n",    # each flag is 0 or 1
    $Algorithm::Classifier::IsolationForest::HAS_C,
    $Algorithm::Classifier::IsolationForest::HAS_OPENMP,
    $Algorithm::Classifier::IsolationForest::HAS_SIMD;
```

# Install

## Source

```shell
perl Makefile.PL
make
make test
make install
```

## FreeBSD

```shell
pkg install p5-App-Cmd p5-File-Slurp p5-App-cpanminus \
            p5-Inline p5-Inline-C gcc
cpanm Algorithm::Classifier::IsolationForest
```

`gcc` ships with `libgomp` and provides the OpenMP runtime; the system
clang does not by default.  `p5-Inline-C` is what makes the C backend
build at module load.

## Debian

```shell
apt-get install libapp-cmd-perl libfile-slurp-perl cpanminus \
                libinline-c-perl gcc
cpanm Algorithm::Classifier::IsolationForest
```

`libinline-c-perl` brings in `libinline-perl`.  `gcc` pulls in `libgomp1`
(the OpenMP runtime), which is what enables the parallel tree-walk.  Both
dependencies are optional — leave them out and the module installs and
runs in pure-Perl mode.