Classifier Threshold Lab

Drag a decision threshold across score distributions and watch precision, recall, and ROC curves update in real time.

Interactive threshold tuning

Pick a security scenario, drag the decision threshold, and watch how precision, recall, and the confusion matrix change in real time.

[Interactive panel: scenario selector; score distribution plot with a draggable decision threshold (default 0.500); live confusion matrix (Predicted +/−, Actual +/−); stats for Accuracy, Precision, Recall, F1 Score, and FP Rate; ROC curve and precision-recall curve, each with its AUC; controls for the fraction of positive samples (1–50%), the amount of distribution overlap, and an "Optimize for" target.]

How Threshold Tuning Works

Decision Threshold

A classifier outputs a score between 0 and 1. The threshold determines the cutoff: scores above it are labeled positive, below negative. Moving the threshold trades false positives for false negatives.
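The cutoff rule can be sketched in a few lines; the scores here are made up, but the logic matches the slider (scores above the threshold are labeled positive):

```python
scores = [0.12, 0.48, 0.51, 0.73, 0.95]  # hypothetical classifier outputs
threshold = 0.5

# Label positive when the score clears the threshold
labels = [1 if s > threshold else 0 for s in scores]
print(labels)  # [0, 0, 1, 1, 1]
```

Drag the threshold up and some 1s flip to 0s (fewer false positives, more false negatives); drag it down and the reverse happens.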

Precision vs Recall

Precision measures "of everything flagged, how much is real?" Recall measures "of everything real, how much did we catch?" You can rarely maximize both—raising one usually lowers the other.
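In confusion-matrix terms (the counts below are invented for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 80 true positives, 20 false alarms, 40 missed positives
p, r = precision_recall(tp=80, fp=20, fn=40)
print(p, r)  # 0.8 precision, ~0.667 recall
```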

Base Rate Effect

When positives are rare (low prevalence), even a good classifier generates many false positives relative to true positives. This is why the insider threat scenario is hard—1% prevalence means the base rate dominates the confusion matrix.
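A back-of-the-envelope check makes the effect concrete. Using the scenario's numbers (1% prevalence, 90% recall, 5% FP rate; population size is an arbitrary choice):

```python
n = 10_000
prevalence = 0.01          # 100 real insiders in the population
recall, fpr = 0.90, 0.05

tp = prevalence * n * recall        # 90 insiders caught
fp = (1 - prevalence) * n * fpr     # 495 innocent users flagged
precision = tp / (tp + fp)
print(round(precision, 3))  # ~0.154: over 5 of every 6 alerts are false
```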

ROC vs PR Curves

ROC curves (TPR vs FPR) can be optimistic when classes are imbalanced. Precision-recall curves expose poor performance on rare-event detection that ROC may hide. Always check both.
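A toy imbalanced set (2 positives, 8 negatives, invented scores) shows the gap: the FPR looks small on a ROC axis while precision, which a PR curve plots, reveals that half the alerts are false.

```python
# (score, true_label) pairs; only 2 of 10 samples are positive
data = [(0.95, 1), (0.80, 0), (0.70, 1), (0.60, 0), (0.50, 0),
        (0.40, 0), (0.30, 0), (0.20, 0), (0.10, 0), (0.05, 0)]

results = {}
for t in (0.75, 0.65):
    tp = sum(1 for s, y in data if s >= t and y == 1)
    fp = sum(1 for s, y in data if s >= t and y == 0)
    tn = sum(1 for s, y in data if s < t and y == 0)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    results[t] = (fpr, precision)

print(results)  # at t=0.75: FPR 0.125 (looks fine), precision 0.5 (half false)
```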

Challenge 1: The SOC is drowning

Scenario: Your SOC team processes 500 alerts/day. They can investigate at most 50. Set the phishing detection threshold so that no more than 5% of all samples are flagged positive.

Hint: Switch to the Phishing Detection scenario. Watch the FP Rate stat as you raise the threshold. When (prevalence × recall) + ((1 − prevalence) × FP Rate) drops below 0.05, you've hit the budget.

This mirrors real SOC capacity planning where alert volume must match analyst bandwidth.
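The hint's budget formula can be checked directly; the operating points below are hypothetical, not taken from the lab's scenarios:

```python
def flagged_fraction(prevalence, recall, fpr):
    """Expected fraction of all samples flagged positive."""
    return prevalence * recall + (1 - prevalence) * fpr

print(flagged_fraction(0.10, 0.80, 0.04))  # 0.116 -> blows a 5% budget
print(flagged_fraction(0.02, 0.90, 0.03))  # ~0.047 -> fits under 5%
```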

Challenge 2: The needle in the haystack

Scenario: You're hunting insider threats (1% prevalence). The CISO demands recall ≥ 0.90—you cannot afford to miss real insiders. Find the highest threshold that still meets this requirement (any lower threshold also meets it, but generates more false positives).

Hint: Use the Insider Threat scenario. Watch the recall stat. Notice how many false positives you generate at 1% prevalence when recall is high. Now try the "Optimize for Recall" button and compare.

This is why base rate matters: at 1% prevalence, even 5% FPR swamps your true positives.
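The threshold search behind this challenge can be sketched as a scan over candidate cutoffs; the scored samples are hypothetical stand-ins for the lab's generated distribution:

```python
# (score, true_label) pairs; 4 positives
data = [(0.90, 1), (0.85, 0), (0.80, 1), (0.70, 0), (0.60, 1),
        (0.55, 0), (0.50, 1), (0.40, 0), (0.30, 0), (0.20, 0)]
n_pos = sum(y for _, y in data)

def recall_at(t):
    return sum(1 for s, y in data if s >= t and y == 1) / n_pos

# Highest candidate threshold that still achieves recall >= 0.90
best = max(t for t, _ in data if recall_at(t) >= 0.90)
print(best)  # 0.5: every positive scores at or above it
```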

Challenge 3: The regulator is watching

Scenario: Your malware classifier feeds a blocking rule. Every false positive blocks a legitimate file and generates a customer complaint. The compliance team requires precision ≥ 0.95.

Hint: Use the Malware Family scenario. Push the threshold up until precision hits 0.95. Note how many real malware samples you miss (low recall). Now try adjusting the noise slider to see how a better feature set (less overlap) improves the trade-off.

High-precision requirements force you to accept lower recall—or invest in better features.
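The same scan works for a precision floor; note how much recall the constraint costs on this invented sample set:

```python
# (score, true_label) pairs; 6 positives
data = [(0.95, 1), (0.90, 1), (0.85, 1), (0.80, 0), (0.75, 1),
        (0.60, 1), (0.50, 0), (0.40, 1), (0.30, 0), (0.20, 0)]
n_pos = sum(y for _, y in data)

def prec_rec(t):
    tp = sum(1 for s, y in data if s >= t and y == 1)
    fp = sum(1 for s, y in data if s >= t and y == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    return prec, tp / n_pos

# Lowest threshold that keeps precision >= 0.95 -- and its recall cost
t_star = min(t for t, _ in data if prec_rec(t)[0] >= 0.95)
p_star, r_star = prec_rec(t_star)
print(t_star, p_star, r_star)  # 0.85 1.0 0.5: half the malware slips through
```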

Security model

Everything runs in your browser. No data is sent to any server. Score distributions are generated client-side using a seeded pseudo-random number generator.
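The reproducible-generation idea can be sketched like this; the two-Gaussian shape and all parameters here are assumptions for illustration, not the lab's actual generator (which runs in JavaScript):

```python
import random

rng = random.Random(1337)  # fixed seed -> identical distributions every load

def sample_scores(mean, n=5):
    # Gaussian draws clamped into [0, 1] to behave like classifier scores
    return [min(1.0, max(0.0, rng.gauss(mean, 0.12))) for _ in range(n)]

benign = sample_scores(0.35)     # negative class clusters low
malicious = sample_scores(0.65)  # positive class clusters high, with overlap
```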