Classifier Threshold Lab
Drag a decision threshold across score distributions and watch precision, recall, and ROC curves update in real time.
Interactive threshold tuning
Select Scenario
Score Distribution
Confusion Matrix
| | Predicted + | Predicted − |
|---|---|---|
| Actual + | 0 | 0 |
| Actual − | 0 | 0 |
ROC Curve
AUC = 0.000
Precision-Recall Curve
AUC = 0.000
Controls
Fraction of positive samples (1–50%)
Amount of distribution overlap
Optimize for
How Threshold Tuning Works
Decision Threshold
A classifier outputs a score between 0 and 1. The threshold determines the cutoff: scores above it are labeled positive, below negative. Moving the threshold trades false positives for false negatives.
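The cutoff rule can be sketched in a few lines. This is an illustrative sketch, not the lab's actual implementation; the score and label values are made up:

```python
def confusion_counts(scores, labels, threshold):
    """Apply a decision threshold: scores at or above it are predicted positive.
    Returns (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Illustrative data: raising the threshold converts FPs into FNs
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0,   0,   1,    1,   0,    1]
print(confusion_counts(scores, labels, 0.5))  # (2, 1, 1, 2)
```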
Precision vs Recall
Precision measures "of everything flagged, how much is real?" Recall measures "of everything real, how much did we catch?" You can rarely maximize both—raising one usually lowers the other.
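The two questions above map directly onto confusion-matrix counts. A minimal sketch (the example counts are arbitrary):

```python
def precision_recall(tp, fp, fn):
    """Precision: of everything flagged, how much is real.
    Recall: of everything real, how much did we catch."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, ~0.667)
```

Note that true negatives appear in neither formula, which is exactly why precision and recall stay informative when negatives vastly outnumber positives.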
Base Rate Effect
When positives are rare (low prevalence), even a good classifier generates many false positives relative to true positives. This is why the insider threat scenario is hard—1% prevalence means the base rate dominates the confusion matrix.
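The effect is easy to verify with rate arithmetic. Holding the classifier fixed (TPR 0.90, FPR 0.05, numbers chosen for illustration) and varying only prevalence:

```python
def precision_at(prevalence, tpr, fpr):
    """Expected precision from population rates: TP and FP as fractions of all samples."""
    tp = prevalence * tpr
    fp = (1 - prevalence) * fpr
    return tp / (tp + fp)

# Same classifier, different base rates
print(round(precision_at(0.50, 0.90, 0.05), 3))  # 0.947 at balanced classes
print(round(precision_at(0.01, 0.90, 0.05), 3))  # 0.154 at 1% prevalence
```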
ROC vs PR Curves
ROC curves (TPR vs FPR) can be optimistic when classes are imbalanced. Precision-recall curves expose poor performance on rare-event detection that ROC may hide. Always check both.
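A small sweep over a hand-built imbalanced dataset (not the lab's generator) makes the gap concrete: the same operating point can look strong on ROC while the PR curve exposes a flood of false positives:

```python
def curves(scores, labels, thresholds):
    """Sweep thresholds, collecting (FPR, TPR) for ROC and (recall, precision) for PR."""
    p = sum(labels)
    n = len(labels) - p
    roc, pr = [], []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        roc.append((fp / n, tp / p))
        pr.append((tp / p, tp / (tp + fp) if tp + fp else 1.0))
    return roc, pr

# Imbalanced toy data: 1,000 negatives (50 look suspicious), 10 positives (1 missed)
scores = [0.2] * 950 + [0.7] * 50 + [0.8] * 9 + [0.3]
labels = [0] * 1000 + [1] * 10
roc, pr = curves(scores, labels, [0.5])
print(roc[0])  # (0.05, 0.9) — looks strong on ROC
print(pr[0])   # (0.9, ~0.15) — precision reveals 50 FPs per 9 TPs
```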
Challenge 1: The SOC is drowning
Scenario: Your SOC team processes 500 alerts/day. They can investigate at most 50. Set the phishing detection threshold so that no more than 5% of all samples are flagged positive.
Hint: Switch to the Phishing Detection scenario. Watch the FP Rate stat as you raise the threshold. When (prevalence × recall) + ((1 − prevalence) × FP Rate) drops below 0.05, you're within the budget.
This mirrors real SOC capacity planning where alert volume must match analyst bandwidth.
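The budget formula from the hint can be checked directly. The operating point below (recall 0.70, FPR 0.01 at 5% prevalence) is hypothetical, chosen only to show the arithmetic:

```python
def flagged_fraction(prevalence, recall, fp_rate):
    """Expected share of all samples flagged positive:
    true positives plus false positives, as population fractions."""
    return prevalence * recall + (1 - prevalence) * fp_rate

frac = flagged_fraction(0.05, 0.70, 0.01)
print(round(frac, 4), frac <= 0.05)  # 0.0445 True — within the 5% alert budget
```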
Challenge 2: The needle in the haystack
Scenario: You're hunting insider threats (1% prevalence). The CISO demands recall ≥ 0.90—you cannot miss a real insider. Find the lowest threshold that meets this requirement.
Hint: Use the Insider Threat scenario. Watch the recall stat. Notice how many false positives you generate at 1% prevalence when recall is high. Now try the "Optimize for Recall" button and compare.
This is why base rate matters: at 1% prevalence, even 5% FPR swamps your true positives.
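Working the numbers for a concrete (hypothetical) population of 10,000 monitored accounts makes the swamping explicit:

```python
n, prevalence = 10_000, 0.01
positives = int(n * prevalence)       # 100 real insiders
tp = round(positives * 0.90)          # 90 caught (recall 0.90)
fp = round((n - positives) * 0.05)    # 495 false alarms (FPR 0.05)
precision = tp / (tp + fp)
print(tp, fp, round(precision, 3))    # 90 495 0.154 — over 5 FPs per real insider
```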
Challenge 3: The regulator is watching
Scenario: Your malware classifier feeds a blocking rule. Every false positive blocks a legitimate file and generates a customer complaint. The compliance team requires precision ≥ 0.95.
Hint: Use the Malware Family scenario. Push the threshold up until precision hits 0.95. Note how many real malware samples you miss (low recall). Now try adjusting the noise slider to see how a better feature set (less overlap) improves the trade-off.
High-precision requirements force you to accept lower recall—or invest in better features.
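The precision-first search can be sketched with a toy score model. The two Gaussians below (benign ≈ N(0.35, 0.12), malware ≈ N(0.70, 0.12)) are assumptions standing in for the lab's generator, as is the 10:1 class ratio:

```python
import random

rng = random.Random(42)
# Hypothetical score model; clamp samples to the [0, 1] score range
neg = [min(max(rng.gauss(0.35, 0.12), 0.0), 1.0) for _ in range(5000)]
pos = [min(max(rng.gauss(0.70, 0.12), 0.0), 1.0) for _ in range(500)]

# Lowest threshold meeting the compliance bar: precision >= 0.95
best = None
for i in range(101):
    t = i / 100
    tp = sum(s >= t for s in pos)
    fp = sum(s >= t for s in neg)
    if tp and tp / (tp + fp) >= 0.95:
        best = (t, tp / (tp + fp), tp / len(pos))  # (threshold, precision, recall)
        break
print(best)  # recall ends up well below 1.0 — the cost of the precision floor
```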
Security model
Everything runs in your browser. No data is sent to any server. Score distributions are generated client-side using a seeded pseudo-random number generator.