About this dashboard
This dashboard analyzes credit card fraud using a public, anonymized dataset of transactions by European cardholders (not proprietary data).
Source: Kaggle — Credit Card Fraud Detection (ULB).
The dataset spans ~48 hours, includes anonymized PCA-derived features (V1–V28) plus Amount and a binary label (Class), and is commonly used to benchmark fraud models.
Key Performance Indicators
KPI tiles (values shown on the dashboard): ROC AUC, PR AUC, Precision (Fraud=1), Recall (Fraud=1), F1 (Fraud=1), Operating Threshold.
ROC AUC measures the model’s ability to rank fraudulent transactions above legitimate ones across all thresholds.
PR AUC emphasizes performance on the positive (fraud) class under heavy class imbalance—how precise the alerts are
as we try to capture more fraud. Precision is “of all the alerts we raised, how many were actually fraud?”;
Recall is “of all true frauds, how many did we catch?”. F1 is the harmonic mean of precision and recall,
useful when both are important. The Operating Threshold is the probability cutoff at which we decide to alert or
auto-block; raising it typically increases precision and reduces recall, and lowering it does the opposite.
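The sketch below shows how these KPIs could be computed with scikit-learn. The names y_true (the Class labels) and y_score (the model's fraud probabilities) are assumptions standing in for this dashboard's evaluation data; the synthetic arrays exist only so the snippet runs on its own.

```python
# Hedged sketch: compute the KPI tiles from assumed labels/scores.
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             precision_score, recall_score, f1_score)

# Stand-in data (~2% fraud); in the dashboard these would come from the evaluation split.
rng = np.random.default_rng(0)
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

threshold = 0.5                                  # illustrative operating threshold
y_pred = (y_score >= threshold).astype(int)      # 1 = alert / auto-block

print("ROC AUC:  ", roc_auc_score(y_true, y_score))   # ranking quality, threshold-free
print("PR AUC:   ", average_precision_score(y_true, y_score))
print("Precision:", precision_score(y_true, y_pred))  # alert cleanliness at the cutoff
print("Recall:   ", recall_score(y_true, y_pred))     # share of fraud caught
print("F1:       ", f1_score(y_true, y_pred))
```
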
Precision–Recall Curve
This chart shows how precision changes as we expand coverage (recall) by lowering the decision threshold.
The dashed iso-F1 lines indicate combinations of precision/recall that yield the same F1. The dot marks the
highest-F1 point (your “balanced” operating point). In this run, the area under the PR curve is
AP ≈ 0.885 (also displayed in the legend). In imbalanced problems like fraud, PR AUC is a more realistic
summary than ROC AUC because it focuses on alert quality when positives are rare.
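
A minimal sketch of how the curve, AP, and the highest-F1 dot could be reproduced, assuming the same y_true / y_score stand-ins as in the KPI sketch:

```python
# Hedged sketch: PR curve points, AP, and the best-F1 operating point.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)

f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best = np.argmax(f1[:-1])            # the final PR point has no threshold attached
print(f"AP = {ap:.3f}; best F1 = {f1[best]:.3f} at threshold = {thresholds[best]:.3f}")
```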

ROC Curve
The ROC curve plots True Positive Rate (TPR) vs False Positive Rate (FPR) across thresholds and
answers: “If we change the cutoff, how quickly do we pick up fraud relative to false alarms?” The diagonal is
random. Your model achieves ROC AUC ≈ 0.984, indicating excellent ranking ability. However, for production
alerting with extreme imbalance, pair ROC with PR and business cost modeling.
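
A sketch of the quantities behind this chart, again on assumed stand-in labels and scores; roc_curve returns the FPR/TPR pairs that trace the curve:

```python
# Hedged sketch: ROC AUC plus the TPR achievable under a false-positive budget.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("ROC AUC:", roc_auc_score(y_true, y_score))
print("TPR at FPR <= 1%:", tpr[fpr <= 0.01].max())   # fraud caught within a 1% false-alarm budget
```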

Threshold Trade-offs
As we raise the probability threshold, we flag fewer transactions: precision typically rises (alerts are cleaner),
while recall falls (we miss more fraud). The F1 curve highlights a “balanced” choice, but your optimal
cutoff should reflect business costs: the dollar loss of a missed fraud (FN), the operational cost/customer
friction of a false alarm (FP), and review capacity. Select the threshold where the expected cost is minimized
or where recall meets a target (e.g., Recall ≥ 0.90) subject to an FP budget.
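
One way to operationalize this is a threshold sweep over an expected-cost function. The unit costs below (cost_fn, cost_fp) are illustrative assumptions, not figures from this dashboard, and the data is the same stand-in used in the other sketches:

```python
# Hedged sketch: pick the threshold that minimizes expected cost on assumed data.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

cost_fn, cost_fp = 200.0, 5.0        # assumed cost per missed fraud vs. per false alarm

def expected_cost(threshold):
    tn, fp, fn, tp = confusion_matrix(y_true, (y_score >= threshold).astype(int)).ravel()
    return fn * cost_fn + fp * cost_fp

grid = np.linspace(0.01, 0.99, 99)
costs = np.array([expected_cost(t) for t in grid])
print(f"Cost-minimizing threshold: {grid[np.argmin(costs)]:.2f} "
      f"(expected cost {costs.min():,.0f})")
```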

Confusion Matrix @ Operating Threshold
Counts at the chosen cutoff (rows = actual, columns = predicted):
- TN (True 0 → Pred 0): 0 — legitimate correctly passed
- FP (True 0 → Pred 1): 4 — legitimate incorrectly flagged (customer friction)
- FN (True 1 → Pred 0): 17 — fraud we missed (financial loss risk)
- TP (True 1 → Pred 1): 81 — fraud correctly caught
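
For reference, a minimal sketch of how these four counts fall out of a prediction vector (cell order follows scikit-learn's convention, using the same assumed stand-ins):

```python
# Hedged sketch: confusion-matrix cells at a chosen cutoff.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

tn, fp, fn, tp = confusion_matrix(y_true, (y_score >= 0.5).astype(int)).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```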

Cumulative Gain
Sort transactions by model score from highest to lowest risk. The curve shows the fraction of all fraud captured
as a function of the fraction of the population reviewed. Example interpretation: “Reviewing the top 1% of scores
captures X% of all fraud.” The diagonal is random selection. Use this to size manual review queues and to estimate
ROI for different review depths.
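
A sketch of the underlying computation: sort by score, accumulate the fraud captured, and read off the capture rate at a given review depth (stand-in data as before):

```python
# Hedged sketch: cumulative gain, i.e. fraud captured vs. fraction of population reviewed.
import numpy as np

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

order = np.argsort(-y_score)                          # highest risk first
captured = np.cumsum(y_true[order]) / y_true.sum()    # fraction of all fraud caught so far

for depth in (0.005, 0.01, 0.05):
    idx = int(depth * len(y_true)) - 1
    print(f"Review top {depth:.1%} -> capture {captured[idx]:.1%} of fraud")
```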

Lift Curve
Lift compares performance to random at each review depth. A lift of 10 at 1% means the top 1% by score
contains 10× as much fraud as a random 1% slice. High lift at small depths indicates the model is excellent at
ranking the riskiest transactions at the very top—ideal for constrained review capacity. Expect lift to decline
toward 1.0 as you include more of the population.
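
Lift follows directly from the gain curve: divide the capture rate by the review depth. A sketch on the same assumed stand-ins:

```python
# Hedged sketch: lift at a few review depths (random selection would score 1.0x).
import numpy as np

rng = np.random.default_rng(0)                                   # stand-in data
y_true = (rng.random(20_000) < 0.02).astype(int)
y_score = np.clip(0.45 * y_true + 0.65 * rng.random(20_000), 0, 1)

order = np.argsort(-y_score)
captured = np.cumsum(y_true[order]) / y_true.sum()
depth = np.arange(1, len(y_true) + 1) / len(y_true)
lift = captured / depth

for q in (0.01, 0.05, 0.20):
    idx = int(q * len(y_true)) - 1
    print(f"Lift at top {q:.0%}: {lift[idx]:.1f}x")
```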
