About this dashboard

This dashboard analyzes credit card fraud using a public, anonymized dataset of transactions by European cardholders (not proprietary data). Source: the Kaggle "Credit Card Fraud Detection" dataset (ULB). The dataset spans roughly 48 hours of transactions and includes anonymized PCA-derived features (V1–V28), the transaction Amount, and a binary label (Class) commonly used to benchmark fraud models.
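
For orientation, here is a minimal loading sketch, assuming the Kaggle CSV has been downloaded locally as creditcard.csv (the filename is an assumption, not something this dashboard prescribes):

```python
# Minimal sketch: load the Kaggle "Credit Card Fraud Detection" CSV.
# Assumes the file was downloaded manually and saved as creditcard.csv.
import pandas as pd

df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"])         # V1–V28, Amount, and (in the raw file) Time
y = df["Class"]                        # 1 = fraud, 0 = legitimate
print(y.value_counts(normalize=True))  # fraud is a tiny fraction of all rows
```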

Key Performance Indicators

ROC AUC
PR AUC
Precision (Fraud=1)
Recall (Fraud=1)
F1 (Fraud=1)
Operating Threshold

ROC AUC measures the model’s ability to rank fraudulent transactions above legitimate ones across all thresholds. PR AUC emphasizes performance on the positive (fraud) class under heavy class imbalance: how precise the alerts stay as we try to capture more fraud. Precision is “of all the alerts we raised, how many were actually fraud?”; Recall is “of all true frauds, how many did we catch?”. F1 is the harmonic mean of precision and recall, useful when both matter. The Operating Threshold is the probability cutoff at which we decide to alert or auto-block; raising it typically increases precision and reduces recall, and lowering it does the opposite.
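
As a concrete illustration, the sketch below shows how these KPIs could be computed with scikit-learn; y_true and y_score stand in for held-out labels and model fraud probabilities, and the synthetic arrays are placeholders so the snippet runs on its own (they are not this dashboard's data):

```python
# KPI computation sketch, assuming binary labels `y_true` and model fraud
# probabilities `y_score` from a held-out test split. The synthetic arrays
# below are placeholders only.
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

rng = np.random.default_rng(0)
y_true = (rng.random(5000) < 0.02).astype(int)                  # rare positives
y_score = np.clip(0.4 * y_true + 0.6 * rng.random(5000), 0, 1)  # imperfect scores

threshold = 0.5  # the operating threshold; in practice it is tuned, not fixed
y_pred = (y_score >= threshold).astype(int)

print("ROC AUC            :", roc_auc_score(y_true, y_score))
print("PR AUC (AP)        :", average_precision_score(y_true, y_score))
print("Precision (Fraud=1):", precision_score(y_true, y_pred))
print("Recall (Fraud=1)   :", recall_score(y_true, y_pred))
print("F1 (Fraud=1)       :", f1_score(y_true, y_pred))
```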

Precision–Recall Curve

This chart shows how precision changes as we expand coverage (recall) by lowering the decision threshold. The dashed iso-F1 lines indicate combinations of precision/recall that yield the same F1. The dot marks the highest-F1 point (your “balanced” operating point). In this run, the area under the PR curve is AP ≈ 0.885 (also displayed in the legend). In imbalanced problems like fraud, PR AUC is a more realistic summary than ROC AUC because it focuses on alert quality when positives are rare.
Area under the PR curve (AP); preferred for imbalanced fraud detection.
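
A minimal sketch of how the curve points and the highest-F1 operating point can be derived, reusing the placeholder y_true / y_score from the KPI sketch above:

```python
# Precision–recall curve and best-F1 threshold (assumes `y_true` / `y_score`
# as defined in the KPI sketch above).
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)

# F1 at each candidate threshold; the final (precision, recall) point has no
# associated threshold, so it is dropped.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = int(np.argmax(f1))
print(f"AP={ap:.3f}  best F1={f1[best]:.3f} at threshold={thresholds[best]:.3f}")
```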

ROC Curve

The ROC curve plots True Positive Rate (TPR) vs False Positive Rate (FPR) across thresholds and answers: “If we change the cutoff, how quickly do we pick up fraud relative to false alarms?” The diagonal corresponds to random ranking. Your model achieves ROC AUC ≈ 0.984, indicating excellent ranking ability. However, for production alerting under extreme class imbalance, pair ROC with the PR view and with business cost modeling.
ROC AUC as a calibration-agnostic ranking metric.
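
The same quantities can be computed directly, again with the placeholder scores from the KPI sketch:

```python
# ROC curve points and AUC (same assumed `y_true` / `y_score` as above).
from sklearn.metrics import auc, roc_curve

fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
print("ROC AUC =", auc(fpr, tpr))  # each (fpr[i], tpr[i]) pair is one cutoff
```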

Threshold Trade-offs

As we raise the probability threshold, we flag fewer transactions: precision typically rises (alerts are cleaner), while recall falls (we miss more fraud). The F1 curve highlights a “balanced” choice, but your optimal cutoff should reflect business costs: the dollar loss of a missed fraud (FN), the operational cost/customer friction of a false alarm (FP), and review capacity. Select the threshold where the expected cost is minimized or where recall meets a target (e.g., Recall ≥ 0.90) subject to an FP budget.
Threshold vs Precision/Recall/F1: choose a cutoff that balances capture rate (recall) with alert quality (precision).
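
One way to operationalize this is an expected-cost sweep over thresholds; the sketch below uses purely illustrative costs (COST_FN and COST_FP are assumptions, not values from this dashboard) and the placeholder scores from the KPI sketch:

```python
# Cost-based threshold selection sketch. COST_FN and COST_FP are assumed,
# illustrative values; substitute your own loss and review-cost estimates.
import numpy as np

COST_FN = 100.0  # assumed average loss per missed fraud
COST_FP = 2.0    # assumed review/friction cost per false alarm

candidates = np.linspace(0.01, 0.99, 99)
costs = []
for t in candidates:
    pred = (y_score >= t).astype(int)
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    costs.append(COST_FP * fp + COST_FN * fn)

best_t = candidates[int(np.argmin(costs))]
print(f"Expected-cost-minimizing threshold ≈ {best_t:.2f}")
```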

Confusion Matrix @ Operating Threshold

Counts at the chosen cutoff (rows = actual, columns = predicted):
  • TN (True 0 → Pred 0): 0 — legitimate correctly passed
  • FP (True 0 → Pred 1): 4 — legitimate incorrectly flagged (customer friction)
  • FN (True 1 → Pred 0): 17 — fraud we missed (financial loss risk)
  • TP (True 1 → Pred 1): 81 — fraud correctly caught
From these: Precision ≈ 0.953 (81 / (81+4)), Recall ≈ 0.827 (81 / (81+17)), and F1 ≈ 0.885. Use this together with the threshold chart to decide if you prefer higher recall (catch more fraud, tolerate more FPs) or higher precision (fewer false alerts, more missed fraud).
TN/FP/FN/TP counts at the selected operating threshold.
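
A minimal sketch of how these counts and the derived metrics follow from a chosen cutoff (0.5 here stands in for the dashboard's operating threshold; y_true / y_score are the placeholders from the KPI sketch):

```python
# Confusion matrix at a chosen cutoff (assumes `y_true` / `y_score` as above).
from sklearn.metrics import confusion_matrix

threshold = 0.5  # stand-in for the selected operating threshold
pred = (y_score >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
print(f"Precision={tp / (tp + fp):.3f}  Recall={tp / (tp + fn):.3f}")
```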

Cumulative Gain

Sort transactions by model score from highest to lowest risk. The curve shows the fraction of all fraud captured as a function of the fraction of the population reviewed. Example interpretation: “Reviewing the top 1% of scores captures X% of all fraud.” The diagonal is random selection. Use this to size manual review queues and to estimate ROI for different review depths.
What % of fraud you capture by reviewing the top X% highest-risk transactions.
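
A sketch of the underlying computation, reusing the placeholder y_true / y_score:

```python
# Cumulative-gain sketch: sort by score, then measure the share of all fraud
# captured within the top X% of transactions.
import numpy as np

order = np.argsort(-y_score)              # highest risk first
sorted_labels = np.asarray(y_true)[order]
cum_capture = np.cumsum(sorted_labels) / sorted_labels.sum()

depth = 0.01                              # review the top 1% of transactions
k = max(1, int(len(sorted_labels) * depth))
print(f"Top {depth:.0%} of scores captures {cum_capture[k - 1]:.1%} of fraud")
```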

Lift Curve

Lift compares performance to random at each review depth. A lift of 10 at 1% means the top 1% by score contains 10× as much fraud as a random 1% slice. High lift at small depths indicates the model is excellent at ranking the riskiest transactions at the very top—ideal for constrained review capacity. Expect lift to decline toward 1.0 as you include more of the population.
How many times better than random you perform at each review depth.
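
Lift at the same depth follows directly from the cumulative-gain sketch above (it reuses sorted_labels, k, and depth from that block):

```python
# Lift at a given review depth, reusing `sorted_labels`, `k`, and `depth`
# from the cumulative-gain sketch.
base_rate = sorted_labels.mean()      # overall fraud rate
rate_at_k = sorted_labels[:k].mean()  # fraud rate inside the top-k scores
print(f"Lift at {depth:.0%} depth ≈ {rate_at_k / base_rate:.1f}x")
```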