Fraud Detection ML Suite
Gradient-boosted ensemble with explainability to cut false positives and raise the catch rate.
Highlights: -27% false positives · +14% catch rate · explainability (XAI) reports
Problem
False negatives drive direct fraud loss, while false positives inflate manual review volume and slow decisions. Compliance and model risk teams require transparent, defensible decisions with auditable reasoning.
Data and Sourcing
- Labeled transactions with timestamps (events, approvals, chargebacks).
- Device, network, merchant, payment, and customer features (PII excluded).
- Leak-free splits with temporal validation and out-of-time (OOT) backtests.
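The leak-free temporal split can be sketched in a few lines of pandas. The function name and column names here are illustrative, not the project's actual pipeline: the key point is that the out-of-time (OOT) window starts strictly after the training cutoff, so no future information leaks into training.

```python
import pandas as pd

def temporal_split(df, ts_col, train_end, oot_end):
    """Split transactions by timestamp so the model never sees future data.

    Rows up to `train_end` form the training set; rows in
    (`train_end`, `oot_end`] form the out-of-time (OOT) backtest window.
    """
    train = df[df[ts_col] <= train_end]
    oot = df[(df[ts_col] > train_end) & (df[ts_col] <= oot_end)]
    return train, oot

# Toy example with synthetic daily timestamps (illustrative only).
df = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=10, freq="D"),
    "label": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})
train, oot = temporal_split(
    df, "ts", pd.Timestamp("2024-01-06"), pd.Timestamp("2024-01-10")
)
```

The same cutoff logic extends to rolling-origin cross-validation: slide `train_end` forward and re-evaluate on each subsequent OOT window.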
Approach
- Feature store with rolling windows, risk signals, and interaction terms.
- Class imbalance handling (SMOTE / undersampling), calibrated probability thresholds, and a cost-sensitive objective.
- Gradient boosting (LightGBM) with Bayesian hyperparameter search and early stopping.
- Explainability with SHAP (global and local) for model risk management and reviewer guidance.
Experiments and Evaluation
- Time-based CV; tracked ROC-AUC and PR-AUC across folds and OOT windows.
- Threshold sweep to map precision–recall vs reviewer capacity (Ops intake curves).
- OOT backtest confirming stability and no target leakage.
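The threshold sweep against reviewer capacity can be sketched with scikit-learn's `precision_recall_curve`: for each candidate threshold, count how many cases would be flagged (reviewer intake), then pick the highest-recall threshold whose intake still fits the team's capacity. Scores, labels, and the capacity figure below are hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical validation labels and model scores (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.05, size=2000)
scores = np.clip(y_true * 0.4 + rng.normal(0.3, 0.15, size=2000), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, scores)

def intake_at(threshold):
    """Reviewer intake: number of cases flagged at this threshold."""
    return int((scores >= threshold).sum())

# Pick the highest-recall threshold whose intake fits reviewer capacity.
capacity = 150
feasible = [(t, r) for t, r in zip(thresholds, recall[:-1])
            if intake_at(t) <= capacity]
best_t, best_recall = max(feasible, key=lambda tr: tr[1])
```

Plotting intake against recall across the sweep yields the Ops intake curves referenced above.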
Results and Impact
- 27% fewer false positives at matched recall versus the baseline rules.
- 14% higher catch rate on an out-of-time (OOT) window at constant review volume.
- Reduced reviewer workload while maintaining coverage of high-severity cases.
What I Did
- Designed the feature set, training pipeline, and evaluation framework.
- Shipped a FastAPI scoring endpoint with auth, rate limits, and logging.
- Produced SHAP reports and governance notes to support model sign-off.
Stack
Python
LightGBM
Pandas
Scikit-learn
Imbalanced-learn
SHAP
Plotly
FastAPI
Postgres
Docker
GitHub Actions
