This project tackled financial fraud, a global problem linked to an estimated $3.1 trillion in illicit flows in 2023 alone.
As a core team member, I contributed to the design and execution of the full data science pipeline. We engineered a synthetic dataset modeled on real-world sources, including PaySim and IBM’s Anti-Money Laundering dataset, that combined behavioral, transactional, and demographic features. I worked on data cleaning, feature selection, and model development in PyCaret, where we benchmarked several classification algorithms (logistic regression, random forest, and gradient boosting) on AUC, precision, recall, and F1-score. We complemented the supervised models with an unsupervised anomaly detector, Isolation Forest, which independently flagged about 5% of transactions as suspicious, broadly consistent with the supervised results. The project also included a rigorous ethical analysis of algorithmic bias, false-positive risk, and fairness, all of which bear directly on real-world financial deployment.
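The benchmarking and anomaly-detection steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses scikit-learn directly rather than PyCaret, and the data comes from `make_classification` as a stand-in for the project's synthetic dataset. The sample size, feature counts, 5% class balance, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    IsolationForest,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): an imbalanced binary problem in which
# roughly 5% of rows belong to the positive ("fraud") class.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three model families named in the text, with illustrative settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the fraud class
    preds = model.predict(X_test)               # hard labels at the default 0.5 cut-off
    results[name] = {
        "auc": roc_auc_score(y_test, scores),
        "precision": precision_score(y_test, preds, zero_division=0),
        "recall": recall_score(y_test, preds, zero_division=0),
        "f1": f1_score(y_test, preds, zero_division=0),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})

# Unsupervised check: Isolation Forest never sees the labels.
# fit_predict returns -1 for anomalies and 1 for inliers.
iso = IsolationForest(contamination=0.05, random_state=0)
iso_labels = iso.fit_predict(X)
flagged_fraction = float(np.mean(iso_labels == -1))
print("flagged fraction:", round(flagged_fraction, 3))
```

One design point worth noting: in scikit-learn's `IsolationForest`, the `contamination` parameter sets the score threshold, so the share of training rows flagged follows it closely. A flagged rate near 5% therefore says more when it is compared row by row against the supervised predictions than when it is read on its own.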