Projects

Selected academic and independent work.

Understanding Misinformation on Social Media

Data Science for Social Good • Python

  • Processed and analyzed 10,700 COVID-19 social media posts
  • Applied tokenization, stopword filtering, and TF-IDF vectorization
  • Built a Random Forest classifier achieving 94% accuracy
  • Used SHAP and permutation importance to interpret model decisions
Python NLP Random Forest SHAP
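The text-preprocessing steps listed above (tokenization, stopword filtering, TF-IDF) can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code. The abbreviated stopword list, the helper names `tokenize`, `preprocess` and `tfidf`, and the plain tf × log(N/df) weighting are assumptions made for illustration. Libraries such as scikit-learn use smoothed idf variants by default.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of",
             "to", "in", "on", "for", "it", "this", "that"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stopwords."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of kept tokens in the document
    idf = log(N / df), where N is the number of documents and
          df is the number of documents containing the term
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


posts = [
    "The vaccine is safe and effective",
    "5G towers spread the virus",
    "The virus spreads through droplets",
]
vectors = tfidf(posts)
```

Terms that occur in every post receive a weight of zero, which is what lets TF-IDF downweight generic vocabulary before the vectors reach a classifier. The Random Forest and SHAP steps are not shown.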

Predicting the Success of Netflix Movies

Statistical Learning • R

  • Cleaned a dataset of 32,540 movies and engineered features from it
  • Defined a profit-based success metric for classification
  • Built Bayesian classifiers and ensemble models
  • Achieved an average prediction accuracy of 79% under cross-validation
R Bayesian Modeling Ensemble Methods Model Evaluation

Rally

Independent Project • Product & Data

  • Developed the original Suggestioneer concept, winner of the YEP competition
  • Rebuilt the project as Rally, a social recommendation engine
  • Focused on scalable design, iteration, and data-informed product decisions
Product Data Engineering Analytics