Projects

Selected academic and independent work.

Rally

Independent • Base44 / OpenAI / Cursor • Oct 2025 – Present

  • Created a social platform focused on transforming ideas and conversations into change
  • Designed and built the front-end experience, prioritizing community and idea sharing
  • Led product vision and iteration, translating abstract user needs into concrete features
  • Navigated early-stage technical constraints while exploring scalable architecture
Product Management • Prompt Engineering • Product Design

Predicting Extreme Durability of Roll-Formed Aluminum

Regression and Data Mining (Kaggle Competition) • Python / R • Nov 2025 – Dec 2025

  • Built and evaluated boosting models to predict durability with 160,000+ production records
  • Performed EDA and feature engineering on industrial process variables
  • Optimized models using cross-validation and log-loss minimization to improve results
  • Achieved 9th place in the Kaggle competition by tuning the final CatBoost model
Python • R • Feature Engineering • Boosting

ASA DataFest 2025

Hackathon • Python / R • May 2025

  • Analyzed 194,685 office lease transactions across 29 U.S. markets during the competition
  • Designed an interactive tool to streamline client decision-making for office relocation
  • Built data visualizations to surface insights on where, when, and how companies relocate
  • Delivered a presentation to Savills executives, translating market analysis into guidance
Clustering • Product Design • Random Forest • Time Series

A Case Study of COVID-19 Social Media Posts

Data Science for Social Good • Python • May 2025 – Jun 2025

  • Preprocessed and analyzed 10,700 COVID-19 social media posts
  • Applied tokenization, stopword filtering, and TF-IDF vectorization
  • Built a Random Forest classifier achieving 94% accuracy in detecting misinformation
  • Leveraged SHAP and permutation importance to interpret model behavior
Python • NLP • Random Forest • Factor Analysis

How Exercise Affects Cortisol: An Experiment

Design and Analysis of Experiment • R • May 2025 – Jun 2025

  • Designed a randomized complete block experiment with a simulated population
  • Collected and analyzed 180 cortisol measurements across 90 participants
  • Ran ANOVA, post-hoc comparisons, diagnostics, and power analyses
  • Validated assumptions and flagged potential nuisance variables
R • Experimental Design • Statistical Testing • ANOVA

Predicting Success of Netflix Movies

Statistical Learning • R • Nov 2024 – Dec 2024

  • Cleaned and engineered features from a dataset of 32,540 movies
  • Defined a profit-based success metric for classification
  • Built Bayesian classifiers and ensemble methods
  • Achieved 79% average accuracy with cross-validation
R • Bayesian Modeling • Ensemble Methods • Model Evaluation • Machine Learning

San Francisco Rent Analysis

Statistical Learning • R • Sep 2024 – Oct 2024

  • Scraped and preprocessed rent data from TidyTuesday (200k+ rows)
  • Used EDA, PCA, and factor analysis (FA) to identify latent structure
  • Applied k-means, hierarchical clustering, GMM, and PAM
  • Evaluated clustering results in lower-dimensional projections for clearer interpretation
R • Clustering • Dimensionality Reduction • Machine Learning • Web Scraping