Projects
Selected academic and independent work.
Rally
Independent • Base44 / OpenAI / Cursor • Oct 2025 – Present
- Created a social platform focused on transforming ideas and conversations into change
- Designed and built the front-end experience, prioritizing community and idea sharing
- Led product vision and iteration, translating abstract user needs into concrete features
- Navigated early-stage technical constraints while exploring scalable architecture
Regression and Data Mining (Kaggle Competition) • Python / R • Nov 2025 – Dec 2025
- Built and evaluated boosting models to predict durability from 160,000+ production records
- Performed EDA and feature engineering on industrial process variables
- Optimized models using cross-validation and log-loss minimization to improve results
- Achieved 9th place in the Kaggle competition by tuning the final CatBoost model
Hackathon • Python / R • May 2025
- Analyzed 194,685 office lease transactions across 29 U.S. markets during the competition
- Designed an interactive tool to streamline client decision-making for office relocation
- Built data visualizations to surface insights on where, when, and how companies relocate
- Delivered a presentation to Savills executives, translating market analysis into guidance
Data Science for Social Good • Python • May 2025 – Jun 2025
- Preprocessed and analyzed 10,700 COVID-19 social media posts
- Applied tokenization, stopword filtering, and TF-IDF vectorization
- Built a Random Forest classifier achieving 94% accuracy in detecting misinformation
- Leveraged SHAP and permutation importance to interpret model behavior
Design and Analysis of Experiments • R • May 2025 – Jun 2025
- Designed a randomized complete block experiment with a simulated population
- Collected and analyzed 180 cortisol measurements across 90 participants
- Ran ANOVA, post-hoc comparisons, diagnostics, and power analyses
- Validated assumptions and flagged potential nuisance variables
Statistical Learning • R • Nov 2024 – Dec 2024
- Cleaned and engineered features from a dataset of 32,540 movies
- Defined a profit-based success metric for classification
- Built Bayesian classifiers and ensemble methods
- Achieved 79% average accuracy with cross-validation
Statistical Learning • R • Sep 2024 – Oct 2024
- Scraped and preprocessed rent data from TidyTuesday (200k+ rows)
- Used PCA, EDA, and FA to identify latent structure
- Applied k-means, hierarchical clustering, GMM, and PAM
- Evaluated clustering results in lower-dimensional projections for clearer interpretation