Projects

Here’s a collection of personal projects I’ve worked on — some started out of curiosity, others from a desire to build something useful. All of them helped me learn along the way.

Hinge Data Analysis

A project that analyzes and visualizes personal data exports provided by the dating app Hinge.

By examining the user’s profile, dating preferences, and interactions with other users, the project surfaces patterns, trends, and summary statistics that show how users engage with Hinge and how their preferences shape their decisions.

Technologies: Python, Plotly, pandas, Docker
Data Source: Personal data exports from Hinge
GitHub: View Repository
Key Features: Analyzes user profile presentation, dating preferences, messaging patterns, response times, and match durations

Spore Sense - Mushroom Classification

A project predicting whether mushrooms are poisonous or edible using machine learning models.

This project uses neural networks and dimensionality reduction (PCA) techniques on the UCI Mushroom dataset to determine whether mushrooms are edible or poisonous, while exploring whether PCA improves model efficiency without reducing accuracy.

Technologies: Python, pandas, sklearn, tensorflow, matplotlib
Data Source: UC Irvine Machine Learning Repository (Mushroom dataset); records drawn from The Audubon Society Field Guide to North American Mushrooms (1981) by G.H. Lincoff, published by Alfred A. Knopf
GitHub: View Repository
Key Features: Compares baseline and PCA-enhanced neural networks, visualizes confusion matrices, and analyzes model accuracy and efficiency.
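
The confusion-matrix comparison described above can be sketched in a few lines of plain Python. This is only an illustration: the labels and predictions below are made up, and the helper names (`confusion_matrix`, `accuracy`) are hypothetical rather than taken from the repository, which uses sklearn and tensorflow.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("edible", "poisonous")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up labels and predictions, for illustration only.
actual        = ["edible", "edible", "poisonous", "poisonous", "edible", "poisonous"]
baseline_pred = ["edible", "edible", "poisonous", "edible", "edible", "poisonous"]
pca_pred      = ["edible", "poisonous", "poisonous", "poisonous", "edible", "poisonous"]

baseline_cm = confusion_matrix(actual, baseline_pred)
pca_cm = confusion_matrix(actual, pca_pred)

print("baseline:", baseline_cm, accuracy(baseline_cm))
print("with PCA:", pca_cm, accuracy(pca_cm))
```

In this toy data both models reach the same accuracy but make different mistakes, which is why the matrices are worth inspecting alongside the single accuracy number: a poisonous mushroom predicted as edible is the costlier error.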

Student Performance Predictions using Machine Learning

A project that analyzes data from a Portuguese school system and creates machine learning models to predict students’ performance.

The models are intended to help school administrators make more informed decisions about which students need additional support to achieve better learning outcomes.

Technologies: Python, pandas, sklearn, matplotlib
Data Source: UC Irvine’s machine learning repository
GitHub: View Repository
Key Features: Analyzes student data collected from reports and surveys, trains two sets of machine learning models, and determines the best fit for predicting future student performance.

BRFSS 2021 Mental Health Analysis

Predicting mental health outcomes using CDC survey data.

This project analyzes data from the 2021 Behavioral Risk Factor Surveillance System (BRFSS), a large-scale health survey conducted by the CDC. Using linear regression models, the notebook explores how factors like employment status, e-cigarette usage, and receiving the flu shot relate to reported mental health outcomes.

Technologies: R, Jupyter, tidyverse, lm.beta
Dataset: 2021 BRFSS (Behavioral Risk Factor Surveillance System)
Focus: Mental health prediction using linear regression
GitHub: View Repository

Scooby-Doo Episode Analysis

Unmasking decades of cartoon mysteries through data.

This project analyzes episodes and monster encounters from the long-running Scooby-Doo franchise using a comprehensive dataset from Kaggle with data from over 600 episodes and movies. It explores how different themes, characters, and catchphrases show up over time, and highlights the show’s trends through data visualization.

Technologies: Python, Jupyter, pandas, matplotlib
Dataset: Scooby-Doo Complete Dataset (Kaggle)
Focus: Data cleansing, exploratory data analysis, visualization
GitHub: View Repository

Job Search Sankey Visualization

A visual exploration of job application progress through interview stages using Sankey diagrams.

In this notebook, I process and visualize detailed job application data that tracks how candidates move through interview stages over time. By mapping transitions such as “Applied” to “Recruiter Inquiry,” then to “Technical Interview” or “Offer,” across multiple companies and application cycles, the Sankey diagrams reveal common hiring workflows, bottlenecks where candidates drop out, and the paths that lead to successful offers.

Technologies: Python, Jupyter, pandas, Plotly
Dataset: Personal job application tracking data in .csv format across companies and interview stages
Focus: Visualizing job search interview stages with Sankey diagrams
GitHub: View Repository
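
The core data preparation for a Sankey diagram is counting stage-to-stage transitions and turning them into indexed links. The sketch below shows one way to do that in plain Python. The application paths are hypothetical, and `sankey_links` is an illustrative helper, not a function from the repository.

```python
from collections import Counter

# Hypothetical tracking data: one ordered list of stages per application.
applications = [
    ["Applied", "Recruiter Inquiry", "Technical Interview", "Offer"],
    ["Applied", "Recruiter Inquiry", "Rejected"],
    ["Applied", "Rejected"],
    ["Applied", "Recruiter Inquiry", "Technical Interview", "Rejected"],
]

def sankey_links(paths):
    """Count stage-to-stage transitions and index them for a Sankey diagram."""
    counts = Counter()
    for path in paths:
        # Pair each stage with the stage that follows it.
        counts.update(zip(path, path[1:]))
    # Node labels in order of first appearance.
    labels = list(dict.fromkeys(stage for path in paths for stage in path))
    index = {label: i for i, label in enumerate(labels)}
    source = [index[a] for a, _ in counts]
    target = [index[b] for _, b in counts]
    value = list(counts.values())
    return labels, source, target, value

labels, source, target, value = sankey_links(applications)

for s, t, v in zip(source, target, value):
    print(f"{labels[s]} -> {labels[t]}: {v}")
```

The resulting lists match the shape Plotly's `go.Sankey` expects: `node=dict(label=labels)` and `link=dict(source=source, target=target, value=value)`.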

Palmer Penguins Data Storytelling

An exploratory analysis using the palmerpenguins dataset, a collection of data about penguins from the Palmer Archipelago in Antarctica.

In this notebook, I use ggplot2 to explore the palmerpenguins dataset and present the insights I uncover through visualizations.

Technologies: R, Jupyter, ggplot2, dplyr, tidyr
Focus: Reproducible visual insights from real-world ecological data
GitHub: View Repository