Feature Engineering

Modeling Customer Behavior

View PDF | View Presentation | View GitHub



I. Abstract

Predicting customer behavior is crucial in the e-commerce industry, allowing platforms such as Fingerhut to enhance user experience and drive business. Our team leveraged the extensive dataset provided by Fingerhut to predict whether a customer will complete a “journey” on the company’s web page, defining a successful journey as one where the customer reaches the ‘order shipped’ event stage. After meticulous cleaning and feature engineering, we trained and evaluated several models, including Logistic Regression, Gradient Boosting, Neural Networks, XGBoost, and Decision Trees, and found XGBoost to be the most effective, achieving the highest F1 score at 73%. Although we faced limitations due to the size of the dataset and our limited computing power, we believe our results are of great value to the Fingerhut team and can be leveraged to better understand customer behavior and ultimately drive business in the coming years.
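To illustrate the modeling step described above, the snippet below is a minimal sketch of training an XGBoost classifier and scoring it with F1 on a binary “journey completed” label. The file name, column names, and hyperparameters are assumptions for illustration, not the actual Fingerhut pipeline.

```python
# Minimal sketch of the XGBoost training/evaluation step.
# File name, feature columns, and hyperparameters are hypothetical.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Assume `df` holds the engineered features plus a binary label marking
# whether the customer reached the 'order shipped' event.
df = pd.read_csv("journeys.csv")
X = df.drop(columns=["journey_completed"])
y = df["journey_completed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    eval_metric="logloss",
)
model.fit(X_train, y_train)

# Evaluate with F1, the metric used to compare the candidate models.
preds = model.predict(X_test)
print(f"F1 score: {f1_score(y_test, preds):.2f}")
```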

Predicting Car Accident Severity

Developed for Stats 101C at UCLA

Applying exploratory data analysis, imputation, and machine learning classification techniques, we predicted the severity (mild or severe) of car accidents in the United States from a country-wide car accident dataset.

View GitHub | View Kaggle

Abstract

The goal of this project was to predict the severity of car accidents using a provided country-wide traffic accident dataset. Our results were submitted to a Kaggle competition, where our scores were ranked against those of our classmates across two lecture sections. The dataset was large, comprising 50,000 observations split across training and testing sets. We were tasked with building multiple predictive models from provided and independently developed predictors to achieve a high Kaggle score, which essentially measures how well our predictions match the true ‘test’ classifications. Our final model, a Random Forest, produced a score of 0.9355, placing us 14th in our lecture.
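As a rough illustration of the imputation-plus-classification workflow described above, the sketch below chains simple median imputation into a Random Forest classifier and estimates accuracy with cross-validation. The file name, target column, and hyperparameters are placeholders, not our actual competition submission.

```python
# Sketch of an imputation + Random Forest pipeline.
# Column names and hyperparameters are placeholders; assumes numeric features.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")
X = train.drop(columns=["SEVERITY"])
y = train["SEVERITY"]  # binary label: mild vs. severe

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("rf", RandomForestClassifier(n_estimators=500, random_state=42)),
])

# Estimate held-out accuracy with 5-fold cross-validation before submitting.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.4f}")
```

Wrapping the imputer and classifier in a single Pipeline keeps the imputation statistics fit only on each training fold, avoiding leakage into the validation folds.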