Random Forest

Predicting Car Accident Severity

Developed for Stats 101C at UCLA

We combined exploratory data analysis, imputation, and machine learning classification techniques to predict whether car accidents in the United States were mild or severe, based on a countrywide car accident dataset.

View GitHub | View Kaggle

Abstract

The goal of this project was to predict the severity of car accidents using a provided countrywide traffic accident dataset. Our predictions were submitted to a Kaggle competition, where our scores were ranked against those of our classmates across two lectures. The dataset was large, with 50,000 observations across the training and testing datasets. We were tasked with building multiple predictive models from provided and independently developed predictors to achieve a high score on Kaggle, which essentially measures how well our predictions match the true classifications of the test set. Our final model was a Random Forest that scored 0.9355, placing us 14th in our lecture.
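
The exact Kaggle metric is not stated here. Assuming it was plain classification accuracy (the share of test accidents whose predicted class equals the true class), a score of 0.9355 would mean roughly 93.55% of test observations were labelled correctly. The sketch below shows that calculation on a handful of made-up labels. The function name `accuracy` and the label strings `"MILD"` and `"SEVERE"` are illustrative assumptions, not taken from the competition.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of labels")
    # Count positions where the predicted class equals the true class.
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels, invented purely for illustration (not project data).
actual = ["MILD", "SEVERE", "MILD", "MILD", "SEVERE"]
predicted = ["MILD", "SEVERE", "SEVERE", "MILD", "SEVERE"]

score = accuracy(predicted, actual)
print(f"{score:.4f}")  # 4 of 5 labels match, so this prints 0.8000
```

Under this assumption, every misclassified accident lowers the score by the same amount, whether a mild accident was called severe or the reverse. If the competition used a different metric, the calculation would differ.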