Although hydroplaning is a major contributor to roadway crashes, conventional crash databases do not typically report it. Hence, a framework is needed to classify crash attributes from police reports and to identify hydroplaning crashes. This study applied natural language processing (NLP) tools to seven years (2010–2016) of crash data from the Louisiana traffic crash database to identify hydroplaning-related crashes. The research focused on developing a framework that applies interpretable machine learning models to unstructured text in order to classify crashes by the number of vehicles involved. The approach evaluated how effectively keywords determine the classification. Of the three machine learning algorithms used, the eXtreme Gradient Boosting (XGBoost) model was the most effective classifier. The research provides a platform for understanding the application of interpretability in machine learning models. The outcomes indicate that underlying trends or precursors can be revealed and analyzed through these models, and that quantitative modeling techniques can be used to address safety concerns.
|Original language|English (US)|
|Journal|Transportation Research Interdisciplinary Perspectives|
|State|Published - Jul 2020|
All Science Journal Classification (ASJC) codes
- Civil and Structural Engineering
- Management Science and Operations Research
- Automotive Engineering