SCHOOL OF SCIENCE ENGINEERING AND TECHNOLOGY (SSET) DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY Modelling and Assessing the Severity of Road Traffic Accidents in Zambia, Using Data Mining

Road traffic accidents are one of the leading causes of death and injury in Zambia. Research is part of the answer to reducing them, and data mining is one research tool for discovering their causes. The main aim of this study was to identify and investigate the driver-, road-, weather- and motor vehicle-related factors that contribute to the severity of road traffic accidents in Zambia. Accident severity was classified into three classes: fatal, seriously injured, and slightly injured. The study develops a road traffic accident severity prediction model and compares the performance of several classification algorithms in order to select the best-performing one. The data was a file collected from the Zambia Police Service headquarters in Lusaka; it covered the years 2016 to 2020 and contained 159,698 road traffic accidents. The CRISP-DM 1.0 standard data mining methodology was adopted. Using the WEKA (Waikato Environment for Knowledge Analysis) data mining software, four well-known classification algorithms were used to model accident severity: Decision Tree (J48), Rule Induction (PART), Naive Bayes, and Random Forest. The models were built in two ways: first with the whole dataset used as the training set, and then with the same dataset using 10-fold cross-validation. To identify the main causal features of road accident severity, the rules produced by the Decision Tree (J48) and PART algorithms were examined further. The performance of the algorithms was evaluated by comparing classification accuracy, the Receiver Operating Characteristic (ROC) curve, and the confusion matrix.
The results showed that the Random Forest algorithm performed best in terms of classification accuracy and produced the best Receiver Operating Characteristic (ROC) curve when evaluated on the training set, while the J48 algorithm outperformed the other three algorithms in classification accuracy under 10-fold cross-validation. The rules produced by the PART algorithm show that year, province, tire condition, car braking condition, cause of the accident, driver's age, driver's license grade, time, and lighting condition are the most important features in classifying the severity of a road traffic accident.
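The evaluation procedure described above (10-fold cross-validation, a confusion matrix over the three severity classes, and classification accuracy) can be illustrated with a minimal Python sketch. It is not the WEKA workflow used in the study. The function names, the toy labels, and the majority-class baseline are all assumptions made for illustration; the baseline stands in for a real classifier and is not one of the four algorithms compared.

```python
import random
from collections import Counter

# The three severity classes named in the abstract.
SEVERITY_CLASSES = ["fatal", "seriously injured", "slightly injured"]


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and split them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_matrix(actual, predicted, classes=SEVERITY_CLASSES):
    """Count outcomes; rows are actual classes, columns are predicted classes."""
    position = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of records on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def majority_class(labels):
    """Hypothetical stand-in 'model': always predict the most common label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=10, seed=0):
    """Hold out each fold once, fit the baseline on the remaining folds,
    and pool the held-out predictions into one confusion matrix."""
    actual, predicted = [], []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        training_labels = [lab for i, lab in enumerate(labels) if i not in held]
        guess = majority_class(training_labels)
        for i in held_out:
            actual.append(labels[i])
            predicted.append(guess)
    return confusion_matrix(actual, predicted)


# Made-up labels for illustration only; not data from the study.
toy_labels = (["slightly injured"] * 6 + ["seriously injured"] * 3 + ["fatal"]) * 5
toy_matrix = cross_validate(toy_labels, k=10)
toy_accuracy = accuracy(toy_matrix)
```

Evaluating on the training set scores a model on records it has already seen, whereas cross-validation scores each record with a model that was fitted without it. That difference is one plausible reason the two evaluation settings can rank algorithms differently.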