Document Type: Research Article

Authors

1 Insurance Research Center, Tehran, Iran

2 Department of Statistics, University of Birjand, Birjand, Iran

3 Department of Computer Engineering, University of Birjand, Birjand, Iran

10.22054/jmmf.2025.84807.1169

Abstract

Accurate prediction of third-party insurance claims is critical for pricing policies and managing risk. However, insurance data are highly imbalanced: non-claim cases vastly outnumber claim cases, which poses significant challenges to standard predictive models. This study explores the use of machine learning algorithms to improve claim prediction by directly addressing this imbalance. We use real data from the Insurance Research Center of Iran, incorporating variables such as driver characteristics, vehicle features, location, and claims history. Five models are evaluated: logistic regression, decision tree, bagging, random forest, and boosting. To handle the imbalance, we apply random undersampling, oversampling, and SMOTE. Model performance is assessed using accuracy, sensitivity, specificity, precision, and F-score. Results indicate that when data imbalance is properly treated, the decision tree, bagging, and random forest models significantly outperform logistic regression and boosting, especially in detecting actual claim cases. The study underscores the importance of using appropriate resampling techniques and evaluation metrics in imbalanced settings. These findings can help insurers develop more reliable models for pricing and risk classification.
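
As a rough illustration of why accuracy alone can mislead on imbalanced claim data, the following Python sketch computes the five metrics named above from a confusion matrix and applies a simple random undersampling step. The function names, the label coding (1 = claim, 0 = no claim), and the toy data are assumptions made for illustration; this is not the authors' code or data.

```python
import random


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as 'claim'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, precision and F-score."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        # Report 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def random_undersample(rows, labels, seed=0):
    """Keep all class-1 rows and an equally sized random subset of class-0 rows."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(minority), len(majority))
    keep = minority + rng.sample(majority, k)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Toy data: 95 non-claims, 5 claims, and a model that always predicts "no claim".
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print(metrics(y_true, y_pred))

    rows = list(range(100))  # stand-in for feature rows
    _, balanced_labels = random_undersample(rows, y_true)
    print(len(balanced_labels), sum(balanced_labels))
```

On this toy data, a classifier that never predicts a claim reaches high accuracy while its sensitivity is zero, which is the kind of situation that sensitivity, precision, and F-score are meant to expose.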

Keywords
