Classification Model of Employee Turnover Causes Using the CRISP-DM Framework


Daud Fernando, Rangga Gelar Guntara

Abstract

High employee turnover costs a company money, effort, and time, and the fictitious company “XYZ” is one organization that experiences these effects. This research aims to classify the causes of employee turnover in the industry using machine learning classification models built with two algorithms, Random Forest and Decision Tree. The study also follows up on a recommendation from earlier classification research on employees in the education industry, which suggested comparing the evaluated performance of two machine learning models. The data set comprises 10 variables and 9,540 historical employee records. The research method is the Cross-Industry Standard Process for Data Mining (CRISP-DM). The results show that the Random Forest classification model is the better-performing of the two, with an AUC-ROC value reaching 0.9988. RapidMiner was then used to revalidate the performance of both models with the same parameters; in that revalidation, the Random Forest model again scored higher than the Decision Tree model, with the highest accuracy of 85.04%.
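
As a rough illustration of the two evaluation metrics named in the abstract, the following minimal Python sketch computes AUC-ROC and accuracy for two sets of classifier scores. It is not the authors' pipeline: the labels, the score lists, the names `model_a_scores` and `model_b_scores`, and the 0.5 decision threshold are all hypothetical, and the toy data is unrelated to the study's 9,540 records.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_scores = [0.9, 0.2, 0.8, 0.6, 0.3, 0.1, 0.7, 0.4]  # assumed scores
model_b_scores = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6, 0.8, 0.2]  # assumed scores

for name, scores in [("model A", model_a_scores), ("model B", model_b_scores)]:
    print(name, "AUC-ROC:", auc_roc(y_true, scores),
          "accuracy:", accuracy(y_true, scores))
```

AUC-ROC depends only on how the scores rank the examples, while accuracy depends on the chosen threshold, so the two metrics can differ noticeably for the same model.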
