Comparative Study of K-Nearest Neighbor and Naïve Bayes for Diabetes Risk Classification
Abstract
Diabetes mellitus is one of the fastest-growing health problems of the 21st century. Contributing factors include low public awareness of the need for regular health check-ups and unhealthy lifestyles. A Hemoglobin A1c (HbA1c) test is highly recommended for detecting diabetes, but this service is not yet available at the Posbindu in Bulupitu Village. Another approach is therefore needed to detect diabetes risk early, namely data mining. This research uses two classification methods, Naïve Bayes and k-Nearest Neighbor (kNN). The input variables are gender, age, family history of diabetes, frequent urination, Body Mass Index (BMI), and blood sugar level; the output is the diabetes risk class. The dataset is divided into training and testing sets using k-fold cross-validation and fixed ratios (60:40, 70:30, 80:20, and 90:10). Naïve Bayes achieved its best accuracy, 96.1%, with k-fold cross-validation at k=2. kNN achieved its best results with the 80:20 ratio. Among the distance measures compared, Manhattan distance performed better than Euclidean and Chebyshev distance in this study.
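To make the distance comparison concrete, the following is a minimal sketch of kNN classification with the three distance measures named in the abstract. It is not the authors' implementation: the function names, the choice of k=3, and the four training records are assumptions made up for illustration, and no feature scaling is applied.

```python
from collections import Counter
from math import sqrt


def manhattan(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    # Largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=manhattan):
    # Rank training records by distance to the query,
    # then take a majority vote among the k closest labels.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records (age, BMI, blood sugar); NOT data from the study.
train_X = [(25, 21.0, 90), (31, 23.5, 95), (58, 31.2, 180), (62, 29.8, 200)]
train_y = ["low", "low", "high", "high"]

query = (55, 30.0, 170)
for metric in (manhattan, euclidean, chebyshev):
    print(metric.__name__,
          knn_predict(train_X, train_y, query, k=3, distance=metric))
```

On this toy data all three measures give the same prediction; differences such as the one the study reports would only show up on a real dataset, where the ranking of neighbors can change with the distance measure.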
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The author agrees that copyright of the article is held by the Smatika journal, and the author retains the right to disseminate the published paper without prior permission.