Implementation of the Naive Bayes Algorithm for Filtering Online Gambling Spam Comments on YouTube

Faiz Jauhari Makarim Riza, Rangga Gelar Guntara, Muhammad Rizki Nugraha

Abstract


The growth of user interaction on YouTube has brought new challenges, one of which is the proliferation of spam comments promoting online gambling. Such comments harm the communities that form around a channel. This study designs a spam comment classification system based on the Naive Bayes algorithm. The system was developed following the CRISP-DM stages: comment data were collected through the YouTube API, passed through text preprocessing (Unicode normalization, case folding, tokenization, stopword removal, and filtering), and then labeled. Each term was then weighted using TF-IDF. The model was evaluated with K-Fold Cross Validation and Confusion Matrix analysis, yielding an accuracy of 97.1%, a precision of 96.4%, a recall of 95.6%, and an F1-score of 96%. The model was then deployed as a Command Line Interface (CLI) application that YouTube channel owners can use to detect and delete spam comments automatically. In testing, the system worked effectively, showing that this preprocessing pipeline combined with Naive Bayes can produce an accurate spam detection system.
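The abstract states that comments were collected through the YouTube API but gives no further detail. The following is a minimal standard-library sketch, assuming the public YouTube Data API v3 `commentThreads.list` REST endpoint called with an API key and keeping only top-level comments (replies are ignored). `build_url`, `extract_comments`, and `fetch_comments` are hypothetical helper names, and `VIDEO_ID` / `YOUR_API_KEY` are placeholders; this is not the authors' code.

```python
import json
import urllib.parse
import urllib.request

# YouTube Data API v3 endpoint for commentThreads.list.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_url(video_id, api_key, page_token=None):
    """Build one commentThreads.list request URL (top-level comments only)."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,          # largest page size the API accepts
        "textFormat": "plainText",  # ask for text without HTML markup
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_comments(page):
    """Return (comment_id, text) pairs from one decoded JSON response page."""
    pairs = []
    for thread in page.get("items", []):
        top = thread["snippet"]["topLevelComment"]
        pairs.append((top["id"], top["snippet"]["textDisplay"]))
    return pairs


def fetch_comments(video_id, api_key):
    """Yield every top-level comment on a video, following nextPageToken."""
    token = None
    while True:
        with urllib.request.urlopen(build_url(video_id, api_key, token)) as resp:
            page = json.load(resp)
        yield from extract_comments(page)
        token = page.get("nextPageToken")
        if not token:
            break
```

Usage: `for comment_id, text in fetch_comments("VIDEO_ID", "YOUR_API_KEY"): print(comment_id, text)`. Keeping the comment ID alongside the text is what later lets a tool act on a comment that the classifier flags.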
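The preprocessing steps named in the abstract can be sketched as a single function. Unicode normalization matters for this domain because gambling promoters often write with stylised look-alike characters (mathematical bold or full-width letters) to evade keyword filters; NFKC folds these back to plain ASCII. The stopword list below is a hypothetical, deliberately tiny stand-in, and the filtering rule (dropping tokens shorter than two characters) is an assumption, since the abstract does not specify either.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Indonesian stopword list (illustration only).
STOPWORDS = {"yang", "di", "dan", "ini", "itu", "ke", "dari", "untuk", "aja", "nih"}


def preprocess(comment, min_len=2):
    """Turn a raw comment into a list of cleaned tokens."""
    # 1. Unicode normalization: NFKC maps stylised look-alike letters and
    #    full-width digits to their plain ASCII forms.
    text = unicodedata.normalize("NFKC", comment)
    # 2. Case folding.
    text = text.casefold()
    # 3. Tokenizing: keep runs of ASCII letters/digits, drop punctuation and emoji.
    tokens = re.findall(r"[a-z0-9]+", text)
    # 4. Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 5. Filtering (assumed rule): drop tokens shorter than min_len characters.
    return [t for t in tokens if len(t) >= min_len]
```

For example, `preprocess("𝗦𝗟𝗢𝗧 ＧＡＣＯＲ88 di sini, daftar sekarang!")` returns `['slot', 'gacor88', 'sini', 'daftar', 'sekarang']`: the bold and full-width letters become ordinary ASCII, and "di" is removed as a stopword.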
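The TF-IDF weighting, Naive Bayes classifier, and K-fold evaluation described in the abstract can be sketched with the standard library alone. The abstract does not give the Naive Bayes variant, the smoothing constant, or k, so this sketch assumes multinomial Naive Bayes trained on L2-normalised TF-IDF weights (with scikit-learn's default smoothed IDF), additive smoothing alpha = 1, k = 5 shuffled folds, and "spam" as the positive class. All function and class names are hypothetical. IDF and class statistics are refit inside every fold so the held-out comments never leak into training.

```python
import math
import random
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalised; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes trained on TF-IDF weights instead of raw counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        self.classes = sorted(set(labels))
        mass = {c: defaultdict(float) for c in self.classes}
        total = dict.fromkeys(self.classes, 0.0)
        for doc, y in zip(docs, labels):
            for t, w in tfidf(doc, self.idf).items():
                mass[y][t] += w
                total[y] += w
        v = len(self.idf)
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_lik = {
            c: {t: math.log((mass[c][t] + self.alpha) / (total[c] + self.alpha * v))
                for t in self.idf}
            for c in self.classes
        }
        return self

    def predict(self, doc):
        x = tfidf(doc, self.idf)
        return max(self.classes, key=lambda c: self.log_prior[c]
                   + sum(w * self.log_lik[c][t] for t, w in x.items()))


def k_fold_confusion(docs, labels, k=5, positive="spam", seed=0):
    """Pool a confusion matrix (tp, fp, fn, tn) over k shuffled folds."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    tp = fp = fn = tn = 0
    for i in range(k):
        test = idx[i::k]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        model = TfidfNaiveBayes().fit([docs[j] for j in train], [labels[j] for j in train])
        for j in test:
            pred_pos = model.predict(docs[j]) == positive
            true_pos = labels[j] == positive
            if pred_pos and true_pos:
                tp += 1
            elif pred_pos:
                fp += 1
            elif true_pos:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```

Usage on already-tokenized comments: `scores(*k_fold_confusion(docs, labels))` returns accuracy, precision, recall, and F1. Pooling counts across folds, rather than averaging per-fold scores, is one of two common conventions; the abstract does not say which one was used.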
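The four reported scores are the standard confusion-matrix ratios. Here TP, FP, FN, and TN are true positives, false positives, false negatives, and true negatives, with spam assumed to be the positive class. Substituting the reported precision and recall into the F1 formula gives a value consistent with the reported 96%:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2 \times 0.964 \times 0.956}{0.964 + 0.956}
    = \frac{1.843168}{1.920} \approx 0.960
$$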

Keywords


YouTube; Spam; Naive Bayes; Term Frequency-Inverse Document Frequency; Classifier; Command Line Interface


DOI: https://doi.org/10.17509/ijdb.v5i2.88641



Copyright (c) 2025 Universitas Pendidikan Indonesia (UPI)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Digital Business is published by Universitas Pendidikan Indonesia (UPI) and managed by the Department of Digital Business.
Jl. Dr. Setiabudi No.229, Kota Bandung, Indonesia - 40154