Hybrid Explainable AI Approach for DNA Sequence Classification: Feature Importance, Permutation Importance, and Local Explanations with LIME

Nur Alamsyah, Budiman Budiman, Reni Nursyanti, Elia Setiana, Venia Restreva Danestiara, Titan Parama Yoga

Abstract


Understanding how individual features contribute to DNA sequence classification is crucial for model interpretability and reliability. This study proposes a Hybrid Explainable AI (XAI) approach that integrates Feature Importance (FI), Permutation Importance (PI), and Local Interpretable Model-Agnostic Explanations (LIME) to identify the most influential features in a Random Forest classifier. FI identifies the features that contribute most to the model, while PI validates their impact by measuring the change in model performance when each feature's values are shuffled. LIME then provides local explanations, showing how specific feature values affect individual classification decisions. Experimental results on a publicly available DNA sequence dataset reveal a strong correlation between FI and PI rankings, confirming the stability of key features such as A84, A89, and A92. LIME further improves interpretability by highlighting feature contributions for individual instances, reinforcing the relevance of specific nucleotide positions in sequence classification. Together, these methods provide a more comprehensive view of feature importance, improving trust and transparency in DNA sequence classification models.
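The three steps described in the abstract (global FI from the forest, PI validation by shuffling, and a local explanation for one instance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data is a hypothetical synthetic stand-in (binary indicator features named `A0`…`A39` to mimic labels such as "A84", with a label planted on three assumed positions), and the local step is a simplified, hand-written LIME-style surrogate rather than the `lime` package's `LimeTabularExplainer`, which additionally discretizes features and samples perturbations from training-data statistics. FI uses scikit-learn's impurity-based `feature_importances_`, and PI uses `sklearn.inspection.permutation_importance` with held-out accuracy.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# --- Hypothetical synthetic stand-in for a binary-encoded DNA dataset (assumption) ---
rng = np.random.default_rng(0)
n_samples, n_features = 600, 40
X = rng.integers(0, 2, size=(n_samples, n_features))
feature_names = [f"A{i}" for i in range(n_features)]  # naming mimics labels like "A84"
informative = [10, 15, 18]                            # planted "key positions" (hypothetical)
informative_names = [feature_names[i] for i in informative]
# Label = majority vote of the three planted positions
y = (X[:, informative].sum(axis=1) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)


def top_k(scores, k=3):
    """Names of the k features with the largest scores."""
    return [feature_names[i] for i in np.argsort(scores)[::-1][:k]]


# 1) Feature Importance: impurity-based (mean decrease in impurity) from the fitted forest
fi = model.feature_importances_

# 2) Permutation Importance: mean drop in held-out accuracy when one column is shuffled
pi = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
).importances_mean

# Rank agreement between the two global methods over all features
rho, _ = spearmanr(fi, pi)


# 3) Simplified LIME-style local surrogate (hand-written sketch, NOT the `lime` package)
def local_surrogate(model, x, n_perturb=2000, flip_prob=0.2, kernel_width=0.75, seed=0):
    r = np.random.default_rng(seed)
    # Interpretable representation: 1 = feature keeps the instance's value, 0 = flipped
    keep = (r.random((n_perturb, x.shape[0])) > flip_prob).astype(int)
    keep[0] = 1                                   # include the unperturbed instance
    Z = np.where(keep == 1, x, 1 - x)             # perturbed binary inputs
    p = model.predict_proba(Z)[:, 1]              # black-box probability of class 1
    dist = 1 - keep.mean(axis=1)                  # fraction of flipped features
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)  # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(keep, p, sample_weight=weights)
    return surrogate.coef_                        # per-feature local contribution


local_coef = local_surrogate(model, X_test[0])

print("FI top-3:", top_k(fi))
print("PI top-3:", top_k(pi))
print(f"Spearman rho(FI, PI) = {rho:.2f}")
print("LIME-style top-3 (|coef|):", top_k(np.abs(local_coef)))
```

In the local surrogate, a positive coefficient means that keeping the instance's own value for that feature raises the predicted probability of class 1. Because many near-zero noise features can dominate a Spearman correlation computed over all features, comparing the top-k sets of FI and PI is a useful complementary check of ranking agreement.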

Keywords


Explainable AI, Feature Importance, Permutation Importance, LIME, DNA Classification.


DOI: https://doi.org/10.17509/coelite.v4i1.81356



Journal of Computer Engineering, Electronics and Information Technology (COELITE)

Published by Universitas Pendidikan Indonesia (UPI) and managed by the Department of Computer Engineering.
Jl. Dr. Setiabudi No.229, Kota Bandung, Indonesia - 40154
e-ISSN: 2829-4149
p-ISSN: 2829-4157
