Improving Transformer Performance for Text Summarization in Video Transcription

Rizky Dwi Putra, Aldy Rialdy Atmadja, Yana Aditia Gerhana

Abstract


Online video has become a major channel for conveying information, and the rapid growth of this content has created a strong need for automatic text summarization. Summarization lets audiences quickly capture the essence of lengthy material, reduces information overload, and makes key points accessible without going through the entire content. This study explores the use of Whisper Turbo for transcription and mT5 for summarization of Indonesian-language YouTube videos. Whisper Turbo produces accurate transcriptions, although the results vary with audio quality and topic complexity. The transcribed text is then summarized using mT5, which achieves a ROUGE-1 F1 score of 54.13% and a ROUGE-L score of 49.39%. These findings indicate that mT5 outperforms the standard T5 model despite using less training data. Overall, the combination of Whisper Turbo and mT5 offers an effective solution for generating concise and reliable summaries of video content, with potential applications in education, journalism, and digital documentation.
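For readers unfamiliar with the metric reported above, ROUGE-1 F1 is the harmonic mean of unigram precision and recall between a generated summary and a reference summary. The following is a minimal sketch, assuming lowercase whitespace tokenization; the function name `rouge1_f1` and the example sentences are hypothetical, and this is not the evaluation code used in the study, which may apply different tokenization or stemming.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; published ROUGE tools may tokenize differently.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 of 6 reference unigrams are recovered.
reference = "video ini membahas ringkasan teks otomatis"
candidate = "video membahas ringkasan teks"
score = rouge1_f1(reference, candidate)
```

In this example precision is 4/4 and recall is 4/6, so the F1 score is 0.8. ROUGE-L differs in that it scores the longest common subsequence rather than independent unigram matches.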


Keywords


Machine Learning, Deep Learning, Text Summarization, Natural Language Processing (NLP)


DOI: https://doi.org/10.17509/coelite.v4i2.89353



Journal of Computer Engineering, Electronics and Information Technology (COELITE) is published by Universitas Pendidikan Indonesia (UPI) and managed by the Department of Computer Engineering.
Jl. Dr. Setiabudi No.229, Kota Bandung, Indonesia - 40154
email: coelite@upi.edu
e-ISSN: 2829-4149
p-ISSN: 2829-4157