Question Generator System of Sentence Completion in TOEFL Using NLP and K-Nearest Neighbor

Lala Septem Riza, Anita Dyah Pertiwi, Eka Fitrajaya Rahman, Munir Munir, Cep Ubad Abdullah


The Test of English as a Foreign Language (TOEFL) is a form of learning evaluation that requires high-quality questions. Preparing TOEFL questions by conventional means is time-consuming, and computer technology can help. This research therefore addresses the problem of generating TOEFL questions of the sentence completion type. The system consists of ten stages: (1) input data collection from foreign news media sites with excellent English grammar; (2) preprocessing with Natural Language Processing (NLP); (3) Part of Speech (POS) tagging; (4) question feature extraction; (5) separation and selection of news sentences; (6) determination and value collection of seven features; (7) conversion of categorical data values; (8) classification of the target word for the blank position with K-Nearest Neighbor (KNN); (9) determination of heuristic rules from human experts; and (10) selection of options, or distractors, based on the heuristic rules. In an experiment on 10 news articles, the evaluation of 20 generated questions showed very good quality, with a score of 81.93% as assessed by a human expert, and 70% of the blank positions matched those in the historical data of TOEFL questions. We conclude that the generated questions have two characteristics: their quality follows the training data drawn from historical TOEFL questions, and the quality of the distractors is very good because they are derived from the heuristics of human experts.
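Stages (7) and (8) can be illustrated with a minimal sketch, given here only as an assumption-laden example and not as the authors' implementation. The abstract does not name the seven features, so the three POS-tag features below (previous tag, current tag, next tag), the toy training rows, the labels "blank" and "keep", and the use of Hamming distance are all hypothetical choices made for illustration.

```python
from collections import Counter

def build_codebooks(rows):
    """Stage (7), sketched: assign an integer code to each categorical value, per column."""
    codebooks = [{} for _ in rows[0]]
    for row in rows:
        for book, value in zip(codebooks, row):
            book.setdefault(value, len(book))
    return codebooks

def encode(row, codebooks):
    """Convert one categorical row to integer codes; unseen values map to -1."""
    return tuple(book.get(value, -1) for book, value in zip(codebooks, row))

def hamming(a, b):
    """Number of feature positions where two encoded rows differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3):
    """Stage (8), sketched: majority vote among the k nearest training rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (previous tag, current tag, next tag) -> label.
# Tags follow the Penn Treebank tag set; the rows themselves are invented.
raw_rows = [
    ("DT", "NN", "VBZ"),
    ("DT", "NN", "VBD"),
    ("NN", "VBZ", "DT"),
    ("VBZ", "DT", "NN"),
    ("IN", "DT", "NN"),
    ("NN", "IN", "DT"),
]
labels = ["blank", "blank", "blank", "keep", "keep", "keep"]

codebooks = build_codebooks(raw_rows)
train_X = [encode(row, codebooks) for row in raw_rows]

def classify(raw_row, k=3):
    """Encode a raw categorical row and classify it with KNN."""
    return knn_predict(train_X, labels, encode(raw_row, codebooks), k=k)

print(classify(("DT", "NN", "VBP")))
print(classify(("IN", "DT", "NN")))
```

Hamming distance is used here because integer codes for categorical values carry no natural ordering; a system that relies on numeric distances over the converted values would behave differently.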


Automatic question generation; Natural Language Processing; Machine Learning; K-Nearest Neighbor; Education; Learning








Copyright (c) 2019 Indonesian Journal of Science and Technology

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Science and Technology is published by UPI.