Stemming Algorithm Modification for Overstemming Cases

Stephanie Betha R.H.

Abstract


Stemming plays an important role in text preprocessing. One problem that arises during stemming is overstemming: a word is cut too aggressively, so that words with very different meanings are reduced to the same stem. To address this problem, this work modifies the stemming process by combining two stemming algorithms (hybrid stemming): a dictionary-table look-up algorithm and an affix-removal algorithm based on the Porter stemmer. The modified algorithm is tested on the titles of scientific publication documents. The test results show that stemming with the modified algorithm increases recall on the title attribute, although the improvement is not very significant. Recall in the experiment using the title attribute is 89.9%.
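The hybrid idea described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the look-up table entries, the suffix list, and the names `LOOKUP_TABLE`, `strip_suffix`, and `hybrid_stem` are assumptions made for illustration, and the suffix stripper is a deliberately simplified stand-in for the Porter stemmer rather than the Porter algorithm itself.

```python
# Hypothetical look-up table mapping surface forms to protected stems.
# The entries are illustrative only; a real table would be much larger.
LOOKUP_TABLE = {
    "universe": "universe",
    "university": "university",
    "universal": "universal",
}

# Simplified suffix rules, checked in order. This is a stand-in for the
# Porter stemmer, NOT the Porter algorithm.
SUFFIXES = ("ization", "ational", "ities", "ity", "ing", "al", "es", "ed", "e", "s")

# Do not strip a suffix if fewer than this many characters would remain.
MIN_STEM_LENGTH = 3


def strip_suffix(word):
    """Affix removal: strip the first matching suffix from the rule list."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def hybrid_stem(word, table=LOOKUP_TABLE):
    """Try the dictionary table first; fall back to affix removal."""
    token = word.lower()
    if token in table:
        return table[token]
    return strip_suffix(token)


if __name__ == "__main__":
    for w in ("universe", "university", "universal", "stemming"):
        print(w, strip_suffix(w), hybrid_stem(w))
```

With affix removal alone, "universe", "university", and "universal" all collapse to the same stem, which is the overstemming case. With the table consulted first, the three words stay distinct, while words absent from the table still go through affix removal.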

Keywords


Stemming modification; Stemming






DOI: https://doi.org/10.17509/jcs.v4i2.71186
