Determining Trending Topics in Twitter with a Data-Streaming Method in R

Melani Mediayani, Yudi Wibisono, Lala Septem Riza, Alejandro Rosales Pérez

Abstract


Trending topics on Twitter are topics that are widely discussed by users. This study aims to design a model and strategy for finding trending topics from Twitter data streams. The research was carried out in four stages: Twitter data collection, data preprocessing, data analysis with sequential K-Means clustering, and information processing. Sequential K-Means is used because it accepts input data sequentially and can update the cluster centers as the data arrive. The model is tested in three scenarios that differ in the amount of data, the time, and the parameter values. The clustering results are then evaluated with the Dunn Index. The trending-topics application was built in R and produces its output as histograms. Five topics were trending in New York before the new year. The topic "Times" relates to the New Year's Eve concert in Times Square. The topic "Hours" relates to the countdown of hours and seconds to 2017. The topics "Eve" and "Party" relate to celebrations, and the topic "Resolution" relates to hope and change for New Yorkers in 2017.
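The two techniques named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' R implementation. The function names `sequential_kmeans` and `dunn_index`, the toy data, and the choice to count each initial center as one observation are all assumptions made for illustration. Each point stands for a numeric feature vector of a tweet.

```python
import math
from itertools import combinations


def sequential_kmeans(stream, initial_centers):
    """Online k-means: assign each arriving point to its nearest center,
    then move that center toward the point by 1 / (points seen so far)."""
    centers = [list(c) for c in initial_centers]
    # Assumption: each initial center counts as one already-seen observation.
    counts = [1] * len(centers)
    labels = []
    for x in stream:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(x, centers[i]))
        counts[nearest] += 1
        step = 1.0 / counts[nearest]
        centers[nearest] = [c + step * (xi - c)
                            for c, xi in zip(centers[nearest], x)]
        labels.append(nearest)
    return centers, labels


def dunn_index(points, labels):
    """Dunn index: smallest distance between points of different clusters
    divided by the largest distance between points of the same cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(list(clusters.values()), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster holds a single distinct point
    return min_separation / max_diameter


# Toy stream of 2-D vectors (hypothetical data, not from the study).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = sequential_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
score = dunn_index(points, labels)
```

A higher Dunn index indicates clusters that are compact and well separated, which is why it can be used to compare clustering results across scenarios with different parameter values.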


Keywords


Trending topics; Streaming data; Machine learning; Large datasets; Clustering; Data analysis

DOI: http://dx.doi.org/10.17509/ijost.v4i1.15807


Copyright (c) 2019 Indonesian Journal of Science and Technology

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Science and Technology is published by UPI.