Sentiment Analysis of Flagship Smartphones on Social Media Using Python TextBlob And Naive Bayes Algorithm

Social media plays a crucial role in the advancement of organizations, industries, and businesses nowadays. Almost everyone is connected to social media. Each individual can interact and exchange knowledge due to the fusion of technology and social relationships. Sentiment analysis is a technique that allows extracting information from users expressing emotions, perspectives, and opinions on the internet. One strategic sector for implementing sentiment analysis is the technology sector, especially the smartphone industry. The wide range of smartphone variants available today poses a problem for individuals in finding the best smartphone product. The sentiment analysis of flagship smartphones conducted in this article aims to find the best solution between two flagship smartphones from renowned manufacturers, namely the Samsung S22 Ultra and the Xiaomi 12 Pro. The data is collected from various social media platforms such as Twitter, YouTube, and GSMArena. The collected data is then analyzed using Python TextBlob, and the analysis results in negative, positive, and neutral sentiments displayed through various visualizations. The final outcome is the assessment of Net Brand Reputation, which evaluates the reputation of a brand across multiple social media platforms.



INTRODUCTION
Currently, many businesses employ various methods to enhance their products. Reaching out to consumers is a commonly used approach for businesses to learn about their customers' sentiments. Customer satisfaction assessments, review surveys, and customer activity tracking are common methods used to gather feedback and insights from customers. This feedback can provide valuable information for businesses to improve the quality of their products.
In recent years, the smartphone business has experienced significant growth, not only in conventional sales but also in online sales. However, not all smartphones are of sufficient quality to meet customers' needs, and this is something that customers should be aware of (Bianchi and Andrews, 2015). When purchasing a smartphone, customers should be able to recognize and understand its characteristics and functions, which can be learned from opinions and user testimonials available on the internet (Ridwan et al., 2013).
Consequently, it has become common on the internet for customers to share their thoughts and personal experiences regarding a smartphone product. However, it takes a considerable amount of time for customers to read through these reviews in their entirety (Zahidi et al., 2021). Sentiment classification addresses this issue by categorizing user reviews into positive or negative categories.
Sentiment analysis is a technique used to detect opinions about a subject (such as individuals, organizations, or products) within a dataset (Christina and Ronaldo, 2018). The widespread use of the internet allows users to engage in product discussions through blog/web posts, online debates, product review sites, and social media platforms. Social media serves as a medium for users to express themselves (Muntinga et al., 2011). Users can utilize platforms such as Twitter, YouTube, Facebook, and Google to communicate their opinions and feelings. These activities generate a substantial amount of opinion-rich data, making it highly strategic for business purposes.
This article describes a system that assists customers in understanding the overall sentiment regarding flagship smartphones based on user reviews. It aims to aid the decision-making process of whether or not to purchase the related products. Data preprocessing is conducted on the collected corpus for two smartphones from renowned brands, namely the Samsung S22 Ultra and the Xiaomi 12 Pro, both of which are flagship smartphones from their respective brands. Flagship smartphones represent the best phones each brand has to offer, with high-quality specifications (Wong et al., 2019).
The sentiment analysis implemented in this article utilizes Python with the TextBlob package (Utami et al., 2021). TextBlob is an easy-to-use library commonly used for Natural Language Processing (NLP) tasks. The data is automatically labeled with sentiment using TextBlob, and the labeled text is then tested for accuracy, precision, recall, and F1 score using the Naive Bayes algorithm.
Previous research has shown that the Naïve Bayes method has a higher accuracy rate than K-NN. Naïve Bayes achieved an accuracy of 87.48%, while K-NN achieved 85.40% in sentiment analysis of explicit homosexual pornography tweets in Indonesia (Pudjajana et al., 2018). In another study, sentiment analysis on film reviews using the Naive Bayes classification algorithm based on Term Objects Keywords resulted in a final accuracy of 28% for the Naive Bayes classifier (Bilal et al., 2016), with a sample of 100 positive sentiment data and 100 negative sentiment data. Septian et al. (2019) conducted sentiment analysis of Twitter users' opinions on Indonesian football polemics using TF-IDF weighting and K-Nearest Neighbor. They achieved an optimal accuracy of 79.99% on 2,000 Indonesian-language tweets with k=23. Astari et al. (2020) conducted sentiment analysis on a virus-related topic using the Naive Bayes Classifier method. The Naive Bayes method achieved an accuracy of 67% and an error rate of 33%. Additionally, Dawid and Skene (1979) conducted sentiment analysis on the relocation of the Indonesian capital on Twitter. The Bernoulli Naïve Bayes classification achieved an accuracy of 68.10% using scenario 11. The present study differs from these previous studies in the keywords used (flagship smartphones), the sentiment labeling using Python TextBlob, and the data sources, which were obtained by crawling Twitter, YouTube, and the GSMArena website.

METHODS
Methods in research are necessary to provide structure and ensure that the results align with the research objectives (De and Van, 2009). The stages of the research method are shown in Figure 1.

Crawling Data
Crawling is the activity of creating relevant copies of content taken from the World Wide Web (Hidayatullah et al., 2020). In this study, crawling was conducted on three different sources: Twitter, YouTube, and the GSMArena website. The crawled data pertains to the Samsung S22 Ultra and Xiaomi 12 Pro, and the crawling process was conducted on December 19, 2022 (Hayuningtyas, 2019). The raw datasets were then saved in CSV (Comma Separated Values) format for further processing. Crawling for the keyword "Samsung S22 Ultra" on Twitter, YouTube, and GSMArena returned 381, 220, and 2,605 results, respectively. Crawling for the keyword "Xiaomi 12 Pro" on Twitter, YouTube, and GSMArena returned 199, 226, and 725 results, respectively.

Preprocessing Data
After the data has been crawled and saved in CSV format, the next step is data preprocessing, as the dataset is still unstructured. The main task of data preprocessing is to remove and handle data noise to achieve optimal calculation results (Kotsiantis et al., 2006). The stages of data preprocessing are data cleaning, case folding, tokenizing, and filtering. These stages are illustrated in Figure 2 (Nurkholis et al., 2022).

Figure 2. Preprocessing data stages
The first stage of data preprocessing is data cleaning, which is necessary to remove noise from the data. The cleaning process involves removing special characters, punctuation marks, hashtags, URLs, and numbers from the data. The next stage is case folding, which is the process of converting all letters to lowercase. Following that, tokenizing is performed, which involves breaking down sentences into smaller units or tokens. The final stage of data preprocessing is filtering, which involves removing meaningless words (stopwords) (Kalaivani and Marivendan, 2021).
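The four stages above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the regular expressions, the choice to drop whole hashtags, and the tiny stopword set are all assumptions made for the example.

```python
import re

# Tiny illustrative stopword set (assumption); the study does not name
# the stopword list it used.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "it", "this", "in", "on"}


def clean(text):
    """Data cleaning: drop URLs, hashtags, numbers, punctuation, special characters."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"#\w+", " ", text)                   # whole hashtags (assumption)
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # numbers, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse leftover whitespace


def preprocess(text):
    """Run cleaning, case folding, tokenizing, and filtering in that order."""
    cleaned = clean(text)
    folded = cleaned.lower()          # case folding
    tokens = folded.split()           # whitespace tokenizing
    return [tok for tok in tokens if tok not in STOPWORDS]  # stopword filtering


sample = "The camera on this phone is AMAZING!!! #Xiaomi https://example.com/review 10/10"
print(preprocess(sample))
```

A production pipeline would more likely rely on a full stopword list and a language-aware tokenizer; the whitespace split here only keeps the sketch dependency-free.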

TextBlob Labeling
Figure 3. TextBlob labeling stages
Figure 3 shows the automatic sentiment labeling performed using the TextBlob library. It is important to note that the TextBlob library can only recognize English. Sentiments are categorized as negative, positive, or neutral during this stage of the process. Each text is assigned a value of "+1" for positive sentiment, "-1" for negative sentiment, and "0" for neutral sentiment.
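The labeling step can be sketched as below. The source does not state how TextBlob's output is mapped to the three labels, so this sketch assumes the common convention of thresholding the polarity score (a float in the range -1.0 to 1.0) at zero. The helper names `label_from_polarity` and `label_text` are hypothetical.

```python
def label_from_polarity(polarity):
    """Map a polarity score to +1 (positive), -1 (negative), or 0 (neutral).

    Thresholding at zero is an assumption; the study does not give its rule.
    """
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0


def label_text(text):
    """Label one English text using TextBlob's default sentiment analyzer."""
    # Imported here so the mapping helper above stays usable without TextBlob.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)


# Example mapping on hand-picked polarity values (not TextBlob output).
print([label_from_polarity(p) for p in (0.6, 0.0, -0.25)])
```

Calling `label_text` on each preprocessed review would then produce the label column used in the classification stage.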

Naive Bayes Algorithm Test
The Naive Bayes method is a simple statistical method based on Bayes' theorem that assumes the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features (Jadhav and Channe, 2016). The Naive Bayes method is commonly used for classification tasks, with results evaluated by F1 score, accuracy, recall, and precision (Anjasmoros et al., 2020). In this study, the author utilized the Sklearn library to implement the Naive Bayes formula.

RESULTS AND DISCUSSION
Figure 4 shows the sentiment processing results. The results for the keyword Samsung S22 Ultra indicate 1,552 positive sentiments, 1,019 neutral sentiments, and 635 negative sentiments, as shown in Table 1. The results for the keyword Xiaomi 12 Pro indicate 568 positive sentiments, 380 neutral sentiments, and 203 negative sentiments, as shown in Table 2. After crawling, preprocessing, automatic labeling with TextBlob, and classification with Naive Bayes, the next step is to measure the accuracy, precision, recall, and F1 score using the Sklearn library in the Python programming language. The data is split into 20% testing data and 80% training data. The Naive Bayes testing scores are obtained for each brand (Samsung, Xiaomi), and the calculated values are presented in Table 3.
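The evaluation described above can be sketched with scikit-learn as follows. Several details are assumptions, since the source does not specify them: the bag-of-words features (`CountVectorizer`), the `MultinomialNB` variant, the weighted averaging of precision, recall, and F1, and the fixed `random_state`. The toy texts and labels are invented for illustration and are not the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def evaluate_naive_bayes(texts, labels, test_size=0.2, random_state=42):
    """Train Naive Bayes on an 80/20 split and return the four test metrics."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state
    )
    vectorizer = CountVectorizer()
    train_vectors = vectorizer.fit_transform(x_train)  # fit vocabulary on training data only
    test_vectors = vectorizer.transform(x_test)

    model = MultinomialNB()
    model.fit(train_vectors, y_train)
    predicted = model.predict(test_vectors)

    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy corpus with labels in the +1 / 0 / -1 scheme.
toy_texts = [
    "great camera love it", "battery drains fast bad", "arrived today",
    "amazing display great", "terrible heating bad", "comes in black",
    "love the zoom great", "bad software bugs", "released this year",
    "great battery love", 
]
toy_labels = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1]

results = evaluate_naive_bayes(toy_texts, toy_labels)
print(results)
```

With weighted averaging, recall always equals accuracy, which is consistent with the matching recall and accuracy values reported for both keywords; this is an observation about the metric, not confirmation of the averaging the authors used.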

Net Brand Reputation
Net Brand Reputation (NBR) is a method used to assess the reputation of a brand across various social media platforms. The application of NBR essentially measures the level of customer loyalty. The formula for calculating NBR is as follows:

$$\mathrm{NBR} = \frac{\text{Positive Sentiment} - \text{Negative Sentiment}}{\text{Positive Sentiment} + \text{Negative Sentiment}} \times 100$$
Applying the NBR formula above yields a score of 41.9% for Samsung and 47.3% for Xiaomi, so Xiaomi emerges as the preferred brand over Samsung.
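As a worked check, the NBR calculation can be reproduced from the positive and negative sentiment counts reported in this section. The function name `net_brand_reputation` is hypothetical; the counts are those given above for each keyword.

```python
def net_brand_reputation(positive, negative):
    """Return NBR as a percentage: (positive - negative) / (positive + negative) * 100."""
    total = positive + negative
    if total == 0:
        raise ValueError("NBR is undefined without positive or negative sentiments")
    return (positive - negative) / total * 100


# Positive and negative counts reported for each keyword.
samsung_nbr = net_brand_reputation(positive=1552, negative=635)
xiaomi_nbr = net_brand_reputation(positive=568, negative=203)

print(f"Samsung S22 Ultra NBR: {samsung_nbr:.1f}%")  # 41.9%
print(f"Xiaomi 12 Pro NBR: {xiaomi_nbr:.1f}%")       # 47.3%
```

Neutral sentiments do not enter the formula, so a brand's NBR depends only on the balance between its positive and negative mentions.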

CONCLUSION
From the conducted research, it can be concluded that sentiment analysis is a valuable tool for gaining insight into the overall sentiment of the global public towards flagship smartphones from renowned brands, such as the Samsung S22 Ultra and Xiaomi 12 Pro. The aim is to determine the level of positivity, neutrality, or negativity towards these two smartphone keywords. After automatic labeling using Python TextBlob on a corpus of 4,357 data items in total, the percentages of positive, negative, and neutral sentiments for the Samsung S22 Ultra are 48.4%, 19.8%, and 31.7%, respectively. For the Xiaomi 12 Pro, the percentages of positive, negative, and neutral sentiments are 49.3%, 17.6%, and 33%, respectively.
The testing and training data were used with a ratio of 0.2 and 0.8, respectively, for the Naive Bayes algorithm. The results yielded an accuracy of 66%, precision of 66%, recall of 66%, and an F1 score of 63% for the Samsung S22 Ultra keyword. For the Xiaomi 12 Pro keyword, the Naive Bayes algorithm resulted in an accuracy of 64%, precision of 62%, recall of 64%, and an F1 score of 61%. Additionally, the Net Brand Reputation was calculated, yielding a score of 41.9% for Samsung and 47.3% for Xiaomi. Based on these calculations and findings, it can be concluded that the public is more interested in the Xiaomi brand than in Samsung in terms of flagship smartphones, as Xiaomi has a better reputation score than Samsung.

AUTHOR'S NOTE
The authors state that there are no conflicts of interest related to the publication of this article. The authors also ensure that the paper is free from plagiarism.

Table 1. Samsung S22 Ultra sentiment results

Table 3. Naive Bayes test results