Segmentation of Credit Card Customers Based on Their Credit Card Usage Behavior Using the K-Means Algorithm

The intensity of credit card transactions in Indonesia has increased over the last 10 years. This is both a challenge and an opportunity for banks. Customer segmentation information is useful for reducing bad debts or increasing customers' credit card limits. This study aims to segment credit card customers based on their usage behavior with a clustering approach using the K-means algorithm, and it evaluates the segmentation results using the silhouette index. Based on the experimental results, six is the best number of clusters. The six groups are customers who like shopping, customers who pay at the due date, customers who pay in installments, customers who withdraw cash, customers who buy expensive goods, and customers who rarely use credit cards.


INTRODUCTION
The rapid development of digital finance goes hand in hand with a lifestyle in which people access relatively expensive products. Ease of payment is offered in the form of credit, so traders usually offer the option of paying in installments each period. One way to pay in installments is with a credit card (Kumar and Karlina, 2020). Credit card use in Indonesia has increased over the last 10 years, and many banks have become cautious in offering credit card limits because prospective debtors are unable to pay their debts (Johan and Dewi, 2021).
Among the many factors that determine whether debtors use credit cards, those that stand out include subjective norms, perceived behavioral control, and perceived benefits (Chien and Devaney, 2001). Impulse buying also influences credit card users (Cuandra and Kelvin, 2021). For students who use credit cards, there is a relationship between parental influence, financial knowledge, and the student's attitude toward using credit cards (Kashif et al., 2018). The number of cards owned is negatively related to age and positively related to income level (Jung and Kang, 2021). On the other hand, increasing compulsive shopping behavior has been found to be associated with high prestige and with a low understanding of finance (Khandelwal et al., 2021; Palan et al., 2011).
Several previous studies underlie this research, among others:
1. The research uses the Random Forest algorithm to predict customer satisfaction with credit card services (Yaseen et al., 2020).
2. The research aims to identify qualified credit card customers and to develop product quality by creating features that suit their needs, in order to increase customer satisfaction. This study used the k-means and C&RT algorithms (Hassani and Taati, 2020).
3. The research aims to segment customers based on credit card usage behavior in Africa using k-means clustering (Umuhoza et al., 2020).
4. The research aims to assist bank management in assessing credit card clients using bidirectional LSTM neural networks, modeling and predicting consumer behavior with respect to two aspects: the probability of single and of consecutive missed payments by credit card customers (Ala'raj et al., 2021).

METHODS
The research consists of several stages: data preparation, modeling and evaluation, cluster interpretation, and cluster visualization. The data were processed in Google Colab.

Data
The data used in this study were created in 2018 and are available on Kaggle (see https://www.kaggle.com/arjunbhasin2013/ccdata). The dataset comprises 8,950 rows (credit card holders) and 18 behavioral variables.

Balance: the balance left in the account for making purchases.
Balance Frequency: how frequently the balance is updated, scored between 0 and 1 (1 = updated frequently, 0 = not updated frequently).
Purchases: the amount of purchases made from the account.
One-Off Purchases: the maximum purchase amount made in a single transaction.
Installments Purchases: the amount of purchases made in installments.
Cash Advance: the amount of cash advances taken by the user.
Purchases Frequency: how frequently purchases are made, scored between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased).
One-Off Purchases Frequency: how frequently purchases are made in a single transaction (1 = frequently purchased, 0 = not frequently purchased).
Purchases Installments Frequency: how frequently installment purchases are made (1 = often done, 0 = not often done).
Cash Advance Frequency: how frequently cash advances are taken.
Cash Advance Trx: the number of transactions made with cash advances.
Purchases Trx: the number of purchase transactions made.
Credit Limit: the user's credit card limit.
Payments: the amount of payments made by the user.
Minimum Payments: the minimum amount of payments made by the user.
Prc Full Payment: the percentage of full payments made by the user.
Tenure: the length of the user's credit card service.

Data Preparation Phase
The data preparation stage is essential in segmentation (Ziafat and Shakeri, 2016). One of its goals is to standardize the dataset. In this research, several data preparation processes were carried out.
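The paper does not specify which standardization it applies. Assuming it means z-score scaling, which is common before K-means because the algorithm relies on Euclidean distance and the card variables mix currency amounts with 0 to 1 frequency scores, a minimal NumPy sketch (the function name and toy values are illustrative) is:

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract the column mean, divide by the column std.

    Assumes missing values were already imputed (NaNs would propagate).
    Uses the population std (ddof=0).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of NaN
    return (X - mean) / std

# Toy rows of (balance, purchases frequency); values are made up for illustration.
Z = standardize([[1000.0, 0.2], [2000.0, 0.5], [3000.0, 0.8]])
print(Z.round(3))
```

scikit-learn's `StandardScaler` performs the same population-std scaling and can be reused on new data via `transform`.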
Missing values: one problem in the dataset is missing data. The dataset has many missing values, so they require special handling. The most common method is to replace each missing value with the mean of its attribute (Yadav and Roychoudhury, 2018).
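Mean imputation as described above can be sketched with pandas. The column names are assumed to follow the Kaggle CSV headers (`CREDIT_LIMIT`, `MINIMUM_PAYMENTS`), and the toy values are made up:

```python
import numpy as np
import pandas as pd

def impute_with_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where each numeric column's NaNs are replaced by that column's mean."""
    numeric_cols = df.select_dtypes(include="number").columns
    filled = df.copy()
    # fillna with a Series fills each column using the value under its label.
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())
    return filled

toy = pd.DataFrame({
    "CREDIT_LIMIT": [1000.0, np.nan, 3000.0],
    "MINIMUM_PAYMENTS": [100.0, 200.0, np.nan],
})
clean = impute_with_mean(toy)
print(clean)
```

The column means are computed from the observed values only, because `mean()` skips NaNs by default.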
Outliers: the dataset contains many outlier values. Removing them would discard so many records that model performance would suffer (Ala'raj et al., 2021). To overcome this, values are grouped into ranges so that extreme values are handled without deleting records.
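The paper does not give the ranges it used, so the edges below are hypothetical. This sketch assumes that "creating a range" means replacing each raw value with the index of the interval it falls in, so that an extreme value lands in the last, open-ended interval instead of being deleted:

```python
import numpy as np

# Hypothetical interval edges (currency units); not taken from the study.
EDGES = [500, 1000, 3000, 5000, 10000]

def to_range_index(values, edges=EDGES):
    """Map each value to the index of its interval.

    Values below edges[0] map to 0; values >= edges[-1] map to len(edges),
    so extreme values share the top interval rather than being dropped.
    """
    return np.digitize(np.asarray(values, dtype=float), edges)

balances = [40.0, 750.0, 2500.0, 9000.0, 19043.0]
ranges = to_range_index(balances)
print(ranges)  # [0 1 2 4 5]
```

Capping values at a chosen percentile (for example with `numpy.clip`) serves the same goal; which variant the study used is not stated.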

Segmentation Phase
The credit card customer segmentation process uses a clustering approach. There are many types of clustering algorithms; however, partitioning clustering is considered the best approach for dividing data into k clusters (Yadav and Roychoudhury, 2018; Naeem and Wumaier, 2018). One of the best-known partitioning clustering algorithms is K-means.
The K-means algorithm is a non-hierarchical cluster analysis method that partitions objects into one or more groups so that objects with similar characteristics fall into the same cluster (Muthahharah and Juhari, 2021), while objects with different characteristics are grouped into other clusters (Khormarudin, 2016). In other words, K-means aims to minimize variation among data within a cluster and maximize variation between data in different clusters (Sulistiyawati and Supriyanto, 2021). Because the algorithm groups the data into k clusters without knowing the target class, it is a form of unsupervised learning (Urva, 2016). The value of k is chosen with the help of the elbow method (Bholowalia and Kumar, 2018). The next step is to evaluate the segmentation results using the Silhouette Index (Si). The stages follow Gustriansyah et al. (2019).
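To make the K-means objective concrete, the sketch below implements the standard Lloyd iteration in NumPy: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until the centroids stop moving. It illustrates the technique, not the study's own implementation (which is not stated); initializing from random data points is an assumption, whereas scikit-learn's `KMeans` defaults to k-means++ initialization.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-means. Returns (labels, centroids, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids; SSE is the within-cluster variation.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, sse

# Two obvious groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids, sse = kmeans(X, k=2)
print(labels, round(float(sse), 4))
```

The SSE returned here is the quantity K-means minimizes, and it is the same quantity the elbow method plots against k.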

RESULT AND DISCUSSION
This section presents and discusses the results of credit card customer segmentation, including determining the number of clusters, evaluating the number of clusters, and interpreting the best cluster.

Cluster determination
After the data preparation stage, we used the elbow method to estimate the best number of clusters for the K-means algorithm, which suggested k = 6. This value serves as the initial reference for the clustering process; however, other values of k were also tested.
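The elbow method computes the within-cluster sum of squared distances (exposed by scikit-learn as `inertia_`) for a range of k and looks for the point where adding clusters stops reducing it sharply. A sketch with scikit-learn, where synthetic blobs stand in for the preprocessed card data (the data, the k range 1 to 10, and `random_state` are assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def elbow_curve(X, k_values, seed=0):
    """Return the K-means inertia (within-cluster SSE) for each k."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]

# Synthetic stand-in for the standardized credit card features.
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=1.0, random_state=42)
ks = list(range(1, 11))
inertias = elbow_curve(X, ks)
for k, w in zip(ks, inertias):
    print(f"k={k:2d}  inertia={w:,.1f}")
```

The elbow is normally read off a plot of these values. Because inertia generally keeps falling as k grows, the choice is a heuristic, which is why a second criterion such as the silhouette index is useful.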

Evaluation of the number of clusters
This section evaluates each number of clusters tested, with k = 5, 6, 7, 8, 9, and 10. The best number of clusters is determined and interpreted based on the Si value (Table 2), using the following interpretation of Si:
0.71 < Si ≤ 1: the clusters obtained are strong
0.51 < Si < 0.71: the data can be clustered
0.25 < Si < 0.5: the clusters obtained are weak
Si < 0.25: no clusters found
Based on the experimental results, k = 6 has the highest Si value, so the best number of clusters is k = 6. This result confirms the elbow method, which recommends the same k. According to Table 2, k = 6 does not produce strong clusters: the data can be clustered, but the cluster members are not fully coherent. Testing further values of k does not necessarily yield clusters as good as those for k = 6, because for k > 6 the Si value tends to decrease.
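The selection rule in this section (compute Si for k = 5 to 10 and keep the k with the highest value) can be sketched with scikit-learn's `silhouette_score`. The thresholds in the illustrative `interpret_si` helper follow the Si interpretation ranges listed in this section; values falling exactly on a boundary are assigned to the lower category, which is a choice of this sketch. The synthetic data and `random_state` are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def interpret_si(si):
    """Map a silhouette value to the interpretation categories (boundaries go to the lower category)."""
    if si > 0.71:
        return "strong clusters"
    if si > 0.51:
        return "data can be clustered"
    if si > 0.25:
        return "weak clusters"
    return "no clusters found"

def evaluate_k(X, k_values, seed=0):
    """Return {k: mean silhouette value} for K-means with each k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores

X, _ = make_blobs(n_samples=600, centers=6, cluster_std=1.0, random_state=42)
scores = evaluate_k(X, range(5, 11))
best_k = max(scores, key=scores.get)
for k, si in scores.items():
    print(f"k={k:2d}  Si={si:.3f}  ({interpret_si(si)})")
print("best k:", best_k)
```

`silhouette_score` computes pairwise distances, so on the full 8,950-row dataset its `sample_size` argument can be used to evaluate a random subset.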

Best cluster interpretation
To visualize the clustering results, we use Principal Component Analysis (PCA) to project the data into two dimensions (Abdulhafedh, 2021). The visualization is shown in Figure 4.
• Red cluster: customers who make all types of purchases. In other words, this type of customer likes to shop.
• Dark blue cluster: customers who pay on time. Every credit card transaction is paid by its due date.
• Green cluster: customers who make every purchase in installments.
• Yellow cluster: customers who use credit cards to withdraw cash. In other words, they take on credit card debt.
• Orange cluster: customers who buy expensive, luxury goods.
• Purple cluster: customers who spend little. In other words, this type of customer rarely uses credit cards.
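A sketch of the visualization step: standardize, cluster with k = 6, then project onto the first two principal components with scikit-learn's `PCA` to obtain 2-D coordinates for a scatter plot colored by cluster. Reading per-cluster feature means, as in `profile` below, is one common way to arrive at descriptions like those above; the paper does not say how it interpreted the clusters, and the synthetic 17-feature data is an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric behavioral columns (illustrative only).
X_raw, _ = make_blobs(n_samples=500, n_features=17, centers=6, random_state=0)
X = StandardScaler().fit_transform(X_raw)

labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Project the standardized features onto the first two principal components.
coords = PCA(n_components=2).fit_transform(X)
plot_df = pd.DataFrame({"pc1": coords[:, 0], "pc2": coords[:, 1], "cluster": labels})

# Mean of each raw feature per cluster: a basis for naming the clusters.
profile = pd.DataFrame(X_raw).assign(cluster=labels).groupby("cluster").mean()
print(plot_df.head())
print(profile.round(2))
```

`plot_df` can be passed to any plotting library, for example matplotlib's `scatter` with `c=plot_df["cluster"]`. PCA is used only for display here; the clustering itself runs on all standardized features.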

CONCLUSION
Credit card customers can be segmented using a clustering approach with the K-means algorithm. In this research, customers were segmented based on their credit card usage behavior with the K-means algorithm, and PCA was used in the visualization process. The best number of clusters obtained is 6, with acceptable cluster quality: the data can be clustered, although the clusters are not strong. The six customer groups are customers who like shopping, customers who pay at the due date, customers who pay in installments, customers who withdraw cash, customers who buy expensive goods, and customers who rarely use credit cards. Future work could improve the algorithm and the handling of the many variables to obtain better clusters.

Table 3. Interpretation of Si value.