Handwritten Digit Recognition Using Machine Learning Algorithms

Handwritten character recognition is a practically important problem in pattern recognition. Applications of digit recognition include postal mail sorting, bank check processing, and form data entry. The main difficulty lies in developing an efficient algorithm that can recognize handwritten digits submitted by users through a scanner, tablet, or other digital device. This paper presents an approach to off-line handwritten digit recognition based on different machine learning techniques. The main objective of this paper is to ensure effective and reliable recognition of handwritten digits. Several machine learning algorithms (namely Multilayer Perceptron, Support Vector Machine, Naïve Bayes, Bayes Net, Random Forest, J48, and Random Tree) were used for digit recognition in the Waikato Environment for Knowledge Analysis (WEKA). The experimental results showed that the highest accuracy, 90.37%, was obtained by the Multilayer Perceptron.
Indonesian Journal of Science & Technology. © 2018 Tim Pengembang Jurnal UPI
Article History: Received 03 August 2017; Revised 05 January 2018; Accepted 05 February 2018; Available online 09 April 2018
Keywords: Pattern recognition, Handwritten recognition, Digit recognition, Machine learning, Off-line handwritten recognition, Machine learning algorithm.


INTRODUCTION
Intelligent image analysis is one of the most attractive research areas in artificial intelligence. One of its subjects is handwritten digit recognition, a well-researched subarea concerned with learning models that distinguish pre-segmented handwritten digits. It has been used widely in data mining, machine learning, and pattern recognition, along with many other disciplines of artificial intelligence (Watada & Pedrycz, 2008). Its main applications have proved effective in building decision systems that rival human performance and go well beyond classical artificial intelligence systems (Seewald, 2011). However, not all features of those specific models have been previously inspected.
Researchers in machine learning and data mining have devoted great effort to efficient approaches for learning recognition from data. In the twenty-first century, handwritten digit communication has its own standard. In daily life it is used mostly in conversation and in recording information to be shared with others. One of the challenges in handwritten character recognition lies in the variation and distortion of the handwritten character set, because different communities may use diverse handwriting styles and strokes to draw the same characters of their script.
Identifying the regions of a digit from which the best discriminating features can be extracted is one of the major tasks in digit recognition systems. To locate such regions, different kinds of region sampling techniques are used in pattern recognition (Das et al., 2012). The challenge in handwritten character recognition is mainly caused by the large variation in individual writing styles (Plamondon & Srihari, 2000). Hence, robust feature extraction is very important for improving the performance of a handwritten character recognition system. Handwritten digit recognition has recently attracted much attention in pattern recognition owing to its application in diverse fields. In the future, character recognition systems might serve as a cornerstone of a paperless environment by digitizing and processing existing paper documents.
Handwritten digit data are vague in nature because the lines are not always sharp and perfectly straight. The main goal of feature extraction in digit recognition is to remove redundancy from the data and obtain a more effective representation of the word image through a set of numerical attributes. It deals with extracting most of the essential information from the raw image data (AlKhateeb et al., 2011). In addition, the curves are not necessarily smooth like those of printed characters. Furthermore, characters can be drawn in different sizes and orientations, although they are assumed to be written on a guideline in an upright or downward position. Accordingly, an efficient handwritten recognition system can be developed by taking these limitations into account. Identifying handwritten characters can be quite exhausting; indeed, many people cannot always recognize their own handwriting. Hence, writers are constrained to write legibly if handwritten documents are to be recognized.
Before describing the method used in this research, previous work is reviewed. Pattern recognition, together with image processing, plays a compelling role in handwritten character recognition. A previous study (Tokas & Bhadu, 2012) describes several classes of feature extraction techniques, such as structural feature-based methods, statistical feature-based methods, and global transformation techniques. Statistical approaches are based on how the data are selected and use information about the statistical distribution of pixels in the image. Neves (2011) provided an offline handwritten digit recognition system; the authors claim superior performance for the method in experiments carried out on the NIST SD19 standard dataset. Another study covers the conversion of handwritten data into electronic data, the nature of handwritten characters, and a neural network approach to building a machine capable of recognizing handwritten characters (Perwej, 2012). Another study presents a comprehensive benchmark of handwritten digit recognition with various state-of-the-art approaches, feature representations, and datasets (Liu, et al., 2003). The relationship between training set size and accuracy/error, and the dataset independence of the trained models, have also been analysed (Firdaus, et al., 2017). A further paper introduced convolutional neural networks into handwritten digit recognition research and describes a system that can still be considered state of the art (LeCun, et al., 1998).
This paper presents an approach to offline handwritten digit recognition based on different machine learning techniques. The main objective of this paper is to ensure effective and reliable recognition of handwritten digits. Several machine learning algorithms (namely Multilayer Perceptron, Support Vector Machine, Naïve Bayes, Bayes Net, Random Forest, J48, and Random Tree) were used for digit recognition in the Waikato Environment for Knowledge Analysis (WEKA).

Multilayer Perceptron
A neural network-based classifier, the Multilayer Perceptron (MLP), is used to classify the handwritten digits. An MLP consists of three types of layers: input, hidden, and output. Each layer has a certain number of nodes, also called neurons, and each node in a layer is connected to every node in the next layer (Bhowmik, et al., 2014). Because signals flow only from one layer to the next, it is also known as a feed-forward network. The number of nodes in the input layer depends on the number of attributes in the dataset. The number of nodes in the output layer depends on the number of classes in the dataset. The appropriate number of hidden layers, or of nodes in a hidden layer, is hard to determine for a specific problem; in general, these numbers are selected experimentally. In a multilayer perceptron, each connection between two nodes carries a weight. During training, the network learns the weight adjustment that corresponds to each connection (Kruse, et al., 2013). For learning, it uses a supervised technique called the backpropagation algorithm.
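The forward pass and the backpropagation weight update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the WEKA implementation: it assumes one hidden layer, sigmoid activations, a squared-error loss, and the XOR toy problem in place of the digit data. The class name `TinyMLP`, the learning rate, and the layer sizes are hypothetical choices.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, sigmoid activations, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights of one node; the last entry is the bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Output deltas for a squared-error loss with sigmoid outputs
        d_out = [(y[k] - target[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
        # Hidden deltas: propagate the output deltas back through w2
        d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v

def mean_squared_error(net, data):
    return sum((net.forward(x)[1][0] - t[0]) ** 2 for x, t in data) / len(data)

# Hypothetical toy data (XOR), not the paper's digit set
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
net = TinyMLP(n_in=2, n_hidden=4, n_out=1)
initial_loss = mean_squared_error(net, data)
for _ in range(5000):
    for x, t in data:
        net.train_step(x, t)
final_loss = mean_squared_error(net, data)
```

For the 16x16 digit images used in this paper, the same structure would have 256 input nodes and 10 output nodes, with the hidden layer size chosen experimentally.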

Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning method that aims to classify data points by maximizing the margin between classes in a high-dimensional space (Pereira, 2009). An SVM represents examples as points in space, mapped so that the examples of the separate classes are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall. The optimal classifier is developed through a "training" phase, in which training data are used to develop an algorithm able to discriminate between groups previously defined by the operator (e.g. patients vs. controls), and a "testing" phase, in which the algorithm is used to blind-predict the group to which a new observation belongs (Orru, et al., 2012). It also provides very accurate classification performance on the training records and produces enough search space for the accurate classification of future data. Hence, a series of parameter combinations should always be tried on at least a reasonable subset of the data. With SVM, it is always better to scale the data, because scaling greatly improves the results. Caution is needed with big datasets, however, as they may increase the training time.
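The two practical points above, scaling the data and separating classes by a wide margin, can be illustrated with a small sketch. It assumes a linear SVM trained by sub-gradient descent on the regularized hinge loss, which is a simple stand-in and not the solver of any particular library. The function names, toy data, learning rate, and regularization strength are hypothetical.

```python
def min_max_scale(rows):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]

def train_linear_svm(rows, labels, lr=0.1, reg=0.01, epochs=500):
    """Sub-gradient descent on the regularized hinge loss; labels are -1 or +1."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Weight decay from the regularizer, which favours a wide margin
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:  # point lies inside the margin or is misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify by the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: the second feature is on a much larger scale than the first
raw = [[1, 200], [2, 100], [3, 300], [7, 800], [8, 700], [9, 900]]
labels = [-1, -1, -1, 1, 1, 1]
scaled = min_max_scale(raw)
w, b = train_linear_svm(scaled, labels)
predictions = [predict(w, b, x) for x in scaled]
```

Without the scaling step, the second feature would dominate the dot product simply because of its units, which is one reason scaling tends to help.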

J48
The J48 algorithm was developed for the MONK project along with WEKA (Cooper & Herskovits, 1991). The algorithm is an extension of the C4.5 decision tree algorithm (Salzberg, 1994). J48 offers many options for tree pruning. The classification algorithms available in WEKA try to simplify, or prune, their results. Pruning helps produce more general results and can be used to correct potential overfitting. J48 classifies recursively until each leaf is pure, that is, until the classification fits the training data as closely as possible. This ensures accuracy on the training data, although excessive rules will be produced. Pruning, however, makes a model less accurate on the training data, because it relaxes the specificity of the decision tree in the hope of improving its performance on the test data. The overall idea is to gradually generalize a decision tree until it balances accuracy and flexibility. J48 applies two pruning methods. The first is known as subtree replacement: nodes in the decision tree can be replaced with a leaf, which reduces the number of tests along a particular path. This process starts from the leaves of the fully formed tree and works backwards toward the root. The second pruning method used in J48 is termed subtree raising: a node can be moved upwards towards the root of the tree, replacing other nodes along the way. Subtree raising often has a negligible effect on decision tree models.
There is generally no clear way to predict the usefulness of this option, though it may be worth turning it off if the induction process is taking a long time, because subtree raising can be somewhat computationally expensive. Error rates are needed to decide which parts of the tree to raise or replace. There are multiple ways to obtain them. The straightforward way is to reserve a portion of the training data to test the decision tree. The reserved portion can then be used as test data for the decision tree, helping to reduce potential overfitting. This method is known as reduced-error pruning. Though straightforward, it also decreases the overall volume of data available for training the model. For particularly small datasets, it may be advisable to avoid reduced-error pruning.
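Subtree replacement judged on a reserved portion of the data can be sketched as follows. This is an illustrative sketch only, not J48's code: the dictionary-based tree layout, the `majority` field, and the toy tree and held-out rows are assumptions made for the example.

```python
def predict(node, x):
    """Follow threshold tests from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def errors(node, rows):
    """Number of held-out rows the (sub)tree misclassifies."""
    return sum(1 for x, y in rows if predict(node, x) != y)

def prune(node, holdout):
    """Bottom-up subtree replacement using reduced-error pruning."""
    if "label" in node:
        return node
    left_rows = [(x, y) for x, y in holdout if x[node["feature"]] <= node["threshold"]]
    right_rows = [(x, y) for x, y in holdout if x[node["feature"]] > node["threshold"]]
    pruned = dict(node,
                  left=prune(node["left"], left_rows),
                  right=prune(node["right"], right_rows))
    leaf = {"label": node["majority"]}
    # Replace the subtree with a leaf when the leaf is no worse on held-out rows
    if errors(leaf, holdout) <= errors(pruned, holdout):
        return leaf
    return pruned

# Hypothetical fully grown tree; the right subtree contains an overfitted split
tree = {
    "feature": 0, "threshold": 5, "majority": "a",
    "left": {"label": "a"},
    "right": {
        "feature": 1, "threshold": 2, "majority": "b",
        "left": {"label": "b"},
        "right": {"label": "a"},
    },
}
# Hypothetical reserved rows: (feature vector, class label)
holdout = [([1, 0], "a"), ([2, 9], "a"), ([7, 1], "b"), ([8, 3], "b"), ([9, 1], "b")]
pruned_tree = prune(tree, holdout)
```

In this toy case the inner split is replaced by a single leaf, while the root test is kept because removing it would increase the held-out error.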

Random Forest Algorithm
A random forest (RF) is an ensemble of unpruned classification or regression trees, induced from bootstrap samples of the training data, using random feature selection in the tree induction process. For classification, the prediction is made by aggregating the predictions of the ensemble by majority voting. It provides a generalization error rate and is more robust to noise. Still, like most classifiers, RF may suffer from the curse of learning from an extremely imbalanced training set. Since it is constructed to minimize the overall error rate, it tends to focus on the prediction accuracy of the majority class, which often results in poor accuracy for the minority class.
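The three ingredients named above (bootstrap samples, random feature selection, and majority voting) can be shown in a compact sketch. To keep it short, each ensemble member here is a one-split decision stump rather than a full unpruned tree; that simplification, the function names, and the toy data are assumptions of this example.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Best single-threshold split on one feature (a stand-in for a full tree)."""
    best = None
    values = sorted({x[feature] for x, _ in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = Counter(y for x, y in rows if x[feature] <= threshold)
        right = Counter(y for x, y in rows if x[feature] > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        mistakes = (sum(left.values()) - left[left_label]
                    + sum(right.values()) - right[right_label])
        if best is None or mistakes < best[0]:
            best = (mistakes, threshold, left_label, right_label)
    if best is None:  # only one distinct value: always predict the majority class
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(rows) for _ in rows]  # sample with replacement
        feature = rng.randrange(n_features)           # random feature selection
        forest.append(fit_stump(bootstrap, feature))
    return forest

def predict(forest, x):
    """Majority vote over the ensemble members."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated toy data: (feature vector, class label)
rows = ([([i, 10 + i], "low") for i in range(5)]
        + [([20 + i, 40 + i], "high") for i in range(5)])
forest = fit_forest(rows)
```

With a heavily imbalanced training set, most bootstrap samples would be dominated by the majority class, which is the effect on minority-class accuracy described above.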

Naive Bayes
The Naive Bayes classifier offers a simple method of representing and learning probabilistic knowledge with clear semantics (Nguyen & Choi, 2008). It is termed naive because it relies on two important simplifying assumptions: that the predictive attributes are conditionally independent given the class, and that no hidden attributes influence the prediction process. It is a probabilistic classifier based on Bayes' theorem with strong (naive) independence assumptions. It is one of the most basic text classification approaches, with numerous applications in personal email sorting, email spam detection, sexually explicit content detection, document categorization, sentiment detection, and language detection (bin Othman & Yau, 2007). Despite its naive design and oversimplified assumptions, Naive Bayes performs well in many complicated real-world problems. Although it is often outperformed by other approaches such as boosted trees, Max Entropy, Support Vector Machines, and random forests, the Naive Bayes classifier is very attractive because it is less computationally intensive (in both memory and CPU) and needs only a small amount of training data. Moreover, the training time with Naive Bayes is considerably shorter than with alternative approaches.
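The conditional independence assumption translates directly into code: the score of a class is its prior multiplied by one independent factor per attribute. The sketch below assumes binary (on/off) attributes such as thresholded pixels and uses Laplace smoothing; both choices, along with the function names and toy rows, are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (binary feature list, label). Returns count tables."""
    n_features = len(rows[0][0])
    class_counts = Counter(label for _, label in rows)
    on_counts = defaultdict(lambda: [0] * n_features)
    for features, label in rows:
        for i, value in enumerate(features):
            on_counts[label][i] += value
    return class_counts, dict(on_counts)

def predict(model, features):
    class_counts, on_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(features):
            p_on = (on_counts[label][i] + 1) / (count + 2)  # Laplace smoothing
            # One independent factor per attribute, given the class
            score += math.log(p_on if value else 1 - p_on)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical three-pixel "images" with two classes
rows = [([1, 1, 0], "A"), ([1, 0, 0], "A"), ([1, 1, 0], "A"),
        ([0, 1, 1], "B"), ([0, 0, 1], "B"), ([0, 1, 1], "B")]
model = train_naive_bayes(rows)
```

Training is a single counting pass over the data, which is consistent with the short training times noted above.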

Bayes Net
Bayesian networks are a powerful probabilistic representation, and their use for classification has received considerable attention (Bouckaert, 1994). A Bayesian network reflects the states of some part of a world that is being modeled and describes how those states are related by probabilities. It is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. First, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Second, a Bayesian network can be used to learn causal relationships, and hence to gain understanding of a problem domain and to predict the consequences of intervention. This classifier learns from training data the conditional probability of each attribute given the class label (Buntine, 1991; online available at "Random Tree", 2015).
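The way the graph encodes probabilistic relationships can be stated compactly. The notation below is introduced here for illustration and is not taken from the paper: $X_1, \dots, X_n$ are the variables (the attributes together with the class variable $C$), and $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the graph. In the standard formulation, the joint distribution factorizes over the graph, and classification selects the most probable class given the observed attributes:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr),
\qquad
\hat{c} = \arg\max_{c} \; P(C = c \mid x_1, \dots, x_n)
```

Naive Bayes corresponds to the special case in which the class variable is the only parent of every attribute.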

Random Tree
The random tree algorithm can deal with both regression and classification problems. Random trees are an ensemble of tree predictors, called a forest. Classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of "votes". In the case of regression, the classifier response is the average of the responses over all the trees in the forest (Trier, et al., 1996). All the trees are trained with the same parameters but on different training sets. These sets are generated from the original training set using the bootstrap procedure: for each training set, the same number of vectors as in the original set is chosen at random, with replacement. That is, some vectors occur more than once and some are absent. In random trees, there is no need for accuracy estimation procedures such as cross-validation or bootstrap, or for a separate test set, to obtain an estimate of the training error. The error is estimated internally during training.
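The bootstrap procedure, and the reason an internal error estimate is possible, can be illustrated with a short sketch. Vectors that a given tree never sees (often called out-of-bag vectors) can serve as test cases for that tree. The function name and the seed are hypothetical; the set size is the training-set size reported in this paper.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
n = 1893                 # number of training samples reported in this paper
in_bag, out_of_bag = bootstrap_indices(n, rng)
oob_fraction = len(out_of_bag) / n
```

For large n, the expected fraction of absent vectors approaches 1/e, roughly 37%, so every tree leaves a sizeable share of the training set available for estimating its error.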

Dataset Description
Handwritten digit recognition is an extensive research topic, and a comprehensive survey of the area, including major feature sets, learning datasets, and algorithms, is available (Koerich, et al., 2003). It contrasts with optical character recognition, which focuses on machine-printed output, where special fonts can be used and the variability between characters with the same size, font, and font attributes is fairly small. Feature extraction and the classification technique play an important role in the performance of an offline character recognition system. Various feature extraction approaches have been proposed for character recognition systems (Rahman, et al., 2002). The problems faced in handwritten numeral recognition have been studied using techniques such as dynamic programming, HMM, neural networks, knowledge systems, and combinations of these techniques (Chandrasekaran, et al., 1984). Extensive work has been carried out on digit recognition in many languages, such as English, Chinese, Japanese, and Arabic. In India, work has mainly addressed Devanagari, Tamil, Telugu, and Bengali numeral recognition (Seewald, 2005).
In our experiment, we used the digit dataset provided by the Austrian Research Institute for Artificial Intelligence, Austria (see Figure 1). For this dataset, arbitrary scaling and a blur setting of 2.5 for the Mitchell down-sampling filter are indicated to perform well, and the images are down-sampled to 16x16 pixels. The dataset is divided into two parts, a training set and a test set. The training set has 1893 samples and the test set has 1796 samples. Details of the dataset are provided in (online available at "Weka Software Documentation", accessed August 2015).

EXPERIMENTAL TOOLS
WEKA is a prominent machine learning suite written in Java and developed at the University of Waikato. It is free software available under the GNU General Public License. It contains a collection of algorithms and visualization tools for predictive modelling and data analysis, along with graphical user interfaces for easy access to this functionality. It supports various standard data mining tasks, in particular data pre-processing, classification, visualization, clustering, feature selection, and regression. All of WEKA's techniques are predicated on the assumption that the data are available as a single flat file or relation, where each data point is described by a fixed number of attributes (Le, 2013).
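The flat-file assumption can be made concrete with WEKA's ARFF text format, in which a header declares a fixed list of attributes and each data row supplies one value per attribute. The sketch below writes such a file for 16x16 images, giving 256 numeric attributes plus a nominal class. The attribute names (`pixel0`, `pixel1`, ...), the relation name, and the all-zero example image are hypothetical, and the paper does not state how its own input file was laid out.

```python
def to_arff(relation, rows, n_pixels=256, classes="0123456789"):
    """Render (pixel list, digit label) rows as ARFF text."""
    lines = [f"@relation {relation}", ""]
    # One numeric attribute per pixel, then a nominal class attribute
    lines += [f"@attribute pixel{i} numeric" for i in range(n_pixels)]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for pixels, label in rows:
        if len(pixels) != n_pixels:
            raise ValueError("every instance needs the same number of attributes")
        lines.append(",".join(str(p) for p in pixels) + "," + label)
    return "\n".join(lines) + "\n"

# Hypothetical single blank image labelled as the digit 3
example = to_arff("digits", [([0] * 256, "3")])
```

Because every row must match the header, images of different sizes have to be brought to a common resolution before they can be loaded.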
WEKA has several user interfaces. Its main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line.
The Experimenter allows the systematic comparison of the predictive performance of WEKA's machine learning algorithms on a collection of datasets.

RESULTS AND DISCUSSION
WEKA has several graphical user interfaces that enable easy access to the underlying functionality. To gauge and investigate performance, the selected algorithms, namely Support Vector Machine, Multilayer Perceptron, Random Forest, Random Tree, Naïve Bayes, Bayes Net, and the J48 decision tree, are used. We followed the experimental procedure suggested by WEKA.
In WEKA, data records are called instances and the features of the data are called attributes. The experimental results are partitioned into several subdivisions for easier analysis and evaluation. In the first part, correctly and incorrectly classified instances are reported as counts and percentages, and the Kappa statistic, mean absolute error, and root mean squared error are reported as numeric values. The relative absolute error and root relative squared error are reported in percent (%) for reference and evaluation. Our simulation results are shown in Tables 1 and 2. Table 1 summarizes the results in terms of accuracy and the time taken for each simulation in our experiment. Table 2 shows the results in terms of the errors during the simulation in WEKA.
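The measures listed above can be computed from a confusion matrix and from predicted class probabilities. The sketch below is illustrative only: the confusion matrix and probabilities are made-up numbers, not results from this paper, and it assumes that the absolute and squared errors are taken between predicted class probabilities and 0/1 class indicators, averaged over all classes and instances. WEKA's exact normalization may differ.

```python
import math

def accuracy_and_kappa(confusion):
    """confusion[i][j]: instances of true class i predicted as class j."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

def mae_and_rmse(probabilities, true_classes):
    """Errors between predicted class probabilities and 0/1 indicators."""
    abs_sum = sq_sum = 0.0
    count = 0
    for probs, true in zip(probabilities, true_classes):
        for k, p in enumerate(probs):
            diff = p - (1.0 if k == true else 0.0)
            abs_sum += abs(diff)
            sq_sum += diff * diff
            count += 1
    return abs_sum / count, math.sqrt(sq_sum / count)

# Hypothetical two-class numbers for illustration
acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
mae, rmse = mae_and_rmse([[0.9, 0.1], [0.2, 0.8]], [0, 1])
```

Kappa corrects the raw accuracy for the agreement that would be expected by chance, which matters when the class frequencies are uneven.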
Based on Table 1, the highest accuracy is 90.37% and the lowest is 75.06%. The other algorithms yield an average accuracy of around 83.89%. The highest accuracy belongs to the Multilayer Perceptron classifier, followed by Support Vector Machine (87.97%), Random Forest (85.75%), Bayes Net (84.35%), and Naïve Bayes (81.85%).
Figure 1. A small portion of the handwritten dataset (example).
Based on (Ciregan, 2012), experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Their experiment is only sensitive to high-level concepts such as cat faces and human bodies. Multi-column deep neural networks for image classification have been presented in (Schmidhuber, 2015); they only improve the state of the art on a plethora of common image classification benchmarks. Supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks have been presented in (Das et al., 2014); that work proposed only how different techniques can be used for pattern recognition. In (Das et al., 2014), recognition of handwritten Bangla basic characters and digits using a convex hull-based feature set has been proposed (Liu, et al., 2013). Their experimental results show that, with a database of 10000 samples, a maximum recognition rate of 76.86% is observed for handwritten Bangla characters. Online and offline handwritten Chinese character recognition has been proposed in the literature (Zhang, et al., 2018); the highest reported test accuracy is 89.55% for offline recognition. In our experiment, different machine learning algorithms were used for handwritten digit recognition, and we obtained the highest accuracy of 90.37% with the Multilayer Perceptron.

CONCLUSION
This study investigated a representation of isolated handwritten digits that allows their effective recognition. Different machine learning algorithms were used for the recognition of handwritten numerals. In any recognition process, the important problem is to address feature extraction and correct classification. The proposed approach tries to address both factors and performs well in terms of accuracy and time complexity. The overall highest accuracy, 90.37%, is achieved by the Multilayer Perceptron. This work is an initial attempt, and its aim is to facilitate the recognition of handwritten numerals without using any standard classification techniques.

Table 1. Simulation result based on accuracy and time consumption.