**What is the Classification Algorithm?**

Our task in the analysis phase starts with identifying the target class, and this whole process is called classification. An algorithm is a procedure or formula for solving a problem in mathematics or computer science, based on carrying out a sequence of specified actions. A computer program can be viewed as an elaborate algorithm, and algorithms are used in almost all areas of information technology.

For example, consider a search engine algorithm: it takes search strings of keywords as input, matches them against the relevant web pages, and returns results accordingly. An encryption algorithm, such as the US Department of Defense's Data Encryption Standard (DES), uses a secret-key algorithm to protect data from being hacked or leaked, because leakage of a country's information can put it in danger. As long as the algorithm is sufficiently sophisticated, no one lacking the key can decrypt the secured data.

**Some of the Examples of the Target Class**

Analyzing buyer data to predict whether a customer will buy computer accessories (target class: yes or no)

Grouping and differentiating fruits based on their colour, taste, size, and weight (target class: apple, mango, litchi, cherry, papaya, orange, melon, and tomato)

Differentiating gender based on hair length (target class: male or female)

Now we will illustrate the concept of a classification algorithm by differentiating people according to gender based on their hair length (by no means am I trying to stereotype by gender; this is only for example's sake). We need a proper hair-length value to split on. Suppose the decision-boundary hair length is 25.0 cm; then if the hair length is greater than 25.0 cm, we classify the gender as female, and otherwise as male.
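This single-feature rule can be sketched as a tiny threshold classifier in Python. The 25.0 cm boundary comes from the example above; the sample values and the choice of which label sits above the boundary are illustrative:

```python
# Minimal threshold classifier for the hair-length example.
# The 25.0 cm decision boundary comes from the text; the data is made up.

def classify_by_hair_length(length_cm, boundary=25.0):
    """Label a sample 'female' if hair length exceeds the boundary, else 'male'."""
    return "female" if length_cm > boundary else "male"

samples = [10.0, 30.5, 25.0, 40.2]
predictions = [classify_by_hair_length(s) for s in samples]
print(predictions)  # ['male', 'female', 'male', 'female']
```

Real classifiers learn the boundary from labelled data rather than having it fixed by hand, but the prediction step works the same way.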

**Dataset Sources and Content**

The dataset contains salaries. The following describes our dataset:

- Number of classes: 2 (">50K" and "<=50K")
- Number of attributes (columns): 7
- Number of instances (rows): 48,842

This data was taken from the Census Bureau database.

**Explanation**

Two salary classes are taken into account: the first is greater than 50K, and the second is less than or equal to 50K. Given the 7 attributes (columns) and 48,842 instances (rows) drawn from the Census Bureau database, we can easily assign each person to one of the two salary groups based on those attributes. As a result, a great deal of manual calculation and labour can be avoided.
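As a small sketch of that grouping step, here is the two-class split on a handful of made-up records (the field names and values are hypothetical; the real dataset has 48,842 rows and 7 attributes):

```python
from collections import Counter

# Hypothetical mini-records standing in for the census income data.
records = [
    {"age": 39, "income": "<=50K"},
    {"age": 50, "income": "<=50K"},
    {"age": 38, "income": ">50K"},
    {"age": 53, "income": "<=50K"},
    {"age": 28, "income": ">50K"},
]

# Group the records into the two salary classes described above.
counts = Counter(r["income"] for r in records)
print(counts["<=50K"], counts[">50K"])  # 3 2
```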

**Applications of Classification Algorithms**

- Email spam classification
- Predicting whether bank customers will repay their loans
- Identifying cancer tumour cells
- Sentiment analysis
- Drug classification
- Facial keypoint detection
- Pedestrian detection in autonomous driving

**Types of Classification Algorithms**

Classification algorithms can be broadly categorized as follows:

- Linear classifiers
- Logistic regression
- Naïve Bayes classifier
- Fisher's linear discriminant
- Support vector machines
- Least squares support vector
- Quadratic classifiers
- Kernel estimation
- K-nearest neighbour
- Decision trees
- Random forests
- Neural networks
- Learning vector quantization

**Explanation of Some Important Types of Classification Algorithms**

**1. Logistic Regression**

Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It uses the logistic (sigmoid) function to estimate the probability that an observation belongs to a particular class.

**R-code**

```r
x <- cbind(x_train, y_train)

# train the model using the training set and check the score
logistic <- glm(y_train ~ ., data = x, family = "binomial")
summary(logistic)

# predict output
predicted <- predict(logistic, x_test)
```

There are several steps that can help us improve the model:

- Include interaction terms
- Remove features
- Use regularization techniques
- Use a non-linear model
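One of those steps, regularization, can be sketched with scikit-learn on synthetic data (the dataset and the `C` value here are illustrative; a smaller `C` means stronger regularization):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# L2-regularized logistic regression; smaller C = stronger regularization.
model = LogisticRegression(C=0.1, penalty="l2")
model.fit(X, y)

print(model.score(X, y))  # training accuracy on this synthetic data
```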

**Advantage**

- It is designed for classification and is most useful for understanding the influence of several independent variables on a single outcome variable.

**Disadvantages**

- It works only when the predicted variable is binary.


**2. Decision Trees**

The decision tree is a supervised learning algorithm used for classification problems. It splits the data into branches by asking a sequence of questions about the attribute values, until each leaf corresponds to a class label.

**R-code**

```r
library(rpart)

x <- cbind(x_train, y_train)

# grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)

# predict output
predicted <- predict(fit, x_test)
```

**Advantages**

- A decision tree is simple to understand and visualize, requires little data preparation, and can handle both numerical and categorical data.

**Disadvantages**

- It can create overly complex trees that do not generalize well to new data (overfitting).

**3. Naive Bayes Classifier**

The naïve Bayes classifier is based on Bayes' theorem and assumes independence between predictors.

It helps us calculate the posterior probability P(c|x) from P(c), P(x), and P(x|c):

P(c|x) = (P(x|c) × P(c)) / P(x)

Here,

P(c|x) is the posterior probability of the class (target) given the predictor (attribute), P(c) is the prior probability of the class, P(x|c) is the likelihood (the probability of the predictor given the class), and P(x) is the prior probability of the predictor.

**Example:**

Now we will classify, on the basis of the weather, whether the players will play or not.

**Step 1:** Convert the data set into a frequency table.

**Step 2:** Create a likelihood table by finding the probabilities: the probability of overcast is 0.29, and the probability of playing is 0.64.

**Step 3:** Calculate the posterior probability for each class using the naïve Bayes equation.
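Steps 1 and 2 can be sketched in Python with a made-up outlook column whose counts match the figures quoted in this article (14 days, 9 of them "yes", 4 of them overcast, 5 of them sunny):

```python
from collections import Counter

# Made-up weather/play data matching the counts quoted in the text.
weather = (["sunny"] * 3 + ["overcast"] * 4 + ["rainy"] * 2   # days with play = yes
           + ["sunny"] * 2 + ["rainy"] * 3)                   # days with play = no
play = ["yes"] * 9 + ["no"] * 5

# Step 1: frequency table of weather outlooks.
freq = Counter(weather)

# Step 2: likelihood-table entries.
p_overcast = freq["overcast"] / len(weather)
p_yes = play.count("yes") / len(play)
print(round(p_overcast, 2), round(p_yes, 2))  # 0.29 0.64
```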

**Example:**

A golf player will play if the weather is sunny. Is this statement correct?

We solve this using Bayes' theorem:

P(yes | sunny) = P(sunny | yes) × P(yes) / P(sunny)

Now, P(sunny | yes) = 3/9 = 0.33,

P(sunny) = 5/14 = 0.36,

P(yes) = 9/14 = 0.64.

So, P(yes | sunny) = 0.33 × 0.64 / 0.36 = 0.60

Since this is the higher probability, we predict that the player will play when the weather is sunny.
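The arithmetic in this worked example can be checked with a few lines of Python (the counts are taken directly from the text):

```python
# Posterior P(yes | sunny) via Bayes' theorem.
# Counts from the worked example: 3 of the 9 "yes" days were sunny,
# 5 of the 14 days were sunny, and 9 of the 14 days were "yes".
p_sunny_given_yes = 3 / 9
p_sunny = 5 / 14
p_yes = 9 / 14

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```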

**R-code**

```r
library(e1071)

x <- cbind(x_train, y_train)

# fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)

# predict output
predicted <- predict(fit, x_test)
```

**Advantages**

- This type of algorithm needs a small amount of training data to estimate the required parameters.
- This method is extremely fast compared to more sophisticated methods.

**Disadvantages**

- It is known to be a poor estimator of probabilities, even when its class predictions are good.

**4. SVM (Support Vector Machine)**

A support vector machine separates groups with different features by finding a dividing hyperplane. For example, if we had only two features, such as the height and hair length of an individual, we would first plot the data in two-dimensional space, where each point has two coordinates; the points lying closest to the separating boundary are known as support vectors.

**R-code**

```r
library(e1071)

x <- cbind(x_train, y_train)

# fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)

# predict output
predicted <- predict(fit, x_test)
```

**Advantages**

- It is effective in high-dimensional spaces and uses only a subset of the training points (the support vectors) in the decision function, so it is also memory efficient.

**Disadvantages**

- Its outputs can be complex and difficult to understand and analyze.

- It relies on the groupings and parameters specified by the user, even when these are not relevant to the problem.

**5. Stochastic Gradient Descent**

Stochastic gradient descent is used when the sample size is very large, because it updates the model from one training example (or a small batch) at a time instead of from the whole dataset.

**Python code**

```python
from sklearn.linear_model import SGDClassifier

sgd = SGDClassifier(loss="modified_huber", shuffle=True, random_state=101)
sgd.fit(x_train, y_train)

y_pred = sgd.predict(x_test)
```

**Advantages**

- Efficiency and ease of implementation.

**Disadvantages**

- It requires several hyper-parameters, and it is sensitive to feature scaling.
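Because SGD is sensitive to feature scaling, it is common to standardize the features before fitting; a minimal sketch with scikit-learn on synthetic data (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Standardizing the features first addresses SGD's sensitivity to scaling.
model = make_pipeline(StandardScaler(),
                      SGDClassifier(loss="modified_huber", random_state=101))
model.fit(X, y)

print(model.score(X, y))  # training accuracy
```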

**Conclusion**

Classification algorithms help in the effective analysis of buyer data to predict whether a customer will buy computer accessories. They also help in grouping items and differentiating inputs from one another, which saves a great deal of time and effort. As a result, analysis becomes easier, and classification helps speed up the decision-making process, which is vital for sustaining and growing a business in a highly competitive world.

