What is the Naive Bayes Algorithm?
The naive Bayes model, despite the strong independence assumptions it makes, is often used in practice because of its simplicity and the small number of parameters it requires. The model is generally used for classification: deciding, based on the values of the evidence variables for a given instance, the class to which the instance most likely belongs.
The Naive Bayes classifier is based on Bayes' theorem, which provides a way to update the probability of an event as new evidence is introduced. It is a probabilistic algorithm: it calculates the probability of each class (or tag) for a given input and outputs the class with the highest probability. Naive Bayes is not a single algorithm but a family of machine learning algorithms that share the assumption of conditional independence between features, which makes them easy to implement and far more efficient to run than a full Bayesian model.
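Formally, Bayes' theorem, written in the same notation used in the worked example below, is:

P(c|x) = P(x|c) * P(c) / P(x)

where c is a class, x is the observed evidence, P(x|c) is the likelihood of the evidence given the class, P(c) is the prior probability of the class, and P(x) is the probability of the evidence. The "naive" part is the assumption that the features are conditionally independent given the class, so P(x1, x2, ..., xn|c) is simply the product P(x1|c) * P(x2|c) * ... * P(xn|c).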
Working of Naive Bayes: Example
Classification
Suppose we have a dataset in which we record the outlook, the humidity, and the wind for each day, and we need to predict whether we should play on that day. The outlook can be sunny, overcast, or rainy; the humidity is either high or normal; and the wind falls into two categories, weak or strong.
Dataset
| Day | Outlook | Humidity | Wind | Play |
|-----|----------|----------|--------|------|
| D1 | Sunny | High | Weak | No |
| D2 | Sunny | High | Strong | No |
| D3 | Overcast | High | Weak | Yes |
| D4 | Rain | High | Weak | Yes |
| D5 | Rain | Normal | Weak | Yes |
| D6 | Rain | Normal | Strong | No |
| D7 | Overcast | Normal | Strong | Yes |
| D8 | Sunny | High | Weak | No |
| D9 | Sunny | Normal | Weak | Yes |
| D10 | Rain | Normal | Weak | Yes |
| D11 | Sunny | Normal | Strong | Yes |
| D12 | Overcast | High | Strong | Yes |
| D13 | Overcast | Normal | Weak | Yes |
| D14 | Rain | High | Strong | No |
Frequency tables for each attribute of the data set, counting how often each value occurs with Play = Yes and Play = No, are given as:

| Outlook | Yes | No |
|----------|-----|----|
| Sunny | 2 | 3 |
| Overcast | 4 | 0 |
| Rain | 3 | 2 |

| Humidity | Yes | No |
|----------|-----|----|
| High | 3 | 4 |
| Normal | 6 | 1 |

| Wind | Yes | No |
|--------|-----|----|
| Weak | 6 | 2 |
| Strong | 3 | 3 |

Overall, Play = Yes on 9 of the 14 days and Play = No on 5 of the 14 days. From these frequency tables we can read off the likelihoods and priors needed by Bayes' theorem, and then compute the posterior probability of each class for each attribute value:
Attribute: Outlook: Sunny
P(x|c): P(Sunny|Yes) = 2/9 ≈ 0.22, P(Sunny|No) = 3/5 = 0.60
P(x): P(Sunny) = 5/14 ≈ 0.36
P(c): P(Yes) = 9/14 ≈ 0.64, P(No) = 5/14 ≈ 0.36
Posterior probability of “Yes” given sunny is:
P(c|x) = P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
= 0.22 x 0.64 / 0.36 = 0.40
Posterior probability of “No” given sunny is:
P(c|x) = P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
= 0.60 x 0.36 / 0.36 = 0.60
Attribute: Humidity: High
P(x|c): P(High|Yes) = 3/9 ≈ 0.33, P(High|No) = 4/5 = 0.80
P(x): P(High) = 7/14 = 0.50
P(c): P(Yes) = 9/14 ≈ 0.64, P(No) = 5/14 ≈ 0.36
Posterior probability of “Yes” given high humidity is:
P(c|x) = P(Yes|High) = P(High|Yes) * P(Yes) / P(High)
= 0.33 x 0.64 / 0.50 = 0.43
Posterior probability of “No” given high humidity is:
P(c|x) = P(No|High) = P(High|No) * P(No) / P(High)
= 0.80 x 0.36 / 0.50 = 0.57
Attribute: Wind: Weak
P(x|c): P(Weak|Yes) = 6/9 ≈ 0.67, P(Weak|No) = 2/5 = 0.40
P(x): P(Weak) = 8/14 ≈ 0.57
P(c): P(Yes) = 9/14 ≈ 0.64, P(No) = 5/14 ≈ 0.36
Posterior probability of “Yes” given weak wind is:
P(c|x) = P(Yes|Weak) = P(Weak|Yes) * P(Yes) / P(Weak)
= 0.67 x 0.64 / 0.57 = 0.75
Posterior probability of “No” given weak wind is:
P(c|x) = P(No|Weak) = P(Weak|No) * P(No) / P(Weak)
= 0.40 x 0.36 / 0.57 = 0.25
Suppose we have a day with the following values:
Outlook: Rain
Humidity: High
Wind: Weak
Play:?
Likelihood of “Yes” on that day:
P(Outlook = Rain|Yes) * P(Humidity = High|Yes) * P(Wind = Weak|Yes) * P(Yes)
= 0.33 x 0.33 x 0.67 x 0.64 ≈ 0.047
Likelihood of “No” on that day:
P(Outlook = Rain|No) * P(Humidity = High|No) * P(Wind = Weak|No) * P(No)
= 0.40 x 0.80 x 0.40 x 0.36 ≈ 0.046
Normalizing the two scores:
P(Yes) = 0.047 / (0.047 + 0.046) ≈ 0.51
P(No) = 0.046 / (0.047 + 0.046) ≈ 0.49
Thus, the model predicts Play = Yes: there is roughly a 51% chance of a game on that day.
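To make the arithmetic concrete, here is a minimal sketch of the same hand calculation in plain Python; the counts come straight from the frequency tables above, and the variable names are chosen only for illustration:

# priors and likelihoods read off the frequency tables (14 days: 9 Yes, 5 No)
prior = {"Yes": 9 / 14, "No": 5 / 14}
likelihood = {
    "Yes": {"Outlook=Rain": 3 / 9, "Humidity=High": 3 / 9, "Wind=Weak": 6 / 9},
    "No": {"Outlook=Rain": 2 / 5, "Humidity=High": 4 / 5, "Wind=Weak": 2 / 5},
}

# unnormalized score for each class: prior times the product of the likelihoods
scores = {}
for label in prior:
    score = prior[label]
    for p in likelihood[label].values():
        score *= p
    scores[label] = score

# normalize so the two scores sum to 1
total = sum(scores.values())
for label in scores:
    print(label, round(scores[label] / total, 2))  # Yes -> 0.51, No -> 0.49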
Data Science & Machine Learning: Naive Bayes in Python
Steps involved in the Naive Bayes Algorithm
EXAMPLE: PIMA INDIANS DIABETES DATASET
The problem comprises 768 observations of medical details from patients' records. Each record describes measurements taken from a patient, such as age, the number of times pregnant, and blood test results. All the attributes are numeric, and their units vary from attribute to attribute. Each record also has a class value that indicates whether the patient developed diabetes within five years. The whole process can be broken down into the following steps:
Step 1: Handling Data
Data is loaded from the CSV file and split into a training set and a test set.
Step 2: Summarizing the Data
Summarize the attributes in the training data set so that we can calculate probabilities and make predictions.
Step 3: Making a Prediction
A single prediction is made for one data instance using the summaries of the training data set.
Step 4: Making all the Predictions
Generate a prediction for every instance in the test data set using the training summaries.
Step 5: Evaluate Accuracy
Evaluate the accuracy of the model as the percentage of correct predictions out of all predictions made on the test data set.
Step 6: Tying it all Together
Finally, we tie all the steps together to form our own Naive Bayes classifier.
Code
import csv
import random
import math
import numpy as np
def load_csv(filename):
    """
    :param filename: name of csv file
    :return: data set as a 2 dimensional list where each row is a list
    """
    lines = csv.reader(open(filename, 'r'))
    dataset = list(lines)
    for i in range(len(dataset)):
        # convert every value in the row to a float
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
# data = load_csv('pima-indians-diabetes.data.csv')
# print(data)
def split_dataset(dataset, ratio):
    """
    split dataset into training and testing
    :param dataset: Two dimensional list
    :param ratio: Percentage of data to go into the training set
    :return: Training set and testing set
    """
    size_of_training_set = int(len(dataset) * ratio)
    train_set = []
    test_set = list(dataset)
    while len(train_set) < size_of_training_set:
        index = random.randrange(len(test_set))
        train_set.append(test_set.pop(index))
    return [train_set, test_set]
# training_set, testing_set = split_dataset(data, 0.67)
# print(training_set)
# print(testing_set)
def separate_by_label(dataset):
    """
    :param dataset: two dimensional list of data values
    :return: dictionary where labels are keys and
             values are the data points with that label
    """
    separated = {}
    for x in range(len(dataset)):
        row = dataset[x]
        if row[-1] not in separated:
            separated[row[-1]] = []
        separated[row[-1]].append(row)
    return separated
# separated = separate_by_label(data)
# print(separated)
# print(separated[1])
# print(separated[0])
def calc_mean(lst):
    return sum(lst) / float(len(lst))

def calc_standard_deviation(lst):
    avg = calc_mean(lst)
    variance = sum([pow(x - avg, 2) for x in lst]) / float(len(lst) - 1)
    return math.sqrt(variance)
# numbers = [1, 2, 3, 4, 5]
# print(calc_mean(numbers))
# print(calc_standard_deviation(numbers))
def summarize_data(lst):
    """
    Calculate the mean and standard deviation for each attribute
    :param lst: list
    :return: list with mean and standard deviation for each attribute
    """
    summaries = [(calc_mean(attribute), calc_standard_deviation(attribute))
                 for attribute in zip(*lst)]
    # the last column is the class label, so drop its summary
    del summaries[-1]
    return summaries
# summarize_me = [[1, 20, 0], [2, 21, 1], [3, 22, 0]]
# print(summarize_data(summarize_me))
def summarize_by_label(data):
    """
    Method to summarize the attributes for each label
    :param data:
    :return: dict label: [(atr mean, atr stdv), (atr mean, atr stdv)....]
    """
    separated_data = separate_by_label(data)
    summaries = {}
    for label, instances in separated_data.items():
        summaries[label] = summarize_data(instances)
    return summaries
# fake_data = [[1, 20, 1], [2, 21, 0], [3, 22, 1], [4,22,0]]
# fake_summary = summarize_by_label(fake_data)
def calc_probability(x, mean, standard_deviation):
    """
    :param x: value
    :param mean: average
    :param standard_deviation: standard deviation
    :return: probability density of that value given a normal distribution
    """
    # exponent = e ^ -((x - mean)^2 / (2 * (standard deviation)^2))
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(standard_deviation, 2))))
    # (1 / (sqrt(2π) * standard deviation)) * exponent
    return (1 / (math.sqrt(2 * math.pi) * standard_deviation)) * exponent
# x = 57
# mean = 50
# stand_dev = 5
# print(calc_probability(x, mean, stand_dev))
def calc_label_probabilities(summaries, input_vector):
    """
    the probability of a given data instance is calculated by multiplying together
    the attribute probabilities for each class. The result is a map of class values
    to probabilities.
    :param summaries:
    :param input_vector:
    :return: dict
    """
    probabilities = {}
    for label, label_summaries in summaries.items():
        probabilities[label] = 1
        for i in range(len(label_summaries)):
            mean, standard_dev = label_summaries[i]
            x = input_vector[i]
            probabilities[label] *= calc_probability(x, mean, standard_dev)
    return probabilities
# fake_input_vec = [1.1, 2.3]
# fake_probabilities = calc_label_probabilities(fake_summary, fake_input_vec)
# print(fake_probabilities)
def predict(summaries, input_vector):
    """
    Calculate the probability of a data instance belonging
    to each label. We look for the largest probability and return
    the associated class.
    :param summaries:
    :param input_vector:
    :return:
    """
    probabilities = calc_label_probabilities(summaries, input_vector)
    best_label, best_prob = None, -1
    for label, probability in probabilities.items():
        if best_label is None or probability > best_prob:
            best_prob = probability
            best_label = label
    return best_label
# summaries = {'A': [(1, 0.5)], 'B': [(20, 5.0)]}
# input_vector = [1.1]
# print(predict(summaries, input_vector))
def get_predictions(summaries, test_set):
    """
    Make predictions for each data instance in our
    test dataset
    """
    predictions = []
    for i in range(len(test_set)):
        result = predict(summaries, test_set[i])
        predictions.append(result)
    return predictions
# summaries = {'A': [(1, 0.5)], 'B': [(20, 5.0)]}
# test_set = [[1.1], [19.1]]
# predictions = get_predictions(summaries, test_set)
# print(predictions)
def get_accuracy(test_set, predictions):
    """
    Compare predictions to class labels in the test dataset
    and get our classification accuracy
    """
    correct = 0
    for i in range(len(test_set)):
        if test_set[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(test_set))) * 100
# fake_testSet = [[1, 1, 1, 'a'], [2, 2, 2, 'a'], [3, 3, 3, 'b']]
# fake_predictions = ['a', 'a', 'a']
# fake_accuracy = get_accuracy(fake_testSet, fake_predictions)
# print(fake_accuracy)
def main(filename, split_ratio):
    data = load_csv(filename)
    training_set, testing_set = split_dataset(data, split_ratio)
    print("Size of Training Set: ", len(training_set))
    print("Size of Testing Set: ", len(testing_set))
    # create model
    summaries = summarize_by_label(training_set)
    # test model
    predictions = get_predictions(summaries, testing_set)
    accuracy = get_accuracy(testing_set, predictions)
    print('Accuracy: {0:.2f}%'.format(accuracy))

main('pima-indians-diabetes.data.csv', 0.70)
Pros of the Algorithm
- Naive Bayes is a fast and highly scalable algorithm.
- It can be used for both binary and multiclass classification. GaussianNB, MultinomialNB, and BernoulliNB are common variants of the algorithm; see the short sketch after this list.
- The algorithm depends on doing a bunch of counts.
- An excellent choice for Text Classification problems. It’s a popular choice for spam email classification.
- It can be easily trained on a small dataset.
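For reference, here is a minimal sketch of how the scikit-learn variants mentioned above are used, taking GaussianNB as the example; the tiny dataset below is purely illustrative:

from sklearn.naive_bayes import GaussianNB

# tiny illustrative dataset: two numeric features per row, binary labels
X = [[1.0, 20.0], [2.0, 21.0], [3.0, 22.0], [4.0, 22.0]]
y = [1, 0, 1, 0]

model = GaussianNB()
model.fit(X, y)                              # learns per-class means and variances
print(model.predict([[2.5, 21.5]]))          # predicted class for a new instance
print(model.predict_proba([[2.5, 21.5]]))    # class probabilities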
Cons of the Algorithm
- The “zero conditional probability problem”: if a given feature value and class never occur together in the training data, the conditional probability estimate for that combination is 0, and because class scores are products of probabilities, this wipes out the information carried by all the other probabilities. The Laplace (add-one) correction is one of the standard smoothing techniques used to fix this problem; a small sketch follows this list.
- Another con is the strong assumption that the features are conditionally independent given the class. Data sets that truly satisfy this assumption are nearly impossible to find in real life.
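As an illustration of the Laplace correction, here is a minimal sketch; the helper function name is made up for this example, but the formula (add 1 to each count and add the number of distinct feature values to the denominator) is the standard add-one smoothing:

def laplace_smoothed_likelihood(feature_class_count, class_count, n_feature_values):
    # P(feature value | class) with add-one (Laplace) smoothing
    return (feature_class_count + 1) / (class_count + n_feature_values)

# Example: Overcast never occurs with Play = No in the table above
# (0 of the 5 "No" days, with 3 possible outlook values).
print(laplace_smoothed_likelihood(0, 5, 3))  # 0.125 instead of 0.0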
Applications of Naive Bayes Algorithm
Some real-life scenarios where the Naive Bayes algorithm is used are:
- Text classification: Used as a probabilistic learning method for text classification. It is one of the most successful algorithms for classifying text documents, i.e., deciding whether a text document belongs to one or more categories.
- Spam filtration: Spam filtering, a special case of text classification, is a popular mechanism to distinguish legitimate email from spam. Many modern email services implement Bayesian spam filtering, and several server-side email filters, such as SpamBayes, SpamAssassin, DSPAM, ASSP, and Bogofilter, make use of this technique.
- Sentiment Analysis: It is used to analyze the tone of tweets, comments, and reviews, i.e., whether they are positive, neutral, or negative.
- Recommendation System: The Naive Bayes algorithm, combined with collaborative filtering, is used to build hybrid recommendation systems that predict whether a user would like a given resource or not.
Conclusion
Hopefully, you now understand what Naive Bayes is and how text classification makes use of it. This simple method works well for classification problems and, computationally speaking, it is also very cheap. Whether or not you are a machine learning expert, you now have the tools to build your own Naive Bayes classifier.
Where do you see this algorithm in use? Let us know in the comments below!