Assignment - 4 (Machine Learning)

k-NN

Author: Jimut Bahan Pal

As usual, importing the necessary libraries

In [27]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
import matplotlib.patches as mpatches
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import math
%matplotlib inline
sns.set()

Part - 1

Generating artificial data using multivariate Gaussian distributions. For 2 features and 3 classes.

In [28]:
# Artificial data for 2 features and 3 classes

X1 = np.random.multivariate_normal(np.array([0,3]),np.array([[1,0],[0,1]]),200).reshape(200,2)
y1 = np.ones(200).reshape(200,1)

X2 = np.random.multivariate_normal(np.array([4,6]),np.array([[1,0],[0,1]]),200).reshape(200,2)
#X2 = [x + [2] for x in list(X2)]
y2 = np.ones(200).reshape(200,1)*2

X3 = np.random.multivariate_normal(np.array([0,8]),np.array([[1,0],[0,1]]),200).reshape(200,2)
#X3 = [x + [3] for x in list(X3)]
y3 = np.ones(200).reshape(200,1)*3

Plotting the data just for fun

In [29]:
fig = plt.figure(figsize=(16,9))
#ax = plt.axes(projection='3d')
#ax.scatter(full_dataset[:,0],full_dataset[:,1], c=full_dataset[:,2], cmap='viridis', linewidth=0.5);
plt.scatter(list(X1[:,0]), list(X1[:,1]), marker='^',color='red')
plt.scatter(list(X2[:,0]), list(X2[:,1]), marker='o',color='green')
plt.scatter(list(X3[:,0]), list(X3[:,1]), marker='*',color='blue')
#plt.scatter(list(X4[:,0]), list(X4[:,1]), marker='+',color='blue')
plt.show()

Part - 2

Segregating the generated data into training and test datasets (80% and 20%, respectively)

In [30]:
# Split the data into training/testing sets


X_train = np.append(X1[:-40],X2[:-40],axis=0)
X_train = np.append(X_train,X3[:-40],axis=0)

X_test = np.append(X1[-40:],X2[-40:],axis=0)
X_test = np.append(X_test,X3[-40:],axis=0)


y_train = np.append(y1[:-40],y2[:-40],axis=0)
y_train = np.append(y_train,y3[:-40],axis=0)


y_test = np.append(y1[-40:],y2[-40:],axis=0)
y_test = np.append(y_test,y3[-40:],axis=0)
In [31]:
print(len(X_train)," ",len(X_test))
print(len(y_train)," ",len(y_test))
480   120
480   120

Part - 3 and 4

Defining our own function for finding the Euclidean distance between training examples and an test example

$$ d(x^{(i)},x^{(j)}) = \sqrt{\sum_{p=1}^{D} (x_{p}^{(i)} - x_{p}^{(j)})^2} $$

Also, sorting the distances in ascending order and extracting the class information of the k closest examples in the same function

In [32]:
# Computing the euclidean distance and then assigning the most relevant class to the example

def predict_class(
   point_x,                # for the new point's x coordinate
   point_y,                # for the new point's y coordinate
   x_training_dataset,     # contains the training dataset for x values
   y_training_dataset,     # contains the training dataset for y values
   k_val):
    
    # distance between training examples and test example
    distance_class = []
    for item,class_item in zip(x_training_dataset,y_training_dataset):
        x_item = item[0]
        y_item = item[1]
        #print(x_item," ",y_item)
        #print(class_item)
        dist = math.sqrt((math.pow((point_x - x_item),2) + math.pow((point_y - y_item),2)))
        distance_class.append([dist,class_item])
    distance_class.sort()
    get_info = distance_class[:k_val]
    class_val = []
    for item in get_info:
        # get the class values
        class_val.append(int(item[1]))
    return class_val

Part - 5

Assigning the test example to the class with the highest number of occurrences in k nearest examples

In [33]:
class_val = predict_class(X_test[0][0],X_test[0][1],X_train,y_train,k_val=5)
#X_test[0][0]

# assign the most frequent class to the class for maximum value
max(set(class_val), key = class_val.count)
Out[33]:
1

Part - 6

Trying different values of k and computing the accuracy of the model using the test dataset

for K = 1 to 40

In [34]:
# iterating over all the test examples and then computing the accuracy of the model
# Accuracy = No. of correct predictions/ total number of predictions

acc_metrics = []
for k in range(1,40):
    cor_pred = 0
    for item1,item2 in zip(X_test,y_test):
        #print(item1,item2)
        class_val = predict_class(item1[0],item1[1],X_train,y_train,k_val=k)
        class_got = max(set(class_val), key = class_val.count)
        if class_got == int(item2):
            cor_pred += 1
        else:
            print("Wrong class predicted = ",class_got," instead of = ",int(item2))
    accuracy = cor_pred/len(X_test)
    print("K = ",k)
    print("Accuracy = ",accuracy)
    acc_metrics.append([k,accuracy])
Wrong class predicted =  3  instead of =  2
Wrong class predicted =  1  instead of =  3
K =  1
Accuracy =  0.9833333333333333
Wrong class predicted =  1  instead of =  2
Wrong class predicted =  1  instead of =  3
Wrong class predicted =  1  instead of =  3
K =  2
Accuracy =  0.975
Wrong class predicted =  3  instead of =  2
K =  3
Accuracy =  0.9916666666666667
K =  4
Accuracy =  1.0
Wrong class predicted =  3  instead of =  2
K =  5
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  6
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  7
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  8
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  9
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  10
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  11
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  12
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  13
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  14
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  15
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  16
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  17
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  18
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  19
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  20
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  21
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  22
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  23
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  24
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  25
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  26
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  27
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  28
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  29
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  30
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  31
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  32
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  33
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  34
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  35
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  36
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  37
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  38
Accuracy =  0.9916666666666667
Wrong class predicted =  3  instead of =  2
K =  39
Accuracy =  0.9916666666666667
In [35]:
#acc_metrics
sci_acc_metrics = []
for k in range(1,40):
    cor_pred = 0
    model_neigh = KNeighborsClassifier(n_neighbors=k)
    model_neigh.fit(X_train, y_train)
    predicted_values = model_neigh.predict(X_test)
    #print(predicted_values)
    for item1,item2 in zip(predicted_values,y_test):
        if item1 == int(item2):
            cor_pred += 1
        else:
            print("Wrong class predicted = ",item1," instead of = ",int(item2))
    accuracy = cor_pred/len(X_test)
    print("K = ",k)
    print("Accuracy = ",accuracy)
    sci_acc_metrics.append([k,accuracy])
#sci_acc_metrics
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
Wrong class predicted =  3.0  instead of =  2
Wrong class predicted =  1.0  instead of =  3
K =  1
Accuracy =  0.9833333333333333
Wrong class predicted =  1.0  instead of =  2
Wrong class predicted =  1.0  instead of =  3
Wrong class predicted =  1.0  instead of =  3
K =  2
Accuracy =  0.975
Wrong class predicted =  3.0  instead of =  2
K =  3
Accuracy =  0.9916666666666667
K =  4
Accuracy =  1.0
Wrong class predicted =  3.0  instead of =  2
K =  5
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  6
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  7
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  8
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  9
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  10
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  11
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  12
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  13
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  14
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  15
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  16
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  17
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  18
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  19
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  20
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  21
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  22
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  23
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  24
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  25
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  26
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  27
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  28
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  29
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  30
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  31
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  32
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  33
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  34
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  35
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  36
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  37
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  38
Accuracy =  0.9916666666666667
Wrong class predicted =  3.0  instead of =  2
K =  39
Accuracy =  0.9916666666666667
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
<ipython-input-35-9507eeea4e15>:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  model_neigh.fit(X_train, y_train)
In [36]:
print(len(sci_acc_metrics))
39
In [37]:
get_my = []
get_sci = []
for item1,item2 in zip(acc_metrics,sci_acc_metrics):
    get_my.append(item1[1])
    get_sci.append(item2[1])

Part - 7

Comparing our result with the k-NN algorithm of scikit-learn

In [38]:
fig,ax = plt.subplots(figsize=(16,9))
plt.plot(get_my,color='red')
plt.plot(get_sci,color='blue')
red_patch = mpatches.Patch(color='red', label='custom made kNN')
blue_patch = mpatches.Patch(color='blue', label='scikit learn kNN')
plt.legend(handles=[red_patch,blue_patch],loc="lower right", title="Legend")
ax.set_title('The plot for k values Vs Accuracy')
plt.ylabel('Accuracy ->')
plt.xlabel('k value ->') 
ax.grid(True)
plt.show()

We can clearly see that the accuracy obtained overlaps for different values of k for our algorithm and scikitlearn's implementation (so the red plot lies behind the blue plot), which shows the results are satisfactory.

References

  1. Dripta Maharaj, (2020), Slides, availabe on web https://sites.google.com/view/da220-2019-20 , last accessed on 16.2.2020.
  1. Scikitlearn Contributors, (2019), sklearn.neighbors.KNeighborsClassifier, https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html, last accessed on 16.2.2020.

Acknowledgements

  • Dripta Maharaj