LinearClassification (SciPy Cookbook)

Fisher's Linear Discriminant

The first example implements Fisher's linear discriminant for a two-class problem; the algorithm is described in the book "Pattern Recognition and Machine Learning" by Christopher M. Bishop (Section 4.1, p. 186). The main idea is to reduce the dimensionality of the input vector x by projecting it onto a 1-D space with y = w.T x, where w is the weight vector, and to choose w so that the projection maximizes the separation between the classes; the optimal direction is w proportional to Sw^-1 (m2 - m1), where Sw is the within-class scatter matrix and m1, m2 are the class means (Bishop, eq. 4.30). The program below uses the famous Iris data set, which has 150 instances with 4 attributes each (a 4-D input space) and a target vector containing three labels: "Iris-setosa", "Iris-virginica" and "Iris-versicolor". There are therefore 3 classes, but here we treat "Iris-setosa" as class 1 and all remaining instances as class 2. The Iris data set is available here: http://archive.ics.uci.edu/ml/datasets/Iris/ or here (comma-separated format): bezdekIris.data.txt

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt

def read_data():
    f = open("Iris.txt", "r")
    lines = [line.strip() for line in f.readlines()]
    f.close()

    lines = [line.split(",") for line in lines if line]

    # class 1: Iris-setosa; class 2: all remaining instances
    class1 = np.array([line[:4] for line in lines if line[-1] == "Iris-setosa"], dtype=float)
    class2 = np.array([line[:4] for line in lines if line[-1] != "Iris-setosa"], dtype=float)

    return class1, class2


def main():
    class1, class2 = read_data()

    mean1 = np.mean(class1, axis=0)
    mean2 = np.mean(class2, axis=0)

    # within-class scatter matrix Sw (Bishop, eq. 4.28)
    Sw = np.dot((class1 - mean1).T, class1 - mean1) + np.dot((class2 - mean2).T, class2 - mean2)

    # Fisher solution: w proportional to Sw^-1 (m2 - m1) (Bishop, eq. 4.30)
    w = np.dot(np.linalg.inv(Sw), mean2 - mean1)

    print("vector of max weights", w)

    # project every sample onto w and plot the 1-D projections
    plt.plot(np.dot(class1, w), [0] * class1.shape[0], "bo", label="Iris-setosa")
    plt.plot(np.dot(class2, w), [0] * class2.shape[0], "go", label="Iris-versicolor and Iris-virginica")
    plt.legend()
    plt.show()

main()

(Figure: Fisher_disrciminant.JPG, the 1-D projections of the two classes onto w.)
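
The recipe only visualizes the projections; it does not classify new samples. One simple way to do that (an assumption of this sketch, not part of the original recipe) is to threshold the projection at the midpoint of the two projected class means:

import numpy as np

def classify(x, w, mean1, mean2):
    # w, mean1, mean2 as computed in main() above;
    # assumed decision rule: threshold at the midpoint of the projected means
    c = np.dot(w, (mean1 + mean2) / 2)
    return 2 if np.dot(w, x) > c else 1   # w points from class 1 towards class 2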

Probabilistic Generative Model

This program implements a probabilistic generative model for the K-class problem, also described in "Pattern Recognition and Machine Learning" by Christopher M. Bishop (Section 4.2, p. 196). We learn the class-conditional densities (likelihoods) p(x|Ck) and the prior probabilities p(Ck) for each class k, and then compute the posterior probabilities p(Ck|x) with Bayes' rule. Here we assume that each p(x|Ck) is a 4-D Gaussian with mean vector uk and covariance matrix Sk, and that p(Ck) = 1/3 for every k. The posterior can be written as a normalized exponential (softmax) of the quantities ak = ln(p(x|Ck) p(Ck)); the pc variables in the program hold the resulting posteriors, and if ak >> aj for all j != k, then p(Ck|x) is approximately 1 and p(Cj|x) approximately 0.
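
The listing below normalizes the densities directly, which works for this data set; when the densities are very small it is safer to evaluate the ak in log space. A minimal sketch of that variant (the function name and arguments are illustrative, not part of the recipe):

import numpy as np

def posteriors(log_px, log_prior):
    # a_k = ln p(x|C_k) + ln p(C_k)
    a = np.asarray(log_px) + np.asarray(log_prior)
    a -= a.max()            # subtract the maximum for numerical stability
    e = np.exp(a)
    return e / e.sum()      # softmax: p(C_k|x) for every class k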

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import math

def read_data():
    f = open("Iris.txt", "r")
    lines = [line.strip() for line in f.readlines()]
    f.close()

    lines = [line.split(",") for line in lines if line]

    data = np.array([line[:4] for line in lines], dtype=float)

    class1 = np.array([line[:4] for line in lines if line[-1] == "Iris-setosa"], dtype=float)
    class2 = np.array([line[:4] for line in lines if line[-1] == "Iris-virginica"], dtype=float)
    class3 = np.array([line[:4] for line in lines if line[-1] == "Iris-versicolor"], dtype=float)

    # 1-of-K target matrix: columns 0, 1, 2 = setosa, versicolor, virginica
    labels = [line[-1] for line in lines]
    t = np.zeros(shape=(len(data), 3))
    for i in range(len(data)):
        if labels[i] == "Iris-setosa":
            t[i][0] = 1
        elif labels[i] == "Iris-versicolor":
            t[i][1] = 1
        elif labels[i] == "Iris-virginica":
            t[i][2] = 1

    return class1, class2, class3, data, t


def gaussian(x, mean, cov):
    # density of a 4-D Gaussian with the given mean and covariance at x
    xm = np.reshape(x - mean, (-1, 1))
    px = 1 / math.pow(2.0 * math.pi, 2) * 1 / math.sqrt(np.linalg.det(cov)) \
        * math.exp(-float(np.dot(np.dot(xm.T, np.linalg.inv(cov)), xm)) / 2)
    return px


def main():
    class1, class2, class3, data, t = read_data()

    count = np.zeros(shape=(len(data), 1))
    t_assigned = np.zeros(shape=(len(data), 3))

    # maximum-likelihood estimates of the Gaussian parameters per class
    mean1 = class1.mean(axis=0)
    mean2 = class2.mean(axis=0)
    mean3 = class3.mean(axis=0)

    cov1 = np.cov(class1, rowvar=False)
    cov2 = np.cov(class2, rowvar=False)
    cov3 = np.cov(class3, rowvar=False)

    for i in range(len(data)):
        # prior p(Ck) = 1/3 times likelihood p(x|Ck)
        px1 = (1 / 3.0) * gaussian(data[i], mean1, cov1)
        px2 = (1 / 3.0) * gaussian(data[i], mean2, cov2)
        px3 = (1 / 3.0) * gaussian(data[i], mean3, cov3)

        # posterior p(Ck|x) by Bayes' rule
        s = px1 + px2 + px3
        pc1, pc2, pc3 = px1 / s, px2 / s, px3 / s

        # assign the class with the largest posterior
        # (t_assigned columns follow t: 0 = setosa, 1 = versicolor, 2 = virginica)
        if pc1 > pc2 and pc1 > pc3:
            t_assigned[i][0] = 1
        elif pc3 > pc1 and pc3 > pc2:
            t_assigned[i][1] = 1
        elif pc2 > pc1 and pc2 > pc3:
            t_assigned[i][2] = 1

        for j in range(3):
            if t[i][j] - t_assigned[i][j] != 0:
                count[i] = 1

    # plot the assigned label (1, 2 or 3) of every sample
    assigned = np.argmax(t_assigned, axis=1)
    for k, (style, name) in enumerate([("bo", "Iris-setosa"),
                                       ("go", "Iris-versicolor"),
                                       ("ro", "Iris-virginica")]):
        idx = np.where(assigned == k)[0]
        plt.plot(idx, assigned[idx] + 1, style, label=name)
    plt.legend()
    plt.show()

    print("number of misclassifications", int(np.sum(count)),
          "assigned labels to data points", t_assigned,
          "target data", t)

main()
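
As an aside for a SciPy cookbook: the hand-written gaussian() helper can be replaced by scipy.stats.multivariate_normal (available since SciPy 0.14), whose logpdf method is also more robust for very small densities. A sketch, reusing data[i], mean1 and cov1 from the listing above:

from scipy.stats import multivariate_normal

# equivalent of gaussian(data[i], mean1, cov1)
px1 = multivariate_normal.pdf(data[i], mean=mean1, cov=cov1)
# the same quantity in log space
log_px1 = multivariate_normal.logpdf(data[i], mean=mean1, cov=cov1)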

This program yields 3 misclassifications out of the 150 instances.
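
The misclassification count can also be checked with one vectorized comparison of the two 1-of-K matrices, assuming t and t_assigned as computed above:

import numpy as np

# rows disagree exactly when the argmax of the assigned and target rows differ
errors = np.sum(np.argmax(t_assigned, axis=1) != np.argmax(t, axis=1))
print("number of misclassifications", errors)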

(Figure: Probabilistic_model.PNG, the assigned class labels for all 150 samples.)
