lesson/3-NeuralNetworks/03-Perceptron

Classification

A perceptron can solve classification problems. We start with the simplest case: binary classification.

# Import the required libraries
import pylab
from matplotlib import gridspec
from sklearn.datasets import make_classification
import numpy as np
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import pickle
import os
import gzip

# pick the seed for reproducibility - change it to explore the effects of random variations
np.random.seed(1)
import random

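Building the data

To make things easier to follow, the tutorial gives the data a concrete meaning: two indicators, age and size, are used to decide whether a tumor is benign or malignant.
Our task is therefore to train a perceptron that, given a tumor's [age, size], decides whether the tumor is benign or malignant. The program below uses the labels +1 and -1 to tell the two classes apart.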

n = 50
X, Y = make_classification(n_samples = n, n_features=2,
                           n_redundant=0, n_informative=2, flip_y=0)
Y = Y*2-1 # convert initial 0/1 values into -1/1
X = X.astype(np.float32); Y = Y.astype(np.int32) # features - float, label - int

# Split the dataset into training and test
train_x, test_x = np.split(X, [ n*8//10])
train_labels, test_labels = np.split(Y, [n*8//10])
print(train_x.shape, train_labels.shape)
print("Features:\n",train_x[0:4])
print("Labels:\n",train_labels[0:4])

[Figure: classification data plot]
The plot shows the generated data points, colored by class.
(The plotting code is omitted here; utility code like this can simply be copied when you need it.)

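Perceptron

Stated abstractly: for an input vector x (in the tumor scenario above, a two-dimensional vector [age, size]), we need to find a matrix W and a function f such that, for any input x, the output is the class we want:
$$y(x) = f(W^Tx)$$

For binary classification, a natural choice for f is a piecewise (step) function:

$$ f(x) = \begin{cases} +1,\,\,x>0\\ -1,\,\,x\le0\\ \end{cases} $$

As for W, it can be seen as a transformation of x: it maps the boundary between the two classes onto the threshold that f expects (x == 0).

One question Suchan had at this point: $W^T x$ is still a vector, so how can it be fed into f?

In binary classification the output is one-dimensional, meaning f takes a single number, so W must be a vector: the product of a [1 * n] vector and an [n * 1] vector is a scalar. For a multi-class problem (for example, four classes over two-dimensional inputs), W becomes a matrix, and f needs an input of matching dimension to indicate which class the point belongs to.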
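To make the shapes concrete, here is a minimal sketch of the binary case, assuming NumPy arrays. The names `step` and `predict` and the sample weights are made up for illustration; they are not from the tutorial.

```python
import numpy as np

def step(z):
    """Threshold activation f: +1 where z > 0, otherwise -1."""
    return np.where(z > 0, 1, -1)

def predict(W, X):
    """Apply y = f(W^T x) to every row of X (shape [m, n]); W has shape [n]."""
    return step(X @ W)

# Hypothetical weights and inputs, chosen only for illustration
W = np.array([1.0, -2.0])
X = np.array([[3.0, 1.0],    # 1*3 - 2*1 =  1 -> +1
              [1.0, 1.0],    # 1*1 - 2*1 = -1 -> -1
              [2.0, 1.0]])   # 1*2 - 2*1 =  0 -> -1 (the boundary maps to -1)
print(predict(W, X))  # [ 1 -1 -1]
```

Each row of `X` is reduced to a single number by the dot product with `W`, which is why f only ever sees a scalar per sample.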

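In fact, a perceptron is linear, and besides W we usually also need a constant term (the bias), so the full expression is
$$y(x) = f(W^T x + b)$$
Readers familiar with linear algebra will recognize that the bias can be folded into W by adding one dimension. Here is an example:

$$ \begin{bmatrix} 1 & 2 \\ 3&4\end{bmatrix} \begin{bmatrix} 1 \\ 1\end{bmatrix} + \begin{bmatrix} 1 \\ 1\end{bmatrix} = \begin{bmatrix} 4 \\ 8\end{bmatrix} $$

which is equivalent to

$$ \begin{bmatrix} 1 & 2 & 1\\ 3&4& 1\end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1\end{bmatrix} = \begin{bmatrix} 4 \\ 8\end{bmatrix} $$

So we simply add one extra dimension to the data whose value is always 1, which turns the bias term into a plain matrix multiplication.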
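The same numbers can be checked in NumPy. This is only a sketch of the trick; the variable names (`W_aug`, `x_aug`, `X_aug`) and the small dataset at the end are illustrative assumptions.

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])
x = np.array([1.0, 1.0])

explicit = W @ x + b                       # W x + b with a separate bias

W_aug = np.hstack([W, b.reshape(-1, 1)])   # append b as an extra column of W
x_aug = np.append(x, 1.0)                  # append a constant 1 to x
folded = W_aug @ x_aug                     # bias is now part of the product

print(explicit, folded)  # [4. 8.] [4. 8.]

# Same idea for a whole dataset: add a column of ones to the feature matrix
X = np.array([[0.5, 1.5], [2.0, -1.0]])
X_aug = np.c_[X, np.ones(len(X))]
print(X_aug.shape)  # (2, 3)
```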

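Training the model

Training the model means finding a matrix W such that the results produced with this W have the smallest error on the training set.

Defining the loss function

For regression problems, the loss is usually the sum of the absolute differences between output and label, or the sum of their squares.

Regression: for a given input, output a value. Such models are typically used to predict quantities, such as stock prices.

For classification problems, the loss is usually cross-entropy (introduced later).
For the problem at hand, we exploit the fact that the label can only be +1 or -1: $t \in \{ +1 , -1 \} $
The loss is defined over the samples for which the model's output disagrees with the label (the sum runs over those samples only):

$$E(W) = - \sum W^T x t$$

Put simply, when output and label disagree, $W^T x$ and $t$ have opposite signs, so $W^T x t$ is not positive; the leading minus sign turns it into a non-negative value. The larger the loss, the worse the result.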
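One drawback is that this loss is hard to interpret: it tells you the current result is bad, but because its value scales with the magnitude of W it does not directly show how bad. What we usually want is something like a loss of 1 under $W_1$ and 0.5 after updating to $W_2$, which tells us to some extent whether training is making progress.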
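A minimal sketch of this loss in NumPy follows. The function name `perceptron_loss`, the toy data, and the choice to count points lying exactly on the boundary as errors are assumptions made for illustration.

```python
import numpy as np

def perceptron_loss(W, X, t):
    """E(W) = -sum of (W^T x) * t over the misclassified samples.

    X: [m, n] feature matrix, t: [m] labels in {+1, -1}, W: [n] weights.
    """
    margins = (X @ W) * t             # positive when output and label agree
    misclassified = margins <= 0      # assumption: boundary points count as errors
    return -margins[misclassified].sum()

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, 1.0]])
t = np.array([1, -1, 1])

W_bad = np.array([1.0, 0.0])          # gets the 2nd and 3rd samples wrong
print(perceptron_loss(W_bad, X, t))   # 3.0
```

With `W_bad`, the margins are [1, -2, -1], so the two negative entries contribute 2 + 1 = 3 to the loss; a weight vector that classifies every sample correctly gives a loss of zero.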

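Updating the weight matrix W

The most common approach is gradient descent (SGD is its stochastic variant): take the partial derivative of the loss with respect to the parameters being updated, then move the parameters in the direction in which the loss decreases, obtaining a smaller loss.
For the loss function in this case, the partial derivative with respect to W is
$$\frac{\partial E}{\partial W} = -\sum xt $$
so W is updated as
$$W^{(k+1)} = W^{(k)} - \eta \frac{\partial E}{\partial W} = W^{(k)} + \eta\sum xt$$
Here $k$ is the iteration index and $ \eta $ is the learning rate, or step size, usually a constant (some optimization methods make it variable).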
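Written as a batch update over all currently misclassified samples, the rule could look like the sketch below. The name `perceptron_step`, the value `eta=0.1` and the three-point dataset are assumptions; the tutorial's `train` function instead updates on one random positive and one random negative example per iteration, with an implicit step size of 1.

```python
import numpy as np

def perceptron_step(W, X, t, eta=0.1):
    """One update: W <- W - eta * dE/dW = W + eta * sum(x * t) over errors."""
    misclassified = (X @ W) * t <= 0
    # dE/dW = -sum(x * t), summed over the misclassified samples only
    grad = -(X[misclassified] * t[misclassified, None]).sum(axis=0)
    return W - eta * grad

# Toy, linearly separable data; the last column is the constant-1 bias input
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.5, 1.0],
              [-1.0, 1.0, 1.0]])
t = np.array([1, -1, 1])

W = np.zeros(3)
for _ in range(100):
    W = perceptron_step(W, X, t)

print(np.where(X @ W > 0, 1, -1))  # [ 1 -1  1]
```

Once every sample is classified correctly the misclassified set is empty, the gradient is zero, and W stops changing.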

def train(positive_examples, negative_examples, num_iterations = 100):
    num_dims = positive_examples.shape[1]
    
    # Initialize weights. 
    # We initialize with 0 for simplicity, but random initialization is also a good idea
    weights = np.zeros((num_dims,1)) 
    
    pos_count = positive_examples.shape[0]
    neg_count = negative_examples.shape[0]
    
    report_frequency = 10
    
    for i in range(num_iterations):
        # Pick one positive and one negative example
        pos = random.choice(positive_examples)
        neg = random.choice(negative_examples)

        z = np.dot(pos, weights)   
        if z < 0: # a positive example produced a negative output
            weights = weights + pos.reshape(weights.shape)

        z  = np.dot(neg, weights)
        if z >= 0: # a negative example produced a non-negative output
            weights = weights - neg.reshape(weights.shape)
            
        # Periodically, print out the current accuracy on all examples 
        if i % report_frequency == 0:             
            pos_out = np.dot(positive_examples, weights)
            neg_out = np.dot(negative_examples, weights)        
            pos_correct = (pos_out >= 0).sum() / float(pos_count)
            neg_correct = (neg_out < 0).sum() / float(neg_count)
            print("Iteration={}, pos correct={}, neg correct={}".format(i,pos_correct,neg_correct))

    return weights
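
`train` takes the positive and negative examples as two separate arrays, and the weight vector has one entry per input column, so the bias column of ones has to be appended before splitting. One possible way to prepare the inputs is sketched here; the helper name `split_with_bias` and the tiny stand-in data are hypothetical.

```python
import numpy as np

def split_with_bias(features, labels):
    """Append a constant-1 column, then split the rows by label (+1 / -1)."""
    augmented = np.c_[features, np.ones(len(features))]
    return augmented[labels > 0], augmented[labels < 0]

# Tiny stand-in data; in these notes it would be train_x and train_labels
features = np.array([[0.2, 1.0], [1.5, -0.3], [-0.7, 0.4]], dtype=np.float32)
labels = np.array([1, -1, 1], dtype=np.int32)

pos_examples, neg_examples = split_with_bias(features, labels)
print(pos_examples.shape, neg_examples.shape)  # (2, 3) (1, 3)
```

With arrays shaped like this, a call such as `train(pos_examples, neg_examples)` would return a weight vector with three rows: two for the features and one for the bias.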

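Running the tutorial's program shows the accuracy rising step by step over a few rounds of training.
[Figure: example training output]
The visualization shows that the trained model has indeed found a boundary (a line) that separates the data into classes.
[Figure: visualization of the decision boundary]

The figure above should make it intuitively clear why the perceptron was described earlier as linear.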

Checking the results

The tutorial provides an accuracy function that lets us measure how accurate our own trained models are.

def accuracy(weights, test_x, test_labels):
    res = np.dot(np.c_[test_x,np.ones(len(test_x))],weights)
    return (res.reshape(test_labels.shape)*test_labels>=0).sum()/float(len(test_labels))

accuracy(wts, test_x, test_labels)  # wts is assumed to be the weight vector returned by train()

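Limitations of the perceptron

The visualization above raises a question: in this case the data happens to be separable into two classes by a straight line, but what if the boundary is a curve?
Indeed, a classic example is the exclusive-or (XOR) problem.
[Figure: visualization of the XOR data]
When the true labels follow an XOR pattern, the accuracy of a linear perceptron can never exceed 75%, no matter how the line is drawn. The solution is covered in a later section: multi-layered perceptrons (deep learning).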
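The 75% ceiling can be checked by brute force on the four XOR points. This is a rough sketch rather than a proof: the grid of candidate weights (21 values between -2 and 2 for each parameter) is an arbitrary choice.

```python
import itertools
import numpy as np

# The four XOR points, with labels in {+1, -1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, -1])

grid = np.linspace(-2, 2, 21)   # candidate values for w1, w2 and b
best = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    pred = np.where(X @ np.array([w1, w2]) + b > 0, 1, -1)
    best = max(best, (pred == t).mean())

print(best)  # 0.75
```

Every line tried gets at most three of the four points right. For example, w1 = w2 = 1 with b = -0.6 classifies (0,0), (0,1) and (1,0) correctly but puts (1,1) on the wrong side.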

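A more complex example: MNIST

MNIST is a dataset of handwritten digits that is widely used for digit recognition.
Every digit is represented as a 28 * 28 pixel grayscale image.
Our goal is to use MNIST to train a model that can recognize the digits in it.
Dataset download link (accidentally removed in the original tutorial and never added back)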

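Loading the dataset

with gzip.open('./lab/mnist.pkl.gz', 'rb') as mnist_pickle:  # where the downloaded dataset is stored
    MNIST = pickle.load(mnist_pickle)

The tutorial's plotting code gives a good picture of what the data looks like.
[Figure: sample MNIST digits]

Training

The classifier in this example is binary: the tutorial takes the grayscale images of two digits and classifies them, treating one digit as +1 and the other as -1.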

def set_mnist_pos_neg(positive_label, negative_label):
    positive_indices = [i for i, j in enumerate(MNIST['Train']['Labels']) 
                          if j == positive_label]
    negative_indices = [i for i, j in enumerate(MNIST['Train']['Labels']) 
                          if j == negative_label]

    positive_images = MNIST['Train']['Features'][positive_indices]
    negative_images = MNIST['Train']['Features'][negative_indices]

    fig = pylab.figure()
    ax = fig.add_subplot(1, 2, 1)
    pylab.imshow(positive_images[0].reshape(28,28), cmap='gray', interpolation='nearest')
    ax.set_xticks([])
    ax.set_yticks([])
    ax = fig.add_subplot(1, 2, 2)
    pylab.imshow(negative_images[0].reshape(28,28), cmap='gray', interpolation='nearest')
    ax.set_xticks([])
    ax.set_yticks([])
    pylab.show()
    
    return positive_images, negative_images

pos1,neg1 = set_mnist_pos_neg(1,0)  # classify the digits 0 and 1
train(pos1 , neg1)  # train the model

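Running the program shows the training accuracy rising; at this point the model can tell the digits 1 and 0 apart.

Discussion

For some reason, the model finds the digits 2 and 5 harder to tell apart. To understand why, we use PCA to reduce the dimensionality of the features, bringing the high-dimensional input (each MNIST image is a 784-dimensional vector) down to two dimensions.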

from sklearn.decomposition import PCA

def pca_analysis(positive_label, negative_label):
    positive_images, negative_images = set_mnist_pos_neg(positive_label, negative_label)
    M = np.append(positive_images, negative_images, 0)

    mypca = PCA(n_components=2)
    mypca.fit(M)
    
    pos_points = mypca.transform(positive_images[:200])
    neg_points = mypca.transform(negative_images[:200])

    pylab.plot(pos_points[:,0], pos_points[:,1], 'bo')
    pylab.plot(neg_points[:,0], neg_points[:,1], 'ro')


pca_analysis(1,0)
pca_analysis(2,5)
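
To get a feel for what the projection does, here is a NumPy-only sketch of PCA using mean-centering followed by an SVD. The function name `pca_project` and the random stand-in data are assumptions; it is not the tutorial's code, and the sign of each axis is arbitrary, so a plot may come out mirrored compared with sklearn's `PCA`.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` onto their top principal components."""
    centered = data - data.mean(axis=0)    # PCA works on mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Random stand-in for flattened images: 100 samples with 784 values each
fake_images = rng.normal(size=(100, 784))

points = pca_project(fake_images)
print(points.shape)  # (100, 2)
```

Each 784-dimensional row ends up as a single 2-D point, which is what makes the scatter plots for the pairs (1, 0) and (2, 5) possible. If the two clouds of points overlap heavily in that plot, it hints that a single straight line will have trouble separating the two digits.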