Contents: Preface · Multi-class · Cross-entropy loss · Gradient descent · Preparing the functions · Python in practice · SoftmaxRegression · Results · Using raw features (4 binary bits) · Using polynomial features (enlarged feature space) · Parameter analysis and comparison · Summary

When you feel lost, look back at the outline above; you may find something you didn't expect.

## Preface

The previous post walked through solving the binary logistic regression problem, using the XOR gate as a worked example. This post moves on to multi-class logistic regression.

## Multi-class

Binary classification can in fact be seen as a special case of multi-class classification with only two outcomes: a "yes" class and a "no" class. When one class has probability 1 (true), the other necessarily has probability 0 (false), so we don't need multiple outputs as in the multi-class case; knowing a single output z, the opposing output is simply 1 - z.

In multi-class classification, every class gets its own output probability. Suppose there are m samples and C classes in total; for handwritten digits, for example, there are 10 classes (C = 10), one probability for each of the digits 0-9. The likelihood function then becomes:

$$
L(w)=\prod_{i=1}^{m}\prod_{j=1}^{C} z_{ij}^{y_{ij}}
$$

with parameters, inputs, and outputs

$$
w_j=\begin{bmatrix} w_{j1}\\ w_{j2}\\ \vdots\\ w_{jn} \end{bmatrix},\qquad
x_i=\begin{bmatrix} x_{i1}\\ x_{i2}\\ \vdots\\ x_{in} \end{bmatrix},\qquad
z_{ij}=\sigma(x_i^T w_j)
$$

$$
z_i=\begin{bmatrix} z_{i1}\\ z_{i2}\\ \vdots\\ z_{iC} \end{bmatrix}
=\begin{bmatrix} \sigma(x_i^T w_1)\\ \sigma(x_i^T w_2)\\ \vdots\\ \sigma(x_i^T w_C) \end{bmatrix}
$$

There is actually an intermediate quantity here. For the i-th sample, $u_i \in \mathbb{R}^{C \times 1}$:

$$
u_i=\begin{bmatrix} u_{i1}\\ u_{i2}\\ \vdots\\ u_{iC} \end{bmatrix}
=\begin{bmatrix} x_i^T w_1\\ x_i^T w_2\\ \vdots\\ x_i^T w_C \end{bmatrix}
=\begin{bmatrix} w_1^T\\ w_2^T\\ \vdots\\ w_C^T \end{bmatrix} x_i
=\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1n}\\
w_{21} & w_{22} & \dots & w_{2n}\\
\vdots & & & \vdots\\
w_{C1} & w_{C2} & \dots & w_{Cn}
\end{bmatrix}
\begin{bmatrix} x_{i1}\\ x_{i2}\\ \vdots\\ x_{in} \end{bmatrix}
$$

For all samples together, $U \in \mathbb{R}^{C \times m}$:

$$
U=\begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix}
=\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1n}\\
w_{21} & w_{22} & \dots & w_{2n}\\
\vdots & & & \vdots\\
w_{C1} & w_{C2} & \dots & w_{Cn}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{21} & \dots & x_{m1}\\
x_{12} & x_{22} & \dots & x_{m2}\\
\vdots & & & \vdots\\
x_{1n} & x_{2n} & \dots & x_{mn}
\end{bmatrix}
$$

However, by convention we usually write X first and W second, which is just a transpose of the above, giving $U \in \mathbb{R}^{m \times C}$:

$$
U=\begin{bmatrix} u_1^T\\ u_2^T\\ \vdots\\ u_m^T \end{bmatrix}
=\begin{bmatrix}
u_{11} & u_{12} & \dots & u_{1C}\\
u_{21} & u_{22} & \dots & u_{2C}\\
\vdots & & & \vdots\\
u_{m1} & u_{m2} & \dots & u_{mC}
\end{bmatrix}
=\begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1n}\\
x_{21} & x_{22} & \dots & x_{2n}\\
\vdots & & & \vdots\\
x_{m1} & x_{m2} & \dots & x_{mn}
\end{bmatrix}
\begin{bmatrix}
w_{11} & w_{21} & \dots & w_{C1}\\
w_{12} & w_{22} & \dots & w_{C2}\\
\vdots & & & \vdots\\
w_{1n} & w_{2n} & \dots & w_{Cn}
\end{bmatrix}
=XW
$$

The matrix Z then applies f elementwise:

$$
Z=\begin{bmatrix}
f(u_{11}) & f(u_{12}) & \dots & f(u_{1C})\\
f(u_{21}) & f(u_{22}) & \dots & f(u_{2C})\\
\vdots & \vdots & & \vdots\\
f(u_{m1}) & f(u_{m2}) & \dots & f(u_{mC})
\end{bmatrix}
$$

Each entry is $z_{ij}=f(u_{ij})$, where i indexes the sample and j the class. The full matrices are written out here only to show all the data at once; in practice the computation is still carried out on the smaller per-sample blocks.

In the binary case, $z_i=\sigma(u_i)=\frac{1}{1+e^{-u_i}}$. In the multi-class case we generally use the softmax function. For a single sample it is:

$$
z_k = P(y=k \mid x; W) = \frac{e^{w_k^T x}}{\sum_{j=1}^{C} e^{w_j^T x}}
$$

For C classes, the input is $x \in \mathbb{R}^{n}$, the parameter matrix is $W \in \mathbb{R}^{C \times n}$, and $w_k \in \mathbb{R}^{n}$ is the parameter vector of class k. Here $z_k$ is the predicted probability of class k, and $z = [z_1, z_2, \dots, z_C]^T$ is the probability vector.

The multi-class log-likelihood is:

$$
\ell(w)=\log L(w)=\log\Big(\prod_{i=1}^{m}\prod_{j=1}^{C} z_{ij}^{y_{ij}}\Big)
=\sum_{i=1}^{m}\sum_{j=1}^{C} y_{ij}\log z_{ij}
$$

## Cross-entropy loss

The usual form of cross-entropy between distributions p and q, summed over all samples, is

$$
H(p,q)=-\sum_{i=1}^{m}\sum_{j=1}^{C} p_j\log q_j
$$

which looks very much like the multi-class log-likelihood

$$
\ell(w)=\sum_{i=1}^{m}\sum_{j=1}^{C} y_{ij}\log z_{ij}
$$

except for the sign: the cross-entropy is exactly the negative log-likelihood,

$$
H(p,q)=H(y_i,z_i)=-\ell(w)=-\sum_{i=1}^{m}\sum_{j=1}^{C} y_{ij}\log z_{ij}
$$

When $p=y_i$ is one-hot encoded, exactly one entry $p_k=y_{ik}=1$ and the remaining $p_j=0$, so this simplifies to

$$
H(p,q)=H(y_i,z_i)=-\ell(w)=-\sum_{i=1}^{m} y_{ik}\log z_{ik}=-\sum_{i=1}^{m}\log z_{ik}
$$

That is, for each sample we only need to account for its true class k.

## Gradient descent

Linear regression judges the quality of a fit by the RSS; for logistic regression the objective is maximum likelihood: the larger the likelihood (or log-likelihood), the better the fit, and this too can be solved by gradient descent. Here we take the averaged cross-entropy as the loss function E:

$$
E = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{C} y_{ij}\log z_{ij}
$$
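Before deriving the gradients, it may help to see the forward pass and this loss computed directly. Below is a minimal NumPy sketch (an illustration added here, not code from the original post; the tiny `X`, `W`, `y` values are made up):

```python
import numpy as np

# a made-up batch: m = 3 samples, n = 2 features, C = 3 classes
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.zeros((3, 2))        # C x n parameter matrix, as in the text
y = np.array([0, 1, 2])     # true class of each sample

U = X.dot(W.T)                             # m x C scores, u_ij = x_i^T w_j
U -= U.max(axis=1, keepdims=True)          # stabilize before exponentiating
Z = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)  # softmax along each row

Y = np.eye(3)[y]                           # one-hot labels
E = -np.mean(np.sum(Y * np.log(Z + 1e-15), axis=1))   # E = -(1/m) sum y_ij log z_ij
print(E)  # all-zero W gives every class probability 1/3, so E = ln 3 ≈ 1.0986
```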
## Preparing the functions

Let i be the sample index and j the class index, e.g. $z_{ij}$ is the predicted probability that sample i belongs to class j, and $y_{ij}$ is the corresponding true label. Then

$$
u_{ij}=x_i^T w_j=w_{j1}x_{i1}+w_{j2}x_{i2}+\dots+b
=\begin{bmatrix} x_{i1} & x_{i2} & \dots & 1 \end{bmatrix}
\begin{bmatrix} w_{j1}\\ w_{j2}\\ \vdots\\ b \end{bmatrix}
$$

$$
z_{ij}=f(u_{ij})=\frac{e^{u_{ij}}}{\sum_{k=1}^{C} e^{u_{ik}}}
$$

We can treat the bias b as $w_{jn}$ by fixing $x_{in}=1$. The partial derivatives are then simply

$$
\frac{\partial u_{ij}}{\partial w_{jk}} = x_{ik}
$$

Introduce a quantity A that does not depend on $u_{ij}$, so it can be treated as a constant when differentiating with respect to $u_{ij}$:

$$
A=\sum_{k=1}^{C} e^{u_{ik}}-e^{u_{ij}},\qquad
z_{ij}=f(u_{ij})=\frac{e^{u_{ij}}}{\sum_{k=1}^{C} e^{u_{ik}}}=\frac{e^{u_{ij}}}{A+e^{u_{ij}}}=1-\frac{A}{A+e^{u_{ij}}}
$$

so the derivative $\frac{\partial z_{ij}}{\partial u_{ij}}$ is

$$
\therefore\ \frac{\partial z_{ij}}{\partial u_{ij}}
=\frac{A\cdot e^{u_{ij}}}{(A+e^{u_{ij}})^2}
=\frac{A}{A+e^{u_{ij}}}\cdot\frac{e^{u_{ij}}}{A+e^{u_{ij}}}
=\Big(1-\frac{e^{u_{ij}}}{A+e^{u_{ij}}}\Big)\cdot\frac{e^{u_{ij}}}{A+e^{u_{ij}}}
=(1-z_{ij})\cdot z_{ij}
$$

Let

$$
E_{ij}=y_{ij}\ln z_{ij},\qquad E_i=\sum_{j=1}^{C}E_{ij}
$$

One caveat that the sigmoid case does not have: $u_{ij}$ also appears in the softmax denominator of every other class, so each $z_{il}$ depends on $u_{ij}$. For $l \neq j$:

$$
\frac{\partial z_{il}}{\partial u_{ij}}
=\frac{\partial}{\partial u_{ij}}\frac{e^{u_{il}}}{\sum_k e^{u_{ik}}}
=-\frac{e^{u_{il}}\,e^{u_{ij}}}{\big(\sum_k e^{u_{ik}}\big)^2}
=-z_{il}\,z_{ij}
$$

Applying the chain rule to the whole per-sample term therefore gives

$$
\frac{\partial E_i}{\partial u_{ij}}
=\frac{y_{ij}}{z_{ij}}\,z_{ij}(1-z_{ij})
+\sum_{l\neq j}\frac{y_{il}}{z_{il}}\,(-z_{il}z_{ij})
= y_{ij}-z_{ij}\sum_{l=1}^{C} y_{il}
= y_{ij}-z_{ij}
$$

since the one-hot labels satisfy $\sum_l y_{il}=1$. Combining with $\frac{\partial u_{ij}}{\partial w_{jk}}=x_{ik}$:

$$
\frac{\partial E_i}{\partial w_{jk}}
=\frac{\partial E_i}{\partial u_{ij}}\cdot\frac{\partial u_{ij}}{\partial w_{jk}}
=(y_{ij}-z_{ij})\,x_{ik}
$$

For the loss function E:

$$
E = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{C} y_{ij}\log z_{ij} = -\frac{1}{m}\sum_{i=1}^{m}E_i
$$

$$
\therefore\ \frac{\partial E}{\partial w_{jk}}
=-\frac{1}{m}\sum_{i=1}^{m}(y_{ij}-z_{ij})\,x_{ik}
=\frac{1}{m}\sum_{i=1}^{m}(z_{ij}-y_{ij})\,x_{ik}
$$

So $w_{jk}$ (j indexes the class, k the component within the vector $w_j$) is updated as:

$$
w_{jk}^{(t+1)} = w_{jk}^{(t)} - \eta\,\frac{\partial E}{\partial w_{jk}}
= w_{jk}^{(t)} - \eta\,\frac{1}{m}\sum_{i=1}^{m}(z_{ij}-y_{ij})\,x_{ik}
$$

In matrix form this is $\nabla_W E=\frac{1}{m}(Z-Y)^T X$, which is exactly the `error.T.dot(X_bias)` gradient used in the implementation below.
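As a quick numerical check that the matrix-form gradient $\frac{1}{m}(Z-Y)^T X$ is correct, the sketch below compares it against a finite-difference estimate on random data (again an added illustration, not code from the original post):

```python
import numpy as np

def loss(W, X, Y):
    """E = -(1/m) sum_ij y_ij log z_ij, with W of shape C x n."""
    U = X.dot(W.T)
    U -= U.max(axis=1, keepdims=True)
    Z = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)
    return -np.mean(np.sum(Y * np.log(Z + 1e-15), axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # m = 5 samples, n = 3 features
Y = np.eye(4)[rng.integers(0, 4, 5)]   # one-hot labels, C = 4 classes
W = rng.normal(size=(4, 3))

# analytic gradient from the derivation: (1/m) (Z - Y)^T X
U = X.dot(W.T)
U -= U.max(axis=1, keepdims=True)
Z = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)
grad = (Z - Y).T.dot(X) / X.shape[0]

# finite-difference estimate for one entry, w_00
eps = 1e-6
W_eps = W.copy()
W_eps[0, 0] += eps
fd = (loss(W_eps, X, Y) - loss(W, X, Y)) / eps
print(grad[0, 0], fd)  # the two values should agree to about 5 decimal places
```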
## Python in practice

We won't use actual handwritten digits here, since that requires a convolutional neural network (covered in a later post). Instead we simplify: the handwritten digit images are replaced by the digits' 4-bit binary codes, and each binary bit is one input:

| i  | $x_{i4}$ | $x_{i3}$ | $x_{i2}$ | $x_{i1}$ | $y_i$ |
|----|------|------|------|------|-----|
| 1  | 0 | 0 | 0 | 0 | 0 |
| 2  | 0 | 0 | 0 | 1 | 1 |
| 3  | 0 | 0 | 1 | 0 | 2 |
| 4  | 0 | 0 | 1 | 1 | 3 |
| 5  | 0 | 1 | 0 | 0 | 4 |
| 6  | 0 | 1 | 0 | 1 | 5 |
| 7  | 0 | 1 | 1 | 0 | 6 |
| 8  | 0 | 1 | 1 | 1 | 7 |
| 9  | 1 | 0 | 0 | 0 | 8 |
| 10 | 1 | 0 | 0 | 1 | 9 |

### SoftmaxRegression

Define the training class and its helper functions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns


class SoftmaxRegression:
    """Multi-class logistic regression (softmax regression)."""

    def __init__(self, n_classes, learning_rate=0.1, n_iterations=5000, lambda_reg=0.01):
        self.n_classes = n_classes
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.lambda_reg = lambda_reg  # L2 regularization coefficient
        self.theta = None
        self.loss_history = []
        self.accuracy_history = []

    def softmax(self, z):
        """Softmax function (numerically stable implementation)."""
        # subtract the row maximum to prevent overflow
        z_stable = z - np.max(z, axis=1, keepdims=True)
        exp_z = np.exp(z_stable)
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def one_hot_encode(self, y):
        """Convert integer labels to one-hot encoding."""
        m = len(y)
        y_one_hot = np.zeros((m, self.n_classes))
        y_one_hot[np.arange(m), y] = 1
        return y_one_hot

    def compute_loss(self, X, y):
        """Compute the cross-entropy loss."""
        m = X.shape[0]
        # add the bias column
        X_bias = np.c_[np.ones((m, 1)), X]
        scores = X_bias.dot(self.theta.T)
        probs = self.softmax(scores)
        # cross-entropy loss
        y_one_hot = self.one_hot_encode(y)
        loss = -np.sum(y_one_hot * np.log(probs + 1e-15)) / m
        # add the L2 regularization term
        loss += (self.lambda_reg / (2 * m)) * np.sum(self.theta ** 2)
        return loss

    def fit(self, X, y, verbose=True):
        """Train the model."""
        m, n = X.shape  # m samples, n features
        # add the bias term (a column of ones)
        X_bias = np.c_[np.ones((m, 1)), X]
        n_features = n + 1
        # initialize the parameter matrix, K x (n+1)
        np.random.seed(42)
        self.theta = np.random.randn(self.n_classes, n_features) * 0.01
        # training loop
        for iteration in range(self.n_iter):
            # forward pass
            scores = X_bias.dot(self.theta.T)       # m x K
            probs = self.softmax(scores)            # m x K
            # convert labels to one-hot encoding
            y_one_hot = self.one_hot_encode(y)      # m x K
            # compute the gradient
            error = probs - y_one_hot               # m x K
            grad = (1 / m) * error.T.dot(X_bias)    # K x (n+1)
            # add the L2 regularization gradient
            grad += (self.lambda_reg / m) * self.theta
            # update the parameters
            self.theta -= self.lr * grad
            # record loss and accuracy
            if iteration % 100 == 0:
                loss = self.compute_loss(X, y)
                y_pred = self.predict(X)
                accuracy = np.mean(y_pred == y)
                self.loss_history.append(loss)
                self.accuracy_history.append(accuracy)
                if verbose and iteration % 500 == 0:
                    print(f"Iteration {iteration}: Loss {loss:.4f}, Accuracy {accuracy:.4f}")

    def predict_proba(self, X):
        """Predict class probabilities."""
        m = X.shape[0]
        X_bias = np.c_[np.ones((m, 1)), X]
        scores = X_bias.dot(self.theta.T)
        return self.softmax(scores)

    def predict(self, X):
        """Predict class labels."""
        probs = self.predict_proba(X)
        return np.argmax(probs, axis=1)

    def score(self, X, y):
        """Compute the accuracy."""
        y_pred = self.predict(X)
        return np.mean(y_pred == y)


def create_digit_binary_data():
    """Create the binary dataset for digits 0-9."""
    # 4-bit binary representation of each digit
    binary_digits = {
        0: [0, 0, 0, 0], 1: [0, 0, 0, 1], 2: [0, 0, 1, 0], 3: [0, 0, 1, 1],
        4: [0, 1, 0, 0], 5: [0, 1, 0, 1], 6: [0, 1, 1, 0], 7: [0, 1, 1, 1],
        8: [1, 0, 0, 0], 9: [1, 0, 0, 1],
    }
    X = []
    y = []
    # create several samples per digit, with added noise
    np.random.seed(42)
    samples_per_digit = 50
    for digit, binary in binary_digits.items():
        for _ in range(samples_per_digit):
            # add a small amount of noise
            noisy_binary = binary + np.random.normal(0, 0.1, 4)
            X.append(noisy_binary)
            y.append(digit)
    X = np.array(X)
    y = np.array(y)
    return X, y


def add_polynomial_features(X):
    """Add polynomial features."""
    # raw features: x1, x2, x3, x4; add squares and interaction terms
    X_poly = np.hstack([
        X,
        X ** 2,                  # squared terms
        X[:, 0:1] * X[:, 1:2],   # x1*x2
        X[:, 0:1] * X[:, 2:3],   # x1*x3
        X[:, 0:1] * X[:, 3:4],   # x1*x4
        X[:, 1:2] * X[:, 2:3],   # x2*x3
        X[:, 1:2] * X[:, 3:4],   # x2*x4
        X[:, 2:3] * X[:, 3:4],   # x3*x4
    ])
    return X_poly
```
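As a quick sanity check on the feature count (an illustrative snippet, not part of the original post; it reuses `np` and `add_polynomial_features` from above): 4 raw bits plus 4 squares plus the 6 pairwise products gives the 14 dimensions reported in the run below.

```python
X_demo = np.zeros((2, 4))                     # two dummy samples, 4 bits each
print(add_polynomial_features(X_demo).shape)  # (2, 14): 4 raw + 4 squares + 6 products
```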
The plotting helpers and the driver:

```python
def plot_training_history(model):
    """Plot the training history."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    # loss curve
    axes[0].plot(model.loss_history)
    axes[0].set_xlabel("Iteration (x100)")
    axes[0].set_ylabel("Loss")
    axes[0].set_title("Training Loss History")
    axes[0].grid(True, alpha=0.3)
    # accuracy curve
    axes[1].plot(model.accuracy_history)
    axes[1].set_xlabel("Iteration (x100)")
    axes[1].set_ylabel("Accuracy")
    axes[1].set_title("Training Accuracy History")
    axes[1].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()


def plot_confusion_matrix(y_true, y_pred, class_names):
    """Plot the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.title("Confusion Matrix")
    plt.show()


def visualize_decision_boundaries(X, y, model, title):
    """Visualize decision boundaries (using PCA to project down to 2D)."""
    from sklearn.decomposition import PCA
    # reduce the features to 2 dimensions for visualization
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)
    # build a grid of points
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1))
    # predict the class of each grid point
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    # inverse-transform back to the original space (approximately)
    grid_original_space = pca.inverse_transform(grid_points)
    Z = model.predict(grid_original_space)
    Z = Z.reshape(xx.shape)
    # draw the decision regions
    plt.figure(figsize=(10, 8))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.tab10)
    # draw the data points
    scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap=plt.cm.tab10,
                          edgecolors="k", s=100)
    plt.xlabel("Principal Component 1")
    plt.ylabel("Principal Component 2")
    plt.title(f"Decision Boundaries - {title}\n(PCA Visualization)")
    plt.colorbar(scatter, label="Digit")
    plt.grid(True, alpha=0.3)
    plt.show()


def main():
    print("=" * 60)
    print("Multi-class logistic regression (softmax regression)")
    print("Digit 0-9 classification task (binary inputs)")
    print("=" * 60)

    # 1. create the data
    X, y = create_digit_binary_data()
    print(f"Dataset shape: X{X.shape}, y{y.shape}")
    print(f"Class distribution: {np.bincount(y)}")
    print()

    # 2. split into training and test sets
    np.random.seed(42)
    indices = np.random.permutation(len(X))
    split_idx = int(0.8 * len(X))
    X_train, X_test = X[indices[:split_idx]], X[indices[split_idx:]]
    y_train, y_test = y[indices[:split_idx]], y[indices[split_idx:]]
    print(f"Training set: {X_train.shape[0]} samples")
    print(f"Test set: {X_test.shape[0]} samples")
    print()

    # 3. use the raw features
    print("=" * 60)
    print("Using raw features (4 binary bits)")
    print("=" * 60)
    model_simple = SoftmaxRegression(n_classes=10, learning_rate=0.1,
                                     n_iterations=5000, lambda_reg=0.01)
    model_simple.fit(X_train, y_train)
    train_accuracy = model_simple.score(X_train, y_train)
    test_accuracy = model_simple.score(X_test, y_test)
    print(f"\nTraining accuracy: {train_accuracy:.4f}")
    print(f"Test accuracy: {test_accuracy:.4f}")

    # show some example predictions
    print("\nPredictions for the first 10 test samples:")
    y_pred = model_simple.predict(X_test[:10])
    y_proba = model_simple.predict_proba(X_test[:10])
    for i in range(min(10, len(X_test))):
        true_label = y_test[i]
        pred_label = y_pred[i]
        prob = y_proba[i, pred_label]
        print(f"Sample {i + 1}: true {true_label}, predicted {pred_label}, probability {prob:.4f}")

    plot_training_history(model_simple)
    plot_confusion_matrix(y_test, model_simple.predict(X_test),
                          [str(i) for i in range(10)])

    # 4. use polynomial features
    print("\n" + "=" * 60)
    print("Using polynomial features (enlarged feature space)")
    print("=" * 60)
    X_train_poly = add_polynomial_features(X_train)
    X_test_poly = add_polynomial_features(X_test)
    print(f"Polynomial feature dimension: {X_train_poly.shape[1]}")
    model_poly = SoftmaxRegression(n_classes=10, learning_rate=0.05,
                                   n_iterations=8000, lambda_reg=0.001)
    model_poly.fit(X_train_poly, y_train)
    train_accuracy_poly = model_poly.score(X_train_poly, y_train)
    test_accuracy_poly = model_poly.score(X_test_poly, y_test)
    print(f"\nTraining accuracy (polynomial): {train_accuracy_poly:.4f}")
    print(f"Test accuracy (polynomial): {test_accuracy_poly:.4f}")
    plot_training_history(model_poly)

    # 5. parameter analysis
    print("\n" + "=" * 60)
    print("Parameter analysis")
    print("=" * 60)
    print("Simple model parameter shape:", model_simple.theta.shape)
    print("\nParameters for digit 0 (first 5):")
    print(model_simple.theta[0, :5])
    print("\nL2 norm of the parameter matrix:")
    for i in range(10):
        norm = np.linalg.norm(model_simple.theta[i])
        print(f"Digit {i}: {norm:.4f}")
    # 6. performance comparison
    print("\n" + "=" * 60)
    print("Performance comparison summary")
    print("=" * 60)
    print(f"{'Model':20}{'Train accuracy':15}{'Test accuracy':15}")
    print("-" * 50)
    print(f"{'Raw features':20}{train_accuracy:15.4f}{test_accuracy:15.4f}")
    print(f"{'Polynomial features':20}{train_accuracy_poly:15.4f}{test_accuracy_poly:15.4f}")


if __name__ == "__main__":
    main()
```

## Results

### Using raw features (4 binary bits)

```
============================================================
Multi-class logistic regression (softmax regression)
Digit 0-9 classification task (binary inputs)
============================================================
Dataset shape: X(500, 4), y(500,)
Class distribution: [50 50 50 50 50 50 50 50 50 50]

Training set: 400 samples
Test set: 100 samples

============================================================
Using raw features (4 binary bits)
============================================================
Iteration 0: Loss 2.2989, Accuracy 0.3050
Iteration 500: Loss 0.7375, Accuracy 1.0000
Iteration 1000: Loss 0.4195, Accuracy 1.0000
Iteration 1500: Loss 0.2925, Accuracy 1.0000
Iteration 2000: Loss 0.2256, Accuracy 1.0000
Iteration 2500: Loss 0.1847, Accuracy 1.0000
Iteration 3000: Loss 0.1571, Accuracy 1.0000
Iteration 3500: Loss 0.1373, Accuracy 1.0000
Iteration 4000: Loss 0.1224, Accuracy 1.0000
Iteration 4500: Loss 0.1108, Accuracy 1.0000

Training accuracy: 1.0000
...
Sample 7: true 6, predicted 6, probability 0.8818
Sample 8: true 7, predicted 7, probability 0.9502
Sample 9: true 5, predicted 5, probability 0.8706
Sample 10: true 4, predicted 4, probability 0.9324
```

### Using polynomial features (enlarged feature space)

```
============================================================
Using polynomial features (enlarged feature space)
============================================================
Polynomial feature dimension: 14
Iteration 0: Loss 2.2965, Accuracy 0.2075
Iteration 500: Loss 0.5923, Accuracy 1.0000
Iteration 1000: Loss 0.3176, Accuracy 1.0000
Iteration 1500: Loss 0.2137, Accuracy 1.0000
Iteration 2000: Loss 0.1607, Accuracy 1.0000
Iteration 2500: Loss 0.1289, Accuracy 1.0000
Iteration 3000: Loss 0.1078, Accuracy 1.0000
Iteration 3500: Loss 0.0928, Accuracy 1.0000
Iteration 4000: Loss 0.0815, Accuracy 1.0000
Iteration 4500: Loss 0.0728, Accuracy 1.0000
Iteration 5000: Loss 0.0658, Accuracy 1.0000
Iteration 5500: Loss 0.0601, Accuracy 1.0000
Iteration 6000: Loss 0.0554, Accuracy 1.0000
Iteration 6500: Loss 0.0513, Accuracy 1.0000
Iteration 7000: Loss 0.0479, Accuracy 1.0000
Iteration 7500: Loss 0.0449, Accuracy 1.0000

Training accuracy (polynomial): 1.0000
Test accuracy (polynomial): 1.0000
```

### Parameter analysis and comparison

```
============================================================
Parameter analysis
============================================================
Simple model parameter shape: (10, 5)

Parameters for digit 0 (first 5):
[ 4.91711321 -2.4092373  -2.79234297 -3.09025926 -3.17665747]

L2 norm of the parameter matrix:
Digit 0: 7.5776
Digit 1: 7.3548
Digit 2: 7.5739
Digit 3: 7.0336
Digit 4: 7.5037
Digit 5: 7.0036
Digit 6: 6.9899
Digit 7: 6.3217
Digit 8: 7.0914
Digit 9: 6.5200

============================================================
Performance comparison summary
============================================================
Model               Train accuracy Test accuracy
--------------------------------------------------
Raw features                 1.0000         1.0000
Polynomial features          1.0000         1.0000
```

## Summary

- Softmax regression successfully extends binary logistic regression to multi-class problems.
- Polynomial features enlarge the feature space (from 4 to 14 dimensions here); on this easy task both models already reach 100% accuracy, and the richer features mainly drive the training loss lower.
- L2 regularization is added to guard against overfitting.
- The model learns the mapping from the 4-bit binary representation to the digit class, demonstrating the effectiveness of the softmax function on multi-class problems.
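For reference, the same dataset can be cross-checked against scikit-learn's built-in multinomial logistic regression. This is an added sketch, not part of the original post; it reuses `create_digit_binary_data` from above:

```python
from sklearn.linear_model import LogisticRegression

X, y = create_digit_binary_data()
# with the default lbfgs solver, scikit-learn fits the multinomial (softmax) model
clf = LogisticRegression(max_iter=5000)
clf.fit(X, y)
print(clf.score(X, y))  # expected to be at or near 1.0 on this easy dataset
```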