Question (145): Among the optimization problems in machine learning, which are convex and which are non-convex? Give an example of each.
Convex optimization problems
Logistic regression
$$L_i(\theta) = \log\left(1+\exp(-y_i \theta^T x_i)\right)$$
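A quick numerical sanity check of the convexity claim (a sketch, assuming NumPy; the data and parameter vectors are illustrative): any convex function must satisfy the midpoint inequality for every pair of parameter vectors.

```python
import numpy as np

def logistic_loss(theta, x, y):
    """Per-sample logistic loss L_i(theta) = log(1 + exp(-y * theta^T x)), with y in {-1, +1}."""
    return np.log1p(np.exp(-y * (theta @ x)))

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = 1.0
theta_a, theta_b = rng.normal(size=5), rng.normal(size=5)

# Convexity requires the midpoint inequality
# L((a + b)/2) <= (L(a) + L(b))/2 for every pair a, b.
mid = logistic_loss((theta_a + theta_b) / 2, x, y)
avg = (logistic_loss(theta_a, x, y) + logistic_loss(theta_b, x, y)) / 2
assert mid <= avg + 1e-12
```

A passing check for one pair is of course not a proof; the actual argument is that the Hessian $\sigma(t)(1-\sigma(t))\,x x^T$ of each term is positive semidefinite.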
Derivation of the loss function
The logistic regression model gives:
$$\log \frac{p}{1-p}=\theta^T x \;\Rightarrow\; p = \frac{\exp(\theta^T x)}{1+\exp(\theta^T x)}$$
Maximizing the likelihood is equivalent to minimizing the negative log-likelihood, which we take as the loss:
$$\max_\theta \text{likelihood} \;\Leftrightarrow\; \min_\theta \left(-\log \text{likelihood}\right) := \min_\theta L(x,y;\theta)$$
$$\begin{aligned} L &= - \left(y \log p + (1-y) \log (1-p)\right) \\ &= - y \log \frac{1}{1+\exp(-\theta^T x)} - (1-y) \log \frac{1}{1+\exp(\theta^T x)}\\ &= y \log \left(1+\exp(-\theta^T x)\right) + (1-y) \log \left(1+\exp(\theta^T x)\right)\\ &=\log \left(1+\exp(-\theta^T x \cdot y)\right), \end{aligned}$$
where $y \in \{0,1\}$ and $p=P(Y=1 \mid X=x)$. Note that the last equality requires recoding the labels to $y \in \{-1,+1\}$ (via $y \mapsto 2y-1$); with $y \in \{0,1\}$ the third line is the final form.
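Since the final equality depends on that label recoding, a small numerical check (a sketch, assuming NumPy) that the cross-entropy with $y \in \{0,1\}$ matches the compact form with $y \in \{-1,+1\}$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
theta, x = rng.normal(size=4), rng.normal(size=4)
z = theta @ x

for y01 in (0, 1):
    p = sigmoid(z)
    # Cross-entropy form with labels in {0, 1}:
    cross_entropy = -(y01 * np.log(p) + (1 - y01) * np.log(1 - p))
    # Compact form with labels recoded to {-1, +1}:
    y_pm = 2 * y01 - 1
    compact = np.log1p(np.exp(-z * y_pm))  # log(1 + exp(-theta^T x * y))
    assert np.isclose(cross_entropy, compact)
```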
Other examples: SVM, linear regression.
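Linear regression is another standard convex example; the least-squares loss $\|X\theta - y\|^2$ has Hessian $2X^TX$, which is always positive semidefinite. A minimal numerical illustration (a sketch, assuming NumPy; the data is random and illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))  # 30 samples, 3 features

# Hessian of the least-squares loss ||X theta - y||^2 is 2 X^T X.
H = 2 * X.T @ X
eigvals = np.linalg.eigvalsh(H)
# All eigenvalues are non-negative (up to floating-point roundoff), so H is PSD.
assert np.all(eigvals >= -1e-10)
```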
Non-convex optimization problems
PCA
$$\min_{V V^T = I} L(V)= \| X-V^T V X\|_F^2 \quad \textit{(minimize the reconstruction error)}$$
PCA can also be formulated from the perspective of maximizing the variance.
To verify that this objective is non-convex, check the definition of convexity directly.
If $V^\ast$ is a minimizer, then $-V^\ast$ is also a minimizer, since $L(V^\ast)=L(-V^\ast)$. However, evaluating $L$ at their midpoint:
$$\begin{aligned} L\Big(\tfrac{1}{2} V^\ast + \tfrac{1}{2} (-V^\ast)\Big)=L(0)&=\|X\|_F^2 \\ &> \| X-V^{\ast T} V^\ast X\|_F^2=\tfrac{1}{2} L(V^\ast) + \tfrac{1}{2} L(-V^\ast), \end{aligned}$$
which violates the convexity inequality $L\left(\tfrac{1}{2}u + \tfrac{1}{2}v\right) \le \tfrac{1}{2}L(u) + \tfrac{1}{2}L(v)$, so $L$ is non-convex.
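The argument can be checked numerically (a sketch, assuming NumPy; the matrix shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 20))  # d=5 features, n=20 samples
k = 2

def L(V):
    """PCA reconstruction error ||X - V^T V X||_F^2."""
    return np.linalg.norm(X - V.T @ V @ X) ** 2

# A minimizer V*: rows are the top-k left singular vectors of X.
U, _, _ = np.linalg.svd(X, full_matrices=False)
V_star = U[:, :k].T

assert np.isclose(L(V_star), L(-V_star))       # symmetric minima
midpoint = L(0.5 * V_star + 0.5 * (-V_star))   # = L(0) = ||X||_F^2
average = 0.5 * L(V_star) + 0.5 * L(-V_star)
assert midpoint > average                      # convexity inequality fails
```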
Solution: despite being non-convex, this problem is solved in closed form via the SVD.
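A sketch of the SVD solution (assuming NumPy): the minimizer takes the top-$k$ left singular vectors of $X$ as the rows of $V$, and by the Eckart-Young theorem the minimal error equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 20))
k = 2

# Closed-form solution: rows of V are the top-k left singular vectors of X.
U, s, _ = np.linalg.svd(X, full_matrices=False)
V = U[:, :k].T

# Eckart-Young: the optimal reconstruction error is the energy in the
# discarded singular values.
err = np.linalg.norm(X - V.T @ V @ X) ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```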
Other examples: low-rank models (e.g. matrix factorization), deep neural networks.
References:
- 《百面机器学习》