Question (145): Among the optimization problems in machine learning, which are convex and which are non-convex? Give an example of each.
Convex optimization problems
Logistic regression
$$L_i(\theta) = \log\left(1+\exp(-y_i \theta^T x_i)\right)$$
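A quick numerical sanity check of the convexity claim (a sketch, assuming NumPy; the data and parameter vectors are illustrative): any convex function must satisfy the midpoint inequality for every pair of parameter vectors.

```python
import numpy as np

def logistic_loss(theta, x, y):
    """Per-sample logistic loss L_i(theta) = log(1 + exp(-y * theta^T x)), with y in {-1, +1}."""
    return np.log1p(np.exp(-y * (theta @ x)))

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = 1.0
theta_a, theta_b = rng.normal(size=5), rng.normal(size=5)

# Convexity requires the midpoint inequality
# L((a + b)/2) <= (L(a) + L(b))/2 for every pair a, b.
mid = logistic_loss((theta_a + theta_b) / 2, x, y)
avg = (logistic_loss(theta_a, x, y) + logistic_loss(theta_b, x, y)) / 2
assert mid <= avg + 1e-12
```

A passing check for one pair is of course not a proof; the actual argument is that the Hessian $\sigma(t)(1-\sigma(t))\,x x^T$ of each term is positive semidefinite.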
Derivation of the loss function
The logistic regression model gives:
$$\log \frac{p}{1-p}=\theta^T x \;\Rightarrow\; p = \frac{\exp(\theta^T x)}{1+\exp(\theta^T x)}$$
Maximizing the likelihood is equivalent to minimizing the negative log-likelihood, which we take as the loss:
$$\max_\theta \text{likelihood} \;\Leftrightarrow\; \min_\theta \left(-\log \text{likelihood}\right) := \min_\theta L(x,y;\theta)$$
$$\begin{aligned} L &= - \left(y \log p + (1-y) \log (1-p)\right) \\ &= - y \log \frac{1}{1+\exp(-\theta^T x)} - (1-y) \log \frac{1}{1+\exp(\theta^T x)}\\ &= y \log \left(1+\exp(-\theta^T x)\right) + (1-y) \log \left(1+\exp(\theta^T x)\right)\\ &=\log \left(1+\exp(-\theta^T x \cdot y)\right), \end{aligned}$$
where $y \in \{0,1\}$ and $p=P(Y=1 \mid X=x)$. Note that the last equality requires recoding the labels to $y \in \{-1,+1\}$ (via $y \mapsto 2y-1$); with $y \in \{0,1\}$ the third line is the final form.
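Since the final equality depends on that label recoding, a small numerical check (a sketch, assuming NumPy) that the cross-entropy with $y \in \{0,1\}$ matches the compact form with $y \in \{-1,+1\}$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
theta, x = rng.normal(size=4), rng.normal(size=4)
z = theta @ x

for y01 in (0, 1):
    p = sigmoid(z)
    # Cross-entropy form with labels in {0, 1}:
    cross_entropy = -(y01 * np.log(p) + (1 - y01) * np.log(1 - p))
    # Compact form with labels recoded to {-1, +1}:
    y_pm = 2 * y01 - 1
    compact = np.log1p(np.exp(-z * y_pm))  # log(1 + exp(-theta^T x * y))
    assert np.isclose(cross_entropy, compact)
```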
Other examples: SVM, linear regression.
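Linear regression is another standard convex example; the least-squares loss $\|X\theta - y\|^2$ has Hessian $2X^TX$, which is always positive semidefinite. A minimal numerical illustration (a sketch, assuming NumPy; the data is random and illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))  # 30 samples, 3 features

# Hessian of the least-squares loss ||X theta - y||^2 is 2 X^T X.
H = 2 * X.T @ X
eigvals = np.linalg.eigvalsh(H)
# All eigenvalues are non-negative (up to floating-point roundoff), so H is PSD.
assert np.all(eigvals >= -1e-10)
```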
Non-convex optimization problems
PCA
$$\min_{V V^T = I} L(V)= \| X-V^T V X\|_F^2 \quad \textit{(minimize the reconstruction error)}$$
PCA can also be formulated from the perspective of maximizing the variance.
To verify that this objective is non-convex, check the definition of convexity directly.
If $V^\ast$ is a minimizer, then $-V^\ast$ is also a minimizer, since $L(V^\ast)=L(-V^\ast)$. However, evaluating $L$ at their midpoint:
$$\begin{aligned} L\Big(\tfrac{1}{2} V^\ast + \tfrac{1}{2} (-V^\ast)\Big)=L(0)&=\|X\|_F^2 \\ &> \| X-V^{\ast T} V^\ast X\|_F^2=\tfrac{1}{2} L(V^\ast) + \tfrac{1}{2} L(-V^\ast), \end{aligned}$$
which violates the convexity inequality $L\left(\tfrac{1}{2}u + \tfrac{1}{2}v\right) \le \tfrac{1}{2}L(u) + \tfrac{1}{2}L(v)$, so $L$ is non-convex.
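The argument can be checked numerically (a sketch, assuming NumPy; the matrix shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 20))  # d=5 features, n=20 samples
k = 2

def L(V):
    """PCA reconstruction error ||X - V^T V X||_F^2."""
    return np.linalg.norm(X - V.T @ V @ X) ** 2

# A minimizer V*: rows are the top-k left singular vectors of X.
U, _, _ = np.linalg.svd(X, full_matrices=False)
V_star = U[:, :k].T

assert np.isclose(L(V_star), L(-V_star))       # symmetric minima
midpoint = L(0.5 * V_star + 0.5 * (-V_star))   # = L(0) = ||X||_F^2
average = 0.5 * L(V_star) + 0.5 * L(-V_star)
assert midpoint > average                      # convexity inequality fails
```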
Solution: despite being non-convex, this problem is solved in closed form via the SVD.
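A sketch of the SVD solution (assuming NumPy): the minimizer takes the top-$k$ left singular vectors of $X$ as the rows of $V$, and by the Eckart-Young theorem the minimal error equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 20))
k = 2

# Closed-form solution: rows of V are the top-k left singular vectors of X.
U, s, _ = np.linalg.svd(X, full_matrices=False)
V = U[:, :k].T

# Eckart-Young: the optimal reconstruction error is the energy in the
# discarded singular values.
err = np.linalg.norm(X - V.T @ V @ X) ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```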
Other examples: low-rank models (e.g. matrix factorization), deep neural networks.
References:
- 《百面机器学习》