Weakness of adversarial training: the model overfits to the attack used during training and hence does not generalize to test data.
Curriculum adversarial training
Idea: train the model against progressively stronger attacks, from weak to strong.
Method
Let $l$ denote the attack strength and $K$ the maximal attack strength. $\mathcal{A}(l)$ denotes an attack class parameterized by $l$.
Basic curriculum learning
i). Start from no attack ($l = 0$);
ii). train the model for one epoch and, once finished, compute the $\tilde{l}$-accuracy;
iii-a). if the $\tilde{l}$-accuracy has increased at least once over the last 10 epochs, continue training;
iii-b). if the $\tilde{l}$-accuracy has not increased over the last 10 epochs, reset the model parameters to the best ones (i.e., those from 10 epochs ago) and increase $l$ by 1;
iv). stop when $l > K$.
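A minimal PyTorch sketch of this loop, assuming (as in the paper) that the strength $l$ of a PGD attack is its number of iterations; `pgd_attack`, the perturbation budget `eps`, the step size `step`, and the training/evaluation plumbing are our own illustrative choices, not the authors' implementation.

```python
import copy

import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, strength, eps=8 / 255, step=2 / 255):
    """PGD with `strength` iterations; strength 0 returns x unchanged.

    Attack strength is the number of PGD steps (as in the paper);
    eps and step are illustrative L-infinity budget and step size.
    """
    x_adv = x.clone().detach()
    for _ in range(strength):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()


def curriculum_adversarial_training(model, opt, train_loader, val_loader, K,
                                    patience=10, max_epochs=1000):
    """Basic curriculum loop: raise the attack strength l from 0 to K,
    moving on only once the validation accuracy under the current
    attack stops improving."""
    l, best_acc, best_state, stale = 0, -1.0, None, 0
    for _ in range(max_epochs):
        model.train()
        for x, y in train_loader:                  # one epoch at strength l
            loss = F.cross_entropy(model(pgd_attack(model, x, y, l)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        model.eval()
        correct = total = 0
        for x, y in val_loader:                    # accuracy under attack
            pred = model(pgd_attack(model, x, y, l)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
        acc = correct / total
        if acc > best_acc:                         # improved within window
            best_acc, stale = acc, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
        if stale >= patience:                      # 10 epochs w/o improvement:
            model.load_state_dict(best_state)      # roll back to best params,
            l, best_acc, stale = l + 1, -1.0, 0    # then strengthen the attack
            if l > K:
                break                              # curriculum finished
    return model
```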
Benefit: Training efficiency
Additional optimization technique: batch mixing
Motivation: Although basic curriculum training significantly reduces training time, it does not increase robustness. One issue is forgetting: when the model is trained with a larger $l$, it forgets the adversarial examples generated for a smaller $l$.
Solution: Generate adversarial examples using $\mathrm{PGD}(i)$ for each $i \in \{0, 1, \dots, l\}$, and combine them to form a batch. The loss function is updated accordingly as:
$$\sum_{i=0}^{l} \alpha_i \sum_{(x,y) \sim \mathcal{D}} \mathcal{L}\big(f_\theta(\mathcal{A}_i(x)), y\big),$$
where the $\alpha_i$'s are hyperparameters such that $\alpha_i \in [0,1]$ and $\sum_i \alpha_i = 1$. The authors set $\alpha_i = \frac{1}{l+1}$ and generate the same amount of adversarial examples for each attack strength.
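A sketch of this mixed-batch objective, reusing the hypothetical `pgd_attack` helper from the sketch above. Splitting each batch into $l+1$ equal chunks, one per strength, realizes the uniform weights $\alpha_i = \frac{1}{l+1}$; the chunking scheme itself is our illustrative choice.

```python
def batch_mixing_loss(model, x, y, l):
    """Mixed-batch loss sum_{i=0}^{l} alpha_i * L(f_theta(A_i(x)), y)
    with alpha_i = 1/(l+1), so weaker attacks keep contributing and
    their adversarial examples are not forgotten."""
    loss = 0.0
    # One equal-sized chunk per strength i = 0..l (i = 0 stays clean).
    for i, (cx, cy) in enumerate(zip(x.chunk(l + 1), y.chunk(l + 1))):
        adv = pgd_attack(model, cx, cy, strength=i)   # A_i applied to chunk i
        loss = loss + F.cross_entropy(model(adv), cy) / (l + 1)
    return loss
```

In the curriculum loop above, this loss would replace the single-strength cross-entropy term in the inner training loop.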
Additional optimization technique: quantization
Motivation: The model trained with CAT may not defend against attacks that are stronger than the strongest attack used during training.
Solution: Employ quantization, i.e., restrict each input $x \in [0,1]$ to a $b$-bit integer representation.
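A minimal sketch of the quantization step (the bit width `b = 4` is an example value, not the paper's setting):

```python
def quantize(x, b=4):
    """Round each component of x in [0, 1] to the nearest of 2^b levels,
    i.e. represent it as a b-bit integer rescaled back to [0, 1]."""
    levels = 2 ** b - 1
    return torch.round(x * levels) / levels
```

At inference time the model would then be evaluated on `quantize(x)` instead of `x`.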
Rationale: Quantization reduces the space of adversarial examples. Specifically, let $x^\star$ denote the adversarial example. The difference $x^\star - x$ takes values from an infinite space if $x$ is real-valued; in contrast, it takes values from a finite space if $x$ is quantized to an integer vector.
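As a back-of-the-envelope illustration (our own, not from the paper): for a $d$-dimensional input quantized to $b$ bits and an attack bounded by $\|x^\star - x\|_\infty \le \epsilon$, each coordinate of $x^\star - x$ can take at most $2\lfloor\epsilon(2^b-1)\rfloor + 1$ grid values, so there are at most $(2\lfloor\epsilon(2^b-1)\rfloor + 1)^d$ candidate perturbations, rather than a continuum.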
Remark: Quantization is a generic inference-time defense technique. On its own, it has not been shown to provide resilience against strong white-box attacks. However, it is effective when used together with CAT, since the model remembers the adversarial examples generated by weak attacks. Although a stronger attack can better optimize the loss function, the adversarial examples it generates are highly likely to coincide with those generated by a weaker attack, because the entire adversarial-example space is small.
Experiments: CAT improves both efficiency and the empirical worst-case accuracy against adversarial examples (termed resilience).
Reference:
Cai, Qi-Zhi, Chang Liu, and Dawn Song. “Curriculum adversarial training.” In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 3740-3747. 2018.