Softmax cross-entropy loss explodes

2024-01-08

I am building a deep convolutional neural network for pixel-wise classification. I am using the Adam optimizer, softmax, and cross entropy.

GitHub repository: https://github.com/dhasl002/Research-DeepLearning

I asked a similar question here, https://stackoverflow.com/questions/48600374/cross-entropy-loss-suddenly-increases-to-infinity, but the answer I was given did not solve the problem. I also have a more detailed graph of what is going wrong. Whenever I use softmax, the problem shown in the graph occurs. I have tried many things, such as adjusting the learning rate and epsilon, trying different optimizers, etc. The loss never decreases below 500. I do not shuffle my data at the moment. Using sigmoid in place of softmax keeps the problem from occurring, but my problem has multiple classes, so the accuracy with sigmoid is not very good. It should also be mentioned that when the loss is low, my accuracy is only about 80%, and I need much better than that. Why would my loss suddenly spike like this?

import tensorflow as tf

# weight_variable / bias_variable are the usual initializer helpers; their
# definitions, like the convolution/ReLU stack, are omitted here (see the
# linked repository).

x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])

# Many convolutions and ReLUs omitted

final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168, 7168, 3])
b_final = bias_variable([7168, 3])
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

You need label smoothing.

I just ran into the same problem. I was training with tf.nn.sparse_softmax_cross_entropy_with_logits, which is the same as your tf.nn.softmax_cross_entropy_with_logits with one-hot labels. My dataset predicts the occurrence of rare events, so the training labels are 99% class 0 and 1% class 1. My loss would start decreasing, then stall (while the predictions were still reasonable), then suddenly explode, after which the predictions also fell apart.

By logging the internal network state to TensorBoard with tf.summary, I observed that the absolute values of the logits kept growing. Once they exceeded about 1e8, tf.nn.softmax_cross_entropy_with_logits became numerically unstable, and that is what produced those strange loss spikes.
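
For reference, a minimal sketch of that kind of logging against the question's graph (final_conv is the logits tensor from the code above; the log directory and training-loop wiring are assumptions):

tf.summary.histogram('logits', final_conv)
tf.summary.scalar('max_abs_logit', tf.reduce_max(tf.abs(final_conv)))
merged_summaries = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())
# In the training loop (sketch):
#   summary, _ = sess.run([merged_summaries, train_step], feed_dict=feed)
#   writer.add_summary(summary, step)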

In my opinion the cause lies in the softmax function itself, which agrees with Jai's comment that putting a sigmoid in front of the softmax fixes the problem. Doing so certainly also makes it impossible for the softmax likelihoods to be exact, because it limits the range of the logits, but it does prevent the overflow.
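
As a rough sketch of what that suggestion might look like in the question's code (this is my reading of the comment, not the asker's implementation; it changes the model and caps how confident the softmax output can become):

bounded_logits = tf.sigmoid(final_conv)  # squash every logit into (0, 1) so tf.exp can never overflow
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=bounded_logits))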

Softmax is defined as likelihood[i] = tf.exp(logit[i]) / tf.reduce_sum(tf.exp(logit)), i.e. the denominator sums the exponentials of all the logits. Cross entropy is defined as tf.reduce_sum(-label_likelihood[i] * tf.log(likelihood[i])), so with one-hot labels it reduces to the negative log of the target class's likelihood. In practice that means you are pushing likelihood[true_class] as close to 1.0 as you possibly can, and because of the softmax the only way to do that is for tf.exp(logit[!= true_class]) to get as close to 0.0 as possible.
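
A tiny NumPy illustration of that coupling (the numbers are made up): the true-class likelihood only approaches 1.0, and the loss only approaches 0, as the other logits run off towards minus infinity.

import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

for other in [0.0, -5.0, -50.0]:
    logits = np.array([other, other, 2.0])  # class 2 is the true class
    p_true = softmax(logits)[2]
    print(other, p_true, -np.log(p_true))  # loss shrinks only as 'other' -> -inf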

So in effect you have asked the optimizer to produce tf.exp(x) == 0.0, and the only way to achieve that is x == -infinity. That is where the numerical instability comes from.

The solution is to "blur" the labels: instead of [0, 0, 1] you use something like [0.01, 0.01, 0.98]. Now the optimizer only has to reach tf.exp(x) == 0.01, which means x == -4.6, safely inside the numerical range where GPU calculations are accurate and reliable.
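
A minimal sketch of that smoothing applied to the question's pipeline (the factor 0.02 is illustrative):

num_classes = 3
smoothing = 0.02
# [0, 0, 1] becomes roughly [0.0067, 0.0067, 0.9867]
y_smooth = y_ * (1.0 - smoothing) + smoothing / num_classes
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_smooth, logits=final_conv))

TF 1.x also exposes this directly via the label_smoothing argument of tf.losses.softmax_cross_entropy, though that op documents [batch, num_classes] shaped inputs, so the [batch, 7168, 3] tensors may need to be flattened first.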
