TensorFlow, PyTorch, and Paddle: configuring class weights and label smoothing for the cross-entropy loss

2023-05-16

Configuring class weights and label smoothing for the cross-entropy loss

  • 1 No class weight, no label smooth
    • 1.1 pytorch output
    • 1.2 paddle output
    • 1.3 tensorflow output
  • 2 Label smooth without class weight
    • 2.1 pytorch output
    • 2.2 paddle output
    • 2.3 tensorflow output
  • 3 Class weight without label smooth
    • 3.1 pytorch output
    • 3.2 paddle output
    • 3.3 tensorflow output
  • 4 Both class weight and label smooth
    • 4.1 pytorch output
    • 4.2 paddle output
    • 4.3 tensorflow output
  • 5 A class weight implementation for tensorflow
    • 5.1 WeightedCategoricalCrossentropy1 results
    • 5.2 WeightedCategoricalCrossentropy2 results
    • 5.3 Summary

This article summarizes practical experience with the categorical cross-entropy loss across different deep learning frameworks: how to configure label smoothing and class weights for the loss used by classification models in each framework. Label smoothing is a regularization technique, while class weights are a way of handling unbalanced data when the classes are imbalanced. The configurations are covered one by one below.
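As a quick numeric illustration of label smoothing (a minimal sketch using the standard smoothing formula, with epsilon = 0.1 and three classes as example values): the one-hot target is mixed with a uniform distribution, so the true class keeps most of the probability mass and every other class receives a small share.

import numpy as np

eps, num_classes = 0.1, 3
onehot = np.array([0., 1., 0.])
# target class: (1 - eps) + eps/num_classes, other classes: eps/num_classes
smoothed = onehot * (1 - eps) + eps / num_classes
print(smoothed)  # [0.03333333 0.93333333 0.03333333]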

First, import the required libraries and print the versions used.

import torch 
import torch.nn as tnn
import paddle 
import paddle.nn as pnn
import copy
import numpy as np
import tensorflow as tf 
print(torch.__version__)
print(paddle.__version__)
print(tf.__version__)
1.13.0+cpu
2.4.0
2.11.0

The versions above were the latest at the time of writing (2022-12-09). Since the experiments focus on the loss functions themselves, the inputs are fixed up front: two samples, three classes, single-label (not multi-label).

1 No class weight, no label smooth

y_true = np.array([1,2],dtype=np.int64) # class index
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]],dtype=np.float32) # pred

1.1 pytorch output

y_true_torch = torch.from_numpy(y_true)
y_pred_torch = torch.from_numpy(y_pred)
ce_torch = tnn.CrossEntropyLoss(weight=None,label_smoothing=0.0,reduction="mean")
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss:",loss_torch.numpy())  
ce_torch = tnn.CrossEntropyLoss(weight=None,label_smoothing=0.0,reduction='none')
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss separate:", loss_torch.numpy())  
torch loss: 0.9868951
torch loss separate: [0.5840635 1.3897266]

1.2 paddle output

y_true_paddle = paddle.to_tensor(y_true)
y_pred_paddle = paddle.to_tensor(y_pred)
y_true_paddle = pnn.functional.one_hot(y_true_paddle,num_classes=3)
ce_paddle = pnn.CrossEntropyLoss(weight=None,soft_label=True,reduction='mean')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss:",loss_paddle.numpy())  
ce_paddle = pnn.CrossEntropyLoss(weight=None,soft_label=True,reduction='none')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss separate:",loss_paddle.numpy())  
paddle loss: [0.9868951]
paddle loss separate: [[0.58406353]
 [1.3897266 ]]

1.3 tensorflow output

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
ce_tf = tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=None)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=None)
print("tensorflow loss separate:",loss_tf.numpy()) 
tensorflow loss: 0.9868951
tensorflow loss separate: [0.5840635 1.3897266]

As the outputs above show, all three frameworks produce identical results.
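As a sanity check, the same numbers fall out of a minimal NumPy reimplementation (a sketch assuming from_logits=True, i.e. log-softmax followed by the negative log-likelihood of the true class):

logits = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]], dtype=np.float32)
labels = np.array([1, 2])
# log_softmax, then pick the log-probability of the true class per sample
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
per_sample = -log_probs[np.arange(len(labels)), labels]
print(per_sample)         # ~[0.5840635 1.3897266]
print(per_sample.mean())  # ~0.9868951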

2 Label smooth without class weight

y_true = np.array([1,2],dtype=np.int64) # class index
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]],dtype=np.float32) # pred

2.1 pytorch output

y_true_torch = torch.from_numpy(y_true)
y_pred_torch = torch.from_numpy(y_pred)
ce_torch = tnn.CrossEntropyLoss(weight=None,label_smoothing=0.1,reduction="mean")
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss:",loss_torch.numpy())  
ce_torch = tnn.CrossEntropyLoss(weight=None,label_smoothing=0.1,reduction='none')
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss separate:", loss_torch.numpy())  
torch loss: 1.0060617
torch loss separate: [0.64573014 1.3663933 ]

2.2 paddle output

y_true_paddle = paddle.to_tensor(y_true)
y_pred_paddle = paddle.to_tensor(y_pred)
y_true_paddle = pnn.functional.one_hot(y_true_paddle,num_classes=3)
y_true_paddle = pnn.functional.label_smooth(y_true_paddle,epsilon=0.1)
ce_paddle = pnn.CrossEntropyLoss(weight=None,soft_label=True,reduction='mean')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss:",loss_paddle.numpy())  
ce_paddle = pnn.CrossEntropyLoss(weight=None,soft_label=True,reduction='none')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss separate:",loss_paddle.numpy())  
paddle loss: [1.0060618]
paddle loss separate: [[0.6457302]
 [1.3663934]]

2.3 tensorflow output

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
ce_tf = tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=None)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=None)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 1.0060618
tensorflow loss separate: [0.64573014 1.3663933 ]

As the code above shows, the three frameworks again agree to within floating-point precision.
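The same sanity check extends to label smoothing: build the smoothed target distribution first, then take the cross entropy against it (again a sketch assuming from_logits=True):

eps, C = 0.1, 3
logits = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]], dtype=np.float32)
smoothed = np.eye(C)[[1, 2]] * (1 - eps) + eps / C  # smoothed one-hot targets
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
per_sample = -(smoothed * log_probs).sum(axis=-1)
print(per_sample)         # ~[0.6457301 1.3663933]
print(per_sample.mean())  # ~1.0060617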

3 Class weight without label smooth

From the code's point of view, "no label smooth" simply means setting label_smoothing to 0.0.

y_true = np.array([1,2],dtype=np.int64) # class index
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]],dtype=np.float32) # pred

3.1 pytorch output

y_true_torch = torch.from_numpy(y_true)
y_pred_torch = torch.from_numpy(y_pred)
weight_torch = torch.from_numpy(np.array([1,2,3],dtype=np.float32))
ce_torch = tnn.CrossEntropyLoss(weight=weight_torch,label_smoothing=0.0,reduction="mean")
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss:",loss_torch.numpy())  
ce_torch = tnn.CrossEntropyLoss(weight=weight_torch,label_smoothing=0.0,reduction='none')
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss separate:", loss_torch.numpy())  
torch loss: 1.0674614
torch loss separate: [1.168127 4.16918 ]

3.2 paddle output

y_true_paddle = paddle.to_tensor(y_true)
y_pred_paddle = paddle.to_tensor(y_pred)
weight_paddle = paddle.to_tensor(np.array([1,2,3],dtype=np.float32))
y_true_paddle = pnn.functional.one_hot(y_true_paddle,num_classes=3)
y_true_paddle = pnn.functional.label_smooth(y_true_paddle,epsilon=0.0)
ce_paddle = pnn.CrossEntropyLoss(weight=weight_paddle,soft_label=True,reduction='mean')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss:",loss_paddle.numpy())  
ce_paddle = pnn.CrossEntropyLoss(weight=weight_paddle,soft_label=True,reduction='none')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss separate:",loss_paddle.numpy()) 
paddle loss: [1.0674614]
paddle loss separate: [[1.1681271]
 [4.16918  ]]

3.3 tensorflow output

Here, because tensorflow's loss has no class-weight argument, the class weights must be converted into per-sample weights.

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
weight_tf= tf.constant([1.,2.,3.])
weights = tf.reduce_sum(weight_tf * y_true_tf, axis=-1)  # convert class weight to sample weight
ce_tf = tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 2.6686535
tensorflow loss separate: [1.168127 4.16918 ]

With reduction set to none, tensorflow's per-sample losses match the other two frameworks exactly, and paddle agrees with pytorch completely. The difference lies in the averaging: tensorflow divides by the number of samples, while the other two frameworks divide by the sum of the sample weights, as the following shows.

print("tensorflow output:",tf.reduce_mean(loss_tf).numpy())
print("pytorch and paddle output:",(tf.reduce_sum(loss_tf)/tf.reduce_sum(weights)).numpy())
tensorflow output: 2.6686535
pytorch and paddle output: 1.0674614

4 Both class weight and label smooth

y_true = np.array([1,2],dtype=np.int64) # class index
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]],dtype=np.float32) # pred

4.1 pytorch output

y_true_torch = torch.from_numpy(y_true)
y_pred_torch = torch.from_numpy(y_pred)
weight_torch = torch.from_numpy(np.array([1,2,3],dtype=np.float32))
ce_torch = tnn.CrossEntropyLoss(weight=weight_torch,label_smoothing=0.1,reduction="mean")
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss:",loss_torch.numpy())  
ce_torch = tnn.CrossEntropyLoss(weight=weight_torch,label_smoothing=0.1,reduction='none')
loss_torch=ce_torch(y_pred_torch,y_true_torch)
print("torch loss separate:", loss_torch.numpy())  
torch loss: 1.0553335
torch loss separate: [1.293127  3.9835405]

4.2 paddle output

y_true_paddle = paddle.to_tensor(y_true)
y_pred_paddle = paddle.to_tensor(y_pred)
weight_paddle = paddle.to_tensor(np.array([1,2,3],dtype=np.float32))
y_true_paddle = pnn.functional.one_hot(y_true_paddle,num_classes=3)
y_true_paddle = pnn.functional.label_smooth(y_true_paddle,epsilon=0.1)
ce_paddle = pnn.CrossEntropyLoss(weight=weight_paddle,soft_label=True,reduction='mean')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss:",loss_paddle.numpy())  
ce_paddle = pnn.CrossEntropyLoss(weight=weight_paddle,soft_label=True,reduction='none')
loss_paddle=ce_paddle(y_pred_paddle,y_true_paddle)
print("paddle loss separate:",loss_paddle.numpy()) 
paddle loss: [1.0722452]
paddle loss separate: [[1.2914604]
 [3.9625409]]


d:\ProgramData\Anaconda3\lib\site-packages\paddle\fluid\dygraph\math_op_patch.py:275: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.bool, the right dtype will convert to paddle.float32
  warnings.warn(

4.3 tensorflow output

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
weight_tf= tf.constant([1.,2.,3.])
weights = tf.reduce_sum(weight_tf * y_true_tf, axis=-1)  # convert class weight to sample weight
ce_tf = tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 2.6953201
tensorflow loss separate: [1.2914603 4.09918  ]

Printing the two reductions as in 3.3:

print("tensorflow output:",tf.reduce_mean(loss_tf).numpy())
print("weight output:",(tf.reduce_sum(loss_tf)/tf.reduce_sum(weights)).numpy())
tensorflow output: 2.6953201
weight output: 1.0781281

From these results, the three frameworks now all differ. As for the pytorch result, I have looked at the formula in the official documentation but have not fully worked it out, and I have not read the source code, so I will not go deeper here; a hypothesis that does reproduce the numbers is sketched below. The paddle and tensorflow results I do understand; see the code in the next section.
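That hypothesis (an assumption checked only against the outputs above, not against the pytorch source): apply the class weight to every class term inside the smoothed cross-entropy sum, and for the "mean" reduction divide by the summed weights of the hard target classes.

eps, C = 0.1, 3
w = np.array([1., 2., 3.])
logits = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]], dtype=np.float32)
labels = np.array([1, 2])
smoothed = np.eye(C)[labels] * (1 - eps) + eps / C
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
# hypothesis: weight each class term, not just the target class
per_sample = -(w * smoothed * log_probs).sum(axis=-1)
print(per_sample)                          # ~[1.293127 3.9835405]
print(per_sample.sum() / w[labels].sum())  # ~1.0553335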

5 A class weight implementation for tensorflow

Personally, I find the behavior in 4.3 the most convincing when class weight and label smooth are used together; the reduction can be either a plain mean or a weighted mean, and I lean toward the plain mean, though I have no solid theoretical backing for that preference.

y_true = np.array([1,2],dtype=np.int64) # class index
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]],dtype=np.float32) # pred


# For tensorflow, a loss with class weights has to be written by hand
# @tf.keras.utils.register_keras_serializable(package="weightedcrossentropyloss")
class WeightedCategoricalCrossentropy1(tf.keras.losses.Loss):
    """Implements WeightedCategoricalCrossentropy.
    
    Args:
        class_weight: a manual rescaling weight given to each class. If given, has to be a Tensor of size C
        from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
        label_smoothing:Float in [0, 1]. When > 0, label values are smoothed, meaning the confidence on label values are relaxed. For example, 
        if 0.1, use 0.1 / num_classes for non-target labels and 0.9 + 0.1 / num_classes for target labels.
    """

    def __init__(self, class_weight=None, from_logits=True, label_smoothing=0.0, reduction=tf.keras.losses.Reduction.AUTO, **kwargs):
        super().__init__(**kwargs)
        self.label_smoothing = label_smoothing
        self.from_logits=from_logits
        self.class_weight=class_weight
        self.reduction=reduction


    def _labelsmoothing(self, y_true, class_num):
        if len(y_true.shape) == 1 or y_true.shape[-1] != class_num:
            raise Exception("Please use one hot label")
        y_true = y_true*(1 - self.label_smoothing)+self.label_smoothing / class_num

        return y_true

    def call(self, y_true, y_pred):
        y_pred = tf.convert_to_tensor(y_pred)
        y_true = tf.cast(y_true, y_pred.dtype)
        
        class_num = y_pred.shape[-1]
        if self.class_weight is not None:
            # sample weights computed from the hard labels, BEFORE smoothing
            weights = tf.reduce_sum(self.class_weight * y_true, axis=-1)
        if self.label_smoothing:
            y_true = self._labelsmoothing(y_true, class_num)
        if self.from_logits:
            y_pred = -tf.nn.log_softmax(y_pred, axis=-1)
        else:
            y_pred = -tf.math.log(y_pred)  # tf.math.log takes no axis argument

        loss = tf.reduce_sum(y_pred * y_true, axis=-1)
        if self.class_weight is not None:
            loss = loss * weights
        if self.reduction == tf.keras.losses.Reduction.AUTO:
            loss = tf.reduce_mean(loss, axis=-1)
        elif self.reduction == tf.keras.losses.Reduction.SUM:
            loss = tf.reduce_sum(loss, axis=-1)  # SUM should sum, not average
        return loss
    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "class_weight": self.class_weight,
                "from_logits": self.from_logits,
                "label_smoothing": self.label_smoothing,
            }
        )
        return config

class WeightedCategoricalCrossentropy2(tf.keras.losses.Loss):
    """Implements WeightedCategoricalCrossentropy.
    
    Args:
        class_weight: a manual rescaling weight given to each class. If given, has to be a Tensor of size C
        from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
        label_smoothing:Float in [0, 1]. When > 0, label values are smoothed, meaning the confidence on label values are relaxed. For example, 
        if 0.1, use 0.1 / num_classes for non-target labels and 0.9 + 0.1 / num_classes for target labels.
    """

    def __init__(self, class_weight=None, from_logits=True, label_smoothing=0.0, reduction=tf.keras.losses.Reduction.AUTO, **kwargs):
        super().__init__(**kwargs)
        self.label_smoothing = label_smoothing
        self.from_logits=from_logits
        self.class_weight=class_weight
        self.reduction=reduction


    def _labelsmoothing(self, y_true, class_num):
        if len(y_true.shape) == 1 or y_true.shape[-1] != class_num:
            raise Exception("Please use one hot label")
        y_true = y_true*(1 - self.label_smoothing)+self.label_smoothing / class_num

        return y_true

    def call(self, y_true, y_pred):
        y_pred = tf.convert_to_tensor(y_pred)
        y_true = tf.cast(y_true, y_pred.dtype)
        
        class_num = y_pred.shape[-1]
        if self.label_smoothing:
            y_true = self._labelsmoothing(y_true, class_num)
        if self.class_weight is not None:
            # sample weights computed from the smoothed labels, AFTER smoothing
            weights = tf.reduce_sum(self.class_weight * y_true, axis=-1)
        if self.from_logits:
            y_pred = -tf.nn.log_softmax(y_pred, axis=-1)
        else:
            y_pred = -tf.math.log(y_pred)  # tf.math.log takes no axis argument

        loss = tf.reduce_sum(y_pred * y_true, axis=-1)
        if self.class_weight is not None:
            loss = loss * weights
        if self.reduction == tf.keras.losses.Reduction.AUTO:
            loss = tf.reduce_mean(loss, axis=-1)
        elif self.reduction == tf.keras.losses.Reduction.SUM:
            loss = tf.reduce_sum(loss, axis=-1)  # SUM should sum, not average
        return loss
    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "class_weight": self.class_weight,
                "from_logits": self.from_logits,
                "label_smoothing": self.label_smoothing,
            }
        )
        return config

In the code above, the two classes differ in only one small detail: the point at which the sample weights are computed, before label smoothing in the first class and after it in the second. Now let's look at the results.

One additional note:

if self.from_logits:
    y_pred = -tf.nn.log_softmax(y_pred, axis=-1)
else:
    y_pred = -tf.math.log(y_pred)
    
loss = tf.reduce_sum(y_pred * y_true, axis=-1)

In the from_logits=True case, the lines above can be replaced with a single call:

 tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)

5.1 WeightedCategoricalCrossentropy1 results

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
weight_tf = tf.constant([1.,2.,3.])
ce_tf = WeightedCategoricalCrossentropy1(class_weight=weight_tf,from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= WeightedCategoricalCrossentropy1(class_weight=weight_tf,from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 2.6953201
tensorflow loss separate: [1.2914603 4.09918  ]
y_true_tf = tf.convert_to_tensor(y_true)
y_true_tf = tf.one_hot(y_true_tf,3)
weight_tf = tf.constant([1.,2.,3.])
weights = tf.reduce_sum(weight_tf * y_true_tf, axis=-1)
print("weights:",weights)
print("tensorflow output:",tf.reduce_mean(loss_tf).numpy())
print("weight output:",(tf.reduce_sum(loss_tf)/tf.reduce_sum(weights)).numpy())
weights: tf.Tensor([2. 3.], shape=(2,), dtype=float32)
tensorflow output: 2.6953201
weight output: 1.0781281

This matches the 4.3 result exactly; in other words, the tensorflow sample_weight approach computes the class weight before label smoothing.

5.2 WeightedCategoricalCrossentropy2 results

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
weight_tf = tf.constant([1.,2.,3.])
ce_tf = WeightedCategoricalCrossentropy2(class_weight=weight_tf,from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= WeightedCategoricalCrossentropy2(class_weight=weight_tf,from_logits=True,label_smoothing=0.1,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 2.6270003
tensorflow loss separate: [1.2914603 3.9625404]
y_true_tf = tf.convert_to_tensor(y_true)
y_true_tf = tf.one_hot(y_true_tf,3)
y_true_tf = y_true_tf*(1 - 0.1)+0.1 / 3. # label smooth
weight_tf = tf.constant([1.,2.,3.])
weights = tf.reduce_sum(weight_tf * y_true_tf, axis=-1)

print("weights:",weights)
print("tensorflow output:",tf.reduce_mean(loss_tf).numpy())
print("weight output:",(tf.reduce_sum(loss_tf)/tf.reduce_sum(weights)).numpy())
weights: tf.Tensor([2.        2.8999999], shape=(2,), dtype=float32)
tensorflow output: 2.6270003
weight output: 1.0722451

This matches paddle's result in 4.2; in other words, paddle computes the class weight after label smoothing.

5.3 Summary

Both custom class-weight variants can also be expressed with the official tf.keras.losses.CategoricalCrossentropy. The first variant is exactly what 4.3 does; the second only requires computing the sample weights from the smoothed labels, as follows.

y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)
y_true_tf = tf.one_hot(y_true_tf,3)
y_true_tf = y_true_tf*(1 - 0.1)+0.1 / 3. # label smooth
weight_tf= tf.constant([1.,2.,3.])
weights = tf.reduce_sum(weight_tf * y_true_tf, axis=-1)  # convert class weight to sample weight
ce_tf = tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.AUTO)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss:",loss_tf.numpy())  
ce_tf= tf.keras.losses.CategoricalCrossentropy(from_logits=True,label_smoothing=0.0,reduction=tf.keras.losses.Reduction.NONE)
loss_tf=ce_tf(y_true_tf,y_pred_tf,sample_weight=weights)
print("tensorflow loss separate:",loss_tf.numpy())
tensorflow loss: 2.6270003
tensorflow loss separate: [1.2914603 3.9625404]
print("weights:",weights)
print("tensorflow output:",tf.reduce_mean(loss_tf).numpy())
print("weight output:",(tf.reduce_sum(loss_tf)/tf.reduce_sum(weights)).numpy())
weights: tf.Tensor([2.        2.8999999], shape=(2,), dtype=float32)
tensorflow output: 2.6270003
weight output: 1.0722451

The numbers line up. Once the two custom class-weight implementations are understood, the tensorflow and paddle results both follow.

In short, all of section 5 can be read as computing an unweighted loss and a per-sample weight, and then combining the two into the weighted loss.
