Is it possible to update the learning rate for each batch based on the batch's label (y_true) distribution?

2024-02-16

Edit: see the end of this question for the solution

TL;DR: I need a way to compute the label distribution of each batch and use it to update the learning rate. Is there a way to access the current model's optimizer so that the learning_rate can be updated per batch?

Below is how the label distribution can be computed. It could be done inside the loss function, since by default the loss is computed per batch. Where can this code be executed so that it also has access to the model's optimizer?

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating learning rate

More details

To implement the learning rate schedule described in this paper, https://arxiv.org/abs/1906.07413, I believe I need a way to update the learning rate during training, per batch, by a value computed from the label distribution of the true labels in the batch (y_true, as it is typically represented in keras/tensorflow).

Where ...

x: the model output

y: the corresponding ground-truth label

B: the mini-batch of m samples (e.g. 64)

n_y: the total number of training samples for ground-truth label y

n_y^-1: the inverse label frequency

The portion of the formula I'm focused on is the part between α and ∇θ.
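For reference, this is roughly the mini-batch update I mean, written out from the definitions above (an approximation of the paper's formula, not a verbatim copy):

\theta \leftarrow \theta - \alpha \Big(\sum_{(x,y)\in B} n_y^{-1}\Big)^{-1} \sum_{(x,y)\in B} n_y^{-1}\, \nabla_\theta \mathcal{L}\big(f(x;\theta),\, y\big)

The term right after α, i.e. (Σ_{(x,y)∈B} n_y^{-1})^{-1}, is the value E computed in the loss snippet below.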

I can easily compute this from within a custom loss function, but I don't know how (if it is even possible) to update the learning rate from within the loss function.

import tensorflow as tf
from tensorflow.python.ops import math_ops

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)      # integer class ids of the batch
    freqs = tf.gather(lf, y)                 # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)      # per-sample inverse label frequency
    E = 1 / math_ops.reduce_sum(inv_freqs)   # value to use when updating the learning rate
    # note: as an actual Keras loss this would still need to return a per-sample loss value

Where ...

lf is the per-class sample frequency of the training set. e.g. with 2 classes, c0 = 10 examples and c1 = 100 --> lf == [10, 100]
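lf can be precomputed once from the training labels, for example like this (a small sketch; y_train and n_class are placeholder names, not something defined above):

import numpy as np
import tensorflow as tf

# y_train: integer class ids for the whole training set, n_class: number of classes (placeholders)
lf = np.bincount(y_train, minlength=n_class).astype('float32')  # per-class sample counts
lf = tf.constant(lf)  # so it can be gathered from inside the loss above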

Is there some fancy way to update the optimizer's learning rate, similar to what can be done from a callback?

# (a method on a keras.callbacks.Callback subclass)
def on_batch_begin(self, batch, logs=None):
    # note: batch is just an incremented value indicating the batch index
    self.model.optimizer.lr  # the learning rate, which can be modified from the callback

Thanks in advance for any help!


SOLUTION

Many thanks to @mrk for pushing me in the right direction to solve this!

In order to compute the per-batch label distribution and then use that value to update the optimizer's learning rate, you have to ...

  1. Create a custom Metric that computes the label distribution per batch and returns the frequency array (by default keras optimizes per batch, so metrics are computed per batch as well).
  2. Create a typical learning rate scheduler by subclassing the keras.callbacks.History class.
  3. Override the scheduler's on_batch_end function; the logs dict will contain all the metrics computed for that batch, including our custom label distribution metric!

Create the custom metric

import tensorflow as tf
from tensorflow import VariableAggregation
from tensorflow.python.ops import math_ops as mo


class LabelDistribution(tf.keras.metrics.Metric):
    """
    Computes the per-batch label distribution (y_true) and stores the array as
    a metric which can be accessed via keras Callbacks

    :param n_class: int - number of distinct output class(es)
    """

    def __init__(self, n_class, name='batch_label_distribution', **kwargs):
        super(LabelDistribution, self).__init__(name=name, **kwargs)
        self.n_class = n_class
        self.label_distribution = self.add_weight(name='ld', initializer='zeros',
                                                  aggregation=VariableAggregation.NONE,
                                                  shape=(self.n_class, ))

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = mo.cast(y_true, 'int32')
        y = mo.argmax(y_true, axis=1)  # integer class ids of the batch
        # minlength keeps the result at a fixed length of n_class even when a class is absent from the batch
        label_distrib = mo.bincount(mo.cast(y, 'int32'), minlength=self.n_class)

        self.label_distribution.assign(mo.cast(label_distrib, 'float32'))

    def result(self):
        return self.label_distribution

    def reset_states(self):
        self.label_distribution.assign([0]*self.n_class)
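
A quick eager-mode sanity check of the metric (a made-up example, not part of the original solution):

m = LabelDistribution(n_class=3)
y_true = tf.one_hot([0, 2, 2, 1], depth=3)       # batch of 4 samples over classes 0..2
m.update_state(y_true, tf.zeros_like(y_true))    # y_pred is ignored by this metric
print(m.result().numpy())                        # expected: [1. 1. 2.]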

Create the DRW learning rate scheduler

from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.python.ops import math_ops


class DRWLearningRateSchedule(keras.callbacks.History):
    """
    Used to implement the Deferred Re-weighting strategy from
    [Kaidi Cao, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." (2019)]
    (https://arxiv.org/abs/1906.07413)

    Compile the model with the LabelDistribution metric and pass this scheduler
    to `model.fit(..., callbacks=[DRWLearningRateSchedule(.01)])`
    """

    def __init__(self, base_lr, ld_metric='batch_label_distribution'):
        super(DRWLearningRateSchedule, self).__init__()

        self.base_lr = base_lr
        self.ld_metric = ld_metric  # name of the LabelDistribution metric

    def on_batch_end(self, batch, logs=None):
        ld = logs.get(self.ld_metric)  # the per-batch label distribution
        current_lr = K.get_value(self.model.optimizer.lr)
        # example below of updating the optimizer's learning rate
        K.set_value(self.model.optimizer.lr, current_lr * (1 / math_ops.reduce_sum(ld)))
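
Roughly, the two pieces are then wired together like this (a sketch; the model, data and hyperparameters are placeholders):

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy',
              metrics=[LabelDistribution(n_class=10)])

model.fit(x_train, y_train,
          batch_size=64,
          epochs=10,
          callbacks=[DRWLearningRateSchedule(base_lr=0.01)])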

Loss-based learning rate adaptation in Keras

After some research I found this: https://www.kaggle.com/fergusoci/keras-loss-based-learning-rate-scheduler. Instead of just triggering a decay, you can also define another function or value for the learning rate.

from __future__ import absolute_import
from __future__ import print_function

import keras
from keras import backend as K
import numpy as np


class LossLearningRateScheduler(keras.callbacks.History):
    """
    A learning rate scheduler that relies on changes in loss function
    value to dictate whether learning rate is decayed or not.
    LossLearningRateScheduler has the following properties:
    base_lr: the starting learning rate
    lookback_epochs: the number of epochs in the past to compare with the loss function at the current epoch to determine if progress is being made.
    decay_threshold / decay_multiple: if loss function has not improved by a factor of decay_threshold * lookback_epochs, then decay_multiple will be applied to the learning rate.
    spike_epochs: list of the epoch numbers where you want to spike the learning rate.
    spike_multiple: the multiple applied to the current learning rate for a spike.
    """

    def __init__(self, base_lr, lookback_epochs, spike_epochs = None, spike_multiple = 10, decay_threshold = 0.002, decay_multiple = 0.5, loss_type = 'val_loss'):

        super(LossLearningRateScheduler, self).__init__()

        self.base_lr = base_lr
        self.lookback_epochs = lookback_epochs
        self.spike_epochs = spike_epochs
        self.spike_multiple = spike_multiple
        self.decay_threshold = decay_threshold
        self.decay_multiple = decay_multiple
        self.loss_type = loss_type


    def on_epoch_begin(self, epoch, logs=None):

        if len(self.epoch) > self.lookback_epochs:

            current_lr = K.get_value(self.model.optimizer.lr)

            target_loss = self.history[self.loss_type] 

            loss_diff =  target_loss[-int(self.lookback_epochs)] - target_loss[-1]

            if loss_diff <= np.abs(target_loss[-1]) * (self.decay_threshold * self.lookback_epochs):

                print(' '.join(('Changing learning rate from', str(current_lr), 'to', str(current_lr * self.decay_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.decay_multiple)
                current_lr = current_lr * self.decay_multiple

            else:

                print(' '.join(('Learning rate:', str(current_lr))))

            if self.spike_epochs is not None and len(self.epoch) in self.spike_epochs:
                print(' '.join(('Spiking learning rate from', str(current_lr), 'to', str(current_lr * self.spike_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.spike_multiple)

        else:

            print(' '.join(('Setting learning rate to', str(self.base_lr))))
            K.set_value(self.model.optimizer.lr, self.base_lr)


        return K.get_value(self.model.optimizer.lr)




def main():
    return

if __name__ == '__main__':
    main()
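
For completeness, the scheduler above would be attached to training roughly like this (a sketch with placeholder data and parameter values):

scheduler = LossLearningRateScheduler(base_lr=0.001, lookback_epochs=3)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),  # needed because loss_type defaults to 'val_loss'
          epochs=50,
          callbacks=[scheduler])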

