EDIT: see the end of this question for the solution
TL;DR: I need a way to compute the label distribution of each training batch and use that value to update the learning rate. Is there a way to access the current model's optimizer so the learning rate can be updated per batch?
Below is how to compute the label distribution. It could be done in the loss function, since by default the loss is computed per batch. But where can this code be executed so that it also has access to the model's optimizer?
import tensorflow as tf
from tensorflow.python.ops import math_ops

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating the learning rate
More details
To implement the learning rate schedule described in this paper, https://arxiv.org/abs/1906.07413, I believe I need a way to update the learning rate during training, per batch, by a value computed from the label distribution of the true labels in the batch (y_true, as it is typically denoted in keras/tensorflow).
The update rule, as I understand it from the paper, is

θ ← θ − α · (1 / Σ_{(x,y)∈B} n_y⁻¹) · Σ_{(x,y)∈B} n_y⁻¹ · ∇θ L((x, y); θ)

where ...

- x — the model's output
- y — the corresponding ground-truth label
- B — a mini-batch of m samples (e.g. 64)
- n_y — the entire training sample size for ground-truth label y
- n_y⁻¹ — the inverse label frequency

The portion of the formula I'm focused on is the part between α and ∇θ, i.e. the per-batch normalizer 1 / Σ_{(x,y)∈B} n_y⁻¹.
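To make that normalizer concrete, here is a toy numeric check (the two-class counts below are my own example, not from the paper):

import numpy as np

lf = np.array([10., 100.])  # per-class training sample counts (assumed 2-class example)
y = np.array([0, 1, 1])     # ground-truth labels of a tiny mini-batch
inv_freqs = 1.0 / lf[y]     # n_y**-1 per sample -> [0.1, 0.01, 0.01]
E = 1.0 / inv_freqs.sum()   # 1 / 0.12 ≈ 8.33, the term between α and ∇θ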
I can compute this easily from within a custom loss function, but I don't know how to update the learning rate from the loss function, if that is even possible.
def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating the learning rate
where ...

- lf — the sample frequency of each class. e.g. for 2 classes, c0 = 10 examples and c1 = 100 → lf == [10, 100]
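For reference, lf can be precomputed once from the full training labels; a minimal sketch, assuming y_train is a hypothetical 1-D array of integer class ids (not a variable from the code above):

import numpy as np

# y_train: hypothetical array of integer class ids for the whole training set
lf = np.bincount(y_train).astype('float32')  # e.g. [10., 100.] for the 2-class example above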
Is there some fancy way to update the optimizer's learning rate, similar to what can be done from a callback?
def on_batch_begin(self, batch, log):
# note: batch is just an incremented value to indicate batch index
self.model.optimizer.lr # learning rate, can be modified from callback
Thanks in advance for any help!
SOLUTION
Many thanks to @mrk for pushing me in the right direction to solve this!
In order to compute the per-batch label distribution and then use that value to update the optimizer's learning rate, one must ...
- Create a custom Metric that computes the label distribution per batch and returns the frequency array (by default keras optimizes per batch, so metrics are computed per batch as well).
- Create a typical learning rate scheduler by subclassing the keras.callbacks.History class.
- Override the scheduler's on_batch_end function; the logs dict will contain all of the computed metrics for the batch, including our custom label distribution metric!
Creating the custom metric
import tensorflow as tf

class LabelDistribution(tf.keras.metrics.Metric):
    """
    Computes the per-batch label distribution (y_true) and stores the array as
    a metric which can be accessed via keras Callbacks

    :param n_class: int - number of distinct output class(es)
    """

    def __init__(self, n_class, name='batch_label_distribution', **kwargs):
        super(LabelDistribution, self).__init__(name=name, **kwargs)
        self.n_class = n_class
        self.label_distribution = self.add_weight(name='ld', initializer='zeros',
                                                  aggregation=tf.VariableAggregation.NONE,
                                                  shape=(self.n_class,))

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, 'int32')
        y = tf.argmax(y_true, axis=1)
        # minlength guarantees a fixed (n_class,) shape even when some class
        # is absent from the batch
        label_distrib = tf.math.bincount(tf.cast(y, 'int32'), minlength=self.n_class)
        self.label_distribution.assign(tf.cast(label_distrib, 'float32'))

    def result(self):
        return self.label_distribution

    def reset_states(self):
        self.label_distribution.assign([0.0] * self.n_class)
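A quick sanity check of the metric on its own (the 2-class setup and the dummy y_pred below are my own assumptions, not part of the original answer):

ld = LabelDistribution(n_class=2)
y_true = tf.one_hot([0, 1, 1, 1], depth=2)  # one sample of class 0, three of class 1
ld.update_state(y_true, tf.zeros((4, 2)))   # y_pred is ignored by this metric
print(ld.result().numpy())                  # -> [1. 3.]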
Creating the DRW learning rate scheduler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

class DRWLearningRateSchedule(keras.callbacks.History):
    """
    Used to implement the Deferred Re-weighting strategy from
    [Kaidi Cao, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." (2019)]
    (https://arxiv.org/abs/1906.07413)

    To be included as a callback in model.fit:
    `model.fit(..., callbacks=[DRWLearningRateSchedule(0.01)])`
    """

    def __init__(self, base_lr, ld_metric='batch_label_distribution'):
        super(DRWLearningRateSchedule, self).__init__()
        self.base_lr = base_lr
        self.ld_metric = ld_metric  # name of the LabelDistribution metric

    def on_batch_end(self, batch, logs=None):
        ld = logs.get(self.ld_metric)  # the per-batch label distribution
        # example of updating the optimizer's learning rate; scaling the base
        # learning rate (rather than the current one) keeps the per-batch
        # adjustment from compounding across batches
        K.set_value(self.model.optimizer.lr,
                    self.base_lr * (1 / tf.reduce_sum(ld)))
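Finally, a sketch of how the two pieces fit together (model, x_train, and y_train_onehot are placeholders for your own model and data, and n_class=2 is an assumption):

model.compile(optimizer=keras.optimizers.SGD(0.01),
              loss='categorical_crossentropy',
              metrics=[LabelDistribution(n_class=2)])  # logged per batch as 'batch_label_distribution'

model.fit(x_train, y_train_onehot, batch_size=64,
          callbacks=[DRWLearningRateSchedule(base_lr=0.01)])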