I'm training a CNN for an audio classification task, using TensorFlow 2.0 RC with a custom training loop (as described in this guide from their official website: https://www.tensorflow.org/beta/guide/keras/training_and_evaluation#part_ii_writing_your_own_training_evaluation_loops_from_scratch). I find it really convenient to have a nice progress bar, similar to the one shown by the usual Keras model.fit.
Here is an outline of my training code (I use 4 GPUs with a mirrored distribution strategy):
    strategy = tf.distribute.MirroredStrategy()

    distr_train_dataset = strategy.experimental_distribute_dataset(train_dataset)
    if valid_dataset:
        distr_valid_dataset = strategy.experimental_distribute_dataset(valid_dataset)

    with strategy.scope():
        model = build_model()     # build the model
        optimizer = ...           # define optimizer
        train_loss = ...          # define training loss
        train_mean_loss = ...     # mean training loss
        train_metrics_1 = ...     # AUC-ROC
        train_metrics_2 = ...     # AUC-PR
        val_loss = ...            # mean validation loss
        val_metrics_1 = ...       # AUC-ROC for validation
        val_metrics_2 = ...       # AUC-PR for validation

        # rescale the per-example loss by the global batch size
        def compute_loss(labels, predictions):
            per_example_loss = train_loss(labels, predictions)
            return per_example_loss / config.batch_size

        def train_step(batch):
            audio_batch, label_batch = batch
            with tf.GradientTape() as tape:
                logits = model(audio_batch)
                loss = compute_loss(label_batch, logits)
            variables = model.trainable_variables
            grads = tape.gradient(loss, variables)
            optimizer.apply_gradients(zip(grads, variables))
            train_metrics_1.update_state(label_batch, logits)
            train_metrics_2.update_state(label_batch, logits)
            train_mean_loss.update_state(loss)
            return loss

        def valid_step(batch):
            audio_batch, label_batch = batch
            logits = model(audio_batch, training=False)
            loss = compute_loss(label_batch, logits)
            val_metrics_1.update_state(label_batch, logits)
            val_metrics_2.update_state(label_batch, logits)
            val_loss.update_state(loss)
            return loss

        @tf.function
        def distributed_train(dataset):
            num_batches = 0
            for batch in dataset:
                num_batches += 1
                strategy.experimental_run_v2(train_step, args=(batch,))
                # print progress here
                tf.print('Step', num_batches,
                         '; Loss', train_mean_loss.result(),
                         '; ROC_AUC', train_metrics_1.result(),
                         '; PR_AUC', train_metrics_2.result())
                gc.collect()

        @tf.function
        def distributed_valid(dataset):
            for batch in dataset:
                strategy.experimental_run_v2(valid_step, args=(batch,))
                gc.collect()

    for epoch in range(epochs):
        distributed_train(distr_train_dataset)
        gc.collect()
        train_metrics_1.reset_states()
        train_metrics_2.reset_states()
        train_mean_loss.reset_states()

        if valid_dataset:
            distributed_valid(distr_valid_dataset)
            gc.collect()
            val_metrics_1.reset_states()
            val_metrics_2.reset_states()
            val_loss.reset_states()
Here train_dataset and valid_dataset are two tf.data.TFRecordDatasets generated with the usual tf.data input pipeline.
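For context, the pipeline follows the usual TFRecord pattern, roughly like the sketch below (the feature keys, shapes, and parameters here are placeholders, not my actual schema):

```python
import tensorflow as tf

# Hypothetical feature spec -- the real keys and shapes differ.
feature_spec = {
    'audio': tf.io.FixedLenFeature([16000], tf.float32),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    # Decode one serialized tf.train.Example into (audio, label) tensors.
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    return parsed['audio'], parsed['label']

def make_dataset(tfrecord_files, batch_size):
    # Standard read -> parse -> shuffle -> batch -> prefetch pipeline.
    return (tf.data.TFRecordDataset(tfrecord_files)
            .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .shuffle(1024)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE))
```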
TensorFlow provides a very nice tf.keras.utils.Progbar (which is indeed what you see when you use model.fit). I have taken a look at its source code (https://github.com/tensorflow/tensorflow/tree/r1.14/tensorflow/python/keras/utils/generic_utils.py#L313-L480), and it relies on numpy, so I cannot use it in place of the tf.print() statement (which executes in graph mode).
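For illustration, this is roughly how Progbar would be used in plain eager code, outside any tf.function (the loss values here are made up):

```python
import tensorflow as tf

num_batches = 100
progbar = tf.keras.utils.Progbar(target=num_batches)

for step in range(1, num_batches + 1):
    loss = 1.0 / step  # placeholder for the real per-batch loss
    # update() takes the current step and a list of (name, value) pairs;
    # it averages the reported values over the steps seen so far.
    progbar.update(step, values=[('loss', loss)])
```

This works fine eagerly, but inside a @tf.function the loop body is traced into a graph, so a plain Python call like this never executes per step.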
How can I implement a similar progress bar in my custom training loop (with the training function running in graph mode)?
And how does model.fit display a progress bar in the first place?