Unable to understand the behavior of the "build" method in TensorFlow Keras layers (tf.keras.layers.Layer)

2024-01-09

Layers in TensorFlow Keras have a method, build, which is used to defer weight creation until you know what the inputs will look like. Docs for the layer's build method: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer#build
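
For context, the usual pattern looks like this (a sketch along the lines of the Linear example in the custom-layers guide linked below; assuming TF 2.x):

import tensorflow as tf

class Linear(tf.keras.layers.Layer):
  """A layer that defers weight creation to build."""

  def __init__(self, units):
    super().__init__()
    self.units = units

  def build(self, input_shape):
    # Weights are created here, once the last input dimension is known.
    self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer="random_normal", trainable=True)
    self.b = self.add_weight(shape=(self.units,),
                             initializer="zeros", trainable=True)

  def call(self, x):
    return tf.matmul(x, self.w) + self.b

layer = Linear(4)
y = layer(tf.ones((2, 3)))  # build runs here, with input_shape (2, 3)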

I have a couple of questions I could not find answers to:

  1. Here https://www.tensorflow.org/guide/keras/custom_layers_and_models#layers_are_recursively_composable it is said that

    If you assign a Layer instance as attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

What does it mean to track the weights of a layer? (A small experiment that makes this observable is sketched after the code below.)

  2. The same link also mentions that

    we recommend creating such sublayers in the __init__ method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

Does this mean that, when the build method of the subclass (of self) runs, all attributes of self are iterated over, and those found to be (instances of) subclasses of tf.keras.layers.Layer get their build method run automatically? (The sketch after the code below also touches on this.)

  3. I can run this code:
import tensorflow as tf

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

but not this one:

import tensorflow as tf

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def build(self, input_shape):
    super().build()

  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

why?
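
To make questions 1 and 2 concrete, here is a small experiment one can run (the class name Outer is made up for illustration; assuming TF 2.x):

import tensorflow as tf

class Outer(tf.keras.layers.Layer):
  """A layer that owns a sublayer assigned as an attribute."""

  def __init__(self):
    super().__init__()
    self.inner = tf.keras.layers.Dense(3)  # created here, but not built yet

  def call(self, x):
    return self.inner(x)

outer = Outer()
print(outer.inner.built)  # False: no input seen yet, so no weights exist
print(outer.weights)      # []: there is nothing to track yet

_ = outer(tf.zeros((2, 4)))             # the first call builds outer and inner
print(outer.inner.built)                # True: the sublayer got built as well
print([w.name for w in outer.weights])  # the inner Dense kernel and bias now
                                        # show up in the outer layer's weights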


I would say that the build mentioned there means that when you construct a custom tf.keras.Model, e.g.

net = Net()

then all the tf.keras.layers.Layer objects created in __init__ and stored on net (which is a callable object) become part of it, so it is a complete object for TF to train later; that is what is meant by "to track". The next time you call net(inputs), you will get your output.
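
As for why the second snippet misbehaves: at least part of the problem is that tf.keras.Model.build takes a required input_shape argument, so super().build() with no arguments raises a TypeError as soon as building is actually triggered. A sketch with the shape forwarded to the base class (one possible fix, assuming TF 2.x):

import tensorflow as tf

class Net(tf.keras.Model):
  """A simple linear model with a build override."""

  def __init__(self):
    super().__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def build(self, input_shape):
    # Forward the input shape instead of dropping it, so the base class
    # can do its bookkeeping (e.g. mark the model as built).
    super().build(input_shape)

  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)        # []: weight creation is deferred, no shape known yet
_ = net(tf.zeros((2, 10)))  # the first call triggers build with the input shape
print(net.variables)        # now contains the Dense kernel (10, 5) and bias (5,)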

Below is TensorFlow's example of a custom decoder with attention:

class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    # query hidden state shape == (batch_size, hidden size)
    # query_with_time_axis shape == (batch_size, 1, hidden size)
    # values shape == (batch_size, max_len, hidden size)
    # we are doing this to broadcast addition along the time axis to calculate the score
    query_with_time_axis = tf.expand_dims(query, 1)

    # score shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    # the shape of the tensor before applying self.V is (batch_size, max_length, units)
    score = self.V(tf.nn.tanh(
        self.W1(query_with_time_axis) + self.W2(values)))

    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

I have tried putting tf.keras.layers.Layer objects inside call and got very bad results; I guess that is because if you put it in call, a new layer (with fresh weights) is created every time a forward-backward propagation happens.
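
A minimal sketch of that anti-pattern (BadNet is a made-up name; compare the Net class above, which creates the layer once in __init__):

import tensorflow as tf

class BadNet(tf.keras.Model):
  """Anti-pattern: the sublayer is created inside call."""

  def call(self, x):
    # A brand-new Dense layer, with freshly initialized weights, is created
    # on every forward pass, so gradient updates never accumulate anywhere
    # (and under tf.function this can even raise an error about creating
    # variables on a non-first call).
    return tf.keras.layers.Dense(5)(x)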
