Huggingface BERT TPU fine-tuning works on Colab but not on GCP

2024-04-14

I am trying to fine-tune a Huggingface Transformers BERT model on a TPU. It works in Colab but fails when I switch to a paid TPU on GCP. The Jupyter notebook code is below:

# [1] Load the model without a distribution strategy.
import tensorflow as tf
import transformers
from transformers import TFBertModel

model = transformers.TFBertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# Works.

# [2] Connect to the TPU and create the strategy.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='[My TPU]',
    zone='us-central1-a',
    project='[My Project]'
)
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
# Also works. Got a bunch of startup messages from the TPU - all good.

# [3] Load the same model inside the TPU strategy scope.
with tpu_strategy.scope():
    model = TFBertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# Generates the error below (long). The same line works in Colab.

Here is the error message:

NotFoundError                             Traceback (most recent call last)
<ipython-input-14-2cfc1a238903> in <module>
      1 with tpu_strategy.scope():
----> 2     model = TFBertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    309             return load_pytorch_checkpoint_in_tf2_model(model, resolved_archive_file, allow_missing_keys=True)
    310 
--> 311         ret = model(model.dummy_inputs, training=False)  # build the network with dummy inputs
    312 
    313         assert os.path.isfile(resolved_archive_file), "Error retrieving file {}".format(resolved_archive_file)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, **kwargs)
    688 
    689     def call(self, inputs, **kwargs):
--> 690         outputs = self.bert(inputs, **kwargs)
    691         return outputs
    692 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, training)
    548 
    549         embedding_output = self.embeddings([input_ids, position_ids, token_type_ids, inputs_embeds], training=training)
--> 550         encoder_outputs = self.encoder([embedding_output, extended_attention_mask, head_mask], training=training)
    551 
    552         sequence_output = encoder_outputs[0]

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, training)
    365                 all_hidden_states = all_hidden_states + (hidden_states,)
    366 
--> 367             layer_outputs = layer_module([hidden_states, attention_mask, head_mask[i]], training=training)
    368             hidden_states = layer_outputs[0]
    369 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, training)
    341         hidden_states, attention_mask, head_mask = inputs
    342 
--> 343         attention_outputs = self.attention([hidden_states, attention_mask, head_mask], training=training)
    344         attention_output = attention_outputs[0]
    345         intermediate_output = self.intermediate(attention_output)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, training)
    290         input_tensor, attention_mask, head_mask = inputs
    291 
--> 292         self_outputs = self.self_attention([input_tensor, attention_mask, head_mask], training=training)
    293         attention_output = self.dense_output([self_outputs[0], input_tensor], training=training)
    294         outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

~/.local/lib/python3.5/site-packages/transformers/modeling_tf_bert.py in call(self, inputs, training)
    222 
    223         batch_size = shape_list(hidden_states)[0]
--> 224         mixed_query_layer = self.query(hidden_states)
    225         mixed_key_layer = self.key(hidden_states)
    226         mixed_value_layer = self.value(hidden_states)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/layers/core.py in call(self, inputs)
   1142         outputs = gen_math_ops.mat_mul(inputs, self.kernel)
   1143     if self.use_bias:
-> 1144       outputs = nn.bias_add(outputs, self.bias)
   1145     if self.activation is not None:
   1146       return self.activation(outputs)  # pylint: disable=not-callable

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/nn_ops.py in bias_add(value, bias, data_format, name)
   2756     else:
   2757       return gen_nn_ops.bias_add(
-> 2758           value, bias, data_format=data_format, name=name)
   2759 
   2760 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py in bias_add(value, bias, data_format, name)
    675       try:
    676         return bias_add_eager_fallback(
--> 677             value, bias, data_format=data_format, name=name, ctx=_ctx)
    678       except _core._SymbolicException:
    679         pass  # Add nodes to the TensorFlow graph.

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py in bias_add_eager_fallback(value, bias, data_format, name, ctx)
    703     data_format = "NHWC"
    704   data_format = _execute.make_str(data_format, "data_format")
--> 705   _attr_T, _inputs_T = _execute.args_to_matching_eager([value, bias], ctx)
    706   (value, bias) = _inputs_T
    707   _inputs_flat = [value, bias]

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/eager/execute.py in args_to_matching_eager(l, ctx, default_dtype)
    265         dtype = ret[-1].dtype
    266   else:
--> 267     ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
    268 
    269   # TODO(slebedev): consider removing this as it leaks a Keras concept.

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/eager/execute.py in <listcomp>(.0)
    265         dtype = ret[-1].dtype
    266   else:
--> 267     ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
    268 
    269   # TODO(slebedev): consider removing this as it leaks a Keras concept.

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1312 
   1313     if ret is None:
-> 1314       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1315 
   1316     if ret is NotImplemented:

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in _tensor_conversion_mirrored(var, dtype, name, as_ref)
   1174 # allowing instances of the class to be used as tensors.
   1175 def _tensor_conversion_mirrored(var, dtype=None, name=None, as_ref=False):
-> 1176   return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
   1177 
   1178 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in _dense_var_to_tensor(self, dtype, name, as_ref)
    908     if _enclosing_tpu_context() is None:
    909       return super(TPUVariableMixin, self)._dense_var_to_tensor(
--> 910           dtype=dtype, name=name, as_ref=as_ref)
    911     # pylint: enable=protected-access
    912     elif dtype is not None and dtype != self.dtype:

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in _dense_var_to_tensor(self, dtype, name, as_ref)
   1164     assert not as_ref
   1165     return ops.convert_to_tensor(
-> 1166         self.get(), dtype=dtype, name=name, as_ref=as_ref)
   1167 
   1168   def _clone_with_new_values(self, new_values):

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in get(self, device)
    835   def get(self, device=None):
    836     if (_enclosing_tpu_context() is None) or (device is not None):
--> 837       return super(TPUVariableMixin, self).get(device=device)
    838     else:
    839       raise NotImplementedError(

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in get(self, device)
    320         device = distribute_lib.get_update_device()
    321         if device is None:
--> 322           return self._get_cross_replica()
    323     device = device_util.canonicalize(device)
    324     return self._device_map.select_for_device(self._values, device)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/values.py in _get_cross_replica(self)
   1136     replica_id = self._device_map.replica_for_device(device)
   1137     if replica_id is None:
-> 1138       return array_ops.identity(self.primary)
   1139     return array_ops.identity(self._values[replica_id])
   1140 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/util/dispatch.py in wrapper(*args, **kwargs)
    178     """Call target, and fall back on dispatchers if there is a TypeError."""
    179     try:
--> 180       return target(*args, **kwargs)
    181     except (TypeError, ValueError):
    182       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/array_ops.py in identity(input, name)
    265     # variables. Variables have correct handle data when graph building.
    266     input = ops.convert_to_tensor(input)
--> 267   ret = gen_array_ops.identity(input, name=name)
    268   # Propagate handle data for happier shape inference for resource variables.
    269   if hasattr(input, "_handle_data"):

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/gen_array_ops.py in identity(input, name)
   3824         pass  # Add nodes to the TensorFlow graph.
   3825     except _core._NotOkStatusException as e:
-> 3826       _ops.raise_from_not_ok_status(e, name)
   3827   # Add nodes to the TensorFlow graph.
   3828   _, _, _op, _outputs = _op_def_library._apply_op_helper(

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6604   message = e.message + (" name: " + name if name is not None else "")
   6605   # pylint: disable=protected-access
-> 6606   six.raise_from(core._status_to_exception(e.code, message), None)
   6607   # pylint: enable=protected-access
   6608 

/usr/local/lib/python3.5/dist-packages/six.py in raise_from(value, from_value)

NotFoundError: '_MklMatMul' is neither a type of a primitive operation nor a name of a function registered in binary running on n-aa2fcfb7-w-0. One possible root cause is the client and server binaries are not built with the same version. Please make sure the operation or function is registered in the binary running in this process. [Op:Identity]

I posted this on the Huggingface GitHub (https://github.com/huggingface/transformers/issues/2572) and they suggested that the TPU server version may not match the TPU client version, but a) I don't know how to check that, and b) I don't know what to do about it. Suggestions appreciated.
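For the "how to check" part, one possible starting point is to compare the TensorFlow version on the notebook VM against the software version the TPU node was created with. The sketch below is only an illustration, not a confirmed fix for the error above: the helper names `major_minor` and `versions_match` are hypothetical, and it assumes both versions are available as plain strings such as `2.1.0` or `2.1`.

```python
import re


def major_minor(version):
    """Parse the leading 'major.minor' pair from a version string like '2.1.0'."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def versions_match(client_version, tpu_version):
    """Return True when client and TPU agree on major and minor version.

    Hypothetical helper: it only compares version strings and does not
    inspect how either binary was built.
    """
    return major_minor(client_version) == major_minor(tpu_version)


# Example with hard-coded strings. On the notebook VM the client side could
# come from tf.__version__ (after `import tensorflow as tf`); the TPU side is
# assumed to be the software version chosen when the TPU node was created.
if versions_match("2.1.0", "2.1"):
    print("client and TPU versions agree on major.minor")
else:
    print("client and TPU versions differ")
```

A matching pair of version strings does not by itself rule out a build difference between the two binaries, so this only covers the version check suggested in the GitHub issue.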

