RuntimeError: CUDA error: device-side assert triggered when training LayoutLMv3

2024-04-03

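I'm training the latest version of the LayoutLMv3 model, but as soon as training starts, trainer.train() fails with the error below. Please help me fix it. I'm using 4 V100 GPUs: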

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_3844/4032920361.py in <module>
----> 1 trainer.train()

/data/anaconda3/envs/data/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1417             resume_from_checkpoint=resume_from_checkpoint,
   1418             trial=trial,
-> 1419             ignore_keys_for_eval=ignore_keys_for_eval,
   1420         )
   1421 

/data/anaconda3/envs/data/lib/python3.7/site-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1655                         tr_loss_step = self.training_step(model, inputs)
   1656                 else:
-> 1657                     tr_loss_step = self.training_step(model, inputs)
   1658 
   1659                 if (

/data/anaconda3/envs/data/lib/python3.7/site-packages/transformers/trainer.py in training_step(self, model, inputs)
   2348 
   2349         with self.compute_loss_context_manager():
-> 2350             loss = self.compute_loss(model, inputs)
   2351 
   2352         if self.args.n_gpu > 1:
...
    visual_bbox = visual_bbox.to(device).type(dtype)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
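The post does not show the input data, so the cause is not confirmed. Because CUDA reports kernel errors asynchronously, the visual_bbox line in the traceback is not necessarily where the bad value was used. Two common sources of this assert when fine-tuning LayoutLMv3 are worth ruling out (an assumption, not something the traceback proves): bbox coordinates outside the 0-1000 range the layout embeddings expect, and label ids outside [0, num_labels). The sketch below sets CUDA_LAUNCH_BLOCKING=1, as the error message suggests, and defines a hypothetical helper, find_bad_inputs, that checks one example given as plain Python lists. The helper name, the MAX_COORD = 1000 bound and the -100 ignore index are assumptions based on Hugging Face conventions, not code from the post.

```python
import os

# Make CUDA kernels run synchronously so the traceback points at the op that
# actually failed. This must happen before torch initializes CUDA (ideally
# before `import torch`); in a notebook, restart the kernel first.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Assumption: LayoutLMv3 expects bbox coordinates normalized to 0-1000.
MAX_COORD = 1000
# Assumption: label value that the loss skips (Hugging Face convention).
IGNORE_INDEX = -100


def find_bad_inputs(bboxes, labels, num_labels, max_coord=MAX_COORD):
    """Return human-readable problems found in one tokenized example.

    bboxes: list of [x0, y0, x1, y1] boxes, one per token (hypothetical layout)
    labels: list of integer label ids, one per token
    """
    problems = []
    for i, box in enumerate(bboxes):
        if len(box) != 4:
            problems.append(f"token {i}: bbox has {len(box)} values, expected 4")
            continue
        # Any coordinate outside the range can index past the embedding table.
        if any(c < 0 or c > max_coord for c in box):
            problems.append(f"token {i}: bbox {box} outside [0, {max_coord}]")
    for i, label in enumerate(labels):
        # Labels must be valid class indices unless they are the ignore value.
        if label != IGNORE_INDEX and not 0 <= label < num_labels:
            problems.append(f"token {i}: label {label} not in [0, {num_labels})")
    return problems
```

One way to use it is to loop over the training dataset and print any non-empty result before calling trainer.train(). Another option is to run a single batch on CPU, where an out-of-range embedding index usually raises a readable IndexError instead of a device-side assert.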
