BERT paper
https://www.cnblogs.com/anai/p/11645953.html
![figure](https://img-blog.csdnimg.cn/20200612104657544.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2t5bGUxMzE0NjA4,size_16,color_FFFFFF,t_70)
From Language Model to Seq2Seq: With the Transformer, It All Comes Down to the Mask
https://zhuanlan.zhihu.com/p/69106080
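The article above makes the point that one and the same Transformer body can act as a bidirectional encoder, a left-to-right language model, or a seq2seq model purely by swapping the attention mask. Below is a minimal NumPy sketch of the two mask shapes involved; it is my own illustration of the UNILM-style idea, not code taken from the linked post:

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular mask: position i attends to positions <= i,
    # which turns the Transformer into a left-to-right language model.
    return np.tril(np.ones((n, n), dtype=bool))

def seq2seq_mask(src_len, tgt_len):
    # UNILM-style mask over the concatenated [source; target] sequence:
    # source tokens attend bidirectionally within the source, target
    # tokens attend to the whole source plus their own left context.
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :src_len] = True
    mask[src_len:, src_len:] = causal_mask(tgt_len)
    return mask

print(seq2seq_mask(2, 3).astype(int))
```

Printing `seq2seq_mask(2, 3)` shows the pattern: the 2 source positions see only each other, while the 3 target positions see the full source plus their own left context.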
Deep Learning → NLP → Improving Language Understanding by Generative Pre-Training
https://zhuanlan.zhihu.com/p/44121378
https://zhuanlan.zhihu.com/p/32544778
https://blog.csdn.net/qq_33876194/article/details/98943383
https://zhuanlan.zhihu.com/p/93061413
Implementation of the Mask Mechanism in Transformer Source Code
GPT Explained (Paper + TensorFlow Implementation)
BERT Source Code Analysis (Part III)
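In real Transformer/BERT source code, the mask is applied by adding a large negative number to the attention logits before the softmax (BERT's TensorFlow `modeling.py`, for instance, adds `(1 - mask) * -10000` to the scores). A rough self-contained sketch of that mechanism, assuming single-head 2-D inputs and my own function names:

```python
import numpy as np

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention; masked-out positions receive a large
    # negative logit so their post-softmax weight is effectively zero.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)  # BERT adds (1 - mask) * -10000 instead
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Tiny demo: causal attention over 4 random positions.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
out = masked_attention(q, k, v, np.tril(np.ones((4, 4), dtype=bool)))
```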
BERT Series (3): Source Code Walkthrough of Pre-training
https://www.jianshu.com/p/22e462f01d8c
https://www.jianshu.com/p/ff43575ab2b0
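The pre-training walkthroughs above center on how BERT corrupts its input for the masked-LM task. As a reminder of the procedure from the paper, here is a simplified sketch (the real `create_pretraining_data.py` additionally caps the number of predictions per sequence; the helper name here is my own):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    # BERT's masked-LM corruption, per the paper: choose ~15% of
    # positions; of those, 80% -> [MASK], 10% -> a random token,
    # 10% -> left unchanged. The label is always the original token.
    output, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]") or random.random() >= mask_prob:
            continue
        labels[i] = tok
        r = random.random()
        if r < 0.8:
            output[i] = "[MASK]"
        elif r < 0.9:
            output[i] = random.choice(vocab)
        # else: keep the original token as-is
    return output, labels

corrupted, labels = mask_tokens(
    ["[CLS]", "the", "cat", "sat", "[SEP]"],
    vocab=["the", "cat", "sat", "dog", "ran"],
)
```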
![figure](https://img-blog.csdnimg.cn/2020051000145592.jpg)