[PyTorch] Language Model

2023-11-16

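1 Model description

(1) Definition of a language model (from Wikipedia)

A statistical language model is a probability distribution over sequences of words. It supplies the context needed to tell apart words and phrases that sound alike. For example, the Chinese phrases “再给我两份葱，让我把记忆煎成饼” ("give me two more portions of scallions and let me fry my memories into a pancake") and “再给我两分钟，让我把记忆结成冰” ("give me two more minutes and let me freeze my memories into ice") sound similar but mean different things.
Language models are used in many natural language processing applications, such as speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, and information retrieval. Because words and sentences can be combined to arbitrary length, a trained language model will encounter strings that never appeared in its training data (the data sparsity problem), which makes it hard to estimate the probability of a string from a corpus. This is why approximate, smoothed n-gram models are used.
In speech recognition and data compression, such a model tries to capture the properties of a language and to predict the next word in a sequence.
In speech recognition, sounds are matched to word sequences. Ambiguities are easier to resolve when evidence from the language model is combined with a pronunciation model and an acoustic model.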
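
To make the data sparsity point concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The toy corpus, the function names `train_bigram` and `bigram_prob`, and the choice of add-one smoothing are assumptions made for illustration. They are not part of the code in this post, and practical systems usually rely on stronger smoothing methods.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect the vocabulary, context counts and bigram counts."""
    vocab, contexts, bigrams = set(), Counter(), Counter()
    for sentence in sentences:
        words = ['<bos>'] + sentence.split() + ['<eos>']
        vocab.update(words)
        contexts.update(words[:-1])                 # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))  # (previous word, next word) pairs
    return vocab, contexts, bigrams

def bigram_prob(prev, word, vocab, contexts, bigrams):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

toy_corpus = ['the cat sat', 'the dog sat']  # hypothetical toy data
vocab, contexts, bigrams = train_bigram(toy_corpus)

seen = bigram_prob('the', 'cat', vocab, contexts, bigrams)
unseen = bigram_prob('cat', 'dog', vocab, contexts, bigrams)  # never observed, still > 0
print(seen, unseen)
```

Without the `+ 1` in the numerator, the unseen pair would get probability zero, and any sentence containing it would also get probability zero.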

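(2) Dataset

This example uses the text of the Penn Treebank (PTB) dataset.
Put simply, a language model computes the probability of a sentence, that is, how likely a given string of words is to be natural human language. The higher the probability the model assigns to real sentences, the better the model and the lower its perplexity (from the article 深入浅出讲解语言模型, "Language Models Explained in Simple Terms"). A well-trained model therefore generates text that resembles natural language.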
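
The relation between sentence probability and perplexity can be sketched as follows. The helper name `perplexity` and the per-token probabilities are made-up values for illustration, not numbers produced by the model in this post.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities a model assigns to each token of two sentences
fluent = [0.2, 0.1, 0.25, 0.5]
garbled = [0.01, 0.002, 0.05, 0.01]
print(perplexity(fluent), perplexity(garbled))

# A model that spreads probability uniformly over V words has perplexity V
uniform = [1 / 10000] * 5
print(perplexity(uniform))
```

The sentence with the higher token probabilities gets the lower perplexity. The uniform case also gives a rough baseline: a model that guesses evenly among V words scores a perplexity of V.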

2 Code

# language model
# Some part of the code was referenced from below.
# https://github.com/pytorch/examples/tree/master/word_language_model 
import torch
import torch.nn as nn
import numpy as np
from torch.nn.utils import clip_grad_norm_  # gradient clipping, used in the training loop


class Dictionary(object):
    def __init__(self):  # two-way mapping: word <-> index
        self.word2idx = {}
        self.idx2word = {}
        self.idx = 0
    
    def add_word(self, word):
        if word not in self.word2idx:
            self.word2idx[word] = self.idx
            self.idx2word[self.idx] = word
            self.idx += 1
    
    def __len__(self):
        return len(self.word2idx)


class Corpus(object):
    def __init__(self):
        self.dictionary = Dictionary()

    def get_data(self, path, batch_size=20):
        # Add words to the dictionary
        with open(path, 'r') as f:
            tokens = 0
            for line in f:
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words: 
                    self.dictionary.add_word(word)  
        
        # Second pass: encode every token in the file as its integer id
        ids = torch.LongTensor(tokens)
        token = 0
        with open(path, 'r') as f:
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1
        num_batches = ids.size(0) // batch_size
        ids = ids[:num_batches*batch_size]
        return ids.view(batch_size, -1)


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
embed_size = 128
hidden_size = 1024
num_layers = 1
num_epochs = 5
num_samples = 1000     # number of words to be sampled
batch_size = 20
seq_length = 30
learning_rate = 0.002

# Load "Penn Treebank" dataset
corpus = Corpus()
ids = corpus.get_data('data/train.txt', batch_size)
vocab_size = len(corpus.dictionary)
num_batches = ids.size(1) // seq_length


# RNN based language model
class RNNLM(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(RNNLM, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)  # word id -> dense vector
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)  # hidden state -> vocabulary logits
        
    def forward(self, x, h):
        # Embed word ids to vectors
        x = self.embed(x)
        
        # Forward propagate LSTM
        out, (h, c) = self.lstm(x, h)
        
        # Reshape output to (batch_size*sequence_length, hidden_size)
        out = out.reshape(out.size(0)*out.size(1), out.size(2))
        
        # Decode hidden states of all time steps
        out = self.linear(out)
        return out, (h, c)

model = RNNLM(vocab_size, embed_size, hidden_size, num_layers).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Truncated backpropagation through time: cut the graph between mini-batches
def detach(states):
    return tuple(state.detach() for state in states)

# Train the model
for epoch in range(num_epochs):
    # Set initial hidden and cell states
    states = (torch.zeros(num_layers, batch_size, hidden_size).to(device),
              torch.zeros(num_layers, batch_size, hidden_size).to(device))
    
    for i in range(0, ids.size(1) - seq_length, seq_length):
        # Get mini-batch inputs and targets
        inputs = ids[:, i:i+seq_length].to(device)
        targets = ids[:, (i+1):(i+1)+seq_length].to(device)
        
        # Forward pass
        states = detach(states)
        outputs, states = model(inputs, states)
        loss = criterion(outputs, targets.reshape(-1))
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        clip_grad_norm_(model.parameters(), 0.5)
        optimizer.step()

        step = (i+1) // seq_length
        if step % 100 == 0:
            print('Epoch [{}/{}], Step[{}/{}], Loss: {:.4f}, Perplexity: {:5.2f}'
                  .format(epoch+1, num_epochs, step, num_batches, loss.item(), np.exp(loss.item())))

# Test the model
with torch.no_grad():
    with open('sample.txt', 'w') as f:
        # Set initial hidden and cell states
        state = (torch.zeros(num_layers, 1, hidden_size).to(device),
                 torch.zeros(num_layers, 1, hidden_size).to(device))

        # Select one word id randomly
        prob = torch.ones(vocab_size)
        input = torch.multinomial(prob, num_samples=1).unsqueeze(1).to(device)

        for i in range(num_samples):
            # Forward propagate RNN 
            output, state = model(input, state)

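            # Sample a word id; exp(logits) gives unnormalized softmax weights,
            # which torch.multinomial accepts without normalization
            prob = output.exp()
            word_id = torch.multinomial(prob, num_samples=1).item()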

            # Fill input with sampled word id for the next time step
            input.fill_(word_id)

            # File write
            word = corpus.dictionary.idx2word[word_id]
            word = '\n' if word == '<eos>' else word + ' '
            f.write(word)

            if (i+1) % 100 == 0:
                print('Sampled [{}/{}] words and save to {}'.format(i+1, num_samples, 'sample.txt'))

# Save the model checkpoints
torch.save(model.state_dict(), 'model.ckpt')
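
The sampling step in the test loop draws the next word in proportion to `exp(logit)`. The standard-library sketch below mimics that idea without PyTorch. The function `sample_from_logits`, the toy vocabulary, and the logit values are assumptions for illustration only. `random.choices` accepts unnormalized weights, in the same spirit as `torch.multinomial`.

```python
import math
import random

def sample_from_logits(logits, rng):
    """Draw an index with probability proportional to exp(logit)."""
    m = max(logits)                              # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]  # unnormalized softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

toy_vocab = ['the', 'market', 'fell', '<eos>']   # hypothetical vocabulary
toy_logits = [2.0, 0.5, 0.1, -1.0]               # hypothetical model outputs

rng = random.Random(0)
counts = [0] * len(toy_vocab)
for _ in range(10000):
    counts[sample_from_logits(toy_logits, rng)] += 1
print(dict(zip(toy_vocab, counts)))
```

Sampling, rather than always taking the highest-scoring word, is what gives the generated text its variety. Picking the arg-max at every step would tend to repeat the same frequent words.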

3 Program output

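The program prints the output below. As training proceeds (it runs fairly slowly), the training loss (cross-entropy) and the perplexity (e raised to the power of the loss) trend downward, and the text the model generates gets closer to natural language.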

Epoch [1/5], Step[0/1549], Loss: 9.2150, Perplexity: 10046.61
Epoch [1/5], Step[100/1549], Loss: 6.0423, Perplexity: 420.85
Epoch [1/5], Step[200/1549], Loss: 5.9387, Perplexity: 379.44
Epoch [1/5], Step[300/1549], Loss: 5.7512, Perplexity: 314.56
Epoch [1/5], Step[400/1549], Loss: 5.6709, Perplexity: 290.30
Epoch [1/5], Step[500/1549], Loss: 5.1621, Perplexity: 174.54
Epoch [1/5], Step[600/1549], Loss: 5.1755, Perplexity: 176.89
Epoch [1/5], Step[700/1549], Loss: 5.3721, Perplexity: 215.32
Epoch [1/5], Step[800/1549], Loss: 5.1827, Perplexity: 178.17
Epoch [1/5], Step[900/1549], Loss: 5.0756, Perplexity: 160.06
Epoch [1/5], Step[1000/1549], Loss: 5.1428, Perplexity: 171.19
Epoch [1/5], Step[1100/1549], Loss: 5.3263, Perplexity: 205.67
Epoch [1/5], Step[1200/1549], Loss: 5.1895, Perplexity: 179.39
Epoch [1/5], Step[1300/1549], Loss: 5.0724, Perplexity: 159.56
Epoch [1/5], Step[1400/1549], Loss: 4.8528, Perplexity: 128.10
Epoch [1/5], Step[1500/1549], Loss: 5.1661, Perplexity: 175.22
Epoch [2/5], Step[0/1549], Loss: 5.4163, Perplexity: 225.05
Epoch [2/5], Step[100/1549], Loss: 4.5526, Perplexity: 94.88
Epoch [2/5], Step[200/1549], Loss: 4.6929, Perplexity: 109.17
Epoch [2/5], Step[300/1549], Loss: 4.6444, Perplexity: 104.00
Epoch [2/5], Step[400/1549], Loss: 4.5688, Perplexity: 96.42
Epoch [2/5], Step[500/1549], Loss: 4.1592, Perplexity: 64.02
Epoch [2/5], Step[600/1549], Loss: 4.4269, Perplexity: 83.67
Epoch [2/5], Step[700/1549], Loss: 4.3720, Perplexity: 79.20
Epoch [2/5], Step[800/1549], Loss: 4.4036, Perplexity: 81.74
Epoch [2/5], Step[900/1549], Loss: 4.1653, Perplexity: 64.41
Epoch [2/5], Step[1000/1549], Loss: 4.3449, Perplexity: 77.08
Epoch [2/5], Step[1100/1549], Loss: 4.4840, Perplexity: 88.59
Epoch [2/5], Step[1200/1549], Loss: 4.4659, Perplexity: 87.00
Epoch [2/5], Step[1300/1549], Loss: 4.1735, Perplexity: 64.94
Epoch [2/5], Step[1400/1549], Loss: 3.9952, Perplexity: 54.34
Epoch [2/5], Step[1500/1549], Loss: 4.2860, Perplexity: 72.67
Epoch [3/5], Step[0/1549], Loss: 4.4764, Perplexity: 87.91
Epoch [3/5], Step[100/1549], Loss: 3.8185, Perplexity: 45.54
Epoch [3/5], Step[200/1549], Loss: 4.0630, Perplexity: 58.15
Epoch [3/5], Step[300/1549], Loss: 3.8839, Perplexity: 48.62
Epoch [3/5], Step[400/1549], Loss: 3.9263, Perplexity: 50.72
Epoch [3/5], Step[500/1549], Loss: 3.4153, Perplexity: 30.43
Epoch [3/5], Step[600/1549], Loss: 3.8813, Perplexity: 48.49
Epoch [3/5], Step[700/1549], Loss: 3.7443, Perplexity: 42.28
Epoch [3/5], Step[800/1549], Loss: 3.7594, Perplexity: 42.92
Epoch [3/5], Step[900/1549], Loss: 3.4794, Perplexity: 32.44
Epoch [3/5], Step[1000/1549], Loss: 3.6235, Perplexity: 37.47
Epoch [3/5], Step[1100/1549], Loss: 3.7085, Perplexity: 40.79
Epoch [3/5], Step[1200/1549], Loss: 3.8110, Perplexity: 45.20
Epoch [3/5], Step[1300/1549], Loss: 3.4499, Perplexity: 31.50
Epoch [3/5], Step[1400/1549], Loss: 3.2214, Perplexity: 25.06
Epoch [3/5], Step[1500/1549], Loss: 3.5429, Perplexity: 34.57
Epoch [4/5], Step[0/1549], Loss: 3.6315, Perplexity: 37.77
Epoch [4/5], Step[100/1549], Loss: 3.2487, Perplexity: 25.76
Epoch [4/5], Step[200/1549], Loss: 3.5140, Perplexity: 33.58
Epoch [4/5], Step[300/1549], Loss: 3.3193, Perplexity: 27.64
Epoch [4/5], Step[400/1549], Loss: 3.4360, Perplexity: 31.06
Epoch [4/5], Step[500/1549], Loss: 2.9549, Perplexity: 19.20
Epoch [4/5], Step[600/1549], Loss: 3.3490, Perplexity: 28.48
Epoch [4/5], Step[700/1549], Loss: 3.3122, Perplexity: 27.45
Epoch [4/5], Step[800/1549], Loss: 3.2668, Perplexity: 26.23
Epoch [4/5], Step[900/1549], Loss: 2.9631, Perplexity: 19.36
Epoch [4/5], Step[1000/1549], Loss: 3.1250, Perplexity: 22.76
Epoch [4/5], Step[1100/1549], Loss: 3.2380, Perplexity: 25.48
Epoch [4/5], Step[1200/1549], Loss: 3.2806, Perplexity: 26.59
Epoch [4/5], Step[1300/1549], Loss: 2.9988, Perplexity: 20.06
Epoch [4/5], Step[1400/1549], Loss: 2.7011, Perplexity: 14.90
Epoch [4/5], Step[1500/1549], Loss: 3.1112, Perplexity: 22.45
Epoch [5/5], Step[0/1549], Loss: 3.0950, Perplexity: 22.09
Epoch [5/5], Step[100/1549], Loss: 2.8688, Perplexity: 17.62
Epoch [5/5], Step[200/1549], Loss: 3.1285, Perplexity: 22.84
Epoch [5/5], Step[300/1549], Loss: 2.9598, Perplexity: 19.29
Epoch [5/5], Step[400/1549], Loss: 3.1288, Perplexity: 22.85
Epoch [5/5], Step[500/1549], Loss: 2.6090, Perplexity: 13.58
Epoch [5/5], Step[600/1549], Loss: 3.0915, Perplexity: 22.01
Epoch [5/5], Step[700/1549], Loss: 2.9536, Perplexity: 19.18
Epoch [5/5], Step[800/1549], Loss: 2.9605, Perplexity: 19.31
Epoch [5/5], Step[900/1549], Loss: 2.6687, Perplexity: 14.42
Epoch [5/5], Step[1000/1549], Loss: 2.8161, Perplexity: 16.71
Epoch [5/5], Step[1100/1549], Loss: 2.9194, Perplexity: 18.53
Epoch [5/5], Step[1200/1549], Loss: 3.0538, Perplexity: 21.20
Epoch [5/5], Step[1300/1549], Loss: 2.6999, Perplexity: 14.88
Epoch [5/5], Step[1400/1549], Loss: 2.4688, Perplexity: 11.81
Epoch [5/5], Step[1500/1549], Loss: 2.7906, Perplexity: 16.29
Sampled [100/1000] words and save to sample.txt
Sampled [200/1000] words and save to sample.txt
Sampled [300/1000] words and save to sample.txt
Sampled [400/1000] words and save to sample.txt
Sampled [500/1000] words and save to sample.txt
Sampled [600/1000] words and save to sample.txt
Sampled [700/1000] words and save to sample.txt
Sampled [800/1000] words and save to sample.txt
Sampled [900/1000] words and save to sample.txt
Sampled [1000/1000] words and save to sample.txt

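An excerpt from sample.txt is shown below. The wording is mostly plausible phrase by phrase, but the text lacks logical coherence. A larger corpus and more training would likely improve it.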

N repeal according to takeover experts 
for the british and canada are insolvent interest to the cost of how the transactions will seek additional information on its u.k. business 
the banks badly previously concluded that americans should slash the impact of income returns by the agency but they warn that it should be known for many small <unk> 
if the central park world will be made only a german economy and the state industries which has <unk> to foreigners how the u.s. can then by how to pay political 's healthy benefit push for such tasks as thrifts often 
while the baltimore for managers are among the <unk> concerned the average families of the cause in the area will translate to the higher costs for beneficiaries days 
he favors a resignation of selling such matters openly that <unk> doubled made all penalties 
there 's no answers at that time the proper has missed it the dead force is <unk> out of control 
we call it and easy for rights to pay attention 
i think we owe japanese investment and obviously not even agree 
until then mr. bryant is a much higher degree and more important for the country
