Invalid argument: indices[0,0] = -4 is not in [0, 40405)



comment_texts = df.comment_text.values

tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(comment_texts)  # assumed step: build word_index before texts_to_sequences

comment_seq = tokenizer.texts_to_sequences(comment_texts)
maxtrainlen = max_length(comment_seq)
comment_train = pad_sequences(comment_seq, maxlen=maxtrainlen, padding='post')
vocab_size = len(tokenizer.word_index) + 1

df.comment_text = comment_train

x = df.drop('label', axis=1) # the thing I'm training

labels = df['label'].values  # Also known as Y

x_train, x_test, y_train, y_test = train_test_split(
    x, labels, test_size=0.2, random_state=1337)        

n_cols = x_train.shape[1]

embedding_dim = 100  # TODO: why?

model = Sequential([
            Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_shape=(n_cols,)),
            LSTM(32),  # listed in the model summary below
            Dense(32, activation='relu'),
            Dense(512, activation='relu'),
            Dense(12, activation='softmax'),  # for an unknown type, we don't account for that while training
])


# convert the y_train to a one hot encoded variable
encoder = LabelEncoder()
encoder.fit(labels)  # fit on all the labels
encoded_Y = encoder.transform(y_train)  # encode on y_train
one_hot_y = np_utils.to_categorical(encoded_Y)

model.fit(x_train, one_hot_y, epochs=10, batch_size=16)


Model: "sequential"
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 12, 100)           4040500   
lstm (LSTM)                  (None, 32)                17024     
dense (Dense)                (None, 32)                1056      
dense_1 (Dense)              (None, 512)               16896     
dense_2 (Dense)              (None, 12)                6156      
Total params: 4,081,632
Trainable params: 4,081,632
Non-trainable params: 0
Train on 4702 samples
Epoch 1/10
2020-03-04 22:37:59.499238: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[0,0] = -4 is not in [0, 40405)

I think this must be coming from my comment_text column, since that is the only thing I added.

Here is what comment_text looks like before I make the substitution: [image: before]

And here is after: [image: after]

My full code (before making the change) is here: https://colab.research.google.com/drive/1y8Lhxa_DROZg0at3VR98fi5WCcunUhyc#scrollTo=hpEoqR4ne9TO


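The embedding_dim=100 can be chosen freely. It plays the same role as the number of units in a hidden layer. You can tune this parameter to find the value that works best for your model, just as you can tune the number of units in the hidden layers.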
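To make the role of embedding_dim concrete, here is a minimal pure-Python sketch of what an embedding layer does conceptually: it is a lookup table with vocab_size rows and embedding_dim columns, and every input value is used as a row index. The helper embedding_lookup and the toy table are hypothetical illustrations, not Keras code. The sketch also shows why an index such as -4 is rejected. Since Tokenizer and pad_sequences only produce indices of 0 or higher, a negative index suggests that some other column of x is reaching the Embedding layer.

```python
vocab_size = 40405     # from the error message: valid indices are [0, 40405)
embedding_dim = 100    # free hyperparameter, as in the question

# One row per vocabulary entry, embedding_dim values per row.
# This matches the Param # reported for the embedding layer in the summary.
n_params = vocab_size * embedding_dim


def embedding_lookup(table, indices):
    """Return one row of `table` per index, rejecting out-of-range indices.

    Hypothetical helper: Python lists silently accept negative indices,
    so the range check is explicit to mirror what the real layer enforces.
    """
    vocab = len(table)
    rows = []
    for i in indices:
        if not 0 <= i < vocab:
            raise ValueError(f"index {i} is not in [0, {vocab})")
        rows.append(table[i])
    return rows


# Toy table: 5 "words", 3 values per word (so embedding_dim would be 3 here).
toy_table = [[float(w * 10 + d) for d in range(3)] for w in range(5)]

# Valid token ids: each id becomes a vector of length 3.
vectors = embedding_lookup(toy_table, [1, 4, 0])

# A negative value, such as one coming from a numeric feature column, fails.
try:
    embedding_lookup(toy_table, [-4])
except ValueError as err:
    message = str(err)
```

A larger embedding_dim gives each token a longer vector and the layer more parameters. It does not change which indices are valid; that range is set only by vocab_size.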


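Rather than passing every column through the Embedding layer, the model should have two inputs:

  • One input for the comments, passed through the embedding and the text-processing layers.
  • Another input for the rest of the data, probably passed through a standard network.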
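The two-input layout described above can be sketched with the Keras functional API roughly as follows. This is a sketch under stated assumptions, not the asker's actual model: max_len, n_other, the layer names, the optimizer, and the loss are all hypothetical choices, and the random arrays only stand in for the real data.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Assumed sizes, for illustration only.
vocab_size = 40405   # taken from the error message
max_len = 20         # hypothetical padded comment length
n_other = 11         # hypothetical number of remaining numeric columns

# Input 1: padded token ids for the comment text (non-negative integers).
comment_in = layers.Input(shape=(max_len,), dtype="int32", name="comment_tokens")
# Input 2: every other feature column (may contain negative values).
other_in = layers.Input(shape=(n_other,), name="other_features")

# Text branch: only the token ids go through the Embedding layer.
text = layers.Embedding(input_dim=vocab_size, output_dim=100)(comment_in)
text = layers.LSTM(32)(text)

# Numeric branch: a plain dense layer, so negative values are fine here.
other = layers.Dense(32, activation="relu")(other_in)

# Merge both branches and classify into the 12 classes from the question.
merged = layers.concatenate([text, other])
merged = layers.Dense(512, activation="relu")(merged)
output = layers.Dense(12, activation="softmax")(merged)

model = Model(inputs=[comment_in, other_in], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Random stand-in data, only to show the expected input shapes.
rng = np.random.default_rng(0)
fake_tokens = rng.integers(0, vocab_size, size=(4, max_len))
fake_other = rng.normal(size=(4, n_other)).astype("float32")
probs = model.predict([fake_tokens, fake_other], verbose=0)
```

With a model like this, training would pass the two arrays as a list, for example model.fit([comment_train, other_features], one_hot_y, epochs=10, batch_size=16), where other_features is a hypothetical array holding the non-text columns for the same rows as comment_train.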


This link has a good tutorial on functional API models, and it shows a model with two text inputs and one extra input: https://www.tensorflow.org/guide/keras/functional


