我有一个 pandas df,其中每一行都是单词列表。该列表有重复的单词。我想删除重复的单词。
我尝试在 for 循环中使用 dict.fromkeys(listname) 来迭代 df 中的每一行。但这将单词分成字母表
filepath = "C:/abc5/Python/Clustering/output2.csv"
df = pd.read_csv(filepath,encoding='windows-1252')
df["newlist"] = df["text_lemmatized"]
for i in range(0,len(df)):
l = df["text_lemmatized"][i]
df["newlist"][i] = list(dict.fromkeys(l))
print(df)
预期结果是==>
['clear', 'pending', 'order', 'pending', 'order'] ['clear', 'pending', 'order']
['pending', 'activation', 'clear', 'pending'] ['pending', 'activation', 'clear']
实际结果是
['clear', 'pending', 'order', 'pending', 'order'] ... [[, ', c, l, e, a, r, ,, , p, n, d, i, g, o, ]]
['pending', 'activation', 'clear', 'pending', ... ... [[, ', p, e, n, d, i, g, ,, , a, c, t, v, o, ...