I have a DataFrame in which every row of scalar values is followed by a row whose column values are lists.
| Name | Number | Country | Fruit | Fruit Date | Sport | Sport Date | Color | Color Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jack | 23 | Canada | Banana | 09/01/2022 | Basketball | 09/01/2022 | Blue | 09/01/2022 |
| Jack | 23 | Canada | ['Banana', 'Apple', 'Apple'] | ['09/01/2022', '09/02/2022', '09/02/2022'] | ['Basketball', 'Soccer', 'Hockey'] | ['09/01/2022', '09/02/2022', '09/03/2022'] | ['Blue', 'Blue', 'Red'] | ['09/01/2022', '09/01/2022', '09/02/2022'] |
| John | 24 | USA | Banana | 09/01/2022 | Basketball | 09/01/2022 | Blue | 09/01/2022 |
| John | 24 | USA | ['Banana', 'Apple', 'Apple'] | ['09/01/2022', '09/02/2022', '09/02/2022'] | ['Basketball', 'Soccer', 'Hockey'] | ['09/01/2022', '09/02/2022', '09/03/2022'] | ['Blue', 'Blue', 'Red'] | ['09/01/2022', '09/01/2022', '09/02/2022'] |
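I want to remove the duplicate values from each list in Fruit, Sport, and Color, together with the values at the corresponding indices in their Date columns. Rows without lists must stay unchanged; only the rows that contain lists should be affected by this change. The first row of each pair simply holds the first occurrence from each list.
Sample output: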
| Name | Number | Country | Fruit | Fruit Date | Sport | Sport Date | Color | Color Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jack | 23 | Canada | Banana | 09/01/2022 | Basketball | 09/01/2022 | Blue | 09/01/2022 |
| Jack | 23 | Canada | ['Banana', 'Apple'] | ['09/01/2022', '09/02/2022'] | ['Basketball', 'Soccer', 'Hockey'] | ['09/01/2022', '09/02/2022', '09/03/2022'] | ['Blue', 'Red'] | ['09/01/2022', '09/02/2022'] |
| John | 24 | USA | Banana | 09/01/2022 | Basketball | 09/01/2022 | Blue | 09/01/2022 |
| John | 24 | USA | ['Banana', 'Apple'] | ['09/01/2022', '09/02/2022'] | ['Basketball', 'Soccer', 'Hockey'] | ['09/01/2022', '09/02/2022', '09/03/2022'] | ['Blue', 'Red'] | ['09/01/2022', '09/02/2022'] |
I think a loop is unavoidable here:
    for col in df.columns:
        df[col] = [
            list(dict.fromkeys(v))  # to preserve the order of the values
            if isinstance(v, list) else v for v in df[col]
        ]
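As a small aside on why `dict.fromkeys` is used rather than `set`: the sketch below uses a hypothetical list taken from the Color column and only illustrates the ordering behaviour. It assumes the list elements are hashable, which holds for the strings in the question.

```python
values = ["Blue", "Blue", "Red"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each value stays in its original position.
ordered_unique = list(dict.fromkeys(values))

# A set also removes duplicates, but it makes no promise about order.
unordered_unique = set(values)

print(ordered_unique)  # ['Blue', 'Red']
```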
Alternatively, you can use DataFrame.map (https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.map.html), which was added in pandas 2.1.0 (https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html#new-dataframe-map-method-and-support-for-extensionarrays) and was formerly named applymap (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html):

    df = df.map(lambda v: list(dict.fromkeys(v)) if isinstance(v, list) else v)
Timings:
# loop : 907 µs ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# .map : 630 µs ± 40.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Output:
|  | Name | Number | Country | Fruit | Fruit Date | Sport | Sport Date | Color | Color Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jack | 23 | Canada | Banana | 09/01/2022 | Basketball | 09/01/2022 | Blue | 09/01/2022 |
| 1 | Jack | 23 | Canada | ['Banana', 'Apple'] | ['09/01/2022', '09/02/2022'] | ['Basketball', 'Soccer', 'Hockey'] | ['09/01/2022', '09/02/2022', '09/03/2022'] | ['Blue', 'Red'] | ['09/01/2022', '09/02/2022'] |
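One caveat worth noting: both approaches de-duplicate every list on its own. That reproduces the expected output for the sample data because the repeated dates happen to sit at the same positions as the repeated values. If the question's "corresponding Date index values" requirement has to hold in general, a position-aware variant may be safer. The following is only a sketch under that assumption; `dedupe_paired` is a hypothetical helper name, not part of pandas, and it works on plain Python lists.

```python
from typing import Any


def dedupe_paired(values: Any, dates: Any) -> tuple[Any, Any]:
    """Drop repeated entries of `values` and the same positions from `dates`.

    Non-list inputs are returned unchanged, so scalar rows are not affected.
    """
    if not (isinstance(values, list) and isinstance(dates, list)):
        return values, dates
    seen = set()
    kept_values, kept_dates = [], []
    for value, date in zip(values, dates):
        if value not in seen:  # keep only the first occurrence of each value
            seen.add(value)
            kept_values.append(value)
            kept_dates.append(date)
    return kept_values, kept_dates


# Hypothetical data where de-duplicating each list independently would
# pair 'Apple' with the wrong date:
fruits = ["Banana", "Apple", "Banana"]
fruit_dates = ["09/01/2022", "09/01/2022", "09/02/2022"]
result = dedupe_paired(fruits, fruit_dates)
print(result)  # (['Banana', 'Apple'], ['09/01/2022', '09/01/2022'])
```

With the DataFrame from the question, the helper could be applied to one pair of columns at a time, for example by computing `pairs = [dedupe_paired(v, d) for v, d in zip(df["Fruit"], df["Fruit Date"])]` and assigning `[p[0] for p in pairs]` and `[p[1] for p in pairs]` back to the two columns.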