Load JSON / dict:
- Use pd.json_normalize https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html to expand the dict.
import pandas as pd
data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
df = pd.json_normalize(data)
# display(df)
id data.name data.lastname data.office.num data.office.department
0 3241234 carol netflik 3543 trigy
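json_normalize also accepts a sep argument, which sets the string placed between nested key names ('.' by default), and a max_level argument, which limits how many levels of nesting get flattened. A minimal sketch, reusing the data dict above, that flattens only one level and joins keys with an underscore. The variable name flat is just for illustration:

```python
import pandas as pd

data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

# flatten only the first level of nesting and join key names with '_'
flat = pd.json_normalize(data, sep='_', max_level=1)

# 'office' is deeper than max_level, so it stays a dict inside a single column
print(flat.columns.tolist())
print(flat.loc[0, 'data_office'])
```

Leaving max_level at its default of None flattens every level, as in the output above.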
If the dataframe has a column of dicts
- See also this answer https://stackoverflow.com/questions/63311361 to the SO question Split / Explode a column of dictionaries into separate columns with pandas https://stackoverflow.com/questions/38231591
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1 2 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2 3 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
# normalize the column of dicts
normalized = pd.json_normalize(df['col'])
# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])
# display(df)
col2 id data.name data.lastname data.office.num data.office.department
0 1 3241234 carol netflik 3543 trigy
1 2 3241234 carol netflik 3543 trigy
2 3 3241234 carol netflik 3543 trigy
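DataFrame.join aligns rows on the index. When json_normalize is given a plain list, its result gets a fresh 0..n-1 RangeIndex. The join above lines up only because df also uses the default index; how a Series input's index is handled may depend on the pandas version. A minimal sketch for a dataframe with a non-default index copies df's index onto the normalized frame before joining. The labels 'a', 'b', 'c' are hypothetical:

```python
import pandas as pd

data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

# same column of dicts, but with a non-default (hypothetical) index
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]}, index=['a', 'b', 'c'])

# normalize from a plain list, then reuse df's index so join matches row by row
normalized = pd.json_normalize(df['col'].tolist())
normalized.index = df.index

# drop the original column of dicts and attach the flattened columns
df = df.drop(columns=['col']).join(normalized)
print(df)
```

Without the index assignment, every joined column would be NaN here, because no label in df's index matches 0, 1 or 2.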
If the dataframe has a column of lists with dicts
- The dicts need to be removed from the lists with .explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1 2 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2 3 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
# explode the lists
df = df.explode('col', ignore_index=True)
# remove and normalize the column of dicts
normalized = pd.json_normalize(df.pop('col'))
# join the normalized column to df
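df = df.join(normalized)
# display(df)
col2 id data.name data.lastname data.office.num data.office.department
0 1 3241234 carol netflik 3543 trigy
1 2 3241234 carol netflik 3543 trigy
2 3 3241234 carol netflik 3543 trigy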
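explode gives each dict in a list its own row and repeats col2 for every dict that came from the same list. An empty list, however, becomes a single NaN row, and json_normalize cannot flatten NaN. A minimal sketch, assuming that rows with empty lists can simply be dropped. The small records below are made-up sample data:

```python
import pandas as pd

# lists of different lengths, including an empty one (hypothetical sample data)
df = pd.DataFrame({
    'col2': [1, 2, 3],
    'col': [
        [{'id': 1, 'data': {'name': 'a'}}, {'id': 2, 'data': {'name': 'b'}}],
        [{'id': 3, 'data': {'name': 'c'}}],
        [],
    ],
})

# one row per dict; the empty list turns into a NaN row
df = df.explode('col', ignore_index=True)

# drop the NaN rows and renumber so the index matches json_normalize's 0..n-1
df = df.dropna(subset=['col']).reset_index(drop=True)

# remove and normalize the column of dicts, then join on the shared index
normalized = pd.json_normalize(df.pop('col').tolist())
df = df.join(normalized)
print(df)
```

The reset_index call matters because dropna leaves gaps in the index, while the normalized frame is numbered from 0.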