Setup
您的数据结构不方便。我想重点关注:
- 获取列表
'IDs'
放入字典列表中,这会方便得多。
- 摆脱父字典中无用的键。我们关心的只是价值观。
Your data
:
{1: {'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': {'StoreID': ['123445452543'],
'BookID': ['543533254353'],
'SalesID': ['543267765345']}},
2: {'Name': 'boring Tales of Dragon Slayers',
'IDs': {'StoreID': ['111111', '1121111'],
'BookID': ['543533254353', '4324232342'],
'SalesID': ['543267765345', '4353543']}}}
我希望它看起来像什么:
[{'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': [{'StoreID': '123445452543',
'BookID': '543533254353',
'SalesID': '543267765345'}]},
{'Name': 'boring Tales of Dragon Slayers',
'IDs': [{'StoreID': '111111',
'BookID': '543533254353',
'SalesID': '543267765345'},
{'StoreID': '1121111',
'BookID': '4324232342',
'SalesID': '4353543'}]}]
重组数据
合理的方式
简单的循环,不要搞乱。这得到了我上面展示的内容
new = []
for v in data.values():
temp = {**v} # This is intended to keep all the other data that might be there
ids = temp.pop('IDs') # I have to focus on this to create the records
temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
new.append(temp)
可爱的单线
new = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]} for v in data.values()]
Create DataFrame
with pd.json_normalize
在这次通话中json_normalize
我们需要指定记录的路径,即在'IDs'
key. json_normalize
将为该列表中的每个项目在数据框中创建一行。这将通过record_path
参数,我们传递一个tuple
描述路径(如果它在更深的结构中)或字符串(如果键位于顶层,对我们来说,它是)。
record_path = 'IDs'
然后我们想告诉json_normalize
什么键是记录的元数据。如果像我们一样有多个记录,则每条记录都会重复元数据。
meta = 'Name'
所以最终的解决方案如下所示:
pd.json_normalize(new, record_path='IDs', meta='Name')
StoreID BookID SalesID Name
0 123445452543 543533254353 543267765345 Thrilling Tales of Dragon Slayers
1 111111 543533254353 543267765345 boring Tales of Dragon Slayers
2 1121111 4324232342 4353543 boring Tales of Dragon Slayers
However
如果我们无论如何都要进行重组,不妨将其传递给数据帧构造函数。
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
奖励内容
当我们这样做的时候。关于每个 id 类型是否具有相同数量的 id,数据不明确。假设他们没有。
data = {1:{
'Name': "Thrilling Tales of Dragon Slayers",
'IDs':{
"StoreID": ['123445452543'],
"BookID": ['543533254353'],
"SalesID": ['543267765345']}},
2:{
'Name': "boring Tales of Dragon Slayers",
'IDs':{
"StoreID": ['111111', '1121111'],
"BookID": ['543533254353', '4324232342'],
"SalesID": ['543267765345', '4353543', 'extra id']}}}
然后我们可以使用zip_longest
from itertools
from itertools import zip_longest
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip_longest(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
3 boring Tales of Dragon Slayers None None extra id