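I need to write a strict regular expression to replace certain values in my pandas DataFrame. This is a follow-up to a question I posted here.
The problem is that .replace(idsToReplace, regex=True) is not strict. So, if idsToReplace is: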
NY : New York
NYC : New York City
and the comment in which we are replacing the ID is:
My cat from NYC is large.
then the resulting response is:
My cat from New York is large.
Is there a Pythonic way, within the pandas replace function, to make the regular expression stricter so that it matches NYC and not NY?
Add \b for word boundaries to each key of the dict:
import pandas as pd

d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}
data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']
        }
# Wrap each key in \b...\b so that it matches only as a whole word
d = {r'\b' + k + r'\b': v for k, v in d.items()}
df = pd.DataFrame(data)
# regex=True makes replace() treat the dict keys as regular expressions
df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree
1      plant  NY The cat from the UK is small   dog
2     object        The rock was found in LA.  rock

                                         commentTest
0                 The New York City tree is very big
1  New York The cat from the United Kingdom is small
2                 The rock was found in Los Angeles.
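To see why the word boundaries help, here is a minimal sketch that uses only the standard re module on the example from the question. The helper name add_word_boundaries is hypothetical, and the re.escape call is an addition that is not in the answer above; it is there on the assumption that some keys might contain regex metacharacters. Note that \b only behaves as expected when each key starts and ends with a word character (a letter, digit, or underscore).

```python
import re

def add_word_boundaries(mapping):
    # Hypothetical helper: wrap each key in \b...\b so it matches only as a
    # whole word. re.escape is an extra precaution (not in the answer above)
    # so that metacharacters inside a key are treated literally.
    return {r'\b' + re.escape(key) + r'\b': value for key, value in mapping.items()}

ids_to_replace = {'NY': 'New York', 'NYC': 'New York City'}
strict = add_word_boundaries(ids_to_replace)

comment = 'My cat from NYC is large.'

# Unanchored: the pattern 'NY' is found inside 'NYC'.
print(re.search('NY', comment) is not None)       # True
# Anchored: 'NY' is followed by 'C', so there is no word boundary and no match.
print(re.search(r'\bNY\b', comment) is not None)  # False

# Apply each strict pattern in turn; only the whole word 'NYC' is replaced.
result = comment
for pattern, replacement in strict.items():
    result = re.sub(pattern, replacement, result)
print(result)  # My cat from New York City is large.
```

The dict returned by the helper has the same shape as the one built in the answer, so it should be usable in the same way with df['Comment'].replace(strict, regex=True).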