因为你不需要寻找every关键字,但最长的关键字如果重叠,您可以使用正则表达式findall
方法。
这里的要点是,您需要首先按长度降序对关键字进行排序(因为其中有空格),然后您需要对这些值进行转义,因为它们包含特殊字符,然后您必须修改单词边界以使用明确的单词边界,(?<!\w)
and (?!\w)
(注意\b
是上下文相关的)。
Use
pat = r'(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(Keywords,key=len,reverse=True))))
See an 在线Python测试:
import re
Keywords = ["Caden(S, A)", "Caden(a","Caden(.A))", "Caden.Q", "Caden.K", "Caden"]
rx = r'(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(Keywords,key=len,reverse=True))))
# => (?<!\w)(?:Caden\(S,\ A\)|Caden\(\.A\)\)|Caden\(a|Caden\.Q|Caden\.K|Caden)(?!\w)
strs = ["Caden(S, A) Charlotte.A, Caden.K;", "Emily.P Ethan.B; Caden(a", "Grayson.Q, Lily; Caden(.A))", "Mason, Emily.Q Noah.B; Caden.Q - Riley.P"]
for s in strs:
print(re.findall(rx, s))
Output
['Caden(S, A)', 'Caden.K']
['Caden(a']
['Caden(.A))']
['Caden.Q']