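I am trying to find out whether certain patterns appear in a data frame.
Suppose I have the following "dictionary of patterns" (note "james" vs. "jamesj"):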
patterns <- c("john", "jack", "james", "jamesj", "jason")
My actual data frame ("data_frame") looks like this:
id names
1 1 johnjack jameS
2 2 john/james, jasonjames
3 3 peter_jackjason
4 4 jamesjasonj jack
5 5 jamesjjason, johnjasonjohn , jason-jack sam _ peter
The final result I am trying to produce should look like this:
id names
1 1 john, jack, james
2 2 john, james, jason, james
3 3 peter, jack, jason
4 4 jamesj, asonj, jack
5 5 jamesj, jason, john, jason, john , jason, jack, sam , peter
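To make the intended rule concrete, here is a minimal base R sketch of one possible reading of the expected output: match case-insensitively, prefer the longest dictionary entry (so "jamesj" wins over "james"), and keep leftover fragments such as "asonj" or "peter" as their own tokens. The helper name split_names is hypothetical, and the sketch assumes that every non-letter character is a separator, so it normalizes the stray spaces before commas that appear in row 5 above.

```r
patterns <- c("john", "jack", "james", "jamesj", "jason")

data_frame <- data.frame(
  id = 1:5,
  names = c("johnjack jameS",
            "john/james, jasonjames",
            "peter_jackjason",
            "jamesjasonj jack",
            "jamesjjason, johnjasonjohn , jason-jack sam _ peter"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: surround every dictionary match with spaces,
# then split on anything that is not a lowercase letter.
split_names <- function(x, patterns) {
  # Longest patterns first, so "jamesj" is tried before "james".
  ordered <- patterns[order(nchar(patterns), decreasing = TRUE)]
  regex <- paste0("(", paste(ordered, collapse = "|"), ")")
  # perl = TRUE makes the alternation try the entries in the given order.
  spaced <- gsub(regex, " \\1 ", tolower(x), perl = TRUE)
  tokens <- strsplit(spaced, "[^a-z]+")
  # Drop empty tokens and join the rest with ", ".
  vapply(tokens,
         function(tk) paste(tk[nzchar(tk)], collapse = ", "),
         character(1))
}

data_frame$parsed_names <- split_names(data_frame$names, patterns)
data_frame$parsed_names[4]  # "jamesj, asonj, jack"
```

Sorting the dictionary by length is a design choice of this sketch; a different tie-breaking rule would change which entry wins when patterns overlap.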
I looked at this post (R: Insert comma after each element from the output https://stackoverflow.com/questions/56103591/r-insert-comma-after-each-element-from-the-output) and tried the answer provided there:
> data_frame$parsed_names = dput(data_frame$names)
id names parsed_names
1 1 john, jack, james john, jack, james
2 2 john, james, jason, james john, james, jason, james
3 3 peter, jack, jason peter, jack, jason
4 4 jamesj, asonj, jack jamesj, asonj, jack
5 5 jamesj, jason, john, jason, john , jason, jack, sam , peter jamesj, jason, john, jason, john , jason, jack, sam , peter
But this is not what I want.
Then I looked at this post (Insert commas in text string after certain words in r https://stackoverflow.com/questions/54078683/insert-commas-in-text-string-after-certain-words-in-r) and tried the answer provided there:
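A likely reason the new column is just a copy of the old one: dput() only writes a text representation of its argument to the console and then returns that argument invisibly, so assigning its result changes nothing. A small sketch with two of the raw strings from above:

```r
x <- c("johnjack jameS", "peter_jackjason")

# dput() prints c("johnjack jameS", "peter_jackjason") to the console ...
y <- dput(x)

# ... and returns its input unchanged.
identical(x, y)  # TRUE
```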
library(gsubfn)
data_frame$parsed_names = gsubfn("\\w+", as.list(setNames(paste0(patterns, ","), patterns)),
format(data_frame$names))
data_frame
id names parsed_names
1 1 john, jack, james john,, jack,, james,
2 2 john, james, jason, james john,, james,, jason,, james,
3 3 peter, jack, jason peter, jack,, jason,
4 4 jamesj, asonj, jack jamesj,, asonj, jack,
5 5 jamesj, jason, john, jason, john , jason, jack, sam , peter jamesj,, jason,, john,, jason,, john, , jason,, jack,, sam , peter
Thank you!