I have a set of ugly, complicated strings that I need to split:
vec <- c("'01'", "'01' '02'",
"#bateau", "#bateau #batiment",
"#'autres 32'", "#'autres 32' #'batiment 30'", "#'autres 32' #'batiment 30' #'contenu 31'",
"#'34'", "#'34' #'33' #'35'")
vec
[1] "'01'" "'01' '02'"
[3] "#bateau" "#bateau #batiment"
[5] "#'autres 32'" "#'autres 32' #'batiment 30'"
[7] "#'autres 32' #'batiment 30' #'contenu 31'" "#'34'"
[9] "#'34' #'33' #'35'"
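I need to split each string wherever there is a space, except when that space sits inside a pair of single quotes ('). So in the example above, '01' '02' would become '01' and '02', while #'autres 32' #'batiment 30' would become #'autres 32' and #'batiment 30'.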
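To make the rule concrete, here is a minimal non-regex sketch of the behaviour I am after. It walks each string character by character, toggles a flag at every single quote, and splits only on spaces seen outside quotes. The name split_outside_quotes is a hypothetical helper made up for illustration, and the sketch assumes quotes are always balanced and never nested.

```r
# Hypothetical helper: split one string on spaces that lie outside single quotes.
# Assumes every opening quote has a matching closing quote.
split_outside_quotes <- function(x) {
  chars <- strsplit(x, "")[[1]]   # individual characters of x
  in_quote <- FALSE               # TRUE while between two single quotes
  pieces <- character(0)          # completed pieces
  current <- ""                   # piece being built
  for (ch in chars) {
    if (ch == "'") in_quote <- !in_quote   # entering or leaving a quoted part
    if (ch == " " && !in_quote) {
      # a space outside quotes ends the current piece
      if (nzchar(current)) pieces <- c(pieces, current)
      current <- ""
    } else {
      # any other character, including a quoted space, is kept
      current <- paste0(current, ch)
    }
  }
  if (nzchar(current)) pieces <- c(pieces, current)
  pieces
}

split_outside_quotes("#'autres 32' #'batiment 30'")
# [1] "#'autres 32'"   "#'batiment 30'"
```

Applied to every element with unlist(lapply(vec, split_outside_quotes)), this should give the flat vector I want, but I would prefer a single regular expression that works with strsplit.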
I tried to take inspiration from this question, https://stackoverflow.com/questions/45811754/regex-match-everything-except-words-between-quotes, but did not get far:
strsplit(vec, "(\\s[^']+?)('.*?'|$)")
This attempt splits on some spaces that it should not, and it also drops part of the text.
The result of the split should look like this:
res <- c("'01'", "'01'", "'02'",
"#bateau", "#bateau", "#batiment",
"#'autres 32'", "#'autres 32'", "#'batiment 30'", "#'autres 32'", "#'batiment 30'", "#'contenu 31'",
"#'34'", "#'34'", "#'33'", "#'35'")
What is the correct regular expression to split these strings?
Thanks