Assuming there can be one or more spaces after <a, and zero or more spaces around the = signs, the following should work:
$ cat in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">
#
# The command to do the substitution
#
$ sed -e 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#' in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>
If you are sure there are no extra spaces, the pattern simplifies to:
s#<a href=".*search=\([^"]*\)">#&\1</a>#
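In sed, s followed by any character (# in this case) starts a substitution. The pattern to be replaced runs up to the second occurrence of that same character. So, in our second example, the pattern to be replaced is: <a href=".*search=\([^"]*\)">. I used \([^"]*\) to mean any sequence of non-" characters, and to save that sequence in backreference \1 (the \(\) pair marks the group that \1 refers to). Finally, the next token, which ends at the following #, is the replacement. & in sed stands for "whatever matched", which in this case is the whole line, and \1 holds just the link text.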
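As a small sketch of what & and \1 each contribute, the commands below run the simplified pattern against one line taken from in.txt. The variable names are only for illustration, and the last command uses | instead of # to show that the delimiter is a free choice.

```shell
# One sample line taken from in.txt above.
line='<a href="http://www.wowhead.com/?search=Superior Mana Oil">'

# '&&' in the replacement: the whole match is emitted twice.
doubled=$(printf '%s\n' "$line" | sed -e 's#<a href=".*search=\([^"]*\)">#&&#')

# '\1' alone: the match is replaced by just the captured link text.
text_only=$(printf '%s\n' "$line" | sed -e 's#<a href=".*search=\([^"]*\)">#\1#')

# The full substitution, written with '|' as the delimiter instead of '#'.
linked=$(printf '%s\n' "$line" | sed -e 's|<a href=".*search=\([^"]*\)">|&\1</a>|')

printf '%s\n' "$doubled" "$text_only" "$linked"
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter; # and | are convenient here because the replacement contains a /.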
Here is the pattern again:
's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#'
and its explanation:
'                          quote, so that the shell does not interpret the characters
s                          substitute
#                          delimiter
<a[ \t][ \t]*              <a followed by one or more whitespace characters
href[ \t]*=[ \t]*          href, optional whitespace, =, optional whitespace
".*search[ \t]*=[ \t]*     " followed by as many characters as needed, followed by
                           search, optional whitespace, =, optional whitespace
\([^"]*\)                  a sequence of non-" characters, saved in \1
">                         followed by ">
#                          delimiter; the replacement starts here
&\1                        the matched text, followed by backreference \1
</a>                       the closing </a> tag
#                          end delimiter
'                          end quote
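One portability caveat: reading \t inside a bracket expression as a tab is an extension that GNU sed supports, while other sed implementations may treat it as a literal backslash and a t. A hedged alternative, sketched below, swaps [ \t] for the POSIX class [[:blank:]] (space or tab). The input line with irregular spacing is made up for this example and is not part of in.txt.

```shell
# Hypothetical input: a tab after <a and spaces around the first =.
spaced=$(printf '<a\thref = "http://www.wowhead.com/?search=Tattered Hexcloth Sack">')

# Same pattern as above, with [[:blank:]] in place of [ \t].
result=$(printf '%s\n' "$spaced" |
  sed -e 's#<a[[:blank:]][[:blank:]]*href[[:blank:]]*=[[:blank:]]*".*search[[:blank:]]*=[[:blank:]]*\([^"]*\)">#&\1</a>#')

# Prints the original tag followed by: Tattered Hexcloth Sack</a>
printf '%s\n' "$result"
```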
If you are really sure that there will always be search= followed by the text you want, you can do:
$ sed -e 's#.*search=\(.*\)">#&\1</a>#' in.txt
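The reason for the "really sure" is that \(.*\) is greedy and runs to the last "> on the line. The sketch below uses a made-up line with an extra class attribute (not part of in.txt) to compare the short pattern with the stricter [^"]* version.

```shell
# Hypothetical input with a second attribute after href.
extra='<a href="http://www.wowhead.com/?search=Superior Mana Oil" class="q">'

# Short pattern: \(.*\) also swallows the text up to the final ">.
greedy=$(printf '%s\n' "$extra" | sed -e 's#.*search=\(.*\)">#&\1</a>#')

# Stricter pattern: [^"]* stops at the first ", so "> does not follow
# and the line is left unchanged.
strict=$(printf '%s\n' "$extra" | sed -e 's#<a href=".*search=\([^"]*\)">#&\1</a>#')

printf '%s\n' "$greedy" "$strict"
```

On input like this, the short pattern appends Superior Mana Oil" class="q</a>, which is why it is only safe when every line ends right after the search text.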
Hope that helps.