I'm trying to use SimpleXML to get data from http://rates.fxcm.com/RatesXML
using simplexml_load_file().
Sometimes I get errors, because the site always puts strange strings/numbers before and after the XML.
For example:
2000<?xml version="1.0" encoding="UTF-8"?>
<Rates>
<Rate Symbol="EURUSD">
<Bid>1.27595</Bid>
<Ask>1.2762</Ask>
<High>1.27748</High>
<Low>1.27385</Low>
<Direction>-1</Direction>
<Last>23:29:11</Last>
</Rate>
</Rates>
0
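Both simplexml_load_file() and simplexml_load_string() return false on malformed input and, by default, also emit PHP warnings. A minimal sketch, assuming you want to detect and log the failure rather than see warnings, collects the parser errors with libxml_use_internal_errors() and libxml_get_errors(); the helper name loadRatesXml() is hypothetical.

```php
<?php
// Sketch: parse the feed without letting libxml emit PHP warnings, so a
// malformed response can be detected and logged instead.
// loadRatesXml() is a hypothetical helper name.

function loadRatesXml(string $xml): ?SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of warning
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            // Each LibXMLError carries a line number and a message.
            error_log(sprintf('XML error on line %d: %s', $error->line, trim($error->message)));
        }
        $doc = null;
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $doc;
}

// A response shaped like the example: text before the XML declaration
// makes the whole document invalid, so this returns null.
$raw = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Rates></Rates>\n0";
var_dump(loadRatesXml($raw));
```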
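Then I decided to fetch it with file_get_contents() and parse it as a string with simplexml_load_string(),
after first using substr()
to cut off the junk before and after. However, random strings sometimes also appear between nodes, like this: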
<Rate Symbol="EURTRY">
<Bid>2.29443</Bid>
<Ask>2.29562</Ask>
<High>2.29841</High>
<Low>2.28999</Low>
137b
<Direction>1</Direction>
<Last>23:29:11</Last>
</Rate>
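The substr() step described above can be sketched without hard-coded offsets by cutting from the first < to the last >. This is a minimal sketch, assuming the junk itself never contains < or >; trimToMarkup() is a hypothetical helper name. As the fragment above shows, it does nothing about junk such as 137b between nodes.

```php
<?php
// Sketch of the substr() approach: keep the text from the first '<' through
// the last '>', so junk before and after the document is dropped whatever its
// length. Assumes the junk itself never contains '<' or '>'.
// trimToMarkup() is a hypothetical helper name.

function trimToMarkup(string $raw): string
{
    $start = strpos($raw, '<');
    $end = strrpos($raw, '>');
    if ($start === false || $end === false || $end < $start) {
        return ''; // no markup at all
    }
    return substr($raw, $start, $end - $start + 1);
}

$raw = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Rates>\n</Rates>\n0";
$trimmed = trimToMarkup($raw); // the leading 2000 and trailing 0 are gone
```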
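My question is: is there a regex function I can use to deal with all of these random strings at once, wherever they appear? (I figure that would be easier than contacting the site and asking them to serve a valid XML file.)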
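I believe preprocessing XML with regular expressions may be as bad as parsing it with them: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
That said, here is a preg_replace() that removes everything before the first < and after the last >, as well as any non-whitespace junk that follows a closing or self-closing tag: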
$string = preg_replace( '~
(?| # start of alternation where capturing group count starts from
# 1 for each alternative
^[^<]* # match non-< characters at the beginning of the string
| # OR
[^>]*$ # match non-> characters at the end of the string
| # OR
( # start of capturing group $1: closing tag
</[^>]++> # match a closing tag; note the possessive quantifier (++); it
# suppresses backtracking, which is a convenient optimization,
# the following bit is mutually exclusive anyway (this will be
# used throughout the regex)
\s++ # and the following whitespace
) # end of $1
[^<\s]*+ # match non-<, non-whitespace characters (the "bad" ones)
(?: # start subgroup to repeat for more whitespace/non-whitespace
# sequences
\s++ # match whitespace
[^<\s]++ # match at least one "bad" character
)* # repeat
        # note that this kind of pattern keeps all whitespace
        # before the first and after the last "bad" character
| # OR
( # start of capturing group $1: self-closing tag
<[^>/]+/> # match a self-closing tag
\s++ # and the following whitespace
)
    [^<\s]*+(?:\s++[^<\s]++)*
        # same as before
) # end of alternation
~x',
'$1',
$input);
The replacement $1 then writes back the closing or self-closing tag (and the whitespace after it), if the match had one.
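To see the replacement at work, here is a usage sketch that runs the regex above (comments condensed, and with the self-closing branch written like the closing-tag branch, as its "same as before" comment describes) on a response shaped like the one in the question, then parses the result. The wrapper name stripJunk() and the sample input are illustrative assumptions, not part of the original answer.

```php
<?php
// Usage sketch: the answer's preg_replace() (comments condensed) applied to a
// response shaped like the one in the question, then parsed.
// stripJunk() is a hypothetical wrapper name.

function stripJunk(string $input): string
{
    return preg_replace('~
        (?|                               # branch reset: every alternative numbers from $1
            ^[^<]*                        # junk before the first <
          | [^>]*$                        # junk after the last >
          | (</[^>]++>\s++)               # $1: closing tag plus following whitespace
            [^<\s]*+(?:\s++[^<\s]++)*     # "bad" characters after it
          | (<[^>/]+/>\s++)               # $1: self-closing tag plus following whitespace
            [^<\s]*+(?:\s++[^<\s]++)*     # same as before
        )
        ~x', '$1', $input);
}

$raw = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    . "<Rates>\n<Rate Symbol=\"EURTRY\">\n<Bid>2.29443</Bid>\n<Low>2.28999</Low>\n"
    . "137b\n<Direction>1</Direction>\n</Rate>\n</Rates>\n0";

$clean = stripJunk($raw);
$rates = simplexml_load_string($clean);
echo $rates->Rate['Symbol'], ' ', $rates->Rate->Direction, "\n"; // EURTRY 1
```

The leading 2000 is removed by the first alternative, while 137b and the trailing 0 each follow a closing tag and are dropped by the third.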
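One reason this approach is unsafe is that closing or self-closing tags can also appear inside comments or attribute values. Still, it is hard to recommend an XML parser instead, when your XML parser cannot parse this XML in the first place.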
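The stray tokens in the question (2000 before the XML declaration, 137b between nodes, 0 at the very end) have the shape of HTTP/1.1 chunked transfer-encoding size lines: each is a hexadecimal byte count (0x2000 is 8192) and a 0-size chunk ends the body. That is an assumption, not something the question confirms. If it holds, decoding the chunk framing, a different technique from the regex above, would remove the tokens wherever they fall, including inside a tag or a number, where chunk boundaries can also land. A minimal sketch, assuming the raw text still contains the CRLF framing; dechunk() is a hypothetical helper name.

```php
<?php
// Sketch: decode an HTTP/1.1 chunked body, assuming the text still contains
// the raw framing: "<hex size>[;extension]\r\n<data>\r\n" ... "0\r\n\r\n".
// dechunk() is a hypothetical helper; it returns null if the framing is invalid.

function dechunk(string $body): ?string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while ($pos < $len) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            return null; // size line is not terminated
        }
        // The size is hexadecimal; anything after ';' is a chunk extension.
        $sizeHex = trim(explode(';', substr($body, $pos, $eol - $pos), 2)[0]);
        if (!preg_match('/\A[0-9A-Fa-f]+\z/', $sizeHex)) {
            return null; // not chunk framing after all
        }
        $size = hexdec($sizeHex);
        $pos = $eol + 2;
        if ($size === 0) {
            return $out; // last chunk; trailer fields, if any, are ignored
        }
        if ($pos + $size > $len) {
            return null; // truncated chunk
        }
        $out .= substr($body, $pos, $size);
        $pos += $size + 2; // skip the data and the CRLF that follows it
    }
    return null; // input ended before the 0-size chunk
}
```

One possible way to use it is $xml = dechunk($raw) ?? $raw; before calling simplexml_load_string(), so a response that is not chunked passes through unchanged.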