I want to parse CSV records like the one below with awk or gawk.
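The fields are comma-separated, but the last field ($6) is special because it is itself made up of subfields. These subfields are separated by # (or, to be precise, by ". # "). That in itself is not a problem: I can use awk -F'(,)|(. # )' to set alternative field separators.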
However, the last field also contains commas, and those need to be ignored rather than treated as separators.
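To make the problem concrete, here is a minimal sketch using a shortened, made-up record (the variable names rec and nf are only for illustration). It applies the separator shown above and counts the resulting fields:

```shell
# Shortened, made-up record: two plain fields plus a last field holding three
# subfields, the second of which contains commas.
rec='"a","b","Judgment of 1985. # Gerlach & Co. BV, Internationale Expeditie, v Minister. # Case 239/84."'

# Separator from the question: a comma, or any character followed by " # ".
nf=$(printf '%s\n' "$rec" | awk -F'(,)|(. # )' '{ print NF }')

# 5 pieces are wanted (2 fields + 3 subfields), but the two commas inside
# the second subfield are also treated as separators.
echo "$nf"    # → 7
```

The two extra fields come from the commas in "Gerlach & Co. BV, Internationale Expeditie, v Minister".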
Is there a way to solve this with awk, perhaps using FPAT?
Sample record:
"http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab","http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab.0002","EU:C:1985:443","61984CJ0239","Gerlach","Judgment of the Court (Third Chamber) of 24 October 1985. # Gerlach & Co. BV, Internationale Expeditie, v Minister van Economische Zaken. # Reference for a preliminary ruling: College van Beroep voor het Bedrijfsleven - Netherlands. # Article 41 ECSC - Anti-dumping duties. # Case 239/84."
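Using the FPAT feature of gnu-awk, you may be able to do this. FPAT is set so that each field is either a double-quoted string or a run of non-comma characters. The last field is then split using the /\. # / regex pattern.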
s='"http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab","http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab.0002","EU:C:1985:443","61984CJ0239","Gerlach","Judgment of the Court (Third Chamber) of 24 October 1985. # Gerlach & Co. BV, Internationale Expeditie, v Minister van Economische Zaken. # Reference for a preliminary ruling: College van Beroep voor het Bedrijfsleven - Netherlands. # Article 41 ECSC - Anti-dumping duties. # Case 239/84."'
awk -v FPAT='"[^"]*"|[^,]+' '{
   # loop through all fields except the last one
   for (i=1; i<NF; ++i)
      print i, $i
   # split the last field once using the /\. # / regex, then print every token
   n = split($NF, a, /\. # /)
   for (j=1; j<=n; ++j)
      print i+j-1, a[j]
}' <<< "$s"
Output:
1 "http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab"
2 "http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab.0002"
3 "EU:C:1985:443"
4 "61984CJ0239"
5 "Gerlach"
6 "Judgment of the Court (Third Chamber) of 24 October 1985
7 Gerlach & Co. BV, Internationale Expeditie, v Minister van Economische Zaken
8 Reference for a preliminary ruling: College van Beroep voor het Bedrijfsleven - Netherlands
9 Article 41 ECSC - Anti-dumping duties
10 Case 239/84."
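If gawk is not available, a similar result may be possible without FPAT. The sketch below uses a different technique from the one above: it strips the outer double quotes and splits on the "," sequence between fields. It assumes that every field is double-quoted and that no field contains "," or an escaped quote. The record is a shortened, made-up one, and the variable names are only for illustration.

```shell
# Shortened, made-up record in which every field is double-quoted.
rec='"EU:C:1985:443","Gerlach","Judgment of 1985. # Gerlach & Co. BV, Internationale Expeditie, v Minister. # Case 239/84."'

out=$(printf '%s\n' "$rec" | awk '{
    # drop the first and last double quote, then split on the "," between fields
    n = split(substr($0, 2, length($0) - 2), f, /","/)
    for (i = 1; i < n; ++i)
        print i, f[i]
    # split the last field on ". # " and continue the numbering
    m = split(f[n], a, /\. # /)
    for (j = 1; j <= m; ++j)
        print n + j - 1, a[j]
}')

printf '%s\n' "$out"
```

For this record it prints five numbered lines, and the commas inside line 4 are left intact. Unlike the FPAT version, the surrounding quotes are removed from the output.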