Is the sample text formatted properly? The final entity object is missing a ] from the end.
entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }
should be
entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]
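I'm going to continue with these instructions assuming that was a typo and the entity field actually ends with ]. If not, I think you need to fix the underlying log so that it is formatted properly and closes the bracket.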
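A quick way to confirm whether a given log line really is missing its closing bracket is to count the brackets. This is only a minimal sketch: the helper name `unbalanced_brackets` is made up for illustration, and it assumes brackets never appear inside quoted values.

```python
def unbalanced_brackets(line: str) -> int:
    """Return how many '[' are left unclosed (negative means extra ']')."""
    depth = 0
    for ch in line:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
    return depth


# The entity field as posted (no closing bracket) and with the bracket added.
broken = 'entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }'
fixed = broken + "]"

print(unbalanced_brackets(broken))  # 1
print(unbalanced_brackets(fixed))   # 0
```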
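Instead of skipping the rest of the log and parsing only that json bit, I decided to parse the whole thing and show what the final result would look like. So the first thing we need to do is pull out the set of key/value pairs that follow the request object: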
Example Input: thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache] entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]
Grok Parser Rule: app_log thread-%{integer:thread} %{notSpace:file} - \[%{data::keyvalue(": ")}\] Request: %{data:request:keyvalue("=","","[]")}
Result:
{
  "thread": 191555,
  "file": "app.main",
  "cid": "2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f",
  "uid": "e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970",
  "request": {
    "protocol": "HTTP/1.0",
    "method": "POST",
    "path": "/metrics",
    "headers": "Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache",
    "entity": "HttpEntity.Strict application/json {\"type\":\"text\",\"extract\": \"text\", \"field2\":\"text2\",\"duration\": 451 }"
  }
}
Notice how we use the keyvalue parser with a quoting string of [], which lets us easily pull out everything from the request object.
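To make the effect of the [] quoting string concrete, here is a rough Python imitation of that step. It is not how Datadog implements the keyvalue filter; it is a sketch that assumes every value is wrapped in square brackets and contains no ] itself (true for the sample log, but a JSON array in the entity would break it). The names `PAIR` and `parse_request` are made up for illustration, and the headers value is shortened.

```python
import re

# key=[value], where the value runs up to the first closing bracket.
PAIR = re.compile(r"(\w+)=\[([^\]]*)\]")


def parse_request(text: str) -> dict:
    """Collect every key=[value] pair into a dict, brackets removed."""
    return {key: value for key, value in PAIR.findall(text)}


sample = (
    "protocol=[HTTP/1.0] method=[POST] path=[/metrics] "
    "headers=[Host: app:5000, Connection: close] "
    'entity=[HttpEntity.Strict application/json {"type":"text","duration": 451 }]'
)

request = parse_request(sample)
print(request["method"])   # POST
print(request["headers"])  # Host: app:5000, Connection: close
```

Because the brackets act as quotes, spaces, commas and colons inside headers and entity do not end the value early.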
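Now the goal is to pull out the details from the entity field inside the request object. With Grok parsers you can specify a particular attribute to parse further.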
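So in that same pipeline we'll add another grok parser processor, right after the first one, and then configure its advanced options section to run on request.entity, since that is what we called the attribute.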
Example Input: HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }
Grok Parser Rule: entity_rule %{notSpace:request.entity.class} %{notSpace:request.entity.media_type} %{data:request.entity.json:json}
Result:
{
  "request": {
    "entity": {
      "class": "HttpEntity.Strict",
      "media_type": "application/json",
      "json": {
        "duration": 451,
        "extract": "text",
        "type": "text",
        "field2": "text2"
      }
    }
  }
}
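The same split can be reproduced outside Datadog to sanity-check the rule. The sketch below mirrors the three captures (two notSpace tokens, then the remainder parsed as JSON); `parse_entity` is a hypothetical helper, and it assumes the class and media type never contain spaces.

```python
import json


def parse_entity(entity: str) -> dict:
    """Split into class, media type and JSON body, like the three grok captures."""
    # maxsplit=2 keeps the spaces inside the JSON body intact.
    entity_class, media_type, body = entity.split(" ", 2)
    return {
        "class": entity_class,
        "media_type": media_type,
        "json": json.loads(body),
    }


entity = 'HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }'
parsed = parse_entity(entity)
print(parsed["class"])             # HttpEntity.Strict
print(parsed["json"]["duration"])  # 451
```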
Now when we look at the final parsed log, it has everything we need broken out.
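Also, just because it was really simple, I threw in a third grok processor for the headers chunk as well (its advanced settings are set to parse from request.headers):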
Example Input: Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache
Grok Parser Rule: headers_rule %{data:request.headers:keyvalue(": ", "/)(; :")}
Result:
{
  "request": {
    "headers": {
      "Timeout-Access": "function1",
      "Remote-Address": "192.168.0.1:37936",
      "Host": "app:5000",
      "Connection": "close",
      "X-Real-Ip": "192.168.1.1",
      "X-Forwarded-For": "192.168.1.1",
      "Accept": "application/json",
      "Referer": "https://google.com",
      "Accept-Language": "cs-CZ",
      "Accept-Encoding": "gzip",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
      "Cache-Control": "no-cache"
    }
  }
}
The only tricky bit here is that I had to define a characterWhiteList of /)(; : which is mostly there to handle all those special characters in the User-Agent field.
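For comparison, if you ever need to split such a headers string yourself, one option is to break only where a comma is followed by something shaped like a header name. This is a different technique from the characterWhiteList approach, not a port of it, and it behaves differently: it keeps the whole comma-separated value (for example all of Accept) instead of stopping at the first comma as in the result above. `BOUNDARY` and `parse_headers` are hypothetical names, and the sketch assumes no header value contains ", Name: " itself.

```python
import re

# Split at ", " only when the next token looks like "Header-Name: ".
BOUNDARY = re.compile(r", (?=[A-Za-z][A-Za-z0-9-]*: )")


def parse_headers(raw: str) -> dict:
    """Turn 'Name: value, Name: value' into a dict of header name to value."""
    headers = {}
    for part in BOUNDARY.split(raw):
        name, _, value = part.partition(": ")
        headers[name] = value
    return headers


raw = (
    "Host: app:5000, Accept: application/json, text/plain, */*, "
    "Accept-Encoding: gzip, deflate, "
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, "
    "Cache-Control: no-cache"
)

headers = parse_headers(raw)
print(headers["Accept"])      # application/json, text/plain, */*
print(headers["User-Agent"])
```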
References:
Just the docs and some guess and check in my personal Datadog account.
https://docs.datadoghq.com/logs/processing/parsing/?tab=matcher#key-value-or-logfmt