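The search-as-you-type field type (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html) is a text-like field that is optimized to provide out-of-the-box support for queries that serve an as-you-type completion use case.
Here is a working example with index data, mapping, search query, and search result.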
Index Mapping:
{
"mappings": {
"properties": {
"title": {
"type": "search_as_you_type"
}
}
}
}
Index Data:
{"title": "how shingles are actually used"}
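Analyze API
The default tokenizer in Elasticsearch is the "standard tokenizer", which uses a grammar-based tokenization technique.
The individual tokens generated for the text are: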
{
"tokens": [
{
"token": "how",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "shingles",
"start_offset": 4,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "are",
"start_offset": 13,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "actually",
"start_offset": 17,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "used",
"start_offset": 26,
"end_offset": 30,
"type": "<ALPHANUM>",
"position": 4
}
]
}
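To make the offsets and positions above easier to follow, here is a minimal Python sketch that approximates the tokenization step. It is only a stand-in: the function name simple_tokenize is hypothetical, and splitting on runs of word characters is an assumption that happens to agree with the standard tokenizer for this plain sentence, not a reimplementation of its Unicode text segmentation rules.

```python
import re

def simple_tokenize(text):
    """Approximate tokenizer: one token per run of word characters.

    Assumption: this only mimics the standard tokenizer for simple
    space-separated text like the example sentence.
    """
    return [
        {
            "token": match.group(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # ordinal of the token in the stream
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]

for token in simple_tokenize("how shingles are actually used"):
    print(token)
```

Each printed dictionary carries the same token, start_offset, end_offset, and position fields as the Analyze API response, without the type field.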
Generating shingles of 3 words:
POST /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "shingle",
"min_shingle_size": 3,
"max_shingle_size": 3,
"output_unigrams":false
}
],
"text": "how shingles are actually used"
}
The tokens generated are:
{
"tokens": [
{
"token": "how shingles are",
"start_offset": 0,
"end_offset": 16,
"type": "shingle",
"position": 0
},
{
"token": "shingles are actually",
"start_offset": 4,
"end_offset": 25,
"type": "shingle",
"position": 1
},
{
"token": "are actually used",
"start_offset": 13,
"end_offset": 30,
"type": "shingle",
"position": 2
}
]
}
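The shingle filter's effect can be sketched in a few lines of Python. This is an illustration of the idea only, with output_unigrams set to false and equal minimum and maximum sizes assumed; the helper name shingles is hypothetical and is not part of any Elasticsearch client.

```python
def shingles(tokens, size):
    """Join every run of `size` consecutive tokens with a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

tokens = "how shingles are actually used".split()
print(shingles(tokens, 3))
# ['how shingles are', 'shingles are actually', 'are actually used']
```

Five input tokens give three 3-word shingles, which matches the three tokens in the response above.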
Search Query:
title._3gram wraps the analyzer of title (my_field in the documentation) with a shingle token filter of shingle size 3.
{
"query": {
"multi_match": {
"query": "shingles are actually",
"type": "bool_prefix",
"fields": [
"title._3gram"
]
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "how shingles are actually used"
}
}
]
In your case, considering "text": "samantha@example.com", the individual tokens generated are samantha and example.com.
When shingles of 2 words are created, the token generated is:
{
"tokens": [
{
"token": "samantha example.com",
"start_offset": 0,
"end_offset": 20,
"type": "shingle",
"position": 0
}
]
}
So when you search for sa, it will not match, because no token corresponding to it was generated.
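When you use a multi_match query with bool_prefix on the email_address field, it matches because of "type": "bool_prefix", which treats the last term of the query as a prefix. Read this to learn more about the match boolean prefix query: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-match-bool-prefix-query.html#query-dsl-match-bool-prefix-query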
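The difference between the two behaviours can be sketched in Python. This is a simplified model, not Elasticsearch's implementation: it assumes the query has already been analyzed into terms, ignores scoring, and uses the hypothetical helper names term_match and bool_prefix_match.

```python
def term_match(query_term, indexed_tokens):
    """Exact term lookup: the query term must equal an indexed token."""
    return query_term in indexed_tokens

def bool_prefix_match(query_terms, indexed_tokens):
    """Model of bool_prefix: every term but the last is an exact term clause,
    the last term is a prefix clause, and any matching clause is enough."""
    if not query_terms:
        return False
    *terms, last = query_terms
    clauses = [term in indexed_tokens for term in terms]
    clauses.append(any(token.startswith(last) for token in indexed_tokens))
    return any(clauses)

email_tokens = ["samantha", "example.com"]   # tokens on the root field
two_gram_tokens = ["samantha example.com"]   # the single 2-word shingle

print(term_match("sa", two_gram_tokens))          # False
print(bool_prefix_match(["sa"], email_tokens))    # True
```

The exact lookup fails because no indexed token equals sa, while the prefix clause succeeds because samantha starts with sa.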
If you want to query for sa and get all the results, you can use the completion suggester (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html). You can also go through the UAX URL email tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html).