Elasticsearch: indexing documents, and getting, updating, and deleting documents by _id

2023-05-16

1. Indexing (creating) documents

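Documents are indexed, that is, stored and made searchable, through the index API. First, though, we have to decide where the document lives. A document is uniquely identified by its _index, its _type (fixed in 7.x, see the note below), and its _id.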

Note:
In ES 7.x types are removed from the document APIs; the _type field is always the fixed value _doc.

We can supply our own _id value or let the index API generate one.

1.1. Supplying a custom _id

# Syntax:
PUT /{index}/_doc/{id}
{
  "field": "value",
  ...
}

Let's put a document into the customer index. We index a simple customer document with an ID of 1, as follows:

PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}

Response:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

As shown above, a new customer document was created in the customer index. The document has the ID 1, which we specified at index time.
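
The same request can also be prepared from a script. Below is a minimal Python sketch that only builds the HTTP request with the standard library and does not send it. The address http://localhost:9200 and the helper name build_index_request are assumptions made for illustration; they do not come from the example above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node; adjust for your cluster


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT /{index}/_doc/{id} request."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request("customer", 1, {"name": "John Doe"})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to inspect the method, URL, and body before anything reaches the cluster.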

1.2. Auto-generating the _id

If your data has no natural ID, Elasticsearch can generate one for us. The request changes in two ways: it uses POST instead of PUT, and the URL omits the _id:

POST /customer/_doc/?pretty
{
  "name": "make Doe"
}

Response:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "YWSApXEBsOGGEpTGJEmD",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

1.3. Creating a new document

When we index a document, how can we be sure we are creating an entirely new document rather than overwriting an existing one?

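Remember that the combination of _index, _type, and _id uniquely identifies a document. So the simplest way to guarantee a new document is the POST form of the index request, which lets Elasticsearch generate a unique _id:

POST /website/_doc/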
{ ... }

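If we already have our own _id, however, we must tell Elasticsearch to accept the index request only when no document with the same _index, _type, and _id exists. There are two ways to do this, and both do exactly the same thing; use whichever is more convenient.

The first uses the op_type query-string parameter:

PUT /website/_doc/123?op_type=create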
{ ... }

The second uses the _create endpoint in the URL:

PUT /website/_create/123
{ ... }

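If the request creates a new document, Elasticsearch returns the usual metadata with a 201 Created response code. If a document with the same _index, _type, and _id already exists,
Elasticsearch returns a 409 Conflict response code and an error message like this: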
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, document already exists (current version [3])",
        "index_uuid": "UmtZgqkeTBKFQun4Xs3R1A",
        "shard": "0",
        "index": "customer"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[1]: version conflict, document already exists (current version [3])",
    "index_uuid": "UmtZgqkeTBKFQun4Xs3R1A",
    "shard": "0",
    "index": "customer"
  },
  "status": 409
}
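
To make the difference between a plain index request and a create-only request concrete, here is a toy in-memory model in Python. It is not Elasticsearch code: the ToyIndex class and ConflictError exception are hypothetical names, and the model only mimics the overwrite-versus-409 behaviour described above.

```python
class ConflictError(Exception):
    """Stands in for the 409 version_conflict_engine_exception."""


class ToyIndex:
    """Hypothetical in-memory stand-in for a single index."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def index(self, doc_id, source, op_type="index"):
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            version = self.docs[doc_id][0]
            raise ConflictError(
                f"[{doc_id}]: version conflict, document already exists "
                f"(current version [{version}])"
            )
        # A plain index request overwrites and bumps the version.
        version = self.docs[doc_id][0] + 1 if exists else 1
        self.docs[doc_id] = (version, source)
        return {"_id": doc_id, "_version": version,
                "result": "updated" if exists else "created"}


idx = ToyIndex()
first = idx.index("1", {"name": "John Doe"}, op_type="create")  # new document
second = idx.index("1", {"name": "Jane Doe"})                   # overwrites it
try:
    idx.index("1", {"name": "Someone"}, op_type="create")
    conflict = None
except ConflictError as err:
    conflict = str(err)
print(first["result"], second["result"], conflict)
```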

1.4. Meaning of the response fields

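_id
The primary key of the doc. The _id can be supplied when the doc is written; if it is not, the system generates a unique UUID value automatically.
The _id value (which ES converts internally to _uid) uniquely identifies a doc within Elasticsearch.
In Elasticsearch, _id is only a user-level virtual field; it is not mapped into Lucene, so its value is not stored.
The _id value can be derived from _uid (_uid = type + '#' + id), and Elasticsearch does store _uid.
_uid
The format of _uid is type + '#' + id.
_uid is stored in Lucene. Before 6.0.0 an index could hold several types, so docs with the same id could exist in one index; from 6.0.0 on only a single type is supported, so an id is unique within an index.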
_version
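Every doc in Elasticsearch has a version, which can be supplied by the user or generated by the system; a system-generated version increases by 1 each time.
Each modification of a document, including a deletion, increments _version. The _version number is what conflict handling relies on to ensure that a change made by one part of your application does not overwrite a change made by another.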
_seq_no
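A strictly increasing sequence number, one per document. It increases strictly at the shard level, which guarantees that a doc written later has a larger _seq_no than a doc written earlier.
Every kind of write operation (index, create, update, and delete) generates a _seq_no.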
_primary_term
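Like _seq_no, _primary_term is an integer. Whenever the primary shard is reassigned, for example after a restart or a primary election, _primary_term increases by 1.
_primary_term is mainly used during data recovery to resolve conflicts when several documents have the same _seq_no, so that writes on the primary shard are not overwritten.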
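
_seq_no and _primary_term are also what conditional writes are based on: an index request can carry the if_seq_no and if_primary_term query parameters so that it succeeds only while the stored document still has those values. The Python sketch below just builds such a URL and sends nothing; the helper name conditional_index_url and the localhost address are assumptions for illustration.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed address


def conditional_index_url(index, doc_id, seq_no, primary_term):
    """URL for an index request that applies only if the stored document
    still has the given _seq_no and _primary_term."""
    query = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"{ES_URL}/{index}/_doc/{doc_id}?{query}"


# Values taken from the index response in section 1.1.
url = conditional_index_url("customer", 1, seq_no=0, primary_term=1)
print(url)
```

If another write has changed the document in the meantime, its _seq_no no longer matches and the request is rejected with a version conflict instead of silently overwriting the newer data.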

2. Getting documents by _id

2.1. Retrieving a document

GET /customer/_doc/1?pretty

Response:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

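Nothing unusual here apart from two fields: found, which states that a document with the requested ID 1 was found, and _source, which returns the full JSON document we indexed in the previous step.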

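If you only want to check whether a document exists and do not care about its contents, use the HEAD method instead of GET. A HEAD request returns no body, only the HTTP response headers: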

curl -I http://localhost:9200/website/_doc/123

If the document exists, Elasticsearch returns a 200 OK status code:
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

If the document does not exist, Elasticsearch returns a 404 Not Found status code:
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
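
The existence check can be sketched in Python as well. The block below builds the HEAD request without sending it and separately maps the two status codes described above to a boolean; build_exists_request, exists_from_status, and the localhost address are hypothetical names chosen for this sketch.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address


def build_exists_request(index, doc_id):
    """Build (but do not send) a HEAD request for one document."""
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_id}", method="HEAD")


def exists_from_status(status):
    """Interpret the status code of the HEAD response."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status}")


req = build_exists_request("website", 123)
print(req.get_method(), req.full_url)
print(exists_from_status(200), exists_from_status(404))
```

When the request is really sent with urllib.request.urlopen, a 404 surfaces as urllib.error.HTTPError, whose code attribute can be passed to exists_from_status.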

2.2. Retrieving multiple documents

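Elasticsearch is already fast, but it can be faster still. Combining multiple requests into one avoids the network latency and overhead of handling each request separately. If you need to retrieve many documents from Elasticsearch, the multi-get (mget) API fetches them in a single request, which is faster than requesting them one by one.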

GET /_mget
{
   "docs" : [
      {
         "_index" : "customer",
         "_type" :  "_doc",
         "_id" :    2
      },
      {
         "_index" : "customer",
         "_type" :  "_doc",
         "_id" :    1
      }
   ]
}

Response:

#! Deprecation: [types removal] Specifying types in multi get requests is deprecated.
{
  "docs" : [
    {
      "_index" : "customer",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 6,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "John Doe"
      }
    },
    {
      "_index" : "customer",
      "_type" : "_doc",
      "_id" : "1",
      "found" : false
    }
  ]
}
Note:
The deprecation warning ([types removal]) means that specifying types in multi-get requests is discouraged.

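If the documents you want are all in the same _index, you can specify a default /_index in the URL (older versions also accepted a default /_index/_type, which is deprecated along with types).

Individual entries can still override the default:

GET /website/_mget
{
   "docs" : [
      { "_id" : 2 },
      { "_index" : "pageviews", "_id" :   1 }
   ]
}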

In fact, if all the documents are in the same _index, you can pass just an ids array instead of the full docs array:

GET /website/_mget
{
   "ids" : [ "2", "1" ]
}
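
Both request bodies are easy to generate from a list of IDs. The following Python sketch builds them as plain dictionaries; the function names mget_body and mget_body_mixed are made up for this illustration, and nothing is sent to a cluster.

```python
import json


def mget_body(ids):
    """Body for GET /{index}/_mget when every document is in the same index."""
    return {"ids": [str(i) for i in ids]}


def mget_body_mixed(pairs):
    """Body for GET /_mget built from (index, id) pairs."""
    return {"docs": [{"_index": index, "_id": str(doc_id)}
                     for index, doc_id in pairs]}


print(json.dumps(mget_body([2, 1])))
print(json.dumps(mget_body_mixed([("customer", 2), ("customer", 1)])))
```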

3. Updating documents

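Elasticsearch does not actually update a document in place. Each time we update, Elasticsearch performs these steps:

Build the JSON from the old document
Change that JSON
Delete the old document
Index a new document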
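
The four steps can be mimicked on an ordinary Python dictionary. This is a toy model under stated assumptions, not how Elasticsearch is implemented: toy_update and the store layout are hypothetical, and the version bump simply mirrors the _version behaviour seen in the responses in this article.

```python
import copy


def toy_update(store, doc_id, change):
    """Mimic the four steps on a dict: {_id: {"_version", "_source"}}."""
    old = store[doc_id]
    source = copy.deepcopy(old["_source"])  # 1. build JSON from the old document
    change(source)                          # 2. change that JSON
    del store[doc_id]                       # 3. delete the old document
    store[doc_id] = {                       # 4. index a new document
        "_version": old["_version"] + 1,
        "_source": source,
    }
    return store[doc_id]


store = {"1": {"_version": 1, "_source": {"name": "John Doe"}}}
new_doc = toy_update(store, "1", lambda src: src.update(name="Jane Doe"))
print(new_doc)
```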

3.1. Updating a whole document

Documents in Elasticsearch are immutable; they cannot be modified. To update an existing document we instead reindex it, that is, replace it, using the same index API discussed in section 1.

PUT /website/_doc/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

In the response body we can see that Elasticsearch has incremented the _version field:

{
  "_index" :   "website",
  "_type" :    "_doc",
  "_id" :      "123",
  "_version" : 2,
  "result" :   "updated"
}

3.2. Partially updating a document

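The simplest form of the update request accepts a partial document as the doc parameter, which is merged with the existing document. Objects are merged together, existing fields are overwritten, and new fields are added.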

# This example updates the previous document (ID 1) by changing the name field to "Jane xiang":
POST /customer/_update/1?pretty
{
  "doc": { "name": "Jane xiang" }
}

Response:
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}

# This example updates the previous document (ID 1) by changing the name field to "Jane Doe" and adding an age field at the same time:
POST /customer/_update/1?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}

Response:
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 5,
  "_primary_term" : 1
}
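
The merge rule can be illustrated with a few lines of Python. This is a toy version written for this article, assuming that nested objects are merged key by key while every other value, including arrays, is replaced; merge_doc is a hypothetical helper, not part of any Elasticsearch client.

```python
def merge_doc(source, partial):
    """Return a copy of source with partial merged in: nested objects are
    merged key by key, every other value (including lists) is replaced."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = value
    return merged


before = {"name": "Jane xiang"}
after = merge_doc(before, {"name": "Jane Doe", "age": 20})
print(after)
```

The existing name field is overwritten and the new age field is added, which matches the effect of the second update request above.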

3.3. Partial updates with scripts

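Scripts can be used in the update API to change the contents of _source, which is called ctx._source inside an update script. For example, a script can increment the number of views of a blog post: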

POST /website/_update/1
{
   "script" : "ctx._source.views+=1"
}

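A script can also add a new tag to the tags array. In this example the new tag is passed as a parameter rather than hard-coded into the script, which lets Elasticsearch reuse the compiled script instead of compiling a new one every time we add a tag: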

POST /website/_update/1
{
   "script" : {
      "source" : "ctx._source.tags.add(params.new_tag)",
      "params" : { "new_tag" : "search" }
   }
}

Fetching the document shows the effect of the last two requests:

{
   "_index":    "website",
   "_type":     "_doc",
   "_id":       "1",
   "_version":  5,
   "found":     true,
   "_source": {
      "title":  "My first blog entry",
      "text":   "Starting to get the hang of this...",
      "tags":  ["testing", "search"], 
      "views":  1 
   }
}

We can even delete a document based on its contents by setting ctx.op to delete:

POST /website/_update/1
{
   "script" : {
      "source" : "ctx.op = ctx._source.views == params.count ? 'delete' : 'none'",
      "params" : { "count" : 1 }
   }
}

4. Deleting documents

Deleting a document follows the same pattern we already know, but uses the DELETE method:

DELETE /customer/_doc/1

Response:
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 4,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 1
}

If the document is not found, we get a 404 Not Found response code and a response body like this:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 5,
  "result" : "not_found",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}

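Even though the document does not exist (result is not_found), the _version value is still incremented. This is part of Elasticsearch's internal bookkeeping, which ensures that changes are applied in the correct order across multiple nodes.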
