Learning Apache Druid: Processes and servers

2023-10-31

Process types

Druid has several process types:

Coordinator processes manage data availability on the cluster. -- data scheduling
Overlord processes control the assignment of data ingestion workloads. -- controls data ingestion and task assignment
Broker processes handle queries from external clients. -- handles client requests
Router processes are optional; they route requests to Brokers, Coordinators, and Overlords. -- a router: picks which nodes should handle a request
Historical processes store queryable data. -- stores and serves historical data (storage plus querying); a cache?
MiddleManager processes ingest data. -- handles incoming data (real-time data and indexing)

Server types

Druid processes can be deployed any way you like, but for ease of deployment we suggest organizing them into three server types:

Master: Runs Coordinator and Overlord processes, manages data availability and ingestion. -- responsible for data availability and ingestion
Query: Runs Broker and optional Router processes, handles queries from external clients. -- handles external requests; Query servers store no data
Data: Runs Historical and MiddleManager processes, executes ingestion workloads and stores all queryable data. -- where the data actually lives
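
The grouping of processes into server types can be written down as a small lookup table. This is only an illustrative sketch: the names SERVER_TYPES and server_for are made up for these notes and are not part of Druid.

```python
# Illustrative only: the suggested server types and the processes each one
# runs, as listed in the notes. This is not a Druid API.
SERVER_TYPES = {
    "Master": ["Coordinator", "Overlord"],
    "Query": ["Broker", "Router"],  # the Router is optional
    "Data": ["Historical", "MiddleManager"],
}


def server_for(process: str) -> str:
    """Return the suggested server type that hosts the given process."""
    for server, processes in SERVER_TYPES.items():
        if process in processes:
            return server
    raise KeyError(f"unknown process type: {process}")


if __name__ == "__main__":
    for name in ("Coordinator", "Broker", "Historical"):
        print(f"{name} -> {server_for(name)}")
```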

Master server (contains the Coordinator and Overlord)

A Master server manages data ingestion and availability: it is responsible for starting new ingestion jobs and coordinating availability of data on the "Data servers" described below.

Within a Master server, functionality is split between two processes, the Coordinator and Overlord.

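Note: the Master server is responsible for data availability and ingestion. It starts ingestion jobs and coordinates the availability of data.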

Coordinator process

Coordinator processes watch over the Historical processes on the Data servers. They are responsible for assigning segments to specific servers, and for ensuring segments are well-balanced across Historicals.

Note: the Coordinator watches the Historical processes, decides which server each segment is assigned to, and keeps segments load-balanced across the Historicals.
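
To make "well-balanced across Historicals" concrete, here is a toy greedy balancer. It is a sketch under simplifying assumptions (balance by bytes only, largest segment first) and is not the strategy Druid's Coordinator actually uses; assign_segments and the segment ids are hypothetical.

```python
import heapq


def assign_segments(segments: dict[str, int], historicals: list[str]) -> dict[str, list[str]]:
    """Toy balancer: place each segment on the currently least-loaded Historical.

    `segments` maps a (hypothetical) segment id to its size in bytes.
    """
    # Min-heap of (bytes_assigned_so_far, historical_name).
    heap = [(0, name) for name in historicals]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {name: [] for name in historicals}

    # Handle the largest segments first so the greedy choice balances better.
    for segment_id, size in sorted(segments.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(segment_id)
        heapq.heappush(heap, (load + size, name))
    return assignment


if __name__ == "__main__":
    print(assign_segments({"a": 5, "b": 4, "c": 3, "d": 2}, ["h1", "h2"]))
```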

Overlord process

Overlord processes watch over the MiddleManager processes on the Data servers and are the controllers of data ingestion into Druid. They are responsible for assigning ingestion tasks to MiddleManagers and for coordinating segment publishing.

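Note: the Overlord watches the MiddleManager processes and is the controller of data ingestion into Druid. It assigns ingestion tasks to the MiddleManagers and coordinates segment publishing.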
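
The task-assignment role can be sketched as follows. Everything here is an assumption made for illustration: the MiddleManager class, its capacity field, and the "most free slots wins" rule are hypothetical and do not describe how the Overlord really selects a worker.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MiddleManager:
    name: str
    capacity: int  # hypothetical: maximum number of concurrent tasks
    running: list = field(default_factory=list)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.running)


def assign_task(task_id: str, managers: list) -> Optional[str]:
    """Give the task to the manager with the most free slots.

    Returns the chosen manager's name, or None if every manager is full
    (in which case the task would have to wait).
    """
    candidate = max(managers, key=lambda m: m.free_slots)
    if candidate.free_slots <= 0:
        return None
    candidate.running.append(task_id)
    return candidate.name


if __name__ == "__main__":
    workers = [MiddleManager("mm1", 1), MiddleManager("mm2", 2)]
    for task in ("t1", "t2", "t3", "t4"):
        print(task, "->", assign_task(task, workers))
```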

Data server

A Data server executes ingestion jobs and stores queryable data.

Within a Data server, functionality is split between two processes, the Historical and MiddleManager.

Note: a Data server mainly executes ingestion work and stores the queryable data. A Data server typically contains a Historical and a MiddleManager.

Historical process

Historical processes are the workhorses that handle storage and querying on "historical" data (including any streaming data that has been in the system long enough to be committed). Historical processes download segments from deep storage and respond to queries about these segments. They don't accept writes.

Note: Historicals are read-only. They serve committed data from segments they have downloaded from deep storage, and they do not accept write requests.
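
The "download, then answer queries, never write" behaviour can be modelled with a toy class. This is a sketch only: deep storage is stood in for by a plain dict, and the Historical class and its method names are hypothetical, not Druid code.

```python
class Historical:
    """Toy read-only segment server."""

    def __init__(self, deep_storage: dict):
        self._deep_storage = deep_storage  # stand-in for real deep storage
        self._local = {}                   # segments loaded onto this server

    def load_segment(self, segment_id: str) -> None:
        """Copy a segment from deep storage to local storage."""
        self._local[segment_id] = list(self._deep_storage[segment_id])

    def query(self, segment_id: str, predicate) -> list:
        """Return the rows of a loaded segment that match the predicate."""
        return [row for row in self._local[segment_id] if predicate(row)]

    def write(self, segment_id: str, row) -> None:
        """Historicals never accept writes."""
        raise PermissionError("Historicals do not accept writes")


if __name__ == "__main__":
    storage = {"seg_1": [{"page": "a", "hits": 3}, {"page": "b", "hits": 7}]}
    node = Historical(storage)
    node.load_segment("seg_1")
    print(node.query("seg_1", lambda row: row["hits"] > 5))
```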

MiddleManager process

MiddleManager processes handle ingestion of new data into the cluster. They are responsible for reading from external data sources and publishing new Druid segments.

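Note: MiddleManagers handle the ingestion of new data. They read from external data sources (Kafka, for example) and build segments, so they are mainly responsible for writes.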

Peon processes

Peon processes are task execution engines spawned by MiddleManagers. Each Peon runs a separate JVM and is responsible for executing a single task. Peons always run on the same host as the MiddleManager that spawned them.

Note: one Peon is one JVM running one task, and it always runs on the same host as the MiddleManager that spawned it.
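
The process-per-task model can be imitated in Python by launching one child interpreter per task. This is only an analogy: Druid forks JVMs, not Python interpreters, and run_task_in_child and the task id below are hypothetical.

```python
import os
import subprocess
import sys


def run_task_in_child(task_id: str) -> str:
    """Run one task in its own interpreter process and return what it printed."""
    # The child reports its task id and its own process id.
    code = "import os, sys; print(f'task {sys.argv[1]} ran in pid {os.getpid()}')"
    result = subprocess.run(
        [sys.executable, "-c", code, task_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(f"parent pid {os.getpid()}")
    print(run_task_in_child("index_task_1"))
```

The child reports a process id different from the parent's, which is the point: each task gets its own isolated runtime, at the cost of starting a new process every time.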

Indexer process (optional)

Indexer processes are an alternative to MiddleManagers and Peons. Instead of forking separate JVM processes per-task, the Indexer runs tasks as individual threads within a single JVM process.

Note: the Indexer replaces the MiddleManager + Peon pair. Rather than forking a separate JVM per task, it runs each task as a thread inside a single JVM.
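
The thread-per-task model can be imitated with a thread pool. Again this is a Python analogy rather than Druid code; run_task and run_tasks_as_threads are hypothetical names.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id: str) -> tuple:
    """Pretend to run a task; report which process and thread executed it."""
    return task_id, os.getpid(), threading.current_thread().name


def run_tasks_as_threads(task_ids: list, max_workers: int = 2) -> list:
    """Run every task as a thread inside the current process."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as task_ids.
        return list(pool.map(run_task, task_ids))


if __name__ == "__main__":
    for task_id, pid, thread_name in run_tasks_as_threads(["t1", "t2", "t3"]):
        print(f"{task_id}: pid={pid} thread={thread_name}")
```

Every task reports the same process id, so the tasks share one runtime and its resources instead of each paying for a runtime of its own.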

The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks. The Indexer is a newer feature and is currently designated experimental due to the fact that its memory management system is still under development. It will continue to mature in future versions of Druid.

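Note: compared with MiddleManager + Peon, the Indexer is easier to configure and deploy and shares resources across tasks more effectively. It is a newer feature.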

Typically, you would deploy either MiddleManagers or Indexers, but not both.

Note: pick one or the other, not both!

Pros and cons of colocation

Druid processes can be colocated based on the Master/Data/Query server organization as described above. This organization generally results in better utilization of hardware resources for most clusters.

For very large scale clusters, however, it can be desirable to split the Druid processes such that they run on individual servers to avoid resource contention.

This section describes guidelines and configuration parameters related to process colocation.

Coordinators and Overlords

The workload on the Coordinator process tends to increase with the number of segments in the cluster. The Overlord's workload also increases based on the number of segments in the cluster, but to a lesser degree than the Coordinator.

In clusters with very high segment counts, it can make sense to separate the Coordinator and Overlord processes to provide more resources for the Coordinator's segment balancing workload.

Unified Process

The Coordinator and Overlord processes can be run as a single combined process by setting the druid.coordinator.asOverlord.enabled property.

Please see Coordinator Configuration: Operation for details.
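
As a sketch, the helper below renders the runtime.properties lines for a combined Coordinator/Overlord. Only druid.coordinator.asOverlord.enabled comes from the text above. The companion property druid.coordinator.asOverlord.overlordService and the value druid/overlord are written from memory of the configuration reference and should be treated as assumptions to verify there; the function name is hypothetical.

```python
def combined_master_properties(overlord_service: str = "druid/overlord") -> str:
    """Render property lines for running the Overlord inside the Coordinator."""
    props = {
        # Named in the notes above.
        "druid.coordinator.asOverlord.enabled": "true",
        # Assumption: check this property and its value against the
        # Coordinator configuration reference before using it.
        "druid.coordinator.asOverlord.overlordService": overlord_service,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(combined_master_properties())
```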

Note: for clusters with very high segment counts, deploy Coordinators and Overlords separately.

Historicals and MiddleManagers

With higher levels of ingestion or query load, it can make sense to deploy the Historical and MiddleManager processes on separate hosts to avoid CPU and memory contention.

The Historical also benefits from having free memory for memory mapped segments, which can be another reason to deploy the Historical and MiddleManager processes separately.

Note: under heavy ingestion or query load, deploy Historicals and MiddleManagers separately.
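
The point about free memory for memory-mapped segments can be illustrated with Python's mmap module. A memory-mapped file is read through the operating system's page cache, so reads are fast only while there is spare RAM to keep the pages resident. This is a generic sketch, not Druid's segment format; the helper names and the demo file are hypothetical.

```python
import mmap
import os
import tempfile


def make_demo_file(payload: bytes) -> str:
    """Write payload to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


def read_slice_mmapped(path: str, start: int, length: int) -> bytes:
    """Read a byte range through a read-only memory mapping of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]


if __name__ == "__main__":
    demo_path = make_demo_file(b"0123456789")
    try:
        print(read_slice_mmapped(demo_path, 3, 4))
    finally:
        os.remove(demo_path)
```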
