I am trying to set up the RabbitMQ Operator and a RabbitMQ cluster on a bare-metal K8S cluster using this link: https://www.rabbitmq.com/kubernetes/operator/using-operator.html#create
The K8S cluster has 1 master node and 1 worker node.
RabbitMQ cluster Pod logs:
[root@re-ctrl01 containers]# kubectl logs definition-server-0 -nrabbitmq-system
BOOT FAILED (Tailored output)
===========
ERROR: epmd error for host definition-server-0.definition-nodes.rabbitmq-system: nxdomain (non-existing domain)
11:51:13.733 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason {epmd_error,"definition-server-0.definition-nodes.rabbitmq-system",nxdomain} in context start_error. Crash dump is being written to: erl_crash.dump...
[root@re-ctrl01 containers]# kubectl describe pod definition-server-0 -nrabbitmq-system
Name: definition-server-0
Namespace: rabbitmq-system
Priority: 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44s default-scheduler Successfully assigned rabbitmq-system/definition-server-0 to re-ctrl01.local
Normal Pulled 43s kubelet Container image "rabbitmq:3.8.16-management" already present on machine
Normal Created 43s kubelet Created container setup-container
Normal Started 43s kubelet Started container setup-container
Normal Pulled 42s kubelet Container image "rabbitmq:3.8.16-management" already present on machine
Normal Created 42s kubelet Created container rabbitmq
Normal Started 42s kubelet Started container rabbitmq
Warning Unhealthy 4s (x3 over 24s) kubelet Readiness probe failed: dial tcp 10.244.0.xxx:5672: connect: connection refused
I added the following entry to the /etc/hosts file on the worker node, because I wasn't sure whether it had to be added on the master or on the worker:
[root@re-worker01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
127.0.0.1 re-worker01.local re-worker01 definition-server-0.definition-nodes.rabbitmq-system
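The "nxdomain" in the log means the resolver reported definition-server-0.definition-nodes.rabbitmq-system as a non-existent name. As a minimal sketch (assuming Python 3 is available wherever you run it, and treating the OS resolver as a rough approximation of what the Erlang runtime uses), the script below checks whether a given name resolves. The result depends on where it runs: on a node it reflects that node's /etc/hosts and DNS settings, while inside a Pod it reflects the Pod's own resolver configuration. The default hostname is taken from the log; everything else is illustrative.

```python
import socket
import sys

# Hostname copied from the epmd error in the Pod log.
DEFAULT_HOST = "definition-server-0.definition-nodes.rabbitmq-system"


def resolves(hostname: str) -> bool:
    """Return True if the local resolver maps hostname to at least one address."""
    try:
        # getaddrinfo goes through the OS resolver, which typically
        # consults /etc/hosts and then DNS.
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Raised for names the resolver cannot map, such as the NXDOMAIN case.
        return False


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_HOST
    status = "resolves" if resolves(host) else "does NOT resolve"
    print(f"{host} {status}")
```

Running it once on each node and once from a Pod in the rabbitmq-system namespace would show whether the /etc/hosts entry is visible where the broker actually runs.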
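I've been stuck on this problem for almost 2 days. I googled and found similar issues, but none of them solved my problem.
I can see several problems in the Pod logs and the describe output, but I can't figure out the root cause: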
- Where can I find the erl_crash.dump file on K8S?
- Is this really a hostname-related issue?
- "10.244.0.xxx:5672: connect: connection refused": is this caused by 'epmd', or by something else?
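The readiness-probe message "dial tcp 10.244.0.xxx:5672: connect: connection refused" has the shape of a TCP-socket probe failure, which means nothing was accepting connections on port 5672 at probe time. If the broker exited during the prelaunch phase shown in the log, that alone would explain the refusal. The sketch below reproduces that kind of check by hand; the function name tcp_probe and the 127.0.0.1 target are hypothetical, so substitute the real Pod IP from kubectl describe.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Like a TCP-socket readiness probe, this passes only when some
    process is listening and accepting connections on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and unreachable-host errors
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    # Hypothetical target: replace with the Pod IP reported by kubectl.
    print(tcp_probe("127.0.0.1", 5672))
```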