How do I resolve "ERROR: epmd error for host ...: nxdomain (non-existing domain)"?

2024-01-11

I am trying to set up the RabbitMQ Operator and a RabbitMQ Cluster on a bare-metal K8S cluster by following this guide: https://www.rabbitmq.com/kubernetes/operator/using-operator.html#create

The K8S cluster has 1 master node and 1 worker node.

RabbitMQ Cluster pod logs:

[root@re-ctrl01 containers]# kubectl logs definition-server-0 -nrabbitmq-system

BOOT FAILED  (Tailored output)
===========
ERROR: epmd error for host definition-server-0.definition-nodes.rabbitmq-system: nxdomain (non-existing domain)

11:51:13.733 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason {epmd_error,"definition-server-0.definition-nodes.rabbitmq-system",nxdomain} in context start_error.  Crash dump is being written to: erl_crash.dump...

[root@re-ctrl01 containers]# kubectl describe pod definition-server-0 -nrabbitmq-system

Name:         definition-server-0
Namespace:    rabbitmq-system
Priority:     0
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  44s               default-scheduler  Successfully assigned rabbitmq-system/definition-server-0 to re-ctrl01.local
  Normal   Pulled     43s               kubelet            Container image "rabbitmq:3.8.16-management" already present on machine
  Normal   Created    43s               kubelet            Created container setup-container
  Normal   Started    43s               kubelet            Started container setup-container
  Normal   Pulled     42s               kubelet            Container image "rabbitmq:3.8.16-management" already present on machine
  Normal   Created    42s               kubelet            Created container rabbitmq
  Normal   Started    42s               kubelet            Started container rabbitmq
  Warning  Unhealthy  4s (x3 over 24s)  kubelet            Readiness probe failed: dial tcp 10.244.0.xxx:5672: connect: connection refused

I added the following entry to the /etc/hosts file on the worker node, because I was not sure whether the entry has to be added on the master or on the worker:

[root@re-worker01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
127.0.0.1   re-worker01.local re-worker01 definition-server-0.definition-nodes.rabbitmq-system

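I have been stuck on this issue for almost 2 days. I googled and found similar issues, but none of them solved my problem.

I see several problems in the pod logs and in the describe output, but I cannot work out the root cause:

  1. Where can I find the erl_crash.dump file on K8S?
  2. Is this really a hostname-related issue?
  3. "dial tcp 10.244.0.xxx:5672: connect: connection refused": is this caused by epmd, or by something else?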

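After spending a lot of time on this, I managed to solve the problem.

I added the host definition-server-0.definition-nodes.rabbitmq-system to the /etc/hosts file of the RabbitMQ Cluster pod using hostAliases.

The YAML that adds the hostAliases is given below: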

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: definition
  namespace: rabbitmq-system
spec:
  replicas: 1
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers: []
            hostAliases:
            - ip: "127.0.0.1"
              hostnames:
              - "definition-server-0"
              - "definition-server-0.definition-nodes.rabbitmq-system"
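
To check whether the alias took effect, it can help to test name resolution directly from an environment that shares the pod's /etc/hosts and resolver configuration. The following is only a minimal sketch in Python: can_resolve is a hypothetical helper name, and it assumes Python is available where you run it (the stock rabbitmq image may not ship it, so a debug container or sidecar may be needed).

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address.

    The lookup goes through the system resolver, which typically consults
    /etc/hosts (where hostAliases entries are written) and then DNS.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Covers "name or service not known", i.e. the nxdomain case.
        return False


if __name__ == "__main__":
    # Hostnames taken from the error message and the hostAliases entry above.
    names = (
        "definition-server-0",
        "definition-server-0.definition-nodes.rabbitmq-system",
    )
    for name in names:
        status = "resolves" if can_resolve(name) else "does not resolve"
        print(f"{name}: {status}")
```

If both names are reported as resolving, the epmd lookup for the node name should no longer fail with nxdomain. Entries added to a node's own /etc/hosts are not visible here, because each pod gets its own /etc/hosts.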