Hyperledger Fabric performance testing

2024-02-18

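While trying to reproduce with Hyperledger Fabric the performance that the IBM team reported in their article "Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains" (https://arxiv.org/abs/1801.10228), I ran into several problems and errors. I have collected all the useful information and would like to share it with the HF community. I also have a few questions for the Fabric developers about its performance.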

Target description

A Hyperledger Fabric v1.1.0 network deployed with Cello on four c5.9xlarge (36 vCPU) AWS instances:

{
    fabric001: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer1st.orderer"],
      zookeepers: ["zookeeper1st"],
      kafkas: ["kafka1st"]
    },
    fabric002: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer2nd.orderer"],
      zookeepers: ["zookeeper2nd"],
      kafkas: ["kafka2nd"]
    },
    fabric003: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer3rd.orderer"],
      zookeepers: ["zookeeper3rd"],
      kafkas: ["kafka3rd"]
    },
    fabric004: {
      cas: ["ca1st.main"],
      peers: [],
      orderers: ["orderer4th.orderer"],
      zookeepers: ["zookeeper4th"],
      kafkas: ["kafka4th"]
    }
}

TLS is disabled.

Fabric channel configuration (all other parameters are left at their defaults):

BatchTimeout: 1s
BatchSize:
    MaxMessageCount: 500
    AbsoluteMaxBytes: 200 MB
    PreferredMaxBytes: 50 MB
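
For readers unfamiliar with these parameters: the orderer cuts a block when the number of pending messages reaches MaxMessageCount, when adding a message would push the pending batch past PreferredMaxBytes, or when BatchTimeout has elapsed since the first pending message, whichever happens first. The following is a rough back-of-the-envelope sketch in Python, not Fabric code. It assumes a constant arrival rate and a fixed transaction size (both simplifications), it assumes 1 MB means 1024 * 1024 bytes, and the function name is purely illustrative.

```python
def block_cut_estimate(tps, tx_bytes, max_message_count=500,
                       preferred_max_bytes=50 * 1024 * 1024,
                       batch_timeout_s=1.0):
    """Estimate which limit triggers a block cut and how long a block takes.

    Assumes a constant arrival rate `tps` and a fixed transaction size
    `tx_bytes`; real traffic is bursty, so treat the result as a rough guide.
    """
    if tps <= 0 or tx_bytes <= 0:
        raise ValueError("tps and tx_bytes must be positive")
    # Transactions that fit in one block before a count or size limit is hit.
    tx_limit = min(max_message_count, max(1, preferred_max_bytes // tx_bytes))
    time_to_fill = tx_limit / tps
    if time_to_fill >= batch_timeout_s:
        # The timeout fires before the block is full.
        return {"trigger": "BatchTimeout",
                "seconds_per_block": batch_timeout_s,
                "tx_per_block": tps * batch_timeout_s}
    trigger = ("MaxMessageCount" if tx_limit == max_message_count
               else "PreferredMaxBytes")
    return {"trigger": trigger,
            "seconds_per_block": time_to_fill,
            "tx_per_block": tx_limit}


# Hypothetical 4 KiB transactions at two different rates.
print(block_cut_estimate(400, 4096))
print(block_cut_estimate(1000, 4096))
```

Under these assumptions and with small transactions, any sustained rate above 500 tps fills a block by message count before the 1 s timeout fires, so in the tests described here the block size is most likely bounded by MaxMessageCount.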

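I tested with both CouchDB and LevelDB as the state database. For the tests I used the official Fabcar chaincode (Golang implementation). I wrote a simple Node.js application that uses the SDK to interact with the Fabric network and exposes an HTTP API for load testing. The application is stateless and scales easily. For load generation I used Yandex.Tank. I ran two kinds of high-load tests: query (requests to the Fabric state through peer001 while the blockchain is empty) and insert (transactions written to the blockchain).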

Results

CouchDB as the state database

Query results: https://overload.yandex.net/101153. At ~1100 rps latency starts to increase, yet the Fabric instance is not heavily loaded and has plenty of free resources. The figure below shows CPU and memory usage of the Fabric network containers on instance fabric001 during the test; 100% CPU usage means one fully loaded vCPU. In addition, peer001 prints many similar error logs (this is only a small part of the output; I can share the rest if needed): https://gist.github.com/krabradosty/9780cacc92fcdeaa0c36377a91727ade
Insert results: https://overload.yandex.net/101217. At ~600 rps latency degrades very quickly. Below that rate it degrades slowly, but the degradation is still there. The figure below shows CPU and memory usage of the fabric003 containers. The peer again prints many error logs (again, not the full output): https://gist.github.com/krabradosty/3810151b8e101d8279cc705aef22863e

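From this I conclude that the Fabric peer has a problem with its CouchDB connections under load.

My question: is the Fabric community aware of this bug? Is there a plan to fix it?

LevelDB as the state database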

Query results: https://overload.yandex.net/102035. The figure below shows CPU and memory usage of the fabric001 containers. There are no errors from the blockchain; I only see latency degradation.
Insert results: https://overload.yandex.net/102040. The figure below shows CPU and memory usage of the fabric001 containers. Aggressive latency degradation starts at ~850 rps. There are no errors from the blockchain.

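My questions: what causes the latency degradation? Why can't I reach the 3500 rps that IBM reported in its article? What plans does the Fabric community have for improving performance?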

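Fabric is a queueing system. Under high load, waiting time grows exponentially (a property of queues), which shows up as transaction latency. However, with goleveldb we should get at least 2000 tps with low latency.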
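
To illustrate the queueing argument, here is a minimal sketch based on the textbook M/M/1 model. The model is an assumption chosen for illustration and is not a description of Fabric's actual pipeline. In it, the mean time a request spends in the system is 1 / (mu - lambda) for arrival rate lambda and service rate mu, so latency rises sharply as the offered load approaches capacity. The capacity of 1000 requests/s below is hypothetical.

```python
def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        # The queue is unstable: the backlog grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity of 1000 requests/s, chosen only for illustration.
for load in (500, 900, 990):
    latency_ms = mm1_mean_latency(load, 1000) * 1000
    print(f"{load} rps -> {latency_ms:.1f} ms")
```

In this toy model the mean latency is about 2 ms at 50% load, 10 ms at 90% load, and 100 ms at 99% load, which is the kind of knee visible in the latency graphs above.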

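The CPU utilization graph shows that only 16 of the 36 vCPUs are fully utilized. What value is set for validatorPoolSize in core.yaml on each peer? You can set this value equal to or smaller than the block size and check whether the throughput increases.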

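Performance varies depending on

the workload (fabcar vs. fabcoin),
the disk (HDD vs. SSD, locally attached vs. network attached),
the load generator (CLI vs. SDK),
the load generation method (open vs. closed system https://www.usenix.org/legacy/event/nsdi06/tech/full_papers/schroeder/schroeder.pdf, with some arrival distribution), and
the network bandwidth (2700 tps needs at least 1.6 Gbps).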
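
The open vs. closed distinction in the list above can be sketched as follows. This is a toy model under stated assumptions, not a description of any particular load tool: the open generator draws Poisson arrivals, and the closed generator runs a fixed number of clients against a single FIFO server with a fixed service time. All function and parameter names are illustrative.

```python
import random


def open_loop_arrivals(rate, duration, seed=0):
    """Open system: Poisson arrivals whose send times ignore responses."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        if t >= duration:
            return times
        times.append(t)


def closed_loop_arrivals(clients, service_time, duration, think_time=0.0):
    """Closed system: a client sends again only after its response returns.

    The server is modelled as one FIFO queue with a fixed service time.
    """
    times = []
    server_free_at = 0.0
    next_send = [0.0] * clients  # every client sends its first request at t=0
    while True:
        i = min(range(clients), key=lambda k: next_send[k])
        t = next_send[i]
        if t >= duration:
            return times
        times.append(t)
        start = max(t, server_free_at)       # wait if the server is busy
        server_free_at = start + service_time
        next_send[i] = server_free_at + think_time


# Server capacity is 1 / 0.1 = 10 requests/s in this toy setup.
print(len(open_loop_arrivals(rate=50, duration=10.0)))
print(len(closed_loop_arrivals(clients=2, service_time=0.1, duration=10.0)))
```

In this sketch the closed generator can never offer more load than the server completes, so it hides queue build-up, while the open generator keeps sending at the configured rate regardless of how slowly responses come back. That difference alone can change the measured latency curve for the same system.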

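Also, make sure the load generator itself does not become the bottleneck. It is best to break the latency down further (endorsement latency, ordering latency, commit latency) and to collect other resource utilization metrics (such as network and disk), so that the bottleneck can be identified easily.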
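
A minimal sketch of such a latency breakdown, assuming the client records four timestamps per transaction. The field names (`submitted`, `endorsed`, `ordered`, `committed`) and the sample values are hypothetical; how each timestamp is obtained depends on the SDK and the event listeners in use.

```python
from statistics import mean


def stage_latencies(tx):
    """Split one transaction's end-to-end latency into three stages.

    `tx` maps hypothetical timestamp names to seconds since some epoch.
    """
    return {
        "endorsement": tx["endorsed"] - tx["submitted"],
        "ordering": tx["ordered"] - tx["endorsed"],
        "commit": tx["committed"] - tx["ordered"],
    }


def mean_stage_latencies(txs):
    """Average each stage over a non-empty list of transactions."""
    per_tx = [stage_latencies(tx) for tx in txs]
    return {stage: mean(p[stage] for p in per_tx) for stage in per_tx[0]}


# Made-up sample timestamps, for illustration only.
sample = [
    {"submitted": 0.00, "endorsed": 0.02, "ordered": 0.52, "committed": 0.60},
    {"submitted": 1.00, "endorsed": 1.04, "ordered": 1.34, "committed": 1.50},
]
print(mean_stage_latencies(sample))
```

Comparing the per-stage means at increasing request rates shows which stage starts to dominate first, which is more informative than a single end-to-end latency figure.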

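You can refer to our technical paper titled "Performance Benchmarking and Optimizing Hyperledger Fabric" (https://drive.google.com/file/d/1OsIoPtlv5X2PWyOAlDn1FCnHCZPyrF57/view), in which we carried out a comprehensive empirical study. With LevelDB we should get at least 2000 tps with low latency.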
