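While trying to reproduce with Hyperledger Fabric the performance reported by the IBM team in their paper "Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains" (https://arxiv.org/abs/1801.10228), I ran into several problems and errors. I collected all the useful information and would like to share it with the HF community. I also have a few questions about performance for the Fabric developers.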
Test setup
A Hyperledger Fabric v1.1.0 network deployed with Cello on four c5.9xlarge (36 vCPU) AWS instances:
{
    fabric001: {
        cas: [],
        peers: ["[email protected]"],
        orderers: ["orderer1st.orderer"],
        zookeepers: ["zookeeper1st"],
        kafkas: ["kafka1st"]
    },
    fabric002: {
        cas: [],
        peers: ["[email protected]"],
        orderers: ["orderer2nd.orderer"],
        zookeepers: ["zookeeper2nd"],
        kafkas: ["kafka2nd"]
    },
    fabric003: {
        cas: [],
        peers: ["[email protected]"],
        orderers: ["orderer3rd.orderer"],
        zookeepers: ["zookeeper3rd"],
        kafkas: ["kafka3rd"]
    },
    fabric004: {
        cas: ["ca1st.main"],
        peers: [],
        orderers: ["orderer4th.orderer"],
        zookeepers: ["zookeeper4th"],
        kafkas: ["kafka4th"]
    }
}
TLS is disabled.
Fabric channel configuration (all other parameters are left at their defaults):
BatchTimeout: 1s
BatchSize:
    MaxMessageCount: 500
    AbsoluteMaxBytes: 200 MB
    PreferredMaxBytes: 50 MB
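I ran the tests with both CouchDB and LevelDB as the state database, using the official Fabcar chaincode (Go implementation). I wrote a simple Node.js application that talks to the Fabric network through the SDK and exposes an HTTP API for load testing. The application is stateless and can easily be scaled out.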
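For load testing I used Yandex.Tank. I ran two kinds of high-load tests: query (requests to the Fabric state through peer001 while the blockchain is empty) and insert (transactions written to the blockchain).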
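Each report plots latency against time while the offered rate ramps up, so reading "latency rises at ~N rps" means mapping a moment in the test back to the scheduled rate. The sketch below models a linear ramp similar in spirit to Yandex.Tank's `line(start, end, duration)` load profile; `linearRamp` and `secondAtRate` are hypothetical helpers, not Yandex.Tank code, and the 10-minute ramp to 2000 rps in `main` is an arbitrary example.

```go
package main

import "fmt"

// linearRamp returns the target request rate for each whole second of a
// linear ramp from startRPS to endRPS spread over durSec seconds.
func linearRamp(startRPS, endRPS float64, durSec int) []float64 {
	rates := make([]float64, durSec)
	for s := 0; s < durSec; s++ {
		frac := 0.0
		if durSec > 1 {
			frac = float64(s) / float64(durSec-1)
		}
		rates[s] = startRPS + (endRPS-startRPS)*frac
	}
	return rates
}

// secondAtRate returns the first second at which the ramp reaches rps,
// or -1 if it never does.
func secondAtRate(rates []float64, rps float64) int {
	for s, r := range rates {
		if r >= rps {
			return s
		}
	}
	return -1
}

func main() {
	ramp := linearRamp(1, 2000, 600) // hypothetical 10-minute ramp
	fmt.Println("1100 rps is reached at second", secondAtRate(ramp, 1100))
}
```

The inverse lookup is simply `ramp[t]`: the scheduled rate at the second where the latency curve bends.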
Results
CouchDB as the state database
Query results:
https://overload.yandex.net/101153
At ~1100 rps, latency starts to increase. However, the Fabric instance is not saturated and still has plenty of free resources. The figure below shows CPU and memory usage of the Fabric network containers on the fabric001 instance during the test; 100% CPU means one fully loaded vCPU.
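Since 100% in these figures means one busy vCPU, a c5.9xlarge has 3600% of capacity in total. This small Go helper converts per-container readings into the fraction of the whole instance in use; `instanceLoad` is a hypothetical name, and the readings in `main` are placeholders, not values taken from the figures.

```go
package main

import "fmt"

// instanceLoad converts per-container CPU percentages (100% = one fully
// busy vCPU) into the fraction of the instance's total CPU capacity in use.
func instanceLoad(containerPct []float64, vCPUs int) float64 {
	total := 0.0
	for _, p := range containerPct {
		total += p
	}
	return total / (100 * float64(vCPUs))
}

func main() {
	// Placeholder readings for peer, orderer, kafka and zookeeper containers;
	// substitute the values from your own monitoring.
	readings := []float64{400, 120, 80, 10}
	fmt.Printf("instance CPU in use: %.1f%%\n", 100*instanceLoad(readings, 36))
}
```

A low instance-wide total does not rule out a bottleneck: a single container pinned near 100% can indicate a single-threaded stage even while most of the 36 vCPUs sit idle.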
In addition, peer001 prints many similar error messages (this is not the full output, just a small part; I can share the rest if needed): https://gist.github.com/krabradosty/9780cacc92fcdeaa0c36377a91727ade
Insert results: https://overload.yandex.net/101217. At ~600 rps, latency degrades very quickly; below that rate it degrades slowly, but the degradation is still present. CPU and memory usage of the fabric003 containers is shown in the figure below:
The peer also prints many error messages (again, not the full output): https://gist.github.com/krabradosty/3810151b8e101d8279cc705aef22863e
Based on this, I conclude that the Fabric peer has problems with its CouchDB connections under load.
My question: is the Fabric community aware of this bug? Are there any plans to fix it?
LevelDB as the state database
- Query results: https://overload.yandex.net/102035. CPU and memory usage of the fabric001 containers is shown in the figure below:
  There are no errors from the blockchain; I only see latency degradation.
- Insert results: https://overload.yandex.net/102040. CPU and memory usage of the fabric001 containers is shown in the figure below:
  Aggressive latency degradation starts at ~850 rps. There are no errors from the blockchain.
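The "knee" rates quoted above (~1100, ~600 and ~850 rps) were read off the reports by eye. A reproducible alternative is a simple threshold rule: report the lowest rate at which latency exceeds some multiple of the best latency observed. The Go sketch below implements that heuristic; `Sample`, `kneeRPS` and the factor of 2 are assumptions of mine, and the numbers in `main` are made up for illustration, not measurements from these tests.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one point from a load-test report: offered rate and latency.
type Sample struct {
	RPS       float64
	LatencyMs float64
}

// kneeRPS returns the lowest rate at which latency exceeds factor times the
// lowest latency seen, or -1 if that never happens.
func kneeRPS(samples []Sample, factor float64) float64 {
	if len(samples) == 0 {
		return -1
	}
	s := append([]Sample(nil), samples...) // do not reorder the caller's slice
	sort.Slice(s, func(i, j int) bool { return s[i].RPS < s[j].RPS })
	base := s[0].LatencyMs
	for _, p := range s {
		if p.LatencyMs < base {
			base = p.LatencyMs
		}
	}
	for _, p := range s {
		if p.LatencyMs > factor*base {
			return p.RPS
		}
	}
	return -1
}

func main() {
	// Made-up points for illustration only.
	pts := []Sample{{200, 30}, {400, 32}, {600, 45}, {800, 140}}
	fmt.Println("knee at", kneeRPS(pts, 2), "rps")
}
```

The threshold factor is arbitrary; comparing runs (CouchDB vs LevelDB, query vs insert) with the same factor matters more than its exact value.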
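My questions: what causes the latency degradation? Why can't I reach the 3500 rps that IBM reported in their paper? What plans does the Fabric community have for improving performance?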