You can attach two variables to an edge. The simplest solution is to use a tuple, for example:
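import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD

// Each edge carries a (flow count, byte sum) tuple as its attribute
val data = Array(Edge(3L, 7L, (123, 456)), Edge(5L, 3L, (41, 34)))
val edges: RDD[Edge[(Int, Int)]] = spark.sparkContext.parallelize(data)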
Alternatively, you can use a case class:
case class EdgeWeight(flow_count: Int, sum_bytes: Int)
val data2 = Array(Edge(3L, 7L, EdgeWeight(123, 456)), Edge(5L, 3L, EdgeWeight(41, 34)))
val edges2: RDD[Edge[EdgeWeight]] = spark.sparkContext.parallelize(data2)
If you have more attributes to add, a case class is easier to use and maintain.
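To illustrate why the case class tends to be easier to work with, here is a minimal sketch in plain Scala (no Spark needed). The `EdgeAttrDemo` object and the sample values are only there for illustration and are not part of the original code:

```scala
object EdgeAttrDemo {
  // Same case class as above, repeated so the sketch is self-contained.
  final case class EdgeWeight(flow_count: Int, sum_bytes: Int)

  val asTuple: (Int, Int) = (123, 456)
  val asCaseClass: EdgeWeight = EdgeWeight(123, 456)

  // A tuple is read by position, so the reader has to remember what _1 means.
  val flowsFromTuple: Int = asTuple._1

  // A case class is read by name, which documents itself.
  val flowsFromCaseClass: Int = asCaseClass.flow_count

  // copy changes one field and leaves the others untouched.
  val updated: EdgeWeight = asCaseClass.copy(sum_bytes = 500)
}
```

With a tuple, adding a third attribute shifts nothing but gives you `_3` with no name attached; with the case class you add a named field and the existing accessors keep working.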
I believe the most elegant solution in this specific case is:
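import scala.util.hashing.MurmurHash3

// Hash the source and destination identifiers into vertex IDs
val trafficEdges = trafficsFromTo.map { x =>
  Edge(MurmurHash3.stringHash(x(0).toString),
       MurmurHash3.stringHash(x(1).toString),
       EdgeWeight(x(2), x(3)))
}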
trafficEdges.sortBy(edge => edge.attr.flow_count) // sort by flow_count
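If you want to try the hash-then-sort logic without a cluster, the following is a rough sketch using ordinary Scala collections. `SimpleEdge`, the `vertexId` helper, and the sample rows are hypothetical stand-ins I made up for illustration; in real code you would use GraphX's `Edge` and your own `trafficsFromTo` data:

```scala
import scala.util.hashing.MurmurHash3

object TrafficEdgeSketch {
  // Hypothetical stand-ins so the sketch runs without Spark on the classpath.
  final case class EdgeWeight(flow_count: Int, sum_bytes: Int)
  final case class SimpleEdge(srcId: Long, dstId: Long, attr: EdgeWeight)

  // Made-up sample rows: (source, destination, flow count, byte sum).
  val rows: Seq[(String, String, Int, Int)] = Seq(
    ("10.0.0.1", "10.0.0.2", 123, 456),
    ("10.0.0.3", "10.0.0.1", 41, 34)
  )

  // stringHash returns an Int; it is widened to Long to match GraphX's VertexId.
  def vertexId(name: String): Long = MurmurHash3.stringHash(name).toLong

  val edges: Seq[SimpleEdge] = rows.map { case (src, dst, flows, bytes) =>
    SimpleEdge(vertexId(src), vertexId(dst), EdgeWeight(flows, bytes))
  }

  // sortBy returns a new collection; the original is left unchanged.
  val byFlowCount: Seq[SimpleEdge] = edges.sortBy(_.attr.flow_count)
}
```

Two things to keep in mind: the same string always hashes to the same ID, so "10.0.0.1" ends up as one vertex wherever it appears, and since the hash is only 32 bits, distinct strings can in principle collide. As with the collection version here, `sortBy` on an RDD returns a new RDD, so assign the result to a value if you want to use it.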