A slight improvement on Prem's answer (https://stackoverflow.com/questions/45532183/pyspark-create-dataframe-grouping-columns-in-map-type-structure/45535762#45535762); sorry, I can't comment yet.
Use func.create_map instead of func.struct. See the documentation: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=map#pyspark.sql.functions.create_map
import pyspark.sql.functions as func

# 'sc' is the SparkContext available in the PySpark shell
df = sc.parallelize([('B', 'a', 10), ('B', 'b', 20),
                     ('C', 'c', 30)]).toDF(['Brand', 'Type', 'Amount'])

# build one single-entry map per row, then collect them into a list per Brand
df_converted = df.groupBy("Brand").agg(
    func.collect_list(
        func.create_map(func.col("Type"), func.col("Amount"))
    ).alias("MAP_type_AMOUNT"))

print(df_converted.collect())
Output:
[Row(Brand=u'B', MAP_type_AMOUNT=[{u'a': 10}, {u'b': 20}]),
Row(Brand=u'C', MAP_type_AMOUNT=[{u'c': 30}])]
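
If you would rather have a single map per Brand instead of a list of single-entry maps, one option is a sketch along these lines, assuming you are on Spark 2.4 or later (which added func.map_from_entries): collect (Type, Amount) structs and turn the resulting array of entries into one MapType column.

import pyspark.sql.functions as func

# sketch, assumes Spark 2.4+ for map_from_entries:
# collect (Type, Amount) pairs per Brand and convert them into a single map
df_single_map = df.groupBy("Brand").agg(
    func.map_from_entries(
        func.collect_list(func.struct("Type", "Amount"))
    ).alias("MAP_type_AMOUNT"))

print(df_single_map.collect())
# expected shape:
# [Row(Brand=u'B', MAP_type_AMOUNT={u'a': 10, u'b': 20}),
#  Row(Brand=u'C', MAP_type_AMOUNT={u'c': 30})]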