我有以下 JSON 对象:
{
"user_id": "123",
"data": {
"city": "New York"
},
"timestamp": "1563188698.31",
"session_id": "6a793439-6535-4162-b333-647a6761636b"
}
{
"user_id": "123",
"data": {
"name": "some_name",
"age": "23",
"occupation": "teacher"
},
"timestamp": "1563188698.31",
"session_id": "6a793439-6535-4162-b333-647a6761636b"
}
我在用着val df = sqlContext.read.json("json")
将文件读取到数据框
它将所有数据属性组合成数据结构,如下所示:
root
|-- data: struct (nullable = true)
| |-- age: string (nullable = true)
| |-- city: string (nullable = true)
| |-- name: string (nullable = true)
| |-- occupation: string (nullable = true)
|-- session_id: string (nullable = true)
|-- timestamp: string (nullable = true)
|-- user_id: string (nullable = true)
是否可以将数据字段转换为 MAP[String, String] 数据类型?那么它只具有与原始 json 相同的属性?
是的,您可以通过从 JSON 数据导出 Map[String, String] 来实现这一点,如下所示:
import org.apache.spark.sql.types.{MapType, StringType}
import org.apache.spark.sql.functions.{to_json, from_json}
val jsonStr = """{
"user_id": "123",
"data": {
"name": "some_name",
"age": "23",
"occupation": "teacher"
},
"timestamp": "1563188698.31",
"session_id": "6a793439-6535-4162-b333-647a6761636b"
}"""
val df = spark.read.json(Seq(jsonStr).toDS)
val mappingSchema = MapType(StringType, StringType)
df.select(from_json(to_json($"data"), mappingSchema).as("map_data"))
//Output
// +-----------------------------------------------------+
// |map_data |
// +-----------------------------------------------------+
// |[age -> 23, name -> some_name, occupation -> teacher]|
// +-----------------------------------------------------+
首先我们提取出内容data
字段转换为字符串to_json($"data")
,然后我们解析并提取 Mapfrom_json(to_json($"data"), schema)
.
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)