I have a Spark SQL job that used to finish in under 10 minutes, but since a cluster migration it has been running for about 3 hours, and I need to understand what it is actually doing. I am new to Spark, so please bear with me if I ask anything irrelevant.
I tried increasing spark.executor.memory, but no luck.
Environment: Spark 2.4 on Azure HDInsight, with data on Azure Storage.
SQL: reads and joins some data, and finally writes the result to the Hive metastore.
The Spark SQL script ends with: `.write.mode("overwrite").saveAsTable("default.mikemiketable")`
Application Behavior:
Within the first 15 minutes it loads the data and completes most tasks (199/200). After that, only 1 executor remains alive, continually doing shuffle reads/writes. Because only that single executor is left, we have to wait about 3 hours for the application to finish.
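In case it helps narrow things down: 200 is Spark's default value for `spark.sql.shuffle.partitions`, so the single straggling task at 199/200 looks like one oversized shuffle partition after the join. A sketch of the setting I am considering changing (the value 400 is just an example I have not yet tested on this cluster):

```
# spark-defaults.conf -- example value only, not verified on my cluster
spark.sql.shuffle.partitions   400
```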
![enter image description here](https://i.stack.imgur.com/6hqvh.png)
Only 1 executor left alive:
![enter image description here](https://i.stack.imgur.com/55162.png)
I'm not sure what that executor is doing:
![enter image description here](https://i.stack.imgur.com/TwhuX.png)
From time to time, we can tell the shuffle read increased:
![enter image description here](https://i.stack.imgur.com/WhF9A.png)
Therefore I increased spark.executor.memory to 20g, but nothing changed. From Ambari and YARN I can see that the cluster still has plenty of free resources.
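For completeness, this is how the memory bump was applied (as a static config; everything else was left at the cluster defaults):

```
# spark-defaults.conf -- memory increase that made no difference
spark.executor.memory   20g
```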
![enter image description here](https://i.stack.imgur.com/pngQA.png)
Almost all executors have been released:
![enter image description here](https://i.stack.imgur.com/pA134.png)
Any guidance would be greatly appreciated.