- Download the Spark archive from https://pan.baidu.com/s/1y5JlMdtkrZFyTJWKtuuZ_Q (extraction code: z64y).
- Extract the tar.gz file:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20200708150755571.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzQ4Njc4MA==,size_16,color_FFFFFF,t_70)
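If you prefer scripting the extraction instead of using a GUI tool, a minimal Python sketch is below; the archive name and target directory are assumptions, so substitute the actual file you downloaded.

```python
import tarfile

# Hypothetical archive name and target directory - adjust to your download.
archive = "spark-3.0.0-bin-hadoop2.7.tgz"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(path="C:/spark")  # Spark ends up in a folder under C:/spark
```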
- Configure the environment variables: add the path of Spark's `bin` directory to the system `Path` variable.
![在这里插入图片描述](https://img-blog.csdnimg.cn/20200708150916607.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzQ4Njc4MA==,size_16,color_FFFFFF,t_70)
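A quick sanity check from Python that the change took effect (run it from a terminal opened *after* editing the variables, since existing sessions keep the old `Path`):

```python
import os

# Path entries are separated by os.pathsep (";" on Windows).
entries = os.environ["PATH"].split(os.pathsep)
print(any("spark" in e.lower() for e in entries))  # True if the bin dir is on Path

# SPARK_HOME is optional for this guide, but many tools also look for it.
print(os.environ.get("SPARK_HOME", "not set"))
```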
- Install Hadoop (a separate installation guide can be followed); note that the Hadoop version must match the Spark build you downloaded.
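On Windows, Spark jobs also commonly require `HADOOP_HOME` to point at the Hadoop directory and a `winutils.exe` binary inside its `bin` folder; below is a small check, assuming the standard Hadoop directory layout:

```python
import os
from pathlib import Path

hadoop_home = os.environ.get("HADOOP_HOME")
print("HADOOP_HOME:", hadoop_home)
if hadoop_home:
    # winutils.exe supplies the Hadoop shell utilities that Spark
    # invokes on Windows; its absence is a frequent source of errors.
    print("winutils.exe found:", Path(hadoop_home, "bin", "winutils.exe").exists())
```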
- Install the pyspark library: `pip install pyspark`.
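After installation, it is worth confirming that the library imports and that its version matches the Spark distribution extracted earlier, since a mismatch between the pip package and the standalone install is a common source of errors:

```python
import pyspark

# Should print the installed version; keep it aligned with the
# version of the extracted Spark distribution.
print(pyspark.__version__)
```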
- Type `spark-shell` at the command line. If the following screen appears, Spark is installed successfully (if the command is not recognized, re-check the `Path` entry from the earlier step and open a new terminal):
![在这里插入图片描述](https://img-blog.csdnimg.cn/20200708151038897.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzQ4Njc4MA==,size_16,color_FFFFFF,t_70)
- Open Jupyter Notebook and test PySpark with the following Monte Carlo Pi-estimation code (essentially the `pi.py` example that ships with Spark):
```python
from __future__ import print_function

from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession \
        .builder \
        .appName("PythonPi") \
        .getOrCreate()

    partitions = 2
    n = 100000 * partitions

    def f(_):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1]
        # and count it if it falls inside the unit circle.
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    # Run n trials spread across the partitions and sum the hits.
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions) \
        .map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    spark.stop()
```
Run the code; if output like the following appears, PySpark is working correctly:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20200708151138980.png)
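The estimate works because points drawn uniformly from the square [-1, 1] × [-1, 1] land inside the unit circle with probability π/4 (the ratio of the circle's area to the square's), so 4 · count / n converges to π. With n = 200,000 samples the standard error is roughly 0.004, so only the first couple of decimal places should be expected to match.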