I have been trying to install pyarrow with pip (`pip install pyarrow`, and, as Yagav suggested, `py -3.7 -m pip install --user pyarrow`), with conda (`conda install -c conda-forge pyarrow`, and also `conda install pyarrow`), and by building the library from source (in a conda environment, with some magic I don't fully understand). The installation always finishes without errors, but every attempt ends with the same problem when I call:
import pyarrow as pa
fs = pa.hdfs.connect(host='my_host', user='my_user@my_host', kerb_ticket='path_to_kerb_ticket')
It fails with the following message:
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 209, in connect
extra_conf=extra_conf)
File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
_maybe_set_hadoop_classpath()
File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 135, in _maybe_set_hadoop_classpath
classpath = _hadoop_classpath_glob(hadoop_bin)
File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 162, in _hadoop_classpath_glob
return subprocess.check_output(hadoop_classpath_args)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 395, in check_output
**kwargs).stdout
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
OSError: [WinError 193] %1 is not a valid win32 application
At first I thought something was wrong with libhdfs.so in Hadoop 2.5.6, but it seems I was wrong.
I suspect the problem is not in pyarrow or in subprocess, but in some system variable or dependency.
I have also manually defined the system variables HADOOP_HOME, JAVA_HOME and KRB5CCNAME.
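For reference, this is roughly how I set them from inside the Python session before calling `pa.hdfs.connect` (the paths below are placeholders for my machine, not the real values). The traceback shows pyarrow's `_maybe_set_hadoop_classpath()` shelling out to the `hadoop` binary under HADOOP_HOME, so my understanding is that on Windows that directory must contain a `hadoop.cmd` batch wrapper that the OS can actually execute; running the Unix `hadoop` shell script directly would give exactly a WinError 193:

```python
import os

# Hypothetical placeholder paths -- replace with the real locations.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_202"
os.environ["KRB5CCNAME"] = r"C:\Users\my_user\krb5cc"

# pyarrow runs "<HADOOP_HOME>\bin\hadoop classpath --glob" via subprocess,
# so on Windows this file needs to be an executable .cmd/.exe, not a
# Unix shell script.
hadoop_cmd = os.path.join(os.environ["HADOOP_HOME"], "bin", "hadoop.cmd")
print(hadoop_cmd)
```

I am not sure whether pointing HADOOP_HOME at a distribution without the Windows wrappers is the actual cause here, but it matches where the traceback dies.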