I am trying to vertically concatenate two Dask DataFrames
I have the following Dask DataFrame:
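import pandas as pd
import dask.dataframe as dd

# First row holds the column names, the remaining rows hold the data
d = [
    ['A', 'B', 'C', 'D', 'E', 'F'],
    [1, 4, 8, 1, 3, 5],
    [6, 6, 2, 2, 0, 0],
    [9, 4, 5, 0, 6, 35],
    [0, 1, 7, 10, 9, 4],
    [0, 7, 2, 6, 1, 2]
]
df = pd.DataFrame(d[1:], columns=d[0])
# Convert to a Dask DataFrame, requesting 5 partitions
ddf = dd.from_pandas(df, npartitions=5)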
Here is the data as a Pandas DataFrame:
A B C D E F
0 1 4 8 1 3 5
1 6 6 2 2 0 0
2 9 4 5 0 6 35
3 0 1 7 10 9 4
4 0 7 2 6 1 2
Here is the Dask DataFrame:
Dask DataFrame Structure:
A B C D E F
npartitions=4
0 int64 int64 int64 int64 int64 int64
1 ... ... ... ... ... ...
2 ... ... ... ... ... ...
3 ... ... ... ... ... ...
4 ... ... ... ... ... ...
Dask Name: from_pandas, 4 tasks
I am trying to vertically concatenate the two Dask DataFrames:
# Second frame: same index as ddf, every value shifted by 11.5
ddf_i = ddf + 11.5
dd.concat([ddf, ddf_i], axis=0)
But I get this error:
Traceback (most recent call last):
...
File "...", line 572, in concat
raise ValueError('All inputs have known divisions which cannot '
ValueError: All inputs have known divisions which cannot be concatenated
in order. Specify interleave_partitions=True to ignore order
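However, if I try:
dd.concat([ddf, ddf_i], axis=0, interleave_partitions=True)
then it appears to work. Is there any problem with setting interleave_partitions=True
(in terms of performance, i.e. speed)? Or is there another way to vertically concatenate two Dask DataFrames?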
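My current reading of the error, which may be wrong, is that both frames cover the same index range, so Dask cannot simply place one after the other in index order. Below is a minimal pure-Python sketch of that idea. It is a simplified illustration and not Dask's actual implementation: `can_concat_in_order` is a hypothetical helper of mine, the first divisions tuple is copied from the structure output above, and I am assuming `ddf_i` keeps the same divisions as `ddf` because adding 11.5 only changes the values.

```python
# Simplified illustration only; this is NOT Dask's actual implementation.

def can_concat_in_order(divisions_list):
    """Return True if every frame's index range ends before the next one starts.

    divisions_list is a list of tuples, one tuple of division boundaries
    per frame (hypothetical helper, not part of the Dask API).
    """
    return all(
        left[-1] < right[0]
        for left, right in zip(divisions_list, divisions_list[1:])
    )

# Divisions shown for ddf in the structure output above.
ddf_divisions = (0, 1, 2, 3, 4)
# Assumption: ddf_i = ddf + 11.5 leaves the index, and so the divisions, unchanged.
ddf_i_divisions = (0, 1, 2, 3, 4)

# Both frames span index 0..4, so they overlap and cannot be stacked in order.
overlapping = can_concat_in_order([ddf_divisions, ddf_i_divisions])

# Hypothetical second frame whose index had been shifted to start at 5.
shifted_divisions = (5, 6, 7, 8, 9)
ordered = can_concat_in_order([ddf_divisions, shifted_divisions])

print(overlapping, ordered)  # prints: False True
```

If that reading is right, then giving the second frame a non-overlapping index before concatenating might avoid the need for interleave_partitions=True, but I have not confirmed this.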