Pairwise haversine distances
Here's a vectorized way with broadcasting,
based on this post -
import numpy as np

def convert_to_arrays(df1, df2):
    # Turn each 'coordinates' column (lists of [lat, lng]) into an (n, 2) array
    d1 = np.array(df1['coordinates'].tolist())
    d2 = np.array(df2['coordinates'].tolist())
    return d1, d2
def broadcasting_based_lng_lat(data1, data2):
    # data1, data2 are the data arrays with 2 cols and they hold
    # lat., lng. values in those cols respectively
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)

    lat1 = data1[:,0]
    lng1 = data1[:,1]

    lat2 = data2[:,0]
    lng2 = data2[:,1]

    diff_lat = lat1[:,None] - lat2
    diff_lng = lng1[:,None] - lng2
    d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat2) * np.sin(diff_lng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
Hence, to solve your case and get all pairwise haversine distances, it would be -
broadcasting_based_lng_lat(*convert_to_arrays(df1,df2))
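To see what the broadcasting does to the shapes, here is a small self-contained sketch. The coordinates are made-up illustrative values (rough city locations), not data from the question. With m rows in the first array and n rows in the second, the result should be an (m, n) matrix of distances in km.

```python
import numpy as np

def broadcasting_based_lng_lat(data1, data2):
    # Same formula as above: columns are latitude, longitude in degrees.
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1, lng1 = data1[:, 0], data1[:, 1]
    lat2, lng2 = data2[:, 0], data2[:, 1]
    # lat1[:, None] has shape (m, 1); subtracting the (n,) array lat2
    # broadcasts to an (m, n) grid holding every pairwise difference.
    diff_lat = lat1[:, None] - lat2
    diff_lng = lng1[:, None] - lng2
    d = (np.sin(diff_lat / 2) ** 2
         + np.cos(lat1[:, None]) * np.cos(lat2) * np.sin(diff_lng / 2) ** 2)
    return 2 * 6371 * np.arcsin(np.sqrt(d))

# Illustrative (approximate) coordinates, assumed for this example only.
pts1 = np.array([[51.5, -0.13], [48.86, 2.35], [40.71, -74.0]])  # m = 3
pts2 = np.array([[51.5, -0.13], [35.68, 139.69]])                # n = 2

dists = broadcasting_based_lng_lat(pts1, pts2)
print(dists.shape)   # (3, 2)
print(dists[0, 0])   # 0.0, since pts1[0] and pts2[0] are the same point
```

Entry (i, j) is the distance between row i of the first array and row j of the second.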
Elementwise haversine distances
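For elementwise haversine distance computations between two datasets, where each one holds latitude and longitude either in two columns or as lists of two elements, we can skip the extension to 2D
and end up with something like this -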
def broadcasting_based_lng_lat_elementwise(data1, data2):
    # data1, data2 are the data arrays with 2 cols and they hold
    # lat., lng. values in those cols respectively
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)

    lat1 = data1[:,0]
    lng1 = data1[:,1]

    lat2 = data2[:,0]
    lng2 = data2[:,1]

    diff_lat = lat1 - lat2
    diff_lng = lng1 - lng2
    d = np.sin(diff_lat/2)**2 + np.cos(lat1)*np.cos(lat2) * np.sin(diff_lng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
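One way to sanity-check the elementwise version against the pairwise one: entry i of the elementwise result should equal entry (i, i) of the pairwise matrix. The sketch below uses random inputs and a hypothetical helper, `haversine_km`, that folds both variants into a single function for brevity; it is not part of the code above.

```python
import numpy as np

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371):
    # Hypothetical helper: inputs are in radians and may have any
    # mutually broadcastable shapes.
    d = (np.sin((lat1 - lat2) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng1 - lng2) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(d))

rng = np.random.default_rng(0)
# Random points: latitude in [-90, 90], longitude in [-180, 180] degrees.
a = np.deg2rad(np.column_stack([rng.uniform(-90, 90, 5), rng.uniform(-180, 180, 5)]))
b = np.deg2rad(np.column_stack([rng.uniform(-90, 90, 5), rng.uniform(-180, 180, 5)]))

# Elementwise: all four inputs are 1D, so the result has shape (5,).
elementwise = haversine_km(a[:, 0], a[:, 1], b[:, 0], b[:, 1])
# Pairwise: the first point set gets a trailing axis, giving shape (5, 5).
pairwise = haversine_km(a[:, 0, None], a[:, 1, None], b[:, 0], b[:, 1])

print(np.allclose(elementwise, np.diag(pairwise)))  # True
```

The only difference between the two calls is the extra axis on the first point set, which is exactly the 2D extension that the elementwise function leaves out.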
Sample run with a dataframe holding the two datasets in two columns -
In [42]: np.random.seed(0)
    ...: a = np.random.randint(10,100,(5,2)).tolist()
    ...: b = np.random.randint(10,100,(5,2)).tolist()
    ...: df = pd.DataFrame({'A':a,'B':b})

In [43]: df
Out[43]:
          A         B
0  [54, 57]  [80, 98]
1  [74, 77]  [98, 22]
2  [77, 19]  [68, 75]
3  [93, 31]  [49, 97]
4  [46, 97]  [56, 98]

In [44]: from haversine import haversine
In [45]: [haversine(i,j) for (i,j) in zip(df.A,df.B)]
Out[45]:
[3235.9659882513424,
2399.6124657290075,
2012.0851666001824,
4702.8069773315865,
1114.1193334220534]
In [46]: broadcasting_based_lng_lat_elementwise(np.vstack(df.A), np.vstack(df.B))
Out[46]:
array([3235.96151855, 2399.60915125, 2012.08238739, 4702.80048155,
       1114.11779454])
Those slight differences are largely because the haversine library assumes 6371.0088 as the earth radius in km, while we take it as 6371 here.
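If matching the library output matters, one option is to make the radius a parameter. The sketch below adds a hypothetical `radius_km` argument (not part of the original function) and uses the 6371.0088 value; the inputs are the `A` and `B` columns from the sample run, typed in by hand.

```python
import numpy as np

def elementwise_haversine(data1, data2, radius_km=6371.0088):
    # radius_km is an assumed, illustrative parameter; columns are
    # latitude, longitude in degrees, as in the functions above.
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1, lng1 = data1[:, 0], data1[:, 1]
    lat2, lng2 = data2[:, 0], data2[:, 1]
    d = (np.sin((lat1 - lat2) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng1 - lng2) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(d))

# Columns A and B from the sample run.
A = np.array([[54, 57], [74, 77], [77, 19], [93, 31], [46, 97]])
B = np.array([[80, 98], [98, 22], [68, 75], [49, 97], [56, 98]])

with_library_radius = elementwise_haversine(A, B)
with_rounded_radius = elementwise_haversine(A, B, radius_km=6371)
# The two results differ only by the constant factor 6371.0088 / 6371.
print(with_library_radius / with_rounded_radius)
```

Since the radius enters only as a final multiplier, rescaling an existing result by 6371.0088 / 6371 would give the same numbers without recomputing anything.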