How to correctly get a single cell in pandas: loc[index,column] VS get_value(index,column)

2024-01-05

Which method is better (in terms of performance and reliability) for getting a single cell from a pandas DataFrame: get_value() or loc[]?


You can find the relevant information at the bottom of the docs, http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label:

For getting a value explicitly (equivalent to the deprecated df.get_value('a','A'))

# this is also equivalent to ``df1.at['a','A']``
In [55]: df1.loc['a', 'A']
Out[55]: 0.13200317033032932
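A minimal runnable sketch of the equivalence described in the docs excerpt. The docs' df1 is not shown here, so the frame below is a hypothetical stand-in with string row labels; .iat is included as the positional counterpart of .at.

```python
import pandas as pd

# Hypothetical stand-in for the docs' df1: string row labels, two columns
df1 = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]}, index=["a", "b"])

by_loc = df1.loc["a", "A"]  # label-based; also accepts slices, lists, masks
by_at = df1.at["a", "A"]    # label-based, scalar access only
by_iat = df1.iat[0, 0]      # position-based, scalar access only

print(by_loc, by_at, by_iat)  # 0.5 0.5 0.5
```

All three return the same scalar. .at and .iat are restricted to a single cell, which is why they can skip some of the general indexing machinery that .loc goes through.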

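However, calling get_value() itself does not emit any deprecation warning (at least in the pandas version this answer was written against).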

But if you check Index.get_value http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_value.html:

Fast lookup of value from 1-dimensional ndarray. Only use this if you know what you're doing.

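So I think it is better to use iat http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html, at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html, loc http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html, or ix http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ix.html. (Note that ix, like DataFrame.get_value, was deprecated and then removed in pandas 1.0, so on current versions the choice is between iat, at and loc.)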
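A short sketch of how a get_value(row, col) call maps onto the recommended accessors. The frame and its labels are made up for illustration. When you only have labels but want positional access with .iat, Index.get_loc converts a label to its integer position (this assumes unique labels, where get_loc returns a plain int).

```python
import pandas as pd

# Hypothetical example frame with non-integer row labels
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

# Old style (removed in pandas 1.0): df.get_value("y", "B")
value = df.at["y", "B"]

# Positional equivalent: translate labels to integer positions first
row_pos = df.index.get_loc("y")    # 1 (unique index -> int)
col_pos = df.columns.get_loc("B")  # 1
same_value = df.iat[row_pos, col_pos]

# .at also supports fast scalar assignment
df.at["y", "B"] = 50
```

Prefer .at when you think in labels and .iat when you already have integer positions; .loc remains the general tool when you need more than one cell.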

Timings (IPython %timeit; get_value only runs on pandas versions before 1.0):

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print(df)

In [93]: %timeit (df.loc[0, 'A'])
The slowest run took 6.40 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop

In [96]: %timeit (df.at[0, 'A'])
The slowest run took 17.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.61 µs per loop

In [94]: %timeit (df.get_value(0, 'A'))
The slowest run took 23.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.36 µs per loop
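The numbers above come from one machine and one (pre-1.0) pandas version, and absolute values will differ elsewhere. As a rough sketch, the same comparison can be rerun outside IPython with the standard timeit module; get_value is only added when the installed pandas still has it. The loop count is an arbitrary choice, and no expected output is given because results depend on hardware and version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

# Each candidate reads the same cell (row label 0, column 'A')
candidates = {
    "loc": lambda: df.loc[0, "A"],
    "at": lambda: df.at[0, "A"],
    "iat": lambda: df.iat[0, 0],
}
# DataFrame.get_value only exists on pandas < 1.0
if hasattr(df, "get_value"):
    candidates["get_value"] = lambda: df.get_value(0, "A")

NUMBER = 5_000
results = {}
for name, func in candidates.items():
    # Best of 3 repeats, converted to microseconds per single call
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))
    results[name] = best / NUMBER * 1e6
    print(f"{name:>9}: {results[name]:.2f} µs per call")
```

Taking the minimum of several repeats, as %timeit's "best of 3" does, reduces noise from other processes; the relative ordering (at/iat well ahead of loc for a single cell) is the part worth comparing.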