It's better to stick to regular NumPy arrays over the chararrays https://docs.scipy.org/doc/numpy/reference/generated/numpy.chararray.html#numpy.chararray :
Note:
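The chararray class exists for backwards compatibility with
Numarray, it is not recommended for new development. Starting from
numpy 1.4, if one needs arrays of strings, it is recommended to use
arrays of dtype object_, string_ or unicode_, and use the free
functions in the numpy.char module for fast vectorized string
operations.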
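As a rough illustration of what the note means by "free functions in the numpy.char module", here is a minimal sketch on a regular array. The sample array mirrors the data used in the timings, and the names `lowered`, `per_element` and `total` are only illustrative.

```python
import numpy as np

# Regular ndarray of one-character strings (dtype '<U1'), not a chararray
rr = np.array(list("BBBABAAABA"))

# numpy.char functions operate element-wise on ordinary string arrays
lowered = np.char.lower(rr)            # lower-case copy of every element
per_element = np.char.count(rr, 'A')   # occurrences of 'A' inside each element (0 or 1 here)
total = int(per_element.sum())         # 5

print(type(rr).__name__, lowered[:4], total)
```

This works without ever constructing a chararray, which is the point the note is making.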
Going with regular arrays, here are two approaches.
Approach #1
We can use np.count_nonzero https://docs.scipy.org/doc/numpy/reference/generated/numpy.count_nonzero.html to count the True values after comparing against the search element 'A' -
np.count_nonzero(rr=='A')
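A self-contained version of the one-liner above might look like this. The sample array is rebuilt from the data shown in the timings, and `mask` and `count` are illustrative names.

```python
import numpy as np

# Sample data matching the array shown in the timings section
rr = np.array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'])

mask = rr == 'A'                 # element-wise comparison gives a boolean array
count = np.count_nonzero(mask)   # True counts as non-zero, so this counts the matches

print(count)  # 5
```

np.sum(rr == 'A') gives the same number. count_nonzero is used here because it only needs to count the True entries rather than add them up.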
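Approach #2
With the chararray holding only single-character elements, we can optimize much further by viewing it as uint8 dtype, then comparing and counting. The counting is much faster because we are working with numeric data. The implementation would be -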
np.count_nonzero(rr.view(np.uint8)==ord('A'))
On Python 2.x, this would be -
np.count_nonzero(np.array(rr.view(np.uint8))==ord('A'))
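One thing to watch with the view trick is that the integer type has to match the storage of the array. A bytes array of dtype 'S1' stores one byte per element, so uint8 lines up exactly. A unicode array of dtype 'U1' stores four bytes per element, so a uint8 view yields four values per character. That still gives the right count for ASCII data, but viewing as uint32 keeps one value per element. The helper below is a hypothetical sketch (`count_char` is not a NumPy function) that picks the view based on the dtype and falls back to the plain comparison otherwise.

```python
import numpy as np

def count_char(arr, ch):
    """Count occurrences of the single character `ch` in a string array.

    Hypothetical helper: views the data as integers when every element
    is exactly one character wide, otherwise compares strings directly.
    """
    arr = np.asarray(arr)
    if arr.dtype.kind == 'S' and arr.dtype.itemsize == 1:
        # bytes array: one byte per element, so uint8 maps 1:1
        return int(np.count_nonzero(arr.view(np.uint8) == ord(ch)))
    if arr.dtype.kind == 'U' and arr.dtype.itemsize == 4:
        # unicode array: one 4-byte code point per element;
        # keep the array's byte order so the code points read correctly
        code_dtype = np.dtype(np.uint32).newbyteorder(arr.dtype.byteorder)
        return int(np.count_nonzero(arr.view(code_dtype) == ord(ch)))
    # multi-character elements: fall back to the string comparison
    return int(np.count_nonzero(arr == ch))

rr = np.array(list("BBBABAAABA"))        # dtype '<U1'
print(count_char(rr, 'A'))               # 5
print(count_char(rr.astype('S1'), 'A'))  # 5
```

Matching the integer width to the item size also avoids false hits on non-ASCII data, where one of the four bytes of a different character could happen to equal ord('A').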
Timings
Timings on the original sample data and on a version scaled up 10,000x -
# Original sample data
In [10]: rr
Out[10]: array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'], dtype='<U1')
# @Nils Werner's soln
In [14]: %timeit np.sum(rr == 'A')
100000 loops, best of 3: 3.86 µs per loop
# Approach #1 from this post
In [13]: %timeit np.count_nonzero(rr=='A')
1000000 loops, best of 3: 1.04 µs per loop
# Approach #2 from this post
In [40]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
1000000 loops, best of 3: 1.86 µs per loop
# Original sample data scaled by 10,000x
In [16]: rr = np.repeat(rr,10000)
# @Nils Werner's soln
In [18]: %timeit np.sum(rr == 'A')
1000 loops, best of 3: 734 µs per loop
# Approach #1 from this post
In [17]: %timeit np.count_nonzero(rr=='A')
1000 loops, best of 3: 659 µs per loop
# Approach #2 from this post
In [24]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
10000 loops, best of 3: 40.2 µs per loop
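To rerun this comparison outside IPython, something like the following sketch with the standard timeit module could be used. The labels in `candidates` and the repeat counts are arbitrary choices, and the absolute numbers will depend on the machine and NumPy version, so none are shown here.

```python
import timeit
import numpy as np

# Sample data scaled by 10,000x, as in the second set of timings
rr = np.repeat(np.array(list("BBBABAAABA")), 10000)

candidates = {
    "np.sum": lambda: np.sum(rr == 'A'),
    "count_nonzero": lambda: np.count_nonzero(rr == 'A'),
    "uint8 view": lambda: np.count_nonzero(rr.view(np.uint8) == ord('A')),
}

# Check that every approach returns the same count before timing it
results = {name: int(fn()) for name, fn in candidates.items()}

n_calls = 20
for name, fn in candidates.items():
    # best of 3 runs, reported as average time per call
    best = min(timeit.repeat(fn, number=n_calls, repeat=3)) / n_calls
    print(f"{name:>14}: count={results[name]}, {best * 1e6:.1f} us per call")
```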