I generate an npz file as follows:
import numpy as np

# Generate an .npz file containing 30,000 variable-length integer arrays
dataset_text_filepath = 'test_np_load.npz'
texts = []
for text_number in range(30000):
    # np.random.random_integers is deprecated; np.random.randint is its modern replacement
    texts.append(np.random.randint(0, 20000,
                                   size=np.random.randint(0, 100)))
texts = np.array(texts, dtype=object)  # ragged arrays need dtype=object
np.savez(dataset_text_filepath, texts=texts)
This gives me a ~7 MiB npz file (essentially a single variable, texts, which is a NumPy array of NumPy arrays).

I load it with numpy.load():
# Load data (allow_pickle=True is needed for object arrays since NumPy 1.16.3)
dataset = np.load(dataset_text_filepath, allow_pickle=True)
If I query it as follows, it takes several minutes:
# Querying data: the slow way
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(dataset['texts']), size=10)
    dataset['texts'][random_indices]
Whereas if I query it as follows, it takes less than 5 seconds:
# Querying data: the fast way
data_texts = dataset['texts']
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(data_texts), size=10)
    data_texts[random_indices]
Why is the second method so much faster than the first?
dataset['texts'] re-reads the file on every access. Calling load on an npz archive returns only a file loader, not the actual data: it is a "lazy loader" that reads a given array from disk only when that array is accessed. The load documentation could be clearer about this, but it does say:
- If the file is a ``.npz`` file, the returned value supports the context
  manager protocol in a similar fashion to the open function::

      with load('foo.npz') as data:
          a = data['a']

  The underlying file descriptor is closed when exiting the 'with' block.
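The lazy-loading behavior is easy to observe directly. The sketch below (using a throwaway file name, lazy_demo.npz, chosen for illustration) times repeated NpzFile subscript accesses against plain indexing of an array that was bound once; the first pays the read/decompress cost on every iteration, the second only once:

```python
import time
import numpy as np

# Build a small archive to probe (hypothetical file name for this demo).
arr = np.random.randint(0, 20000, size=200_000)
np.savez('lazy_demo.npz', a=arr)

data = np.load('lazy_demo.npz')
print(type(data))  # an NpzFile loader object, not a dict of arrays

# Each subscript access reads the array from the archive again.
start = time.perf_counter()
for _ in range(100):
    _ = data['a']            # re-read on every access
per_access = (time.perf_counter() - start) / 100

# Binding the array once pays the read cost a single time.
a = data['a']
start = time.perf_counter()
for _ in range(100):
    _ = a[:10]               # plain in-memory indexing
per_index = (time.perf_counter() - start) / 100

print('{0:.6f}s per NpzFile access vs {1:.8f}s per array index'
      .format(per_access, per_index))
```

On any machine the per-access cost dominates the per-index cost, which is exactly the gap between the asker's slow and fast loops.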
And from the savez documentation:
When opening the saved ``.npz`` file with `load` a `NpzFile` object is
returned. This is a dictionary-like object which can be queried for
its list of arrays (with the ``.files`` attribute), and for the arrays
themselves.
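The dict-like behavior described above can be sketched briefly (npz_demo.npz and the array names are made up for this example):

```python
import numpy as np

# Save two named arrays into one archive.
np.savez('npz_demo.npz', texts=np.arange(5), labels=np.ones(3))

# NpzFile behaves like a read-only dict backed by the archive on disk.
with np.load('npz_demo.npz') as data:
    print(data.files)          # list of array names in the archive
    print('texts' in data)     # dict-style membership test
    texts = data['texts']      # the array is actually read here

# After the with block the file descriptor is closed,
# but arrays already extracted remain usable.
print(texts)
```

Extracting the arrays you need inside the with block, as here, combines the fast access pattern with clean file handling.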
For more details, see help(np.lib.npyio.NpzFile).