I have some code that takes a long time to run. I would like to simply save either the requests object (in this case `name`) or the BeautifulSoup object (in this case `soup`) locally so that I can save time the next time. Here is the code:
from bs4 import BeautifulSoup
import requests
url = 'SOMEURL'
name = requests.get(url)
soup = BeautifulSoup(name.content, "html.parser")
Since `name.content` is just HTML, you can dump it to a file and read it back in later. Usually the bottleneck is not the parsing, but the network latency of making the request.
from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so write the file in binary mode
with open("/tmp/A.html", "wb") as f:
    f.write(name.content)

# read it back in
with open("/tmp/A.html", "rb") as f:
    soup = BeautifulSoup(f, "html.parser")

# do something with soup
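The two steps above can be folded into a small cache-or-fetch helper: download the page once, and on later runs parse the cached copy instead of hitting the network. This is a minimal sketch, not part of the original answer; the function name `get_soup` and the cache-file parameter are my own choices.

from pathlib import Path

import requests
from bs4 import BeautifulSoup

def get_soup(url, cache_file):
    """Return a BeautifulSoup for `url`, caching the raw HTML on disk.

    The first call downloads the page and writes the bytes to
    `cache_file`; later calls skip the network and parse the cached copy.
    """
    cache = Path(cache_file)
    if cache.exists():
        html = cache.read_bytes()          # reuse the saved HTML
    else:
        html = requests.get(url).content   # raw bytes of the response
        cache.write_bytes(html)            # save for next time
    return BeautifulSoup(html, "html.parser")

Delete the cache file whenever you want a fresh copy of the page; the helper makes no attempt at expiry.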
Here is some anecdotal evidence that the bottleneck is in the network.
from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'
t1 = time.perf_counter()
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, "html.parser")
t3 = time.perf_counter()
print(t2 - t1, t3 - t2)
Output, run on a ThinkPad X1 Carbon with a fast campus network:
0.11 0.02