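I wrote a script to scrape product information from an online website. The goal is to write this information to an Excel file. Because my Python knowledge is limited, the only way I know to export is with Out-File in PowerShell. But the result is that each product's information is printed across several separate lines. I would like one line per product.
The output I want can be seen in the picture. I would prefer my output to look like the second version, but I can accept the first version.
Here is my code: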
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
url = "http://www.strem.com/"
cas = ['16940-92-4','29796-57-4','13569-57-8','15635-87-7']
for i in cas:
    driver = webdriver.Firefox()
    driver.get(url)
    driver.find_element_by_id("selectbox_input").click()
    driver.find_element_by_id("selectbox_input_cas").click()
    inputElement = driver.find_element_by_name("keyword")
    inputElement.send_keys(i)
    inputElement.submit()

    # Check if a particular element exists; returns True/False
    def check_exists_by_xpath(xpath):
        try:
            driver.find_element_by_xpath(xpath)
        except NoSuchElementException:
            return False
        return True

    xpath1 = ".//div[@class = 'error']" # element containing error message
    xpath2 = ".//table[@class = 'product_list tiles']" # element containing table to select product from
    #xpath3 = ".//div[@class = 'catalog_number']" # when selection is needed, returns the first catalog number

    if check_exists_by_xpath(xpath1):
        print "cas# %s is not found on Strem." %i
        driver.quit()
    else:
        if check_exists_by_xpath(xpath2):
            catNum = driver.find_element_by_xpath(".//div[@class = 'catalog_number']")
            catNum.click()
            country = driver.find_element_by_name("country")
            for option in country.find_elements_by_tag_name('option'):
                if option.text == "USA":
                    option.click()
            country.submit()
            name = driver.find_element_by_id("header_description").text
            prodNum = driver.find_element_by_class_name("catalog_number").text
            print(i)
            print(name.encode("utf-8"))
            print(prodNum)
            skus_by_xpath = WebDriverWait(driver, 10).until(
                lambda driver : driver.find_elements_by_xpath(".//td[@class='size']")
            )
            for output in skus_by_xpath:
                print(output.text)
            prices_by_xpath = WebDriverWait(driver, 10).until(
                lambda driver : driver.find_elements_by_xpath(".//td[@class='price']")
            )
            for result in prices_by_xpath:
                print(result.text[3:]) #To remove last three characters, use :-3
            driver.quit()
        else:
            country = driver.find_element_by_name("country")
            for option in country.find_elements_by_tag_name('option'):
                if option.text == "USA":
                    option.click()
            country.submit()
            name = driver.find_element_by_id("header_description").text
            prodNum = driver.find_element_by_class_name("catalog_number").text
            print(i)
            print(name.encode("utf-8"))
            print(prodNum)
            skus_by_xpath = WebDriverWait(driver, 10).until(
                lambda driver : driver.find_elements_by_xpath(".//td[@class='size']")
            )
            for output in skus_by_xpath:
                print(output.text)
            prices_by_xpath = WebDriverWait(driver, 10).until(
                lambda driver : driver.find_elements_by_xpath(".//td[@class='price']")
            )
            for result in prices_by_xpath:
                print(result.text[3:]) #To remove last three characters, use :-3
            driver.quit()
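https://pythonhosted.org/openpyxl/tutorial.html
This is the tutorial for a Python library that lets you manipulate Excel files from Python.
There are other libraries, but I like using this one.
from openpyxl import Workbook
wb = Workbook()
Then use the methods given in the tutorial to write your data,
and then
wb.save(filename)
It is really easy to get started.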
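To make the "one line per product" part concrete, here is a minimal sketch of how the scraped values could be collected into one row per product and written with openpyxl. The helper names `build_row` and `write_products`, the header labels, and the output filename are my own assumptions for illustration, not part of the original script.

```python
def build_row(cas, name, prod_num, sizes, prices):
    """Flatten one product into a single spreadsheet row (hypothetical helper)."""
    row = [cas, name, prod_num]
    # Interleave each size with its price: size1, price1, size2, price2, ...
    for size, price in zip(sizes, prices):
        row.extend([size, price])
    return row


def write_products(rows, filename):
    """Write one row per product to an .xlsx file (hypothetical helper)."""
    # Imported here so build_row stays usable without openpyxl installed.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active  # the worksheet created by default
    # Header labels are an assumption; adjust them to the layout you want.
    ws.append(["CAS", "Name", "Catalog number", "Size / price pairs"])
    for row in rows:
        ws.append(row)  # each list becomes the next row in the sheet
    wb.save(filename)
```

In the scraping loop, the idea would be to replace the print calls with something like rows.append(build_row(i, name, prodNum, [o.text for o in skus_by_xpath], [r.text[3:] for r in prices_by_xpath])), with rows = [] created before the loop, and then call write_products(rows, "strem_products.xlsx") once after the loop finishes.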
Here is a PDF tutorial that uses xlwt and xlrd, though I don't use those modules often: http://www.simplistix.co.uk/presentations/python-excel.pdf