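This post scrapes the site using Selenium.
The code has one small problem, though: extra blank lines appear under the second date category.
I tried both implicit and explicit waits in Selenium, and neither fixed it.
So I read the saved txt file line by line and delete the blank lines: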
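# Copy the saved text to a new file, dropping empty lines.
# 'file1.txt' is the saved scraper output; adjust the name to match your file.
with open('file1.txt', 'r') as file1, open('data2.txt', 'w') as file2:
    for line in file1:
        if line == '\n':
            continue  # skip lines that contain only a newline
        file2.write(line)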
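The comparison `line == '\n'` only matches completely empty lines. If the blank lines in the saved file contain spaces, tabs, or Windows-style `\r\n` endings, a check based on `str.strip()` is more forgiving. The following is a minimal sketch rather than the original method; the helper name `drop_blank_lines` is hypothetical, and the sample data is made up for illustration.

```python
def drop_blank_lines(lines):
    """Return only the lines that contain something other than whitespace.

    `drop_blank_lines` is a hypothetical helper, not part of the original post.
    """
    return [line for line in lines if line.strip()]


# Example with in-memory data instead of the real files.
sample = ["2021-01-01\n", "\n", "   \n", "Game A\n", "\t\r\n", "Game B\n"]
cleaned = drop_blank_lines(sample)
print(cleaned)  # ['2021-01-01\n', 'Game A\n', 'Game B\n']
```

To apply it to the files, pass the open input file as `lines` and write the result with `file2.writelines(...)`.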
Python code
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
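def main():
    sdata()


def sdata():
    # Append the scraped text to data.txt; the with block closes the file.
    with open('data.txt', 'a') as f:
        driver = webdriver.Chrome()
        driver.get("http://gbtgame.ysepan.com/")
        sleep(2)  # give the page time to load
        # Elements with class 'ml' are the clickable category headers.
        # find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
        # which was removed in newer Selenium 4 releases.
        elist = driver.find_elements(By.CLASS_NAME, 'ml')
        for category in elist:
            category.click()
            sleep(2)  # wait for the category to expand
            f.write(category.text)
            f.write('\n\n')
            # Elements with class 'xwj' are the entries shown on the page.
            egame = driver.find_elements(By.CLASS_NAME, 'xwj')
            sleep(3)
            for game in egame:
                f.write(game.text)
                f.write('\n')
        driver.quit()  # end the browser session


if __name__ == "__main__":
    main()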
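An alternative to post-processing the file is to skip empty entries while scraping. The sketch below assumes the extra blank lines come from elements whose `.text` is empty or whitespace-only, which has not been verified against the site. `write_nonblank` is a hypothetical helper, and plain strings stand in for the `.text` values of the Selenium elements.

```python
import io


def write_nonblank(out, texts):
    """Write each non-blank text on its own line; return how many were written.

    Hypothetical helper: `texts` stands in for the `.text` values of the
    elements returned by find_elements.
    """
    written = 0
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip entries that would produce an empty line
        out.write(text + '\n')
        written += 1
    return written


# In-memory demonstration; in the scraper, `out` would be the open data.txt file.
buffer = io.StringIO()
count = write_nonblank(buffer, ["Game A", "", "   ", "Game B"])
print(count)
print(buffer.getvalue())
```

In `sdata()`, the inner loop could then be replaced by a single call such as `write_nonblank(f, [e.text for e in egame])`.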