使用输入按钮处理网站上的分页

2024-05-26

试图使用硒抓取这个网站。

我的代码可以工作,但目前它只抓取第一页。该页面使用输入按钮作为浏览页面的一种方式,因此我想逐个单击每个按钮,但它不起作用,有没有人有任何其他方法来处理此类分页的导航?

import requests
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options

options = Options()
# options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, 
executable_path=r'/Users/liban/Downloads/chromedriver')

url = 'http://www.boston.gov.uk/index.aspx?articleid=6207&ShowAdvancedSearch=true'
driver.get(url)


def get_Data():
    data = []
    divs = driver.find_element_by_xpath('//*[@id="content"]/form').find_elements_by_tag_name('div')
    for div in divs:
        app_number = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/h4/a').text
        address = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[5]').text
        status = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[1]/strong').text
        link = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/h4/a').get_attribute("href")
        proposals = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[3]').text

        data.append({"caseRef": app_number, "propDesc": proposals, "address": address,  "caseUrl": link, "status": status})
    print(data)
    return data

def navigation():
    data = []
    total_inputs = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/table/tbody/tr/td/input')
    for input in total_inputs:
        input.click()
        data.extend(get_Data())

def main():
    all_data = []
    select = Select(driver.find_element_by_xpath('//*[@id="DatePresets"]'))
    select.select_by_index(7)
    search_by = driver.find_element_by_xpath('//*[@id="radio-ReceivedDate"]')
    search_by.click()
    show = Select(driver.find_element_by_xpath('//*[@id="ResultSize"]'))
    show.select_by_index(4)
    search_button = driver.find_element_by_xpath('//*[@id="content"]/form/input[3]')
    search_button.click()

    all_data.extend(navigation())

 if __name__ == "__main__":
       main()

网站如何处理分页:

  <td align="center">
           <input type="submit" class="pageNumberButton selected" name="searchResults_Page" value="1" disabled="disabled"/>
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="2" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="3" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="4" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="5" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="6" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="7" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="8" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="9" />
           <input type="submit" class="pageNumberButton " name="searchResults_Page" value="10" />
    </td>

手动步骤:

  1. 选择预设日期=“上个月”
  2. 搜索依据=“两个日期”
  3. 点击搜索
  4. 抓取每个页面后,转到下一页,依此类推,直到没有更多页面,然后返回原始 URL。

尝试使用:find_elements_by_xpath代替find_element_by_xpath这将返回您的列表。

PS:我没有在本地尝试你的代码,但你提到的错误是我提到的解决方案。

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

使用输入按钮处理网站上的分页 的相关文章

随机推荐