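I am trying to use Selenium ChromeDriver in Python to access www.mouser.co.uk. However, it is detected as a bot from the very first attempt.
Does anyone have an explanation for this? Here is the code I am using: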
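from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
options = Options()
options.add_argument("--start-maximized")
# Selenium 4 style: the driver path goes through Service, the options through `options=`
browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')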
I tried accessing https://www.mouser.co.uk/ with a few Chrome options, but the session was indeed detected and redirected to the Pardon Our Interruption page.
Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 4 style: the driver path goes through Service, the options through `options=`
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
driver.get("https://www.mouser.co.uk")
# Wait up to 30 seconds for the flag link to become clickable, then click it via JavaScript
myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
driver.execute_script("arguments[0].click();", myElement)
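Now, inspecting the Pardon Our Interruption page, you will find that the <body> tag contains:
- The class attribute dist-GlobalHeader
- The class attribute dist-PageWrap
This clearly indicates that the website is protected by the bot management service provider Distil Networks (https://www.distilnetworks.com/), and that navigation by ChromeDriver is detected and subsequently blocked.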
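The check described above can be automated. What follows is only a minimal sketch: it assumes the two class names reported above are reliable markers of the block page and that they may sit on any element inside the document. The helper name looks_like_distil_block is hypothetical and not part of Selenium or of any Distil API.

```python
from html.parser import HTMLParser

# Class names reported above for the "Pardon Our Interruption" page.
DISTIL_MARKER_CLASSES = {"dist-GlobalHeader", "dist-PageWrap"}


class _ClassCollector(HTMLParser):
    """Collects every CSS class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        class_value = dict(attrs).get("class") or ""
        self.classes.update(class_value.split())


def looks_like_distil_block(page_source):
    """Return True if the HTML contains any of the marker classes (hypothetical helper)."""
    collector = _ClassCollector()
    collector.feed(page_source)
    return bool(collector.classes & DISTIL_MARKER_CLASSES)
```

With a live session, calling looks_like_distil_block(driver.page_source) right after driver.get(...) would let a script fail fast instead of waiting out the 30-second timeout for an element that never appears.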
Distil
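As per the article There Really Is Something About Distil.it... https://www.forbes.com/sites/timconneally/2013/01/28/theres-something-about-distil-it/#6e1881e438b9:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.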
Further,
"One pattern with Selenium was automating the theft of Web content"
Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
Reference
You can find a couple of detailed discussions in:
- Distil detects WebDriver driven Chrome Browsing Context https://stackoverflow.com/questions/53605757/unable-to-use-selenium-to-automate-chase-site-login/54284776#54284776
- Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection https://stackoverflow.com/questions/53039551/selenium-webdriver-modifying-navigator-webdriver-flag-to-prevent-selenium-detec/53040904#53040904
- Akamai Bot Manager detects WebDriver driven Chrome Browsing Context https://stackoverflow.com/questions/59872920/clicking-on-get-data-button-for-monthly-settlement-statistics-on-nseindia-com-do/59874462#59874462