I have the following Python code that starts an APScheduler/TwistedScheduler cron job to launch spiders.
Using a single spider is not a problem and works fine. However, using two spiders raises the error: twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed.
I did find a related question, https://stackoverflow.com/questions/71548957/twisted-internet-error-reactoralreadyinstallederror-reactor-already-installed, which suggests using CrawlerRunner as a solution. However, I am using a TwistedScheduler object, so I don't know how to combine that with multiple cron jobs (multiple add_job() calls).
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from apscheduler.schedulers.twisted import TwistedScheduler
from myprojectscraper.spiders.my_homepage_spider import MyHomepageSpider
from myprojectscraper.spiders.my_spider import MySpider
process = CrawlerProcess(get_project_settings())
# Start the crawler in a scheduler
scheduler = TwistedScheduler(timezone="Europe/Amsterdam")
# Use cron job; runs the 'homepage' spider every 4 hours (eg. 12:10, 16:10, 20:10, etc.)
scheduler.add_job(process.crawl, 'cron', args=[MyHomepageSpider], hour='*/4', minute=10)
# Use cron job; runs the full spider every week on Monday, Thursday and Saturday at 4:35 AM
scheduler.add_job(process.crawl, 'cron', args=[MySpider], day_of_week='mon,thu,sat', hour=4, minute=35)
scheduler.start()
process.start(False)
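For reference, a minimal sketch of what I understand the linked answer to suggest: replacing CrawlerProcess with CrawlerRunner (which does not install its own reactor) and running the shared Twisted reactor myself. This is untested on my side; the schedules are the same ones as above, and whether runner.crawl behaves correctly when fired repeatedly by the scheduler is exactly what I am unsure about.

```python
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from apscheduler.schedulers.twisted import TwistedScheduler
from myprojectscraper.spiders.my_homepage_spider import MyHomepageSpider
from myprojectscraper.spiders.my_spider import MySpider

# Unlike CrawlerProcess, CrawlerRunner does not install a reactor,
# so all scheduled crawls can share the single reactor started below.
configure_logging()  # CrawlerRunner does not set up logging for you
runner = CrawlerRunner(get_project_settings())

scheduler = TwistedScheduler(timezone="Europe/Amsterdam")
# Same cron schedules as before, but each job calls runner.crawl
scheduler.add_job(runner.crawl, 'cron', args=[MyHomepageSpider], hour='*/4', minute=10)
scheduler.add_job(runner.crawl, 'cron', args=[MySpider], day_of_week='mon,thu,sat', hour=4, minute=35)
scheduler.start()

# Run the shared reactor; it keeps running between scheduled crawls.
reactor.run()
```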