I'm trying to use rvest to scrape a web page that requires logging in with an email/password through a form.
```r
rm(list = ls())
library(rvest)
### Trying to sign into a form using email/password
url <- "http://www.perfectgame.org/"  ## page to spider
pgsession <- html_session(url)         ## create session
pgform <- html_form(pgsession)[[1]]    ## pull form from session
## set_values() returns a modified copy of the form, so keep the result
pgform <- set_values(pgform,
  `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@example.com",
  `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
submit_form(pgsession, pgform, submit = `ctl00$Header2$HeaderTop1$Button1`)
```
This gives me the following error message:
```
Error in submit_request(form, submit) :
  object 'ctl00$Header2$HeaderTop1$Button1' not found
```
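The "object not found" error comes from the backticks themselves: in R, backticks quote a *name*, so `submit = \`ctl00$Header2$HeaderTop1$Button1\`` makes R look up a variable with that name before `submit_form()` ever runs. `submit_form()` expects the button's name as a character string (its own message, "Submitting with '...'", quotes it the same way). The base-R lines below show the difference. `pg_login()` is a hypothetical wrapper around the same pre-1.0 rvest calls used in the question (rvest 1.0 renamed them to `session()`, `html_form_set()` and `session_submit()`); it is only defined, not called, because it needs network access.

```r
# As an argument/element name (left of `=`), backticks are harmless:
fields <- list(`ctl00$Header2$HeaderTop1$tbUsername` = "myemail@example.com")
print(names(fields))

# As a value (right of `=`), backticks mean "look up this variable":
err <- tryCatch(`ctl00$Header2$HeaderTop1$Button1`,
                error = function(e) conditionMessage(e))
print(err)  # the same "object ... not found" error as in the question

# Hypothetical helper using the pre-1.0 rvest API; the submit button is
# passed as a quoted string, and set_values()'s result is kept.
pg_login <- function(user, pass,
                     url = "http://www.perfectgame.org/",
                     btn = "ctl00$Header2$HeaderTop1$Button1") {
  s <- rvest::html_session(url)
  f <- rvest::html_form(s)[[1]]
  f <- rvest::set_values(f,
    `ctl00$Header2$HeaderTop1$tbUsername` = user,
    `ctl00$Header2$HeaderTop1$tbPassword` = pass)
  rvest::submit_form(s, f, submit = btn)
}
```

Fixing the argument only removes this first error; the second error can still occur once the request is actually sent.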
If I submit the form without specifying the `submit` argument, I get the following:
```
Submitting with 'ctl00$Header2$HeaderTop1$Button1'
Error in function (type, msg, asError = TRUE) : <url> malformed
```
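"<url> malformed" is a curl error, which suggests the URL handed to curl was not an absolute address. One plausible cause, not confirmed here: ASP.NET WebForms pages normally post back to themselves through a relative `action` attribute (something like `Default.aspx`), and some older rvest releases may have passed that value through without resolving it against the page address. The sketch below resolves it with xml2's `url_absolute()`. `fix_form_url()` is a hypothetical helper, and it assumes the old rvest form object keeps its action in `form$url` and the session keeps its address in `session$url`; upgrading rvest is the other thing worth trying.

```r
library(xml2)

# A relative action resolved against the page it came from
page_url <- "http://www.perfectgame.org/"
action   <- "Default.aspx"  # hypothetical value of the form's action attribute
print(url_absolute(action, page_url))

# Hypothetical workaround for older rvest form objects (action stored in
# `$url`): make the action absolute, or fall back to the page URL if empty.
fix_form_url <- function(form, session_url) {
  action <- form$url
  form$url <- if (is.null(action) || !nzchar(action)) {
    session_url
  } else {
    url_absolute(action, session_url)
  }
  form
}
```

Under the same assumptions, it would be applied as `pgform <- fix_form_url(pgform, pgsession$url)` before calling `submit_form()`.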
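I also tried passing the parameters directly to httr, as suggested in this question: How can I POST a simple HTML form in R? https://stackoverflow.com/questions/27631460/how-can-i-post-a-simple-html-form-in-r, but the `submit` argument does not accept the submit button's name with backticks (``), with quotes, or without any quotes: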
```r
library(httr)
url <- "http://www.perfectgame.org/Rankings/Players/Default.aspx?gyear=2015&num=500"
fd <- list(
  submit = `ctl00$Header2$HeaderTop1$Button1`,
  `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@example.com",
  `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
resp <- POST(url, body = fd, encode = "form")
content(resp)
```
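With a raw `POST()` there is no special `submit` field: a browser sends the clicked button as an ordinary `name = value` pair. ASP.NET WebForms pages (the `ctl00$...` field names suggest this site is one) also usually reject posts that lack the hidden `__VIEWSTATE`/`__EVENTVALIDATION` fields from the login page, and the code above posts to the rankings URL rather than to the page that holds the login form. The sketch below assumes the login form posts back to the page it lives on and that the field names from the question are right. It copies every named non-submit `<input>` from the page, fills in the credentials, and adds only the button being "clicked". `build_login_body()` and `asp_login()` are hypothetical helpers, not part of httr or rvest.

```r
library(xml2)

# Hypothetical helper: collect the fields a browser would post when the
# login button is clicked on an ASP.NET WebForms page.
build_login_body <- function(page, user, pass,
                             prefix = "ctl00$Header2$HeaderTop1$") {
  # Every named, non-submit <input> (hidden __VIEWSTATE etc. included)
  inputs <- xml_find_all(page, "//form//input[@name and not(@type='submit')]")
  body <- as.list(xml_attr(inputs, "value", default = ""))
  names(body) <- xml_attr(inputs, "name")
  body[[paste0(prefix, "tbUsername")]] <- user
  body[[paste0(prefix, "tbPassword")]] <- pass
  # Send only the clicked button, as an ordinary name = value pair
  btn <- paste0(prefix, "Button1")
  btn_node <- xml_find_first(page, sprintf("//input[@name='%s']", btn))
  body[[btn]] <- xml_attr(btn_node, "value", default = "")
  body
}

# Hypothetical helper (needs network, so it is only defined here): GET the
# login page for fresh hidden fields and cookies, then POST the form back.
# httr reuses one handle per host, so the session cookie carries over.
asp_login <- function(login_url, user, pass) {
  resp <- httr::GET(login_url)
  page <- read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  httr::POST(login_url, body = build_login_body(page, user, pass),
             encode = "form")
}

# Offline check against a stripped-down, made-up copy of such a form
demo <- read_html('<form method="post" action="Default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc">
  <input type="text" name="ctl00$Header2$HeaderTop1$tbUsername">
  <input type="password" name="ctl00$Header2$HeaderTop1$tbPassword">
  <input type="submit" name="ctl00$Header2$HeaderTop1$Button1" value="Login">
</form>')
str(build_login_body(demo, "myemail@example.com", "mypassword"))
```

After a successful `asp_login()`, later `httr::GET()` calls to the same host (for example the rankings URL) should send the login cookie automatically, since httr keeps one handle per host.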
Any ideas on how to log in from an R session and scrape the data behind the login wall?