How can I get mechanize to not fail on the forms on this page?

2024-01-09

import mechanize

url = 'http://steamcommunity.com'

br=mechanize.Browser(factory=mechanize.RobustFactory())

br.open(url)
print br.request
print br.form
for each in br.forms():
    print each
    print

The code above produces:

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

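My specific goal is to use the login form, but I can't even get mechanize to recognize that the page has any forms. Even what I believe is the most basic way to select a form, br.select_form(nr=0), results in the same traceback. In case it makes a difference, the form's enctype is multipart/form-data.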

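I guess this all boils down to a two-part question: how can I get mechanize to handle this page, or, if that is not possible, what other approach would work while still maintaining cookies?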

Edit: As mentioned below, this redirects to 'https://steamcommunity.com'.

Mechanize can retrieve the HTML successfully, as the following code shows:

import mechanize

url = 'https://steamcommunity.com'

hh = mechanize.HTTPSHandler()  # HTTPS handler, since the site redirects to https
hh.set_http_debuglevel(1)      # print the HTTP exchange for debugging
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print contents
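
Since the raw HTML can be fetched even though mechanize's form parser raises ParseError, one way to check what forms the page actually contains is to run the HTML through a more tolerant parser. The following is only a minimal sketch, assuming Python 3 and its standard-library html.parser (not the Python 2 setup used above); FormLister and list_forms are hypothetical names, and the sample markup and field names are made up for illustration, not taken from the real page.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form>'s attributes and the names of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> encountered
        self._current = None   # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "enctype": attrs.get("enctype"),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return a list of dicts describing the forms found in an HTML string."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Made-up sample markup, including an unclosed tag, to exercise the parser.
sample = """
<form action="/login" method="POST" enctype="multipart/form-data">
  <input type="text" name="username">
  <input type="password" name="password">
  <b>unclosed bold
</form>
"""
for form in list_forms(sample):
    print(form)
```

If the forms show up here, the problem is in mechanize's parsing of the page rather than in the page lacking forms, and the field names can then be submitted by another HTTP client.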

Did you mention that the site redirects to an HTTPS (SSL) server?

Well, try setting up a new HTTPS handler, like this:

mechanize.HTTPSHandler()
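
As for the second half of the question (another approach that still maintains cookies), one possible fallback is the standard library alone. This is a rough sketch rather than a tested recipe for this site: it assumes Python 3, make_cookie_opener is a hypothetical helper name, and the actual request is left commented out. An opener built with build_opener normally also includes an HTTPS handler when Python has SSL support, so the redirect to https should be followed.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = make_cookie_opener()

# Network call, left commented out so the sketch has no side effects:
# html = opener.open("https://steamcommunity.com").read().decode("utf-8", "replace")
# print([cookie.name for cookie in jar])
```

Every request made through the same opener shares the jar, so cookies set by the login response would be sent back on later requests.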