R 是否有任何包允许查询维基百科(最有可能使用 Mediawiki API)来获取与此类查询相关的可用文章列表,以及导入选定的文章以进行文本挖掘?
有WikipediR, 'R 中的 MediaWiki API 包装器'
library(devtools)
install_github("Ironholds/WikipediR")
library(WikipediR)
它包括以下功能:
ls("package:WikipediR")
[1] "wiki_catpages" "wiki_con" "wiki_diff" "wiki_page"
[5] "wiki_pagecats" "wiki_recentchanges" "wiki_revision" "wiki_timestamp"
[9] "wiki_usercontribs" "wiki_userinfo"
这里正在使用它,获取一堆用户的贡献详细信息和用户详细信息:
library(RCurl)
library(XML)
# scrape page to get usernames of users with highest numbers of edits
top_editors_page <- "http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits"
top_editors_table <- readHTMLTable(top_editors_page)
very_top_editors <- as.character(top_editors_table[[3]][1:5,]$User)
# setup connection to wikimedia project
con <- wiki_con("en", project = c("wikipedia"))
# connect to API and get last 50 edits per user
user_data <- lapply(very_top_editors, function(i) wiki_usercontribs(con, i) )
# and get information about the users (registration date, gender, editcount, etc)
user_info <- lapply(very_top_editors, function(i) wiki_userinfo(con, i) )
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)