How to retrieve information about journals from ISI Web of Knowledge?

2024-05-22

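I am working on predicting article citation counts. The problem is that I need information about journals from ISI Web of Knowledge. They collect this information (journal impact factor, Eigenfactor, ...) year by year, but there is no way to download all of one year's journal information at once. There is only a "Mark all" option, which always marks the first 500 journals in the list (that list can then be downloaded). I am writing this project in R. So my question is: how can I retrieve this information all at once, or at least in an efficient and tidy way? Thanks for any ideas.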

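I used RSelenium (http://ropensci.github.io/RSelenium/) to scrape WOS for citation data and make a plot similar to this one by Kieran Healy (http://kieranhealy.org/blog/archives/2014/11/15/top-ten-by-decade/), but mine is for archaeology journals, so my code is tailored to that: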

Here is my code (from a slightly bigger project, on GitHub: https://gist.github.com/benmarwick/5826552):

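# set up browser and Selenium
library(devtools)
install_github("ropensci/RSelenium")
library(RSelenium)
# checkForServer() and startServer() are defunct in current RSelenium releases;
# rsDriver() starts a Selenium server and returns a client that is already open
rD <- rsDriver(browser = "firefox")
remDr <- rD$client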
# go to http://apps.webofknowledge.com/
# refine search by journal... perhaps arch?eolog* in 'topic'
# then: 'Research Areas' -> archaeology -> refine
# then: 'Document types' -> article -> refine
# then: 'Source title' -> choose your favourite journals -> refine
# must have <10k results to enable citation data
# click 'create citation report' tab at the top
# do the first page manually to set the 'save file' and 'do this automatically', 
# then let loop do the work after that

# before running the loop, get URL of first page that we already saved,
# and paste in next line, the URL will be different for each run
remDr$navigate("http://apps.webofknowledge.com/CitationReport.do?product=UA&search_mode=CitationReport&SID=4CvyYFKm3SC44hNsA2w&page=1&cr_pqid=7&viewType=summary")

Here is the loop that automatically collects data from the next several hundred pages of WOS results...

# Loop to get citation data for each page of results; each iteration saves a txt file.
# I used SelectorGadget to check the CSS ids; they might be different for you.
for (i in 1:1000) {
  # click on 'save to text file'
  webElem <- try(remDr$findElement(using = "id", value = "select2-chosen-1"), silent = TRUE)
  if (inherits(webElem, "try-error")) next
  webElem$clickElement()
  # click on 'send' in the pop-up window
  webElem <- try(remDr$findElement(using = "css", value = "span.quickoutput-action"), silent = TRUE)
  if (inherits(webElem, "try-error")) next
  webElem$clickElement()
  # refresh the page to get rid of the pop-up
  remDr$refresh()
  # advance to the next page of results
  webElem <- try(remDr$findElement(using = "xpath",
                                   value = "(//form[@id='summary_navigation']/table/tbody/tr/td[3]/a/i)[2]"),
                 silent = TRUE)
  if (inherits(webElem, "try-error")) next
  webElem$clickElement()
  print(i)
}
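The loop repeats the same find-then-click pattern three times. A minimal sketch of a helper that factors it out, assuming `driver` is an RSelenium `remoteDriver` (or anything with a `findElement(using, value)` method whose result has a `clickElement()` method). The name `click_if_present` and its `tries`/`pause` arguments are hypothetical, not part of RSelenium; the short retry is there because WoS pages may still be loading when the click is attempted.

```r
# Hypothetical helper (not an RSelenium function): find an element and click it.
# Returns TRUE on success, FALSE if the element could not be found or clicked
# after `tries` attempts, sleeping `pause` seconds between attempts.
click_if_present <- function(driver, using, value, tries = 3, pause = 1) {
  for (attempt in seq_len(tries)) {
    clicked <- tryCatch({
      elem <- driver$findElement(using = using, value = value)
      elem$clickElement()
      TRUE
    }, error = function(e) FALSE)
    if (clicked) return(TRUE)
    if (attempt < tries) Sys.sleep(pause)
  }
  FALSE
}
```

Inside the loop, each try/next pair then becomes one line, for example `if (!click_if_present(remDr, "id", "select2-chosen-1")) next`.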

# there are many duplicates, but the code below will remove them
# copy the folder to your hard drive, and edit the setwd line below
# to match the location of your folder containing the hundreds of text files.

Read all the text files into R...

# move them manually into a folder of their own
setwd("/home/two/Downloads/WoS")
# get text file names (the anchored pattern only matches names ending in .txt)
my_files <- list.files(pattern = "\\.txt$")
# read each file into a list of data frames
my_list <- lapply(my_files, read.csv,
                  skip = 4,
                  header = TRUE,
                  comment.char = " ")
# check to see it worked
my_list[1:5]
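The reading step can also be wrapped in a function that avoids `setwd()`, records which export each row came from, and stacks everything with base R's `rbind`. A minimal sketch, assuming each export has four preamble lines before the header row (as `skip = 4` implies) and that all exports share the same columns. `read_wos_exports` and the `source_file` column are hypothetical names, and any extra `read.csv()` arguments (such as the `comment.char` used above) can be passed through `...`.

```r
# Hypothetical helper: read every WoS text export in `dir` into one data frame.
read_wos_exports <- function(dir, skip = 4, ...) {
  # anchored regex: "\\.txt$" matches "page1.txt" but not "notes_txt.csv"
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  tables <- lapply(files, function(f) {
    df <- read.csv(f, skip = skip, header = TRUE, stringsAsFactors = FALSE, ...)
    df$source_file <- basename(f)  # remember which export each row came from
    df
  })
  # assumes every export has identical columns; rbind() errors otherwise
  do.call(rbind, tables)
}
```

Keeping the file name makes it easy to trace a suspicious row back to the page it was saved from before duplicates are dropped.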

Combine the list of scraped data frames into one big data frame...

# use data.table for speed
install_github("rdatatable/data.table")
library(data.table)
my_df <- rbindlist(my_list)
setkey(my_df)
# filter only a few columns to simplify
my_cols <- c('Title', 'Publication.Year', 'Total.Citations', 'Source.Title')
my_df <- my_df[,my_cols, with=FALSE]
# remove duplicates
my_df <- unique(my_df)
# what journals do we have?
unique(my_df$Source.Title)

Make abbreviations for the journal names and make the article titles all upper case, ready for plotting...

# get names
long_titles <- as.character(unique(my_df$Source.Title))
# get abbreviations automatically (first letter of each word);
# perhaps not the obvious ones, but it's fast
short_titles <- unname(sapply(long_titles, function(i) {
  theletters <- strsplit(i, "")[[1]]
  wh <- c(1, which(theletters == " ") + 1)
  paste(theletters[wh], collapse = "")
}))
# manually disambiguate the journals that now only have 'A' as the short name
# (these labels are specific to my journals; supply one label per 'A' title)
short_titles[short_titles == "A"] <- c("AMTRY", "ANTQ", "ARCH")
# remove 'NA' so it's not confused with an actual journal
short_titles[short_titles == "NA"] <- ""
# add abbreviations to big table
journals <- data.table(Source.Title = long_titles,
                       short_title = short_titles)
setkey(journals) # need a key to merge
my_df <- merge(my_df, journals, by = "Source.Title")
# make article titles all upper case, easier to read
my_df$Title <- toupper(my_df$Title)
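The abbreviation step can be written as a small vectorised function, and base R's `abbreviate()` offers an alternative that keeps lengthening abbreviations until they are all distinct, which would avoid hand-fixing the titles that collapse to `A`. A sketch: `initials` is a hypothetical name, and it matches the `sapply()` code above for titles whose words are separated by single spaces. `abbreviate()` produces different (usually longer) labels than the hand-picked ones above.

```r
# Hypothetical helper: first letter of each space-separated word in each title.
initials <- function(titles) {
  words <- strsplit(as.character(titles), " ", fixed = TRUE)
  vapply(words, function(w) paste(substr(w, 1, 1), collapse = ""), character(1))
}

demo_titles <- c("JOURNAL OF ARCHAEOLOGICAL SCIENCE", "ANTIQUITY", "ARCHAEOMETRY")
initials(demo_titles)
# "JOAS" "A" "A"   (the two single-word titles collide)

# Base R alternative: abbreviations that are guaranteed to be distinct
abbreviate(demo_titles, minlength = 4, named = FALSE)
```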


## create new column that is 'decade'
# first make a lookup table to get a decade label (e.g. "1970-1980") for each individual year
year1 <- 1900:2050
my_seq <- seq(year1[1], year1[length(year1)], by = 10)
indx <- findInterval(year1, my_seq)
ind <- seq_along(my_seq)
labl1 <- paste(my_seq[ind], my_seq[ind + 1], sep = "-")
dat1 <- data.table(Publication.Year = year1,
                   decade = labl1[indx])
setkey(dat1, "Publication.Year")
# merge the decade column onto my_df
my_df <- merge(my_df, dat1, by = "Publication.Year")
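Because the lookup table maps each year to the ten-year bin that starts at its decade, the same labels can be computed directly with integer division. A sketch; `decade_label` is a hypothetical name, and it reproduces the `"1970-1980"`-style labels of the lookup table for years 1900 to 2049.

```r
# Hypothetical helper: label a year with its decade, e.g. 1974 -> "1970-1980",
# matching the labels built with findInterval() above.
decade_label <- function(year) {
  start <- 10L * (as.integer(year) %/% 10L)
  paste(start, start + 10L, sep = "-")
}

decade_label(c(1974, 1980, 2009))
# "1970-1980" "1980-1990" "2000-2010"
```

With data.table this could replace the lookup table and merge with `my_df[, decade := decade_label(Publication.Year)]`. Unlike the merge, this keeps rows whose year falls outside 1900 to 2050.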

Find the most-cited papers for each decade of publication...

df_top <- my_df[ave(-my_df$Total.Citations, my_df$decade, FUN = rank) <= 10, ] 

# inspecting this df_top table is quite interesting.
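`rank()` gives tied values their average rank by default, so a decade can end up with more or fewer than ten rows when citation counts tie at the cut-off. If exactly ten per decade is wanted, `ties.method = "first"` breaks ties by row order. A sketch on made-up toy data (the values are illustrative only), using `n = 2` to keep it small:

```r
# Toy data: two decades, with a tie at the cut-off in the 1980s
toy <- data.frame(
  decade = c("1970-1980", "1970-1980", "1970-1980",
             "1980-1990", "1980-1990", "1980-1990"),
  Total.Citations = c(50, 40, 30, 20, 10, 10)
)
n <- 2

# Default ties ("average"): the two 10s both get rank 2.5, so the 1980s keep only one row
avg_rank <- ave(-toy$Total.Citations, toy$decade, FUN = rank)
toy[avg_rank <= n, ]

# ties.method = "first": exactly n rows per decade (ties broken by row order)
first_rank <- ave(-toy$Total.Citations, toy$decade,
                  FUN = function(x) rank(x, ties.method = "first"))
top_n <- toy[first_rank <= n, ]
top_n
```

The same `FUN` can be dropped into the `df_top` line above with `10` in place of `n`.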

Draw the plot in a similar style to Kieran's. This code comes from Jonathan Goodwin (http://jgoodwin.net/), who also reproduced the plot for his field (1: http://jgoodwin.net/lit-cites.png, 2: https://twitter.com/joncgoodwin/status/534134510971912192).

######## plotting code from Jonathan Goodwin ##########
######## http://jgoodwin.net/ ########

# format of data: Title, Total.Citations, decade, Source.Title
# THE WRITERS AUDIENCE IS ALWAYS A FICTION,205,1974-1979,PMLA

library(ggplot2)
ws <- df_top

ws <-  ws[order(ws$decade,-ws$Total.Citations),]
ws$Title <- factor(ws$Title, levels = unique(ws$Title)) #to preserve order in plot, maybe there's another way to do this

g <- ggplot(ws, aes(x = Total.Citations, 
                    y = Title, 
                    label = short_title, 
                    group = decade, 
                    colour = short_title))

g <- g + geom_text(size = 4) +
  facet_grid(decade ~ .,
             drop = TRUE,
             scales = "free_y") +
  theme_bw(base_family = "Helvetica") +
  theme(axis.text.y = element_text(size = 8)) +
  xlab("Number of Web of Science Citations") + ylab("") +
  labs(title = "Archaeology's Ten Most-Cited Articles Per Decade (1970-)") +
  scale_colour_discrete(name = "Journals")

g #adjust sizing, etc.

Another version of this plot, but without code: http://charlesbreton.ca/?page_id=179
