There is no official Google Scholar API https://academia.stackexchange.com/questions/34970/how-to-get-permission-from-google-to-use-google-scholar-data-if-needed/34973#34973.
There are third-party solutions, such as the free scholarly https://github.com/scholarly-python-package/scholarly Python package, which supports profile https://scholarly.readthedocs.io/en/stable/quickstart.html#search-by-keyword-and-return-a-generator-of-author-objects, author https://scholarly.readthedocs.io/en/stable/quickstart.html#search-for-an-author-by-name-and-return-a-generator-of-author-objects, cite https://scholarly.readthedocs.io/en/stable/quickstart.html#citedby and organic https://github.com/scholarly-python-package/scholarly/blob/9269ff36ad2314e6cc0c5b499efc3b79b844707e/scholarly/_scholarly.py#L24 results (search_pubs https://scholarly.readthedocs.io/en/stable/quickstart.html?highlight=organic#search-pubs appears to be the method for getting organic results, although the method name confuses me).
Note that if you use scholarly constantly without rate-limiting your requests, Google may block your IP (as mentioned by @RadioControlled https://stackoverflow.com/questions/62938110/does-google-scholar-have-an-api-available-that-we-can-use-in-our-research-applic/71236400?noredirect=1#comment131414631_71236400). Use it wisely.
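One way to space out requests is to wrap whatever generator you iterate over in a small throttling helper. The sketch below is only an illustration: the `throttled` name and the 5 to 10 second delay range are assumptions of mine, not part of scholarly, and Google does not publish a "safe" request rate.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 5.0,   # assumed lower bound in seconds, tune as needed
    max_delay: float = 10.0,  # assumed upper bound in seconds, tune as needed
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one by one, pausing a random interval after each one.

    The pause happens before the underlying iterator is asked for its next
    item, so consecutive fetches are always separated by a delay.
    """
    for item in items:
        yield item
        sleep(random.uniform(min_delay, max_delay))
```

With scholarly, this could be used as `for author in throttled(scholarly.search_keyword("biology")): ...`. The `sleep` parameter exists only so the delay can be swapped out in tests.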
Additionally, there is the scrape-google-scholar-py https://github.com/dimitryzub/scrape-google-scholar-py module, which lets you extract data from almost all Google Scholar pages.
Alternatively, there is the Google Scholar API https://serpapi.com/google-scholar-api from SerpApi, a paid API with a free plan that supports organic https://serpapi.com/google-scholar-organic-results, cite https://serpapi.com/google-scholar-cite-api, profile https://serpapi.com/google-scholar-profiles-api, and author https://serpapi.com/google-scholar-author-api results. It bypasses all blocks on the SerpApi backend, so your IP won't be blocked, and it handles the legal part of scraping.
Example code to parse profile results using scholarly with the search_keyword https://scholarly.readthedocs.io/en/stable/quickstart.html#search-by-keyword-and-return-a-generator-of-author-objects method:
import json
from scholarly import scholarly
# will paginate to the next page by default
authors = scholarly.search_keyword("biology")
for author in authors:
    print(json.dumps(author, indent=2))
# part of the output:
'''
{
  "container_type": "Author",
  "filled": [],
  "source": "SEARCH_AUTHOR_SNIPPETS",
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [
    "Biology",
    "Genomics",
    "Genetics",
    "Bioinformatics",
    "Mathematics"
  ],
  "citedby": 552013
}
... other author results
'''
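Because the generator keeps paginating by default, it can help to cap how many author records are pulled. Here is a minimal sketch using `itertools.islice`; the helper names `take_authors` and `summarize_author` are hypothetical, and the dictionary keys are taken from the sample output above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def take_authors(authors: Iterable[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Return at most `limit` records, pulling no more items than needed."""
    return list(islice(authors, limit))


def summarize_author(author: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only a few fields that appear in the sample output above."""
    return {
        "name": author.get("name"),
        "affiliation": author.get("affiliation"),
        "citedby": author.get("citedby", 0),
    }
```

For instance, `take_authors(scholarly.search_keyword("biology"), 5)` would stop after five records instead of walking through every result page.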
Example usage of scrape-google-scholar-py https://github.com/dimitryzub/scrape-google-scholar-py#example-usage-custom-backend:
from google_scholar_py import CustomGoogleScholarProfiles
import json
parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))
Outputs:
[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3593
  }, # other results...
]
Example code to parse profile results using the Google Scholar Profiles API https://serpapi.com/google-scholar-profiles-api from SerpApi:
import json
from serpapi import GoogleScholarSearch
# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",             # language
    "mauthors": "biology"   # search query
}
search = GoogleScholarSearch(params)
results = search.get_dict()
# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))
# part of the output:
'''
{
  "name": "Masatoshi Nei",
  "link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
  "serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at temple.edu",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
    },
    {
      "title": "Population genetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
    }
  ],
  "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
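Each profile in this output nests its interests as objects with their own links, which is awkward if you only want a flat record per author. A small sketch of flattening one profile follows; `flatten_profile` is a hypothetical helper of mine, not part of the SerpApi client, and it relies only on the keys visible in the sample output above.

```python
from typing import Any, Dict


def flatten_profile(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one profile result to a flat record with interest titles only."""
    return {
        "name": profile.get("name"),
        "author_id": profile.get("author_id"),
        "affiliations": profile.get("affiliations"),
        "cited_by": profile.get("cited_by"),
        # Each interest is a dict; keep just its "title" and skip any without one.
        "interests": [
            interest["title"]
            for interest in profile.get("interests", [])
            if "title" in interest
        ],
    }
```

It could be applied to the results as `[flatten_profile(p) for p in results["profiles"]]`, for example before writing rows to CSV.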
There is a dedicated Scrape historic Google Scholar results using Python https://serpapi.com/blog/scrape-historic-google-scholar-results-using-python/ blog post of mine at SerpApi that shows how to scrape historic 2017-2021 Organic and Cite Google Scholar results to CSV and SQLite.
There is also a post on how to Scrape Google Scholar in R https://dimitryzub.medium.com/scrape-google-scholar-in-r-d521cfe0e8d, if you're not a Python fan.
Disclaimer: I work for SerpApi.