I originally worked out a solution based on Yuva Raj's (https://stackoverflow.com/users/2135811/yuva-raj) suggestion (https://stackoverflow.com/questions/22469713/managing-tweepy-api-search#comment34277304_22473254) to use additional parameters in GET search/tweets (https://dev.twitter.com/docs/api/1.1/get/search/tweets): the max_id parameter, set from the id of the last tweet returned in each iteration of a loop that also checks for a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see the tweepy Cursor tutorial https://github.com/tweepy/tweepy/blob/master/docs/cursor_tutorial.rst for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
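Update: in response to Andrei Petre's (https://stackoverflow.com/users/1207494/andrei-petre) comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution. It replaces the single-statement list comprehension used above to compute searched_tweets with the following: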
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
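
If memory is the concern, one possible variation is to wrap the same max_id paging loop in a generator, so tweets can be handled one at a time instead of being collected in a list. The sketch below is only an illustration under some assumptions: iter_tweets, fake_search and FakeTweet are hypothetical names that are not part of tweepy, the search callable is assumed to accept q, count and max_id keyword arguments the way api.search does above, and it is assumed to return no more than count results per call. fake_search stands in for the real endpoint so the paging logic can be run without credentials.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a tweepy Status; only `id` is used here."""
    id: int


def iter_tweets(search: Callable[..., List], query: str, max_tweets: int) -> Iterator:
    """Yield up to max_tweets results, newest first, paging backwards with max_id.

    Assumes `search` never returns more than `count` results per call.
    """
    fetched = 0
    last_id: Optional[int] = None  # no upper bound on the first request
    while fetched < max_tweets:
        kwargs = {"q": query, "count": max_tweets - fetched}
        if last_id is not None:
            # max_id is inclusive, so subtract 1 to skip the tweet already seen
            kwargs["max_id"] = str(last_id - 1)
        page = search(**kwargs)
        if not page:
            break  # nothing older is available
        for tweet in page:
            yield tweet
        fetched += len(page)
        last_id = page[-1].id


def fake_search(q, count, max_id=None):
    """Hypothetical search over ids 1..250, capped at 100 results per call."""
    upper = 250 if max_id is None else min(250, int(max_id))
    lower = max(upper - min(count, 100), 0)
    return [FakeTweet(i) for i in range(upper, lower, -1)]


ids = [t.id for t in iter_tweets(fake_search, "python", 120)]
```

With a real client, the call would presumably look like iter_tweets(api.search, query, max_tweets). Error handling is left out of the sketch, so a tweepy.TweepError raised by the search call would propagate to the caller, who can then decide whether to retry, wait, or give up.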