Search_tweets doesn't include user_id? Github v 0.7.0.9024

ADrugResearcher · May 7, 2022, 3:03am

I started using the github repo version of rtweet (0.7.0.9024) after a suggestion by Lluis. 1st issue I’ve run into is that search_tweets() doesn’t seem to include user_id for the person who posted it (such as account ID or even screen_name).

Since the tweets data and the users data both have an “id_str” column, my initial assumption was that this was a way of connecting the two data frames, but the columns refer to the tweet ID & user ID respectively.

I feel like there’s something I’m missing here, so figured I’d post here b4 creating an issue on github.

library(rtweet)
library(tidyverse)
#Authenticate (auth_setup_default() sets up and saves the default user credentials)
auth_setup_default()

#Test run of my personal acct
test <- search_tweets2(q = "@ADrugResearcher", n=5)

#ss1 Retrieved 466 tweets
ss1 <- search_tweets(q = '"safe supply"', n = 600,
                    include_rts = TRUE, 
                    retryonratelimit = TRUE)

#Remove RT's (I need them for other purposes)
snrt <- ss1 %>%
  filter(!str_detect(full_text, "RT "))
#snrt returns 214 rows, 39 columns

u_data <- users_data(snrt)
#users_data returns 465 rows?

I have elevated access if that makes a difference (seems like search_tweets still runs on v1 though?)

The other idea that I had was to try to include author ID as an additional parameter

ss2 <- search_tweets(q = '"safe supply"', n = 100,
                     include_rts = TRUE, 
                     retryonratelimit = TRUE,
                     env_name="author_id")
#or the same thing, but instead of env_name, using
ss2 <- search_tweets(q = '"safe supply"', n = 100,
                     include_rts = TRUE, 
                     retryonratelimit = TRUE,
                     tweet.fields="author_id")

But then I get,

Warning message:
Could not authenticate you. (32)

Which I think is due to rtweet using v1, but unsure

Sorry if this is convoluted, I’m primarily a qualitative researcher & thanks for any help!

llrs · May 7, 2022, 8:11am

Depending on the endpoint, id_str is either the status id or the user's identifier. If the request is for tweets, the id_str is what used to be status_id, and the user_id is the id_str of the users_data() of those tweets. If the request is for data about users, the id_str is the user_id, and the id_str of tweets_data() is the status_id. These ids are not meant to match users and tweets.

user <- lookup_users("ADrugResearcher")
t_data <- tweets_data(user)
# t_data$id_str is the id of the latest status.
# user$id_str is the id of the user.

When you filter ss1, the users_data attached to that data.frame is not filtered along with it. I recommend doing something like:

rt <- str_detect(ss1$full_text, "RT ")
snrt <- ss1[!rt, ]
u_data_snrt <- users_data(ss1)[!rt, ]
attr(snrt, "users") <- u_data_snrt
u_data <- users_data(snrt)
stopifnot(nrow(u_data)  == nrow(snrt))
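
Once the tweets and their users are filtered with the same logical vector, the author's id and screen name can be placed next to each tweet by position. Below is a minimal sketch of that idea, using toy data frames in place of real search_tweets() output so that it runs without authentication. It assumes the users table has one row per tweet in the same order, which is what the subsetting above relies on. The column names status_id and user_id in the result are chosen here for illustration, and the pattern "^RT " (anchored at the start of the text) is a variation on the "RT " pattern used above.

```r
# Toy stand-ins for search_tweets() output; this is not real Twitter data.
tweets <- data.frame(
  id_str = c("101", "102", "103"),
  full_text = c("RT @a: safe supply news",
                "Safe supply works",
                "RT @b: more on safe supply"),
  stringsAsFactors = FALSE
)
users <- data.frame(
  id_str = c("9001", "9002", "9003"),
  screen_name = c("user_a", "user_b", "user_c"),
  stringsAsFactors = FALSE
)
# Assumption: users are stored as a "users" attribute, row-aligned with tweets.
attr(tweets, "users") <- users

# Flag retweets once and reuse the same vector for both tables.
is_rt <- grepl("^RT ", tweets$full_text)
kept_tweets <- tweets[!is_rt, , drop = FALSE]
kept_users <- attr(tweets, "users")[!is_rt, , drop = FALSE]
stopifnot(nrow(kept_tweets) == nrow(kept_users))

# Combine by position, renaming the two id_str columns so they do not collide.
with_author <- data.frame(
  status_id = kept_tweets$id_str,
  user_id = kept_users$id_str,
  screen_name = kept_users$screen_name,
  full_text = kept_tweets$full_text,
  stringsAsFactors = FALSE
)
print(with_author)
```

With real data, the same steps would apply to ss1 and users_data(ss1), provided both have the same number of rows.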

Yes, rtweet (still) uses v1 for all the requests.

The arguments tweet.fields and env_name are not Twitter API (v1) parameters. This is probably why the request fails to validate/authenticate. Please read the Twitter documentation for the endpoint (now included under References at the end of the help page).

I hope this helps,
I’ll try to improve this to make it easier to subset and filter using the whole data returned.
