rtweet get_timeline() no longer showing user_id or screen_name

I am looking to pull a set of Tweets for multiple users with the rtweet package. When I used the get_timeline() function previously, the output included the user_id and screen_name of the account. This is particularly important when pulling Tweets for multiple users, so that you can identify who each Tweet belongs to.

However, since the recent update, this function no longer returns any identifying user information. We now get id and id_str, which are unique to the Tweet but don’t tell us whose timeline we are pulling from. Any idea how to reattach a screen_name to the output, or another way to identify whose Tweet each row is?

accounts <- c("BarackObama", "justinbieber")
timelines <- get_timeline(accounts, n = 100, token = auth, retryonratelimit = TRUE)
id              id_str
1.436416e+18    1436416270426050566
1.437194e+18    1437194052122906626

The current version of rtweet does pull the identifying user information. It might be useful to read the post I wrote about the package update: rOpenSci | Upgrading rtweet.

In short, you must use users_data(timelines) to access the user information from endpoints that return tweet data. You can reattach it with:

accounts <- c("BarackObama", "justinbieber")
timelines <- get_timeline(accounts, n = 100, token = auth, retryonratelimit = TRUE)
users_timelines <- users_data(timelines)
cbind(timelines, users_timelines[, c("id", "id_str", "name", "screen_name")])

In a future version, I’ll try to make it easier to find and relate this information.


Thanks for taking a look at this. We are getting closer. However, with the code you provided, users_data(timelines) only returned information for BarackObama, and cbind then appended his information to every row, even though half of the tweets were from Barack Obama and half were from Justin Bieber. The main issue seems to be that users_data only returns user information for one account (rather than for both accounts that make up timelines).


Oh, sorry, this is a bug in rtweet: users’ data is dropped by get_timeline when multiple users are provided. I opened an issue in the repository to fix it: get_timeline for multiple users do not have all the users_data · Issue #723 · ropensci/rtweet · GitHub.

To avoid this problem until I fix it in the next release, I recommend requesting the timeline of one user at a time:

timeline_bo <- get_timeline("BarackObama", n = 100, token = auth, retryonratelimit = TRUE)
timeline_jb <- get_timeline("justinbieber", n = 100, token = auth, retryonratelimit = TRUE)

bo <- cbind(timeline_bo, users_data(timeline_bo)[, c("id", "id_str", "name", "screen_name")])
jb <- cbind(timeline_jb, users_data(timeline_jb)[, c("id", "id_str", "name", "screen_name")])
timeline_with_users <- rbind(bo, jb)
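Because cbind() on data frames silently recycles a shorter argument, the bug described above (one account's user data spread over every row) does not raise any error. A minimal sketch of a guard, assuming users_data() is meant to return one row per tweet, as the cbind() calls in this thread rely on; bind_users() is a hypothetical helper, not part of rtweet:

```r
# Hypothetical helper: attach user columns to tweets only when the rows line up.
bind_users <- function(tweets, users,
                       cols = c("id", "id_str", "name", "screen_name")) {
  # cbind() would silently recycle a shorter users table, so check first
  if (nrow(users) != nrow(tweets)) {
    stop("users has ", nrow(users), " rows but tweets has ", nrow(tweets),
         " rows; refusing to recycle user data")
  }
  cbind(tweets, users[, cols, drop = FALSE])
}

# Usage with rtweet (requires authentication):
# bo <- bind_users(timeline_bo, users_data(timeline_bo))
```

With the buggy multi-user output, this stops with an error instead of quietly labelling every tweet with the same account.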

I fixed this in the development version of rtweet, 1.0.2.9004 (on the devel branch on GitHub). The next release shouldn’t have this problem.


Thanks for the update! I downloaded the development version of rtweet from GitHub (though it still reported version 1.0.2), and it still had the same issue. Is there a way for me to access this update?

The current development version of rtweet is on GitHub, on the devel branch. You probably installed the version from the default branch.

You can install the latest version with: remotes::install_github("ropensci/rtweet@devel")


Thanks. I tried to install with remotes::install_github("ropensci/rtweet@devel"). The installation was successful, but screen_name is still not loaded when I use get_timeline.

I also noticed that ?get_timeline no longer gives me the help file for the get_timeline function. I don’t know whether that is a significant clue to why get_timeline still does not load screen_name.

Dear cortexR,

The screen_name is expected to be hidden; see this blog post for more background information. Did you use users_data(timeline) and not get that column? Could you provide a reproducible example, along with the version of the package you are using?

The development version might have problems and errors in it. However, the help page of get_timeline was accessible for me with ?get_timeline.

Thanks a lot. That blog post explains it. Instead of using get_timeline alone, I just need to change my code to use users_data() and tweets_data() and then bind them with cbind to get screen_name into the data about the users' latest tweets.

PS: There seems to be a minor error in the blog post. Instead of users_and_last_tweets <- cbind(users, id_str = tweets_data(users)[, "id_str"]), it should be users_and_last_tweets <- cbind(users, tweets_data(users), id_str = tweets_data(users)[, "id_str"]) (i.e., the second data set to bind was missing).

@cortexR The data about the users is already in the output of get_timeline; users_data only retrieves the attribute where it is “hidden”.

It is not an error; it is just an example of where and how to “add” data. I decided that functions retrieving user data display only user data, and functions searching tweets display only tweet data. Packages and users may or may not need the other data, and may prefer it one way or the other.


I am following up on this issue. Comparing the dev version with 0.7.0, I see significant changes in what get_timeline returns for a vector of users (I request data for many users). I find the cbind method inefficient. As the attached screenshot shows, col_070 lists the columns returned by get_timeline in version 0.7.0, and col_new_version lists those returned by the latest development version, 1.0.2.9004.

I still need access to columns like “screen_name”, “status_url”, “query” and “mentions_screen_name” to maintain code developed on version 0.7.0. How can I access this data with get_timeline() or any other function?

@ggzoe You might have missed this introduction to the changes in rtweet between 0.7.0 and 1.0.2. All the information is still there; you just need to call users_data(), although some columns might have changed position and/or name. For example, rtweet no longer provides status_url, but you can create it simply by building https://twitter.com/<screen_name>/status/<id_str>. Mentioned users are part of the entities, inside user_mentions (something like tw$entities[[1]]$user_mentions might help retrieve them).

While checking this, I changed users_data in 1.0.2.9012 to return the data in the same order for all functions, which improves the column binding of the output of rtweet functions. However, if you could explain why the cbind method is inefficient for your case, it might help me understand your issue better. (But it might be better to just open a new question in the forum.)
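The status_url reconstruction and mention extraction described above can be sketched as follows. This assumes a data frame whose id_str column holds the tweet ID and whose screen_name column holds the author, and it assumes each element of entities holds a user_mentions data frame with a screen_name column, as the tw$entities[[1]]$user_mentions hint suggests. add_status_url() and mention_names() are hypothetical helpers:

```r
# Hypothetical helper: rebuild the 0.7.0-style status_url column.
add_status_url <- function(df) {
  df$status_url <- paste0("https://twitter.com/", df$screen_name,
                          "/status/", df$id_str)
  df
}

# Hypothetical helper: one character vector of mentioned screen names per tweet.
# Assumes each element of `entities` contains a `user_mentions` data frame.
mention_names <- function(entities) {
  lapply(entities, function(e) e$user_mentions$screen_name)
}

tw <- data.frame(screen_name = "BarackObama", id_str = "1436416270426050566",
                 stringsAsFactors = FALSE)
add_status_url(tw)$status_url
# "https://twitter.com/BarackObama/status/1436416270426050566"
```

On real rtweet output, tw$mentions_screen_name <- mention_names(tw$entities) would give a list column similar in spirit to the old mentions_screen_name, provided the entities structure matches the assumption above.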

Thank you, this is helpful. But I have an additional problem: I want to use this code in a for-loop to get the timelines for over 1000 politicians with their screen names attached in a single data frame. I tried it with a sample of three, but I’m a beginner in R, and after searching for a while, I didn’t find an appropriate solution. Can you help me out?

Hi Lucas (@TheLucasSchwarz), I might be able to help, but I don’t understand your question.
When you say “I tried it”, what is “it”? Extracting the user_mentions? Having the tweets and the screen names of the politicians in a single data.frame, or the screen names of the original authors? Or something different?

If you tried to extract the user mentions of a tweet: yesterday I created a new helper in rtweet 1.0.2.9014, simply called user_mentions, that extracts all user mentions together with the tweet they come from. Note that a single tweet may mention several users, so some tweet ids might be duplicated, and that tweets without mentions still appear, but with NA in the other columns.

Here is some code to extract the users mentioned in the tweets of the feed of some politicians:

remotes::install_github("ropensci/rtweet@devel")
library("rtweet")
packageVersion("rtweet") >= "1.0.2.9014" # Should return TRUE
users <- c("my_politician1", "my_politician2", "my_politician3")
timeline_politicians <- get_timeline(users)
tweet_id_and_users_mentioned <- user_mentions(timeline_politicians)

I hope this helps

Hi @llrs, thank you for your answer. Sorry, my question was really unclear. I have an existing database of all German candidates for the 2021 Federal Election with their screen names. My question referred to this code:

timeline_bo <- get_timeline("BarackObama", n = 100, token = auth, retryonratelimit = TRUE)
timeline_jb <- get_timeline("justinbieber", n = 100, token = auth, retryonratelimit = TRUE)

bo <- cbind(timeline_bo, users_data(timeline_bo)[, c("id", "id_str", "name", "screen_name")])
jb <- cbind(timeline_jb, users_data(timeline_jb)[, c("id", "id_str", "name", "screen_name")])
timeline_with_users <- rbind(bo, jb)

In this example, I would have to write an extra line for each of the 2500 politicians. So I thought I could use a for loop to solve this problem, and I created a vector called accounts with all the screen names from the data frame.

In the meantime, I created this amateurish approach with an example of three politicians. It works fine.

# Create account list
accounts <- c("ABaerbockArchiv", "OlafScholz")

# Create master_timeline with the first account
timeline_al <- get_timeline("ArminLaschet", n = 100, retryonratelimit = TRUE)
master_timeline <- cbind(timeline_al, users_data(timeline_al)[, c("id", "id_str", "name", "screen_name")])

# For loop for the rest of the timelines
for (account in accounts) {
  timeline <- get_timeline(account, n = 100, retryonratelimit = TRUE)
  full_timelines <- cbind(timeline, users_data(timeline)[, c("id", "id_str", "name", "screen_name")])
  master_timeline <- rbind(master_timeline, full_timelines)
}
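If you prefer one request per account (for example, on a version still affected by the bug above), an alternative to growing master_timeline inside the loop is to build one data frame per account in a list and bind them once at the end, which also removes the special case for the first account. A sketch, assuming rtweet's get_timeline() and users_data() behave as used in this thread; fetch_with_users() and combine_timelines() are hypothetical helpers, and the fetch argument exists only so the combining step can be tried without API access. The user's id_str is renamed to user_id to avoid two columns with the same name:

```r
# Hypothetical helper: fetch one account's timeline and attach its user columns.
# Calls the Twitter API through rtweet (authentication needed) when used.
fetch_with_users <- function(account, n = 100) {
  tl <- rtweet::get_timeline(account, n = n, retryonratelimit = TRUE)
  users <- rtweet::users_data(tl)[, c("id_str", "name", "screen_name")]
  # Rename the user's id_str so it does not clash with the tweet's id_str
  names(users)[names(users) == "id_str"] <- "user_id"
  cbind(tl, users)
}

# Hypothetical helper: one data frame per account, then a single rbind().
combine_timelines <- function(accounts, fetch = fetch_with_users) {
  pieces <- lapply(accounts, fetch)
  do.call(rbind, pieces)
}

# Usage (hits the Twitter API):
# accounts <- c("ArminLaschet", "ABaerbockArchiv", "OlafScholz")
# master_timeline <- combine_timelines(accounts)
```

Binding once avoids copying the growing master_timeline on every iteration, which matters more as the number of accounts approaches the thousands.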

Do you think this will still work with more than 2500 accounts, and when I collect all tweets for each account to cover the election campaign period from August to September 2021? Does the retryonratelimit = TRUE option solve the issue with overly large queries? I have problems with the 1024-character cap for a single query in the academictwitteR package.

Thank you for your help!

Hi @TheLucasSchwarz, this would simplify your code and be equivalent to your looped code:

packageVersion("rtweet") >= "1.0.2.9014"
accounts <- c("ArminLaschet", "ABaerbockArchiv", "OlafScholz")
timeline <- get_timeline(accounts, retryonratelimit = TRUE)
timeline_users <- cbind(timeline, users_data(timeline)[, c("id", "id_str", "name", "screen_name")])

retryonratelimit = TRUE is meant to try again after you have used up the allocated number of calls (1500 for this endpoint, if I remember correctly); before trying again, it waits 15 minutes. So it would take around 20 minutes to download this amount of information from 2500 accounts (if there isn’t any other limit). So, yes, this will work to retrieve the latest 100 tweets of 2500 accounts.
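The estimate above can be checked with a little arithmetic. This sketch assumes one request per account (n = 100 fits in a single request), the 1500 requests per 15-minute window recalled above, and, for the "all tweets" case, the Twitter v1.1 timeline limits of 200 tweets per request and 3,200 tweets per timeline. estimate_wait_minutes() is a hypothetical helper that counts only waiting time, not download time:

```r
# Hypothetical helper: minutes spent waiting for rate-limit resets.
estimate_wait_minutes <- function(n_requests, limit = 1500, window_min = 15) {
  windows <- ceiling(n_requests / limit)  # number of rate-limit windows needed
  (windows - 1) * window_min              # no wait before the first window
}

n_accounts <- 2500
estimate_wait_minutes(n_accounts)    # latest 100 tweets, 1 request each: 15

requests_all <- n_accounts * ceiling(3200 / 200)  # up to 16 requests per account
estimate_wait_minutes(requests_all)  # 390 minutes, about 6.5 hours
```

Fifteen minutes of waiting plus the time for the requests themselves is consistent with the ~20 minutes mentioned above; collecting full timelines instead would take hours under the same assumptions.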

Given that more than a year has already passed since the period you want data from, I doubt you’ll get the tweets from August to September 2021 among these 100 tweets. You could specify the since_id and max_id arguments to set the first and last tweet you want to get, but you would need to know those IDs in advance (which I don’t think you do, or can easily find out).
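If the campaign tweets do end up among those downloaded (the timeline endpoint only reaches a user's most recent tweets), the period can be selected afterwards. A minimal sketch, assuming created_at is a POSIXct column as in rtweet's tweet data; filter_period() is a hypothetical helper, and its default dates are simply the campaign months from the question:

```r
# Hypothetical helper: keep tweets from `from` (inclusive) to `to` (exclusive), UTC.
filter_period <- function(tweets, from = "2021-08-01", to = "2021-10-01") {
  start <- as.POSIXct(from, tz = "UTC")
  end <- as.POSIXct(to, tz = "UTC")
  keep <- tweets$created_at >= start & tweets$created_at < end
  # which() also drops rows whose created_at is missing
  tweets[which(keep), , drop = FALSE]
}

# Usage: campaign <- filter_period(timeline_users)
```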

In academictwitteR, you can probably split the requests into batches of ~150 accounts (a number that should stay within the character limit) and then join the results together. This would avoid the problem of an overly long request, but you might hit other limits, such as too many requests.
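Since the 1024-character cap applies to characters rather than accounts, how many accounts fit in one request depends on the length of the screen names and on how the query is written. A sketch that packs accounts greedily so each batch's query stays within the cap, assuming the query takes the form "from:name1 OR from:name2 ..."; academictwitteR may build its query differently, so check the string it actually sends. chunk_by_query_length() is a hypothetical helper:

```r
# Hypothetical helper: split accounts into batches whose
# "from:a OR from:b ..." query stays within max_chars characters.
chunk_by_query_length <- function(accounts, max_chars = 1024) {
  chunks <- list()
  current <- character(0)
  for (acc in accounts) {
    candidate <- c(current, acc)
    query <- paste0("from:", candidate, collapse = " OR ")
    if (length(current) > 0 && nchar(query) > max_chars) {
      # Adding this account would exceed the cap: close the current batch
      chunks[[length(chunks) + 1]] <- current
      current <- acc
    } else {
      # A single over-long account still gets a batch of its own
      current <- candidate
    }
  }
  if (length(current) > 0) chunks[[length(chunks) + 1]] <- current
  chunks
}

# Usage: batches <- chunk_by_query_length(accounts)
# then request each batch separately and join the results with do.call(rbind, ...)
```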

Good luck with your analysis whatever route you use.


Many thanks to all the people who participated in this chat. As the original problem was solved, we will close this discussion.

Details of the solution can be found in this thread and in this blog post: rOpenSci | Upgrading rtweet

If you have other questions about this (or another) package, please start a new question/topic on the forum.
