I’m trying to use the get_timeline() function for multiple users to build a network map. Previously, with an older version of rtweet (I think it was 1.0.0), after retrieving a few hundred tweets from each user’s timeline I could use the columns mentions_screen_name, retweet_screen_name, reply_to_screen_name and quoted_screen_name to create an edgelist, which I then converted to a network graph with igraph. It’s possible I’m doing something wrong, but with rtweet 1.0.2 the returned data structure appears to be different, and the code I used before no longer works.
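For reference, the edgelist approach described above can be sketched in base R. This is a minimal sketch under stated assumptions: it expects the older column layout, with a character column screen_name, a list column mentions_screen_name, and character columns retweet_screen_name, reply_to_screen_name and quoted_screen_name that are NA when a tweet has no such target. The function name old_columns_to_edgelist is hypothetical, not part of rtweet.

```r
# Hypothetical helper (not an rtweet function): build a from/to/type edge list
# from the older rtweet column layout. Assumes `tweets` has:
#   screen_name          - character, the tweet author
#   mentions_screen_name - list column, one character vector (or NA) per tweet
#   retweet_screen_name, reply_to_screen_name, quoted_screen_name
#                        - character columns, NA when there is no target
old_columns_to_edgelist <- function(tweets) {
  edge_block <- function(targets, type) {
    # `targets` is a list with one element per tweet; repeat the author
    # once for every target in that element
    n_per_tweet <- lengths(targets)
    data.frame(
      from = rep(tweets$screen_name, n_per_tweet),
      to = unlist(targets, use.names = FALSE),
      type = rep(type, sum(n_per_tweet)),
      stringsAsFactors = FALSE
    )
  }
  edges <- rbind(
    edge_block(tweets$mentions_screen_name, "mention"),
    edge_block(as.list(tweets$retweet_screen_name), "retweet"),
    edge_block(as.list(tweets$reply_to_screen_name), "reply"),
    edge_block(as.list(tweets$quoted_screen_name), "quote")
  )
  # Drop rows for tweets that had no target of a given type
  edges <- edges[!is.na(edges$to), , drop = FALSE]
  rownames(edges) <- NULL
  edges
}
```

The resulting data frame has the from/to columns first, so it could then be passed to igraph::graph_from_data_frame(edges) to obtain a directed graph, with type kept as an edge attribute.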
When I call network_data() on a test set of tweets retrieved with get_timeline(), I get the following error:
```
Error in vectbl_as_row_location():
! Must subset rows with a valid subscript vector.
Logical subscripts must match the size of the indexed input.
Input has size 100 but subscript !k has size 200.
Run rlang::last_error() to see where the error occurred.
```
Any advice on what I may be doing wrong, or on how I can get data for mentions_screen_name, retweet_screen_name, reply_to_screen_name and quoted_screen_name in order to build a network graph, would be much appreciated!
Okay, after spending some more time on the problem I realized that network_data() only seems to accept input with the same number of rows as the n used to collect it. So, for example, if we set n = 100 in
```r
# n applies per user, so two handles return up to 200 rows in total
test <- get_timeline(
  user = c("handle1", "handle2"),
  n = 100
)
```
then network_data() has to look something like:
```r
# Only the first 100 rows (the first chunk) are passed in
net_test <- network_data(test[1:100, ],
                         e = c("mention", "retweet", "reply", "quote"))
```
This works because it takes the first 100 tweets as one chunk (although for now it ignores the next 100).
So I believe a for loop that processes the rows in chunks would work for a large collection of tweets from multiple users.
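A minimal sketch of that chunked loop in base R. The helper edges_by_chunk and its build_edges argument are hypothetical; build_edges stands in for a per-chunk call such as network_data(chunk, e = c("mention", "retweet", "reply", "quote")). Note that the reply below attributes the error to a bug in get_timeline(), so chunks after the first may still fail in the same way; this only illustrates the loop pattern.

```r
# Hypothetical helper: apply `build_edges` to consecutive blocks of rows
# and stack the resulting edge lists into one data frame.
edges_by_chunk <- function(tweets, chunk_size, build_edges) {
  stopifnot(chunk_size >= 1)
  n <- nrow(tweets)
  if (n == 0) return(NULL)
  # Row groups: 1..chunk_size, chunk_size + 1..2 * chunk_size, ...
  groups <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  pieces <- vector("list", length(groups))
  for (i in seq_along(groups)) {
    # Pass one chunk of rows at a time to the edge-building function
    pieces[[i]] <- build_edges(tweets[groups[[i]], , drop = FALSE])
  }
  # Stack the per-chunk edge lists into a single data frame
  edges <- do.call(rbind, pieces)
  rownames(edges) <- NULL
  edges
}

# Hypothetical usage with the tweets collected above:
# edges <- edges_by_chunk(test, 100, function(chunk) {
#   network_data(chunk, e = c("mention", "retweet", "reply", "quote"))
# })
```

Repeated interactions are kept as separate rows rather than deduplicated, so they can later be counted as edge weights if needed.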
(I realize this might be a rather basic problem, so I can delete this thread if that would be better. But I can also leave it up in case it helps others trying to wrangle data for network analysis.)
I think your problem comes from a bug in get_timeline(): when querying multiple users, it doesn’t return the user information for all of them, so network_data() cannot build the network. This was reported in this question and has been fixed in the devel branch.
If you find other problems or improvements that would make it easier to work with rtweet, please do not hesitate to post them!