rtweet save_as_csv failing

I am trying to do some language, sentiment, and network analysis using rtweet, vader, and some other packages. I think the code my professor provided worked fine before rtweet 1.0, but now that the package has been upgraded it's not working as expected… or I have something wrong; I'm hoping someone can help.


twitter_user <- "nike"
total_tweets <- 100

## Scrape tweets for use later
raw_tweets <- rtweet::get_timeline(twitter_user, n = total_tweets, parse = TRUE, token = my_authorization)

## Create a CSV because twitter data creates a lot of objects
rtweet::save_as_csv(raw_tweets, "tweets.csv", prepend_ids = TRUE, na = "", fileEncoding = "UTF-8")

My code is designed to grab tweets from whatever account I want and the number of tweets I need. It creates the raw_tweets dataframe fine. Because the instructors gave us code, I intended to reuse it, but it's all contingent on reading from a CSV file rather than from the dataframe. When the save_as_csv function runs, it errors out and creates a single, partial row in a CSV file. The error is:

Error in utils::write.table(x, file_name, row.names = FALSE, na = na, :
unimplemented type ‘list’ in ‘EncodeElement’

The warnings in the console state:

Warning: save_as_csv() was deprecated in rtweet 1.0.0.
Only works on rtweet data before 1.0.0 version
Warning: write_as_csv() was deprecated in rtweet 1.0.0.
Only works on rtweet data before 1.0.0 version
Warning: flatten() was deprecated in rtweet 1.0.0.
Only works on rtweet data before 1.0.0 version
Warning: data frame still contains recursive columns!
Error in utils::write.table(x, file_name, row.names = FALSE, na = na, :
unimplemented type ‘list’ in ‘EncodeElement’

Of course, I’d MUCH prefer to get the information from the dataframe I create from the scraped tweets, but the objects within observations makes this much more complicated and I am not familiar enough with R to make it work.

Any help would be MUCH appreciated!

Yes, save_as_csv and load_as_csv do not work with data returned by rtweet since version 1.0. The instructions you have probably worked fine with rtweet 0.7.0 (the version before 1.0); you can read more about the changes in the post about rtweet I wrote to ease the transition.

If you want to save it and load it later you can use saveRDS(raw_tweets, "tweets.RDS") and load it back with raw_tweets <- readRDS("tweets.RDS").
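As a quick illustration with a stand-in data frame (the list-column below mimics what makes write.csv() and save_as_csv() fail, but survives the RDS round trip intact):

```r
# A small stand-in for raw_tweets: note the list-column, which is
# exactly the kind of recursive column that breaks write.csv()
raw_tweets <- data.frame(id_str = c("1", "2"), text = c("hello", "world"))
raw_tweets$hashtags <- list(c("rstats", "rtweet"), character(0))

saveRDS(raw_tweets, "tweets.RDS")   # serialize, list-columns and all
restored <- readRDS("tweets.RDS")   # read it back unchanged
identical(raw_tweets, restored)     # the round trip is lossless
```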

I don’t know the code your professor provided, so I cannot provide much guidance about other required changes. I’m sorry but you’ll need to write your own code or ask your professor to update the instructions with the latest version of rtweet. I can give some guidelines:

  1. user information has moved and is now accessible via users_data(), so all the code that depended on the text and user_id columns will need to change (also, user_id is now just “id”).

  2. There is a bug in get_timeline that might affect you if you require user data for more than one user. It is already fixed in the development branch on GitHub.

  3. Sentiment analysis should still work well, as you might just use the text of each tweet.

  4. Network analysis might require some tweaks, depending on how it was done: network_data()/network_graph() should work fine; if you used other functions or packages you might need to change some code. There are also some changes in get_followers().

  5. If you want something much simpler, and as a CSV, you could use:

    tweets <- raw_tweets[, c("id_str", "text")]
    users <- users_data(raw_tweets)[, "id_str", drop = FALSE]
    colnames(users) <- "user_id"
    simple_tweets <- cbind(tweets, users)
    write.csv(simple_tweets, "tweets.csv")
    
  6. I’m sorry I cannot help further; I hope you can take it from there and continue your analysis. With a bit of practice and time I’m sure you’ll be able to figure it out.


I appreciate the reply. I figured an old version of rtweet was likely being used on their setup and it's causing problems on my machine with the new rtweet. I have sort of managed to get through a few things and brute-forced some VADER analysis and topic determination, since it only depends on the full_text column and the vader_df function helps a lot.
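A minimal sketch of that step, assuming the vader package is installed and your tweet text lives in a full_text column (in some rtweet versions the column is just text); the example strings are made up:

```r
library(vader)

# Stand-in for raw_tweets$full_text
texts <- c("I love my new running shoes!", "This update is terrible.")

# vader_df() scores each string and returns one row per input,
# with the compound score plus pos/neu/neg proportions
scores <- vader_df(texts)
scores[, c("text", "compound")]
```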

The big issue I am having now is doing the network analysis and identifying edges and such, since the returned data from Twitter doesn't contain the originating tweet's user name (that I can find). I managed to use the following to find tweets mentioning a specific Twitter handle (in this case GarminFitness):

test <- search_tweets("@GarminFitness", n = 1000, include_rts = FALSE)

I get a good data frame back. Unfortunately, the code below (provided by the professor) works with a CSV that the Social Media Macroscope outputs… but the SMM for the university is beyond FUBAR at the moment because it's not scaled properly. When 1000 grad students hit it to do homework, it breaks… super hard. I think we killed it, given it's throwing 502 errors now. That being said, I was trying to adapt the code to use the test data frame above, but you need the original Twitter handle to get the “FROM” column to find edges and therefore degree centrality. That's the part I am struggling with now…

# load the packages this snippet relies on
library(dplyr)
library(stringr)
library(tidyr)
library(igraph)

# select only relevant columns
df <- df_original %>% select(user.screen_name, text)
View(df)

# extract mentions into a new column named "mentions"
df$mentions <- str_extract_all(df$text, "(?<=@)\\w+")
View(df)

# transform DataFrame so that it contains 1 row per mention
edges <- df %>% select(user.screen_name, mentions) %>% 
  rename(from = user.screen_name, to = mentions)
edges <- unnest(edges, cols = to)
view(edges)

# exclude any @Apple mentions since we've searched for @Apple
edges <- edges %>% filter(to != "Apple")
View(edges)

# create graph from the DataFrame
graph <- graph_from_data_frame(edges, directed = TRUE)

# calculate degree centrality (both in + out degrees)
deg <- degree(graph, mode = "total")

# select top 500
top500 <- sort(deg, decreasing = TRUE)[1:500]
show(top500)

# only select top 500 influencers
subgraph <- induced_subgraph(graph, names(top500))

# visualize
# note that the plot here is difficult to interpret
# you can filter smaller subset and label nodes
plot(
 subgraph,
 layout = layout_with_fr(subgraph),
 main = "mentions network graph of 500 nodes with highest degree centrality",
 vertex.size = 3,
 vertex.label = NA,
 edge.arrow.size = 0
)

Very glad you managed to find a way to get through.
Please read the linked post about the transition from rtweet 0.7.0 to 1.0; it will help if you find the code provided doesn't work. The returned data does contain the originating tweet's user name, but to access it you need to use users_data():

ud <- users_data(test)
head(ud)
## # A tibble: 6 × 23
##        id id_str      name  scree…¹ locat…² descr…³ url   prote…⁴ follo…⁵ frien…⁶ liste…⁷ creat…⁸ favou…⁹ verif…˟
##     <dbl> <chr>       <chr> <chr>   <chr>   <chr>   <chr> <lgl>     <int>   <int>   <int> <chr>     <int> <lgl>  
## 1 1.81e 7 18075604    Nige… nigelw… "Londo… "MD @g… http… FALSE     26440    6456    2055 Fri De…  147361 FALSE  
## 2 9.51e17 9509105211… Kena… KenanH… "Reykj… "Canad… NA    FALSE        37     212       0 Wed Ja…     438 FALSE  
## 3 2.49e 7 24933826    Shar… sharon… "Amste… "Consu… http… FALSE     17808    5809     631 Tue Ma…   38872 TRUE   
## 4 6.29e 7 62907089    Klau… knoede… "Erlen… ""      NA    FALSE       112     136      17 Tue Au…   52642 FALSE  
## 5 2.80e 7 27964251    Chri… rl_chr… "Newca… "Work:… NA    FALSE      1552    2278       0 Tue Ma…   20677 FALSE  
## 6 7.39e 7 73863343    Evya… evyata… ""      "Husba… http… FALSE       156     895       2 Sun Se…    3626 FALSE  
## # … with 9 more variables: statuses_count <int>, profile_image_url_https <chr>, profile_banner_url <chr>,
## #   default_profile <lgl>, default_profile_image <lgl>, withheld_in_countries <list>, derived <chr>,
## #   withheld_scope <lgl>, entities <list>, and abbreviated variable names ¹​screen_name, ²​location, ³​description,
## #   ⁴​protected, ⁵​followers_count, ⁶​friends_count, ⁷​listed_count, ⁸​created_at, ⁹​favourites_count, ˟​verified
## # ℹ Use `colnames()` to see all variable names

I’d also recommend you explore the usage of network_data() and network_graph():

nd <- network_data(test)
head(nd)
##       from                 to    type
## 1 18075604           24933826 mention
## 2 18075604           73863343 mention
## 3 18075604           42000992 mention
## 4 18075604 960515675415613440 mention
## 5 18075604         3595680016 mention
## 6 18075604         2187847450 mention
ng <- network_graph(test)
## IGRAPH 250186f DN-- 345 15172 -- 
## + attr: id (v/c), name (v/c), type (e/c)
## + edges from 250186f (vertex names):
##  [1] nigelwalsh->sharonodea      nigelwalsh->evyatar_amira   nigelwalsh->Gilad_Shai     
##  [4] nigelwalsh->magicveronique  nigelwalsh->MsRMerry        nigelwalsh->JoeCMerriman   
##  [7] nigelwalsh->rl_chris_higham nigelwalsh->JimMarous       nigelwalsh->jenny__watts   
## [10] nigelwalsh->goforsergei     nigelwalsh->andrewthornley  nigelwalsh->barbmaclean    
## [13] nigelwalsh->pdeepa          nigelwalsh->Jo_R_H          nigelwalsh->Broker_Brett   
## [16] nigelwalsh->InsuranceEleph1 nigelwalsh->chrisc99        nigelwalsh->NeilMit42376137
## [19] nigelwalsh->Katherine_Coach nigelwalsh->Paul_GreyKnight nigelwalsh->jeffroth77     
## [22] nigelwalsh->Marged          nigelwalsh->EmmaAnnJohnston nigelwalsh->edgaze         
## + ... omitted several edges
plot(ng)

You might want to use the screen name via the attributes of nd to obtain a data.frame for your edges:

id <- attr(nd, "idsn")
names(id)
## [1] "id" "sn"
nd$from_id <- id$sn[match(nd$from, id$id)]
head(nd)
##       from                 to    type    from_id
## 1 18075604           24933826 mention nigelwalsh
## 2 18075604           73863343 mention nigelwalsh
## 3 18075604           42000992 mention nigelwalsh
## 4 18075604 960515675415613440 mention nigelwalsh
## 5 18075604         3595680016 mention nigelwalsh
## 6 18075604         2187847450 mention nigelwalsh
nd$to_id <- id$sn[match(nd$to, id$id)]
head(nd)
##       from                 to    type    from_id          to_id
## 1 18075604           24933826 mention nigelwalsh     sharonodea
## 2 18075604           73863343 mention nigelwalsh  evyatar_amira
## 3 18075604           42000992 mention nigelwalsh     Gilad_Shai
## 4 18075604 960515675415613440 mention nigelwalsh magicveronique
## 5 18075604         3595680016 mention nigelwalsh       MsRMerry
## 6 18075604         2187847450 mention nigelwalsh   JoeCMerriman
library(dplyr)
edges <- nd |> 
     filter(to_id != "GarminFitness") |> 
     mutate(from = from_id, to = to_id) |> 
     select(-from_id, -to_id)

You might also want to filter to keep just the mentions (or directly use nd <- network_data(test, e = "mention")).
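The match() trick above can be checked in isolation with made-up data (all ids and screen names below are invented for illustration; the shapes mimic network_data() output and its "idsn" attribute):

```r
# Toy edge list shaped like network_data() output (ids are made up)
toy_nd <- data.frame(
  from = c("100", "100", "200"),
  to   = c("200", "300", "300"),
  type = "mention"
)

# Toy id -> screen-name lookup, shaped like attr(nd, "idsn")
toy_id <- list(
  id = c("100", "200", "300"),
  sn = c("alice", "bob", "carol")
)

# match() returns, for each id, its position in the lookup vector;
# that position then indexes the screen-name vector
toy_nd$from_sn <- toy_id$sn[match(toy_nd$from, toy_id$id)]
toy_nd$to_sn   <- toy_id$sn[match(toy_nd$to, toy_id$id)]
toy_nd
```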

Good luck with your course!
