[fulltext] Error: Can't subset columns that don't exist.

Hi,

Apologies if the following question is too obvious. I have just started using this package, which looks very useful. I use the bibliometrix package for analysis, and I am trying to retrieve the full text of the documents included in the analysis. Here are the code and the error message. I would be grateful if you could help.

library("fulltext")
dois <- M$DI[1:10]
str(dois)
chr [1:10] "10.1109/TEM.2019.2914408" "10.1080/13511610.2020.1867519" ...
dois
[1] "10.1109/TEM.2019.2914408" "10.1080/13511610.2020.1867519"
[3] "10.1108/SEJ-11-2019-0081" "10.1080/13511610.2020.1870441"
[5] "10.1002/bse.2707" "10.3390/su13010415"
[7] "10.3390/en14010216" "10.15446/achsc.v48n1.91552"
[9] NA "10.1016/j.forpol.2020.102335"
res <- ft_abstract(x=dois)
Error: Can't subset columns that don't exist.
x Location 1 doesn't exist.
ℹ There are only 0 columns.
Run rlang::last_error() to see where the error occurred.
rlang::last_error()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Location 1 doesn't exist.
ℹ There are only 0 columns.
Backtrace:

  1. fulltext::ft_abstract(x = dois)
  2. fulltext:::ft_abstract.character(x = dois)
  3. fulltext:::plugin_abstract_plos(from, x, plosopts, ...)
  4. base::lapply(...)
  5. fulltext:::FUN(X[[i]], ...)
  6. tibble:::`[[.tbl_df`(...)
  7. tibble:::tbl_subset2(x, j = i, j_arg = substitute(i))
  8. tibble:::vectbl_as_col_location2(j, length(x), j_arg = j_arg)
  9. vctrs::vec_as_location2(j, n, names)
  10. vctrs:::vec_as_location2_result(...)
  11. vctrs::vec_as_location(i, n, names = names, arg = arg)
  12. vctrs:::stop_subscript_oob(...)
  13. vctrs:::stop_subscript(...)
    Run rlang::last_trace() to see the full context.

traceback()
31: stop(fallback)
30: rlang:::signal_abort(x)
29: cnd_signal(cnd)
28: (function (cnd)
{
cnd$subscript_arg <- j_arg
cnd$subscript_elt <- "column"
if (isTRUE(assign) && !isTRUE(cnd$subscript_action %in% c("negate"))) {
cnd$subscript_action <- "assign"
}
cnd_signal(cnd)
})(structure(list(message = "", trace = structure(list(calls = list(
fulltext::ft_abstract(x = dois), fulltext:::ft_abstract.character(x = dois),
fulltext:::plugin_abstract_plos(from, x, plosopts, ...),
base::lapply(ids, function(z) {
opts$x <- z
opts$callopts <- curlopts
list(doi = z, abstract = do.call(plos_abstract, opts))
}), fulltext:::FUN(X[[i]], ...), base::do.call(plos_abstract,
opts), (function (x, ...)
{
rplos::searchplos(q = paste0("id:", x), fl = "abstract",
...)$data[[1]]
})(x = "10.1109/TEM.2019.2914408", callopts = list()), rplos::searchplos(q = paste0("id:",
x), fl = "abstract", ...)$data[[1]], tibble:::`[[.tbl_df`(rplos::searchplos(q = paste0("id:",
x), fl = "abstract", ...)$data, 1), tibble:::tbl_subset2(x,
j = i, j_arg = substitute(i)), tibble:::vectbl_as_col_location2(j,
length(x), j_arg = j_arg), tibble:::subclass_col_index_errors(vec_as_location2(j,
n, names), j_arg = j_arg, assign = assign), base::withCallingHandlers(expr,
vctrs_error_subscript = function(cnd) {
cnd$subscript_arg <- j_arg
cnd$subscript_elt <- "column"
if (isTRUE(assign) && !isTRUE(cnd$subscript_action %in%
c("negate"))) {
cnd$subscript_action <- "assign"
}
cnd_signal(cnd)
}), vctrs::vec_as_location2(j, n, names), vctrs:::result_get(vec_as_location2_result(i,
n = n, names = names, negative = "error", missing = missing,
arg = arg)), vctrs:::vec_as_location2_result(i, n = n,
names = names, negative = "error", missing = missing,
arg = arg), base::tryCatch(vec_as_location(i, n, names = names,
arg = arg), vctrs_error_subscript_type = function(err) {
err <<- err
i
}), base:::tryCatchList(expr, classes, parentenv, handlers),
base:::tryCatchOne(expr, names, parentenv, handlers[[1L]]),
base:::doTryCatch(return(expr), name, parentenv, handler),
vctrs::vec_as_location(i, n, names = names, arg = arg), (function ()
stop_subscript_oob(i = i, subscript_type = subscript_type,
size = size, subscript_action = subscript_action, subscript_arg = subscript_arg))(),
vctrs:::stop_subscript_oob(i = i, subscript_type = subscript_type,
size = size, subscript_action = subscript_action, subscript_arg = subscript_arg),
vctrs:::stop_subscript(class = "vctrs_error_subscript_oob",
i = i, subscript_type = subscript_type, ...)), parents = c(0L,
0L, 2L, 3L, 4L, 5L, 5L, 7L, 7L, 9L, 10L, 11L, 12L, 11L, 14L,
14L, 16L, 17L, 18L, 19L, 16L, 21L, 22L, 23L), indices = 1:24), class = "rlang_trace", version = 1L),
parent = NULL, i = 1L, subscript_type = "numeric", size = 0L,
subscript_action = NULL, subscript_arg = ""), class = c("vctrs_error_subscript_oob",
"vctrs_error_subscript", "rlang_error", "error", "condition")))
27: signalCondition(cnd)
26: signal_abort(cnd)
25: abort(class = c(class, “vctrs_error_subscript”), i = i, …)
24: stop_subscript(class = "vctrs_error_subscript_oob", i = i, subscript_type = subscript_type,
...)
23: stop_subscript_oob(i = i, subscript_type = subscript_type, size = size,
subscript_action = subscript_action, subscript_arg = subscript_arg)
22: (function ()
stop_subscript_oob(i = i, subscript_type = subscript_type, size = size,
subscript_action = subscript_action, subscript_arg = subscript_arg))()
21: vec_as_location(i, n, names = names, arg = arg)
20: doTryCatch(return(expr), name, parentenv, handler)
19: tryCatchOne(expr, names, parentenv, handlers[[1L]])
18: tryCatchList(expr, classes, parentenv, handlers)
17: tryCatch(vec_as_location(i, n, names = names, arg = arg), vctrs_error_subscript_type = function(err) {
err <<- err
i
})
16: vec_as_location2_result(i, n = n, names = names, negative = "error",
missing = missing, arg = arg)
15: result_get(vec_as_location2_result(i, n = n, names = names, negative = "error",
missing = missing, arg = arg))
14: vec_as_location2(j, n, names)
13: withCallingHandlers(expr, vctrs_error_subscript = function(cnd) {
cnd$subscript_arg <- j_arg
cnd$subscript_elt <- "column"
if (isTRUE(assign) && !isTRUE(cnd$subscript_action %in% c("negate"))) {
cnd$subscript_action <- "assign"
}
cnd_signal(cnd)
})
12: subclass_col_index_errors(vec_as_location2(j, n, names), j_arg = j_arg,
assign = assign)
11: vectbl_as_col_location2(j, length(x), j_arg = j_arg)
10: tbl_subset2(x, j = i, j_arg = substitute(i))
9: `[[.tbl_df`(rplos::searchplos(q = paste0("id:", x), fl = "abstract",
...)$data, 1)
8: rplos::searchplos(q = paste0("id:", x), fl = "abstract", ...)$data[[1]]
7: (function (x, ...)
{
rplos::searchplos(q = paste0("id:", x), fl = "abstract",
...)$data[[1]]
})(x = "10.1109/TEM.2019.2914408", callopts = list())
6: do.call(plos_abstract, opts)
5: FUN(X[[i]], ...)
4: lapply(ids, function(z) {
opts$x <- z
opts$callopts <- curlopts
list(doi = z, abstract = do.call(plos_abstract, opts))
})
3: plugin_abstract_plos(from, x, plosopts, ...)
2: ft_abstract.character(x = dois)
1: ft_abstract(x = dois)
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] shiny_1.6.0 fulltext_1.7.0

loaded via a namespace (and not attached):
[1] httr_1.4.2 sass_0.3.1 jsonlite_1.7.2 bslib_0.2.4 assertthat_0.2.1
[6] triebeard_0.3.0 urltools_1.7.3 highr_0.8 rplos_0.9.0 yaml_2.2.1
[11] pillar_1.4.7 glue_1.4.2 digest_0.6.27 promises_1.2.0.1 microdemic_0.6.0
[16] colorspace_2.0-0 htmltools_0.5.1.1 httpuv_1.5.5 plyr_1.8.6 XML_3.99-0.5
[21] pkgconfig_2.0.3 httpcode_0.3.0 purrr_0.3.4 xtable_1.8-4 scales_1.1.1
[26] whisker_0.4 later_1.1.0.1 aRxiv_0.5.19 solrium_1.1.4 tibble_3.0.6
[31] generics_0.1.0 ggplot2_3.3.3 ellipsis_0.3.1 DT_0.17 cachem_1.0.4
[36] cli_2.3.0 magrittr_2.0.1 crayon_1.4.1 mime_0.10 evaluate_0.14
[41] storr_1.2.5 fansi_0.4.2 xml2_1.3.2 tools_4.0.3 data.table_1.13.6
[46] lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 compiler_4.0.3 jquerylib_0.1.3
[51] tinytex_0.29 rlang_0.4.10 grid_4.0.3 rstudioapi_0.13 rappdirs_0.3.3
[56] htmlwidgets_1.5.3 crosstalk_1.1.1 miniUI_0.1.1.1 rmarkdown_2.6 gtable_0.3.0
[61] rentrez_1.2.3 DBI_1.1.1 curl_4.3 reshape2_1.4.4 R6_2.5.0
[66] lubridate_1.7.9.2 knitr_1.31 dplyr_1.0.4 fastmap_1.1.0 utf8_1.1.4
[71] rcrossref_1.1.0 hoardr_0.5.2 stringi_1.5.3 crul_1.0.0 Rcpp_1.0.6
[76] vctrs_0.3.6 tidyselect_1.1.0 xfun_0.21

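Thanks for your question about fulltext.

First, the default data source that ft_abstract pulls from is PLOS (which maybe should be changed to crossref). As the traceback shows, the PLOS search for the IEEE DOI 10.1109/TEM.2019.2914408 comes back with 0 columns, and taking the first column of that empty result is what raises the error. For IEEE DOIs, you don't want to use PLOS as the data source. Try crossref, scopus, or semanticscholar instead.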

If we try semanticscholar (with the NA entry left out of the vector), we can get two abstracts:

library(fulltext)
x <- c('10.1109/TEM.2019.2914408', '10.1080/13511610.2020.1867519',
'10.1108/SEJ-11-2019-0081', '10.1080/13511610.2020.1870441',
'10.1002/bse.2707', '10.3390/su13010415',
'10.3390/en14010216', '10.15446/achsc.v48n1.91552',
'10.1016/j.forpol.2020.102335')
y <- ft_abstract(x, from = "semanticscholar")
lapply(y$semanticscholar, "[[", "abstract")
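The vector above was typed by hand without the NA entry. When the DOIs come straight from a bibliometrix data frame, missing values can be dropped before the call. A minimal sketch, assuming the DOI column is a plain character vector; the inline `dois` below is a stand-in for `M$DI`, and the de-duplication step is an optional extra:

```r
# Stand-in for M$DI (assumption: a character vector that may contain NA)
dois <- c("10.1109/TEM.2019.2914408", NA,
          "10.3390/su13010415", "10.3390/su13010415")

# Drop missing entries, then duplicates, before passing the vector
# to ft_abstract() or ft_get()
clean_dois <- unique(dois[!is.na(dois)])
print(clean_dois)
```

The cleaned vector can then be passed as `ft_abstract(clean_dois, from = "semanticscholar")`.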

There’s definitely a bug in ft_abstract(..., from = "plos") when PLOS has no record for a DOI, so I’ll fix that.
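To illustrate the kind of failure involved, here is a rough sketch of a guard around "take the first column of a search result". It is not the actual fix in fulltext; `first_column_or_na` is a hypothetical helper, and plain data frames stand in for the table that the PLOS search returns:

```r
# Hypothetical helper, not part of fulltext or rplos: return the first
# column of a result table, or NA when the table is missing or empty.
first_column_or_na <- function(df) {
  if (is.null(df) || ncol(df) == 0L || nrow(df) == 0L) {
    return(NA_character_)
  }
  df[[1]]
}

# Mimics the empty result from the error message ("There are only 0 columns")
empty_hit <- data.frame()

# Illustrative non-empty result with a single abstract column
plos_hit <- data.frame(abstract = "Some abstract text",
                       stringsAsFactors = FALSE)

print(first_column_or_na(empty_hit))
print(first_column_or_na(plos_hit))
```

With a guard like this, a DOI that a data source does not know about would yield a missing abstract rather than stopping the whole loop over DOIs.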

To get full text, you want to use ft_get(). All or most IEEE papers are paywalled, so you’ll need to have access to those papers. Even with my university VPN I’ve not had much success downloading IEEE papers through this package. I think IEEE likely blocks programmatic access (access not through a browser), but you may have success.

The issue with ft_abstract is now fixed. You can install the fix with remotes::install_github("ropensci/fulltext").