Hi, I originally wrote a script that used fulltext v0.18, but with v1.0.0 the same code breaks.
An example of this can be seen when trying to reproduce the chunks example from the fulltext manual (https://ropensci.github.io/fulltext-book/chunks.html):
x <- ft_get('10.1371/journal.pone.0086169', from='plos')
^ this works, but when I run the next line:
x %>% ft_collect %>% ft_chunks(what="authors")
I get the following error:
Error in UseMethod("read_xml") : no applicable method for 'read_xml' applied to an object of class "NULL"
Any thoughts? I have reverted to v0.18 for now so that I can use my old code, but it would be nice to use the most recent package version.
traceback() gives the following information:
12: xml2::read_xml(q)
11: FUN(X[[i]], ...)
10: lapply(x[[i]]$data$data, function(q) {
        qparsed <- if (inherits(q, "xml_document"))
            q
        else xml2::read_xml(q)
        get_what(data = qparsed, what, names(x[i]))
    })
9: ft_chunks(., what = "authors")
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: x %>% ft_collect %>% ft_chunks(what = "authors")
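Reading the traceback: frame 10 loops over x[[i]]$data$data and hands anything that is not already an xml_document to xml2::read_xml(), so a NULL entry there is enough to produce this error. A minimal sketch for checking that before calling ft_chunks() could look like the following. It uses base R only; null_entries() is a hypothetical helper (not part of fulltext), and the mock list only imitates the nesting seen in the traceback, it is not a real ft_get() result.

```r
# Hypothetical helper, not part of fulltext: for each source (e.g. "plos"),
# return the IDs whose collected text is still NULL. Those are the entries
# that would be passed to xml2::read_xml() as NULL.
null_entries <- function(x) {
  out <- lapply(x, function(src) {
    d <- src$data$data
    names(d)[vapply(d, is.null, logical(1))]
  })
  # Drop sources that have no NULL entries
  Filter(length, out)
}

# Mock object that only mirrors the nesting x$<source>$data$data (assumption)
mock <- list(
  plos = list(data = list(data = list("10.1371/journal.pone.0086169" = NULL)))
)

print(null_entries(mock))
```

If this returns anything for an object that has already been passed through ft_collect(), the text was not read in, and ft_chunks() would be expected to fail in the same way.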
sckott
February 5, 2018, 11:39pm
3
Thanks @maxfarrell
Can you share your sessionInfo()?
sckott
February 5, 2018, 11:45pm
4
It's possible ft_collect() isn't working. Can you make sure that function is working? E.g.,
x <- ft_get('10.1371/journal.pone.0086169', from='plos')
x$plos$data$data # this should be NULL
x <- ft_collect(x)
x$plos$data$data # this should have the text of the article
Do you get the same?
x$plos$data$data returns NULL when run after ft_get, as it is supposed to, but it also returns
$`10.1371/journal.pone.0086169`
NULL
after passing it through ft_collect.
Here is my sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fulltext_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 pillar_1.1.0 compiler_3.4.3 plyr_1.8.4
[5] bindr_0.1 tools_3.4.3 digest_0.6.14 lubridate_1.7.1
[9] gtable_0.2.0 jsonlite_1.5 tibble_1.4.1 rcrossref_0.8.0
[13] aRxiv_0.5.16 pkgconfig_2.0.1 rlang_0.1.6 bibtex_0.4.2
[17] shiny_1.0.5 crul_0.5.0 curl_3.1 bindrcpp_0.2
[21] storr_1.1.3 dplyr_0.7.4 httr_1.3.1 stringr_1.2.0
[25] xml2_1.2.0 rappdirs_0.3.1 grid_3.4.3 glue_1.2.0
[29] R6_2.2.2 rentrez_1.1.0 XML_3.98-1.9 solrium_1.0.0
[33] hoardr_0.2.0 whisker_0.3-2 reshape2_1.4.3 ggplot2_2.2.1
[37] magrittr_1.5 scales_0.5.0 rplos_0.8.0 htmltools_0.3.6
[41] microdemic_0.2.0 assertthat_0.2.0 colorspace_1.3-2 mime_0.5
[45] xtable_1.8-2 httpuv_1.3.5 stringi_1.1.6 miniUI_0.1.1
[49] lazyeval_0.2.1 munsell_0.4.3
sckott
February 6, 2018, 7:42pm
6
Thanks. Can you paste in the output of x$plos after running x <- ft_get('10.1371/journal.pone.0086169', from='plos')? It should show a file path to the file on your machine.
(And can you try to put code in code blocks? See https://superuser.com/editing-help for help.)
x$plos
$found
[1] 1
$dois
[1] "10.1371/journal.pone.0086169"
$data
$data$backend
[1] "ext"
$data$cache_path
[1] "/home/max/.cache/R/fulltext"
$data$path
$data$path$`10.1371/journal.pone.0086169`
$data$path$`10.1371/journal.pone.0086169`$path
[1] "/home/max/.cache/R/fulltext/10_1371_journal_pone_0086169.xml"
$data$path$`10.1371/journal.pone.0086169`$id
[1] "10.1371/journal.pone.0086169"
$data$path$`10.1371/journal.pone.0086169`$type
[1] "xml"
$data$path$`10.1371/journal.pone.0086169`$error
NULL
$data$data
$data$data$`10.1371/journal.pone.0086169`
NULL
$opts
$opts$doi
[1] "10.1371/journal.pone.0086169"
$opts$type
[1] "xml"
sckott
February 6, 2018, 9:52pm
8
Thanks. That worked as expected. Okay, now do exactly this, and paste in the output of x$plos:
x <- ft_get('10.1371/journal.pone.0086169', from='plos')
x <- ft_collect(x)
x$plos
x <- ft_get('10.1371/journal.pone.0086169', from='plos')
x <- ft_collect(x)
x$plos
$found
[1] 1
$dois
[1] "10.1371/journal.pone.0086169"
$data
$data$backend
[1] "ext"
$data$cache_path
[1] "/home/max/.cache/R/fulltext"
$data$path
$data$path$`10.1371/journal.pone.0086169`
$data$path$`10.1371/journal.pone.0086169`$path
[1] "/home/max/.cache/R/fulltext/10_1371_journal_pone_0086169.xml"
$data$path$`10.1371/journal.pone.0086169`$id
[1] "10.1371/journal.pone.0086169"
$data$path$`10.1371/journal.pone.0086169`$type
[1] "xml"
$data$path$`10.1371/journal.pone.0086169`$error
NULL
$data$data
$data$data$`10.1371/journal.pone.0086169`
NULL
$opts
$opts$doi
[1] "10.1371/journal.pone.0086169"
$opts$type
[1] "xml"
sckott
February 7, 2018, 3:08pm
10
I think I've figured it out. It was a tiny bug, but it had a big effect.
Reinstall with devtools::install_github("ropensci/fulltext"), remember to restart the R session, then try again. Let me know if it works or not.
Seems to have worked!
$data is no longer NULL - thanks for the quick work on this, I’ll update my code to work with v1.0 and let you know how it goes!
sckott
February 7, 2018, 5:52pm
12
Great, glad it worked. Will push this to CRAN soon so everyone has the fix.