do you specifically want the articles in the ft_data object, or is that only so you can use ft_chunks?
I’ve been working on https://github.com/ropensci/pubchunks - it pulls ft_chunks and the related tools out of fulltext so they can be used on their own (fulltext will still use them as well)
it’s not on CRAN yet, but please do try it. here’s an example:
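# install the development version from GitHub (needs the remotes package)
remotes::install_github("ropensci/pubchunks")
library(pubchunks)

# two example XML articles bundled with the package
x <- system.file("examples/10_1016_0021_8928_59_90156_x.xml",
  package = "pubchunks")
y <- system.file("examples/10_1016_s1569_1993_15_30039_4.xml",
  package = "pubchunks")
z <- list(x, y)

# pull a single section from each article
pub_chunks(z, "abstract")
pub_chunks(z, "title")
pub_chunks(z, "authors")

# or several sections at once
pub_chunks(z, c("abstract", "title", "refs"))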
the general idea here follows from what ft_chunks was doing, but the aim is to make it more general and to add parsers for more publishers and section types