September 2018
Thanks Maëlle for your blog post! It reminds me of two approaches built into pandoc.
The first one is to use this pandoc Lua filter: https://pandoc.org/lua-filters.html#extracting-information-about-links
After merging all the md files into a single all_files.md, one can run a command like this:
rmarkdown::pandoc_convert("all_files.md", to = "markdown", output = "count.md", options = "--lua-filter=count_links.lua")
where count_links.lua contains the referenced Lua script.
The second approach is to get a JSON version of the pandoc AST:
rmarkdown::pandoc_convert("all_files.md", to = "json", output = "allfiles_ast.json")
The result is close to the XML version produced by commonmark.
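As a rough illustration of the JSON route, here is a minimal R sketch that walks the AST and collects link targets. It assumes the jsonlite package (not mentioned above) and pandoc's usual element layout, where each node has a "t" type field and a Link's "c" field holds attributes, link text, and a [url, title] pair. The helper name collect_link_urls and the small inline AST are made up for the example; in practice you would read allfiles_ast.json instead.

```r
library(jsonlite)

# Recursively collect the target URL of every Link node in a pandoc AST
# that was parsed with simplifyVector = FALSE (so everything is a list).
collect_link_urls <- function(node) {
  if (!is.list(node)) {
    return(character(0))
  }
  # Assumed layout of a Link: c = [attributes, link text, [url, title]]
  if (identical(node[["t"]], "Link")) {
    return(node[["c"]][[3]][[1]])
  }
  as.character(unlist(lapply(node, collect_link_urls), use.names = FALSE))
}

# Tiny hand-written AST standing in for allfiles_ast.json (illustrative only).
ast_json <- '{"pandoc-api-version":[1,22],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Str","c":"See"},{"t":"Space"},{"t":"Link","c":[["",[],[]],[{"t":"Str","c":"pandoc"}],["https://pandoc.org",""]]}]}]}'

ast <- fromJSON(ast_json, simplifyVector = FALSE)
# For the real file: ast <- fromJSON("allfiles_ast.json", simplifyVector = FALSE)

urls <- collect_link_urls(ast)
print(urls)
print(length(urls))
```

Parsing with simplifyVector = FALSE keeps the tree as nested lists, which makes the recursion simpler than working with jsonlite's default data-frame simplification.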
Regards,
Romain
September 2018
▶ RLesur
Thanks Romain, this is very interesting! 
I am especially interested in the JSON approach since parsing JSON is well supported (e.g. rOpenSci has a jqr package!) and something one needs to learn for other applications anyway.
Merci again! 
September 2018
▶ RLesur
Instead of Lua, you can also do
rmarkdown::pandoc_convert("all_files.md", to = "markdown", output = "count.md", options = "--filter=count_links.R")
where count_links.R is an arbitrary script that takes the JSON on stdin and emits JSON on stdout. In this case it’s R, but it could also be, say, a .jq script.
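To make that concrete, here is one possible shape for count_links.R, as a sketch rather than a tested filter. It assumes the jsonlite package, reports the link count on stderr, and passes the document through unchanged on stdout. The function names are hypothetical. The guard at the bottom relies on pandoc passing the output format as the first argument to a JSON filter, so the script only reads stdin when it is called that way.

```r
#!/usr/bin/env Rscript
library(jsonlite)

# Count Link nodes in a pandoc AST parsed with simplifyVector = FALSE.
count_links <- function(node) {
  if (!is.list(node)) {
    return(0L)
  }
  if (identical(node[["t"]], "Link")) {
    return(1L)
  }
  sum(vapply(node, count_links, integer(1)))
}

main <- function() {
  # file("stdin") is the process-level stdin that pandoc writes the AST to.
  con <- file("stdin")
  json_text <- paste(readLines(con, warn = FALSE), collapse = "\n")
  close(con)

  n_links <- count_links(fromJSON(json_text, simplifyVector = FALSE))

  # Report on stderr so that stdout carries nothing but JSON.
  message("Number of links: ", n_links)

  # Emit the AST unchanged; echoing the original text avoids any
  # re-serialisation differences.
  cat(json_text)
}

# pandoc calls a JSON filter with the target format as first argument;
# only read stdin in that case.
if (length(commandArgs(trailingOnly = TRUE)) > 0) {
  main()
}
```

Outside of pandoc, something like Rscript count_links.R markdown < allfiles_ast.json should exercise the same code path.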
September 2018
▶ noamross
My mind is blown by all these nice ways to extract stuff from Markdown files 
September 2018
Just for info, I tried this on an R Markdown file, then wrote the result back to Markdown, and I didn’t get exactly the input file back.
It was worth trying though.
September 2018
▶ maelle
September 2018
▶ RLesur
Thanks, will try again soon! 