Pdftools 2.0: powerful pdf text extraction tools

Using pdftools::pdf_data, I’ve written a short script with a set of functions to help semi-automate the extraction of complex tables (in my case, tables with multiple lines per cell, spread over multiple pdf pages). The same process should work for any table. It is currently publicly available on my GitHub: lizlaw/pdf2complextable (a script to extract a complex table, here containing multiple lines per cell, from a pdf).
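
As a rough illustration of the general idea (this is not the script from the repository, just a minimal sketch), the word boxes that pdftools::pdf_data returns can be assigned to columns by their x position and to table rows by their y position. The function name `words_to_table`, the `col_breaks` argument, the toy data, and the rule that a new table row starts on every text line with a word in the first column are all assumptions made for this example.

```r
# Toy stand-in for one page of pdftools::pdf_data() output: one row per word
# box with its x/y position (hypothetical values, other columns omitted).
words <- data.frame(
  x    = c(50, 150, 190, 150, 50, 150),
  y    = c(100, 100, 100, 112, 130, 130),
  text = c("A1", "first", "line", "continued", "B1", "single"),
  stringsAsFactors = FALSE
)

# Rebuild a table from word boxes.
# col_breaks: sorted x positions separating the columns (assumed known,
#             e.g. read off the page by eye).
# Assumption: every text line with a word in column 1 opens a new table row;
# lines without one are continuation lines of the row above.
words_to_table <- function(words, col_breaks) {
  words <- words[order(words$y, words$x), ]
  words$col <- findInterval(words$x, col_breaks) + 1
  row_start_y <- sort(unique(words$y[words$col == 1]))
  words$row <- findInterval(words$y, row_start_y)
  words <- words[words$row > 0, ]  # drop anything above the first table row
  out <- matrix("", nrow = length(row_start_y), ncol = length(col_breaks) + 1)
  for (i in seq_len(nrow(words))) {
    r <- words$row[i]
    k <- words$col[i]
    # Append the word to its cell, joining words and lines with a space.
    out[r, k] <- trimws(paste(out[r, k], words$text[i]))
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

tab <- words_to_table(words, col_breaks = 100)
print(tab)
```

On a real document, the same function could be applied to each element of the list returned by pdftools::pdf_data (one data frame per page) and the results row-bound, with `col_breaks` adjusted per table; tables whose first column is sometimes empty would need a different row rule.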
