rOpenSci package or resource used
URL or code snippet for your use case
Link to GitHub repo
- Has all the code and raw PDFs for users to test out themselves
- Vignette/webpage showing the example code being run
Raw example of the PDF table from the TTB. Notice there are inconsistent spaces between the table columns.
Field(s) of application
Forecasting, finance, and econometrics, though the approach could really be used for anything!
What did you do?
Goal: Read in non-trivially formatted tables from a PDF
- Read in PDF text/tables
- Split messy raw text into useful tabular format
- Combine into clean dataframes
- Apply across many similar PDFs
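As a rough illustration of the steps above, here is a minimal sketch in R. It is not the code from the repo: `parse_page()` is a hypothetical helper, the sample text is made up to mimic what `pdftools::pdf_text()` might return for one page (placeholder values, not real TTB figures), and it assumes columns are separated by at least two spaces while single spaces only appear inside a cell.

```r
# Made-up page text standing in for one element of pdftools::pdf_text(file).
# The values are placeholders, not real TTB data.
page_text <- paste(
  "Category        Current Month   Prior Year",
  "Item A    100       90",
  "Item B         250    240",
  sep = "\n"
)

# Hypothetical helper: turn one page of raw text into a data frame.
parse_page <- function(txt) {
  # One string per page -> one string per line, dropping empty lines
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]

  # Split on runs of 2+ spaces so the uneven gaps between columns collapse,
  # while single spaces inside a cell (e.g. "Item A") are kept
  cells <- strsplit(lines, " {2,}")

  # First line is treated as the header, the rest as rows.
  # Assumes every line splits into the same number of cells.
  out <- as.data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
  names(out) <- cells[[1]]

  # Everything after the first column is assumed to be numeric
  out[-1] <- lapply(out[-1], as.numeric)
  out
}

tbl <- parse_page(page_text)
print(tbl)

# With real files, the same helper could be applied across many PDFs.
# `pdf_files` is an assumed character vector of file paths:
# pages    <- unlist(lapply(pdf_files, pdftools::pdf_text))
# combined <- do.call(rbind, lapply(pages, parse_page))
```

Real PDFs will likely need extra cleanup before the split (titles, footnotes, rows with missing cells), so treat the two-space rule as a starting point rather than something that works on every table.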
pdftools and love