pdftools for parsing tables from many .pdfs

rOpenSci package or resource used*

pdftools

URL or code snippet for your use case*

Image: raw example of the PDF table from the TTB. Notice the inconsistent spacing between the table columns.

Sector

Finance, econometrics

Field(s) of application

Forecasting, finance, econometrics, but really could be used for anything!

What did you do?

Goal: Read in non-trivially formatted tables from a PDF

Outcome:

  • Read in PDF text/tables
  • Split messy raw text into useful tabular format
  • Combine into clean dataframes
  • Apply across many similar PDFs
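
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code from the original use case: `parse_page()` and `read_pdf_tables()` are hypothetical helper names, the rule that columns are separated by two or more spaces is an assumption about the layout, and the sample text is made up rather than real TTB data. The only pdftools function used is `pdf_text()`, which returns one character string per page.

```r
# Hypothetical helper: turn one page of raw PDF text into a data frame.
# Assumption: columns are separated by runs of two or more spaces.
parse_page <- function(page_text, col_names) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop blank lines
  fields <- strsplit(lines, " {2,}")       # split on 2+ spaces
  # Keep only rows that have the expected number of columns.
  fields <- fields[lengths(fields) == length(col_names)]
  mat <- matrix(as.character(unlist(fields)),
                nrow = length(fields), ncol = length(col_names),
                byrow = TRUE)
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

# Hypothetical helper: apply parse_page() to every page of every PDF
# and stack the results, recording which file each row came from.
read_pdf_tables <- function(paths, col_names) {
  per_file <- lapply(paths, function(p) {
    pages <- pdftools::pdf_text(p)         # one string per page
    tbl <- do.call(rbind, lapply(pages, parse_page, col_names = col_names))
    tbl$source_file <- rep(basename(p), nrow(tbl))
    tbl
  })
  do.call(rbind, per_file)
}

# Made-up text that mimics inconsistent spacing between columns.
sample_page <- "Item        Current   Prior\nBeer   100     90\n  Wine      50   45\n"
parse_page(sample_page, c("item", "current", "prior"))
```

Note that the header line is parsed as an ordinary row here, and every column comes back as character. A real workflow would likely drop the header rows and convert the numeric columns afterwards. Lines whose field count does not match `col_names` are silently discarded, which is convenient for page titles and footnotes but can also hide rows where a single space separates two columns.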

Comments

Love pdftools and love ropensci!

Twitter handle

@thomas_mock

Thank you for posting this here! I’ve scheduled a tweet about this for Wednesday :slight_smile: