One question: can anyone tell me precisely what a TRUE or FALSE value in the “space” column means when using the `pdf_data` function? I haven’t been able to find any information on this by searching poppler, pdftools, etc. …
Thank you. I thought this was the case: for a set of rows sharing the same “y” coordinate (i.e., forming a line), the word with the maximum x value (the rightmost word) would have space == FALSE. But I do get exceptions where rows with the same y value have more than one FALSE value for “space”. This leads me to think that either the y-coordinate value cannot strictly be treated as a “line”, or the “space” logical value signifies something more subtle.
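To check the rule of thumb above against real output, a small R sketch can group words by `y`, count how many have `space == FALSE`, and test whether the rightmost word on each `y` lacks a trailing space. The data frame below is a hypothetical stand-in with the same columns as `pdf_data()` output (`width`, `height`, `x`, `y`, `space`, `text`); with a real file you would use something like `pdf_data("file.pdf")[[1]]` instead, where the file name is hypothetical.

```r
# Hypothetical stand-in for one page of pdf_data() output (made-up values).
words <- data.frame(
  width  = c(30, 40, 25, 35, 20, 45),
  height = c(11, 11, 11, 11, 11, 11),
  x      = c(72, 106, 150, 72, 300, 330),
  y      = c(100, 100, 100, 120, 120, 120),
  space  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  text   = c("This", "is", "fine.", "Column", "two", "starts."),
  stringsAsFactors = FALSE
)

# Number of words with space == FALSE for each distinct y value.
false_per_y <- tapply(!words$space, words$y, sum)

# y values where more than one word has no trailing space.
multi_false_y <- as.numeric(names(false_per_y)[false_per_y > 1])

# For each y value: does the rightmost word (largest x) have space == FALSE?
rightmost_ok <- sapply(split(words, words$y), function(line) {
  !line$space[which.max(line$x)]
})

# Inspect the words on the "exception" rows.
print(words[words$y %in% multi_false_y, ])
```

In this toy data, `y = 120` is flagged because two separate runs of text share that y value. One possible explanation for the real exceptions, which is an assumption rather than documented behaviour, is that poppler's notion of a line (within a text block or column) need not coincide with a visual row of words that happen to share the same y.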
I’ll search for “hasSpaceAfter” for more information. Thank you.
```r
item_dt <- pdf_data(pdf)[[7]]
Error in normalizePath(pdf, mustWork = TRUE) :
path[1]=" Federal, State, and Local Governments
2017 State and Local Government Finances
Technical Documentation
Individual Unit Data File (Public Use Format)
This is an ASCII fixed length text file. It contains amount for each finance item code within each
government unit for all respondents and non-respondents in the sample. This large file can be useful
for programming and database applications.
For 2017, the file name is 2017FinEstDAT_02202020modp_pu.txt and contains a standard 34-
character public-use format record layout. It is about 59 megabytes. Below is a detailed record
layout for the file.
```
This happens with every page. What does it mean? Thanks!
`pdf_data` expects a file path or a raw vector. It looks like you passed in a character string containing the page text itself instead; that is, your variable `pdf` is probably that text rather than a path, correct? Try passing the file path instead.
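A small guard can catch this mistake with a clearer message before `pdf_data` gets to `normalizePath`. The helper `read_pdf_words` below is hypothetical (it is not part of pdftools), and this sketch deliberately accepts only a single existing local path or a raw vector, which is stricter than `pdf_data` itself may be.

```r
# Hypothetical helper (not part of pdftools): reject strings that are not
# existing files, e.g. page text, before calling pdftools::pdf_data().
read_pdf_words <- function(pdf) {
  if (is.character(pdf)) {
    if (length(pdf) != 1L || !file.exists(pdf)) {
      stop("`pdf` must be a single path to an existing PDF file (or a raw ",
           "vector of PDF bytes); it looks like the text itself was passed.",
           call. = FALSE)
    }
  } else if (!is.raw(pdf)) {
    stop("`pdf` must be a file path or a raw vector.", call. = FALSE)
  }
  pdftools::pdf_data(pdf)
}

# Usage, with a hypothetical file name:
# item_dt <- read_pdf_words("report.pdf")[[7]]   # word data for page 7
```

Passing a page's text (for example, one element of `pdf_text()` output) now fails immediately with a message that says what went wrong, while a real path goes straight through to `pdf_data`.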
This is not an issue or a suggestion. I just wanted to say this package has saved me hours of work. Thank you for all the effort. It really makes a difference.