Hello. Previously, pdftools returned Chinese characters in this format: <U+52D9>, and now they come out like this: 零. How can I go back to the first format?
Thanks for your question, @Alex.
Any ideas on this, @jeroenooms?
Can you please include example code and an example PDF, and specify when "previously" was? Which version numbers of pdftools/R/Windows?
Code is simple:
txt <- pdf_text(path)
Previous configuration:
Ubuntu 16.04
R version 3.4.4 (2018-03-15)
pdftools 1.8
Now:
Ubuntu 18.04.2 LTS
R version 3.4.4 (2018-03-15)
pdftools 2.1
I installed the previous versions of the libraries, but it did not help.
Link to all the files: https://yadi.sk/d/gVPpSmpMzDyl2Q
I could not find a way to attach them here.
I think the difference is in your locale, not the version of pdftools. R automatically escapes non-ASCII characters when printing strings in the C locale.
txt <- pdf_text(path)
print(txt[2])
And now try this:
Sys.setlocale(locale = "C")
print(txt[2])
However, I would not recommend this. If you really want to get escape sequences, you could use stringi:
stringi::stri_escape_unicode(txt[2])
That should properly escape UTF-8 characters in any locale.
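Note that stri_escape_unicode() writes escapes in the \uXXXX form rather than the <U+XXXX> form from the original question. If that exact format matters, a small base R helper could produce it without touching the locale. This is only a sketch: escape_u_plus is a hypothetical name, not a function from pdftools or stringi, and it assumes the input is valid UTF-8 with no NA values (which should hold for pdf_text() output).

```r
# Hypothetical helper (not part of pdftools or stringi): rewrite every
# non-ASCII character of each string as <U+XXXX>, leaving ASCII untouched.
escape_u_plus <- function(x) {
  vapply(x, function(s) {
    cps   <- utf8ToInt(enc2utf8(s))           # Unicode code points of the string
    chars <- intToUtf8(cps, multiple = TRUE)  # one element per character
    esc   <- sprintf("<U+%04X>", cps)         # escaped form of each code point
    paste(ifelse(cps < 128L, chars, esc), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

escape_u_plus("a\u96f6b")
# [1] "a<U+96F6>b"
```

With the code from earlier in the thread, this would be called as escape_u_plus(txt[2]), or escape_u_plus(txt) to convert every page at once.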
Thank you. Sys.setlocale(locale = "C") helped me.