How to specify certain parameters in the R package tesseract

Hello everyone,

I’m currently trying to use the R package tesseract. So far my results are alright, but I would like to improve the accuracy by specifying some of the available parameters.

This leads to my problem. I do not understand how to specify some parameters correctly.

In particular, I’d like to specify the following parameters:

  • tessedit_char_whitelist
  • user_patterns_file
  • user_words_file

I managed to specify tessedit_char_whitelist correctly, but I face problems with the other two parameters. I do not understand how to pass the patterns file and the words file to R. My idea was to supply a character vector that acts as the list of patterns or words I’d like tesseract to use. However, this does not seem to work.

If I understand the documentation correctly, I would have to provide a patterns file and a words file as two separate files. I do not understand how I would do that.
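
To make the question concrete, here is a minimal sketch of what I assume is meant: write the words and patterns to plain text files, one entry per line, and pass the file paths (not the character vectors) through the options argument of tesseract(). The file names, example words, example patterns and whitelist below are made up for illustration, and I have not confirmed that user_words_file and user_patterns_file actually take effect when passed this way.

```r
# Hypothetical file locations and contents, for illustration only.
words_file    <- file.path(tempdir(), "user_words.txt")
patterns_file <- file.path(tempdir(), "user_patterns.txt")

# One word per line.
writeLines(c("Invoice", "Albert"), words_file)

# One pattern per line. These use Tesseract's own pattern syntax
# (e.g. \d for a digit, \A for an upper-case letter), not regular expressions.
writeLines(c("\\d\\d\\d\\d-\\d\\d", "\\A\\A\\d\\d\\d"), patterns_file)

# Only build the engine if the tesseract package is installed.
if (requireNamespace("tesseract", quietly = TRUE)) {
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(
      # Hypothetical whitelist; adjust to the characters expected in the scan.
      tessedit_char_whitelist = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-",
      # Assumption: both parameters expect a path to a file.
      user_words_file    = words_file,
      user_patterns_file = patterns_file
    )
  )
  # text <- tesseract::ocr("page.png", engine = engine)  # hypothetical image path
}
```

The names of the available parameters can be checked with tesseract::tesseract_params(), for example tesseract::tesseract_params("user_").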

I would be grateful for any suggestions and help. Unfortunately, I cannot share the PDF file due to data privacy.

Thank you very much.
Sincerely,
Albert

@jeroenooms can you give some guidance on this?