NOTE on UTF-8 strings by `goodpractice::gp()`

kamapu · July 10, 2020, 9:50am

Thank you for your comments, @KevCaz

I now understand why different checks give different outputs, but my question is becoming more of a philosophical one (see below).

First, some experiments. As you said, the problem is the example data set, and more specifically the author names, which contain special characters in UTF-8:

```r
library(stringi)
library(taxlist)

Names <- Easplist@taxonNames$AuthorName[c(5299, 5021, 5019)]
Names
#> [1] "Borsch, Kai Müll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Ség."

stri_enc_mark(Names)
#> [1] "UTF-8" "ASCII" "UTF-8"

iconv(Names, "utf8", "ascii")
#> [1] NA            "Ruiz & Pav." NA

stri_enc_toascii(Names)
#> [1] "Borsch, Kai M\032ll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "S\032g."

stri_trans_general(Names, "latin-ascii")
#> [1] "Borsch, Kai Mull. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Seg."
```

In my opinion, the last option would be the most convenient way to transform the problematic vectors. From the taxonomic point of view, however, the author name is part of the identity of a taxon usage name, so transliterating it modifies the information it contains (perhaps comparable to rounding decimal numbers in numerical data).
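One possible compromise, sketched here under the assumption that a plain-ASCII form is only needed for searching or matching: leave the original strings untouched and keep the transliteration in a separate column. The data frame `authors` and the column `AuthorASCII` are hypothetical and not part of taxlist; the names are typed by hand with `\u` escapes rather than read from `Easplist`.

```r
library(stringi)

# Author names typed by hand; \u escapes keep the script itself pure ASCII
authors <- data.frame(
  AuthorName = c("Borsch, Kai M\u00fcll. & Eb.Fisch.", "Ruiz & Pav.", "S\u00e9g."),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: ASCII transliteration used only as a search key
authors$AuthorASCII <- stri_trans_general(authors$AuthorName, "latin-ascii")

# Match on the ASCII key, but return the original (unmodified) spelling
authors$AuthorName[authors$AuthorASCII == "Seg."]
```

This way the taxonomic information stays intact and the lossy version is clearly marked as derived.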

If everyone sees what I see on my screen regardless of local settings (the code block in this message includes console output), I do not see why I should modify this information.
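Whether the strings are stored in a portable way can at least be checked. A minimal base R sketch, assuming the names are held as marked UTF-8 (the vector `x` is typed by hand here, not taken from the package):

```r
# Hand-typed copies of the three author names, written with \u escapes
x <- c("Borsch, Kai M\u00fcll. & Eb.Fisch.", "Ruiz & Pav.", "S\u00e9g.")

# Declared encoding of each element; pure ASCII strings are reported as "unknown"
Encoding(x)

# TRUE for every element whose bytes form valid UTF-8
validUTF8(x)

# enc2utf8() converts to UTF-8 and marks the result; strings that are
# already marked as UTF-8 come back unchanged
identical(x, enc2utf8(x))
```

If all elements are valid and marked as UTF-8, the content itself does not depend on the locale of the machine that reads it; what can still differ is how a given console displays it.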

One additional question: did I get it right that the encoding declaration in the package metadata concerns only the documentation and not the distributed data?
