NOTE on UTF-8 strings by `goodpractice::gp()`

Thank you for your comments, @KevCaz

I understand the issue of different checks producing different outputs, but my question is becoming more of a philosophical one (see below).

First, some experiments. As you said, the problem lies in the example data set, more specifically in the author names, which contain non-ASCII characters encoded in UTF-8:

```r
library(stringi)
library(taxlist)

Names <- Easplist@taxonNames$AuthorName[c(5299, 5021, 5019)]
Names
#> [1] "Borsch, Kai Müll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Ség."

stri_enc_mark(Names)
#> [1] "UTF-8" "ASCII" "UTF-8"

iconv(Names, "utf8", "ascii")
#> [1] NA            "Ruiz & Pav." NA

stri_enc_toascii(Names)
#> [1] "Borsch, Kai M\032ll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "S\032g."

stri_trans_general(Names, "latin-ascii")
#> [1] "Borsch, Kai Mull. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Seg."
```

In my opinion, the last option would be the most convenient way to transform the problematic vectors. From a taxonomic point of view, however, the author name is part of the identity of a taxon usage name, so transliterating it modifies the information it contains (perhaps comparable to rounding decimal numbers in numerical data).
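To make that trade-off concrete, here is a minimal sketch, assuming only that `stringi` is installed. The three names are retyped with `\u` escapes instead of being read from `Easplist`, and the escaping route (`stri_escape_unicode()`) is just one possible alternative for comparison, not something the checks require. It contrasts transliteration, which cannot be undone, with escaping, which yields ASCII-only strings that can be converted back.

```r
library(stringi)

# Same three author names as in the experiments, retyped with \u escapes
# so the snippet does not depend on the taxlist data or the file encoding.
Names <- c("Borsch, Kai M\u00fcll. & Eb.Fisch.", "Ruiz & Pav.", "S\u00e9g.")

# Which entries contain non-ASCII characters?
non_ascii <- !stri_enc_isascii(Names)

# Option 1: transliterate to ASCII. The diacritics are dropped and
# cannot be recovered from the result.
translit <- stri_trans_general(Names, "latin-ascii")
changed <- translit != Names  # TRUE where a name was altered

# Option 2: escape non-ASCII characters as \uXXXX sequences.
# The strings become ASCII-only, but the original can be restored.
escaped <- stri_escape_unicode(Names)
restored <- stri_unescape_unicode(escaped)
round_trip_ok <- all(restored == Names)
```

Here `changed` flags exactly the names that contain non-ASCII characters, while `round_trip_ok` checks that the escaped version still carries the full original information.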

If everyone sees what I see on my screen regardless of local settings (the code block in this message includes console output), I don’t see why I should modify this information.

One additional question: did I understand correctly that the encoding declaration in the package metadata concerns only the documentation and not the distributed data?
