I am planning to develop an R package and submit it to rOpenSci.
The package would give access to two different APIs built around a very similar idea: the user provides a text and gets back a data frame of the identified entities and their respective categories.
Since one of the two APIs offers many other functions, I plan to cover only one of its many endpoints.
Although both APIs serve the same purpose, using the two together is very convenient because they identify different categories of entities, so they complement each other.
Looking through the list of rOpenSci packages, I believe they are all based on a single API.
Therefore, my question is whether using two APIs in one package should be avoided even when it is justified (as I pointed out above).
@sckott has developed spocc, which wraps several of his API wrappers for species occurrence data and might be a good model to follow. It’d mean writing three packages (one for each API, plus a further one to wrap them and provide a consistent user interface).
Now, I’m not sure how it’d work with software review (I’m an editor but can’t speak for everyone). It doesn’t sound optimal for anyone involved to submit three packages in one go. (What has happened a few times is packages getting split as a result of reviews.)
Furthermore, if I follow correctly, your tools have some overlap with tokenizers? I’m asking both in terms of scope (i.e. do the APIs extend this functionality significantly) and of interface (if the APIs work with the same input and output as tokenizers, the user might expect a similar interface). I might be completely off, not knowing what the APIs are.
At first, my idea was to develop a package that takes a text (or a PMID) as input and returns the words and their categories as a data frame. The scope of the package would therefore be an interface to named-entity recognition APIs for biomedical terms.
The NCBI Text mining web service API fits that task pretty well.
The second API (biolink) has many interesting functionalities. One of them is the nlp/annotate endpoint, which identifies words based on phenotypic information, a functionality not covered by the NCBI Text mining API but, in my opinion, one with great potential.
Therefore, I thought it would be a good idea to integrate both of them in a single package. But as I said in the previous message, I’m not sure how this approach would be viewed at the software review step.
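To make the idea more concrete, here is a minimal sketch of what a shared interface could look like. All function names (ner_ncbi, ner_biolink, annotate_text) and the column names are hypothetical, and the two backends are mocked with placeholder rows instead of calling the real APIs, so it only illustrates the shape of the input and output:

```r
# Hypothetical backends: in a real package each would query one API.
# Here they return hard-coded placeholder rows so the sketch runs offline.
ner_ncbi <- function(text) {
  data.frame(
    entity = "placeholder_entity_a",
    category = "placeholder_category_a",
    stringsAsFactors = FALSE
  )
}

ner_biolink <- function(text) {
  data.frame(
    entity = "placeholder_entity_b",
    category = "placeholder_category_b",
    stringsAsFactors = FALSE
  )
}

# Single user-facing function (hypothetical): same input and same output
# columns for both backends, plus a `source` column recording which
# backend produced each row.
annotate_text <- function(text, sources = c("ncbi", "biolink")) {
  sources <- match.arg(sources, several.ok = TRUE)
  backends <- list(ncbi = ner_ncbi, biolink = ner_biolink)
  results <- lapply(sources, function(s) {
    out <- backends[[s]](text)
    out$source <- s
    out
  })
  do.call(rbind, results)
}

res <- annotate_text("some biomedical text")
```

The same function could sit in a single package, or in a thin wrapper package on top of two single-API packages, without changing what the user sees.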