Using DO.utils with reticulate and Python
use-reticulate-python.Rmd
What does Python add?
DO.utils is designed to balance functionality and ease of use, with a goal for reproducibility. Some functionality is just not available in R, or it’s less reproducible or harder to manage. Python fills in these gaps. Specifically using Python powers functions that:
- Execute SPARQL queries on local files.
- The R SPARQL engines require database management
(
virtuoso
) or do not work with SPARQL 1.1 (rdflib
).
- The R SPARQL engines require database management
(
- Extract information from git repos.
- To update data from releases and analyze unexpected changes to files.
- Predict term mappings.
- Utilizing the biopragmatics stack: bioregistry, biomappings, etc.; pyobo; and INDRA labs’ GILDA
Why use Python via reticulate?
Generally, the R-python connection allows data integration with R
where data frames can be more easily managed, explored, and analyzed.
More specifically, reticulate
should help manage
Python and python module dependencies and simplify user setup/use.
How to use python-dependent DO.utils functions?
These functions should work as standard R functions. In some cases internal data extractor functions have been written to eliminate the need for non-standard user data munging.
Python Setup
Although it has not been tested yet, use of reticulate
should handle installation of python and/or python modules
automatically and require little to no user input. The default is to
attempt to identify and use python that is already installed. Should
that not work, reticulate
should install python
and needed modules via miniconda.
Customization
More user control can be achieved over this installation (like using
pip and venv instead of miniconda) by using reticulate to install python
and create a virtual environment before calling
library(DO.utils)
for the first time. The basic procedure
would be to execute the following:
library(reticulate) # installs with DO.utils
py_version <- "3.9" # or whatever version you prefer
venv_name <- "r-reticulate"
py_path <- install_python(py_version)
virtualenv_create(venv_name, py_path, version = py_version)
A Note on Virtual Environments
reticulate
can create conda environments or
venv/virtualenv virtual environments. Non-conda virtual environment
options are described briefly below but similar options approaches are
available for conda environments via reticulate
.
r-reticulate
is the default virtual environment name
promoted by the developers of reticulate for ALL packages that
depend on reticulate
, with the goals of 1) promoting
interoperability between packages that depend on reticulate
and python and 2) to better manage python module dependencies, through
the use of a single virtual environment. Alternatives to this default
environment can be specified by modifying the first argument
envname
to virtualenv_create()
. A name
(without slashes) will create a virtual environment of that name in the
default location (see virtualenv_root()
); a relative or
absolute path (with slashes) will create a virtual environment in the
specified directory, which can be nice if you want a python virtual
environment associated with a specific project.
A Note on Python Module Installation
It is possible to specify python modules to install at the time a
virtual environment is created or later by activating the virtual
environment and executing virtualenv_install()
. If only
DO.utils dependencies are desired, this should not be necessary as the
first execution of library(DO.utils)
in an active virtual
environment should install the python module dependencies of
DO.utils
automatically. Should any python module need to be
installed by a user, the modules used by DO.utils
can be
found in the “Config/reticulate” section of it’s
DESCRIPTION
file.