DO.utils is an R package primarily designed to support the operations of the Human Disease Ontology (DO), but with a number of capabilities that will be useful to the broader scientific community for:

  1. Assessing Resource Use & Impact (Bibliometrics/Scientometrics)

    J. Allen Baron, Lynn M Schriml, Assessing resource use: a case study with the Human Disease Ontology, Database, Volume 2023, 2023, baad007. PMID:36856688.

  2. Simplifying common R tasks (see General Utilities section).

Operations specific to the use, analysis, maintenance, and improvement of the ontology itself are described briefly in the DO Improvement & Analysis section.

DO.utils is a work in progress. If you are interested in contributing, please reach out. Our goal is to work collaboratively to make functions as broadly useful as possible.


Installing Prerequisites

To use DO.utils, you must first install R from CRAN. Installing RStudio can also be useful but is not required. The devtools package is also required and can be obtained by executing install.packages("devtools") within R.

Installing DO.utils

DO.utils can be installed from GitHub or from a persistent, open-access repository hosted by Zenodo.

To install from GitHub, run devtools::install_github("DiseaseOntology/DO.utils") within R.

To install from Zenodo, first download DO.utils (DOI: 10.5281/zenodo.7467668) to your local machine. Then, within R run devtools::install_git(<local_path_to_DO.utils>), replacing <local_path_to_DO.utils> with the local path to DO.utils.

Assessing Resource Use & Impact (Bibliometrics/Scientometrics)

DO.utils includes functions that assist both in assessing how a resource is used and in measuring the impact of that use. Most of these functions may be broadly useful to anyone trying to accomplish these tasks, while a much smaller number are specific to measuring the DO’s impact.

Components that will be broadly useful to any resource can:

  1. Identify scientific publications that use a resource from:
    1. Citations of one or more article(s) published by the resource (“cited by”; citedby_pubmed() and citedby_scopus()).
    2. PubMed or PubMed Central (PMC) search results (search_pubmed() and search_pmc()).
    3. A MyNCBI collection (read_pubmed_txt()).
  2. Identify matching publication records in different record sets (must be formatted data.frames; see match_citations()).
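
As a rough illustration of what matching publication records across two record sets involves, the following base R sketch matches rows by shared identifiers, trying one identifier after another. It is not the implementation of match_citations(): the data frames, their column names (pmid, doi, title), and the helper match_records() are all hypothetical and chosen only for this example.

```r
# Two hypothetical record sets, e.g. a "cited by" result and a search result.
cited_by <- data.frame(
  pmid  = c("111", "222", NA),
  doi   = c("10.1000/a", NA, "10.1000/c"),
  title = c("Paper A", "Paper B", "Paper C"),
  stringsAsFactors = FALSE
)
search_res <- data.frame(
  pmid  = c("222", "333", NA),
  doi   = c(NA, "10.1000/d", "10.1000/C"),
  title = c("Paper B", "Paper D", "Paper C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of DO.utils): for each row of `x`, return the
# row index of the matching record in `ref`, or NA when no identifier matches.
# Identifiers are tried in the order given by `id_cols`, case-insensitively.
match_records <- function(x, ref, id_cols = c("pmid", "doi")) {
  out <- rep(NA_integer_, nrow(x))
  for (col in id_cols) {
    # Only rows still unmatched that actually have this identifier.
    todo <- is.na(out) & !is.na(x[[col]])
    out[todo] <- match(tolower(x[[col]][todo]), tolower(ref[[col]]))
  }
  out
}

idx <- match_records(cited_by, search_res)

# Attach the matched title from the second set for a quick visual check.
cited_by$matched_title <- search_res$title[idx]
print(cited_by)
```

Trying identifiers in a fixed priority order means a record lacking a PMID can still be matched by DOI, while records with no shared identifier stay unmatched (NA) instead of being paired by guesswork.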

For those interested in Bioconductor package download statistics, get_bioc_pkg_stats() may be useful, while other measures of impact are designed specifically with the DO in mind (e.g. count_alliance_records()).

DO Improvement & Analysis

DO.utils provides the following capabilities used for improvement and analysis:

  1. Git repo management, iterative execution across git repository tags, and SPARQL queries implemented with wrappers (DOrepo(), owl_xml()) around the related pyDOID Python package.
  2. Automation of updates, including:
  3. Definition source URL validation.
  4. Prediction of mappings/cross-references between other resources & DO, via PyOBO/GILDA or approximate string matching.
  5. Simplified system installation of the OBO tool ROBOT.

General Utilities

DO.utils includes general utilities to make programming in R easier including, for example, those that assist with: