Counts records in data from the Alliance of Genome Resources. Counts can be ascribed to the species the record is associated with or the Model Organism Database (MOD) that curated it, optionally by object type. count_alliance_records() was primarily designed to count records in the Alliance Disease Associations File; its options are not guaranteed to work with other files.

Usage

count_alliance_records(
  alliance_tbl,
  term_subset = NULL,
  by_type = TRUE,
  pivot = TRUE,
  record_lvl = "disease-object",
  assign_to = c("species", "curator")
)

Arguments

alliance_tbl

a dataframe derived from Alliance data (usually a downloaded .tsv file)

term_subset

character vector of DOIDs to limit counts to

by_type

logical indicating whether to count by Alliance object type (i.e. gene, allele, model)

pivot

logical indicating whether to pivot values to type columns; ignored if by_type = FALSE.

record_lvl

a string indicating the desired specificity of records; one of "full_record", "disease-object", "disease", or "object" (see Details)

assign_to

how to assign records when counting; one of "species" or "curator" (i.e. the organization responsible for curating the record)

Value

A summary tibble with the count of unique object annotations defined by record_lvl, aggregated according to species/curator (assign_to) and, optionally, object type (by_type).
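The difference between the long result (pivot = FALSE) and the pivoted result (pivot = TRUE) can be pictured with a small base R sketch. This is only an illustration of the two shapes, not the function's implementation: the toy data and the column names species, type, and n are assumptions, and the real output columns may be named differently.

```r
# Toy annotations; all values and column names are hypothetical.
annotations <- data.frame(
  species = c("mouse", "mouse", "mouse", "rat", "rat"),
  type    = c("gene", "gene", "allele", "gene", "model"),
  stringsAsFactors = FALSE
)

# Cross-tabulate species against object type.
tab <- table(species = annotations$species, type = annotations$type)

# Long shape: one row per species/type combination that actually occurs.
long <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
long <- long[long$n > 0, ]

# Pivoted shape: one row per species, one column per object type.
wide <- as.data.frame.matrix(tab)

print(long)
print(wide)
```

In the pivoted shape, a species with no records of a given type simply shows 0 in that column of this sketch; how the actual function fills such cells is not stated here.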

Details

The type of record information to use in counting should be specified with record_lvl which accepts the following values:

  • "full_record" counts full non-duplicate records

  • "disease-object" counts unique disease-object combinations

  • "disease" counts unique diseases

  • "object" counts unique MOD objects (i.e. gene, allele, model identifiers)
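The four levels can be thought of as counting distinct rows over progressively narrower sets of columns. The base R sketch below illustrates that idea on toy data; it is not how count_alliance_records() is implemented, the column names (species, object_id, doid, reference) are hypothetical rather than the Alliance file's real headers, and it counts over the whole table instead of per species or curator.

```r
# Toy records; all values and column names are hypothetical.
records <- data.frame(
  species   = c("mouse", "mouse", "mouse", "mouse", "rat"),
  object_id = c("G1", "G1", "G1", "G2", "G3"),
  doid      = c("DOID:1", "DOID:1", "DOID:2", "DOID:1", "DOID:2"),
  reference = c("PMID:1", "PMID:2", "PMID:1", "PMID:1", "PMID:3"),
  stringsAsFactors = FALSE
)

# Count distinct rows over the columns that define a record level.
n_distinct_rows <- function(df, cols) {
  nrow(unique(df[, cols, drop = FALSE]))
}

counts <- c(
  "full_record"    = n_distinct_rows(records, names(records)),
  "disease-object" = n_distinct_rows(records, c("doid", "object_id")),
  "disease"        = n_distinct_rows(records, "doid"),
  "object"         = n_distinct_rows(records, "object_id")
)
print(counts)
```

On this toy table the counts shrink as the level becomes less specific: 5 full records, 4 disease-object pairs, 3 objects, and 2 diseases.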

NOTE

Disease-related data have contained some exact duplicates (reason unknown), as well as records that differ only in seemingly unimportant information (e.g. only the date differs). Both types of duplicates are removed before records are counted.
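A minimal base R sketch of the two kinds of duplicate removal described in this note follows. It is an assumption-laden illustration, not the package's code: the column names are hypothetical, and treating date as the only unimportant column is an assumption made for the example.

```r
# Toy records; all values and column names are hypothetical.
records <- data.frame(
  object_id = c("G1", "G1", "G1", "G2"),
  doid      = c("DOID:1", "DOID:1", "DOID:1", "DOID:1"),
  date      = c("2020-01-01", "2020-01-01", "2021-06-30", "2020-01-01"),
  stringsAsFactors = FALSE
)

# Exact duplicates: rows identical in every column.
no_exact_dups <- unique(records)

# Near-duplicates: rows identical once the assumed-unimportant column is ignored.
key_cols <- setdiff(names(records), "date")
is_near_dup <- duplicated(no_exact_dups[, key_cols, drop = FALSE])
deduplicated <- no_exact_dups[!is_near_dup, ]

print(nrow(records))        # rows before any removal
print(nrow(no_exact_dups))  # rows after dropping exact duplicates
print(nrow(deduplicated))   # rows after also dropping date-only differences
```

Here the four toy rows reduce to three after exact duplicates are dropped and to two once rows that differ only by date are collapsed.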