Maintainer Guide
This article documents workflows for maintaining and rebuilding package datasets. It is intended for the package maintainer and contributors, not general users.
disease_eponyms — Eponym capitalization vector
disease_eponyms is a named character vector mapping
lowercase words to their correctly capitalized forms
(e.g. c("waardenburg" = "Waardenburg")). It is used as the
default eponyms argument to
parse_omim_name().
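To make the shape of the vector concrete, here is a minimal sketch of a word-level lookup against a named character vector. The two entries and the input string are toy values, and the lookup logic is an illustration only; it is not taken from the implementation of parse_omim_name().

```r
# Toy stand-in for the packaged vector; the entries are illustrative only.
eponyms <- c("waardenburg" = "Waardenburg", "macleod" = "MacLeod")

# Split a lowercase name into words and swap in the capitalized form
# for every word that has an entry; other words are left unchanged.
words <- strsplit("waardenburg syndrome type iia", " ", fixed = TRUE)[[1]]
hits <- words %in% names(eponyms)
words[hits] <- unname(eponyms[words[hits]])

result <- paste(words, collapse = " ")
result
```

Because the names are the lowercase keys, membership tests and replacement are both plain vector indexing.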
Source files
| File | Purpose |
|---|---|
| data-raw/omim_eponyms.R | Mines OMIM for capitalization candidates; updates disease_eponyms_curated.tsv |
| data-raw/disease_eponyms_curated.tsv | Shared hand-curated TSV; single authoritative source for all candidates regardless of provenance |
| data-raw/build_disease_eponyms.R | Reads the curated TSV and saves the .rda dataset |
Curated TSV columns
| Column | Description |
|---|---|
| word_lower | Lowercase word (primary key; unique across all sources) |
| word_cap | Correctly capitalized form; NA for contested words until manually resolved |
| alt_caps | Competing capitalization forms with counts, e.g. "MacLeod (48); Macleod (2)" |
| examples | Up to 3 source names where the word appeared |
| status | "cap" (capitalize), "lower" (leave lowercase), or "pending" (awaiting review) |
| source | Provenance of the candidate (e.g. "OMIM"); used to scope refreshes per source |
| notes | Free-text annotation (optional) |
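A quick sanity check of the curated file against the column descriptions above might look like the following sketch. The two rows are made up and held in memory so the example runs on its own; the rule that a "cap" row needs a resolved word_cap is an assumption drawn from the descriptions, not a documented constraint.

```r
# In-memory stand-in for data-raw/disease_eponyms_curated.tsv (rows are made up).
tsv_text <- paste(
  "word_lower\tword_cap\talt_caps\texamples\tstatus\tsource\tnotes",
  "waardenburg\tWaardenburg\t\tWAARDENBURG SYNDROME\tcap\tOMIM\t",
  "macleod\tNA\tMacLeod (48); Macleod (2)\t\tpending\tOMIM\t",
  sep = "\n"
)

# Read everything as character; only the literal string NA becomes missing.
curated <- read.delim(
  text = tsv_text, quote = "", na.strings = "NA",
  colClasses = "character", stringsAsFactors = FALSE
)

expected_cols <- c("word_lower", "word_cap", "alt_caps", "examples",
                   "status", "source", "notes")
valid_status <- c("cap", "lower", "pending")

checks <- c(
  columns_ok   = identical(names(curated), expected_cols),
  keys_unique  = anyDuplicated(curated$word_lower) == 0L,
  status_ok    = all(curated$status %in% valid_status),
  # Assumption: a row marked "cap" must have a resolved word_cap.
  cap_resolved = !any(curated$status == "cap" & is.na(curated$word_cap))
)
checks
```

To check the real file, replace the `text = tsv_text` argument with the path to the TSV.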
Workflow
- Mine candidates: run a source-specific script to update disease_eponyms_curated.tsv:

      source("data-raw/omim_eponyms.R")  # requires OMIM API key; see ?download_omim

  New candidates are appended with status = "pending". Existing OMIM rows have alt_caps and examples refreshed from the latest download.
- Review pending rows: open data-raw/disease_eponyms_curated.tsv and for each "pending" row set status to:
  - "cap": this word should be capitalized in output
  - "lower": this word should stay lowercase

  For contested words (alt_caps is non-empty), verify word_cap before approving. Roman numerals and alphanumeric suffixes (e.g. IIb, 2a) are handled automatically by fix_disease_caps() and should be marked "lower".
- Rebuild the dataset:

      source("data-raw/build_disease_eponyms.R")
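As a rough illustration of what the rebuild step produces, the sketch below turns curated rows into a named character vector. It assumes that only rows with status "cap" and a resolved word_cap end up in the dataset; the actual logic lives in data-raw/build_disease_eponyms.R and may differ. The data frame and the object name disease_eponyms_sketch are hypothetical.

```r
# Toy curated rows (made up); the real input is the curated TSV.
curated <- data.frame(
  word_lower = c("waardenburg", "iib", "macleod"),
  word_cap   = c("Waardenburg", NA, NA),
  status     = c("cap", "lower", "pending"),
  stringsAsFactors = FALSE
)

# Assumption: only approved, resolved rows are included in the vector.
approved <- curated[curated$status == "cap" & !is.na(curated$word_cap), ]

# Names are the lowercase keys, values are the capitalized forms.
disease_eponyms_sketch <- setNames(approved$word_cap, approved$word_lower)
disease_eponyms_sketch

# The real script then saves an .rda file, for example with base R's save();
# the output path is not stated here, so no path is shown.
```

Rows still marked "pending" are skipped in this sketch, which is why the review step comes before the rebuild.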
Adding a new source
Create a new mining script (e.g. data-raw/do_eponyms.R)
that:
- Generates a candidate data frame with columns word_lower, word_cap, alt_caps, examples.
- Reads disease_eponyms_curated.tsv, appends rows for words not already present (with source = "<SOURCE>" and status = "pending"), and writes the file back.
- Does not touch rows belonging to other sources.
After curation, run data-raw/build_disease_eponyms.R as
above.
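The append step for a new source could be sketched as follows. All rows are made up, the "DO" source label is a hypothetical value for the placeholder, and the result is written to a temporary file rather than the real TSV so the example is safe to run.

```r
# Existing curated rows (toy data standing in for the TSV contents).
curated <- data.frame(
  word_lower = c("waardenburg", "macleod"),
  word_cap   = c("Waardenburg", NA),
  alt_caps   = c("", "MacLeod (48); Macleod (2)"),
  examples   = c("", ""),
  status     = c("cap", "pending"),
  source     = c("OMIM", "OMIM"),
  notes      = c("", ""),
  stringsAsFactors = FALSE
)

# Candidates mined from the new source (toy data).
candidates <- data.frame(
  word_lower = c("waardenburg", "alzheimer"),
  word_cap   = c("Waardenburg", "Alzheimer"),
  alt_caps   = c("", ""),
  examples   = c("", "Alzheimer disease"),
  stringsAsFactors = FALSE
)

# Keep only words not already present, so existing rows are never modified.
new_rows <- candidates[!candidates$word_lower %in% curated$word_lower, ]
new_rows$status <- "pending"
new_rows$source <- "DO"  # hypothetical source label
new_rows$notes  <- ""

# Align column order with the curated table before appending.
updated <- rbind(curated, new_rows[names(curated)])
rownames(updated) <- NULL

# Write back as TSV; a temporary file is used here instead of the real path.
out_path <- tempfile(fileext = ".tsv")
write.table(updated, out_path, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")
```

Filtering on word_lower before appending preserves the primary-key guarantee and leaves rows from other sources untouched.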
disease_cap_patterns — Phrase-level capitalization pattern vector
disease_cap_patterns is a named character vector of
regex substitutions applied after disease_eponyms
by parse_omim_name(). Use it for words whose correct
capitalization depends on context (e.g. SHORT as an acronym
in SHORT syndrome vs short as an adjective
elsewhere).
Names are case-insensitive regex patterns; values are replacement
strings. Longer patterns take priority over shorter ones and override
conflicting disease_eponyms entries.
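One way to get the "longer patterns win" behavior is to apply the substitutions in order of increasing pattern length, so the longest pattern runs last and its result stands. The sketch below does that with two toy patterns; the helper apply_cap_patterns() is hypothetical and is not the mechanism used inside parse_omim_name(), which is not shown in this guide.

```r
# Toy patterns (illustrative only): names are regexes, values are replacements.
patterns <- c(
  "\\bshort syndrome\\b" = "SHORT syndrome",
  "\\bshort\\b"          = "short"
)

# Hypothetical helper: apply case-insensitive substitutions, shortest pattern
# first, so that longer patterns run last and override shorter ones.
apply_cap_patterns <- function(x, patterns) {
  ordered <- patterns[order(nchar(names(patterns)))]
  for (i in seq_along(ordered)) {
    x <- gsub(names(ordered)[i], ordered[[i]], x,
              ignore.case = TRUE, perl = TRUE)
  }
  x
}

out <- apply_cap_patterns(c("Short syndrome", "Short stature"), patterns)
out
```

Here the acronym form survives in "SHORT syndrome" while the adjective in "short stature" is lowercased, which mirrors the context-dependent case described above.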