Fuzzy (Approximate) String Matching
match_fz.Rd
Wraps stringdist::amatch() to perform "fuzzy" (approximate) string
matching while providing more informative output. Instead of an integer
vector of best match positions, this function returns a tibble with the
input, its corresponding best match, and the approximate string distance.
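For context, the short sketch below shows the integer-position output of stringdist::amatch() that the description contrasts with. The lookup vector and the misspelled inputs are made up for illustration.

```r
library(stringdist)

# Illustrative data, not from the package
lookup <- c("apple", "banana", "cherry")
inputs <- c("aple", "bananna")

# amatch() returns, for each input, the position of its best match in `lookup`
pos <- amatch(inputs, lookup, method = "osa", maxDist = 2)
pos

# The matched strings themselves have to be looked up separately
lookup[pos]
```

The result says where each match sits in the lookup table, but not what the match is or how far away it is, which is the information the tibble output adds.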
Arguments
- x
 elements to be approximately matched: will be coerced to character unless it is a list consisting of integer vectors.
- table
 lookup table for matching. Will be coerced to character unless it is a list consisting of integer vectors.
- method
 Matching algorithm to use. See stringdist-metrics.
- maxDist
 Elements in x will not be matched with elements of table if their distance is larger than maxDist. Note that the maximum distance between strings depends on the method: it should always be specified.
- ...
 arguments passed on to stringdist::amatch()
Value
A tibble with 3 columns:
- x: the input values
- table_match: the closest match of x
- dist: the distance between x and its closest match (given the method selected)
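One way such a three-column result could be assembled is sketched below. This is an assumption about the general approach, not the package's actual source: match_fz_sketch is a hypothetical name, and the extra arguments in ... are forwarded only to amatch(), so method-specific options would not affect the reported distance here.

```r
library(stringdist)
library(tibble)

# Hypothetical re-implementation for illustration only
match_fz_sketch <- function(x, table, method = "osa", maxDist = 0.1, ...) {
  # Position of the best match in `table`, or NA if nothing is within maxDist
  pos <- stringdist::amatch(x, table, method = method, maxDist = maxDist, ...)
  matched <- table[pos]
  # Pairwise distance between each input and its match; NA where no match
  d <- stringdist::stringdist(x, matched, method = method)
  tibble::tibble(x = x, table_match = matched, dist = d)
}

res <- match_fz_sketch(
  x = c("aple", "banana", "zzzzzz"),
  table = c("apple", "banana", "cherry"),
  method = "osa",
  maxDist = 2
)
res
```

Inputs with no match within maxDist keep their row, with NA in both table_match and dist.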
NOTES
Fuzzy string matching is SLOW. Expect this function to take >1 min for comparisons of more than 500 values for all methods.
For comparison of citation titles specifically, the "lcs" method is faster
than "osa" and seems to work better. Based on light experimentation, a good
maxDist value for citation titles is between 80 and 115.
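As a rough illustration of the "lcs" setting, the sketch below calls stringdist directly on two made-up citation titles. The "lcs" distance counts characters that cannot be paired up between the two strings, so it is measured in characters rather than on a 0 to 1 scale, which is presumably why a useful maxDist for long titles is so much larger than the amatch() default of 0.1.

```r
library(stringdist)

# Made-up titles for illustration
query <- "Fuzzy matching of citation titles in R"
candidates <- c(
  "Fuzzy matching of citation titles in R.",
  "An unrelated paper about soil carbon"
)

# Best match within 80 unpaired characters
pos <- amatch(query, candidates, method = "lcs", maxDist = 80)
candidates[pos]

# Distance to the chosen candidate: one extra character (the trailing period)
lcs_dist <- stringdist(query, candidates[pos], method = "lcs")
lcs_dist
```

Because maxDist only sets an upper bound, the closest candidate is still the one returned; a generous bound mainly determines which inputs end up with no match at all.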