Suggest a Regular Expression That Will Match All Input

Collects the set of characters found at each position across all strings in x and returns it as a quasi-regular expression, with one bracketed character set per position. Letters and numbers are not condensed into ranges (such as [0-9] or [A-Z]) in the output, even if the full set is present at a position.
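The collection step can be sketched in base R. The helper char_sets_by_position() below is hypothetical and is not part of the package. It assumes the characters inside each set are sorted in C-locale order (via method = "radix"), which matches the ordering in the example output. It also does not escape regex metacharacters such as ] or ^.

```r
# Hypothetical helper (not part of the package): collect the set of
# characters seen at each position across all strings in x.
char_sets_by_position <- function(x) {
  chars <- strsplit(x, "")                  # one character vector per string
  positions <- seq_len(max(nchar(x)))       # 1 .. length of longest string
  regex <- vapply(positions, function(i) {
    # Character at position i of each string, NA if the string is shorter
    at_i <- vapply(chars, function(ch) {
      if (length(ch) >= i) ch[[i]] else NA_character_
    }, character(1))
    # Distinct characters, sorted in C-locale order; no escaping is done
    set <- sort(unique(at_i[!is.na(at_i)]), method = "radix")
    paste0("[", paste(set, collapse = ""), "]")
  }, character(1))
  # Number of strings long enough to have a character at each position
  n <- vapply(positions, function(i) sum(nchar(x) >= i), integer(1))
  data.frame(position = positions, regex = regex, n = n)
}

char_sets_by_position(c("DNA", "MHC", "TAP1", "TAP2", "520", "ACD"))
```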

Usage

suggest_regex(x, pivot = "wide")

Arguments

x: A character vector.
pivot: Whether the resulting tibble should be in "wide" (default) or "long" format.

Value

When pivot = "long", a tidy tibble with 3 columns and one row per character position, that is, as many rows as there are characters in the longest input string:

position: the position of the character set in the input strings.
regex: the character set found at that position, in brackets.
n: the number of input strings that have a character at that position.

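When pivot = "wide" (default), a tibble with the same information transposed: the column names (after a leading position column) hold the positions, and the two rows, labelled "regex" and "n", hold the corresponding values. Because each position column mixes character sets and counts, n is stored as character in this format.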
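The relationship between the two layouts can be sketched with a hypothetical base-R helper, to_wide(); this is not the package's own reshaping code. It assumes a long result with columns position, regex and n, and converts n to character so both rows fit in the same column. The toy input corresponds to x = c("AC", "B").

```r
# Hypothetical helper (not part of the package): reshape a long result
# (position, regex, n) into the wide layout.
to_wide <- function(long) {
  # 2 rows (regex, n) by one column per position; n becomes character
  cells <- rbind(long$regex, as.character(long$n))
  colnames(cells) <- as.character(long$position)
  data.frame(position = c("regex", "n"), cells,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Toy long result for x = c("AC", "B")
long <- data.frame(position = 1:2, regex = c("[AB]", "[C]"), n = c(2L, 1L))
to_wide(long)
```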

Examples

x <- c("DNA", "MHC", "TAP1", "TAP2", "520", "ACD")

suggest_regex(x)
#> # A tibble: 2 × 5
#>   position `1`     `2`     `3`     `4`  
#>   <chr>    <chr>   <chr>   <chr>   <chr>
#> 1 regex    [5ADMT] [2ACHN] [0ACDP] [12] 
#> 2 n        6       6       6       2    
suggest_regex(x, "long")
#> # A tibble: 4 × 3
#>   position regex       n
#>      <int> <chr>   <int>
#> 1        1 [5ADMT]     6
#> 2        2 [2ACHN]     6
#> 3        3 [0ACDP]     6
#> 4        4 [12]        2
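One way to use the result is to join the character sets into a single anchored pattern. The sketch below uses a hypothetical helper, sets_to_pattern(), that is not part of the package. Positions that not every input reaches are made optional with ?, and since shorter strings only ever lack trailing positions, these optional sets always fall at the end. It reuses the regex column from the example above and recomputes n directly from x.

```r
# Hypothetical helper (not part of the package): combine per-position
# character sets into one anchored pattern; positions reached by fewer
# than `total` strings get a trailing "?" so shorter inputs still match.
sets_to_pattern <- function(regex, n, total) {
  parts <- ifelse(n < total, paste0(regex, "?"), regex)
  paste0("^", paste(parts, collapse = ""), "$")
}

x <- c("DNA", "MHC", "TAP1", "TAP2", "520", "ACD")
regex <- c("[5ADMT]", "[2ACHN]", "[0ACDP]", "[12]")  # regex column above
n <- vapply(seq_along(regex), function(i) sum(nchar(x) >= i), integer(1))

pattern <- sets_to_pattern(regex, n, length(x))
pattern
#> [1] "^[5ADMT][2ACHN][0ACDP][12]?$"
all(grepl(pattern, x))
#> [1] TRUE
```

The pattern is deliberately permissive. Each position is checked independently, so it also accepts strings that are not in x, such as "AAA".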