Count Delimited Columns
count_delim.RdCounts columns when values within them are delimited.
Usage
count_delim(
data,
...,
delim = "|",
sort = FALSE,
name = NULL,
trim = TRUE,
convert = FALSE
)Arguments
- data
A data.frame.
- ...
<
data-masking> Variables to first lengthen, then count by.- delim
A delimiter to split elements within specified columns by (default: "|").
- sort
If
TRUE, will show the largest groups at the top.- name
The name of the new column in the output.
If omitted, it will default to
n. If there's already a column calledn, it will usenn. If there's a column callednandnn, it'll usennn, and so on, addingns until it gets a new name.- trim
Whether to trim start/end whitespace, as a boolean (default:
TRUE).- convert
Whether to run
utils::type.convert()withas.is = TRUEon new columns. This is useful if the de-concatenated columns are integer, numeric or logical. NOTE: "NA" strings will always be converted toNAs.
Examples
.df <- tibble::tibble(
x = 1:3,
y = c("1|2", "1|3", "2"),
z = c("1", "2|3", "1|3")
)
# counts undelimited columns like dplyr::count()
count_delim(.df, x)
#> # A tibble: 3 × 2
#> x n
#> <chr> <int>
#> 1 1 1
#> 2 2 1
#> 3 3 1
# counts all delimited values
count_delim(.df, y)
#> # A tibble: 3 × 2
#> y n
#> <chr> <int>
#> 1 1 2
#> 2 2 2
#> 3 3 1
# works for multiple columns that use the same delimiter
count_delim(.df, y, z)
#> # A tibble: 7 × 3
#> y z n
#> <chr> <chr> <int>
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 3 1
#> 4 2 1 2
#> 5 2 3 1
#> 6 3 2 1
#> 7 3 3 1
# but not those that use different delimiters
.df2 <- dplyr::mutate(.df, z = stringr::str_replace(z, "\\|", "%"))
.df2
#> # A tibble: 3 × 3
#> x y z
#> <int> <chr> <chr>
#> 1 1 1|2 1
#> 2 2 1|3 2%3
#> 3 3 2 1%3
count_delim(.df2, y, z, delim = "%")
#> # A tibble: 5 × 3
#> y z n
#> <chr> <chr> <int>
#> 1 1|2 1 1
#> 2 1|3 2 1
#> 3 1|3 3 1
#> 4 2 1 1
#> 5 2 3 1