Read a backslash-coded lexicon file into a tibble

read_lexicon(file, col_names = c("line", "data"), remove = FALSE, ...)

Arguments

file

Either a path to a file, a connection, or literal data (as in the first example, a single string with lines separated by "\n")

col_names

Names for the line-number and data columns (defaults to "line" and "data")

remove

Whether tidyr::extract should remove the "data" column after extracting from it (defaults to FALSE, so the column is kept)

...

Further arguments passed to tidyr::extract, such as regex and into (see Examples)

Examples

# Demo: Two literal backslash-coded lexemes
read_lexicon("\\lx bonjour\n\\de hello\n\n\\lx au revoir\n\\de goodbye")
#> # A tibble: 5 x 2
#>    line data
#>   <int> <chr>
#> 1     1 "\\lx bonjour"
#> 2     2 "\\de hello"
#> 3     3 ""
#> 4     4 "\\lx au revoir"
#> 5     5 "\\de goodbye"
# Demo: Extract backslash code and line value from data
read_lexicon("\\lx bonjour\n\\de hello\n\n\\lx au revoir\n\\de goodbye",
             regex = "\\\\([a-z]+)\\s(.*)",
             into  = c("code", "value"))
#> # A tibble: 5 x 4
#>    line data              code  value
#>   <int> <chr>             <chr> <chr>
#> 1     1 "\\lx bonjour"    lx    bonjour
#> 2     2 "\\de hello"      de    hello
#> 3     3 ""                <NA>  <NA>
#> 4     4 "\\lx au revoir"  lx    au revoir
#> 5     5 "\\de goodbye"    de    goodbye
# More typical usage (where file path to a lexicon is known):
lexicon_file <- system.file("extdata", "mini-french.txt", package = "tidylex")
read_lexicon(file  = lexicon_file,
             regex = "\\\\([a-z]+)\\s(.*)",
             into  = c("code", "value"))
#> # A tibble: 5 x 4
#>    line data              code  value
#>   <int> <chr>             <chr> <chr>
#> 1     1 "\\lx bonjour"    lx    bonjour
#> 2     2 "\\de hello"      de    hello
#> 3     3 ""                <NA>  <NA>
#> 4     4 "\\lx au revoir"  lx    au revoir
#> 5     5 "\\de goodbye"    de    goodbye
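Because regex and into are forwarded to tidyr::extract, it can help to try a pattern directly on a small tibble before running it over a whole lexicon. The following is only a sketch: lexicon_lines and parsed are hypothetical names, and the tibble is a hand-built stand-in that assumes the two-column shape shown in the first example. It is not necessarily how read_lexicon works internally.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for the two-column tibble that read_lexicon() returns
lexicon_lines <- tibble(
  line = 1:3,
  data = c("\\lx bonjour", "\\de hello", "")
)

# Same regex and into values as in the examples above.
# remove = FALSE keeps the original "data" column next to the new ones.
parsed <- tidyr::extract(
  lexicon_lines,
  col    = data,
  into   = c("code", "value"),
  regex  = "\\\\([a-z]+)\\s(.*)",
  remove = FALSE
)

parsed
```

Lines that do not match the pattern, such as the empty separator line, get NA in both new columns, which matches row 3 of the output above.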