Read a backslash-coded lexicon file into a tibble
read_lexicon(file, col_names = c("line", "data"), remove = FALSE, ...)
Argument | Description |
---|---|
file | Either a path to a file, a connection, or literal data |
col_names | Names for the line number and data columns (defaults to "line" and "data") |
remove | Whether tidyr::extract should remove the "data" column (defaults to FALSE) |
... | Further arguments passed to tidyr::extract, such as regex and into |
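
To make the `remove` and `...` arguments more concrete, the sketch below calls tidyr::extract directly on a hand-built tibble. It assumes that read_lexicon forwards these arguments to tidyr::extract unchanged, as the table states; the objects `raw_lines`, `lexicon`, `kept`, and `dropped` are hypothetical names used only for illustration and are not part of tidylex.

```r
# Hypothetical stand-in for the two-column tibble that read_lexicon()
# returns by default (line number plus raw line text).
raw_lines <- c("\\lx bonjour", "\\de hello", "", "\\lx au revoir", "\\de goodbye")
lexicon <- tibble::tibble(line = seq_along(raw_lines), data = raw_lines)

# remove = FALSE keeps the original "data" column next to the new ones.
kept <- tidyr::extract(
  lexicon,
  col    = "data",
  into   = c("code", "value"),
  regex  = "\\\\([a-z]+)\\s(.*)",
  remove = FALSE
)

# remove = TRUE replaces "data" with the extracted columns.
dropped <- tidyr::extract(
  lexicon,
  col    = "data",
  into   = c("code", "value"),
  regex  = "\\\\([a-z]+)\\s(.*)",
  remove = TRUE
)

names(kept)
names(dropped)
```

Lines that do not match the regex, such as the blank line between entries, get NA in every extracted column, so they can be filtered out afterwards if they are not needed.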
```r
# Demo: Two literal backslash-coded lexemes
read_lexicon("\\lx bonjour\n\\de hello\n\n\\lx au revoir\n\\de goodbye")
#> # A tibble: 5 x 2
#>    line data
#>   <int> <chr>
#> 1     1 "\\lx bonjour"
#> 2     2 "\\de hello"
#> 3     3 ""
#> 4     4 "\\lx au revoir"
#> 5     5 "\\de goodbye"

# Demo: Extract backslash code and line value from data
read_lexicon("\\lx bonjour\n\\de hello\n\n\\lx au revoir\n\\de goodbye",
             regex = "\\\\([a-z]+)\\s(.*)",
             into = c("code", "value"))
#> # A tibble: 5 x 4
#>    line data               code  value
#>   <int> <chr>              <chr> <chr>
#> 1     1 "\\lx bonjour"     lx    bonjour
#> 2     2 "\\de hello"       de    hello
#> 3     3 ""                 NA    NA
#> 4     4 "\\lx au revoir"   lx    au revoir
#> 5     5 "\\de goodbye"     de    goodbye

# More typical usage (where file path to a lexicon is known):
lexicon_file <- system.file("extdata", "mini-french.txt", package = "tidylex")
read_lexicon(file = lexicon_file,
             regex = "\\\\([a-z]+)\\s(.*)",
             into = c("code", "value"))
#> # A tibble: 5 x 4
#>    line data               code  value
#>   <int> <chr>              <chr> <chr>
#> 1     1 "\\lx bonjour"     lx    bonjour
#> 2     2 "\\de hello"       de    hello
#> 3     3 ""                 NA    NA
#> 4     4 "\\lx au revoir"   lx    au revoir
#> 5     5 "\\de goodbye"     de    goodbye
```