Data & Variable Transformation with sjmisc Cheat Sheet

sjmisc complements dplyr, and helps with data transformation tasks and recoding variables. sjmisc works together seamlessly with dplyr and pipes. All functions are designed to support labelled data.

Design Philosophy

The design of sjmisc functions follows the tidyverse approach: the first argument is always the data (either a data frame or vector), followed by the names of the variables to be processed by the functions.

The returned object for each function equals the type of the data-argument.

Vector input
• If the data-argument is a vector, functions return a vector.
rec(mtcars$carb, rec = "1,2=1; 3,4=2; else=3")

Data frame input
• If the data-argument is a data frame, functions return a data frame.
rec(mtcars, carb, rec = "1,2=1; 3,4=2; else=3")

...-ellipses Argument

Apply functions to a single variable, selected variables or to a complete data frame.
Variable selection is powered by select(): separate variables with commas, or use select-helpers to select variables, e.g. ?rec:
rec(mtcars, one_of(c("gear", "carb")), rec = "min:3=1; 4:max=2")
rec(mtcars, gear, carb, rec = "min:3=1; 4:max=2")

Descriptives and Summaries

Most of the sjmisc functions (including recode-functions) also work on grouped data frames:
library(dplyr)
efc %>%
  group_by(e16sex, c172code) %>%
  frq(e42dep)

Frequency Tables

frq(x, ..., sort.frq = c("none", "asc", "desc"), weight.by = NULL)
Print frequency tables of (labelled) vectors. Uses variable labels as table header.
# use this data set in examples!
data(efc); frq(efc, e42dep, c161sex)

flat_table(data, ..., margin = c("counts", "cell", "row", "col"), digits = 2, show.values = FALSE)
Print contingency tables of (labelled) vectors. Uses value labels.
flat_table(efc, e42dep, c172code, e16sex)

count_na(x, ...)
Print frequency table of tagged NA values.
library(haven); x <- labelled(c(1:3, tagged_na("a", "a", "z")), labels = c("Refused" = tagged_na("a"), "N/A" = tagged_na("z")))
count_na(x)

Descriptive Summary

descr(x, ..., max.length = NULL)
Descriptive summary of data frames, including variable labels in output.
descr(efc, contains("cop"), max.length = 20)

Finding Variables in a Data Frame

Use find_var() to search for variables by names, value or variable labels. Returns vector/data frame.
# variables with "cop" in names and variable labels
find_var(efc, pattern = "cop", out = "df")
# variables with "level" in names and value labels
find_var(efc, "level", search = "name_value")

Recode and Transform Variables

Recode functions add a suffix to new variables, so original variables are preserved. By default, only the newly created variables are returned. Use append = TRUE to return the original input data frame as well.

rec(x, ..., rec, as.num = TRUE, var.label = NULL, val.labels = NULL, append = FALSE, suffix = "_r")
Recode values, return result as numeric, character or categorical (factor).
rec(mtcars, carb, rec = "1,2=1; 3,4=2; else=3")

dicho(x, ..., dich.by = "median", as.num = FALSE, var.label = NULL, val.labels = NULL, append = FALSE, suffix = "_d")
Dichotomise variable by median, mean or specific value.
dicho(mtcars, disp)

split_var(x, ..., n, as.num = FALSE, val.labels = NULL, var.label = NULL, inclusive = FALSE, append = FALSE, suffix = "_g")
Split variable into equal sized groups. Unlike dplyr::ntile(), does not split original categories into different values (see examples in ?split_var).
split_var(mtcars, mpg, disp, n = 3)

group_var(x, ..., size = 5, as.num = TRUE, right.interval = FALSE, n = 30, append = FALSE, suffix = "_gr")
Split variable into groups with equal value range, or into a maximum number of groups (value range per group is adjusted to match the number of groups).
group_var(mtcars, mpg, disp, size = 5)
group_var(mtcars, mpg, size = "auto", n = 4)

std(x, ..., include.fac = TRUE, append = FALSE, suffix = "_z")
Z-standardise variables. Also center().
std(efc, e17age, c160age)

recode_to(x, ..., lowest = 0, highest = -1, append = FALSE, suffix = "_r0")
recode_to(mtcars$gear)

Summarise Variables and Cases

The summary functions mostly mimic base R equivalents, but are designed to work together with pipes and dplyr.

row_sums(x, ..., na.rm = TRUE, var = "rowsums", append = FALSE)
Row sums of data frames.
row_sums(efc, c82cop1:c90cop9)

row_means(x, ..., n, var = "rowmeans", append = FALSE)
Row means, for at least n valid (non-NA) values.
row_means(efc, c82cop1:c90cop9, n = 7)

row_count(x, ..., count, var = "rowcount", append = FALSE)
Row-wise count of how often the value given in count occurs in data frames. Also col_count().
row_count(efc, c82cop1:c90cop9, count = 2)

Other Useful Functions

add_columns() and replace_columns() to combine data frames, but either replace or preserve existing columns.
set_na() and replace_na() to convert regular values into missing values, or vice versa. replace_na() can also replace only specific tagged NA values.
remove_var() and var_rename() to remove variables from data frames, or rename variables.
group_str() to group similar string values. Useful for variables whose values are similar, but not identical.
merge_df() to full join data frames and preserve value and variable labels.
to_long() to gather multiple columns in data frames from wide into long format.

Use with %>% and dplyr

# use sjmisc-functions in pipes
mtcars %>% select(gear, carb) %>%
  rec(rec = "min:3=1; 4:max=2")
# use sjmisc-function inside mutate
mtcars %>% select(gear, carb) %>% mutate(
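
The mutate() example on this sheet is cut off, so the following is only a hedged sketch of both pipe patterns, not the sheet's original code. It assumes the sjmisc and dplyr packages are installed. The column name carb_grp is hypothetical, and append = TRUE is passed explicitly because the default for append may differ between sjmisc versions.

```r
# Sketch only: assumes the sjmisc and dplyr packages are installed.
library(dplyr)
library(sjmisc)

# 1) sjmisc function in a pipe: with no variables named in `...`,
#    rec() recodes every column of the incoming data frame.
#    append = TRUE keeps the original columns next to the new "_r" ones.
piped <- mtcars %>%
  select(gear, carb) %>%
  rec(rec = "min:3=1; 4:max=2", append = TRUE)

# 2) sjmisc function inside mutate(): a vector goes in, a vector comes out.
#    carb_grp is a hypothetical column name chosen for this sketch.
mutated <- mtcars %>%
  select(gear, carb) %>%
  mutate(carb_grp = rec(carb, rec = "1,2=1; 3,4=2; else=3"))

names(piped)             # column names after recoding in the pipe
table(mutated$carb_grp)  # distribution of the recoded groups
```

The second pattern relies on the rule stated under Design Philosophy: when the data-argument is a vector, the function returns a vector, which is what mutate() expects.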

CC BY Daniel Lüdecke d.luedecke@uke.de github.com/strengejacke Learn more with browseVignettes("sjmisc") sjmisc 2.6.2 10/17
