Advanced features

Aaron Rudkin

More complicated level creation with variable numbers of observations

add_level() can also create more complicated patterns of nesting. For example, when creating lower-level data, it is possible to use a different N for each unit of the higher-level data:

variable_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
  )
variable_data
cities elevation citizens age
     1      1778        1  46
     1      1778        2  50
     2      1499        3  35
     2      1499        4  65
     2      1499        5  34
     2      1499        6  23

Here, each city has a different number of citizens. The value of N used to create the age variable automatically updates to the total number of citizens (6 in this case). The result is a dataset with 6 citizens: 2 in the first city and 4 in the second. add_level() will handle N correctly as long as it is either a single number or a vector whose length equals the number of units in the current lowest level of the data.
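The following is a minimal sketch of how add_level() interprets N. It assumes fabricatr is installed; the seed and the object names same_n and variable_data are illustrative only. It compares a scalar N, which gives every city the same number of citizens, with a per-city vector N, and then counts the rows created for each city.

```r
library(fabricatr)
set.seed(1)

# Scalar N: every city receives the same number of citizens (3 each, 6 in total)
# (same_n is an illustrative object name)
same_n <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = 3, age = runif(N, 18, 70))
)

# Vector N: one entry per city, so city 1 gets 2 citizens and city 2 gets 4
variable_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
)

# Count rows per city to see how N was applied in each case
table(same_n$cities)
table(variable_data$cities)
```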

It is also possible to compute N with a function call, which allows a random number of citizens per city:

my_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = sample(1:6, size = 2, replace = TRUE), age = runif(N, 18, 70))
  )
my_data
cities elevation citizens age
     1      1850        1  53
     2      1128        2  55
     2      1128        3  45
     2      1128        4  42
     2      1128        5  47
     2      1128        6  69

Here, each city receives a random number of citizens between 1 and 6 (in this draw, 1 for the first city and 5 for the second). sample() returns a vector of length 2, one value per city, so this is equivalent to specifying 2 separate Ns as in the previous example.
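The sketch below illustrates this equivalence and assumes fabricatr is installed. It draws the per-city Ns with sample() first and stores them in a hypothetical variable, n_per_city, before passing that variable to add_level(). Counting the rows per city should then reproduce the drawn values.

```r
library(fabricatr)
set.seed(42)

# Draw one N per city up front so the values can be inspected
# (n_per_city is a hypothetical name used only for this illustration)
n_per_city <- sample(1:6, size = 2, replace = TRUE)

my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = n_per_city, age = runif(N, 18, 70))
)

# Rows per city, in city order, should match the drawn Ns
n_per_city
as.vector(table(my_data$cities))
```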

Finally, it is possible to define N on the basis of higher level variables themselves. Consider the following example:

variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)
head(variable_n)
cities population citizens
     1         90      001
     1         90      002
     1         90      003
     1         90      004
     1         90      005
     1         90      006

Here, each city has a defined population, and the number of citizens in our simulated data reflects a 30% sample of that population. We only display the first 6 rows for brevity's sake, but the first city has 27 rows in total (round(90 * 0.3) = 27).
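As a sanity check, the sketch below rebuilds the dataset and compares the number of citizen rows in each city with round(population * 0.3), computed from that city's own population. It assumes fabricatr is installed, and the helper object names are illustrative.

```r
library(fabricatr)
set.seed(7)

variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)

# Citizen rows actually created for each city (illustrative helper name)
rows_per_city <- tapply(variable_n$citizens, variable_n$cities, length)

# Each city's population is constant within the city, so take its first value,
# then compute the implied 30% sample size
population_by_city <- tapply(variable_n$population, variable_n$cities, function(p) p[1])
expected_rows <- round(population_by_city * 0.3)

cbind(rows = rows_per_city, expected = expected_rows)
```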

Tidyverse integration

Because the functions in fabricatr take data and return data, they fit naturally into a tidyverse workflow. The example below uses magrittr's pipe operator (%>%) together with dplyr's group_by() and mutate() verbs to add a city-level population count to the data.

library(dplyr)

my_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
  ) %>%
  group_by(cities) %>%
  mutate(pop = n())

my_data
cities elevation citizens age pop
     1      1112        1  35   2
     1      1112        2  31   2
     2      1633        3  60   3
     2      1633        4  59   3
     2      1633        5  55   3
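You can add further group-level summaries in the same pipeline. The following minimal sketch assumes fabricatr and dplyr are installed. It adds each city's mean age alongside pop and then calls ungroup() so that later steps are not silently grouped. It also uses count() to tabulate citizens per city. The column name mean_age is illustrative.

```r
library(fabricatr)
library(dplyr)
set.seed(3)

my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
) %>%
  group_by(cities) %>%
  # pop: citizens per city; mean_age (illustrative name): average age per city
  mutate(pop = n(), mean_age = mean(age)) %>%
  ungroup()

# One row per city with its number of citizens in column n
city_sizes <- count(my_data, cities)
city_sizes
```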

It is also possible to use the pipe operator (%>%) to pass data into a fabricate() call, or from one fabricate() call to the next. Remember that every fabricate() call can import an existing data frame, and every call returns a single data frame.

my_data <-
  tibble(Y = sample(1:10, 2)) %>%
  fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))
my_data
Y lower_level  Y2
6           1 6.2
6           2 5.9
6           3 6.1
5           4 5.6
5           5 4.5
5           6 4.3
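The same pattern works when the incoming data frame was itself produced by fabricate(). The minimal sketch below assumes fabricatr and dplyr are installed, and the object name chained is illustrative. The first call creates two cities. The second call nests three citizens under each imported row, so each city's elevation is repeated for all of its citizens.

```r
library(fabricatr)
library(dplyr)
set.seed(5)

# First call: two cities. Second call: three citizens nested under each imported city row
# (chained is an illustrative object name)
chained <- fabricate(
  cities = add_level(N = 2, elevation = runif(N, 1000, 2000))
) %>%
  fabricate(citizens = add_level(N = 3, age = runif(N, 18, 70)))

chained
```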