add_level() can be used to create more complicated patterns of nesting. For example, when creating lower-level data, it is possible to use a different N for each of the values of the higher-level data:
variable_data <-
fabricate(
cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
)
variable_data
cities | elevation | citizens | age |
---|---|---|---|
1 | 1778 | 1 | 46 |
1 | 1778 | 2 | 50 |
2 | 1499 | 3 | 35 |
2 | 1499 | 4 | 65 |
2 | 1499 | 5 | 34 |
2 | 1499 | 6 | 23 |
Here, each city has a different number of citizens, and the value of N used to create the age variable updates automatically as needed. The result is a dataset with 6 citizens, 2 in the first city and 4 in the second. As long as N is either a single number or a vector whose length equals the number of units in the current lowest level of the data, add_level() will know what to do.
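As a minimal sketch of the vector form of N, the block below creates three cities and gives them 1, 2 and 3 citizens respectively, then counts the rows per city. The object name sketch_data is hypothetical and chosen only for illustration.

```r
library(fabricatr)

# Three cities; the vector passed to N has one entry per city.
sketch_data <- fabricate(
  cities = add_level(N = 3, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(1, 2, 3), age = runif(N, 18, 70))
)

# Count citizens per city; the counts should mirror c(1, 2, 3).
table(sketch_data$cities)
```

The total number of rows is the sum of the vector, so this call yields 6 citizens.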
It is also possible to pass a function call to N, enabling a random number of citizens per city:
my_data <-
fabricate(
cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
citizens = add_level(N = sample(1:6, size = 2, replace = TRUE), age = runif(N, 18, 70))
)
my_data
cities | elevation | citizens | age |
---|---|---|---|
1 | 1850 | 1 | 53 |
2 | 1128 | 2 | 55 |
2 | 1128 | 3 | 45 |
2 | 1128 | 4 | 42 |
2 | 1128 | 5 | 47 |
2 | 1128 | 6 | 69 |
Here, each city is given a random number of citizens between 1 and 6. Since the sample() function returns a vector of length 2, this is like specifying 2 separate Ns, as in the example above.
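Because the counts are random, the output changes from run to run. A small sketch, assuming you want reproducible draws, is to call set.seed() first and then inspect how many citizens each city received. The seed value and the object name random_data are arbitrary choices made for this illustration.

```r
library(fabricatr)

# Fix the random seed so the same city sizes are drawn on every run.
set.seed(42)

random_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = sample(1:6, size = 2, replace = TRUE), age = runif(N, 18, 70))
)

# Number of citizens drawn for each city (each count lies between 1 and 6).
citizens_per_city <- table(random_data$cities)
citizens_per_city
```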
Finally, it is possible to define N on the basis of higher-level variables themselves. Consider the following example:
variable_n <- fabricate(
cities = add_level(N = 5, population = runif(N, 10, 200)),
citizens = add_level(N = round(population * 0.3))
)
head(variable_n)
cities | population | citizens |
---|---|---|
1 | 90 | 001 |
1 | 90 | 002 |
1 | 90 | 003 |
1 | 90 | 004 |
1 | 90 | 005 |
1 | 90 | 006 |
Here, each city has a defined population, and the number of citizens in our simulated data reflects a sample of 30% of that population. Although we display only the first 6 rows for brevity, the first city has 27 rows in total (30% of its population of 90).
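One way to confirm this relationship is to compare the number of rows generated for each city with 30% of its population. The following is a sketch using base R's tapply(); the helper names rows_per_city and population_per_city are hypothetical.

```r
library(fabricatr)

variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)

# Rows generated for each city.
rows_per_city <- tapply(variable_n$citizens, variable_n$cities, length)

# Population is constant within a city, so unique() returns one value per city.
population_per_city <- tapply(variable_n$population, variable_n$cities, unique)

# Each city's row count should equal its rounded 30% sample.
all(rows_per_city == round(population_per_city * 0.3))
```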
Because the functions in fabricatr take data and return data, they are cross-compatible with a tidyverse workflow. Here is an example of using magrittr’s pipe operator (%>%) and dplyr’s group_by and mutate verbs to add new data.
library(dplyr)
my_data <-
fabricate(
cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
) %>%
group_by(cities) %>%
mutate(pop = n())
my_data
cities | elevation | citizens | age | pop |
---|---|---|---|---|
1 | 1112 | 1 | 35 | 2 |
1 | 1112 | 2 | 31 | 2 |
2 | 1633 | 3 | 60 | 3 |
2 | 1633 | 4 | 59 | 3 |
2 | 1633 | 5 | 55 | 3 |
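The same pattern works with other dplyr verbs. As an illustrative sketch, replacing mutate with summarise collapses the fabricated data to one row per city. The column name mean_age and the object name city_summary are assumptions introduced for this example.

```r
library(fabricatr)
library(dplyr)

city_summary <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
  ) %>%
  group_by(cities) %>%
  # One row per city: the number of citizens and their average age.
  summarise(pop = n(), mean_age = mean(age))

city_summary
```

Where mutate keeps all five citizen rows and repeats pop on each, summarise returns a two-row city-level table.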
It is also possible to use the pipe operator (%>%) to direct the flow of data between fabricate() calls. Remember that every fabricate() call can import existing data frames, and every call returns a single data frame.
my_data <-
tibble(Y = sample(1:10, 2)) %>%
fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))
my_data
Y | lower_level | Y2 |
---|---|---|
6 | 1 | 6.2 |
6 | 2 | 5.9 |
6 | 3 | 6.1 |
5 | 4 | 5.6 |
5 | 5 | 4.5 |
5 | 6 | 4.3 |
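Since every fabricate() call returns a data frame, the upstream data can itself come from fabricate(). The sketch below chains two calls: the first creates two top-level units, and the second imports them and nests three lower-level units under each. The object name two_step is hypothetical.

```r
library(fabricatr)
library(magrittr)

two_step <-
  # First call: two top-level observations with a variable Y.
  fabricate(N = 2, Y = rnorm(N)) %>%
  # Second call: import that data frame and add a nested level beneath it.
  fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))

two_step
```

The result has 6 rows, and Y is repeated across the three lower-level units that belong to each original observation.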