Purrrification of data: Some examples
A First Example
To generate a collection of 100 normally distributed random numbers, we use:
rnorm(100)
We can repeat this process any number of times n. For n = 9, we have the following collections:
collection_1 <- rnorm(100)
collection_2 <- rnorm(100)
collection_3 <- rnorm(100)
collection_4 <- rnorm(100)
collection_5 <- rnorm(100)
collection_6 <- rnorm(100)
collection_7 <- rnorm(100)
collection_8 <- rnorm(100)
collection_9 <- rnorm(100)
which we can conveniently regroup into a larger collection of random numbers:
larger_collection <- c(
collection_1, collection_2, collection_3,
collection_4, collection_5, collection_6,
collection_7, collection_8, collection_9
)
Alternatively, we can use the rerun shorthand to generate the same nine collections at once. The result is a list of nine vectors rather than one concatenated vector:
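library(tidyverse)  # attaches purrr, tibble, tidyr, ggplot2 and the %>% pipe
list_of_numbers <- rerun(9, rnorm(100))  # list of 9 numeric vectors, each of length 100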
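Note that rerun() has been deprecated since purrr 1.0.0. A minimal sketch of the same construction with map(), assuming a recent purrr installation; the name list_of_numbers_map and the seed are illustrative and not part of the original example:

```r
library(purrr)

set.seed(1)  # illustrative seed, only to make the draws reproducible

# Draw 9 independent collections of 100 standard-normal numbers.
# The values in 1:9 are ignored; they only control the number of repetitions.
list_of_numbers_map <- map(1:9, ~ rnorm(100))

length(list_of_numbers_map)   # number of collections
lengths(list_of_numbers_map)  # size of each collection
```

The result has the same structure as the rerun() version, so the rest of the example should work unchanged with either list.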
Each vector $x^{(d)} = \begin{pmatrix} x^{(d)}_1 & x^{(d)}_2 & \dots & x^{(d)}_{100} \end{pmatrix}$, $d = 1, 2, \dots, 9$, is a collection of 100 normally distributed random numbers:
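$$
\ell = \left\{ x^{(1)}, x^{(2)}, \dots, x^{(9)} \right\}
= \left\{
\begin{pmatrix} x^{(1)}_1 \\ x^{(1)}_2 \\ \vdots \\ x^{(1)}_{100} \end{pmatrix},
\begin{pmatrix} x^{(2)}_1 \\ x^{(2)}_2 \\ \vdots \\ x^{(2)}_{100} \end{pmatrix},
\dots,
\begin{pmatrix} x^{(9)}_1 \\ x^{(9)}_2 \\ \vdots \\ x^{(9)}_{100} \end{pmatrix}
\right\}
$$
This list can be conveniently repackaged into tidy format:
$$
df =
\begin{array}{c|c}
\text{distribution, } d & x' \\ \hline
1 & x^{(1)}_1 \\
1 & x^{(1)}_2 \\
\vdots & \vdots \\
1 & x^{(1)}_{100} \\
2 & x^{(2)}_1 \\
2 & x^{(2)}_2 \\
\vdots & \vdots \\
2 & x^{(2)}_{100} \\
\vdots & \vdots \\
9 & x^{(9)}_1 \\
9 & x^{(9)}_2 \\
\vdots & \vdots \\
9 & x^{(9)}_{100}
\end{array}
$$
by using the map_df function: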
df <- list_of_numbers %>%
map_df( ~ tibble(x_ = .x), .id = "distribution")
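To see that the repackaging yields the long shape described above, the dimensions can be inspected directly. This is a small sketch under the assumption that purrr, tibble and dplyr are installed; it rebuilds the list with map() and uses the illustrative name df_check so that it runs on its own:

```r
library(purrr)
library(tibble)

# Rebuild a list of 9 collections so the sketch is self-contained.
list_of_numbers <- map(1:9, ~ rnorm(100))

# Same repackaging step as above: one row per random number,
# with the list position stored in the "distribution" column.
df_check <- map_df(list_of_numbers, ~ tibble(x_ = .x), .id = "distribution")

dim(df_check)                 # 9 * 100 rows, 2 columns
table(df_check$distribution)  # 100 rows for each of the 9 distributions
```

Because the list is unnamed, the distribution column holds the list positions as character strings ("1" to "9"), not integers.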
A Second Example
The following shorthand defines a list of 9 duplets of random numbers. Each duplet is packaged as a tibble, $t = \begin{pmatrix} z & w \end{pmatrix}$:
list_of_duplets <- rerun(9,
tibble(z = rnorm(100), w = rnorm(100))
)
To repackage this list to tidy format, use:
df <- list_of_duplets %>%
map_df(~ tibble(z_ = .x$z, w_ = .x$w), .id = "distribution")
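$$
df =
\begin{array}{c|c|c}
\text{distribution, } d & z' & w' \\ \hline
1 & z^{(1)}_1 & w^{(1)}_1 \\
1 & z^{(1)}_2 & w^{(1)}_2 \\
\vdots & \vdots & \vdots \\
1 & z^{(1)}_{100} & w^{(1)}_{100} \\
2 & z^{(2)}_1 & w^{(2)}_1 \\
2 & z^{(2)}_2 & w^{(2)}_2 \\
\vdots & \vdots & \vdots \\
2 & z^{(2)}_{100} & w^{(2)}_{100} \\
\vdots & \vdots & \vdots \\
9 & z^{(9)}_1 & w^{(9)}_1 \\
9 & z^{(9)}_2 & w^{(9)}_2 \\
\vdots & \vdots & \vdots \\
9 & z^{(9)}_{100} & w^{(9)}_{100}
\end{array}
$$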
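When the list elements are already tibbles, the per-element tibble() call is not strictly needed. As a hedged alternative, assuming dplyr is available, bind_rows() can stack the list directly; the name df_bound is illustrative, and the columns keep their original names z and w rather than z_ and w_:

```r
library(purrr)
library(tibble)
library(dplyr)

# Rebuild a list of 9 two-column tibbles so the sketch is self-contained.
list_of_duplets <- map(1:9, ~ tibble(z = rnorm(100), w = rnorm(100)))

# Stack the tibbles row-wise; .id records which list element each row came from.
df_bound <- bind_rows(list_of_duplets, .id = "distribution")

names(df_bound)
nrow(df_bound)
```

The map_df() form remains the more flexible choice when columns need to be renamed or transformed on the way in.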
Now that the data is in tidy format, we can visualize both $z'$ and $w'$ with ggplot:
df %>%
pivot_longer(
names_to = "category",
values_to = "value",
cols = z_:w_
) %>%
ggplot(aes(x = distribution, y = value)) +
geom_boxplot() +
stat_summary(
geom = "point", fun = mean,
size = 4, color = "red", alpha = 1/2
) +
facet_wrap(~ category, nrow = 2)
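The result is one box plot per distribution, in two stacked panels (one each for z_ and w_), with each group mean marked by a red point.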
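The red points drawn by stat_summary are the per-distribution means within each category. As a numeric cross-check, the same quantities can be tabulated. This is a sketch assuming dplyr and tidyr are installed; df_demo and means are illustrative names, and the data are regenerated here so that the block runs on its own:

```r
library(purrr)
library(tibble)
library(dplyr)
library(tidyr)

# Regenerate tidy data with the same layout as df above.
df_demo <- map_df(
  1:9,
  ~ tibble(z_ = rnorm(100), w_ = rnorm(100)),
  .id = "distribution"
)

# Same reshaping as in the plot, followed by one mean per panel and box.
means <- df_demo %>%
  pivot_longer(cols = z_:w_, names_to = "category", values_to = "value") %>%
  group_by(category, distribution) %>%
  summarise(mean_value = mean(value), .groups = "drop")

means
```

Since every collection is drawn from a standard normal distribution, the tabulated means should all lie close to zero.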