Purrrification of data: Some examples

A First Example

To generate a collection of 100 normally distributed random numbers, we use:


rnorm(100)

We can repeat this process n times. For n = 9, we obtain the following collections:

collection_1 <- rnorm(100)
collection_2 <- rnorm(100)
collection_3 <- rnorm(100)
collection_4 <- rnorm(100)
collection_5 <- rnorm(100)
collection_6 <- rnorm(100)
collection_7 <- rnorm(100)
collection_8 <- rnorm(100)
collection_9 <- rnorm(100)

which we can conveniently concatenate into a single larger collection of 900 random numbers:

larger_collection <- c(
    collection_1, collection_2, collection_3,
    collection_4, collection_5, collection_6,
    collection_7, collection_8, collection_9
)

Alternatively, we can use the rerun shorthand from purrr to generate the nine collections in one call, stored as a list of nine vectors rather than as one concatenated vector:


library(tidyverse)  # attaches purrr, tibble, tidyr, ggplot2 and the %>% pipe

list_of_numbers <- rerun(9, rnorm(100))
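Note that rerun() keeps the nine collections separate in a list, whereas c() flattens them into one vector. Recent purrr releases (1.0.0 onward) deprecate rerun(), so the following is a minimal sketch of an equivalent call using map() over an index, assuming purrr is installed. The name list_of_numbers_alt and the seed are hypothetical and not part of the original example.

```r
library(purrr)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible

# Generate 9 independent collections of 100 standard-normal draws.
# The index i is unused; it only controls how many times rnorm() runs.
list_of_numbers_alt <- map(1:9, function(i) rnorm(100))

length(list_of_numbers_alt)          # number of collections in the list
lengths(list_of_numbers_alt)         # size of each collection
length(unlist(list_of_numbers_alt))  # total count once flattened, as with c()
```

The list form is the one that the tidy repackaging step expects, because each element can then be labelled with the index of the collection it came from.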

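Each of the vectors $x^{(d)} = \left(x^{(d)}_1, x^{(d)}_2, \dots, x^{(d)}_{100}\right)$, $d = 1, 2, \dots, 9$, is a collection of 100 normally distributed random numbers:

$$
\mathtt{list\_of\_numbers}
= \left\{ x^{(1)}, x^{(2)}, \dots, x^{(9)} \right\}
= \left\{
\left(x^{(1)}_1, x^{(1)}_2, \dots, x^{(1)}_{100}\right),
\left(x^{(2)}_1, x^{(2)}_2, \dots, x^{(2)}_{100}\right),
\dots,
\left(x^{(9)}_1, x^{(9)}_2, \dots, x^{(9)}_{100}\right)
\right\}
$$

This list can be conveniently repackaged into tidy format:

$$
\mathtt{df} =
\begin{array}{c|c}
\text{distribution},\ d & x \\ \hline
1 & x^{(1)}_1 \\
1 & x^{(1)}_2 \\
\vdots & \vdots \\
1 & x^{(1)}_{100} \\
2 & x^{(2)}_1 \\
2 & x^{(2)}_2 \\
\vdots & \vdots \\
2 & x^{(2)}_{100} \\
\vdots & \vdots \\
9 & x^{(9)}_1 \\
9 & x^{(9)}_2 \\
\vdots & \vdots \\
9 & x^{(9)}_{100}
\end{array}
$$

by using the map_df function: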

df <- list_of_numbers %>%
    map_df( ~ tibble(x_ = .x), .id = "distribution")
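
As a quick sanity check on the tidy layout, the sketch below rebuilds the list and confirms that the result has one row per random number and 100 rows per distribution. It assumes purrr, tibble and dplyr are installed, and it builds the list with map() instead of rerun() so that it also runs on recent purrr versions; that substitution is an assumption of this sketch, not part of the original example.

```r
library(purrr)
library(tibble)
library(dplyr)

# Rebuild the list of 9 collections, then tidy it as in the text.
list_of_numbers <- map(1:9, function(i) rnorm(100))
df <- list_of_numbers %>%
    map_df(~ tibble(x_ = .x), .id = "distribution")

dim(df)                  # 9 * 100 = 900 rows, 2 columns
count(df, distribution)  # one row per distribution, each with n = 100
```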

A Second Example

The following shorthand defines a list of 9 duplets of random-number collections. Each duplet is packaged as a tibble, $t = (z \;\; w)$, whose columns z and w each hold 100 normally distributed random numbers:


list_of_duplets <- rerun(9, 
    tibble(z = rnorm(100), w = rnorm(100))
)

To repackage this list into tidy format, use:


df <- list_of_duplets %>%
    map_df(~ tibble(z_ = .x$z, w_ = .x$w), .id = "distribution")

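$$
\mathtt{df} =
\begin{array}{c|c|c}
\text{distribution},\ d & z & w \\ \hline
1 & z^{(1)}_1 & w^{(1)}_1 \\
1 & z^{(1)}_2 & w^{(1)}_2 \\
\vdots & \vdots & \vdots \\
1 & z^{(1)}_{100} & w^{(1)}_{100} \\
2 & z^{(2)}_1 & w^{(2)}_1 \\
2 & z^{(2)}_2 & w^{(2)}_2 \\
\vdots & \vdots & \vdots \\
2 & z^{(2)}_{100} & w^{(2)}_{100} \\
\vdots & \vdots & \vdots \\
9 & z^{(9)}_1 & w^{(9)}_1 \\
9 & z^{(9)}_2 & w^{(9)}_2 \\
\vdots & \vdots & \vdots \\
9 & z^{(9)}_{100} & w^{(9)}_{100}
\end{array}
$$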
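
Because every element of the list is already a tibble, the rows can also be stacked directly. The following is a minimal sketch of that alternative, assuming dplyr is installed; it keeps the original column names z and w instead of renaming them, and df_alt is a hypothetical name. The list is built with map() here so that the sketch is self-contained.

```r
library(purrr)
library(tibble)
library(dplyr)

# Rebuild the list of 9 duplets (tibbles with columns z and w).
list_of_duplets <- map(1:9, function(i) tibble(z = rnorm(100), w = rnorm(100)))

# Stack the tibbles row-wise; .id records which list element
# each row came from, in a column named "distribution".
df_alt <- bind_rows(list_of_duplets, .id = "distribution")

names(df_alt)  # "distribution" "z" "w"
nrow(df_alt)   # 9 * 100 = 900
```

The map_df call in the text does the same stacking; its inner tibble() only serves to rename the columns to z_ and w_.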

Now that the data is in tidy format, we can visualize both z and w with ggplot.


df %>%
    pivot_longer(
        names_to = "category", 
        values_to = "value", 
        cols = z_:w_
    ) %>% 
    ggplot(aes(x = distribution, y = value)) + 
    geom_boxplot() + 
    stat_summary(
        geom = "point", fun = mean, 
        size = 4, color = "red", alpha = 1/2
    ) +
    facet_wrap(~ category, nrow = 2)

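The result is shown below:

[Figure: box plots of value for each distribution, with the mean of each distribution marked by a red point, one panel per category (z_ and w_).]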
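
The red points drawn by stat_summary are the per-distribution means within each category. A minimal sketch of the same summary as a table, assuming purrr, tibble, dplyr and tidyr are installed, is given here; the names means and mean_value are hypothetical, and the list is rebuilt with map() so that the block runs on its own.

```r
library(purrr)
library(tibble)
library(dplyr)
library(tidyr)

# Rebuild the tidy data frame used for the plot.
list_of_duplets <- map(1:9, function(i) tibble(z = rnorm(100), w = rnorm(100)))
df <- list_of_duplets %>%
    map_df(~ tibble(z_ = .x$z, w_ = .x$w), .id = "distribution")

# Reshape to long format, then average within each
# category/distribution pair (one value per red point).
means <- df %>%
    pivot_longer(cols = z_:w_, names_to = "category", values_to = "value") %>%
    group_by(category, distribution) %>%
    summarise(mean_value = mean(value), .groups = "drop")

means  # 2 categories x 9 distributions = 18 rows
```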
