Purrrification of factory time-series
Suppose \((t, \dot{s}_{\ell}(t))\) is the time series of liquid sugar mass flow measurements in Line \(\ell\) of a certain factory. To compute the liquid sugar mass for a given interval \(t \in [t_{\rm start}, t_{\rm stop}]\), we can integrate the time series numerically: $$s(t_{\rm start}, t_{\rm stop}, \ell)=\int_{t_{\rm start}}^{t_{\rm stop}} \dot{s}_{\ell}(t)\,dt$$ For instance, the following line totalizes the liquid sugar flow in Line 1 from 11:00 am to 12:00 pm on June 29, 2020.
sugar_mass('2020-6-29 11:00', '2020-6-29 12:00', 'L1_sugar_massflow')
where sugar_mass is a totalizer function that integrates the mass flow time series over the interval and returns the totalized mass as a single number. Now suppose you are asked to compute the total sugar metered by the flowmeters in Lines 1, 2, 4, and 6 over that same interval. A quick and dirty way would be to rewrite the original code like this:
start_clock <- '2020-6-29 11:00'
stop_clock <- '2020-6-29 12:00'
s1 <- sugar_mass(start_clock, stop_clock, 'L1_sugar_massflow')
s2 <- sugar_mass(start_clock, stop_clock, 'L2_sugar_massflow')
s4 <- sugar_mass(start_clock, stop_clock, 'L4_sugar_massflow')
s6 <- sugar_mass(start_clock, stop_clock, 'L6_sugar_massflow')
s <- s1 + s2 + s4 + s6
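Each sugar_mass call above evaluates the integral \(s(t_{\rm start}, t_{\rm stop}, \ell)\) for one line. The post does not show how sugar_mass reads the plant data, so the following is only a sketch of the numerical integration step, using the trapezoidal rule. It assumes the samples are already in memory as POSIXct time stamps with flows in kg/h; trapz_total is a hypothetical name, not part of the original code.

```r
# Hypothetical sketch of the numerical integration behind a totalizer such as
# sugar_mass: trapezoidal rule over the samples inside [t_start, t_stop].
# Assumes POSIXct time stamps and a mass flow in kg/h, so the result is in kg.
trapz_total <- function(stamps, flow, t_start, t_stop) {
  keep <- stamps >= t_start & stamps <= t_stop
  stamps <- stamps[keep]
  flow <- flow[keep]
  # width of each sampling interval, in hours
  dt <- as.numeric(diff(stamps), units = "hours")
  # mean of neighbouring samples times interval width, summed over intervals
  sum(dt * (head(flow, -1) + tail(flow, -1)) / 2)
}

# Usage: a constant 600 kg/h flow sampled every minute for one hour
stamps <- seq(as.POSIXct("2020-06-29 11:00", tz = "UTC"),
              as.POSIXct("2020-06-29 12:00", tz = "UTC"), by = "1 min")
flow <- rep(600, length(stamps))
total <- trapz_total(stamps, flow, min(stamps), max(stamps))
total  # about 600 kg, up to floating-point rounding
```

This sketch only integrates between the samples that fall inside the interval; a production totalizer would likely also interpolate the flow at the exact start and stop times.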
A slightly better approach than repeating the call for every line is to write
start_clock <- '2020-6-29 11:00'
stop_clock <- '2020-6-29 12:00'
s <- all_sugar(start_clock, stop_clock, c(1:2, 4, 6))
where all_sugar is a reusable function defined as follows:
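library(tibble)
library(purrr)
all_sugar <- function(start_datetime, stop_datetime, ell){
  # one row per requested production line
  df <- tibble(
    line_number = ell,
    # totalize each line's flowmeter tag over the same interval;
    # pmap_dbl recycles the length-1 start/stop times across all lines
    sugar = pmap_dbl(
      list(
        start_datetime, stop_datetime,
        paste0('L', line_number, '_sugar_massflow')
      ),
      sugar_mass
    )
  )
  # total sugar over all requested lines
  sum(df$sugar)
}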
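The only non-obvious step in all_sugar is the pmap_dbl call. start_datetime and stop_datetime have length 1 while the tag vector has one entry per line, and purrr recycles length-1 inputs, so the totalizer is called once per line with the same interval. The sketch below isolates that step; fake_sugar_mass is a hypothetical stub standing in for the real sugar_mass, which needs the plant's data.

```r
library(purrr)

# Hypothetical stub in place of sugar_mass: returns 100 times the line number
# parsed from the tag, so the recycling behaviour is easy to check.
fake_sugar_mass <- function(start, stop, tag) {
  as.numeric(gsub("\\D", "", tag)) * 100
}

tags <- paste0('L', c(1:2, 4, 6), '_sugar_massflow')
tags  # "L1_sugar_massflow" "L2_sugar_massflow" "L4_sugar_massflow" "L6_sugar_massflow"

# start/stop have length 1, so pmap_dbl reuses them for all four tags
per_line <- pmap_dbl(
  list('2020-6-29 11:00', '2020-6-29 12:00', tags),
  fake_sugar_mass
)
per_line       # 100 200 400 600
sum(per_line)  # 1300
```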
As you will soon realize, the functional approach is the cleanest way to tackle this type of problem: once all_sugar exists, a new interval or a new set of lines is just a new set of arguments. For example, to obtain the total liquid sugar mass from May 13 (13:00) to May 15 (8:00) for Lines 1, 5, 9, 10, and 11, one only has to write:
all_sugar('2020-5-13 13:00', '2020-5-15 8:00', c(1, 5, 9:11))
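The tibble in all_sugar keeps line_number next to each per-line total, which is handy if you later want to return df itself as a per-line breakdown. If only the grand total is needed, a leaner variant could map over the tags alone. The sketch below is an illustration, not the original function: all_sugar_lean and its totalizer argument are hypothetical, and fake_sugar_mass again stands in for sugar_mass so the example runs without plant data.

```r
library(purrr)

# Hypothetical leaner variant: map over the tags only and capture the
# interval from the enclosing call. `totalizer` defaults to sugar_mass,
# which is only looked up if the default is actually used.
all_sugar_lean <- function(start_datetime, stop_datetime, ell,
                           totalizer = sugar_mass) {
  tags <- paste0('L', ell, '_sugar_massflow')
  sum(map_dbl(tags, function(tag) totalizer(start_datetime, stop_datetime, tag)))
}

# Hypothetical stub: 100 times the line number parsed from the tag
fake_sugar_mass <- function(start, stop, tag) {
  as.numeric(gsub("\\D", "", tag)) * 100
}

lean_total <- all_sugar_lean('2020-5-13 13:00', '2020-5-15 8:00',
                             c(1, 5, 9:11), fake_sugar_mass)
lean_total  # 3600, i.e. 100 * (1 + 5 + 9 + 10 + 11)
```

The trade-off is that the lean version discards which line contributed what, whereas the tibble version can expose that breakdown with a one-line change.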