Is pre-allocation helpful for parallelizing with foreach in R?
When working with for-loops, I have very frequently encountered the advice that one should pre-allocate an object and “fill it in” rather than have the object “grow” within the loop. Does this rule of thumb also apply to foreach?
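For reference, here is a minimal sketch of that rule of thumb in a plain sequential for-loop. The first version grows `x` with `c()`, which builds a new, longer vector on every pass. The second allocates the full length once and writes one element per pass. The function names `grow` and `fill` are illustrative only.

```r
# Growing: c() returns a new, longer vector on every pass,
# so the loop copies x repeatedly as it gets longer
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# Pre-allocating: the full-length vector is created once,
# and each pass only writes a single element
fill <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Both produce the same values; only the amount of copying differs
stopifnot(identical(grow(1000), fill(1000)))
```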
To give an absurdly simple example, let’s say I want to sample a vector of length 3 in each of my Monte Carlo draws and store each vector in a row of my output matrix. (In my actual code, I perform a series of operations on each draw, count the number of times the result appears in another matrix, and then save that number in my final output. But I don’t think that is relevant to my question.)
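library(foreach)
library(doParallel)  # also attaches 'parallel', which provides makeCluster()

fn <- function(sample_size, J) {
  # Preallocate output matrix
  output <- matrix(NA, nrow = sample_size, ncol = J)
  foreach(i = 1:sample_size, .combine = "rbind") %dopar% {
    # Each worker modifies its own copy of `output`...
    output[i, ] <- runif(J)
    # ...and returns the whole sample_size x J matrix, not just row i
    return(output)
  }
}

# Execute function in parallel
system.cl <- makeCluster(4)
registerDoParallel(system.cl)
fn(sample_size = 100, J = 3)
stopCluster(system.cl)
stopImplicitCluster()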
Is the pre-allocation helping foreach in this case? My concern is that the output matrix is created once as a large matrix and then overwritten (rather than filled in) by foreach, so the pre-allocation would only waste time and memory.
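For comparison, here is a minimal sketch that drops the pre-allocation entirely. Each iteration returns only its own length-J row, and `.combine = "rbind"` stacks those rows into the final matrix. The name `fn_rows` is hypothetical, and the cluster size of 2 is an arbitrary choice for the example. By contrast, because `fn` above returns the whole `output` matrix from every iteration, its result for `sample_size = 100, J = 3` is a 10000 x 3 matrix rather than 100 x 3.

```r
library(foreach)
library(doParallel)  # also attaches 'parallel', which provides makeCluster()

# Hypothetical variant of fn(): no preallocated matrix; each iteration
# returns a length-J vector and foreach rbinds the results together
fn_rows <- function(sample_size, J) {
  foreach(i = seq_len(sample_size), .combine = "rbind") %dopar% {
    runif(J)
  }
}

cl <- makeCluster(2)
registerDoParallel(cl)
res <- fn_rows(sample_size = 100, J = 3)
stopCluster(cl)

dim(res)  # 100 rows, 3 columns: one row per iteration
```

Here nothing is shipped to the workers except `J`, and each worker sends back only J numbers per iteration.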