BPPARAM has no effect in perf()

Hi,

I’m having issues using parallelisation (with BiocParallel) in some of the mixOmics functions.
It might be a user mistake or a bug, I’m not sure, but the ‘BPPARAM’ argument doesn’t seem to have any effect on running time, at least in the perf() function.

Here is a fully reproducible example:

library(mixOmics)
library(dplyr)
library(BiocParallel)
library(microbenchmark) # needed for microbenchmark() below

## -------------------------------------------------------------------------------------------------------------------
data(breast.TCGA) # load in the data
data = list(miRNA = breast.TCGA$data.train$mirna, # set a list of all the X dataframes
            mRNA = breast.TCGA$data.train$mrna,
            proteomics = breast.TCGA$data.train$protein)

Y = breast.TCGA$data.train$subtype # set the response variable as the Y dataframe

## -------------------------------------------------------------------------------------------------------------------
design = matrix(0.1, ncol = length(data), 
                nrow = length(data), # for square matrix filled with 0.1s
                dimnames = list(names(data), names(data)))
diag(design) = 0 # set diagonal to 0s

basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design) # form basic DIABLO

## -------------------------------------------------------------------------------------------------------------------
# Benchmark
n_rep = 1
res <- list(
  "MulticoreParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 10)),
    times = n_rep),
  "MulticoreParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 5)),
    times = n_rep),
  "MulticoreParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 2)),
    times = n_rep),
  "SnowParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 10)),
    times = n_rep),
  "SnowParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 5)),
    times = n_rep),
  "SnowParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 2)),
    times = n_rep),
  "SerialParam(1)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = SerialParam()),
    times = n_rep))
bind_rows(res)

The table below shows the results.

Unit: seconds
expr                                        min       lq     mean   median       uq
BPPARAM = MulticoreParam(workers = 10) 25.17865 25.17865 25.17865 25.17865 25.17865
BPPARAM = MulticoreParam(workers = 5)  25.37876 25.37876 25.37876 25.37876 25.37876
BPPARAM = MulticoreParam(workers = 2)  25.19722 25.19722 25.19722 25.19722 25.19722
BPPARAM = SnowParam(workers = 10)      25.45244 25.45244 25.45244 25.45244 25.45244
BPPARAM = SnowParam(workers = 5)       25.81489 25.81489 25.81489 25.81489 25.81489
BPPARAM = SnowParam(workers = 2)       25.91184 25.91184 25.91184 25.91184 25.91184
BPPARAM = SerialParam()                25.55273 25.55273 25.55273 25.55273 25.55273

Regardless of the number of workers (10, 5, 2, or serial), the running time is essentially the same (about 25 seconds). MulticoreParam and SnowParam give similar results.
This was tested on a Mac (table above) and on a Linux cluster (results not shown here, but they were similar).
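
One quick sanity check before blaming perf() is to confirm how many workers each backend object is actually configured with, and how many cores R can see. This is only a sketch and not part of the benchmark above; the names params and n_workers are made up for illustration, and SnowParam is used because it also runs on platforms where forking is not available.

```r
library(BiocParallel)

# Cores that R can see on this machine (may be NA on some platforms)
parallel::detectCores()

# Number of workers each backend object is configured with
params <- list(
  serial = SerialParam(),
  snow2  = SnowParam(workers = 2)
)
n_workers <- sapply(params, bpnworkers)
n_workers
```

If the worker counts look right here, the backend objects themselves are fine and the question is whether perf() actually uses them.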

The problem doesn’t seem to come from BiocParallel itself:

# Test on a simple function
FUN <- function(x) { round(sqrt(x), 4) }
n_rep = 10
resb <- list(
  "MulticoreParam(10)" = microbenchmark(
    BiocParallel::bplapply(1:10, FUN,
                           BPPARAM = MulticoreParam(workers = 10)),
    times = n_rep),
  "MulticoreParam(5)" = microbenchmark(
    BiocParallel::bplapply(1:10, FUN,
                           BPPARAM = MulticoreParam(workers = 5)),
    times = n_rep),
  "MulticoreParam(2)" = microbenchmark(
    BiocParallel::bplapply(1:10, FUN,
                           BPPARAM = MulticoreParam(workers = 2)),
    times = n_rep),
  "SerialParam(1)" = microbenchmark(
    BiocParallel::bplapply(1:10, FUN,
                           BPPARAM = SerialParam()),
    times = n_rep))
bind_rows(resb)

Unit: milliseconds
expr                                min         lq      mean     median         uq
MulticoreParam(workers = 10) 109.917966 112.847457 117.48494 117.635478 121.685909
MulticoreParam(workers = 5)  105.232978 108.726998 111.34625 110.138341 111.077569
MulticoreParam(workers = 2)  184.162119 184.523493 186.24020 185.903594 187.473689
SerialParam()                  2.200429   2.234254   2.32217   2.266336   2.274926

→ The running time changes with the backend and the number of workers, so BiocParallel seems to work as expected with a regular R function.
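
The toy function is so cheap that the serial run is the fastest one in the table above, presumably because the cost of starting workers dominates. A slower task may make the comparison clearer. Below is a minimal sketch under that assumption: slow_task and run_backend are hypothetical helpers, the Sys.sleep() call only stands in for real work, and SnowParam is used so the same code also runs where forking is not available.

```r
library(BiocParallel)

# Hypothetical task: the sleep stands in for real work taking ~0.25 s per call
slow_task <- function(x) {
  Sys.sleep(0.25)
  sqrt(x)
}

# Run the same 8 tasks on a given backend; return the results and elapsed seconds
run_backend <- function(param) {
  elapsed <- system.time(
    out <- bplapply(1:8, slow_task, BPPARAM = param)
  )[["elapsed"]]
  list(out = out, elapsed = elapsed)
}

serial <- run_backend(SerialParam())
snow4  <- run_backend(SnowParam(workers = 4))

# Elapsed time in seconds for each backend
c(serial = serial$elapsed, snow4 = snow4$elapsed)
```

With a task like this, the serial run should take at least the sum of the sleeps, while the parallel run pays a start-up cost and then shares the work between workers. How large the gap is will depend on the machine.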

Would you have any idea why BPPARAM has no effect in the perf() function?

Thank you!

The problem has been reported on the package’s repository (#292)

I thought I could edit/remove this thread, but I was wrong… apologies for the multiple posts.

All good @mvhr,

We don’t have a maintainer at this stage. Would it be helpful to prefilter your data the way we have discussed? And by all means, feel free to change the code if you can fix it :)

Kim-Anh