Running tune.block.splsda across multiple computers

Hello,
Thank you so much for this amazing package – it’s been incredibly helpful!

I need to run DIABLO on a large dataset and have access to multiple computers. I was wondering if it’s possible to run tune.block.splsda on two or more computers with exactly the same data and configuration (to increase the number of cores available for parallelization), then combine the outputs and interpret the results as if I had run a higher number of replicates in a single command, reducing the overall runtime.

Thank you very much in advance!

Hi @Hector,

I would advise against running tune() across multiple computers: it would be quite a lot of work to set up and to combine the outputs, and it may also introduce errors into your analysis.

As you have a DIABLO model, you want to tune both the number of components (ncomp) and number of variables to keep (test.keepX). If you are worried about this tuning taking too long because of the size of your dataset, there are a few things you can do to speed up the process:

  1. As you already mentioned, you can run tune.block.splsda() in parallel on multiple cores by setting the BPPARAM argument (see the sketch after this list).

  2. Think about how you choose the number of variables to test: the more test.keepX values you pass, the longer the computation time. We recommend starting with a coarse grid and then refining it, e.g. first test c(10, 100, 1000), and if you find performance is best around 10, re-run with c(5, 10, 15). This page has more details (see the 'How to select test.keepX and test.keepY' section).

  3. You can split up the tuning of components and variables: first run the function with test.keepX = NULL to identify the optimal ncomp, then run it again with test.keepX set to your grid of variables to test and ncomp set to the chosen number of components. This avoids tuning the variables over an unnecessarily large number of components. Note that this functionality is currently only available in the development version of mixOmics, see this page. A sketch of both steps is included below.
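
To make points 1 and 2 concrete, here is a minimal sketch using the breast.TCGA example data that ships with mixOmics. The worker count, fold/repeat settings, and the test.keepX grids are placeholder values to adapt to your own data, and the design matrix is just a common starting point:

```r
library(mixOmics)
library(BiocParallel)

# Example multi-omics data shipped with mixOmics
data(breast.TCGA)
X <- list(mrna    = breast.TCGA$data.train$mrna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Design matrix linking the blocks (0.1 off-diagonal is a common default)
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# (1) Parallelise across cores via BPPARAM (use SnowParam() on Windows)
bp <- MulticoreParam(workers = 4)

# (2) Coarse test.keepX grid: one vector per block, refined in a second pass
test.keepX <- list(mrna    = c(5, 25, 100),
                   protein = c(5, 25, 100))

tune.res <- tune.block.splsda(X, Y, ncomp = 2,
                              test.keepX = test.keepX,
                              design     = design,
                              validation = "Mfold", folds = 5,
                              nrepeat    = 10,
                              BPPARAM    = bp)
tune.res$choice.keepX  # selected number of variables per block and component
```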
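And a sketch of the two-stage approach from point 3. This relies on the development-version behaviour described above, so treat the test.keepX = NULL call and the choice.ncomp slot as assumptions to check against the dev documentation:

```r
# Stage 1: tune the number of components only (dev version of mixOmics);
# with test.keepX = NULL, performance is assessed per component
tune.ncomp <- tune.block.splsda(X, Y, ncomp = 5,
                                test.keepX = NULL,
                                design     = design,
                                validation = "Mfold", folds = 5,
                                nrepeat    = 10,
                                BPPARAM    = bp)
ncomp.opt <- tune.ncomp$choice.ncomp  # assumed slot name, check the dev docs

# Stage 2: tune keepX only over the chosen number of components
tune.keepX <- tune.block.splsda(X, Y, ncomp = ncomp.opt,
                                test.keepX = test.keepX,
                                design     = design,
                                validation = "Mfold", folds = 5,
                                nrepeat    = 10,
                                BPPARAM    = bp)
tune.keepX$choice.keepX
```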

Hope that helps!
Cheers,
Eva

Hi @evahamrud
Okay, I understand. Thank you for the alternatives; I will try it that way.

Thank you!
Hector