tune.block.splsda() error

Hi,

I am analyzing three datasets together and I would like to tune the number of variables I need to include in the block splsda model. However, when I try to tune with tune.block.splsda() I get this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘bplapply’ for signature ‘"integer", "numeric"’

This is what I am trying to run:

test.keepX <- list(
  cbc = seq(5, 13, 1),
  luminex = seq(5, 13, 1),
  mlpa = seq(5, 13, 1)
) # e.g. the min dim of Y

tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
                              test.keepX = test.keepX, design = design,
                              validation = 'Mfold', folds = 10, nrepeat = 1,
                              BPPARAM = 4, dist = "centroids.dist")

I am just learning how to use DIABLO and I have no idea what the problem could be.
If you need more information, please let me know.
Any thoughts?

Thank you in advance!!!
Ana

Morning @annaol,

The problem here is BPPARAM = 4. I’d suggest reading up a bit on BiocParallel.

The BPPARAM parameter is not equivalent to the deprecated cpus parameter. It needs to be passed a BiocParallel object, not an integer. Hence, what you’ll need is BPPARAM = MulticoreParam(workers = 4) (if you’re on Unix/Mac) or BPPARAM = SnowParam(workers = 4) (if on Windows). Make sure you load the package first via library(BiocParallel).
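For reference, here is a minimal sketch of that setup, assuming BiocParallel is installed. The worker count of 4 mirrors the original call; the names bp and squares and the toy bplapply() check are only illustrative. The final call is left commented out because it relies on the X, Y, ncomp, test.keepX and design objects from the question.

```r
library(BiocParallel)

# Pick a back-end that matches the operating system:
# forked workers on Unix/Mac, socket workers on Windows.
bp <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = 4)
} else {
  MulticoreParam(workers = 4)
}

# Quick sanity check with bplapply(), the generic named in the error message.
# It should return the squares of 1 to 4.
squares <- bplapply(1:4, function(i) i^2, BPPARAM = bp)
print(unlist(squares))

# Then pass the object (not the integer 4) to the tuning call:
# tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
#                               test.keepX = test.keepX, design = design,
#                               validation = 'Mfold', folds = 10, nrepeat = 1,
#                               BPPARAM = bp, dist = "centroids.dist")
```

If the sanity check runs without the "unable to find an inherited method" error, the same bp object should be accepted by tune.block.splsda().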

This should resolve your issue. All the best

Cheers,
Max.

You are absolutely right! Thank you very much.

I started out with the cpus parameter (I am following the DIABLO Case Study with Breast TCGA) and got a warning (or an error, I do not remember now) telling me to replace it with BPPARAM. I thought it was a simple replacement. Once I applied your solution, it worked!

One more thing, in the first successful execution I got a message:

Warning message:
The SGCCA algorithm did not converge

But then I ran it again and no warning was returned. What just happened?
I must add a disclaimer that I have not read about the SGCCA algorithm yet, so forgive me if this is a silly question.

Again, tyvm!

Best,
Ana

Are you following the Case Study on mixOmics.org or elsewhere?

Mixing up cpus and BPPARAM is a very understandable mistake. I’ll add an extra check that notifies users that cpus is deprecated, so it’s a little clearer.

To address the warning: the DIABLO framework relies on sGCCA (Sparse Generalised Canonical Correlation Analysis). This algorithm searches for the optimal model iteratively. Sometimes, if the training set contains an unfavourable set of samples, sGCCA will not reach the optimum within the given number of iterations. In tune.block.splsda(), each repeat (set by nrepeat) uses randomly selected folds to split the data into training and test sets, so a particular split may cause this “nonconvergence”. If you see this, you can:

  1. Increase the max.iter parameter (defaults to 100). This gives the algorithm “more attempts” to converge on a more desirable model.
  2. Run it again, as it may converge on another attempt due to the random training/testing sets.
  3. Use set.seed(). This controls the selection process, so you get the exact same train/test sets on every run and the result (warning or not) is reproducible.
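These options can be combined in a single call. Below is a small sketch, assuming the breast.TCGA example data that ships with mixOmics rather than your own three blocks. The keepX grid, the seed value, max.iter = 200 and the 0.1 design weights are arbitrary illustrative choices, and folds and nrepeat are kept small only so that it runs quickly. Depending on your mixOmics and BiocParallel versions, set.seed() on its own may not be enough for reproducibility, particularly with parallel workers, so treat that part as an assumption.

```r
library(mixOmics)

# Example data shipped with mixOmics (an assumption: swap in your own blocks).
data(breast.TCGA)
X <- list(mirna   = breast.TCGA$data.train$mirna,
          mrna    = breast.TCGA$data.train$mrna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Illustrative design matrix: weak links (0.1) between blocks, 0 on the diagonal.
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# Deliberately tiny grid so the sketch finishes quickly.
test.keepX <- list(mirna = c(5, 10), mrna = c(5, 10), protein = c(5, 10))

# Option 3: fix the seed so the random folds are intended to be the same each run.
set.seed(123)

# Option 1: raise max.iter above its default of 100.
tune.res <- tune.block.splsda(X, Y, ncomp = 2,
                              test.keepX = test.keepX, design = design,
                              validation = "Mfold", folds = 5, nrepeat = 2,
                              max.iter = 200,
                              dist = "centroids.dist")

# Number of variables selected per block and component.
tune.res$choice.keepX
```

If the warning persists with a larger max.iter, changing the seed (option 2 in effect) gives different train/test splits to try.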

Also, a key point to note is you’ll definitely want to increase your nrepeat. Using only one repeat for tuning is not a reliable way to optimise your model. 5 or 10 is appropriate when just playing around but for generating more concrete conclusions, nrepeat = 100 is recommended.

Hope this all helped!

Cheers,
Max.

Both on mixomics.org and in the mixOmics vignette, but the tuning part was on mixomics.org. I also got the book! Although I intend to read it, for now I needed a quicker way to start.

Thank you so much for your patience in explaining. It really helped me a lot.

Best,
Ana
