I am analyzing three datasets together and would like to tune the number of variables to include in the block.splsda model. However, when I try to tune with tune.block.splsda() I get this error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘bplapply’ for signature ‘"integer", "numeric"’
This is what I am trying to run:
test.keepX <- list(
cbc = seq(5, 13, 1),
luminex = seq(5, 13, 1),
mlpa = seq(5, 13, 1)
) # e.g. the min dim of Y
tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
test.keepX = test.keepX, design = design,
validation = 'Mfold', folds = 10, nrepeat = 1,
BPPARAM = 4, dist = "centroids.dist")
I am just learning how to use DIABLO and I have no idea what the problem could be.
If you need more information, please let me know.
Thank you in advance!!!
The problem here is
BPPARAM = 4. I’d suggest reading up a bit on BiocParallel.
The BPPARAM parameter is not equivalent to the deprecated
cpus parameter. It needs to be passed a
BiocParallelParam object, not an integer. Hence, what you'll need is
BPPARAM = MulticoreParam(workers = 4) (if you’re on Unix/Mac) or
BPPARAM = SnowParam(workers = 4) (if on Windows). Ensure that you load the package first via library(BiocParallel).
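Putting that together, a minimal sketch of the corrected call might look like this (the objects X, Y, ncomp, design and test.keepX are assumed to be defined as in your script above; the worker count of 4 is just an example):

```r
library(BiocParallel)

# Pick a backend appropriate for the OS: forked processes on Unix/Mac,
# socket clusters on Windows
bp <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = 4)
} else {
  MulticoreParam(workers = 4)
}

# Same call as before, but BPPARAM now receives a BiocParallelParam object
tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
                                    test.keepX = test.keepX, design = design,
                                    validation = 'Mfold', folds = 10, nrepeat = 1,
                                    BPPARAM = bp, dist = "centroids.dist")
```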
This should resolve your issue. All the best!
You are absolutely right! Thank you very much.
I was using the cpus parameter before (I am following the DIABLO Case Study with the breast TCGA data) and received a warning (or error, I don't remember now) telling me to replace it with BPPARAM. I thought it was a simple drop-in replacement. Once I applied your solution, it worked!
One more thing, in the first successful execution I got a message:
The SGCCA algorithm did not converge
But then I executed it again and no warning was returned. What happened?
I must add a disclaimer that I have not read about this SGCCA algorithm yet, so forgive me if it is a silly question.
Are you following the Case Study on mixOmics.org or elsewhere?
Very understandable mistake to make, mixing up
BPPARAM. I'll add an extra check that notifies users that
cpus is deprecated so it's a little clearer.
To address the warning: the DIABLO framework relies on sGCCA (Sparse Generalised Canonical Correlation Analysis). This algorithm attempts to find the optimal model iteratively. Sometimes, if the training set contains an unfavourable set of samples, sGCCA will not find the optimum within the given number of iterations. In addition, each
nrepeat iteration uses randomly selected
folds to split the train/test data, hence it may cause this "non-convergence" on some runs but not others. If you see this, you can:
- Increase the
max.iter parameter (defaults to 100). This gives the algorithm "more attempts" to converge on a more desirable model.
- Run it again, as it may converge on another attempt due to the random training/testing sets.
- Use
set.seed() for reproducibility. This results in the exact same train/test sets, as it controls the random selection process.
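As a sketch, the first and third remedies would look something like this (again assuming X, Y, ncomp, design, test.keepX and BPPARAM are set up as above; the seed value and max.iter value are arbitrary examples):

```r
# Fix the random fold assignment so repeated runs use the same train/test split
set.seed(123)

tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
                                    test.keepX = test.keepX, design = design,
                                    validation = 'Mfold', folds = 10, nrepeat = 1,
                                    max.iter = 500,  # give sGCCA more iterations to converge (default 100)
                                    BPPARAM = BiocParallel::MulticoreParam(workers = 4),
                                    dist = "centroids.dist")
```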
Also, a key point to note: you'll definitely want to increase your
nrepeat. Using only one repeat for tuning is not a reliable way to optimise your model. 5 or 10 is appropriate when just playing around, but for drawing more concrete conclusions,
nrepeat = 100 is recommended.
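So a final tuning run might look like the sketch below (expect it to take considerably longer with 100 repeats; object names are assumed from the original post):

```r
tune.nfeatures <- tune.block.splsda(X, Y, ncomp = ncomp,
                                    test.keepX = test.keepX, design = design,
                                    validation = 'Mfold', folds = 10, nrepeat = 100,
                                    BPPARAM = BiocParallel::MulticoreParam(workers = 4),
                                    dist = "centroids.dist")

# The selected number of variables per block and per component
tune.nfeatures$choice.keepX
```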
Hope this all helped!
Both on mixomics.org and the mixOmics vignette, but the tuning part was on mixomics.org. I also got the book! Although I intend to read it, for now I needed a quicker way to start.
Thank you so much for taking the time to explain. It helped me a lot.