Choosing keepX & warning messages in DIABLO

Hi all! Been working with and learning DIABLO for the past few months and I think it is a great tool.

I had a couple of questions about tuning the DIABLO model that I wanted to drop in and ask for help with. Before I begin, I want to say that I have a small n (sample size) and I work with a non-model organism, so this data can be particularly noisy.

I am using tune.block.splsda() to tune my model. I have three 'omics data layers that I'm working with (2 expression matrices and 1 ASV count matrix). My first question: how do you best determine the range of keepX values to test for each data block? Right now, I have

keepX <- list(
  ex_1  = c(5, 10, 15, 20, 25, 30),
  ex_2  = c(5, 10, 15, 20, 25, 30),
  asv_1 = c(5, 10, 15, 20, 25, 30))

With ex_ being the 2 different expression matrices and asv_1 being the ASV matrix. To me, this feels like a good range of values, as DIABLO is a biomarker tool at its core, and I really want it to home in on biomarkers (in my field, for a disease we have no causative agent for). However, I see others in this forum using up to 70-100. What is your opinion on this, and what would you suggest if you were working with this type of data?
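For reference, a wider grid I was considering testing alongside my current one would look something like this (same block names as above; the exact values are arbitrary and just sketch the idea of a coarser grid at the high end):

```r
library(mixOmics)

# Coarser steps above 30 to cover the 50-100 range others use,
# without exploding the number of keepX combinations to evaluate
keepX_wide <- list(
  ex_1  = c(seq(5, 30, by = 5), 50, 75, 100),
  ex_2  = c(seq(5, 30, by = 5), 50, 75, 100),
  asv_1 = c(seq(5, 30, by = 5), 50, 75, 100))
```
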

My other question has to do with the SGCCA convergence. After running a global performance model, I found that I should be using:

  1. max.dist
  2. BER
  3. and one component
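For context, the global performance run that pointed me to these three choices looked roughly like this (ncomp = 5 here is just the upper bound I scanned; basic.model and the nrepeat value are my own choices, not anything prescribed):

```r
library(mixOmics)

# Fit a non-sparse global DIABLO model first, then assess it with repeated CV
basic.model <- block.splsda(X = X, Y = Y, ncomp = 5, design = design)
perf.res    <- perf(basic.model, validation = 'Mfold', folds = 4, nrepeat = 10)

plot(perf.res)         # error rates per distance metric and measure
perf.res$choice.ncomp  # suggested number of components per distance/measure
```
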

So my tuning code looks like this:

# Tune one latent component (ncomp = 1) using 4-fold CV (smallest class = 4)
tune.res1 <- tune.block.splsda(
  X = X,                         # list of omics datasets
  Y = Y,                         # response factor (healthy vs diseased samples)
  ncomp = 1,                     # maximum number of latent components to test (here: 1)
  test.keepX = keepX,            # list of possible keepX values to try for each block
  design = design,               # design matrix defining relationships between blocks
  validation = 'Mfold',          # type of cross-validation for tuning (M-fold CV)
  folds = 4,                     # number of folds in the M-fold CV
  nrepeat = 100,                 # number of times to repeat cross-validation for stability
  dist = 'max.dist',             # distance metric used for classification
  measure = "BER",               # performance measure to optimize (balanced error rate)
  near.zero.var = TRUE,          # deal with near-zero variance in microbiome layers
  progressBar = TRUE             # show a progress bar while tuning
)

with the design set to 0.8 between blocks. When I run this, I run into this warning numerous times:

Warning: The SGCCA algorithm did not converge

My question is: how serious is this during the tuning stage? I thought I read somewhere, when first starting out, that during tuning it's not a huge deal, since the model is trying many keepX combinations anyway and is bound to run into convergence issues for some of them. I mostly want to make sure that wasn't false information. From what I am seeing in this forum, however, it may be an issue.
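In case it matters, one thing I was planning to try is fitting the final model with the tuned keepX while raising max.iter above its default, to see whether the warning persists once keepX is fixed. A sketch (the max.iter and tol values here are my own guesses, not recommendations):

```r
library(mixOmics)

# Final model using the keepX selected by tuning; max.iter raised from the
# default (100) to give the SGCCA algorithm more iterations to converge
final.model <- block.splsda(
  X = X, Y = Y,
  ncomp = 1,
  keepX = tune.res1$choice.keepX,  # optimal keepX from the tuning object
  design = design,
  max.iter = 500,                  # default is 100
  tol = 1e-06                      # default convergence tolerance
)
```
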

Any help you can offer me is greatly appreciated!