Choosing keepX & warning messages in DIABLO

Hi all! Been working with and learning DIABLO for the past few months and I think it is a great tool.

I had a couple of questions about tuning the DIABLO model that I wanted to drop in and ask for help with. Before I begin, I want to say that I have a small n (sample size) and I work with a non-model organism, so this data can be particularly noisy.

I am using tune.block.splsda() to tune my model. I have three 'omics data layers that I'm working with (2 expression matrices and 1 ASV count matrix). My first question: how do you best determine the range of keepX values to test for each data block? Right now, I have

keepX <- list(
  ex_1  = c(5, 10, 15, 20, 25, 30),
  ex_2  = c(5, 10, 15, 20, 25, 30),
  asv_1 = c(5, 10, 15, 20, 25, 30))

With ex_ being the 2 different expression matrices and asv_1 being the ASV matrix. To me, this feels like a good range of values, as DIABLO is a biomarker tool at its core, and I really want it to home in on biomarkers (in my field, for a disease we have no causative agent for). However, I see others in this forum using up to 70-100. What is your opinion on this, and what would you suggest if you were working with this type of data?
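For reference, a wider grid I was considering testing alongside my current one would look something like this (same block names as above; the exact values are arbitrary and just sketch the idea of a coarser grid at the high end):

```r
library(mixOmics)

# Coarser steps above 30 to cover the 50-100 range others use,
# without exploding the number of keepX combinations to evaluate
keepX_wide <- list(
  ex_1  = c(seq(5, 30, by = 5), 50, 75, 100),
  ex_2  = c(seq(5, 30, by = 5), 50, 75, 100),
  asv_1 = c(seq(5, 30, by = 5), 50, 75, 100))
```
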

My other question has to do with the SGCCA convergence. After running a global performance model, I found that I should be using:

  1. max.dist
  2. BER
  3. and one component
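For context, the global performance run that pointed me to these three choices looked roughly like this (ncomp = 5 here is just the upper bound I scanned; basic.model and the nrepeat value are my own choices, not anything prescribed):

```r
library(mixOmics)

# Fit a non-sparse global DIABLO model first, then assess it with repeated CV
basic.model <- block.splsda(X = X, Y = Y, ncomp = 5, design = design)
perf.res    <- perf(basic.model, validation = 'Mfold', folds = 4, nrepeat = 10)

plot(perf.res)         # error rates per distance metric and measure
perf.res$choice.ncomp  # suggested number of components per distance/measure
```
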

So my tuning code looks like this:

# Tune one latent component (ncomp = 1) using 4-fold CV (smallest class = 4)
tune.res1 <- tune.block.splsda(
  X = X,                         # list of omics datasets
  Y = Y,                         # response factor (healthy vs diseased samples)
  ncomp = 1,                     # maximum number of latent components to test (here: 1)
  test.keepX = keepX,            # list of possible keepX values to try for each block
  design = design,               # design matrix defining relationships between blocks
  validation = 'Mfold',          # type of cross-validation for tuning (M-fold CV)
  folds = 4,                     # number of folds in the M-fold CV
  nrepeat = 100,                 # number of times to repeat cross-validation for stability
  dist = 'max.dist',             # distance metric used for classification
  measure = "BER",               # performance measure to optimize (balanced error rate)
  near.zero.var = TRUE,          # deal with near-zero variance in microbiome layers
  progressBar = TRUE             # show a progress bar while tuning
)

with the design set to 0.8 between blocks. When I run this, I run into this warning numerous times:

Warning: The SGCCA algorithm did not converge

My question is: how serious is this during the tuning stage? I thought I read somewhere, when first starting out, that during tuning it's not a huge deal, since the model is trying many keepX combinations anyway and is bound to run into convergence issues for some of them. I mostly want to make sure that wasn't false information. From what I am seeing in this forum, however, it may be an issue.
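In case it matters, one thing I was planning to try is fitting the final model with the tuned keepX while raising max.iter above its default, to see whether the warning persists once keepX is fixed. A sketch (the max.iter and tol values here are my own guesses, not recommendations):

```r
library(mixOmics)

# Final model using the keepX selected by tuning; max.iter raised from the
# default (100) to give the SGCCA algorithm more iterations to converge
final.model <- block.splsda(
  X = X, Y = Y,
  ncomp = 1,
  keepX = tune.res1$choice.keepX,  # optimal keepX from the tuning object
  design = design,
  max.iter = 500,                  # default is 100
  tol = 1e-06                      # default convergence tolerance
)
```
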

Any help you can offer me is greatly appreciated!