The SGCCA algorithm did not converge and length of variable selection

Hi everyone,
I have a quick question about DIABLO. When running
tune.BBMncomp2 = tune.block.splsda(X = X, Y = Y, ncomp = 3, test.keepX = test.keepX, design = design, validation = "loo", dist = "mahalanobis.dist", BPPARAM = BPPARAM)
I keep getting the warning:
The SGCCA algorithm did not converge
When I plot the error rate by:
MyResult.diablo2 <- block.splsda(X, Y, ncomp = 6, keepX = list.keepX, design = design)
perf.diablo = perf(MyResult.diablo2, validation = "loo", BPPARAM = BPPARAM)
plot(perf.diablo, col = color.mixo(5:7), sd = TRUE, legend.position = "horizontal")
the error rate at ncomp = 1 is 0 and at ncomp=2 it is 0.05

  1. So according to the error rates, the model is pretty good, right? But why do I get this warning? Can I trust the model?
  2. How do I deal with ncomp = 1 being the best ncomp? For plotIndiv I need to plot ncomp = 2. That is just a simple visualisation, so that is fine. For plotDIABLO I can choose comp = 1. Great. But how about cim and circosPlot? I have not been able to plot just one comp. Also, the number of X to keep is pretty low. For the 3 datasets I include, it boils down to 10 variables in comp 1 and an additional 5 in comp 2. Is it advisable at all to plot circosPlot or networks with only the 10 variables of comp 1, or should I keep the other 5 from comp 2 as well, although the error rate increases (but is still pretty low overall)?

Thank you so much for your input. I really like this tool a lot!
Cheers,
Stef

hi @stepra,

This is a warning message that may appear when the number of components is large and there is not much information left to extract from, say, comp 3. Your performance results seem to suggest that you only need one component.

You can add a second component for graphical reasons, but you should not report any numerical or variable selection results from comp 2 if comp 2 adds noise to the model, as seems to be the case here.
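A minimal sketch of that approach, assuming the breast.TCGA example data shipped with mixOmics as a stand-in for your own blocks. The design value of 0.1 and the keepX values (10 on comp 1, 5 on comp 2) are illustrative assumptions, not recommendations. The model is fitted with two components so that plotIndiv has two axes, while the variable selection is read from comp 1 only:

```r
library(mixOmics)

# Stand-in data (assumption): replace with your own named list of blocks and outcome
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          miRNA   = breast.TCGA$data.train$mirna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Illustrative design matrix: weak link (0.1) between all pairs of blocks
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# Illustrative keepX: 10 variables on comp 1 and 5 on comp 2 in every block
list.keepX <- lapply(X, function(block) c(10, 5))

# Fit two components so that a 2D sample plot is possible
fit <- block.splsda(X, Y, ncomp = 2, keepX = list.keepX, design = design)

# Visualisation only: comp 2 serves as the second axis
plotIndiv(fit, comp = c(1, 2), ind.names = FALSE, legend = TRUE)

# Report the selected variables from comp 1 only
selected.comp1 <- lapply(names(X), function(b) {
  selectVar(fit, block = b, comp = 1)[[b]]$name
})
names(selected.comp1) <- names(X)
selected.comp1
```

Anything you report downstream (variable lists, loadings) would then come from `selected.comp1` rather than from comp 2.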

You are free to choose a list.keepX that starts a bit above 10 if you are not satisfied with the number of selected variables. It may not be numerically optimal, but I suspect you have a small number of samples (since you used LOO CV), so even keepX = 10 is probably not strictly optimal. Based on this tuning, you can also decide on a larger selection size (e.g. keepX = 20) and then assess the overall performance a posteriori with the perf() function.
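As a rough sketch of that a posteriori check, again assuming the breast.TCGA example data from mixOmics in place of your own: keepX is fixed at 20 per block and the error rate is then estimated with perf(). The design value and the use of 5-fold CV here are assumptions chosen to keep the example quick; with a small number of samples you would keep validation = "loo" as in the original call.

```r
library(mixOmics)

# Stand-in data (assumption): replace with your own named list of blocks and outcome
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          miRNA   = breast.TCGA$data.train$mirna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Illustrative design matrix: weak link (0.1) between all pairs of blocks
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# Larger selection size chosen by hand: 20 variables per block on each component
list.keepX <- lapply(X, function(block) c(20, 20))

fit20 <- block.splsda(X, Y, ncomp = 2, keepX = list.keepX, design = design)

# Assess performance after the fact; 5-fold CV is used here only for speed
set.seed(123)
perf.fit20 <- perf(fit20, validation = "Mfold", folds = 5, nrepeat = 1)

# Error rates per component and prediction distance (weighted vote across blocks)
perf.fit20$WeightedVote.error.rate
```

Comparing these error rates with those obtained for the tuned keepX gives a sense of how much performance, if any, is traded for the longer variable list.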

Kim-Anh

Hello, I have encountered a similar problem and would like to ask for advice. I want to extract latent variables with block.splsda() for future research. My outcome variable has 2 categories, and I have 3 types of omics data. From ncomp = 12 onwards, SGCCA no longer converges, but my results improve as ncomp increases. For example, the result with ncomp = 15 is better than with ncomp = 12. So can I ignore the warning "The SGCCA algorithm did not converge"?
Thank you.

hi @ada

You can ignore the message, but it means that the results from component 12 might be a bit unstable. I would not consider that many components in the analysis: beyond the first few, the additional components no longer reflect the 'most important' information.

Kim-Anh
