Hi, I am using DIABLO to determine whether sets of predictors from two data sets predict a binary outcome variable. Based on an exploratory sPLS regression analysis, I set the correlation in the design matrix between my two data sets to 0.22. The DIABLO solution has 3 components. Prediction on an external test set yields a BER of 8%.

The correlations between the first, second and third components from each data set are 0.22, 0.01 and 0.18, respectively.
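For readers who want to reproduce this kind of check, here is a minimal sketch of how per-component correlations between two data sets can be computed once the component scores (variates) have been exported. It is plain Python, not mixOmics code, and the function names and the score values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def component_correlations(scores_a, scores_b):
    """Correlate component k of data set A with component k of data set B.

    Each argument is a list of columns: scores_a[k] holds the sample
    scores on component k for that data set (hypothetical layout).
    """
    return [pearson(a, b) for a, b in zip(scores_a, scores_b)]


# Made-up scores for 5 samples and 2 components, for illustration only.
block_a = [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, -1.0, 0.0, 1.0, -1.0]]
block_b = [[2.0, 4.0, 6.0, 8.0, 10.0], [-1.0, 1.0, 0.0, 1.0, -1.0]]
result = component_correlations(block_a, block_b)
print(result)
```

In this toy input the first pair of components moves together and the second pair does not, which is the pattern being asked about.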

How would one interpret the lack of correlation between the second components from each data set?

Thank you,

Jen

Any thoughts on this question?

Hi @jlabus,

Copy pasting the answer we discussed offline:

It means DIABLO is unable to extract correlation (and correlated components) between the data sets. You can run PLS on the data sets pairwise to confirm this (PLS is unsupervised and does not take the sample groups into account).

Since DIABLO (and even PLS) aim to maximise correlation, a correlation below 0.2 to 0.4 (even if it increases on the third component) is very low.

Your results suggest that on the second component, discrimination between sample groups goes ‘against’ correlation. But the discrimination is not good either!

Your BER of 8% on an external data set is interesting. Have you tried predicting based on each data set individually? I wonder if your data sets are weakly correlated, but each still carries some signal for discrimination.
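As a rough sketch of that comparison, the snippet below computes the balanced error rate (the mean of the per-class error rates) for predictions obtained from each data set on its own. It is plain Python rather than mixOmics code; the labels and predictions are invented, and in practice the predictions would come from whatever single-data-set models you fit.

```python
def balanced_error_rate(y_true, y_pred):
    """Balanced error rate: the average of the per-class error rates."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    rates = []
    for cls in sorted(set(y_true)):
        # Indices of the test samples that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        wrong = sum(1 for i in idx if y_pred[i] != cls)
        rates.append(wrong / len(idx))
    return sum(rates) / len(rates)


# Invented test-set labels: 4 cases and 6 controls.
y_true = ["case"] * 4 + ["control"] * 6

# Invented predictions from two hypothetical single-data-set models.
pred_from_a = ["case", "case", "case", "control"] + ["control"] * 6
pred_from_b = ["case", "case", "control", "control"] + ["control"] * 3 + ["case"] * 3

ber_a = balanced_error_rate(y_true, pred_from_a)
ber_b = balanced_error_rate(y_true, pred_from_b)
print("BER, data set A alone:", ber_a)
print("BER, data set B alone:", ber_b)
```

If one data set alone gives a BER close to the combined 8%, that would suggest the discrimination is driven mainly by that data set rather than by information shared between the two.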

Kim-Anh