I have a question about how to choose the number of components based on the Q2 values. Below is the Q2.total plot I got from the perf function (the model includes a metabolomics and a phosphoproteomics dataset). Based on the plot, I was planning to include 3 components, but I read in another topic that components with negative Q2 values might not be predictive for other studies. However, when I plot component 3 against component 1, it separates two groups in my data really well. Also, component 1 is above the 0.0975 line. Is that a 'bad' thing? I read (in the same topic) that this line relates to significance, but that it is not that important for exploratory analysis. Is that right?
I also have a question regarding the tune function for sPLS analysis. When I try to run this function I get the error:
Error in solve.default(t(Pmat) %*% Wmat) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
I read that this could be due to too many components in the tune function or too many zeros in the data. However, I already get this error with a single component, and when I tune in DIABLO with three datasets (including these two), it works, so I do not think the zeros are the cause. I thought it could also be due to high correlation between the two datasets. Is this plausible? And can I solve this, or would you recommend setting the numbers arbitrarily?